New quadratic delays in Bitcoin scripts

I have a fixation with algorithm complexity. When I was young I was an early optimizer, and, I must admit, that didn’t help me much in dates. Today I occasionally code a sub-optimized algorithm when there is no need for high performance, but it stills bothers me when I do. When I review code, it also bothers me to see quadratic complexity, so I pay special attention to loops. That helped my discover the O(N^2) hashing problem, and later the FindAndDelete() problem in Bitcoin.

Last week I began a research on Segwit scaling. I was particularly interested in the maximum resources that Segwit-enabled node requires for verifying a Segwit block. So I went again to review the code, and more specifically, the EvalScript() function. A few seconds later…voila! I came across two more quadratic complexity loops in Bitcoin Core. By exploiting edge cases for each of these two sub-optimal algorithms, I manage to simulate a Segwit block that takes up to 5.6 seconds to verify on a Ubuntu VM running on a single Core i5 processor. The simulation is based on a single thread executing EvalScript(), the Bitcoin script execution function. The tests were not performed processing actual blocks. These results should not make anyone worry, because there are worse problems in Bitcoin block verification, and because Bitcoin employs several worker threads for verifying scripts in parallel. For example, a Segwit block can request 80000 signature verifications when all transactions are P2WSH. It is said that Bitcoin Core (in a modern multi-core machine, using its multi-threading verification capabilities) can verify 8000 ECDSA signatures per second. Therefore a malicious miner can create a Segwit block that requires approximately 10 seconds to be verified. Since the examples presented in this post consume less than 10 seconds, I don’t consider my findings as vulnerabilities. However, if the block size is to be increased in the future, these problems should be solved prior increasing the block size. The scripts presented here as examples do not leave the value stack empty, but the Bitcoin protocol does not require it. Bitcoin only requires the top value to be true to accept the script.

The unsatisfied questioner: OP_IF abuse

Every time a Bitcoin script executes the OP_IF opcode, a boolean value indicating if the condition was true, false or the conditional was skipped (also represented as false) is pushed into the vfExec stack. Every time an opcode is executed, the number of false values in the vfExec stack is counted using the following line:

bool fExec = !count(vfExec.begin(), vfExec.end(), false);

If the count is non-zero, all subsequent instructions except OP_ELSE and OP_ENDIF are skipped. It is clear that the longer the conditional stack is, the more it takes to count the false elements.

The following scriptPub or ScriptSig exploits this problem:

0
OP_IF { 100 times }

0 { 9798 times }

OP_ENDIF { 100 times }
1

The vfExec vector is filled with 100 elements, and then each element is scanned 9799 times, totaling more than 979K items scanned. This took 2.5 seconds in my test VM (for a block filled with these scriptSigs).

To re-write this logic with a O(1) algorithm, one simply has to count the number of true conditions in one variable (trueCount), and the number of false or skipped conditions following all true conditions in another (ignoreCount). Detecting if code needs to be executed or not requires just testing if ignoreCount is zero.

The handling of OP_IF / OP_NOTIF / OP_ELSE should be like the following pseudo-code:

fExec = (ignoreCount==0);
...
case OP_IF:
case OP_NOTIF:
 {
   if (fExec)
    { 
     ....compute fValue...
     if (fValue) trueCount++; else ignoreCount++;
    } else 
    ignoreCount++;
 } break;
 case OP_ELSE:
 {
 if ((trueCount==0) && (ignoreCount==0))
     return set_error(serror, SCRIPT_ERR_UNBALANCED_CONDITIONAL);
 if (ignoreCount==0) { trueCount--; ignoreCount++; } else
 if (ignoreCount==1) { trueCount++; ignoreCount--; }
 } break;
case OP_ENDIF:
 {
    if ((trueCount==0) && (ignoreCount==0))
        return set_error(serror, SCRIPT_ERR_UNBALANCED_CONDITIONAL);
    if (ignoreCount>0) ignoreCount--; else trueCount--;
 }
 break;

You may have noticed the strange behavior of Bitcoin’s ELSE statement. Bitcoin allows one to switch between true and false conditions several times. For example, the following script is valid and leaves the value 2 on the stack:

1 OP_IF OP_ELSE OP_ELSE 2 OP_ENDIF

Rock-and-ROLL

The second problem lies in the OP_ROLL opcode. This opcode removes a value at a given index from the value stack, and pushes it on top. As the Bitcoin Core stack stores a list of char std::vector by value (not by reference), and because the stack is itself a std::vector (not a linked list), then removing the first elements requires moving all elements one position in memory. The value stack can store a maximum of 1000 elements. The following script fills the stack and then moves each stack element 200 times, so the number of moved elements is 200K. This took almost 5.6 seconds in my test VM (for a block filled with these scriptSigs).

1  {999 times}
998 OP_ROLL { 200 times }

I tried other scripts, such as filling the stack with values of size 520 using DUP3, and then performing rolls, but all of them led to a block that took less time, if the block is to be filled with the scripts.

One solution to this problem is use a linked list data structure instead of a std::vector, to allow O(1) removal of items, but it still requires O(N) for element lookup. A balanced tree where each internal node is augmented with the number of children underneath can be used to provide efficient indexed access and efficient element removal. However, the overhead of such data structure (pointer hopping) may kill its benefits.

What if the attacker has time to prepare outputs?

The results presented in this post correspond to the maximum times I could get without preparing UTXOs in previous blocks. By putting the slow-to-validate scripts in scriptPubs instead of scriptSigs, and creating a block with empty scriptSigs, it is possible to execute the slow-to-validate script approximately 25000 times, as each input will only consume 40 bytes from the block space. For example, it is possible to create a block that takes 33 seconds to verify using prepared scriptPubs filled with OP_ROLLs. Preparing 25000 OP_ROLL outputs requires the equivalent of filling 11 blocks with non-standard outputs created by the attacker. But again, this is not something to worry about, as with a preparation stage an attacker can also create a block that takes 10 minutes to verify by packing 200 OP_CHECKSIGs on each output, and consuming all these outputs in a single block. The 20K signature limit does not apply to spent scriptPubs, but only to scriptSigs and created scriptPubs (Bitcoin could soft-fork to also count sigops on consumed scriptPubs). However, preparing blocks filled with OP_CHECKSIGs in scriptPubs requires almost 250 blocks, because of the 20K signature verification limit restricts the number of outputs that can be included in each block, while preparing a block filled with outputs filled with OP_ROLLS only requires 11 blocks because there is no such limit.

Summary

Although a lot has been done to optimize block processing, there a few pieces of old code that still require some minor optimizations to prevent future surprises in the scaling path, whatever path turns out to be followed.

Note: If you find a reason why these scripts cannot be included in a block (as I haven’t tried it), please send me a comment. Also if you find prior information on these same problems, send me the links.

This entry was posted on April 17, 2017, 4:54 am and is filed under Uncategorized. You can follow any responses to this entry through RSS 2.0. You can leave a response, or trackback from your own site.

#1 by hostfat on April 17, 2017 - 8:23 am

Do you have already an opinion on BUIP033: Parallel Validation ?

https://bitco.in/forum/threads/buip033-passed-parallel-validation.1545/

- #2 by SDLerner on April 17, 2017 - 6:34 pm
  
  I think it is essential to scalability. It reduces the impact of all block-size related issues.

	SDLerner on A New Mystery in Patoshi …
	TechMiX on The Patoshi Mining Machin…
	SDLerner on A New Mystery in Patoshi …
	Jeremy Davis on A New Mystery in Patoshi …
	TechMiX on A New Mystery in Patoshi …

Bitslog