The relation between Segwit and AsicBoost, covert and overt

I will try to explain the relation between Segwit and AsicBoost, in both the covert and overt forms, in certain detail. I will also try to explain why a method was recently proposed to reduce the interference between covert-AsicBoost and some protocol improvements, by reducing the incentives for covert AsicBoost. The proposal makes covert AsicBoost more expensive, but not impossible.

“AsicBoost” is two things:

1. A Bitcoin Proof-of-Work ASIC design that can potentially mine faster that the standard design by fixing the last 64 bytes of the first application of SHA256 (the tail of the block header), and changing the mid-state (the head of the block header), instead of the opposite. This requires fixing and repeatedly using the tail of the header.

2. Several methods to modify the first 64 bytes (head) of the block header, while keeping the tail constant.

I will concentrate on the methods for (2).
There are 2 methods to modify the head (64 bytes) of the block header.

a) Changing the nVersion field of the block header.

b) Changing 28-bytes of the transaction Merkle root in the block header.

The method (a) was called “overt” AsicBoost, because all users can inspect the nVersion field of a block. An nVersion field that has been rolled would have several random bits activated. Since the semantics of the nVersion bits changed according to BIP9, currently a random change in this field is interpreted by Bitcoin Core nodes as the signaling of non-existent soft-fork proposals.

The method (b) was called “covert” AsicBoost, because it’s very difficult to distinguish a block whose Merkle tree has been “rolled”. Even if detecting a single covert AsicBoost may be difficult, detecting if a certain miner that identifies itself is performing covert AsicBoost may be possible using statistical methods.

Covert AsicBoost requires the miner to find “collisions” of the last 4 bytes of the Merkle root field. This is because the last 4 bytes of the Merkle root field are stored in the tail of the block header, not in the head. This is a pre-processing stage that must be done every time a new block must be created (height-changed).

There are several algorithmic methods to find collisions: which method to use depends on the number of collisions required, and the size of the colliding groups (e.g. it’s not the same to find two (different) 2-way collisions that one 4-way collision). The group requirements depends on how the SHA256 core was designed. To find collisions where are several ways to create new Merkle root candidates:

b1) Changing the content of a transaction T.

b2) Changing the order of the transactions.

Changing the content of a transaction T (method b1) requires re-computing the transaction id (a double SHA256 operation) and updating the Merkle tree from the leave of T to the root. At each level, a double SHA256 of a 128-byte data chunk must be performed (this is because two 32-bytes hashes are concatenated, and SHA256 requires extra space to store the bit length). Let’s assume SHA256 compression and message expansion take the same time Q. In total,  6Q operations must be performed per level. If the block has 2048 transactions, then there will be 11 levels, and 66Q would need to be executed every time the transaction is changed. If the tail of the transaction T is modified (instead of the head), then it may be possible to perform only 4Q to recompute the transaction id. In total, 70Q are needed. If the block contains only the coinbase transaction (what is generally called “empty” block), then it is possible to compute the Merkle tree root with only 4Q, because in this case the coinbase transaction id is the Merkle root. Therefore, using the covert AsicBoost by changing the coinbase incentivizes creating blocks with a lower number of  transactions. It’s 17 times faster to find collisions for an empty block than for a block with 2048 transactions, using method b1.

Changing the order of transactions (method b2) can be done in several ways. There are two sub-methods to do it:

c1) The easiest way is the create a set of left subtrees and a set of right subtrees in memory and them pair them in all possible ways. This can only be accomplished if there are no dependencies between right and left subtrees. If there are n candidates for each subtree, then one can build n^2 roots. The number of transactions in each subtree is almost irrelevant, because each subtree can be created by changing the content of a single transaction. This method requires 6Q to produce each root candidate, so it’s better than changing the coinbase even if the the block is “empty”. This method is best suited for a CPU or GPU.

c2) Another way is to compute the left-subtree and right subtree candidates on-the-fly by swapping nodes at different levels. This method requires much less temporary memory, and it’s more suited for an FPGA o ASIC.

The number of SHA256 operations required to find collisions establishes a fixed delay to start AsicBoost mining. During the collision finding period the system needs to mine blocks using the traditional nonce-rolling method until the collisions are ready. There is no restriction that the blocks are empty during this period. If the mining hardware only supports AsicBoost but not standard mining, then the miner cannot generate any block during this period. Therefore, there is a high incentive to reduce the cost of collision finding only if the chip is AsicBoost-only capable (not dual-mode). If the chip is dual-mode, a delay of up to 1 minute may be tolerated (e.g. a 2% revenue loss) so the efficiency of the method to generate collisions is of less importance.

If must be noted that even if transaction reordering is the fastest covert method, the overt method is still faster: it doesn’t require to find any collision at all.

When a segwit (BIP141) enabled blockchain receives a block that contains at least a single segwit transaction, it only accepts the block if the block has a wtxid tree Merkle root, which is stored in the coinbase transaction. This tree contains the transactions wtxids. A wtxid comprises both the transaction data and its witness data. Therefore, if the order of transactions changes (but not the wtxid commitments), then this mismatch makes the block invalid.

This is the reason why Segwit makes the use of transaction reordering more difficult. Transaction reordering is the most efficient covert method for AsicBoost, and the one more suited for FPGA implementation. It forces the covert-AsicBoost miner to mine only blocks not containing Segwit transactions.

The recently proposed method blocks a specific form of covert Segwit using transaction reordering (c2). The other less-efficient covert methods (b1) in a Segwit block still exists.

  1. #1 by Abraham Terger on April 13, 2017 - 4:00 am

    Thank you for bringing so much more clarity to this issue than pretty much anybody else.

    However the gmaxwell scheme seems just as pointless as ever. You write:

    > Therefore, there is a high incentive to reduce the cost of collision finding only if the chip
    > is AsicBoost-only capable (not dual-mode). If the chip is dual-mode, a delay of up to 1 minute
    > may be tolerated (e.g. a 2% revenue loss)

    From this we conclude that the gmaxwell scheme is economically impotent unless there are a huge quantity of NON-dual-mode asicboost chips already deployed.

    This is so incredibly unlikely that it boggles the mind. What makes anybody think that such chips exist? Who made them? Even if some manufacturer (cough cough bitmaintech) has chips with the hidden ability to mine in asicboost mode, they quite obviously aren’t NON-dual-mode chips, since zillions of customers are currently mining with them in non-asicboost mode. They’re either secretly dual-mode chips or not asicboost chips at all.

    So, remind me again, exactly whose legion of miners is this scheme supposed to thwart?

    • #2 by SDLerner on April 17, 2017 - 12:40 am

      That’s the same question I ask. It only limits a supposedly FPGA-based version of ASICBoost pre-processing, that could be easily replaced by a GPU-based version of the pre-processing.

Leave a comment