Ethereum is a new cryptocurrency that attempts to provide a Turing complete scripting system for cryptocurrencies and other complex contracts. It may seem to the newcomers that it is a radical new design, but it is not. The first coin to propose such a system was QixCoin (a Turing complete coin) and it was published more than one year before Ethereum was ever conceived (disclaimer: I was the designer of QixCoin). Since both coins are notably similar, I must say that Ethereum seems well designed at a first glance. Nevertheless, some key features are missing:
- string manipulation opcodes
- Script memory is not byte oriented (which seems to be the natural choice).
- Opcodes are a custom set, instead of using a standard opcode set, such as x86 (which gives you many open source C compilers for free).
But one of the “outstanding” new features in Ethereum is the use of a PoW function specially designed to be memory-hard. While this may be true (though a formal proof is missing and the design document is quite incomplete), the authors completely forgets another key property a PoW function must provide: it must be sequential-memory hard. This means that not only the function should require large amounts of RAM, but it must not allow easy parallelization. Dagger seems to provide almost the best possible scenario for parallelization. In Dagger, a certain amount of RAM is filled by pseudo-random data derived from the header and the nonce. This data is produced in rounds. Each round, a number of elements from the previous round outputs are hashed together. An optimized implementation for an ASIC (or FPGA) is evident for anyone with some discrete logic design background.
One possible implementation could be the following: a central shared memory stores each round output. At each round, a number of hashers (from a pool of say 512) are allocated. Each hasher is connected via a shared bus to the shared memory. If each hasher requires N clocks to output the hash digest, then P hashers can perform P simultaneous computations in (N+C*P) clocks, where C is a low constant related to the memory bus width. We assume that the hashers are not pipelined nor optimized, so N is about 128. In the first C*P clocks, each hasher retrieves their inputs from memory in C clocks. In the remaining N clocks the digests are generated. For P = 128, this provides almost a 122x speedup.
A still more optimized implementation would compute the hash digests of P nonces at the same time, by multiplexing P memories. Each hasher is now optimized to create a hash digest in every clock, by multiplexing the memories. Note that this is already possible as some Bitcoin ASIC hashers output a digest in every clock. Then the PoW searching performance increases exactly P times. With currently 22 nm technology, an ASIC die can hold 512 hashers or more, so a 512x speedup is possible, and clearly there is a huge incentive on mining in an ASICs with an external DRAM memory
Each of these designs present its own challenges in routing and power-consumption, and the previous descriptions are only simplifications. Nevertheless, the key points remain intact: I guess a 100x speedup would be not hard to achieve for a single die.
Because we think things can be done better, we’ll be publishing the QixCoin design document soon. Also, because the wide interest in smart contracts and colored coins, we’ll probably split QixCoin into two projects: one neutral (the raw coin engine) and the gaming part (the QixCoin, which is a coin built over the scripting system).
As a final note I’d like to add some words on other PoW function: in my previous post, I presented SeqMemoHash. This function is clearly sequential memory hard (the proof will be presented in the next revision of the paper). But SeqMemoryHash also requires a huge number of hash operations to verify the PoW, so a DoS attack could be executed by sending blocks with an invalid PoW to node, and force the node to go trough all the computations for nothing. This is not a serious threat, since the victim will immediately ban the sender, and prevent any other attack attempt. Nevertheless I propose a better way to use SeqMemoHash to create a PoW that protects from DoS attacks. The output of the PoW is taken as the is the concatenation of outputs created in the algorithm steps that are powers of two (intermediate results at steps number 1,2,4,8, etc.), plus the last hash produced in the last step. The verifier checks this mid-states as progress is made toward the final computation. This guarantees that the verifier will never do more than the double number of hahes performed by the attacker. See the paper for an example.