New, post-Bitcoin cryptocurrencies have been developed with ASIC resistance to level the competitive playing field for cryptocurrency miners, as discussed in parts 1 and 2 of this blog. ASIC resistance was developed to counteract monopolization of cryptocurrencies by the few entities that can afford to build ASIC solutions, as has happened with Bitcoin. However, ASIC resistance does not ensure the necessary scarcity of a cryptocurrency. Consequently, developers of cryptocurrency algorithms have devised an additional method to fundamentally ensure scarcity. It’s called memory hardness, which can be combined with ASIC resistance. The most prominent and promising examples of these new cryptocurrencies are Monero/XMR and Ethereum/ETH. These new cryptocurrency algorithms are both ASIC-resistant and memory-hard.
One way to make a cryptocurrency-mining algorithm ASIC-resistant is to require more hardware to execute the algorithm than can fit in an ASIC. For example, algorithms that require several gigabytes of memory, such as the Ethash algorithm used for Ethereum, are memory-hard.
Memory hardness is incorporated into cryptocurrency mining algorithms to bar execution shortcuts that would otherwise permit pre-computation. A memory-hard cryptocurrency mining algorithm requires the mining process to read a value stored in a memory location, process that value according to the steps the algorithm specifies for that cryptocurrency, and then use the result as the address for the next memory transaction. As a result, the memory addresses cannot be determined a priori. Different memory-hard algorithms enforce this property in somewhat different ways.
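The read, compute, and re-address loop described above can be sketched in a few lines of Python. This is a toy illustration only, not Ethash, CryptoNight, or any other real mining algorithm: the function name `toy_memory_hard_walk`, the use of SHA-256 as the mixing step, and the scratchpad sizes are all assumptions chosen for readability.

```python
import hashlib


def toy_memory_hard_walk(seed: bytes, num_slots: int = 4096, num_steps: int = 10_000) -> bytes:
    """Toy data-dependent memory walk (hypothetical, for illustration only)."""
    # Fill the scratchpad with pseudo-random 32-byte values derived from the seed.
    scratchpad = []
    block = hashlib.sha256(seed).digest()
    for _ in range(num_slots):
        scratchpad.append(block)
        block = hashlib.sha256(block).digest()

    state = block
    address = int.from_bytes(state[:8], "little") % num_slots
    for _ in range(num_steps):
        # 1. Read the value stored at the current address.
        value = scratchpad[address]
        # 2. Mix that value into the running state.
        state = hashlib.sha256(state + value).digest()
        # 3. Use the result as the address of the next memory transaction.
        address = int.from_bytes(state[:8], "little") % num_slots
    return state


digest = toy_memory_hard_walk(b"example block header", num_slots=256, num_steps=500)
print(digest.hex())
```

Because each address is derived from the value just read, step N+1 cannot start until step N has finished, and no address in the chain can be known ahead of time.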
Slowing the Algorithm with Memory Accesses
Memory-hard algorithms create high-difficulty, proof-of-work routines by leveraging the fundamental lower bound on the latency of memory transactions. This limit is rooted in the laws of physics, so the execution time for these cryptocurrency algorithms cannot be reduced below a certain minimum. Memory-hard algorithms guarantee that all memory transactions within a single hash computation take place sequentially, so implementations cannot accelerate that computation by means of parallelization. The number of memory transactions required may be a million or more, which ensures that the intermediate results, and therefore the final result, cannot be pre-calculated.
Memory-hard cryptocurrency algorithms require millions of sequential memory transactions to execute the complete algorithm, but these algorithms are not computationally intensive. Therefore, the performance of the miner depends almost entirely on memory bandwidth and transaction latency (i.e., the shortest memory cycle time wins).
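A quick back-of-the-envelope calculation shows why memory cycle time dominates. The sketch below assumes one million strictly sequential accesses per hash and compares two illustrative latencies, 1 ns and 50 ns. Those two figures are placeholders chosen for the arithmetic, not measurements of any particular memory, and the function names are hypothetical.

```python
def min_hash_time_seconds(num_accesses: int, access_latency_ns: float) -> float:
    """Lower bound on the time for one hash when every access must wait for the previous one."""
    return num_accesses * access_latency_ns * 1e-9


def max_hashes_per_second(num_accesses: int, access_latency_ns: float) -> float:
    """Upper bound on the hash rate of a single mining process."""
    return 1.0 / min_hash_time_seconds(num_accesses, access_latency_ns)


ACCESSES_PER_HASH = 1_000_000  # "a million or more" sequential transactions

# Illustrative latencies only (assumptions, not datasheet values).
for label, latency_ns in [("fast memory, 1 ns", 1.0), ("slow memory, 50 ns", 50.0)]:
    rate = max_hashes_per_second(ACCESSES_PER_HASH, latency_ns)
    print(f"{label}: at most {rate:,.0f} hashes/s per mining process")
```

Under these assumptions, the hash rate of a single mining process scales inversely with access latency, no matter how much arithmetic hardware is available.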
Memory hardness drives the architecture of a superior cryptocurrency miner and the ideal hardware platform for such an architecture depends on the algorithm, because the execution performance of different algorithms depends in different ways on memory size and access patterns.
The size of the memory array to be traversed by the algorithm dictates the platform’s architecture. For example, Ethereum cryptocurrency algorithms require memory sizes of several gigabytes, while the Monero algorithm requires memories that are three orders of magnitude smaller. These differences result in very different memory architectures.
Algorithms with smaller memory size requirements can, in theory, be implemented using external memory. However, such implementations are not as efficient for a variety of reasons, primarily because on-die memory offers orders of magnitude more bandwidth and lower latency than off-chip memory. Power consumption and density on the PCB are other key factors that increase the initial and operational costs for implementations that use off-die memories. Cryptocurrency mining farms deploy many thousands of mining machines, so achieving the best power efficiency and the highest compute performance per unit of volume is critical.
Too Big for Cache
These cryptocurrency algorithms are intentionally designed to use memory spaces that are too large to fit into the second-level caches on most of today’s microprocessors. This aspect of the algorithm design forces processors to go off chip for memory accesses. External memory devices such as DDR4 SDRAMs are simply too slow, too expensive, and too power-hungry, and they take up far too much space for competitive execution of these algorithms. SDRAMs are block-oriented memories and are therefore very inefficient for the fine-grained transactions performed by cryptocurrency algorithms.
Consequently, PC/server CPUs and GPUs are poor implementation vehicles for cryptocurrency mining rigs that need to execute memory-hard algorithms. Architectures that use on-die memories will be far faster. Furthermore, GPUs are poorly equipped to make sparse traversals through a large memory space, which is yet another reason that they are inefficient engines for memory-hard cryptocurrency mining.
Large, high-end FPGAs do have a considerable amount of on-die memory in the form of many medium and large embedded memory blocks. This fact would seem to make stand-alone FPGAs the hardware implementation of choice for memory-hard cryptocurrency algorithms. However, off-the-shelf FPGAs are designed for general-purpose applications. They are not at all designed for cryptocurrency mining.
The amount of on-die memory in general-purpose FPGAs remains the main factor that severely limits the number of mining processes that can run in parallel on these devices. FPGA memory blocks are evenly and sparsely distributed across an FPGA core, but these sparse embedded memory blocks are individually too small to be used as memory-hard cryptocurrency mining spaces. As a result, these smaller, embedded FPGA memories must be combined into larger memories, drastically slowing overall performance and disqualifying general-purpose FPGAs for these algorithms.
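The sizing argument above comes down to simple arithmetic. The following sketch assumes a 2 MiB mining scratchpad and a hypothetical FPGA with 2,000 embedded memory blocks of 4 KiB each. None of these numbers comes from a specific device or algorithm specification, and the function names are made up for this example.

```python
import math


def blocks_per_scratchpad(scratchpad_bytes: int, block_bytes: int) -> int:
    """Number of small embedded memory blocks that must be combined into one scratchpad."""
    return math.ceil(scratchpad_bytes / block_bytes)


def parallel_instances(total_blocks: int, scratchpad_bytes: int, block_bytes: int) -> int:
    """Number of independent mining processes the on-die memory can hold."""
    return total_blocks // blocks_per_scratchpad(scratchpad_bytes, block_bytes)


KIB = 1024
MIB = 1024 * KIB

# Hypothetical figures chosen for illustration only.
SCRATCHPAD_BYTES = 2 * MIB
BLOCK_BYTES = 4 * KIB
TOTAL_BLOCKS = 2000

print(blocks_per_scratchpad(SCRATCHPAD_BYTES, BLOCK_BYTES))             # 512
print(parallel_instances(TOTAL_BLOCKS, SCRATCHPAD_BYTES, BLOCK_BYTES))  # 3
```

With these assumed figures, each scratchpad has to be stitched together from 512 separate blocks, and the whole device supports only three mining processes at once.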
Why eFPGAs are the Best Design Solution
The best implementation technology for these new cryptocurrency miners is an ASIC with embedded FPGA (eFPGA) arrays that are specifically designed for implementing cryptocurrency mining algorithms. This design solution allows the mining rig developer to put exactly the right amount of required hardware resources on chip to implement the target algorithm(s) while enabling quick reconfiguration in response to algorithm changes.
Combining ASIC and eFPGA technologies creates the perfect semiconductor vehicle for realizing new cryptocurrency mining architectures. The eFPGA permits hardware reconfiguration of the cryptocurrency mining engine whenever there are changes in the underlying algorithm, which means that the chip need not be re-spun to accommodate algorithm changes. The resulting mining chip merely needs to be reprogrammed by downloading a new bitstream image file remotely into the device’s configuration flash memory.
Embedded FPGAs enable tight integration with multiple, correctly sized, on-die memories immediately adjacent to the algorithm execution hardware. Using on-die memories rather than external memories (as GPUs and CPUs must) gives this design solution tremendous power, performance, and area (PPA) advantages. With modern semiconductor technology nodes, it is now possible to integrate the several megabytes of on-die SRAM required by these memory-hard algorithms. In addition, because the device is an ASIC, the designer can select the most compact memory configuration, which is much more PPA-efficient than the memories in standard FPGAs.
You can use Achronix’s Speedcore eFPGA arrays with their easy customizability to design precisely the FPGA needed for target class(es) of cryptocurrency algorithm(s). This is an extremely important economic driver when contemplating the fabrication and deployment of hundreds of thousands or millions of cryptocurrency mining engines.
This blog is based on the Achronix White Paper titled Mine Cryptocurrencies Sooner, Faster, and Cheaper with Achronix Speedcore Embedded FPGAs (WP014). For more information on this topic, download the complete White Paper.