Our existing customers ask us some pretty big questions: “How can this technology implement a step-change in my specific process? How can Speedcore IP be integrated in my SoC? How can you increase the performance of my ASIC?” We revel in answering such questions.
However, when we first meet with a company interested in our embedded FPGA (eFPGA) IP, often the question is very simply, “At the most basic level, what can it do for me?” This question may be the most important one we’ll ever answer for them.
A few months ago Achronix’s systems architect Kent Orthner made a short video with Ed Sperling, editor-in-chief of Semiconductor Engineering, with a view to answering exactly this question. I encourage you to take a look at it.
As Kent points out, when looking to accelerate functionality using a discrete FPGA chip, the biggest issue that designers face is latency. The most fundamental limiting factor is the communication between a CPU and an FPGA across a relatively narrow and slow PCIe interface. Even products that are advertised as having low-latency interconnects have latencies in excess of 1 µs. Real-world applications (for example, accelerating a Linux application) in fact incur around 15 µs of latency. Discrete systems also incur substantial duplication in having to write to and read from memory, and typically transfer data between two sets of DDR memory as part of the process.
These factors limit the degree of acceleration that can be achieved with a discrete FPGA. As an example, Kent points out that for a fairly typical algorithm, simple transactions between a CPU and an FPGA could easily incur up to 25 ms of latency. Batching a large number of operations could help reduce this latency, but only by a factor of 2.5. Kent illustrates that with a Speedcore eFPGA inside an SoC, the ability to share the DDR memory and cache hierarchy cuts the time to move data from the CPU to the eFPGA and back again to about 10 ns, or 2,500,000 times faster!
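As a rough check on the figures above, the ratios can be recomputed directly. This is a minimal sketch rather than anything taken from Kent's video: the constant and function names are illustrative, and the only inputs are the 25 ms, 2.5×, and 10 ns values quoted in the text.

```python
# Latency figures quoted in the text, in nanoseconds to keep the arithmetic exact.
DISCRETE_LATENCY_NS = 25_000_000   # up to 25 ms for a simple CPU <-> discrete FPGA transaction
EFPGA_LATENCY_NS = 10              # about 10 ns with a shared DDR memory and cache hierarchy
BATCHING_GAIN = 2.5                # improvement quoted for batching operations


def speedup(baseline_ns: float, improved_ns: float) -> float:
    """Return how many times shorter the improved latency is than the baseline."""
    return baseline_ns / improved_ns


# Discrete FPGA latency after batching, still in nanoseconds.
batched_latency_ns = DISCRETE_LATENCY_NS / BATCHING_GAIN

print(f"Batched discrete latency: {batched_latency_ns / 1e6:.0f} ms")
print(f"eFPGA vs. unbatched discrete: {speedup(DISCRETE_LATENCY_NS, EFPGA_LATENCY_NS):,.0f}x")
print(f"eFPGA vs. batched discrete: {speedup(batched_latency_ns, EFPGA_LATENCY_NS):,.0f}x")
```

On these numbers, batching brings the discrete case down to 10 ms, which still leaves it about a million times slower than the 10 ns eFPGA figure.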
Kent also gives an example of how an eFPGA can vastly improve performance when it comes to FPGA configuration. Whereas a discrete FPGA might use a serial interface to an EPROM or an 8-bit wide processor interface, an on-chip Achronix eFPGA can be connected via a 128-bit wide AXI interface running at on-chip interconnect frequencies. This high-speed connectivity results in a better than 16× improvement in configuration time when compared to an 8-bit interface running at 100 MHz, or 128× when compared to a serial interface. As a result, an Achronix eFPGA with 100,000 lookup tables can be configured in under 2 ms.
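The 16× and 128× figures follow from the ratio of interface widths, and a faster on-chip clock pushes them higher. The sketch below works through that arithmetic under stated assumptions: the function name is illustrative, the serial interface is assumed to also run at 100 MHz, and the 500 MHz on-chip clock is a hypothetical value chosen only to show the effect, not a published Speedcore figure.

```python
def config_speedup(efpga_width_bits: int, efpga_clock_hz: int,
                   ref_width_bits: int, ref_clock_hz: int) -> float:
    """Ratio of raw configuration bandwidth between two interfaces.

    Assumes each interface moves one word of its full width per clock cycle.
    """
    return (efpga_width_bits * efpga_clock_hz) / (ref_width_bits * ref_clock_hz)


REF_CLOCK_HZ = 100_000_000  # the 100 MHz reference clock from the text

# Same clock on both sides: the gain is purely the width ratio.
vs_8bit = config_speedup(128, REF_CLOCK_HZ, 8, REF_CLOCK_HZ)
vs_serial = config_speedup(128, REF_CLOCK_HZ, 1, REF_CLOCK_HZ)

# Hypothetical 500 MHz on-chip AXI clock (an assumption for illustration only).
vs_8bit_fast = config_speedup(128, 500_000_000, 8, REF_CLOCK_HZ)

print(f"128-bit vs. 8-bit at equal clocks: {vs_8bit:.0f}x")
print(f"128-bit vs. serial at equal clocks: {vs_serial:.0f}x")
print(f"128-bit at 500 MHz vs. 8-bit at 100 MHz: {vs_8bit_fast:.0f}x")
```

The equal-clock cases give the floor for the improvement; any on-chip clock above 100 MHz is what makes the gain "better than" 16×.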
This same potential for a rich set of pin interfaces within an eFPGA means that it is relatively easy to run multiple logical accelerators inside an eFPGA fabric in parallel, each with its own 128-bit wide AXI interface. In a relatively large eFPGA, with eight 128-bit AXI interfaces running at 1 GHz, you have a solution with the potential for an incredible 1-terabit-per-second data transfer.
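The 1-terabit-per-second figure can be reproduced from the interface count, width, and clock. The following is a minimal sketch with illustrative names, and it assumes every interface transfers a full 128-bit word on every clock cycle, so it gives a peak raw number rather than sustained throughput.

```python
def aggregate_bandwidth_bps(num_interfaces: int, width_bits: int, clock_hz: int) -> int:
    """Peak raw bandwidth in bits per second, assuming one full-width
    transfer per interface per clock cycle."""
    return num_interfaces * width_bits * clock_hz


# Eight 128-bit AXI interfaces at 1 GHz, as described in the text.
peak_bps = aggregate_bandwidth_bps(num_interfaces=8, width_bits=128, clock_hz=1_000_000_000)

print(f"Peak: {peak_bps / 1e12:.3f} Tbit/s")
print(f"Peak: {peak_bps / 8 / 1e9:.0f} GB/s")
```

The result is 1.024 Tbit/s, or 128 GB/s, which is where the rounded one-terabit figure comes from.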
By using an eFPGA, Kent outlines that companies have the potential to see:
- >100× reduction in latency for real-world applications.
- >10× improvement in throughput.
- >2× reduction in power usage.
- >4× reduction in area.
I’m sure you’ll agree these are all very impressive figures.
If you’re interested in understanding more about what Speedcore IP can do for you, take a look at Kent’s full video.