It was exciting to participate in The Next FPGA Platform event on January 22nd at the Glasshouse in San Jose, and particularly exciting to have Achronix share a panel discussion with Xilinx and Intel. The Next Platform co-editors Nicole Hemsoth and Timothy Prickett Morgan did a great job interviewing experts from the FPGA ecosystem with insightful questions. The best part of Next Platform events is their format: marketing pitches are kept to a minimum, with no presentations, just discussions. I will summarize a few insights and observations from the event.
Data Acceleration Era
Before the event, Tim wrote an article titled, “The Three Eras of Programmable Logic,” and during my one-on-one interview, I expanded on our excitement about FPGA 3.0 – the Data Acceleration Era. I have been in the FPGA industry for over 15 years and am very excited about this third era of FPGAs. There are three key reasons why I find the third era the most exciting.
- Significantly higher incremental TAM/SAM, estimated at $10B+ for FPGAs, driven by rapidly changing data acceleration workloads and an exponential rise in data processing in the cloud and at the network's edge. In this third era, data is driving the new economy. As they say, data is the new oil, and the valuation of the FAANG companies (Facebook, Amazon, Apple, Netflix and Alphabet, a.k.a. Google) isn’t driven just by their business models, but by the monetization of data. There is a lot of effort to make sense of all this data. That’s where AI and data analytics are relevant, in turn driving an orders-of-magnitude increase in compute requirements.
- FPGAs are not just an ASIC prototyping vehicle or sidecar component in the system, but they are gaining acceptance in the industry as compute engines in their own right.
- FPGAs are already being deployed in high volume in data centers. We are in the very early stages of the FPGA 3.0 era, and like the others before, I expect this era to run way beyond a decade.
One-on-One Interview with Manoj Roge and Timothy Prickett Morgan
It was great to receive validation from Microsoft’s Doug Burger that they are currently deploying FPGAs in seven-digit volumes inside their Azure infrastructure. Data centers were traditionally built around CPUs, but all hyperscalers acknowledge that CPUs can’t keep up with the increasing challenges of moving and processing data. A common theme in all of the discussions was the need for heterogeneous accelerators in data centers, as opposed to a CPU being the one-size-fits-all solution for various workloads. Doug highlighted that FPGAs have previously provided great flexibility in networking workloads, validated by Telco deployments (I refer to this as the FPGA 2.0 era, or the connectivity wave), so it makes sense to move data management off CPUs. Secondly, all hyperscalers have their own custom requirements, which makes it challenging for infrastructure managers to adapt the hardware to changing workloads. FPGAs, being field-programmable hardware, are a great way to future-proof data center infrastructure, just as they were used to future-proof Telco deployments in the FPGA 2.0 era.
I spent many years working with networking customers. They designed their datapath ASICs with proprietary packet processing engines and used FPGAs in their linecards for interface adaptation and memory buffering. A MAC-to-Interlaken smart bridge was the number one use case in networking for many years. Data center and AI ASIC architects should think the same way. Achronix’s Speedster®7t FPGAs offload a lot of complexity by supporting interface standards such as 112G SerDes, 400G Ethernet, PCIe Gen5, GDDR6 and DDR4/5. Architects should focus on their secret sauce in their ASIC for faster time-to-market and significant risk reduction in the same way as telco architects did.
What Could be the Fourth Era of Programmable Logic?
If I were to look in my crystal ball and predict the fourth era of programmable logic, I believe that FPGAs will become ubiquitous programmable building blocks for deployments from the cloud to the edge and to IoT. With this vision, Achronix set up a business model to license FPGA IP (Speedcore™ eFPGA) to be embedded in a customer’s ASIC or SoC. With Speedcore IP, customers can design a custom FPGA to fit their application requirements. Embedded FPGAs provide a cost-reduction continuum where fixed functions are hardened in the ASIC, while flexible or future-proofed functions are kept reprogrammable.
As the industry acknowledges the need for domain-specific architectures and localized compute, I see FPGA IP becoming ubiquitous, much like memory IP and compilers are today.
It is well accepted that data needs to be processed closer to where it is produced. By doing so, you reduce compute latency, minimize network traffic and power, as well as provide more encapsulated security and alleviate privacy concerns. Speedcore eFPGA integration is the best approach for high-volume, low-cost/power deployments, and Achronix is uniquely positioned in providing both standalone high-performance FPGAs and embedded FPGA IP.
Panel Discussion with Achronix, Xilinx and Intel
Golden Age of Domain-Specific Architectures and Languages
During my panel discussion with Xilinx and Intel, I (and others) argued that Moore’s law is dead. What kept Moore’s law alive was Dennard scaling and cost improvements through node scaling. Dennard scaling broke down over a decade ago, and cost reduction due to node scaling stopped at the 20nm node. Hence we can no longer rely on process scaling alone, but need to drive innovation with new architectures. We need a Moore’s law equivalent for architecture innovation. Intel pointed the audience to Hennessy and Patterson’s presentation titled, “A New Golden Age for Computer Architecture”. Hennessy and Patterson argue that post-Moore’s law, there is a need for domain-specific architectures because they:
- Deliver more effective parallelism for a specific domain
- Use memory bandwidth more effectively
- Eliminate unneeded accuracy
FPGAs are a great blank hardware canvas that offer these benefits.
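To make the third benefit concrete, here is a minimal sketch of “eliminating unneeded accuracy”: an 8-bit quantized dot product of the kind a domain-specific FPGA datapath can implement with far less silicon than full FP32 multipliers. All names and values below are illustrative, not taken from any vendor toolflow.

```python
# Sketch: trading unneeded accuracy for cheap integer arithmetic.
# Integer multiply-accumulate maps efficiently onto FPGA fabric,
# while the result stays close to the full-precision answer.

def quantize(values, scale):
    # Symmetric int8 quantization: real value ~ int8 code * scale.
    return [max(-128, min(127, round(v / scale))) for v in values]

a = [0.5, -1.25, 2.0, 0.75]
b = [1.0, 0.25, -0.5, 2.0]
scale = 0.02  # chosen so the largest sample value stays within int8 range

qa, qb = quantize(a, scale), quantize(b, scale)

# All MACs happen on 8-bit integers; a single rescale recovers real units.
approx_dot = sum(x * y for x, y in zip(qa, qb)) * scale * scale
exact_dot = sum(x * y for x, y in zip(a, b))

print(approx_dot, exact_dot)  # close, despite 8-bit operands
```

The same idea generalizes: a domain-specific datapath picks exactly the precision the workload needs, rather than paying for general-purpose FP32 everywhere.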
Besides the discussion about hardware innovation in the first half of the day, several sessions in the second half of the day discussed use cases and software innovation. There was a general consensus that for broader adoption of FPGAs, the industry and ecosystem need to offer mature, high-level software design flows that abstract all of the low-level hardware details. The good news is that there is a lot of activity around domain-specific languages such as P4 for networking and TensorFlow for machine learning. With the FPGA industry supporting Python-level programming with TensorFlow, algorithm designers can write Python scripts and not worry about low-level FPGA RTL programming. FPGA vendors and partners currently provide low-level libraries or overlay architectures optimized in RTL that plug into high-level frameworks such as TensorFlow. These domain-specific languages, libraries and compilers will eventually drive the software-friendly hardware paradigm, similar to what CUDA and cuDNN did for Nvidia GPUs.
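As a hedged illustration of this division of labor: the algorithm designer composes operations at the Python level, and a vendor-supplied backend matches each operation to a pre-optimized RTL overlay. The `OVERLAYS` table and `run_graph` function below are hypothetical stand-ins for such a backend, with plain Python functions playing the role of the RTL kernels.

```python
# Hypothetical overlay library: each entry stands in for a pre-optimized
# RTL block that the FPGA toolflow would dispatch to at runtime.
OVERLAYS = {
    "matmul": lambda a, b: [[sum(x * y for x, y in zip(row, col))
                             for col in zip(*b)] for row in a],
    "relu": lambda a, *_: [[max(0, v) for v in row] for row in a],
}

def run_graph(graph, tensor):
    """Execute a list of (op_name, extra_args) steps against overlay kernels."""
    for op, args in graph:
        tensor = OVERLAYS[op](tensor, *args)
    return tensor

# The designer writes only this high-level graph, never RTL:
graph = [("matmul", ([[1, 2], [3, 4]],)), ("relu", ())]
result = run_graph(graph, [[1, -1], [0, 2]])
print(result)
```

The key design point is that the mapping from op name to hardware kernel lives in the vendor library, so the same Python-level description can target new overlays without the designer touching RTL.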
Is There Room for Another FPGA Vendor?
Everyone is familiar with the FPGA duopoly, but Intel’s acquisition of Altera has really shaken things up. First, Intel has a strong focus on CPUs; second, they are betting on various accelerator options such as Nervana and Habana for AI. At Achronix, we have seen an opportunity in the market where customers want a new alternative for high-performance solutions. Having said that, we need to provide some uniqueness. We exploited the innovator’s dilemma because we didn’t have the legacy of supporting a large customer base on older FPGA products. Instead, we picked a few workloads, particularly compute and network acceleration, and developed a uniquely optimized architecture and feature set to address the key requirements of these workloads. We took a clean-slate approach when developing our new architecture to address the bottlenecks found in traditional FPGA architectures. With Speedster7t FPGAs, Achronix has reinvented high-performance FPGA solutions with a focus on three key pillars of architecture optimization:
- Efficient compute – We natively optimized for data acceleration and machine learning. We focused on native fabric optimization rather than adopting a heterogeneous multicore solution, allowing us to maximize design reuse, avoid dataflow bottlenecks, and simplify the design flow.
- Balanced memory hierarchy and bandwidth – We designed a high-performance architecture that would balance the compute and memory bandwidth, both on- and off-chip.
- Efficient data transfers – Moving data around the chip with an embedded 20 Tbps network-on-chip (NoC) and being the first FPGA with PCIe Gen5, GDDR6 and 4 × 400G Ethernet support.
Achronix is unique in providing both standalone high-performance FPGAs and embedded eFPGA IP. In short, we’ve carved out our lane very well and are seeing tremendous interest from our customers and solution partners. We sincerely thank them for believing in us and our ability to disrupt the FPGA duopoly.