If you squint your eyes, a modern FPGA looks like a programmable logic device crossbred with the mutt of a switch ASIC and an SoC. As architected today, the FPGA sort of embodies the compute heterogeneity that is happening inside of server nodes and across datacenter clusters, carried all the way down into the device itself. The FPGA has memory embedded among its gates, right close to compute where it is needed, as well as DRAM and sometimes HBM memory, in a way that CPUs do not.
It is always tempting to integrate as much hard-coded stuff as possible on any CPU or SoC to eliminate some of the latencies between components and the cost of creating them independently. This is why CPUs have long had integrated hierarchies of cache memory, and why controllers for main memory (either DDR or HBM), PCI-Express peripherals, and Ethernet network interfaces have gradually been pulled off the server motherboard and onto the die. And it is also why hard blocks of Arm cores, interconnect SerDes, DDR and HBM memory controllers, and various other components are now part of FPGAs aimed at the datacenter. But this integration can go too far, making devices over-engineered for specific workloads and more expensive for all workloads because everyone ends up paying some dark silicon tax.
Either way, it is tough to get the balance right, and it is tough for everyone to say heterogeneity under bright lights, apparently. Striking that balance was one of the topics of conversation at The Next FPGA Platform event we recently held at the Glass House in San Jose. We sat down with Patrick Dorsey, vice president and general manager of FPGA and Power product marketing in the Network and Custom Logic Group at Intel; Ivo Bolsens, chief technology officer at Xilinx; and Manoj Roge, vice president of product planning and business development at Achronix to talk about the push and pull of device integration and the tough balancing act that the key FPGA makers do.
https://youtu.be/msBqmoggh8U
“It’s a good question, and as many of you in the audience know, Altera had some long-standing co-development with Intel around putting Xeons and FPGAs in the same package,” says Dorsey, and we reminded him that we thought the integrated Xeon-FPGA was an intriguing project. “One of the challenges is what do you integrate and what do you give up when you integrate? One of the things that we see is that flexibility matters – we are in the flexibility business, we build FPGAs – and it not only matters in the chip but also at the system level. Having the choice of which FPGA and which Xeon, that’s a lot of value, and sometimes when you integrate, you destroy value. It happens. You create value because the form factor gets smaller, the power gets lower, and the performance gets better because the interfaces can be optimized. It’s a trade-off. I have been doing this, along with my counterparts here, for over 25 years, and the great thing that is happening now is that technology enables the choice. So heterogeneous is an option at multiple levels in the system, and there are a lot of things that we can do with the packaging technology and the interfaces that we have.”
Xilinx does not own its own datacenter processors, excepting the Arm cores that it embeds on devices such as the latest “Everest” FPGAs announced last year, but Bolsens concurs that this fractal hierarchy of heterogeneity is an important aspect of modern distributed systems as well as FPGAs themselves.
“Heterogeneity is a value proposition of the FPGA, so it is something that we should not see too much as a problem, but an opportunity,” Bolsens explains. “Now, having said that, if you think about what an FPGA is, first of all it has a very general, programmable switch interconnect that allows you to immerse functionality into that interconnect without being disruptive in your architecture. So you can bring a diverse set of functionality into that switch-based interconnect. So in that sense, integrating hardware building blocks in an FPGA is a little bit more transparent than in many other applications.”
The real challenge for FPGAs themselves, and for the systems that use them, is deciding what proportions of compute, storage, and networking to include. And Bolsens says that there are some lessons that FPGA makers can learn at the datacenter level, where components are being disaggregated and then recomposed on the fly, and apply them down to the FPGA device level.
First and foremost, according to Achronix, you need the right compute components for the task. “But our approach was slightly different,” says Roge. “We focused on building the best FPGA fabric and solved some of the bottlenecks you see in the traditional FPGA architectures, with the network on chip structures and with a big focus on ease of design and ease of use. And in terms of the form factors, we support standalone FPGAs, or chiplets, or even monolithic integration, and we let customers pick the right form factor for the right approach. I have been doing FPGA product planning for many years, and you have to map hundreds of use cases onto a few tapeouts – and it is very difficult to get that mix right.”
Some of the big FPGA makers have chiplets, and others will get there at some point when Moore’s Law starts slowing down more. If you want to hear what they had to say about that, you will have to take a gander at the video above. Chiplets offer great potential for having a broad array of FPGA devices, with different capabilities, with only incremental cost increases. We also talked about the key hardware innovations coming down the pike for datacenter-class FPGAs.