With DPU-Goosed Switches, HPE Tackles VMware, Security – And Maybe HPC And AI

Pendulums are always swinging back and forth in the datacenter, with functions being offloaded from one thing and onloaded to another cheaper thing that is often more flexible or faster. So it is with network functions that were originally in distinct devices, then pulled onto CPUs during the software-defined networking era by an eager Intel, then offloaded onto SmartNICs and DPUs. Now, some functions are being pulled back up into the switches and embedded in the network again.

The simplest way to do this, of course, is to take the DPUs that were going to be populated inside of myriad servers and put a few of them into the switch. The Aruba networking division of Hewlett Packard Enterprise did that with its CX 10000 enterprise switches three years ago, embedding DPUs from AMD’s Pensando unit into its boxes.

Rival Cisco Systems took the same route in February this year with its Nexus N9300 switches, pairing its Silicon One E100 Ethernet ASIC with four Pensando “Elba” Gen 2 or two “Giglio” Gen 2+ DPUs.

The Elba DPU, with two 200 Gb/sec ports, launched in 2021, and the Giglio DPU came out two years later, also with two 200 Gb/sec ports plus optimizations for low power consumption and relatively high performance. Both of these DPUs were etched in 7 nanometer processes from Taiwan Semiconductor Manufacturing Co. These DPUs have 144 custom match processing units, or MPUs, running at 2 GHz for chewing through P4 algorithms, processing capacity that is augmented by sixteen Arm A72 cores running at 3 GHz. They also have dedicated data encryption and storage offload engines.

With the “Salina” Gen 3 Pensando DPUs, which neither Cisco nor Aruba are using in production and which have not been formally announced as yet even though they were on the roadmap for late 2024, Pensando is switching to 5 nanometer TSMC processes and boosting performance by around 2X. So that’s two ports running at 400 Gb/sec, 232 MPUs, and sixteen beefier Arm “Ares” N1 Neoverse cores. (We are surprised Salina is not using “Perseus” N2 cores, or better still “Hermes” N3 cores. The N3s deliver 20 percent better performance per watt than the N2s, and every little bit of wattage saved helps. The N3 CSS design can deliver 32 cores in 40 watts, which sounds pretty good for a DPU.)

The CX 10000 hybrid DPU-switches that HPE’s Aruba division launched three years ago did not use its own family of Ethernet ASICs, but rather the 3.2 Tb/sec “Trident3” StrataXGS switch chips from Broadcom. (We talked about how the Trident3 compared to the “Trident4” and “Tomahawk4” variants of the StrataXGS line way back in December 2020 and about the updated Trident4C back in September 2022.) The CX 10000 had a six-core Intel Xeon D-1637 processor embedded into it on one side of the Trident3 ASIC for local compute (with 32 GB of DRAM and 64 GB of flash for CPU storage), and a pair of the Elba DPUs on the other side of the Trident3. The resulting augmented switch had 48 downlink ports running at 25 Gb/sec (perfectly fine for modest servers) and six uplink ports running at 100 Gb/sec.

This CX 10000 “smart switch” was interesting for testing out the idea of an augmented switch, but not enough, in our opinion, to do interesting things with in the datacenter.

Not so with the new CX 10040, which has higher bandwidth and more DPU oomph, and whose model number happens to be a zip code we know in the Washington Heights neighborhood of Manhattan.

Here is the block diagram of what is inside of the CX 10040:

The machine has an 8 Tb/sec Trident4-X9 switch ASIC from Broadcom at its heart, with an AMD Ryzen V3000 embedded CPU as a co-processor. (We are not sure how many cores it has or what the memory and flash configuration is.) Some of the SerDes on the Trident4-X9 are linked to four Elba DPUs, while the remainder are used to create 48 ports running at 100 Gb/sec for downlinks and six 400 Gb/sec ports for uplinks.
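
For a quick sanity check on the SerDes budget – and assuming each Elba hangs off the ASIC through a single 200 Gb/sec link, which is our guess rather than a detail HPE has published – the front panel ports plus the DPU links just about fill the Trident4-X9’s 8 Tb/sec:

```python
# Back-of-the-envelope SerDes budget for the CX 10040, using our own assumptions about the DPU links.
downlinks = 48 * 100    # 48 downlink ports at 100 Gb/sec
uplinks = 6 * 400       # six uplink ports at 400 Gb/sec
dpu_links = 4 * 200     # assumption: each of the four Elba DPUs attaches over one 200 Gb/sec link

print(downlinks + uplinks)              # 7200 Gb/sec on the front panel
print(downlinks + uplinks + dpu_links)  # 8000 Gb/sec, which matches the 8 Tb/sec Trident4-X9
```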

Cisco has done a custom switch gig with a number of hyperscalers and cloud builders to put eight Elba DPUs around a 12.8 Tb/sec Silicon One Q200L Ethernet ASIC, as we reported back in February. This has a lot more DPU offload and compute, obviously. And given the relative dearth of bandwidth on that Silicon One ASIC – we are in an era of 51.2 Tb/sec devices and soon 102.4 Tb/sec devices in the hyperscale and cloud datacenter – the 8 Tb/sec of the CX 10040 is not all that capacious on the port count or the bandwidth.

But we are keeping an eye on this to see how DPU-switch hybrids might become more common, because it is a lot cheaper to put two, four, or eight DPUs in the switch than it is to put one in each of the dozens to hundreds of servers directly connected to that switch. And the Pensando DPUs are programmed in P4 and can obviously be used to accelerate all kinds of offloads and functions, perhaps even the collective operations commonly used in HPC and AI applications. We are perhaps getting a little bit ahead of HPE here, which also has a “Rosetta” Slingshot Ethernet business aimed at HPC and AI.
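
To give a feel for the programming model, P4 devices like the Pensando MPUs are built around match-action tables: a parsed packet header is looked up against a table, and whatever action the matching entry names – forward, drop, encapsulate, count – is applied. Here is a minimal, purely illustrative Python sketch of that idea, not Pensando’s actual P4 code or toolchain, showing a firewall rule expressed as a table lookup that yields an action:

```python
# Toy illustration of a match-action table, the core abstraction that P4 programs express.
# Hypothetical sketch only, not Pensando or Aruba code.
from typing import Callable, Dict, Tuple

MatchKey = Tuple[str, int]  # (source subnet, destination port)

def forward(pkt: dict) -> dict:
    pkt["verdict"] = "forward"
    return pkt

def drop(pkt: dict) -> dict:
    pkt["verdict"] = "drop"
    return pkt

# Table entries: the web tier may reach the app tier on 443, but not the database port directly.
firewall_table: Dict[MatchKey, Callable[[dict], dict]] = {
    ("10.1.0.0/16", 443): forward,
    ("10.1.0.0/16", 5432): drop,
}

def apply_pipeline(pkt: dict) -> dict:
    key = (pkt["src_subnet"], pkt["dst_port"])
    action = firewall_table.get(key, drop)  # default-deny when nothing matches
    return action(pkt)

print(apply_pipeline({"src_subnet": "10.1.0.0/16", "dst_port": 5432}))  # {'...': ..., 'verdict': 'drop'}
```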

“This is still initially targeted at the enterprise market, although we have had a lot of uptake from cloud service providers,” John Gray, who heads up datacenter networking, AI, and security infrastructure within the Aruba division, tells The Next Platform. “I wouldn’t say that we’re targeting the hyperscalers with this class of product – at least not yet. This particular switch has the capability of expanding and adding more DPU capability to it. But in general, most enterprises or cloud service providers don’t want to crack open their installed base of servers to add NICs or DPUs. But by putting the DPU in the switch, the DPU functionality can be aggregated at the top of the rack.”

There is not much point in putting DPUs in the aggregation layer of the network (at least not yet), and for this, Aruba recommends its CX 9300, which is a fixed port machine in a 1U form factor that has 32 ports running at 100 Gb/sec, 200 Gb/sec, or 400 Gb/sec. We don’t know the ASIC in this box, but it is a 25.6 Tb/sec device that can drive 5 billion packets per second. (It is either homegrown Aruba or Broadcom, we guess.)

HPE does not like the term “smart switch” even though it does use it from time to time, and prefers to call this CX 10000 series a family of “distributed services” switches. One of the important things as far as enterprise customers are concerned is that HPE can replace the VMware ESXi hypervisor with its Morpheus VM Essentials (VME) variant of the open source KVM hypervisor and use the distributed firewall, microsegmentation, encryption, and telemetry in the CX 10000 and 10040 switches to replace VMware NSX virtual networking functions.

Broadcom is no doubt happy that HPE uses its ASICs, but it is probably none too happy that HPE is attacking VMware. But, this is business – nothing personal.

Gray says that another early use case among enterprises and cloud service providers is to employ the DPUs in the CX 10000 series to secure AI agents, which may or may not be supported by the VMware stack. (It is not clear to us, as we think about it while writing this, why NSX plus ESXi would not be able to do this.)

One of the things that we have been thinking about is why it has taken so long for DPUs to take off, and for some of them to be embedded into switches instead of every server for workloads that are light enough for the DPU functionality to be shared across multiple servers. The answer is that these things take time, and change is slow in the enterprise datacenter. We think it took Microsoft, Amazon Web Services, and Google to show the value of SmartNICs and DPUs, and it took Pensando creating programmable DPUs with performance and flexibility, for the idea to start taking hold. And the high cost of an aging fleet of expensive network security devices in the datacenter doesn’t hurt in driving the DPU business, either.

On the consolidation front, a CX 10000 or 10040 switch of a given capacity can lower the cost of firewalls and other security appliances by up to a third while delivering around 10X the performance, according to Gray. A comparison to NSX networking in a VMware environment would probably be illustrative, too, but Gray did not have one handy. In a setup with 500 servers being protected by firewall appliances and using standard top of rack Ethernet switches, Gray reckons that the three year cost of this setup is around $2.69 million. The 8.4 Tb/sec of total bandwidth across the 24 server racks would require eight physical firewalls and two top of rack switches per rack, which works out to a cost of $321 per Gb/sec for the network and its security. With 48 CX 10000 switches and software-based firewalls running on the DPUs, the price drops down to $1.05 million, or $125 per Gb/sec, which is a 53 percent improvement in security and networking bang for the buck.
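
For those following the arithmetic, the per-gigabit figures fall straight out of the quoted totals – here is a quick check in Python, with rounding to match the quoted figures:

```python
# Where the $/Gb/sec figures in Gray's 500-server, 24-rack example come from.
total_bandwidth_gbps = 8_400          # 8.4 Tb/sec of aggregate bandwidth across the racks

appliance_cost = 2_690_000            # three-year cost: physical firewalls plus plain top of rack switches
smart_switch_cost = 1_050_000         # three-year cost: 48 CX 10000s with firewalls running on the DPUs

print(round(appliance_cost / total_bandwidth_gbps))     # 320 -> the roughly $321 per Gb/sec figure
print(round(smart_switch_cost / total_bandwidth_gbps))  # 125 -> the $125 per Gb/sec figure
```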

This is around the same bang for the buck improvement that Cisco is seeing in its smart switch comparisons.

The CX 10040 will be available in late June or early July.
