The Next Platform

HLRS Takes First Steps To Exascale

The University of Stuttgart’s High Performance Computing Center (HLRS) in Germany tapped Hewlett Packard Enterprise back in December 2023 to build a prototype hybrid CPU-GPU supercomputer nicknamed “Hunter” to pave the way towards an exascale-class machine called “Herder” that it is budgeting to have installed in 2027.

The Hunter system is now up and running and, as it turns out, it has considerably more compute capacity than expected. HPE and HLRS are also giving out a few tidbits of data about the future Herder supercomputer. We also now have more detailed GPU and APU roadmaps from AMD, so we can see how Herder’s deployment less than three years from now might intersect with those roadmaps.

The Hunter machine is considerably more powerful and proportionately larger, in terms of occupied rack space, than originally planned. A little more than a year ago, HPE and HLRS said that the Hunter machine would have 544 of AMD’s “Antares-C” Instinct MI300A CPU-GPU hybrid compute engines in it, which we calculated would have a peak 64-bit floating point calculating rate of 33.35 petaflops in the aggregate. As it turns out, the Hunter system that was delivered to HLRS by HPE weighs in at 752 of the MI300A compute engines and delivers 48.1 petaflops. That’s about 44.2 percent more performance for a 38.2 percent increase in the number of compute engines.
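For readers who want to check the ratios, here is a minimal sketch of the arithmetic using only the engine counts and peak petaflops quoted above. The variable and function names are ours and purely illustrative, and the per-engine figures are simple quotients of the quoted totals, not vendor specifications.

```python
# Figures quoted in the article for the planned and delivered Hunter systems.
planned_engines, delivered_engines = 544, 752
planned_pf, delivered_pf = 33.35, 48.1  # peak FP64 petaflops


def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


engine_growth = pct_increase(planned_engines, delivered_engines)
perf_growth = pct_increase(planned_pf, delivered_pf)

# Implied peak teraflops per compute engine (aggregate divided by count).
planned_tf_each = planned_pf * 1000 / planned_engines
delivered_tf_each = delivered_pf * 1000 / delivered_engines

print(f"engines: +{engine_growth:.1f}%  performance: +{perf_growth:.1f}%")
print(f"per engine: {planned_tf_each:.1f} TF planned, "
      f"{delivered_tf_each:.1f} TF delivered")
```

The gap between the two growth rates shows up as a slightly higher implied peak per engine in the delivered system than in our original estimate.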

Interestingly, the Hunter machine has nearly twice the performance of the existing “Hawk” system, which is based predominantly on “Rome” Epyc processors from AMD and weighs in at 26 petaflops. The Hawk system was installed in 2020, has 11,264 processors, and burns five times as much electricity as the Hunter machine that replaces it. Moving from Hawk to Hunter, that is an 80 percent drop in power consumption for nearly 2X the performance, or about an order of magnitude better performance per watt.
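The performance per watt claim follows directly from the two ratios in that paragraph. A quick sketch, assuming the 5X power figure is exact and using peak petaflops as the performance measure (the variable names are ours):

```python
# Peak FP64 petaflops quoted for each system.
hawk_pf, hunter_pf = 26.0, 48.1

# Hawk is said to draw five times the electricity of Hunter.
hawk_to_hunter_power = 5.0

perf_ratio = hunter_pf / hawk_pf                          # Hunter vs Hawk performance
power_drop_pct = (1 - 1 / hawk_to_hunter_power) * 100     # drop in power draw
perf_per_watt_gain = perf_ratio * hawk_to_hunter_power    # combined efficiency gain

print(f"performance: {perf_ratio:.2f}X")
print(f"power drop: {power_drop_pct:.0f} percent")
print(f"performance per watt: {perf_per_watt_gain:.2f}X")
```

Because the performance ratio is a bit under 2X, the efficiency gain lands a bit under 10X, which is why "about an order of magnitude" is the right way to put it.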

The Hunter machine is based on the “Shasta” Cray EX4000 system design, and has very efficient fanless direct liquid cooling, something that is a plus in Europe with space being at a premium and electricity costs being very high compared to North America. Hunter will also use a dynamic power capping governor developed jointly by HLRS and HPE that monitors power usage of all applications on the system and regulates their performance against a power cap for the system. On the prior Hawk system, HLRS says this power capping governor reduced overall power consumption by 20 percent “without significant losses in performance.”

The other thing that the Hunter system has – and that no one was talking about back in December 2023 – is a set of Cray EX nodes equipped with plain vanilla 32-core AMD Epyc 9374F processors. To be specific, there are 256 two-socket nodes, each with 768 GB of DDR5 main memory. These CPU nodes were not used in calculating the peak theoretical performance of the Hunter machine as far as we can tell, but with 16,384 cores running at a base speed of 3.85 GHz, they will provide an additional bump in FP64 throughput for Hunter, although by our rough math that bump is probably on the order of a petaflops, a small fraction of the 26 petaflops provided by the Hawk system.

As we already reported, the Hunter machine has one 200 Gb/sec “Cassini” Slingshot 11 network interface for each of the MI300A compute engines. It is not clear how the plain vanilla Epyc 9374F CPUs are hooked into the network, but presumably there is a 200 Gb/sec Slingshot interface for each CPU, if not more. All of this is hooked together over a “Rosetta” Slingshot 11 dragonfly network.

The Hunter machine also has 25 PB of HPE’s Cray ClusterStor E2000 disk arrays attached to it for scratch, which have a total of 2,120 disk drives across the arrays.

The whole shebang burns 560 kilowatts of juice, and Hunter came in at a price tag of €15 million ($15.6 million). With a budget of €115 million for the combined Hunter and Herder machines, that leaves the remaining €100 million available for the follow-on exascale-class Herder system due to be installed in 2027.

Not much is known about Herder as yet. HPE said that Herder will be corralled into a new datacenter and power facility called HLRS III, which will be under construction soon. HLRS III will be built with sustainable materials, will use photovoltaic panels to make electricity, and will distribute heat generated by the Herder supercomputer to warm other buildings at the Vaihingen campus of the University of Stuttgart.

Back in late 2023, HPE said that Herder would be an “exascale system capable of speeds on the order of one quintillion (10^18) flops.” But in the Hunter announcement late last week, HPE said this: “With a top speed of several hundred petaflops, Herder will constitute a major jump in peak performance over Hunter.” So once again, we have to look carefully at the precision when looking at the exascale claims.

We don’t know how much of the remaining €100 million is going for the HLRS III facility, so it is hard to say how much computing Herder might have at FP64 precision. If €20 million of the €100 million goes to the HLRS III facility and the remaining €80 million goes for the Herder Cray EX iron – will it be a Cray EX5000? – then at today’s prices that should get about 5.3X the performance of Hunter, or around 256 petaflops at FP64. If there is a future MI400A hybrid CPU-GPU compute engine – AMD has said nothing about this in its most recent roadmaps, but why not? – then there should be an additional bump in performance above and beyond what is available today in the MI300A. We don’t think it will be a factor of 2X, but it could easily be a factor of 1.2X to 1.4X for FP64 math on the GPU parts of a future hybrid AMD socket.
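To make that guesswork easy to rerun with different inputs, here is a rough sketch of the extrapolation. It assumes performance scales linearly with money at Hunter's price/performance, and the €20 million facility share and the 1.2X to 1.4X generational uplift are our speculation, not anything HPE or HLRS has disclosed. The function name `herder_estimate` and its parameters are hypothetical.

```python
# Figures quoted in the article (millions of euros, peak FP64 petaflops).
TOTAL_BUDGET_MEUR = 115.0
HUNTER_COST_MEUR = 15.0
HUNTER_PF = 48.1


def herder_estimate(facility_meur: float, uplift: float = 1.0) -> float:
    """Estimated Herder peak petaflops under linear price/performance scaling.

    facility_meur: assumed spend on the HLRS III facility (speculative).
    uplift: assumed FP64 gain per euro from a newer compute engine (speculative).
    """
    remaining = TOTAL_BUDGET_MEUR - HUNTER_COST_MEUR
    system_budget = remaining - facility_meur
    scale = system_budget / HUNTER_COST_MEUR
    return scale * HUNTER_PF * uplift


for uplift in (1.0, 1.2, 1.4):
    pf = herder_estimate(facility_meur=20.0, uplift=uplift)
    print(f"uplift {uplift:.1f}X -> about {pf:.0f} petaflops")
```

Changing `facility_meur` shows how sensitive the estimate is to the one number we do not know, which is how the remaining budget splits between the building and the machine.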

We will have to learn more about AMD’s future GPUs before we can guess with any precision. What we do know is that the engines for Herder will not be chosen until later this year. And it is not a foregone conclusion that AMD compute engines will be used in this machine, even if it is highly likely given that Hunter is intentionally a transition machine for programmers to start working on code for Herder.
