If you are in the traditional HPC community, it is not hard to be of two minds about the rise of AI and the mainstreaming of generative AI. At the very least, the GenAI tsunami makes it easier to argue for hardware budgets even though it is extremely difficult to get your hands on any datacenter-class GPU these days. And because of this, organizations that are buying hybrid AI-HPC systems are going to be able to do more HPC processing as their systems get upgraded to do more AI – at least in theory.
It is good – and fun – to talk about theory, but eventually you need to get grounded in some reality. And that is why we like to touch base with Scott Tease, general manager of HPC and AI at Lenovo, around the same time that the SC supercomputing conference takes place each year. We did so in the wake of the SC23 event, and among other things, wanted to find out if AI and HPC architectures are diverging and what this might mean for the traditional HPC simulation and modeling market.
Timothy Prickett Morgan: I don’t know about you, but I am sick of hearing about AI as if it is the only important kind of HPC. Does every story have to have this AI angle? And the answer, at least for the past ten months, seems to be, well, yes it does.
Three years ago, system makers were selling a lot of HPC with some AI on the side, and I would guess that now you are selling a lot of AI with some HPC on the side.
Scott Tease: So Lenovo overall is doing a lot of AI, and the vast majority of it is going to our cloud customers like Microsoft, Oracle, and the hyperscalers and big clouds in China. But we also have, to be precise, 205 other cloud clients that we call next wave – the next tier of CSP providers. They are investing wildly and heavily, especially in generative AI.
It’s amazing how quickly we are seeing investment go into AI, which is a multiple many times bigger than our entire HPC business. So AI systems are significant at Lenovo, and that is new over the past, call it, 16 months to 18 months, which is steep. What is sharper than a hockey stick? A perpendicular line. . . . [Laughter]
TPM: Yeah, it is a wall, because it seems to be going straight up from the floor, and a lot of people are going to be hitting it. . . . You have to be Fred Astaire to dance up that wall these days. [Laughter]
Scott Tease: Or a rock climber. Let me give you a couple stats that you might find interesting.
First of all, depending on the quarter, somewhere between 18 percent and 20 percent of our HPC system sales have some kind of accelerator in them. That’s up significantly from, say, five years ago, when we were probably running at about a 5 percent to 7 percent accelerator attachment rate.
If you look at the number of deals that we do today across the systems business that have GPUs in them – and the GPUs are not always the bulk of the spending – the attachment rate is well over half – somewhere above 55 percent but less than 60 percent of our overall system deals have some kind of accelerator in there. And typically, it is Nvidia GPUs. But that’s up at least three times versus what it was five years ago. So the number of accelerators that we’re pushing out the door in machines these days is head and shoulders above what it was even for traditional HPC use cases.
What it does feel like, however, is that having companies invest in the traditional HPC uses has diminished quite a bit in favor of generative AI. It seems like HPC has taken a backseat to what’s going on with AI. And I think that’s problematic. I think that over time, this AI spin up – whatever you want to call it – is going to calm back down. This will calm down as AI moves into the business operations part of the enterprise. In HPC, we are still seeing continued growth in accelerated codes, and for sure the growth is there. I just hope the interest in supporting the HPC market is not going away in favor of an all-in approach based on the new shiny object, AI.
TPM: More than four years ago, I wrote that it was extremely convenient that there was a convergence in HPC and AI system design, but also that this convenience may not last because eventually the needs of AI would diverge from the needs of HPC. I talked about this again this time last year, when Nvidia put its Transformer Engine to boost AI large language model training at the heart of its “Hopper” H100 GPU accelerators, and when the generational improvements from A100 to H100 for low precision floating point math were larger than the improvements in FP64 double-precision floating point math. Sure, the Nvidia GPUs can do 64-bit, but the real performance gain came from sparsity tricks that are great for AI, and a lot of HPC codes are doing math on dense matrices instead of the sparse matrices more common in AI.
So two things: First of all, how much is AI driving supercomputing architecture in your opinion for HPC in the broadest sense? And second, what happens when this divergence between HPC and AI gets worse? Or will it not get worse and it will just be like this from now on?
Scott Tease: Well, that is a good question – and this may not be the perfect answer to your question, but it’s kind of correlated to the same thing. Here goes: A lot of our customers, even though they claim not to care about the Top500, still buy their supercomputers based on High Performance LINPACK. And if you look at the past, we are accustomed to seeing pretty significant LINPACK jumps per dollar. We are not going to be seeing that any time in the future. The focus of accelerator vendors has been on boosting their AI advantages and is not so much on double precision floating point. We’re already starting to see the HPC desire for double precision improvements over time take a back seat to what we’re working on for accelerating AI workloads.
The one good thing I can say is we have got new competitors coming to market – both AMD and Intel – and my hope is that at least one of those three companies is going to see that this AI market is important, but it’s not the only market. And they’re going to start refocusing more and more effort on the traditional HPC users that have built this datacenter GPU market out.
TPM: I don’t think it is a coincidence that both AMD and Intel have their FP64 and FP32 performance on their vector cores on their GPUs running at the same performance level. So that tells you they’re serious about traditional HPC.
So out in the field, are you seeing HPC centers trying to figure out how to scale down their codes to 32 bits, or is that not really possible?
Scott Tease: For the traditional codes, no it is not really possible to do that. I think HPC centers are still struggling to figure out how to make the best use of these GPUs for AI when they still have to support all of these researchers that are doing very traditional HPC that can accelerate very well on an H100 or whatever. But I think they’re still struggling far more than I thought they would be at this point in time. The HPC community is still struggling a little bit to figure out what its role is versus the hyperscalers. And that’s a healthy thing.
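To make the precision point concrete, here is a minimal sketch in Python (not from the interview) of why many traditional codes cannot simply drop to 32 bits. It adds a small step to a running total many times, once in FP64 and once with every intermediate value rounded to FP32. The helper names `to_f32` and `accumulate`, the step size, and the iteration count are illustrative assumptions, not anything Lenovo or its customers run.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total `n` times.

    With single=True, the step and every partial sum are rounded
    to FP32, which emulates a naive single-precision loop.
    """
    s = to_f32(step) if single else step
    total = 0.0
    for _ in range(n):
        total = total + s
        if single:
            total = to_f32(total)
    return total


N = 1_000_000        # hypothetical number of time steps
STEP = 0.1           # hypothetical increment per step
exact = N * STEP     # the mathematically exact total

err64 = abs(accumulate(STEP, N, single=False) - exact)
err32 = abs(accumulate(STEP, N, single=True) - exact)

print(f"FP64 absolute error: {err64:.3e}")
print(f"FP32 absolute error: {err32:.3e}")
```

In this sketch the FP32 total drifts from the exact answer by many orders of magnitude more than the FP64 total, which is the kind of error growth that long-running simulations have to avoid.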
The HPC community is used to being first, and we always considered ourselves as the F1 racing team of computing. We invent the turbochargers and fuel injection and the carbon fiber and then we put that into more general purpose vehicles, to use an analogy. I worry that the HPC community has sort of taken the backseat when it comes to AI and is not leading the charge. Like you, I’m seeing a lot of this AI stuff being led out of the hyperscalers and clouds. And we’ve got to find a way to take that back and carve our own use cases. There are a lot more HPC sites around the world than there are cloud sites, and we have got access to a lot of data.
TPM: There are at least an order of magnitude more HPC centers than clouds. Exactly.
Scott Tease: We have to be the ones figuring out how to mine value out of that data and showing users how we can be the future. Whether it’s HPC or AI, it all comes down to the data. And we’ve got a lot of the data. And it’s not going to be easy to move that data anywhere around the globe.
AI is going to be hybrid because we need to bring AI out to the data.
The hybrid AI thing is really interesting for us. Look at your phone: there’s so much AI in that phone and you don’t even consider it anymore, whether it is powering on with my thumbprint or my facial login or natural language processing. AI is so embedded in the way this device works that we no longer even think about it as AI. Some of this AI runs on the phone, some of it runs out in the cloud. And we expect to see the same exact thing in the enterprise: Some AI is going to need to run locally, especially out at the edge where the data is being created and where it’s impossible to move that data out to the cloud for processing over the poor networks that we’ve got at these edge sites. The cost of moving that data is prohibitive, the latency for the application is prohibitive, and some countries don’t want that data moved off to some third party cloud. So there are a lot of reasons why some portion of AI is going to need to reside locally on some devices. But some is still going to need to go out to the cloud for higher end processing – maybe model building, things like that.
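A rough back-of-the-envelope sketch shows why moving edge data to the cloud can be impractical. The data volume and link speed below are hypothetical numbers chosen for illustration, and `transfer_hours` is a made-up helper. Real links also lose throughput to protocol overhead and contention, which this idealized calculation ignores.

```python
def transfer_hours(data_terabytes: float, link_megabits_per_s: float) -> float:
    """Idealized time, in hours, to push a dataset over a network link."""
    bits = data_terabytes * 1e12 * 8              # decimal terabytes to bits
    seconds = bits / (link_megabits_per_s * 1e6)  # link rate in bits per second
    return seconds / 3600.0


# Hypothetical edge site: 10 TB of locally generated data, 100 Mbit/s uplink.
hours = transfer_hours(10, 100)
print(f"{hours:.1f} hours (~{hours / 24:.1f} days)")  # 222.2 hours (~9.3 days)
```

Under these assumed numbers a single upload takes more than nine days, so running inference next to the data and sending only results upstream is the more plausible design.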
People are going to create private models, taking a large language model such as LLaMA 2 from Meta Platforms that has been trained in the cloud or in their datacenter and running it on an edge device close to where the data is being created. So you are going to have bits of AI everywhere, up and down the value chain. This is not anti-cloud, it is cloud in partnership with what’s going on on premises. To unlock the full value of what we’re trying to do, AI has got to be everywhere, even for our HPC clients. It’s the same concept for them as for anyone else: Bring the AI to the data rather than trying to bring the data to this powerful AI sitting up in the clouds.
Interesting perspective on the tradeoffs between HPC- and AI-oriented supercomputer design! It’s good to see the new SuperMUC-NG Phase 2 (a kind of mini-Aurora?) having nearly the same HPL perf as the previous SuperMUC-NG (Phase 1), but with half the total cores thanks to the Ponte Vecchio GPUs. It is also presumably better able to cross the HPC-AI barrier, in both directions!
Not long ago there was a GPU shortage because of crypto currencies. That didn’t last very long. Now the shortage is because people are training neural networks. I wonder if that will be a lasting use for GPUs or not.
One thing is likely: The type of calculations HPC focuses on were there before generative AI or block chains and that type of science will be useful in the future.
If HPC made money–literally, not as a third order derivative–there would be a Fourier Transform Engine that could handle dense matrix math better at the heart of an Nvidia GPU. It is because HPC is perceived as a cost, not as an actual revenue generator, that it does not see more investment and specialized iron. It is because HPC is expensive and difficult that it is not more pervasive, and that has limited its market appeal and market size. We have been talking about the missing middle in HPC and democratizing HPC to try to encourage people for decades. Has it really worked?
Does any company need to be convinced that they need a generative AI story any more than they needed a Web story 25 years ago?
That said, I don’t think HPC work will go away, and I personally wish we would do more investment and solve big problems quicker for the sake of society.