Intel has spent the past nine months reorganizing itself in the wake of Pat Gelsinger becoming its chief executive officer in January, including the new groups and divisions, and the new managers for them, that were revealed in June. But the fate of its HPC organization, which has been less focused than it was in past years, remained a big unknown until the company quietly did some finer-grained work on its organizational chart and its management and engineering assignments in recent weeks. That work is now known, if not publicly announced.
We will get into the new people in charge and their HPC strategy under the watchful eye of Gelsinger, as far as it can be revealed today at least, in a moment. And, as usual, we will provide some historical context for why Intel is changing its HPC strategy and tactics going forward. But the big news as far as we are concerned is that Intel is no longer interested in being a prime contractor for big supercomputer deals, something that it was very eager to do back in 2015, when Raj Hazra was general manager of the Technical Computing Group within Intel’s Data Center Group and, interestingly, was also put in charge of the Enterprise Computing unit at the same time.
We are getting this statement straight from Jeff McVeigh, the general manager of the new Super Compute Group, which is tucked under the Accelerated Computing Systems and Graphics Group (mysteriously abbreviated AXG by Intel) that was announced in the June reorg and that is led by Raja Koduri, who has been Intel’s chief architect and general manager of its Cores & Visual Computing & Edge Computing solutions.
“We have got a number of different technologies from CPUs to silicon photonics to GPUs now and we are going to bring those all together to help our customers and partners bring this to market,” McVeigh tells The Next Platform. “Honestly, we are not looking to prime, to be the front facing company, but to work with our OEM partners in the channel to make that happen. We are interested in bringing components together into balanced systems that address the customer needs. It’s not just an individual component, but how do those things work well together, and then also the software stacks on top of that.”
Intel has been a prime contractor on some important machines in the past. The Delta program Intel had with the US Defense Advanced Research Projects Agency in the 1980s, built using thousands of Intel i860 RISC processors, resulted in the Touchstone Delta system at Caltech and the larger and commercial-grade Paragon XP/S systems and their Unix software stack. Intel was also the prime contractor for the ASCI Red supercomputer installed at Sandia National Laboratories, the first of the US Department of Energy’s machines to be acquired under the Accelerated Strategic Computing Initiative and, importantly, the first machine in the world to break through the teraflops barrier on the Linpack benchmark. (ASCI Red was based on the Paragon design, but swapped out the i860 and i960 RISC processors for Pentium Pro processors.) In many ways, the future “Aurora A21” exascale supercomputer that Intel is the prime contractor for — and that is yet to be installed at Argonne National Laboratory — is a reprise of Intel’s strategy with ASCI Red. Aurora was supposed to be the first exascale machine in the world, but very likely will not be.
What was clear to us when the bidding opened up for pre-exascale and exascale systems is that only companies with deep pockets could play the role of a prime contractor, and we have said as much since the founding of The Next Platform. The machines are just too expensive. When they cost $25 million or $50 million, the bill of materials was within the range of relatively small companies like Silicon Graphics or Cray. But once they started costing $100 million for a big machine (tens of petaflops), then $200 million for pre-exascale machines (hundreds of petaflops), and then $500 million or more for exascale machines (more than 1,000 petaflops), only a few companies could be prime. Intel correctly saw this and stepped into the fray as it had done in the past. To be sure, Intel wanted to set the pace for HPC and reap the rewards directly — no question about this. And we think this did not make Intel’s OEMs happy, but they were not big enough to take down the deals. IBM, Fujitsu, and Intel were, and now HPE, Nvidia, and Atos are, too. So are any of the hyperscalers and cloud builders, if they are so inclined.
But the combination of Intel’s inability to get the Aurora technology settled and in the field at Argonne in a timely fashion and the acquisition of Cray by Hewlett Packard Enterprise has more or less pushed Intel back into the position of arms dealer rather than Army general.
The Long Road Back To Where Intel Was In HPC
The high performance computing market, with its many segments — and of which the traditional simulation and modeling workloads, often called technical computing, are only a part — was much too big and much too important for Intel to ignore over the past decade. And the emerging and much broader HPC market that we see in front of us is certainly too big for Intel to ignore now.
Some history is perhaps in order. Starting in 2012 or so, just as HPC was democratizing and the machine learning flavor of artificial intelligence started to actually work (big data was available at the same time that parallel GPU engines were fast enough and cheap enough to chew on it), Intel set out to assemble a portfolio of compute engines, network ASICs, and software to take on traditional HPC workloads as well as other forms of HPC such as AI and data analytics, with a smidgeon of technical workstations and a dash of high-frequency trading. In some ways, you can consider any processor SKUs and system designs aimed at hyperscalers for search engine and analytics work as another kind of HPC.
The idea was for Intel to do for HPC at large what it had done for the X86 server: extend itself from a CPU supplier into a platform architect and, in special cases, a full-blown platform supplier.
It was not a bad idea, as strategies go, but it is safe to say that Intel has had a tough time capitalizing on the broader HPC opportunity beyond selling its CPUs and occasionally other kinds of accelerators, such as the “Knights” many-core processors (the aftermath of a failed effort in “Larrabee” discrete GPUs) or the various FPGA accelerators that came with the $16.7 billion acquisition of Altera in 2015. That acquisition was admittedly aimed more at the hyperscalers and cloud builders and their desire to accelerate storage, network, and machine learning inference processing, but everything that has emerged for these customers with regard to what we now generally call DPUs — short for Data Processing Unit — will apply equally well to the other segments of HPC in due course.
Intel’s HPC investments have been enormous, starting with more than two decades of research and development into silicon photonics; we have no idea what this cost. The development of SIMD, vector, and now matrix math units within its Xeon family of processors over the years, which have clearly been aimed at HPC, is also hard to quantify. Together, it is safe to say these two efforts alone have cost billions of dollars. And to be fair, for the vast majority of traditional HPC customers who run CPU-only clusters and for the vast majority of machine learning inference that is still on CPUs, the Xeon evolution has been fine. But in the long run, as Moore’s Law runs out of gas, hybrid collections of domain-specific processing and the languages and libraries that drive them are going to be more common.
Hence Intel is spending what we presume is a fortune to create its Xe family of GPUs, and a big part of that spending is probably going to the “Ponte Vecchio” Xe HPC GPU accelerator, the main compute engine in the Aurora system at Argonne. The original Aurora design was supposed to be based on the “Knights Hill” many-core processor and be delivered in late 2018 as a pre-exascale machine. For reasons Intel has never been clear about, Knights Hill was canceled, and that cancellation probably has something to do with Intel’s delays in getting 10 nanometer chips into the field and probably also with a desire by Argonne to have a hybrid CPU-GPU architecture instead of an all-CPU machine (and thus match the basic architecture of other big Department of Energy HPC labs in the United States).
But it is more than that. Intel is investing in the “Loihi” line of neuromorphic processors, and it paid $350 million to acquire AI chip startup Nervana Systems in August 2016. Intel then hedged its bets on AI engines when it paid $2 billion to acquire Habana Labs in December 2019, thus wiping out the Nervana investment with the stroke of a pen.
And Intel has not just spent on HPC processing (in the broadest sense) inside of the processor and through accelerators of various kinds, but also on interconnects, flash and 3D XPoint memory, and software.
The network buying binge started early and was central to Intel’s broad HPC strategy a decade ago. It all started in July 2011 when Intel snapped up innovative Ethernet chip maker Fulcrum Microsystems for an undisclosed sum, which was followed by the $125 million acquisition of the InfiniBand business from QLogic and the $140 million acquisition of the “Gemini” and “Aries” HPC interconnect business from supercomputer maker Cray. The result is the Omni-Path networking stack, which we have always called a flavor of InfiniBand.
Other big HPC investments and power plays by Intel in software include the July 2012 acquisition of Whamcloud, the commercial entity behind the open source Lustre parallel file system; a whopping $740 million investment in commercial Hadoop provider Cloudera in 2014; and the establishment of the OpenHPC software stack in November 2015 to try to create an enterprise-grade analog to Red Hat Enterprise Linux for the HPC community.
It is obvious that a lot of this investment and effort by Intel in the broader HPC market has not paid off directly — meaning lots of systems at HPC centers, hyperscalers, and cloud builders using its exotic wares. But we can say that Intel has learned some important lessons about what it needs to focus on and the room it needs to leave in the market for others to do what they do best. And what risks it needs to take and what risks it can avoid. In its fourth quarter, the Intel Federal division, which is the prime contractor for the Aurora system, is taking a $300 million write-down. Gelsinger can’t afford the public drama that Aurora has represented — it will be nearly four years late when it is delivered next year — nor can he afford such hits to Intel’s books.
With that, let’s finish by talking about the reorganization of the HPC business at Intel.
The Super Compute Group under McVeigh will be in charge of CPU and GPU compute engines that are aimed specifically at HPC workloads (in the broad sense) as well as datacenter GPUs, and McVeigh will be responsible for the profit and loss statements for these products.
To be precise, the Super Compute Group products will include the Ponte Vecchio GPU as well as the HBM variant of the “Sapphire Rapids” Xeon SP chip that is also being deployed in Aurora. The group’s purview also includes the HPC and AI software stacks, all the way down to the drivers and libraries for these compute engines as they relate to HPC and AI. (At some point, we will drill down into this with McVeigh a little bit more.)
We would not be surprised to see HPE sell off the Slingshot interconnect to Intel at some point, particularly given the investment that will be required to deliver 400 Gb/sec, 800 Gb/sec, and 1.6 Tb/sec variants of the “Rosetta” ASIC at the heart of Slingshot. While Intel has a good and highly programmable Ethernet ASIC family in the Tofino line from its Barefoot Networks acquisition, as yet there is nothing in particular that has been tweaked to make it suitable for the low latency required for traditional HPC workloads and increasingly for AI workloads, too. In any event, we think the Super Compute Group should have an interconnect strategy, including switch ASICs and DPUs, as well as a compute strategy. The old guard at Intel who saw this a decade ago were correct, and Nvidia surely sees this now as it is adding the “Grace” Arm CPU to its GPU accelerators and InfiniBand and Ethernet interconnects (the latter compliments of its Mellanox acquisition).
McVeigh got his Bachelor’s in electrical engineering from Duke University, then his PhD in electrical and computer engineering from Carnegie Mellon University, and promptly joined Intel, where he was vice president and general manager of its visual computing products for a long time. (We are not sure how long.) For almost two years now, McVeigh has been vice president and general manager of its Data Center XPU products.
The other piece of the HPC business at Intel is called the Super Compute Platform Engineering Group, which has Brijesh Tripathi as its general manager and chief architect. This is where the architectures of future Intel products are going to be designed, based on input from customers as channeled through the Super Compute Group, on research done by Intel itself, and on insight from competitive products in the market. This is the kind of split that Intel has always liked — money and customers on one side, research, development, and engineering on the other.
Tripathi got his Bachelor’s in electrical engineering from the Indian Institute of Technology and his Master’s in electrical engineering from Stanford University, was a senior design engineer at Nvidia in the early 2000s, and then joined Apple, where he worked as an SoC engineer and later a platform architect until 2015. He ran a startup of his own, led platform engineering at electric car maker Tesla for two years, and dabbled in some other startups before becoming vice president and chief technology officer at Intel’s Client Computing Group in July 2019 and then taking his current position. Tripathi is no doubt taking some of the load off Koduri’s heavily laden desk.