Depending on how you want to look at it, the half dozen companies that have aspired to bring ARM architecture to the datacenter through chips designed specifically to run server workloads are either very late to market or very early. The opportunity to take on Intel was arguably many years ago, when the world’s largest chip maker was weaker, and yet despite all of the excitement and hype, no one could get an ARM chip into the field that clearly and cleanly competed against Intel’s Xeons and did so publicly with design wins that generated real volumes that took a bite out of Intel’s business.
Qualcomm, which makes its billions selling chips for smartphones, tablets, and wireless networks, wants to be the first, and it may fall to Qualcomm to actually make ARM servers a commercial product because the other players in the market that have tried are not yet succeeding. Broadcom, the networking giant that is the result of the $37 billion merger between Avago Technologies and the prior iteration of Broadcom in May 2015, was perhaps the best hope of a giant bringing a volume-class ARM server product to the field, but we have heard persistent rumors since ARM TechCon a month ago that Broadcom has ceased development of its “Vulcan” homegrown ARM server chips. After that merger between Avago and Broadcom, the new Broadcom was roughly half the size of Qualcomm and a quarter the size of Intel, just to give you a sense of scale, which is vital when it comes to chip manufacturing. Broadcom has refused to confirm or deny the rumors about Vulcan being canceled, talk that really started as soon as Avago bought Broadcom, which announced its intent to enter the server fray in 2014 and which was expected to get chips in the field in 2015.
Anand Chandrasekher, senior vice president and general manager at Qualcomm Datacenter Technologies, who is spearheading the development of the “Amberwing” Centriq ARM server chips, is undaunted by the failure of Calxeda, Samsung deciding to not do server chips, and maybe Broadcom, AMD, and possibly Applied Micro pulling back, too. Because of the intense competition between Intel and the rest of the chip world, the dominant supplier of chips for smartphones and wireless networks has been secretive about its homegrown ARM server chip core, which ironically is called “Falkor” after the good luck dragon in The Neverending Story. And Chandrasekher did not reveal much when he divulged some details about the Amberwing chip, either, as the company has started shipping samples of its second server-class chip to early customers.
In October 2015, Qualcomm showed off a prototype generation of ARM server chips, based on an earlier iteration of the Falkor core, which is a custom ARMv8 design and which is not just a cookie-cuttered variant of the “Kryo” cores used in the Snapdragon line of chips that The Q sells for clients and networking devices. This prototype ARM server chip, whose name was not revealed, is limited to a single processor socket per machine, and for many workloads, particularly at hyperscalers, that is fine so long as that socket has enough oomph and memory. But Qualcomm has admitted to The Next Platform that it and its early adopter customers do anticipate needing SMP or NUMA clustering to glue ARM chips into a shared memory system with more oomph; we suspect that the Centriq line of Qualcomm ARM server chips will span two sockets at least, and possibly more.
In February this year, when Qualcomm was talking about its broader business with Wall Street, we analyzed everything it said about the server chip business and guessed that the 24-core prototype ARM chip from the company was implemented in a 14 nanometer FinFET process, just like the Snapdragon 820 is, although we conceded that it could be fabbed by Taiwan Semiconductor Manufacturing Corp instead of Samsung Electronics. The caches, clock speed, accelerators, NUMA or SMP scaling, memory capacity, and other important aspects of the prototype Centriq ARM server chip were not divulged.
That chip doesn’t really matter, except to demonstrate the custom ARMv8 core design to potential hyperscaler customers, and Qualcomm has previously confirmed to The Next Platform that it is working with all of the eight key hyperscalers (chant them along with us: Google, Amazon, Facebook, Microsoft, Baidu, Tencent, Alibaba, and China Mobile) in one form or another on the Centriq project – a good reason for Broadcom, which has other fish to fry, to back out. We have heard that one of the big hyperscalers in Silicon Valley – there are only two – is behind Qualcomm’s move into server chips. Our guess is Google is the one pushing hard.
Earlier this year, when Qualcomm was ramping up its ARM server chip PR campaign, the company said that it would be using the latest FinFET process node to make the Amberwing chips. That statement was vague enough to give Qualcomm some wiggle room, sow some confusion, and keep Intel and everyone else guessing. TSMC and Samsung were aiming to get 10 nanometer manufacturing processes up and running in production by the end of 2016, and we guessed that if Qualcomm was hoping to get a part with 48, 64, or 96 cores out the door, it would no doubt want to be on these 10 nanometer processes. The number of cores always depends on how brawny they are, and considering the customers are hyperscalers, these cores would have to provide about the same performance as a Xeon thread to be interesting. Our wild guess was that if Qualcomm can do 24 cores in a prototype in 14 nanometer FinFET processes, then it can possibly get 32 cores or 36 cores in a 10 nanometer process with fairly brawny cores and maybe 48 cores or 64 cores if they are a bit wimpier.
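For those who want to play along with the guessing, here is a minimal sketch of the back-of-the-envelope arithmetic behind that kind of estimate. It is our illustration, not anything Qualcomm has disclosed. It assumes the node names (14 and 10 nanometers) behave like linear feature sizes, that die area stays constant, and that a hypothetical `efficiency` factor captures how much of the ideal shrink actually turns into extra cores; none of these assumptions holds exactly for real processes. Under ideal scaling the density gain is about 1.96X, which would turn 24 cores into roughly 47, so a guess of 32 to 36 brawny cores amounts to assuming that only about two-thirds to three-quarters of the ideal gain goes into core count.

```python
# Back-of-the-envelope core count scaling between process nodes.
# ASSUMPTIONS (illustrative only): node names are treated as linear feature
# sizes, die area is held constant, and `efficiency` is a made-up knob for
# how much of the ideal density gain is spent on additional cores.


def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal transistor density gain when shrinking from old_nm to new_nm."""
    return (old_nm / new_nm) ** 2


def scaled_cores(cores: int, old_nm: float, new_nm: float,
                 efficiency: float = 1.0) -> float:
    """Estimated core count on the new node for the same die area."""
    return cores * ideal_density_gain(old_nm, new_nm) * efficiency


if __name__ == "__main__":
    # Start from the 24-core prototype on 14 nanometers and move to 10.
    for efficiency in (1.0, 0.75, 0.68):
        estimate = scaled_cores(24, 14, 10, efficiency)
        print(f"efficiency {efficiency:.2f}: about {estimate:.1f} cores")
```

Making the cores wimpier, and therefore smaller, would push the count up from there, which is the other half of the guess above.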
As it turns out, Chandrasekher confirmed to The Next Platform that the Amberwing chip will be called the Centriq 2400 line, with variations in core counts, cache sizes, and clock speeds as you would expect, and that it would top out at 48 cores and be implemented in the latest 10 nanometer processes, just as the latest-greatest Snapdragon 835 chips, which were unveiled two weeks ago, are. These Snapdragon 835 chips will ship in the first half of 2017, with the 48-core Amberwing Centriq 2400 chips to follow.
We asked Chandrasekher who was the foundry for the Centriq 2400 chips, and he did not reveal who it was, but when we suggested it could be Intel, his former employer who is working on 10 nanometer processes, he nearly choked on his tea laughing and confirmed that it, indeed, was not Intel that is making Qualcomm’s chips. (Hey, that’s how Intel started down its $16.7 billion road to buying Altera. First comes foundry, then comes acquisition, then comes integration in the Xeon package. . . .)
While the design of the Amberwing chip will be interesting, the process node and the cadence between mobile and server chips are perhaps more revealing about Qualcomm’s plan and its prospects for success in ARM server chips where others have either failed or not exactly set the bit barns on fire. The adoption of 10 nanometer processes for Amberwing puts this chip on roughly the same launch schedule as Intel’s future 28-core “Skylake” Xeon E5 v5 processors, which are expected in the middle of 2017. (We are hearing about a July launch from server OEMs.) But Intel is only on 14 nanometer processes for Skylake Xeons, and won’t get 10 nanometer chips into the field until maybe 2019 with “Cannonlake” Xeons if there is an interim 14 nanometer “Kaby Lake” Xeon in 2018, as we expect. The gap in design and process between PCs and servers for Intel has widened to about 23 months or so, and Qualcomm is going to cut that gap to somewhere less than a year between the Snapdragon and Centriq.
The natural thing to wonder is how much commonality there is between Snapdragon and Centriq. “Some things carry over, but a lot does not,” Chandrasekher tells The Next Platform. “The core is a dedicated server core, and the system on a chip is also dedicated to servers as well. Those two things are the bulk of the investment, and we are on a leading edge node. So when you compound all of those, it is not a trivial undertaking from a development cost standpoint.”
Being out first is a key factor in Qualcomm’s strategy, and it is in stark contrast to the hang-back attitude that AMD, Applied Micro, Cavium, and Calxeda have had with their ARM server chip efforts. While Chandrasekher said that Qualcomm was committed to keeping the gap small between mobile chips and server chips and to being first on process nodes for server chips, he said there was no chance that The Q could get server chips out ahead of mobile chips on any given process, as FPGA makers sometimes are able to do. The size of the server chips requires that yields first be improved using volume mobile chips, and their volumes are so much higher than those for FPGAs. The numbers do not work.
We had a very long conversation about the chip business and servers with Chandrasekher, and we will be following up on that shortly. It was interesting and fun. But here’s a teaser:
“We are accelerating innovation in the datacenter,” says Chandrasekher. “That is our goal, and part of that is being first in the market on leading edge nodes, and part of that is being able to take advantage of the SoC skills and competencies that Qualcomm already has so we can integrate a lot. And then we have cadence. We think the market is crying for this kind of accelerated innovation. If we execute – and at the end of the day, strategy is just strategy, and we have to execute – I think the market will be quite welcoming.”
While two would be better, the world only needs one good, strong ARM server chip vendor to be successful.