Et tu, Meta?
Of all of the world’s hyperscalers and large cloud builders, only Meta Platforms, dominated by its Facebook and related social network businesses like Instagram and WhatsApp, is a pure-play hyperscaler. Meaning, it is not a cloud, and it therefore does not have to buy X86 processors from Intel and AMD and GPUs from Nvidia to provide a hardware substrate on a rental basis. And so, the indications that Facebook is putting together a chip design team do not surprise us one bit.
If anything, we wonder what took Facebook, and now Meta Platforms, so long.
Ditto for Apple, which is big and which also does not run a cloud, but which does not, we think, operate at hyperscale. Apple is close, but still not on par with the giants. Draw that line where you will, and then we can say that Apple is not only naturally suited to designing and building integrated consumer devices, but could also probably build a damned good, user-friendly, premium-priced datacenter – in a perfect, stylish metal shell enclosure with a glass roof that doubled as a solar panel or some such – if it were so inclined.
To one extent or another, all of the other hyperscalers have also created cloud businesses, or the big clouds have morphed into hyperscalers by providing software as a service. IBM, with its eponymous cloud, is the only big cloud that doesn’t provide consumer applications supported by ads or fees. Big Blue does sell some of its software “as a service” on the IBM Cloud, and perhaps in a sign of how well that business is performing, has inked a huge partnership with Amazon Web Services to port its systems and application software – Red Hat OpenShift being the key piece of code – to AWS. VMware threw in the towel on building its own cloud and just moved to bare metal AWS infrastructure, and we would not be at all surprised if IBM does the same thing at some point.
All of the hyperscalers and cloud builders, again to one extent or another, have AWS envy, especially after the success of the Nitro DPU and its embiggened sibling, the Graviton CPU; the Trainium and Inferentia AI training and AI inference chips, not so much. Especially Trainium, which no one talks about much. And of course, AWS is rumored to be working on its own networking ASICs, just to keep Broadcom and other switch and router ASIC suppliers on their toes. Microsoft has been rumored to be working on its own Arm server chips since picking up a lot of the Marvell ThunderX team in late 2020, and has just tapped Ampere Computing’s Altra CPUs for instances on its Azure cloud.
Google has made it pretty clear that it doesn’t want to design its own CPUs, first by joining IBM’s OpenPower consortium and then not using Power8 and Power9 chips much, and then by declaring that the SoC is the new motherboard and stating emphatically that it wants to buy, rather than build, stuff and cram it all onto SoCs. The TPU – the TPUv4 chips are just coming out now – and some security and video encoding chips excepted, of course. And Google is perfectly happy to pitch custom AMD Epyc CPUs against Graviton. But we also think that Google may eventually deploy the Altra line from Ampere Computing in its cloud. For all we know, it is already using Altra CPUs in its own datacenters for internally facing applications as it prepares for such a launch.
Which brings us all the way back to Meta Platforms and some key hires.
Alexis Black Bjorlin was tapped in December 2021 to be vice president of infrastructure at Meta Platforms. Bjorlin is a materials science expert, with a BS in materials science and engineering from MIT and a PhD in materials science from the University of California at Santa Barbara, who worked at a number of silicon photonics startups before running the SiPho business and then the overall Connectivity Group at Intel between 2014 and 2018, and who also ran the Optical Systems division at Broadcom for nearly three years.
A tip of the hat to Dylan Martin, the intrepid semiconductor reporter over at our sister publication, The Register, for spotting this and for also catching the appointment of Jon Dama, another key Intel executive with deep switch and DPU ASIC experience, who is moving over to the social networking giant.
Dama is an interesting hire for Meta Platforms because he has worked on key pieces of silicon, ones that Bjorlin eventually ended up managing. Significantly, Dama led the development of the FlexPipe programmable switch pipeline embedded in the FocalPoint and PivotPoint switch ASICs created by Fulcrum Microsystems, which Intel acquired in 2011. After managing silicon engineering teams at Intel for more than six years, he was tapped to be director of cloud and IPU silicon, and was director of silicon engineering for the Connectivity Group for the past three years.
On his LinkedIn profile, Dama said: “I’m excited to announce that I am starting a new position as Director of Silicon in the Infrastructure Engineering group at Meta! This is a team that is stronger than the sum of its people. I am humbled by the opportunity to be their captain. It is a chance to work amid the best talents and to innovate with agility to scale the next several doublings of data processing. Truly an amazing team has been assembled and will grow further. I am confident we haven’t run out of human ingenuity yet.”
Not for a few more years, at least. And humans won’t ever run out of ingenuity, but the AI systems of the world might get out ahead of human ingenuity when it comes to circuit design at some point in the next X years. We shall see. . . .
Which raises the question: What is Meta Platforms up to? We strongly suspect that the company will continue to use merchant silicon in servers, switches, and storage, as it has been doing since its inception, but will also add custom silicon where the economics and price/performance warrant it. Who better to steer such a strategy than engineers who have worked at some of the largest chip suppliers in the world?
After the embarrassment earlier this year of having to buy an Nvidia AI supercomputer rather than design and build one out of customized parts in an Open Compute Project form factor, we think Facebook and the other parts of the Meta Platforms conglomerate are absolutely looking into doing custom silicon and, at the very least, pushing for tweaks to upcoming designs from third-party suppliers.
And we also think that the metaverse, which the company is spending a fortune on, is a workload that is going to require a very tight coupling of hardware and software design if it is not to be as expensive as it might otherwise be. Meta Platforms can’t afford to overspend on that metaverse buildout – not with its Facebook user base flattening out and income from its social platforms on a downward trend since the end of 2020. Revenue has been growing, but these metaverse investments are very high and are impacting the bottom line.
The good news for Meta Platforms is that there is ample competition for CPUs and GPUs and even specialized ASICs for doing machine learning, which would seem to obviate the need for custom Meta chips with its squished googol symbol on them. (Funny that. . . . ) And still, the rumors persist that the company wants to create its own AI training chips, and perhaps more. Meta Platforms already makes custom chips for video streaming and for accelerating recommendation engines with AI inference, so why not do more custom silicon?
Because Facebook will have to get in line for foundry capacity at Taiwan Semiconductor Manufacturing Co. Any great team can design a new chip that does all kinds of things, tuned to specific workloads. That’s the easy part these days, even if it is time-consuming and expensive. But getting it fabbed and then packaged and tested, now that is the real challenge.
Which is why we think that Facebook is putting together a decade-long strategy to make sure all of the design and sourcing mistakes it has made with building the social network are not repeated as it builds the metaverse – or some neighborhoods in it at the very least. And that all of the things it did correctly, at the right place and at the right time, are repeated as it moves from a 2D web to a 3D augmented digital twin of the world.