For any public cloud to succeed, it has to offer best of breed technologies reasonably close to the cutting edge and support the wide variety of compute that the enterprises of the world would otherwise acquire and run on premises. So being a hyperscaler and cloud builder doesn’t just mean leveraging economies of scale, but also delivering economies of scope to capture the largest possible number of customers.
This is a tough trick to pull off, and only the wealthiest companies – meaning those that have pools of cash to draw on from other lines of business – will be able to do it, we think, except for some possible niches.
Amazon Web Services kickstarted itself with an online retail business that is not only dependent on that cloud to run itself, but is even more dependent on the profits that the cloud business generates to prop up that cut-throat retail business and Amazon’s stock price – which makes Jeff Bezos, the company’s founder, the richest man in the world. Google Cloud Platform is not where Google runs its search engines, streams its videos, or provides email services, but that online advertising business generates huge pools of cash that Google can plow into the cloud. Microsoft similarly has huge and profitable businesses supplying operating systems and applications for PCs and servers and can invest at the same pace as AWS and Google in its public cloud, and as we have pointed out before, it actually has an installed base of tens of millions of customers using its Windows Server platform that will naturally gravitate towards services on its Azure public cloud – provided it offers a wide enough and a deep enough portfolio of compute, networking, and storage at a competitive price.
At its Ignite 2018 conference in Orlando, Florida this week, Microsoft is trotting out a slew of new compute instances and features on Azure to attract workloads that have not usually been associated with the Windows Server platform, namely machine learning training and inference as well as simulation and modeling applications that, roughly speaking, constitute high performance computing these days.
First up is a new HB series of virtual machines on Azure, which are based on AMD’s “Naples” Epyc processors, reinforcing the use of these chips by the first big name cloud provider in the United States or Europe. Back in December 2017, Microsoft previewed its Lv2 series of instances, also based on the Epyc processors and on its “Project Olympus” server designs, which have been open sourced through the Open Compute Project. These L series virtual machines on the Azure cloud had a pair of top bin, 32-core Epyc 7551 processors, with virtual CPUs scaling from 8 to 80 and main memory scaling from 64 GB to 640 GB; a 1.92 TB flash drive is allocated for every eight vCPUs, and as far as we know, the networking tops out at 10 Gb/sec. Those vCPU limits seem to imply that simultaneous multithreading is turned on to scale the vCPUs, but Microsoft is not willing to push it all the way up to 128 vCPUs and 1 TB of memory using 64 GB memory sticks. Which is odd.
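As a back-of-the-envelope check on that scaling, the Lv2 sizing described above works out to 8 GB of memory per vCPU and one 1.92 TB flash drive per eight vCPUs. A quick sketch (the function and the size points are our illustration, not official Azure SKU definitions):

```python
# Back-of-the-envelope Lv2 sizing as described above: 8 GB of memory per
# vCPU and one 1.92 TB NVM-Express flash drive for every eight vCPUs.
def lv2_shape(vcpus):
    return {
        "vcpus": vcpus,
        "memory_gb": vcpus * 8,          # 64 GB at 8 vCPUs up to 640 GB at 80
        "nvme_tb": (vcpus // 8) * 1.92,  # one 1.92 TB drive per 8 vCPUs
    }

for v in (8, 16, 32, 64, 80):
    print(lv2_shape(v))
```

The 8 GB per vCPU ratio also shows why 128 vCPUs would imply 1 TB of memory on the full 64-core, SMT-on pair of Epyc 7551 chips.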
In China, Tencent has already copped to using Epyc chips on its cloud instances, based on two socket designs, and Baidu has put Epyc chips into single socket machines that underpin its cloud and machine learning workloads, and has also embraced AMD’s Radeon Instinct GPU accelerators alongside them.
With today’s announcement of the HB instances on Azure, the HB60rs instance has the same Epyc 7551 processors as the Lv2 instances, but only 60 of the 64 cores in the two socket machine are accessible to the instance and simultaneous multithreading is turned off. The instance has 240 GB of main memory (4 GB per core) and a 700 GB NVM-Express flash drive for local storage. The nodes behind the HB instances have the standard 40 Gb/sec Ethernet link coming out of them, but they also have 100 Gb/sec EDR InfiniBand from Mellanox Technologies, with low latency RDMA available, as an alternative network for customers running something that smells more like traditional HPC or new AI and who need the lower latency and higher bandwidth that EDR InfiniBand offers. Evan Burness, principal program manager for Azure HPC at Microsoft, says that these Epyc machines can deliver up to 260 GB/sec of memory bandwidth, which is 33 percent more than a pair of “Skylake” Xeon SP processors can deliver in a system. (The numbers we have seen from Intel on a pair of Skylake Xeon SP-8180 Platinum processors show Intel can drive somewhere around 215 GB/sec to 220 GB/sec in a two socket box, depending on where you want to put the knee in the STREAM Triad performance benchmark curve.) Burness adds that the 260 GB/sec of memory bandwidth that its two-socket Olympus machines deliver through the HB60rs instance is about 2.5X the memory bandwidth that the typical HPC shop has in its clusters today.
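It is worth working those bandwidth figures out on a per-core basis, too – using the numbers above, with our assumed midpoint of the Intel range rather than a measured STREAM result:

```python
# Per-core memory bandwidth implied by the figures cited above.
# The 218 GB/sec Skylake number is our assumed midpoint of the
# 215 GB/sec to 220 GB/sec range, not a measured result.
hb_bw_gbs, hb_cores = 260.0, 60   # HB60rs: Epyc 7551 pair, 60 cores exposed
sky_bw_gbs, hc_cores = 218.0, 44  # HC44rs: Xeon SP-8168 pair, 44 cores exposed

print(f"HB60rs: {hb_bw_gbs / hb_cores:.2f} GB/sec per core")   # ~4.33
print(f"HC44rs: {sky_bw_gbs / hc_cores:.2f} GB/sec per core")  # ~4.95
print(f"Socket pair ratio: {hb_bw_gbs / sky_bw_gbs:.2f}X")     # ~1.19X
```

The wrinkle is that per core, a 44-core Skylake instance actually has a bit more bandwidth headroom than the 60-core Epyc instance; the Epyc advantage shows up at the socket pair level, where codes that can eat all the cores see the bigger pipe.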
Microsoft is also introducing a companion HPC instance, called the HC44rs, which is based on a pair of Skylake Xeon SP-8168 Platinum processors, which have 24 cores running at 2.7 GHz with a 3.7 GHz top Turbo Boost speed. This is not the top bin SP-8180M, which has 28 cores running at 2.5 GHz, but at less than half the price per processor, this next step down the SKU chart is arguably the better option. The instance exposes 44 of the 48 cores in this two socket Olympus server, which Microsoft says can often run all cores at 3 GHz for a lot of workloads, and it is configured with 352 GB of main memory (8 GB per core), twice the capacity per core of the Epyc configuration above. It is odd that Microsoft would configure half as much memory capacity per core on the HB60rs Epyc instance as on the HC44rs Skylake instance. The HC44rs instances have the same 40 Gb/sec Ethernet and 100 Gb/sec InfiniBand network links as the HB60rs instances, and the same 700 GB NVM-Express flash drive is in the HC44rs node.
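The memory-per-core gap we are calling odd falls straight out of the published configurations:

```python
# Memory capacity per exposed core, taken from the instance specs above.
instances = {
    "HB60rs": {"cores": 60, "memory_gb": 240},  # Epyc 7551 pair
    "HC44rs": {"cores": 44, "memory_gb": 352},  # Xeon SP-8168 pair
}

for name, spec in instances.items():
    per_core = spec["memory_gb"] / spec["cores"]
    print(f"{name}: {per_core:.0f} GB per core")  # 4 GB and 8 GB respectively
```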
Microsoft has tuned up both the HB and HC instances with Message Passing Interface (MPI) stacks used for parallel HPC applications and also used by a growing number of machine learning frameworks, and the Azure CycleCloud (from Microsoft’s Cycle Computing acquisition) can be used to manage the clusters. “We are also introducing SRIOV for InfiniBand, which allows customers to use standard Mellanox/OFED drivers just like they would on a bare metal HPC cluster,” Burness explained in a blog post covering the announcement. “In doing so, the new H-series VMs officially support all MPI types and versions, such as OpenMPI, MVAPICH2, Platform MPI, and Intel MPI, as well as all RDMA verbs. The new H-series also will support SparkRDMA for acceleration of the popular Spark analytics framework.”
Microsoft is taking applications for the public preview of the H-Series instances here, and says they will be available for that preview by the end of the year.
In addition to these two CPU-powered instances, Microsoft also is talking about two GPU-accelerated N series instances, both based on Nvidia Tesla GPU accelerators. The first, which is in preview today, is the NVv2 instance, which is based on the older “Maxwell” Tesla M60 accelerators. This one is really aimed at remote visualization and other graphics workloads, not compute as we talk about it here. The machine behind it is based on early Microsoft servers using Intel “Broadwell” Xeon E5 processors that are now two generations back; they are equipped with a pair of 12-core chips and 448 GB of main memory plus a quad of Tesla M60 accelerators, and Microsoft cuts the machine in half and then in half again to offer three different variants of the NVv2 instance.
The new member of the ND instance series is more interesting for big compute jobs, and is based on the HGX-1 chassis that Microsoft and Nvidia cooked up together and launched in March 2017. This machine has eight of the “Volta” Tesla V100 accelerators linked to each other by the NVLink interconnect, plus two 20-core Skylake Xeon SP processors with a maximum of 672 GB of main memory in the box. Microsoft has not yet released the granularity of the NDv2 instance, or its pricing, since it won’t even be in preview until the end of this year.
This is a much heavier instance in terms of raw GPU compute compared to the original ND instances, which had four of the “Pascal” Tesla P40 accelerators and two 12-core Xeon processors and 448 GB of memory in the full configuration, which could be cut in half and then in half again down to 6 cores with 112 GB of memory and one P40.
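That halve-and-halve-again slicing of the original ND instance can be sketched like so (the loop is our illustration of the carving described above, not Azure’s actual SKU generator):

```python
# Carving the full ND box down by halves, per the description above:
# 24 cores, 448 GB of memory, and four Tesla P40s at the top end.
def nd_slices(cores=24, mem_gb=448, gpus=4):
    shapes = []
    while gpus >= 1:
        shapes.append((cores, mem_gb, gpus))
        cores, mem_gb, gpus = cores // 2, mem_gb // 2, gpus // 2
    return shapes

print(nd_slices())  # [(24, 448, 4), (12, 224, 2), (6, 112, 1)]
```

Whether the NDv2 instance will carve its eight V100s the same way is exactly the granularity question Microsoft has not yet answered.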
The NDv2 instance is aimed at both machine learning training and inference workloads, according to Microsoft, but even with 40 Gb/sec stock Azure networking, it could be a beast on HPC codes, too. That’s why Microsoft is trotting out Altair’s UltraFluidX aerodynamics simulator and NanoFluidX fluid dynamics simulator as part of the NDv2 instance chat. Microsoft will be using these instances, by the way, to power the speech recognition training for its Cortana assistant and the translators used in Skype, Windows, and Office.