If you can’t beat the largest cloud players at economies of scale, the only option is to try to outrun them in performance, capabilities, or price.
Going head to head with Amazon, Google, Microsoft, or IBM on cloud infrastructure prices is a challenge, but one way to gain an edge is by being the first to deliver bleeding-edge hardware to users with emerging, high-value workloads. The trick is to be at the front of the wave, often with some of the most expensive iron, which is risky with AWS and others nipping at its heels and quick to follow. It is via this strategy that cloud provider Nimbix is trying to carve out a niche against the powerhouse infrastructure providers, and it is investing big in this game.
While the company’s CEO, Steve Hebert, could not say exactly how many of the $130,000 Nvidia DGX-1 appliances it purchased to lend out to machine learning end users, he did tell The Next Platform that it is more than one (and fewer than fifty). For a cloud provider, any infrastructure investment carries risk that can be hedged by offering a general purpose compute platform, but this purpose-built AI box will require a steady lineup of paying machine learning customers at sufficient scale, scope, and consistency; Hebert says he sees that on the horizon in a first round of use cases expected to train large-scale models on the machines.
“We will keep scaling this as demand grows,” Hebert says. “It has been our model since day one: launch new capabilities, invest in a strategic amount of capacity and scale that. Obviously, the big workload is machine learning, but many are now working on parallelizing that across P100 clusters since so many of these frameworks for machine learning are MPI enabled and can scale across many machines and take advantage of NVLink.”
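To make the MPI angle concrete, below is a minimal sketch of the data-parallel pattern Hebert alludes to, with plain NumPy standing in for a real framework and a toy linear regression standing in for a real model. In production, each MPI rank would drive one GPU, and on a DGX-1 a CUDA-aware MPI library would carry the allreduce over NVLink rather than PCIe; none of this is Nimbix's or Nvidia's actual stack.

```python
# Data-parallel training sketch: each MPI rank computes gradients on its
# own shard of the data, then an allreduce averages them so every rank
# applies the identical update. Model and data are toy stand-ins.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)           # each rank gets its own data shard
X = rng.normal(size=(1000, 10))
y = X @ np.arange(10.0) + rng.normal(size=1000)  # toy linear regression target

w = np.zeros(10)                                 # identical initial weights on every rank
lr = 0.01

for step in range(100):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)      # local gradient on this rank's shard
    avg = np.empty_like(grad)
    comm.Allreduce(grad, avg, op=MPI.SUM)        # sum gradients across all ranks...
    w -= lr * (avg / size)                       # ...average, and update in lockstep

if rank == 0:
    print("learned weights:", np.round(w, 2))
```

Launched under mpirun with one rank per GPU, this is the same structure that lets MPI-enabled deep learning frameworks scale a single training job across the eight P100s in a DGX-1, or across several machines.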
Other cloud providers have been offering the Pascal generation P100s, including IBM, which just announced that the PCIe version of the GPU will be available via Bluemix. It is natural to assume that AWS will at some point round out its GPU computing portfolio (which currently tops out, performance-wise, at machines with sixteen K80s) with Pascal parts to capture a broader set of GPU users in machine learning and HPC, and the same holds for Azure and Google Cloud. Of course, simply making the P100 available without NVLink and the optimized deep learning software stack baked into the DGX-1 appliance isn’t the same thing, something Nimbix is counting on for its early users.
Nimbix is targeting enterprise end users rather than developers with its DGX-1 cloud, something that is clear from the pricing, a premium that is understandable given the unique GPU scalability and the deep learning framework and CUDA library integrations of the DGX-1. At $29.50 per hour (compared with around $15.00 for AWS K80-based instances as a baseline), it sounds like a steal for those reluctant to invest in an actual DGX-1, but the bill can certainly add up when training massive models for days on end. Still, this is cloud economics at its best for users, if a bit jarring when considered from Nimbix's side of the ledger.
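For a rough sense of where renting stops making sense, some back-of-the-envelope arithmetic with the list prices quoted above; this ignores utilization, power, staffing, and depreciation, so it is purely illustrative:

```python
# Break-even between renting DGX-1 time from Nimbix and buying the box
# outright, using the figures quoted in this article.
DGX1_PRICE = 130_000.00   # appliance list price, USD
RATE = 29.50              # Nimbix hourly rate, USD

breakeven_hours = DGX1_PRICE / RATE
print(f"break-even at {breakeven_hours:,.0f} hours "
      f"(~{breakeven_hours / 24:,.0f} days of round-the-clock training)")

# A week-long training run, by comparison:
week_cost = 7 * 24 * RATE
print(f"one week of continuous training: ${week_cost:,.2f}")
```

That works out to roughly 4,400 hours, or about six months of continuous use, before renting costs more than the hardware itself, which is why the hourly model pencils out for bursty training work but not for an always-on machine.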
The economics of cloud in general is another conversation entirely; it’s one thing to look at how AWS rolls its infrastructure profits into fresh investments, but for players that lack that kind of scale, each push into pricey new gear must be hard fought and won.
Nimbix isn’t new to the P100. The company brought in IBM’s Power-based Minsky servers last year, but Hebert says the DGX-1 has exceeded other platforms they’ve seen for training at scale. The goal is to help users develop a machine learning pipeline that trains on the DGX-1 and runs inference on more cost-effective platforms, including the standard choice for their user base (general purpose X86) and, eventually, FPGAs for accelerated inference. He says the DGX-1’s pricing doesn’t lend itself well to inference, and while there may be use cases where the desire to keep the whole model on the same machine for training and inference outweighs the cost, there is much cheaper hardware for getting the inference job done.
“We are really working on the idea of training machine learning pipelines with models that are constantly being retrained based on affordability of continuously improving neural networks–so as more data is being accumulated from the inference engine, you keep retraining. We want to work on streamlining low cost inference as well so there is a single cloud-based platform for model training and scalable inference,” Hebert explains.
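In schematic form, the loop Hebert describes looks something like the sketch below. Every function here is a hypothetical stub standing in for real infrastructure, not a Nimbix API; the point is the separation of the expensive, infrequent training step from the cheap, always-on inference step that feeds data back in.

```python
# Continuous retraining loop: train on expensive hardware, serve on cheap
# hardware, fold the data that inference traffic generates back into the
# next training round. All three functions are illustrative stubs.

def train_on_gpu_cluster(model, dataset):
    # Stand-in for a costly DGX-1-class training job.
    return {"weights": len(dataset), "version": model["version"] + 1}

def deploy_for_inference(model):
    # Stand-in for serving on cheaper x86 (or, eventually, FPGA) instances.
    return lambda x: x * model["weights"]

def collect_new_samples(endpoint, n=100):
    # Stand-in for the new data that live inference traffic accumulates.
    return [endpoint(i) for i in range(n)]

model, dataset = {"weights": 0, "version": 0}, list(range(100))
for _ in range(5):
    model = train_on_gpu_cluster(model, dataset)   # expensive, infrequent step
    endpoint = deploy_for_inference(model)         # cheap, always-on step
    dataset += collect_new_samples(endpoint)       # corpus grows, so retrain again

print(f"model version {model['version']} trained on {len(dataset)} samples")
```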
On that note, it is just this strategizing that makes Nimbix interesting both economically and technically. The company has carved out a niche catering to high performance over mass appeal. Coming from the HPC world, its earlier incarnation catered to scientific and commercial supercomputing customers and later saw that such hardware had an easy path into the lucrative machine learning world. In addition to stocking up on DGX-1s, the company is doing some interesting work with FPGAs for accelerating inference workloads. Although AWS has taken a step in that direction with its new F1 instances, Nimbix was ahead of this curve and will share more in the near future about how early users of FPGAs as a service are faring.
It is a wonder that there haven’t been more companies stepping up to the DGX-1 cloud plate, given that the appliances have been available for several months (rumors of short supply of the P100s themselves notwithstanding). We have not heard about many major sales of the appliances, in part, we expect, because of their expense.
As we’ve discussed before, many shops doing machine learning at scale are doing so with their own finely-tuned, homegrown clusters outfitted not with the P100, but with lower-end graphics cards (the Titan X, for example). However, as deep learning frameworks mature and meet hardware in the middle, we expect demand for beefier GPUs to increase, at least for training, especially now that multi-GPU scaling of frameworks is becoming easier, or at least more commonplace. As for the inference market and where that is headed, that is the subject of some upcoming stories on the custom ASIC and general purpose hardware fronts.