Mike Henry was interim chief product officer at AI inference company Groq in 2023, a position that put him in close contact with a lot of datacenter administrators and managers. During those six months, he noticed shifts in the ever-evolving landscape that has been the domain of the dominant cloud service providers, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform.
While those hyperscalers continue to take up a lot of the space in AI, Henry saw the growing number of GPU cloud providers coming onto the scene, standing up datacenters packed with thousands of the Nvidia chips that are driving the compute needed for inferencing and other AI workloads.
“I had this realization that a majority of AI infrastructure was now being built outside of the big three legacy cloud providers,” Henry tells The Next Platform. “Having lived in a world where the hyperscalers always won, I saw this big sea change and this big opportunity.”
Henry and Tim Harris, co-founder and chief executive officer of autonomous vehicle company Swift Navigation, in late 2023 used the opportunity to found Parasail, which emerged from stealth this week with $10 million in seed money and a network designed to connect enterprises running inference workloads with available GPU compute. The company acts somewhat like a power company, connecting those that need the power with those who have it.
“The customers want to deploy AI models at scale, and they want to keep it very simple,” says Henry, Parasail’s chief executive officer. “They’re barely keeping up with new open source model releases, let alone thinking about which of the hundred GPU cloud providers they use. ‘What hardware do I run it on? How do I set it up?’ Things like that. They want simplicity and scale. … We’re kind of evoking the idea of original content delivery networks, saying, ‘I have this thing I want to run globally and I don’t want to think about it.’ You want to put it in a box and let it run.”
Backgrounds In AI And Automation
Both Henry and Harris, who is on Parasail’s board, have experience starting and building companies. Henry in 2012 founded Mythic, an AI platform company that raised $165 million and builds AI acceleration hardware, including its Analog Matrix Processors and Key Cards for power-efficient inferencing. He jumped to Groq in 2023, staying there for six months before co-founding Parasail. Harris in 2013 co-founded Swift, which creates precise navigation software for autonomous vehicles.
Parasail is leveraging the growth of AI inference providers, cloud companies that offer on-demand access to GPUs and include high-profile companies like CoreWeave – which went public last month – and Lambda Labs. Parasail has developed partnerships with such inference suppliers, creating an aggregated pool of contract-free GPU capacity that Harris boasts exceeds that of Oracle Cloud Infrastructure. The San Francisco-based company uses its AI Deployment Network to link enterprises with GPU providers and an orchestration engine to ensure workloads get access to the compute power they need.
Lower costs are a key benefit, with Parasail claiming companies moving from OpenAI or Anthropic can see 15X to 30X cost savings, with a cost advantage of two to five times over other open source providers. Getting set up on GPUs takes a matter of hours, with inferencing up and running within minutes. Right now, Parasail offers access to Nvidia H200, H100, and A100 GPUs and RTX 4090 graphics cards, with prices ranging from 65 cents to $3.25 an hour.
Building The Deployment Network
Pulling together a deployment network to do this was no easy feat, Henry says. Every GPU cloud is built differently, including in the way they handle compute, storage, and networking, and provisioning, billing, and setup can be automatic, semi-automatic, or manual. Kubernetes and containers can solve many of these challenges, but some GPU clouds have Kubernetes and others don’t, and those that do are different in setup and quality.
The key challenge was that Kubernetes inherently does not span clusters, regions, datacenters, or providers.
“We essentially had to solve this problem to enable our strategy,” Henry explains. “We can build Kubernetes clusters that span multiple providers and essentially cover the globe. The control plane can run somewhere rock-solid with high reliability, and then all of the GPU clouds across the world become worker nodes. This was very challenging because it meant taking a massive and complex open source ecosystem of software and having it do something it was not designed to do.”
By doing this, Parasail got orchestration and containerization at a global scale beyond what’s been done before.
“Imagine an entire provider going down,” says Henry. “To a well-functioning Kubernetes control plane, this is just some worker nodes going down that need to be replaced. This also bypasses all of the challenges of different setups. It doesn’t matter what is running in the GPU vendor; we mostly bypass it. If we couple this with automatic purchasing, you can imagine a very resilient, scalable, and highly efficient – just-in-time – GPU fleet.”
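The pattern Henry describes can be sketched in miniature: a single control plane that sees every provider’s machines as interchangeable worker nodes, so a whole-provider outage is just a batch of failed nodes to drain and reschedule. The toy scheduler below is purely illustrative – the provider names, capacities, and placement logic are made up, not Parasail’s code:

```python
# Toy model of a control plane spanning GPU providers. A provider outage
# is handled the same way Kubernetes handles dead worker nodes: drop them
# and reschedule the workloads that were running there.
from collections import defaultdict

class ControlPlane:
    def __init__(self):
        self.nodes = {}          # node name -> provider
        self.placements = {}     # workload -> node name

    def register(self, provider: str, count: int):
        for i in range(count):
            self.nodes[f"{provider}-node-{i}"] = provider

    def schedule(self, workload: str):
        # Naive spread: pick the node currently running the fewest workloads.
        load = defaultdict(int)
        for node in self.placements.values():
            load[node] += 1
        self.placements[workload] = min(self.nodes, key=lambda n: load[n])

    def provider_down(self, provider: str):
        # An entire provider going dark is just a set of failed workers.
        dead = {n for n, p in self.nodes.items() if p == provider}
        self.nodes = {n: p for n, p in self.nodes.items() if p not in (provider,)}
        for wl, node in list(self.placements.items()):
            if node in dead:
                self.schedule(wl)

cp = ControlPlane()
cp.register("cloud-a", 2)   # hypothetical provider
cp.register("cloud-b", 2)   # hypothetical provider
for i in range(4):
    cp.schedule(f"model-{i}")
cp.provider_down("cloud-a")
# All four workloads now run on cloud-b's nodes.
assert all(node.startswith("cloud-b") for node in cp.placements.values())
```

The point of the sketch is the framing, not the scheduler: once heterogeneous providers are reduced to uniform worker nodes, failover falls out of ordinary rescheduling rather than per-provider integration work.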
Orchestration Is A Challenge
Matching and optimizing workloads was what Harris calls a “permutation problem.” With AI, there are thousands of models, hundreds of transformer architectures, and dozens of GPUs, Henry says. Throw in new AI ASICs, dozens of ways of combining GPUs to handle larger loads, three popular inference stacks, and myriad settings and performance optimizations, and the challenge grew.
“Our answer to this was a combination of models, AI, and humans in the loop,” says Henry. “Modeling will never be perfect or even good at this, both because it’s a massive, high-dimensional fuzzy problem and because things are rapidly evolving. AI can step in and help quite a bit by catching anomalies, monitoring logs, and rapidly building out exception handling. This will always require humans in the loop, though; there are just too many exceptions. With this, we are able to hit a large scale with a relatively lean engineering team.”
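A stripped-down version of that permutation problem can be sketched as a search over model, GPU configuration, and inference stack combinations, with a human-maintained override table winning whenever the cost model is wrong. Every name, memory figure, and override below is hypothetical, chosen only to show the shape of the approach:

```python
# Illustrative "permutation problem" sketch: enumerate combinations,
# score them with a crude model, and let human-recorded exceptions
# override the model's answer. All values here are made up.
from itertools import product

MODELS = {"llama-70b": 140, "mistral-7b": 14}               # approx. GB needed
GPU_CONFIGS = {"1xA100": 80, "2xA100": 160, "4xA100": 320}  # total GB available
STACKS = ["vllm", "tgi", "sglang"]

# Human-in-the-loop: exceptions found in production trump the model,
# e.g. a (hypothetical) stack that needs extra memory headroom.
HUMAN_OVERRIDES = {("llama-70b", "sglang"): "4xA100"}

def best_config(model: str, stack: str):
    override = HUMAN_OVERRIDES.get((model, stack))
    if override:
        return override
    # Default heuristic: smallest configuration with enough memory.
    fits = [(mem, cfg) for cfg, mem in GPU_CONFIGS.items()
            if mem >= MODELS[model]]
    return min(fits)[1] if fits else None

for model, stack in product(MODELS, STACKS):
    print(model, stack, "->", best_config(model, stack))
```

Even this toy version shows why pure modeling is not enough: the heuristic is reasonable on average, but the exceptions live in a table that only production experience, flagged by monitoring and reviewed by people, can populate.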
The company began a closed stealth beta in January, and demand has accelerated enough that Parasail has passed seven figures of annual recurring revenue (ARR). Its technology is now generally available, and customers include AI chip maker SambaNova, AI production platform maker Oumi, conversational AI firm Rasa, and Elicit, whose AI-based assistant automates research tasks.
Going forward, the startup plans to rapidly add to its roster of twelve employees, particularly in engineering positions. Henry and Harris are also keeping their options open about which GPUs to offer access to. Right now, Nvidia GPUs are king, but the market will move, Harris believes. The insight they have gained from building Parasail also highlights a “weird paradox” in the market.
“There’s seemingly scarcity around the hardware,” Harris explains. “People can’t get enough GPUs, but they all have a ton of free capacity. A datacenter has tons of free capacity around GPUs and they can’t buy enough. How could that possibly be true? It really comes down to the fact that it’s not well optimized and utilized, and it’s not well connected to the customer so they can just deploy their models.”
He adds: “The demand is there. There’s almost infinite demand right now for next-generation AI applications to replace Internet 1.0 and 2.0 applications. But it’s all about how you get those applications running at scale with high utilization. That’s what we do. Our inference platform makes it super easy for customers to deploy AI at scale, and then the network piece is how we optimize and get every last bit of performance out of it.”