Greg Kurtzer, one of the co-founders of the CentOS Linux distribution, the creator of the Singularity container environment for HPC workloads, the founder of the new Rocky Linux distribution that seeks to replace the now defunct CentOS, and an HPC guru in his own right, is on a mission. And it is a mission that we admire and see all the time: To build an integrated application platform that radically removes the complexity of modern distributed computing without dumbing it down.
This is a very hard thing to do, and Kurtzer knows this full well, even though he comes at it from a slightly oblique angle.
Kurtzer is a funny guy, even if the work he has done is deadly serious. By his own admission, he wandered a bit in college, studying mechanical engineering, music, and pre-med before getting a bachelor’s of science in biochemistry in 1997 from what he refers to as the “Northern California Chicken Grooming School,” adding that “you might think I am indecisive, but I am not so sure.” (That is presumably the University of California at Berkeley, one of the major hotbeds of computer hardware and software architecture in the world.) Kurtzer worked at a biotech firm doing genomics after graduation, and then landed a job as a software engineer at Linuxcare, one of a zillion Linux startups, just before the dot-com boom went bust in late 2000.
Then Kurtzer changed gears – or so it might seem – to become an HPC systems architect and technical lead at Lawrence Berkeley National Laboratory, one of the major HPC centers run by the US Department of Energy, which was a joint appointment with the university. During that time he integrated around 40 different clusters for HPC work while at the same time writing the Warewulf cluster management tool. In his spare time he worked with Rocky McGaugh and Lance Davis to create the CentOS Linux distribution, which is downstream from Red Hat Enterprise Linux. He also built the Singularity HPC-flavored container system and tried to commercialize it at a company called RStor and then another one called Sylabs after leaving the Berkeley Lab in early 2017. Since this time last year, Kurtzer has been working on something called the HPCng project, and we now see that this mission is being fulfilled by a new company called Ctrl IQ, where Kurtzer is founder and chief executive officer.
“When I first got involved in this and even now, we always built HPC systems in the same way,” Kurtzer tells The Next Platform. “Each one is a little different of course, but the architecture is basically the same, and it is derived from Beowulf clusters back in the day. Since I left the Department of Energy and started working with enterprises, I realized that there are two parties going on. We have the HPC party and we have the enterprise IT party, and people don’t usually mingle. They might go out to lunch once in a while, but they have completely separate infrastructures, teams, and skill sets. But there is a trend. At the HPC party, people want more modern environments to run their workloads, and they want more enterprise-class capabilities. Containers are a good example of this, but it is only the first step. HPC centers want more modernization, more integration with DevOps, more orchestration, more data orchestration, and the ability to deploy across hybrid environments. Enterprises are getting more and more interested in data analytics, machine learning, and other types of HPC, but when I explain how we do this in HPC, they look at me like I am living in the past and explain that is not how they build systems today. So what Ctrl IQ is doing is bringing these parties together and building a complete stack.”
And when Kurtzer says complete, he means it, even if that means globally replacing CentOS, which no longer really exists, with the shiny new Rocky Linux, which he has made sure will exist. Take a look at the Ctrl IQ stack:
The Warewulf cluster manager and server provisioning system that Kurtzer created at the Berkeley Lab starting back in 2001 has been extended to be able to provision stateless containers to bare metal servers – and do so at scale, obviously. The idea behind Warewulf was to be able to provision and manage thousands of nodes with a small number of administrators. The nodes in the distributed system will, of course, be configured with Rocky Linux, and the container environment is Singularity, which is a container format that is compatible with Kubernetes with some special sauce for HPC shops, including encrypted and digitally signed container images.
The new stuff in the Ctrl IQ stack includes Fuzzball, which Kurtzer says is an orchestration tool for performance-critical workflows. And that “performance-critical” qualifier against “workflows” is what makes it different from existing job schedulers and orchestration systems, according to Kurtzer – and that includes the raw Kubernetes container controller, which is very good at podding up microservices. Fuzzball moves data to compute or compute to data, whichever is the better choice, and it can do it on premises and across multiple public clouds as the situation demands. It creates a giant, virtual compute and data fabric, says Kurtzer, so it can look at dependencies, costs, and availability of resources to decide when and how to run these workflows.
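To make that trade-off concrete, here is a minimal Python sketch of the kind of placement decision Kurtzer describes. It is not Fuzzball's actual scheduler or API, which is not detailed here. The `Site`, `Job`, and `choose_site` names, the prices, and the flat per-gigabyte transfer charge are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Site:
    """A place a workflow step could run (hypothetical model)."""
    name: str
    cost_per_node_hour: float  # assumed price, in dollars
    free_nodes: int            # nodes available right now


@dataclass(frozen=True)
class Job:
    """One workflow step and the input data it depends on (hypothetical model)."""
    name: str
    nodes: int
    hours: float
    data_site: str   # name of the site where the input data currently lives
    data_gb: float


def placement_cost(job: Job, site: Site, transfer_cost_per_gb: float) -> Optional[float]:
    """Return the total cost of running `job` at `site`, or None if it does not fit."""
    if site.free_nodes < job.nodes:
        return None
    compute = job.nodes * job.hours * site.cost_per_node_hour
    # Data only has to move if the job runs somewhere other than where the data is.
    transfer = 0.0 if site.name == job.data_site else job.data_gb * transfer_cost_per_gb
    return compute + transfer


def choose_site(job: Job, sites: Sequence[Site],
                transfer_cost_per_gb: float = 0.05) -> Optional[str]:
    """Pick the cheapest site with enough free nodes, or None if nothing fits."""
    candidates = []
    for site in sites:
        cost = placement_cost(job, site, transfer_cost_per_gb)
        if cost is not None:
            candidates.append((cost, site.name))
    if not candidates:
        return None
    return min(candidates)[1]


if __name__ == "__main__":
    sites = [Site("on_prem", 1.0, 16), Site("cloud_a", 0.5, 1000)]
    small_data = Job("align", nodes=8, hours=10, data_site="on_prem", data_gb=100)
    big_data = Job("assemble", nodes=8, hours=10, data_site="on_prem", data_gb=2000)
    print(choose_site(small_data, sites))  # cheaper to ship the data to the cloud
    print(choose_site(big_data, sites))    # cheaper to run where the data already is
```

With these made-up numbers, the job with a small input is sent to the cheaper cloud nodes (data moves to compute), while the job with a large input stays on premises (compute moves to data). A real orchestrator would also have to weigh transfer time, queue wait, and dependencies between steps, which this sketch ignores.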
Ctrl IQube is a spin of the open source Kubernetes container controller, and it was created by Ctrl IQ because Fuzzball itself requires Kubernetes and a lot of the HPC centers that Kurtzer has worked for and worked with said they did not want to mess with raw Kubernetes because it is such a heavy lift. (And HPC centers almost certainly will not pay for a commercial-grade OpenShift Kubernetes environment from IBM’s Red Hat division any more than they would pay for Red Hat Enterprise Linux instead of using the freebie CentOS distribution.) “A couple of our engineers figured out how to simplify and secure the existing Kubernetes resources such that anybody can install and maintain it.”
Ctrl Cloud is a hosted version of the entire Ctrl IQ stack, and as you can see at the bottom of the chart above, there are specialized Ctrl IQ stacks – traditional MPI-based HPC workloads and what Kurtzer describes as more modern HPC workloads as well as AI and machine learning, plus stacks aimed at chip design, big pharma, and so on.
Here is how the Ctrl IQ software stacks up and interconnects:
You will notice something right away. This is not just a Kubernetes platform, and the containers are not and will not always be managed by Kubernetes. In some cases, as was the case with the Singularity used in HPC centers like the Berkeley Lab, a containerized app that has a complex workflow and that is performance sensitive cannot be impeded by a heavy-weight Kubernetes implementation. These performance sensitive applications will run directly on the Fuzzball container service. In a sense, the Kubernetes container podding service is just another application running on top of the Fuzzball container service. This is exactly how both Mesos and OpenStack treated Kubernetes, although you could have gone the other way and containerized Mesos or OpenStack and run them atop Kubernetes, as we discussed a long time ago when it still mattered. The message here is that this distinction still matters, and Kurtzer understands why and is doing something about it by mashing up Warewulf, Singularity, and Kubernetes in a slightly different and definitely more interesting way.
The Kubernetes services tend to be more perpetual and massively oversubscribed because most of the time they are idle, says Kurtzer, and the Fuzzball containers tend to be more ephemeral and running full tilt boogie. But that is a distinction that will not necessarily hold true in all cases, we believe. Things will get precisely as ephemeral as the situation allows, we think. The hassle of configuring HPC systems is what makes them static, even if their workloads change like a giant game of Tetris using job schedulers to cram apps into their nodes. This is really something we might call thin containering.
One of the key components of the Ctrl IQ stack is the zero trust service mesh that links the Fuzzball and Kubernetes containers to each other and allows for the kind of security that both enterprises and HPC centers need these days. This service mesh and the belts, suspenders, ropes, shotguns, rifles, and slingshots that are deployed to secure it six ways from Sunday are what will give organizations the confidence to intermingle on-premises clusters with those running out on the public clouds in a hybrid fashion.
While Kurtzer is one of the most famous people in the open source community, and is pretty certain about how to go about building the next platform, he is not as yet certain about the right way to price the Ctrl IQ stack and how much should be open source and if some of it should be closed source. And Kurtzer, true to form, is perfectly open about that and is mulling the options. What we know is that Ctrl IQ has $4 million in Series A funding from IAG Capital Partners and OpenDrives, and they expect a return on their investment. And it is hard to do that with an open core model – but not impossible. There’s time yet to figure that all out, with Ctrl Cloud not being ready until early Q2 2021 and the rest of the stack being ready sometime around the middle of the year. Whatever Kurtzer does, we are pretty sure it will cost a lot less than Red Hat OpenShift. Almost by default, if not by design.