We recently considered how computing was often being called distributed even when it shouldn’t be, or being called something else when it really isn’t. Here’s a new one for you: decentralized computing. In this new own-nothing, side-gigging, super-hustle economy – where we increasingly own no vehicles and yet share rides, and don’t go to hotels and yet share homes – comes the concept of owning no computing and yet sharing computing.
Wait. Haven’t we all seen this film multiple times before? We certainly remember a number of the actors, and at least one of them was probably on “The Bill.” Memories of grids, e-science, the search for extraterrestrial intelligence, randomly folding proteins for drug discovery – the list of distributed and decentralized computing paradigms goes on and on. We have all been there, and many of us have closets bursting with the aging printed tee-shirts.
So why this new decentralized hype? In a word: Blockchain. A Google search for blockchain returns 30.3 million results, with IBM, Accenture, and MIT popping up at the fully promoted advertising end of the highly targeted and funded search spectrum. Blockchain is clearly a big deal. Distributed ledgers and the elimination of double spending helped to focus a global fascination with decentralized currency, with the ability to computationally mine new currency and with big ticket initial coin offerings (ICOs). From this technology grew a shady financial underworld that is shrouded in mystery, with an ever-increasing number of cloaked charlatans peddling a dystopia that many do not fully understand; fraud is rife, money is made, but money is also lost.
Let us be clear that blockchain is not evil. The technology itself isn’t corrupt (no technology can be), and blockchain is neat. It is an incredibly cool idea to build provenance and security into a chain and be able to play events from it back and forth. We do, however, have to decouple the financial and cryptocurrency chains from any legitimate conversation about the actual technology. This is why heavy hitters such as IBM are deep into blockchain in a big way. They aren’t peddling ICOs or strange fakecoins; they are, however, pushing data provenance, credibility, and accountability in our digital transactions.
As a superb example, educational leaders are now looking to blockchain to support educational provenance. It’s a seriously big deal to be able to follow a student through a system and check their digital credentials as they complete assignments and work, especially when more courses are now online. Blocks of encrypted data with secure hashes for traceability are a really good idea.
Enter Decentralized Computing
The Next Platform spoke recently with decentralized computing proponent, Chris Matthieu, who is also chief executive officer of Computes.com. Computes says that it is providing “infinite computing” via a secure decentralized peer-to-peer mesh compute platform. This certainly has all of the right words in the description, but how does Computes achieve this and what does Matthieu actually mean here? After a long conversation, it turns out it is actually both rather subtle and also from a distance rather clever.
With blockchain for financial applications, you write data to a block, then cryptographically hash and sign that block into the ledger, which is a kind of database. Every computer on the chain has a copy of the entire ledger, which can be a problem. Also stored along with this is what is called a “proof of work,” which is the mechanism Bitcoin uses to make the blockchain effectively immutable. A valid proof of work shows that a miner did a certain amount of computational work to produce a block, so rewriting history means redoing that work.
Recent attacks have unfortunately also proven there are always weak links in any given chain. The ledger can also become quite large and cumbersome. If you are trying to build a decentralized computing infrastructure for workload scheduling and want to target thin edge devices, this essentially rules out a number of IoT devices because they are too small to take part. It is also super slow to cart around a complete ledger. Matthieu realized that he needed a distributed ledger to schedule compute and track attributes of the load scheduling, but did not need the burden of the proof of work component. Think of Computes as a decentralized master batch scheduler and you are mostly on the right track.
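To make the hash-and-proof-of-work idea concrete, here is a minimal Python sketch of a hash-chained ledger. It is a toy under stated assumptions, not Bitcoin's actual block format: the block fields, the `DIFFICULTY` value, and the function names `block_hash`, `mine_block`, `build_chain`, and `verify_chain` are all hypothetical, and digital signing is left out entirely.

```python
import hashlib
import json

# Hypothetical difficulty: a valid block hash must start with this many hex zeros.
DIFFICULTY = 3


def block_hash(index, prev_hash, data, nonce):
    """Hash the block contents deterministically with SHA-256."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "data": data, "nonce": nonce},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine_block(index, prev_hash, data, difficulty=DIFFICULTY):
    """Proof of work: try nonces until the hash meets the difficulty target."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(index, prev_hash, data, nonce)
        if digest.startswith(target):
            return {"index": index, "prev_hash": prev_hash,
                    "data": data, "nonce": nonce, "hash": digest}
        nonce += 1


def build_chain(entries, difficulty=DIFFICULTY):
    """Mine one block per entry, each linked to the previous block's hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder parent for the first block
    for index, data in enumerate(entries):
        block = mine_block(index, prev_hash, data, difficulty)
        chain.append(block)
        prev_hash = block["hash"]
    return chain


def verify_chain(chain, difficulty=DIFFICULTY):
    """Check every hash, every proof of work, and every link to the parent."""
    target = "0" * difficulty
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["prev_hash"],
                              block["data"], block["nonce"])
        if block["hash"] != expected or not expected.startswith(target):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```

Changing the data in any block changes its hash, which breaks the link held by every later block. A tamperer would have to redo the proof of work for that block and all of its successors, and that cost is what makes the ledger hard to rewrite.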
Now Enter IPFS
IPFS began as an effort by Juan Benet to build a system that is very fast at moving around versioned scientific data. This versioning gives the ability to track how states of software change over time (think Git). IPFS is a distributed file system that seeks to connect all computing devices with the same system of files. This preexisting distributed system saved the Computes team years of effort – the framework was essentially already written and supported by considerable resources from Protocol Labs, a well respected technology group involved in not only IPFS, but also Filecoin, libp2p, IPLD, and a number of decentralized and distributed networking technologies that are underpinning what some are calling the “new web.”
So back to IPFS, and here’s where it gets really interesting. The totality of all IPFS objects forms a cryptographically authenticated data structure known as a Merkle DAG. “Merkle” signifies that the structure uses cryptographic hashes to address content, and “DAG” stands for directed acyclic graph, which fits perfectly with a distributed scheduler for compute. DAGs have also been used most effectively by distributed computing systems such as HTCondor for decades; DAGMan inside of HTCondor has been busy under the covers scheduling billions upon billions of workloads to the Open Science Grid for years. This is essentially a classic case of not changing what isn’t broken.
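The Merkle DAG idea can be sketched in a few lines of Python. This is an illustration of content addressing in general, not the IPFS implementation: real IPFS uses its own content identifier format rather than a bare SHA-256 hex string, and the `MerkleDag` class with its `put` and `get` methods is a hypothetical name invented for this example.

```python
import hashlib
import json


class MerkleDag:
    """Toy content-addressed store: a node's ID is the hash of its content."""

    def __init__(self):
        self.store = {}  # content ID -> serialized node bytes

    def put(self, data, links=()):
        """Store a node that points at existing nodes; return its content ID."""
        for cid in links:
            # A node can only link to nodes that already exist, and its own ID
            # depends on those links, so a cycle cannot be constructed.
            if cid not in self.store:
                raise KeyError(f"unknown link: {cid}")
        node = json.dumps({"data": data, "links": list(links)},
                          sort_keys=True).encode("utf-8")
        cid = hashlib.sha256(node).hexdigest()
        self.store[cid] = node
        return cid

    def get(self, cid):
        """Fetch a node and verify that its content still matches its address."""
        node = self.store[cid]
        if hashlib.sha256(node).hexdigest() != cid:
            raise ValueError("content does not match its address")
        return json.loads(node)
```

Because a parent's ID is computed over the IDs of its children, changing any node produces a new ID for every node above it. The old nodes stay untouched, which gives the Git-like versioning and the immutable record described above.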
So from IPFS, there is a DAG, an immutable record, a parallel and distributed ledger for job state. Basically all you need to do is to hook it all up together with a few extra pieces of software to schedule jobs and you are off to the races. This is exactly what Computes did, as the company explains in considerable detail in frequent postings. The Computes whitepaper about its mesh decentralized scheduler and the Lattice whitepaper for its distributed data store describe the underlying technology in more depth along with the now ubiquitous GitHub repository. Computes scales to thousands of transactions per second at the moment with a design goal of 1 million TPS – that’s a lot of simultaneously scheduled workloads. There are currently 600 registered developers for Computes, but we wanted to understand more about the people and scientists that are actually using the software today.
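To show why a DAG suits job scheduling, here is a minimal Python sketch that orders jobs so that each one runs only after its dependencies have finished. It uses Kahn's topological sort, a standard textbook algorithm; we are not claiming this is how Computes schedules work, and the `schedule` function and the job names in the usage example are hypothetical.

```python
from collections import deque


def schedule(jobs):
    """Return a run order for jobs, given a dict of job -> list of dependencies.

    Every dependency must itself be a key in jobs.
    """
    # Number of unfinished dependencies per job.
    pending = {job: len(deps) for job, deps in jobs.items()}
    # Reverse edges: which jobs are waiting on each job.
    dependents = {job: [] for job in jobs}
    for job, deps in jobs.items():
        for dep in deps:
            dependents[dep].append(job)

    # Jobs with no dependencies are ready immediately.
    ready = deque(sorted(job for job, count in pending.items() if count == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in dependents[job]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    # Any job never released is part of a cycle, so the graph is not a DAG.
    if len(order) != len(jobs):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    jobs = {
        "fetch": [],
        "extract": ["fetch"],
        "transcribe": ["fetch"],
        "report": ["extract", "transcribe"],
    }
    print(schedule(jobs))
```

In a real distributed scheduler, the jobs sitting in the ready queue at the same time are the ones that can be handed out to different machines in parallel.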
Real World Application
We had a long and insightful conversation with Carrie Rountrey, who is a researcher in speech pathology in the department of communication sciences and disorders at the University of Wisconsin at Madison. To summarize our conversation, Rountrey’s research focuses on Parkinson’s and from a computational perspective, they are working to find more efficient and effective measurement tools to understand recordings of patients, research subjects, and healthy controls. They then apply automatic feature extraction and automatic speech recognition systems to examine the audio files for specific traits. In our conversation with Rountrey it was noted that Parkinson’s patients can exhibit very subtle changes in vocalization and this could in theory eventually lead to more accurate and early diagnosis and better support for patients downstream.
Rountrey purposefully pointed out that this is still extremely early research, but we noted that the approach does indeed eventually map to decentralized computing and data collection paradigms. The most important research reason is that the audio recordings are best made when patients are at home, going through their regular day to day activities. A patient chopping vegetables at home while also having a conversation with their spouse is in a very different environment, one requiring motor skills to avoid the blade, and this can produce more subtle vocal utterances than would necessarily be found in the more sterile setting of the audio recording clinic. These small variations in speech pattern are what Rountrey analyzes and spends time looking for. Clinical recordings are subject to environmental issues that are not like those found in an at-home setting. We suggested to Rountrey that this environmental recording issue is similar to “white coat hypertension,” where patients have high blood pressure readings in the clinic compared with the more accurate readings achieved by in-situ ambulatory networked blood pressure monitoring devices. Voice recording is clearly a totally different field, with significantly larger data rates and computing demands, but it is the remote and distributed nature of the work and the application of less invasive technology that are important here.
Rountrey eventually aims to build distributed data collection and computing, but is currently starting out super small. The current Computes environment lives inside the lab, the data are strictly controlled and under IRB regulations, and today, nothing can leave the laboratory. Rountrey’s software stack, however, is built upon many years of research by a number of different scientists and consists of many interleaved and complex pieces of software. Having a rigorous scientific method to contain the software in computer images, and being able to further containerize compute and data, is especially important to the research. This research also unfortunately isn’t funded to the level where vast amounts of computing infrastructure are available to the project, so being able to securely burst workloads to unowned computing is going to be important.
It is an interesting use case: a researcher who relies on computing and is a specialist in speech pathology, but whose day job is not computing. There isn’t anything really specific about Computes as a use case here today that could not be achieved by any other distributed computing effort, but through close collaboration with Matthieu’s team they have a working proof of concept for the platform running in the lab, and they can see what the future could look like from where they stand.
Is the future decentralized? Unclear. However, judging from Matthieu’s steampunk-inspired video clips and his enthusiasm for the field, and given that he successfully exited his last IoT communications company, Octoblu, in a sale to Citrix Systems, he has also seen our distributed computing movie before. The plans to monetize the platform are currently unclear, with a number of different options on the table, even one that includes using the ledger to power Bitcoin-like currency transactions for direct billing of compute and storage. At that point we came full circle. Time will tell if this new decentralized becomes the old distributed or ends up being exactly the same.