A combination of Lenovo HPC and system optimization could help genomics detect and defuse healthcare timebombs
Sponsored Feature The concept of precision medicine is revolutionizing healthcare by ‘personalizing’ treatment using genomics to provide insights into a patient’s genetic make-up gleaned from study of their own DNA.
This subfield of bioinformatics can also help in the development of mass treatments at ‘population scale’ by identifying which vaccine prototype is most likely to prove efficacious for the largest number of people in a given subgroup.
Long before COVID-19 struck, genomics – the study of our genes, their functions, and their influence on the workings of the body – was already established as the driving impetus for precision medicine. Large national population-based genomics sequencing initiatives have rolled out around the world, perhaps most notably in China, Denmark, Estonia, Japan, Qatar, the UK and US.
Genomics’ pivotal role in the development of population-scale COVID vaccines gave the field’s profile a major boost, in terms both of R&D investment levels, and in public awareness of how analysis of their DNA helped save lives.
“The silver-lining of the COVID pandemic was that it brought the principles of genomics to the wider non-specialist world,” says Dana Alegre, Solutions Architect for HPC Life Sciences at Lenovo. “Mainstream media has taken an interest in biosciences and created an appetite for information about breakthroughs in bioinformatics. It’s given a new sense of urgency to the field.”
The Pandemic Drove Infrastructure Investment
As governments around the world redirected resources to combat the COVID-19 pandemic, it was quickly discovered that biosciences played an integral role in these efforts. Fortune Business Insights forecasts that investments in biosciences will grow by a CAGR of 19.4 percent between 2021 and 2028. The ability to track the spread of a virus and the rise of new variants of further COVID-sized public healthcare crises – as bodies such as the IPBES have suggested – would boost that value further as governments invest in near-real-time tracking.
The efficacy of COVID-19 vaccines depended greatly on researchers’ ability to assemble, process, and analyze many genomes faster and at an extremely large scale – especially as the number of COVID-19 variant strains grew.
These urgent and extraordinary workloads presented critical challenges. DNA sequencing technologies required to understand a specific genome profile need super-powerful HPC resources to return results fast enough to meet ‘time to answer’ deadlines.
Lenovo found that it typically was taking HPC data centers 150-160 hours to process a single whole genome, after it had undergone an initial phase called library preparation and been processed by a sequencer.
This long-standing processing overhead has acted as a bottleneck on genomics initiatives, slowing research programs and other initiatives. Because of it, researchers have tended to limit their focus to organisms with small genomes or single genomes.
“Clearly, for precision medicine and other bioinformatics-based applications to deliver their full potential, the time required to process genomes must be substantially reduced,” says Alegre, “but our bioinformatics R&D team saw that achieving this is not simply a matter of allocating additional hardware resources to the task.”
Lenovo’s team realized that while more compute power can help, other options could come into play when it came to optimizing genome processing timescales, such as the way that compute power is assigned and managed. Lenovo began by examining the genomics HPC “journey.”
Matching Genome Processing To Resources
Generally speaking, HPC data centers involved in genomics sequencing adopt different combinations of workloads and analyses workflows. They also have varying active and archiving storage needs, a different mix of research types to support.
Therefore, they each require a customized architecture tailored to their specific operational parameters. Recognizing this proved an important first step toward the evolution of Lenovo’s reach into genomics optimization.
Lenovo began by performing permutation tests of the hardware, software, and system factors affecting the performance of genomics workflows. The results of this analysis went on to inform the development of Lenovo’s ground-breaking GOAST (Genomics Optimization And Scalability Tool) solution.
With this solution it was able to implement an optimized HPC architecture that can process one entire genome in under 50 minutes, and one exome in two minutes, using standard hardware, based on industry-standard Intel CPUs.
GOAST also features a genomics sizing tool that calculates the projected HPC usage for an expected workload, explains Alegre: “This enables compute resources to be planned and allocated according to the specific requirements of a given workload”. The sizing tool can also be used to size the current production capabilities of an existing HPC resource.
Saving Money For Cash-Strapped Genome Projects
As well as not needing premium specialized hardware, GOAST also makes genomics sequencing more affordable, more scalable, and lowers operational costs by leveraging open-source software that comes validated and widely-accepted in the sciences – in GOAST’s case the GATK (Genomic Analysis ToolKit) platform from research organization Broad Institute.
“Open source makes our multipurpose genomics solution available to a wider range of the research community,” explains Alegre. “It’s typically not as expensive as ‘closed’ genome software equivalents. Genome researchers can have sight of the code and know what they are working with.”
She adds: “There are strong arguments in favor of open-source approach over the longer term. The software gets updates regularly. Applications may be tweaked to bring incremental optimization. It all contributes to improving the end-to-end speed of delivery.”
Two GOAST Versions
Lenovo’s GOAST is a multi-purpose platform that has been made available in two versions of the GOAST reference architecture, called GOAST Base and GOAST Plus.
GOAST Base, the standard version of the platform, has enabled researchers to reduce a single genome processing time down from 60-150+ hours to 2 hours – that’s 75 times faster than the industry average, according to Lenovo. That result is accomplished on an industry-standard two-socket rack server – running Intel 8380CPUs – without the use of extra acceleration (read, GPUs) or premium-priced specialist hardware.
GOAST Plus is a more powerful configuration built on a higher powered eight processor – Intel 8280 CPUs – system. This can bring down genomic processing time to under 50 minutes. That’s equal to or better than boutique solutions powered by expensive accelerators and proprietary software, at a much lower cost, Lenovo maintains.
Cost is an integral factor in what is and isn’t possible in genomics sequencing, explains Alegre. “Often what is achievable within genomics comes down to what is affordable, so making genomic processing more economical is another key consideration,” she says. “There is perhaps an assumption that, because genomics is at the cutting-edge of biosciences, projects in that space will have adequate budgets to allow researchers to finance their projects fully. But in fact, much of the genomics work on important research areas is conducted with quite limited budgets.”
Clinical, Agriculture: Multipurpose Platform
The need for vaccine studies will, of course, not decrease as the COVID-19 threat subsides. Lessons learned during the 2020-2021 pandemic are being applied to other vaccine research projects that involve genome sequencing.
It should also be remembered that the benefits of genomic analysis can be applied with equal benefit to other types of organisms’ DNA, such as agricultural crops. At the University of Delhi’s Department of Genetics, for example, Lenovo’s GOAST is helping the Centre for Genetic Manipulation of Crop Plants to accelerate vital research on oilseed brassicas.
The center’s researchers can now process more genomes concurrently and get results faster. Previously, it took 48 hours to process a ~110X Brassica juncea (mustard) genome. Now it takes six hours, due to GOAST’s 8-times boost in performance for ultra-deep genomes, the researchers say.
It may have some distance to go still to be fully worthy of the name, but the promise of population genomics has already demonstrated its value due to the acceleration wrought on all biosciences by the global pandemic.
Multidisciplinary, Collaborative Approach
These days, computing analysis must be inclusive, accessible and easy to learn to use. Bioscientists may work alongside computer scientists and data scientists across a range of projects. For this reason, Lenovo’s GOAST team is accustomed to collaborating with the data center and HPC engineers, academic researchers, and software developers in the genomics discipline.
“We know the importance of understanding the kind of throughput the customer needs – but we also gain an understanding of the nature of work they are engaged in, which is a big benefit,” explains Alegre. “That makes it easier to recommend a certain configuration for them – so issues of scalability are important in this area, in terms of working out compute power and memory size needed.”
Alegre adds: “Working with genomics specialists is a two-way street because technologists are not experts in biological sciences, so they have to gain a better understanding of the purpose of the software and what it is doing. Bioinformatics is a multidisciplinary field, of course, but GOAST is coming at it with a different focus, and everyone involved has something they can learn.”
Population genomics – whether geared for predictive healthcare for individuals or rapid vaccine development for millions – requires scaling up in input data from exomes (the portions of a genome that code information for protein synthesis) to whole genomes, scaling-up production levels (from a few to tens of thousands of samples), and having to do so under very short timeframes. With the help of HPC optimization solutions like Lenovo’s GOAST, the delivery of processes to shorten those timeframes is itself shortened.
Through its commitment to using technological innovation to help solve humanity’s greatest challenges, Lenovo is enabling the worldwide movement to sequence entire populations, bringing us closer to making precision medicine routine practice.
Sponsored by Lenovo.