At the ISC and SC supercomputing shows each year, a central focus tends to be the release of the Top500 list of the world’s most powerful supercomputers. As we’ve noted in The Next Platform, the 25-year-old list may have its issues, but it still captures the imagination, with lineups of ever-more powerful systems that reflect the trend toward heterogeneity and accelerators, illustrate the growing competition between the United States and China for dominance in HPC, and underscore the continued strength of Japan’s supercomputing industry and the desire of European Union countries to become a larger presence in the market.
What gets less attention is the release of the Green500, the list of the world’s most power-efficient supercomputers. Now in its 11th year, the list and the way efficiency is measured have evolved as systems and technologies have changed, growing to include three different levels for submitting measurement data and folding networking into the metrics. About 18 months ago the Green500 merged with the Top500 group, giving organizations a single place to submit data for both lists. Despite the changes, however, the message remains the same, according to Wu Feng, professor and Turner Fellow of Computer Science at Virginia Tech, one of the creators in 2001 of Green Destiny, a 240-node supercomputer that consumed just 3.2 kilowatts of power, and a driving force behind the creation of the Green500 list.
“The ultimate goal of the Green500 list is to raise awareness and encourage the reporting of the energy efficiency of supercomputers,” Feng said during a session at the SC17 show this month. “The broader goal is to drive energy efficiency as a first-order design constraint that is on par with performance.”
At the top of this year’s list is the Shoubu system B supercomputer in Japan, a ZettaScaler-2.2 system designed by PEZY Computing and ExaScaler and powered by Intel’s 16-core Xeon D-1571 chips and PEZY-SC2 accelerators. It delivers just over 17 gigaflops per watt.
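For readers unfamiliar with how the list ranks systems, the Green500 figure of merit is simply sustained Linpack performance divided by the average power the system draws during the run. The short sketch below illustrates that ratio; the specific performance and power numbers are illustrative assumptions, not official submission data.

```python
# Minimal sketch of the Green500 efficiency metric: sustained Linpack
# performance (Rmax) divided by average system power during the run.
# The numbers used below are illustrative, not official submission data.

def gflops_per_watt(rmax_tflops: float, avg_power_kw: float) -> float:
    """Energy efficiency in gigaflops per watt."""
    gflops = rmax_tflops * 1_000   # teraflops -> gigaflops
    watts = avg_power_kw * 1_000   # kilowatts -> watts
    return gflops / watts

# Example: a system sustaining 850 teraflops while drawing 50 kilowatts
# works out to 17 gigaflops per watt, roughly where Shoubu system B sits.
print(f"{gflops_per_watt(850, 50):.1f} gigaflops per watt")
```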
Power efficiency wasn’t always a top concern of supercomputer makers, but as the systems have grown and added more powerful processors and other components, the cost of powering and running them has risen rapidly, pushing organizations to give efficiency as much thought as performance. More energy-efficient components and the growing use of GPUs and other accelerators have helped stem the rising costs to some degree. However, as the industry pushes toward exascale computing, balancing performance and efficiency remains a tough challenge.
A goal of exascale computing is to develop systems that can run at 50 gigaflops per watt, which would keep an exaflops supercomputer within a 20 megawatt power envelope. DARPA initially predicted the industry would get there in 2015, and later amended that to 2017. Now that level of power efficiency is still several years away, Feng said. In addition, Chinese officials had said they would get to 20 megawatts by 2020, but later revised the prediction to 30 megawatts.
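The 20 megawatt envelope and the 50 gigaflops-per-watt target are two ways of stating the same constraint; a quick back-of-the-envelope check makes the arithmetic explicit.

```python
# Back-of-the-envelope check: an exaflops machine held to a 20 megawatt
# power budget implies 50 gigaflops per watt.
exaflops_in_gigaflops = 1e9   # 1 exaflops = 10**9 gigaflops
power_budget_watts = 20e6     # 20 megawatts

required_efficiency = exaflops_in_gigaflops / power_budget_watts
print(required_efficiency)    # 50.0 gigaflops per watt
```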
“Now we’re buying ourselves some runway,” Feng said. “People said, ‘Oh, power’s not a problem. Power’s not a problem. We’re gonna get to exascale at 20 megawatts.’ Yeah, we’re getting there, but we’re getting there quite a long ways after what we were originally targeting. So we’ve bought ourselves some runway as we’re getting to exaflops.”
While the Green500 is still putting the spotlight on efficient supercomputing, it is also helping to drive some of the work needed to get the industry to exascale and illuminating the path there. Feng noted that the list illustrates the importance of heterogeneous systems, pointing out that CPU-only homogeneous supercomputers are topping out at 5 to 6 gigaflops per watt. All of the systems in the top 10 slots of the Green500 use accelerators, such as PEZY chips or Nvidia GPUs.
Louis Capps, solutions architect at Nvidia, said that the work the company did to get its DGX Saturn V cluster with Volta GPU accelerators onto the list, where it came in fourth at just over 15 gigaflops per watt, not only forced the company to focus on power efficiency but also surfaced issues that might not have been caught otherwise. That included discovering that the CPU was consuming more power than expected while idling as the GPUs did the work. In addition, Nvidia engineers, understanding that dropping the clock speed of the chips by a certain percentage can yield disproportionately large power savings, are now trying to determine the right balance for the systems.
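The clock-versus-power tradeoff Capps describes tracks the textbook first-order model of CMOS dynamic power, in which power scales with the square of supply voltage times frequency. The sketch below illustrates that general relationship under the assumption that voltage is lowered along with frequency; it is a generic illustration, not a description of how Nvidia actually tunes Saturn V.

```python
# Rough sketch of why a modest clock-speed reduction can buy an outsized
# power saving. Uses the textbook CMOS dynamic-power approximation
# P ~ C * V^2 * f with voltage scaled alongside frequency; this is a
# first-order illustration, not Nvidia's internal power model.

def relative_dynamic_power(freq_scale: float, voltage_tracks_frequency: bool = True) -> float:
    """Dynamic power relative to nominal when the clock is scaled by freq_scale."""
    voltage_scale = freq_scale if voltage_tracks_frequency else 1.0
    return (voltage_scale ** 2) * freq_scale

# Dropping the clock 10 percent cuts dynamic power roughly 27 percent
# under this approximation, at the cost of about 10 percent peak throughput.
print(f"{relative_dynamic_power(0.9):.2f}")   # ~0.73
```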
“Finding the right mix of that won’t be easy, but it’s important for the future,” Capps said during the session. “The work we’re doing here for the Green500 is important to those types of decisions.”
Nvidia is in the process of building out the next-generation Saturn V cluster, which eventually will reach 660 servers. (We will profile that system separately.) The company submitted a 33-node slice with eight Tesla V100 GPUs per node and Nvidia’s NVLink interconnect to the Green500 organizers for consideration. (That system ranked 149th on the Top500 list.)
“We are working hard on bringing deep learning and data science to HPC,” Capps said. “That’s a big theme you’re going to see next year and it’s going to change how we look at the efficiency of our computers, because if you use deep learning to look at computation in a completely different way, now you’re efficiently using the system in a whole other manner, and it may even give us a whole other list which we need to do.”
He also echoed Feng’s comments on the importance of power efficiency when looking at exascale computing.
“We’re trying to move forward,” Capps said, adding that the work done to get a system on the Green500 “forces you to understand the importance of power in the systems as we move toward exascale. This exercise to us is very good.”