There are a few trends in high-performance computing that are impossible to ignore, especially in the exascale age. First, architectures at the major HPC sites are diverging, with some using AMD CPUs and GPUs, others Nvidia, and still others sticking with the well-trodden Intel path.
Second, codes are being reworked to scale on these new machines, and one sign of that is the broadening use of C++. While there are plenty of Fortran codes that will undergo routine maintenance for years to come, it’s safe to say that software tools and compilers that emphasize portability and readiness for the C++ shift are future-proofing for HPC.
The third trend is a bit more nebulous and, well, cultural. The hardware vendors have made enormous investments in their own software stacks for their devices and, at the same time, the HPC community as a whole has traditionally been religious about its compiler ecosystems. In other words, it’s going to be hard to talk big centers out of their years of investment and learning in deeply entrenched, directives-based approaches like OpenMP.
Today, however, might mark a shift, as SYCL has finally found its way into NERSC at Lawrence Berkeley National Lab (LBNL) for its Nvidia GPU-dense Perlmutter supercomputer. The significance goes beyond this one system: developers there will be able to write their HPC software using the SYCL standard, which will allow the same software to run on the forthcoming all-Intel Aurora supercomputer at Argonne National Lab and, potentially, on supercomputers with other architectures, including AMD or even RISC-V.
NERSC has entered into a contract with UK-based Codeplay Software to support the LLVM SYCL GPU compiler tools for its wealth of Nvidia A100 GPUs. The company, which has its roots in the early days of GPUs and gaming, is known for its implementation of SYCL and is a major contributor to the open source effort for Nvidia V100 GPUs via the DPC++ project, in addition to other work spanning from AI chip startups to enterprise.
SYCL (pronounced “sickle”) is an open standard maintained by the Khronos Group. Initially released in 2014, it is a royalty-free, cross-platform abstraction layer that enables code for heterogeneous processors to be written using standard ISO C++, with the host and kernel code for an application contained in the same source file. SYCL began closely aligned to OpenCL, but over time has evolved into its own distinct programming model.
“With thousands of users and a wide range of applications using NERSC’s resources, we must support a wide range of programming models. In addition to directive-based approaches, we see modern C++ language-based approaches to accelerator programming, such as SYCL, as an important component of our programming environment offering for users of Perlmutter,” said Brandon Cook, application performance specialist at NERSC. “Further, this work supports the productivity of scientific application developers and users through performance portability of applications between Aurora and Perlmutter.”
Andrew Richards, CEO and founder of Codeplay, is well aware that the HPC community holds to its software and compilers with near-religious tenacity, but says that the growth of C++ in HPC will allow expanded thinking.
“The main thing that everyone’s been trying to figure out is how to write code once and run it on three different supercomputers and how to develop software for the pre-exascale systems that can run on exascale machines. People are struggling to get some approaches like OpenMP to scale well with GPUs; it takes a lot of directives in your code to get decent performance on GPUs and it’s not the easiest way to get performance portability,” Richards tells us. “The way we’re doing it now is with C++ frameworks and SYCL is a much better fit. These frameworks allow domain experts to build their applications on top of the framework, which then handles performance portability between the different GPUs.”
“We are seeing a lot of code being ported across to C++ in HPC, especially those being ported from Fortran. Some labs are already C++ focused like CERN, for instance,” Richards says. While SYCL and Fortran don’t mix, Codeplay is banking on this C++ revolution in HPC for broader adoption. “SYCL will enable far more access to the hardware and is more designed around this new approach,” he adds.
Richards says the biggest competition isn’t OpenMP or any other directives-based approach; it’s a future of closed hardware in which the hardware companies write all the software, leaving a programming model that only works on that hardware and forcing a rewrite for anything else. “Companies have put a lot of investment into their own software infrastructure and they don’t want to put their hands up and say SYCL is a good idea because they’ve made so much progress down one path.”
The Fortran base is strong in HPC, especially at the major labs. This might make SYCL’s adoption in broader HPC a bit slower, as those legacy codes, particularly NNSA codes and other decades-old applications, aren’t likely to change for practical reasons. Still, as AI, expanded data science, and new areas in scientific computing emerge, SYCL might be the right approach at the perfect time, especially as future systems continue to become more mixed architecturally.