
How MCR Memory Can More Than Double HPC And AI Performance

Intel recently demonstrated a new type of DIMM memory technology called Multiplexer Combined Rank (MCR), also referred to as MRDIMMs, that provides up to 2.3X better performance on HPC workloads and up to 2X better performance on AI inference workloads compared to 5th Gen Intel Xeon processors, based on internal Intel analysis.

Matt Langman, vice president and general manager of Xeon 6 products at Intel, recently spoke about the new technology running an HPC workload called NEMO, the Nucleus for European Modelling of the Ocean. NEMO is a simulation workload for modeling ocean temperature, sea level change, salinity, and other thermodynamic and biogeochemistry metrics. The Xeon 6 CPUs with P-cores, commonly known as “Granite Rapids” and launched two weeks ago by Intel, combined with MCR memory run up to 2.3X faster than the 5th Gen Intel “Emerald Rapids” Xeon SPs with traditional DDR memory. Based on innovations in the Intel memory controller combined with more cores and memory channels, the high bandwidth MRDIMM technology delivers significant performance gains.

Nate Mather, strategic planner at Intel, explained the value proposition behind this faster memory technology for many customers by noting, “MRDIMMs provide an interesting new choice point for customers by delivering a large bandwidth boost compared to DDR5 RDIMMs. Large performance improvements of 30 percent to 40 percent that just works in existing platforms gives customers the flexibility of choice for their AI and HPC workloads.”

Performance Projections And Verification

Intel performance projections as of May indicated that twelve channels of MRDIMM memory, in combination with architectural enhancements, would give the newest Intel Xeon processors (codenamed Granite Rapids) an overall performance increase for HPC and AI workloads.

This has been proven in practice.

At the International Supercomputing Conference (ISC) 2024 in Hamburg, Germany, Intel made several announcements, including early performance results showing that MRDIMM-provisioned systems deliver up to a 2.3X performance improvement on real-world HPC applications like NEMO compared to previous-generation systems. This demonstrates that the new technology provides a strong foundation as the preferred host CPU choice for HPC solutions.

Micron Technology confirms that the average bandwidth increase for its recently announced MRDIMM modules in 64 GB, 96 GB, and 128 GB capacities, compared to RDIMMs, meets or exceeds 1.3X.
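As a rough sanity check on that figure, the short C program below computes theoretical peak bandwidth assuming DDR5 RDIMMs at 6400 MT/s and MRDIMMs at 8800 MT/s on a twelve-channel platform; those transfer rates are illustrative assumptions, not vendor-verified specifications for any particular module.

```c
// Back-of-the-envelope check of the ~1.3X MRDIMM bandwidth claim.
// Assumes DDR5 RDIMMs at 6400 MT/s and MRDIMMs at 8800 MT/s on a
// twelve-channel platform; each channel moves 8 bytes per transfer.
#include <stdio.h>

int main(void) {
    const double bytes_per_transfer = 8.0;   // 64-bit data channel
    const int    channels           = 12;    // Xeon 6 memory channels
    const double rdimm_rate  = 6400e6;       // assumed RDIMM transfers/sec
    const double mrdimm_rate = 8800e6;       // assumed MRDIMM transfers/sec

    double rdimm_gbs  = rdimm_rate  * bytes_per_transfer * channels / 1e9;
    double mrdimm_gbs = mrdimm_rate * bytes_per_transfer * channels / 1e9;

    printf("RDIMM  peak: %.1f GB/s\n", rdimm_gbs);    // ~614 GB/s
    printf("MRDIMM peak: %.1f GB/s\n", mrdimm_gbs);   // ~845 GB/s
    printf("Ratio: %.2fX\n", mrdimm_gbs / rdimm_gbs); // ~1.38X
    return 0;
}
```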

Platform Compatible – No Software Changes Required

The addition of a data buffer (as shown below) between two ranks of DDR5 memory means that MCR memory technology can be packaged and accessed so that it is fully platform compatible with DDR5 RDIMMs, along with a host of new processor features:

Two modes of operation. Note that the DIMM is form factor compatible with current DDR5 memory and can provide the same RAS features.
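The sketch below is a conceptual illustration of the multiplexing idea, with made-up data values and no claim of cycle accuracy: each rank supplies data at the standard DDR5 rate while the buffer interleaves the two streams, so the host interface sees twice the transfer rate.

```c
// Conceptual sketch (not cycle-accurate) of the MCR idea: two DDR5
// ranks each supply data at the standard rate, and the data buffer
// multiplexes their outputs so the host sees twice the transfer rate.
#include <stdio.h>

#define BEATS 4  // transfers fetched from each rank per window

int main(void) {
    int rank0[BEATS] = {0, 2, 4, 6};  // data read from rank 0
    int rank1[BEATS] = {1, 3, 5, 7};  // data read from rank 1, in parallel

    int host_bus[2 * BEATS];          // host interface carries 2X the beats
    for (int i = 0; i < BEATS; i++) {
        host_bus[2 * i]     = rank0[i];  // buffer interleaves rank 0 ...
        host_bus[2 * i + 1] = rank1[i];  // ... with rank 1 on the host bus
    }

    for (int i = 0; i < 2 * BEATS; i++)
        printf("%d ", host_bus[i]);      // prints 0 1 2 3 4 5 6 7
    printf("\n");
    return 0;
}
```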

DDR5 Form Factor Compatibility

Bhanu Jaiswal, Xeon product manager in the Intel Data Center and AI group, observed: “MRDIMMs are form factor compatible with today’s DDR5 RDIMMs. You don’t need to redesign your system board or sacrifice any DDR5 reliability, availability, and serviceability (RAS) features. No changes to the software are required. Succinctly, more bandwidth in the same system design.” DDR5 RAS features are absolutely necessary in modern servers, of course.

Jaiswal noted that the new memory technology in MRDIMM-enabled Intel Xeon systems will benefit most memory bandwidth-bound workloads, an observation widely echoed in technology coverage of HPC.

Higher memory bandwidth means that the processor can keep more cores active to deliver more useful work, that is, better performance. This boost in bandwidth is critical for feeding the fast-growing core counts of modern CPUs and ensuring the cores can be efficiently utilized. The increase in memory bandwidth also benefits other performance features that Intel has been developing to support a variety of AI, HPC, and datacenter workloads.
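To see why bandwidth, rather than compute, sets the ceiling for such workloads, consider a minimal STREAM-style triad loop, the classic memory bandwidth probe. This is an illustrative single-threaded sketch with POSIX timing (compile with -O2 on Linux), not a substitute for the official STREAM benchmark:

```c
// Minimal STREAM-style triad: a[i] = b[i] + scalar * c[i].
// With arrays far larger than cache, throughput is limited by memory
// bandwidth, not compute, so more bandwidth keeps more cores busy.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 26)  // 64M doubles per array (~512 MB each), well past cache

int main(void) {
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];        // 3 x 8 bytes moved per iteration
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("Triad bandwidth: %.1f GB/s\n", 3.0 * N * 8 / secs / 1e9);
    free(a); free(b); free(c);
    return 0;
}
```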

Intel’s forthcoming AVX10 Converged Vector ISA contains improvements that target deep learning and HPC workloads that benefit from vector processing, such as scientific simulations and data analysis. A large register size means that CPU cores can perform the same operation on multiple pieces of data in a single clock cycle, rather than spending multiple cycles on smaller pieces of data. MRDIMMs will prove useful in providing the data to keep this vector ISA busy. AVX10 Version 1 will be introduced with the Granite Rapids Xeon 6 CPUs only, and the full version will be supported on subsequent generations.
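As a concrete illustration of that data parallelism, the sketch below uses AVX-512 intrinsics, whose capabilities AVX10 converges, to add sixteen single-precision floats in one instruction; it assumes an AVX-512-capable CPU and a compiler flag such as -mavx512f.

```c
// One 512-bit instruction operates on sixteen floats at once, versus
// sixteen scalar iterations; compile with -mavx512f on AVX-512 hardware.
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[16], b[16], c[16];
    for (int i = 0; i < 16; i++) { b[i] = (float)i; c[i] = 100.0f; }

    __m512 vb = _mm512_loadu_ps(b);     // load 16 floats into one register
    __m512 vc = _mm512_loadu_ps(c);
    __m512 va = _mm512_add_ps(vb, vc);  // 16 additions in a single operation
    _mm512_storeu_ps(a, va);

    for (int i = 0; i < 16; i++) printf("%.0f ", a[i]);  // 100 .. 115
    printf("\n");
    return 0;
}
```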

So when you look at servers, remember that memory bandwidth is a first-order performance limiter on many workloads, including those in the AI and HPC space, but memory bandwidth alone is not sufficient for many customer workloads. That is the reason for the enhanced on-chip modular mesh and optimized cores inside the Xeon 6 processors, which are projected to deliver a 2X to 3X performance increase compared to previous-generation Xeon processors. Even better, increased memory bandwidth helps unleash the performance needed to run these crucial workloads at ever-higher CPU core counts.

Rob Farber is a global technology consultant and author with an extensive background in HPC and machine learning technology.
