Purpose-Built to Train the World’s Largest Models

Breakthrough CPU for the Largest AI and HPC Workloads

AI models are exploding in complexity and size as they improve conversational AI with hundreds of billions of parameters, enhance deep recommender systems with embedding tables of tens of terabytes of data, and enable new scientific discoveries. These massive models are pushing the limits of today's systems. Continuing to scale them for accuracy and usefulness requires fast access to a large pool of memory and a tight coupling of the CPU and GPU.

Watch NVIDIA founder and CEO Jensen Huang deliver the must-see GTC keynote where he unveils the NVIDIA Grace CPU.


Designed to Solve Complex Problems

The NVIDIA Grace CPU leverages the flexibility of the Arm® architecture to create a CPU and server architecture designed from the ground up for accelerated computing. This innovative design will deliver up to 30X higher aggregate bandwidth compared to today's fastest servers and up to 10X higher performance for applications processing terabytes of data. NVIDIA Grace is designed to enable scientists and researchers to train the world's largest models and solve the most complex problems.

The Latest Technical Innovations

Fourth-Generation NVIDIA NVLink

Solving the largest AI and HPC problems requires memory that is both high-capacity and high-bandwidth. The fourth-generation NVIDIA® NVLink® delivers 900 gigabytes per second (GB/s) of bidirectional bandwidth between the NVIDIA Grace CPU and NVIDIA GPUs. The connection provides a unified, cache-coherent memory address space that combines system memory and high-bandwidth GPU memory (HBM) for simplified programmability. This coherent, high-bandwidth connection between CPU and GPUs is key to accelerating tomorrow's most complex AI and HPC problems.
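To put the 900 GB/s figure in perspective, a back-of-the-envelope calculation shows how long it takes to stream a large data set between CPU and GPU memory at that rate. This is a minimal sketch: the 1 TB model size and the ~128 GB/s figure used for a PCIe Gen 5 x16 comparison are illustrative assumptions, not Grace specifications, and the calculation ignores latency and protocol overhead.

```python
def transfer_time_s(bytes_moved: float, bandwidth_gb_s: float) -> float:
    """Time to move `bytes_moved` bytes over a link with the given peak
    bandwidth in GB/s (1 GB = 1e9 bytes); ignores latency and overhead."""
    return bytes_moved / (bandwidth_gb_s * 1e9)

model_bytes = 1e12  # a hypothetical 1 TB embedding table

nvlink_s = transfer_time_s(model_bytes, 900.0)  # fourth-gen NVLink: 900 GB/s
pcie_s = transfer_time_s(model_bytes, 128.0)    # assumed PCIe Gen 5 x16 peak

print(f"NVLink: {nvlink_s:.2f} s, PCIe (assumed): {pcie_s:.2f} s")
```

Under these assumptions the NVLink connection moves the same data roughly 7X faster, which is why a tightly coupled CPU-GPU link matters for models that exceed GPU memory capacity.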

New High-Bandwidth Memory Subsystem Using LPDDR5x with ECC

Memory bandwidth is a critical factor in server performance, and standard double data rate (DDR) memory consumes a significant portion of overall socket power. The NVIDIA Grace CPU is the first server CPU to harness LPDDR5x memory with server-class reliability through mechanisms like error-correcting code (ECC). It meets the demands of the data center while delivering 2X the memory bandwidth and up to 10X better energy efficiency compared to today's server memory. Coupled with a large, high-performance, last-level cache, the NVIDIA Grace LPDDR5x solution delivers the bandwidth necessary for large models while reducing system power, maximizing performance for the next generation of workloads.

Next-Generation Arm Neoverse Cores

As the parallel compute capabilities of GPUs continue to advance, workloads can still be gated by serial tasks run on the CPU. A fast and efficient CPU is a critical component of system design for maximum workload acceleration. The NVIDIA Grace CPU integrates next-generation Arm Neoverse™ cores to deliver high performance in a power-efficient design, making it easier for scientists and researchers to do their life's work.

Read the press release for more information.