NVIDIA Vera Rubin NVL72

Building the next frontier of agentic AI.

Overview

Seven New Chips, One AI Supercomputer

NVIDIA Vera Rubin NVL72 unifies leading-edge technologies from NVIDIA—72 Rubin GPUs, 36 Vera CPUs, ConnectX®-9 SuperNIC™s, and BlueField®-4 DPUs. It scales up intelligence in a rack-scale platform with the NVIDIA NVLink™ 6 switch and scales out with NVIDIA Quantum-X800 InfiniBand and Spectrum-X™ Ethernet to power the AI industrial revolution at scale. When deployed with NVIDIA Groq 3 LPX racks, Vera Rubin NVL72 delivers a new class of inference performance for trillion-parameter models and million-token context.

Vera Rubin NVL72 is built on the third-generation NVIDIA MGX™ NVL72 rack design, offering a seamless transition from prior generations. It delivers AI training with one-fourth the GPUs and AI inference at one-tenth the cost per million tokens versus NVIDIA Blackwell. Featuring cable‑free modular tray designs and support from over 80 MGX ecosystem partners, the rack-scale AI supercomputer delivers world‑class performance with rapid deployment.

NVIDIA Vera Rubin Ramps Into Full Production to Power Agentic AI Factories Worldwide

NVIDIA Vera Rubin is ramping into full production, with Taiwan’s top server makers and global supply chain leaders manufacturing at scale and shipping Vera Rubin-based systems that enable AI labs, cloud providers, and hyperscalers to build tomorrow’s intelligence.

NVIDIA Vera Rubin Opens the Agentic AI Frontier

The NVIDIA Vera Rubin platform offers seven new chips, now in full production, to scale the world’s largest AI factories.

Performance

Massive Efficiency Gains in AI Inference and Training

LLM inference performance subject to change. Cost per 1 million tokens based on Kimi-K2-Thinking model using 32K/8K ISL/OSL comparing NVIDIA GB200 NVL72 and NVIDIA Vera Rubin NVL72.

Driving Down Inference Costs

NVIDIA Vera Rubin NVL72 delivers one-tenth the cost per million tokens compared to NVIDIA GB200 NVL72 for highly interactive, deep reasoning agentic AI.
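
As a rough illustration of the metric itself, the sketch below computes cost per million tokens from an hourly operating cost and a sustained token rate. The function name and both input values are hypothetical and are not NVIDIA's measurement method or data. It only shows that, at an equal hourly cost, a 10x higher token rate corresponds to one-tenth the cost per million tokens.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars spent per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
HOURLY_COST_USD = 360.0
BASELINE_TOKENS_PER_SECOND = 100_000.0

baseline = cost_per_million_tokens(HOURLY_COST_USD, BASELINE_TOKENS_PER_SECOND)
faster = cost_per_million_tokens(HOURLY_COST_USD, 10 * BASELINE_TOKENS_PER_SECOND)

print(f"baseline: ${baseline:.2f} per 1M tokens")
print(f"10x token rate, same hourly cost: ${faster:.2f} per 1M tokens")
```

Real deployments differ in hourly cost as well as throughput, so the ratio in practice depends on both terms.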

Maximizing AI Factory Throughput

NVIDIA Vera Rubin NVL72 delivers up to 10x more tokens per megawatt than NVIDIA GB200 NVL72, scaling intelligence within the same power footprint.

LLM inference performance subject to change. Tokens per second per MW based on Kimi-K2-Thinking model using 32K/8K ISL/OSL comparing NVIDIA GB200 NVL72 and NVIDIA Vera Rubin NVL72.

Projected performance subject to change. Number of GPUs based on a 10T MoE model trained on 100T tokens in a fixed timeframe of 1 month comparing NVIDIA GB200 NVL72 and NVIDIA Vera Rubin NVL72.

Boosting Training Efficiency

NVIDIA Vera Rubin NVL72 trains mixture-of-experts (MoE) models with one-fourth the number of GPUs compared to NVIDIA GB200 NVL72.
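
To make the one-fourth figure concrete, the sketch below converts a GPU count into a rack count for both platforms, using 72 GPUs per NVL72 rack. The baseline fleet size is a hypothetical value picked for illustration, and the one-fourth ratio is the projected figure quoted for the 10T MoE scenario, not a general guarantee.

```python
import math

GPUS_PER_NVL72_RACK = 72  # both GB200 NVL72 and Vera Rubin NVL72 hold 72 GPUs
GPU_RATIO = 1 / 4         # projected: one-fourth the GPUs versus GB200 NVL72


def rubin_gpus_needed(baseline_gpus: int) -> int:
    """GPUs needed on Vera Rubin NVL72 for the same job, rounded up."""
    return math.ceil(baseline_gpus * GPU_RATIO)


def racks_needed(gpus: int) -> int:
    """Whole NVL72 racks required to host the given number of GPUs."""
    return math.ceil(gpus / GPUS_PER_NVL72_RACK)


baseline_gpus = 9_216  # hypothetical GB200 fleet size
rubin_gpus = rubin_gpus_needed(baseline_gpus)

print(f"GB200 NVL72: {baseline_gpus} GPUs in {racks_needed(baseline_gpus)} racks")
print(f"Vera Rubin NVL72: {rubin_gpus} GPUs in {racks_needed(rubin_gpus)} racks")
```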

35x Higher Throughput for Trillion-Parameter Models

Agentic systems consume up to 15x more tokens than traditional AI applications. AI factories must deliver on token volume and massive context windows with low latency and efficient economics. When paired with NVIDIA Groq 3 LPX racks, Vera Rubin NVL72 delivers up to 35x higher throughput per megawatt for trillion-parameter models.

Projected performance subject to change. Free Tier ($0): Qwen-3 235-billion parameter model with 32K KV-cached tokens. Medium Tier ($3): Kimi K2.5 1-trillion parameter model with 128K KV-cached tokens. High Tier ($6): GPT-MoE 2-trillion parameter model with 128K KV-cached tokens. Premium ($45) and Ultra ($150) Tiers: GPT-MoE 2-trillion parameter model with 400K KV-cached tokens.

Powering the Era of AI Agents

Inside the Vera Rubin Platform

NVIDIA Rubin GPU

Rubin GPUs pair HBM4 memory with a Transformer Engine that delivers up to 50 petaFLOPS (PF) of NVFP4 inference compute, built for the next generation of AI.

NVIDIA Vera CPU

Vera CPUs are purpose-built for data movement and agentic reasoning, delivering high-bandwidth, energy-efficient compute with deterministic performance.

NVIDIA NVLink 6 Switch

NVLink 6 switches feature 3.6 terabytes per second (TB/s) of all-to-all, scale-up bandwidth per GPU, enabling high-speed GPU-to-GPU communications for AI.

NVIDIA ConnectX-9 SuperNIC

ConnectX‑9 SuperNICs deliver 1.6 terabits per second (Tb/s) of per-GPU bandwidth, with programmable remote direct-memory access (RDMA) for low‑latency, GPU‑direct networking at massive scale.

NVIDIA BlueField-4 DPU

BlueField-4 DPUs accelerate data processing across storage, networking, cybersecurity, and elastic scaling in AI factories.

NVIDIA Spectrum-X Ethernet Co-Packaged Optics

Spectrum‑X Ethernet scale‑out switches with integrated silicon photonics deliver 5x better power efficiency, 10x higher network resiliency, and up to 5x more uptime over traditional networking with pluggable transceivers.

NVIDIA Groq 3 LPU

The NVIDIA Groq 3 LPU is the inference accelerator for NVIDIA Vera Rubin NVL72, designed to meet the low-latency and large-context demands of agentic systems. The NVIDIA Groq 3 LPX rack features 256 LPUs with 128 GB of SRAM, 40 PB/s of memory bandwidth, and 640 TB/s of scale-up bandwidth per rack. It is co-designed with Vera Rubin NVL72 to deliver up to 35x higher inference performance per watt and up to 10x more revenue opportunity for trillion-parameter models relative to Blackwell.
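
The rack figures above can be broken down per LPU with simple division. The sketch below assumes that the SRAM, memory bandwidth, and scale-up bandwidth values are all rack totals shared evenly across the 256 LPUs, and that the units are decimal (1 GB = 1,000 MB, 1 PB/s = 1,000 TB/s). Neither assumption is stated in the source, so treat the per-LPU results as estimates.

```python
# Rack-level figures quoted for the NVIDIA Groq 3 LPX rack.
LPUS_PER_RACK = 256
RACK_SRAM_GB = 128
RACK_MEMORY_BANDWIDTH_PBPS = 40
RACK_SCALE_UP_BANDWIDTH_TBPS = 640

# Assumption: totals divide evenly across LPUs, with decimal unit prefixes.
sram_per_lpu_mb = RACK_SRAM_GB * 1000 / LPUS_PER_RACK
memory_bandwidth_per_lpu_tbps = RACK_MEMORY_BANDWIDTH_PBPS * 1000 / LPUS_PER_RACK
scale_up_bandwidth_per_lpu_tbps = RACK_SCALE_UP_BANDWIDTH_TBPS / LPUS_PER_RACK

print(f"SRAM per LPU: {sram_per_lpu_mb:.0f} MB")
print(f"Memory bandwidth per LPU: {memory_bandwidth_per_lpu_tbps:.2f} TB/s")
print(f"Scale-up bandwidth per LPU: {scale_up_bandwidth_per_lpu_tbps:.1f} TB/s")
```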

Specifications¹

NVIDIA Vera Rubin NVL72 Specs

	NVIDIA Vera Rubin NVL72	NVIDIA Vera Rubin Superchip	NVIDIA Rubin GPU
Configuration	72 NVIDIA Rubin GPUs | 36 NVIDIA Vera CPUs	2 NVIDIA Rubin GPUs | 1 NVIDIA Vera CPU	1 NVIDIA Rubin GPU
NVFP4 Inference	3,600 PFLOPS	100 PFLOPS	50 PFLOPS
NVFP4 Training²	2,520 PFLOPS	70 PFLOPS	35 PFLOPS
FP8/FP6 Training²	1,260 PFLOPS	35 PFLOPS	17.5 PFLOPS
INT8²	18 POPS	500 TOPS	250 TOPS
FP16/BF16²	288 PFLOPS	8 PFLOPS	4 PFLOPS
TF32²	144 PFLOPS	4 PFLOPS	2 PFLOPS
FP32	9,360 TFLOPS	260 TFLOPS	130 TFLOPS
FP64	2,400 TFLOPS	67 TFLOPS	33 TFLOPS
FP32 SGEMM³	28,800 TFLOPS	800 TFLOPS	400 TFLOPS
FP64 DGEMM³	14,400 TFLOPS	400 TFLOPS	200 TFLOPS
GPU Memory | Bandwidth	20.7 TB HBM4 | 1,580 TB/s	576 GB HBM4 | 44 TB/s	288 GB HBM4 | 22 TB/s
NVIDIA NVLink	Sixth Generation
NVLink Bandwidth	260 TB/s (NVLink 6 Switch Bandwidth)	7.2 TB/s	3.6 TB/s
NVLink-C2C Bandwidth	65 TB/s	1.8 TB/s	-
CPU Core Count	3,168 custom NVIDIA Olympus cores (Arm® compatible)	88 custom NVIDIA Olympus cores (Arm® compatible)	-
CPU Memory	54 TB LPDDR5X	1.5 TB LPDDR5X	-
Networking Bandwidth (Scale Out)	28.8 TB/s	0.8 TB/s	0.4 TB/s
Total NVIDIA + HBM4 Chips	1,296	30	12

1. Preliminary information. All values are maximums ("up to") and subject to change.
2. Dense specification.
3. Peak performance using Tensor Core-based emulation algorithms.
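
The rack-level column in the table follows from the per-unit columns multiplied by 72 GPUs or 36 CPUs, with some values rounded. The sketch below recomputes a selection of rows and reports the relative gap from the listed rack value. The row selection, the helper name, and the idea of measuring a rounding gap are illustrative choices rather than anything defined in the datasheet.

```python
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36

# name: (per-unit value, units per rack, rack value as listed, unit)
ROWS = {
    "NVFP4 inference": (50, GPUS_PER_RACK, 3600, "PFLOPS"),
    "NVFP4 training": (35, GPUS_PER_RACK, 2520, "PFLOPS"),
    "FP8/FP6 training": (17.5, GPUS_PER_RACK, 1260, "PFLOPS"),
    "FP32": (130, GPUS_PER_RACK, 9360, "TFLOPS"),
    "FP64": (33, GPUS_PER_RACK, 2400, "TFLOPS"),
    "HBM4 capacity": (288, GPUS_PER_RACK, 20700, "GB"),
    "HBM4 bandwidth": (22, GPUS_PER_RACK, 1580, "TB/s"),
    "NVLink bandwidth": (3.6, GPUS_PER_RACK, 260, "TB/s"),
    "CPU cores": (88, CPUS_PER_RACK, 3168, "cores"),
    "CPU memory": (1.5, CPUS_PER_RACK, 54, "TB"),
}


def relative_gap(computed: float, listed: float) -> float:
    """Absolute difference between computed and listed, as a fraction of listed."""
    return abs(computed - listed) / listed


gaps = {}
for name, (per_unit, count, listed, unit) in ROWS.items():
    computed = per_unit * count
    gaps[name] = relative_gap(computed, listed)
    print(f"{name}: {computed:,.1f} {unit} computed, {listed:,} {unit} listed "
          f"({gaps[name]:.2%} gap)")
```

Rows such as FP64 and HBM4 bandwidth show small nonzero gaps, which is consistent with the listed rack values being rounded.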
