NVIDIA HGX Platform

Supercharging AI and high-performance computing for every data center.

Overview
NVIDIA Vera CPU
Networking
Specifications

Overview

Supercharging AI and HPC for Every Data Center

The NVIDIA HGX™ platform brings together the full power of NVIDIA GPUs, NVIDIA Vera CPUs, NVIDIA NVLink™, NVIDIA networking, and fully optimized AI and high-performance computing (HPC) software stacks to provide the highest application performance and drive the fastest time to insights for every data center.

The NVIDIA HGX Rubin NVL8 integrates eight NVIDIA Rubin GPUs with sixth-generation high-speed NVLink interconnects, delivering up to 10x more token factory throughput versus HGX B200 and matching its training performance with 4x fewer GPUs. NVIDIA Rubin-based HGX systems are designed for the most demanding agentic AI, data analytics, and HPC workloads. NVIDIA HGX Rubin NVL8 can be paired with either NVIDIA Vera CPUs—configured as HGX Vera Rubin NVL8—or with x86-based CPU baseboards.

NVIDIA Vera Rubin Ramps Into Full Production to Power Agentic AI Factories Worldwide

NVIDIA Vera Rubin is ramping into full production, with Taiwan’s top server makers and global supply chain leaders manufacturing at scale and shipping Vera Rubin-based systems, fueling AI labs, cloud providers, and hyperscalers as they build tomorrow’s intelligence.

Accelerating the Next Generation of Agentic AI

Boost Token Factory Throughput With HGX Rubin NVL8

Serving agentic AI and reasoning models at scale demands extreme inference throughput. With architectural innovations including 400 PFLOPS of NVFP4 compute, 3x more memory bandwidth at 176 TB/s, and 2x more NVLink Switch bandwidth at 28.8 TB/s for high-throughput inter-GPU communication, HGX Rubin NVL8 delivers up to 10x more token factory throughput versus HGX B200. This leap in performance allows AI factories to serve more users, maximize token revenue, and lower cost per token.

Projected performance subject to change. Kimi K2-Thinking model with FTL<=500ms, ISL=4K, OSL=4K. HGX Rubin NVL8 with Sparse NVFP4; HGX B200 with Dense NVFP4.

Train Next-Generation AI Models With 4x Fewer GPUs

HGX Rubin NVL8 brings breakthrough mixture-of-experts pretraining to the eight-GPU server form factor, training next-generation agentic AI models with 4x fewer GPUs, enabled by architectural innovations including 4x more NVFP4 training FLOPS, 1.6x more high-speed HBM capacity, and 2x more NVLink bandwidth versus HGX B200. This leap in training efficiency enables organizations to train more models within the same infrastructure footprint, lower the cost of model development, and maximize the return on AI infrastructure investment.

Projected performance subject to change. Number of GPUs based on DeepSeek-R1 pretrained on 15T tokens with 4K sequence length.
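
The hardware ratios quoted for training can be roughly cross-checked against the system-level figures in the specification tables on this page. The following is a minimal Python sketch, assuming the HGX B200 baseline for NVFP4 training is its dense FP4 value (72 PFLOPS); the dictionary keys and the `generation_ratio` helper are illustrative names, not part of any NVIDIA tool.

```python
# Figures copied from the specification tables on this page (system totals).
# Assumption: the HGX B200 baseline for NVFP4 training is its dense FP4 value.
HGX_RUBIN_NVL8 = {
    "nvfp4_training_pflops": 280.0,    # dense
    "gpu_memory_tb": 2.3,
    "nvlink_bandwidth_tb_per_s": 28.8,
}
HGX_B200 = {
    "nvfp4_training_pflops": 72.0,     # dense FP4
    "gpu_memory_tb": 1.4,
    "nvlink_bandwidth_tb_per_s": 14.4,
}


def generation_ratio(metric: str) -> float:
    """Return the HGX Rubin NVL8 value divided by the HGX B200 value."""
    return HGX_RUBIN_NVL8[metric] / HGX_B200[metric]


# Round to two decimals so the results are easy to compare with the quoted ratios.
ratios = {metric: round(generation_ratio(metric), 2) for metric in HGX_RUBIN_NVL8}

for metric, value in ratios.items():
    print(f"{metric}: {value}x")
```

Under these assumptions the ratios come out near 3.9x, 1.6x, and 2x, in line with the rounded figures quoted in the text. The 4x reduction in GPU count is a projected workload result and cannot be derived from these hardware ratios alone.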

NVIDIA Vera CPU

NVIDIA Vera is the CPU for the age of AI—purpose-built for agentic AI, reinforcement learning, and data processing at scale. NVIDIA Olympus cores, high-bandwidth LPDDR5X memory, and NVIDIA Scalable Coherency Fabric deliver fast, efficient CPU execution alongside accelerated compute, helping AI factories run more agents, evaluations, and data pipelines.

Accelerating HGX With NVIDIA Networking

AI factories and supercomputing centers span thousands of GPUs as a single distributed computing engine. To keep accelerators fully utilized, AI and scientific workloads demand deterministic latency, lossless throughput, stable iteration times, and the ability to scale not only within a data center but also across multiple sites.

NVIDIA networking delivers the full-stack fabric that makes this possible, combining NVIDIA NVLink scale-up, NVIDIA Quantum InfiniBand and Spectrum-X™ Ethernet scale-out, Spectrum-XGS Ethernet multi-data-center scale-across, NVIDIA® BlueField® DPU and DOCA™ for infrastructure services, and next-generation silicon-photonics platforms—enabling the world’s most demanding AI data centers.

NVIDIA HGX Specifications

NVIDIA HGX is available as a single baseboard with eight NVIDIA Rubin, NVIDIA Blackwell, or NVIDIA Blackwell Ultra SXMs. Rubin GPUs can be paired with an NVIDIA Vera CPU or an x86-based CPU baseboard. These powerful combinations of hardware and software lay the foundation for unprecedented AI and supercomputing performance.

NVIDIA Rubin NVL8

System Specifications	NVIDIA HGX Vera Rubin NVL8<sup>1</sup>	NVIDIA HGX Rubin NVL8<sup>1</sup>
Configuration	8x NVIDIA Rubin SXM with Single Socket Vera CPU	8x NVIDIA Rubin SXM
CPU \| Core Count	NVIDIA Vera CPU \| 88 Custom NVIDIA Olympus Cores (Arm® compatible) with Spatial Multithreading (SMT)	x86 CPU<sup>4</sup>
CPU Memory \| Bandwidth	1.5 TB LPDDR5X \| 1.2 TB/s	x86 CPU<sup>4</sup>
NVFP4 Inference	400 PFLOPS	400 PFLOPS
NVFP4 Training<sup>2</sup>	280 PFLOPS	280 PFLOPS
FP8/FP6 Training<sup>2</sup>	140 PFLOPS	140 PFLOPS
INT8<sup>2</sup>	2 POPS	2 POPS
FP16/BF16<sup>2</sup>	32 PFLOPS	32 PFLOPS
TF32<sup>2</sup>	16 PFLOPS	16 PFLOPS
FP32	1,040 TFLOPS	1,040 TFLOPS
FP64	265 TFLOPS	265 TFLOPS
FP32 SGEMM<sup>3</sup>	3,200 TFLOPS	3,200 TFLOPS
FP64 DGEMM<sup>3</sup>	1,600 TFLOPS	1,600 TFLOPS
GPU Memory \| Bandwidth	2.3 TB HBM4 \| 176 TB/s	2.3 TB HBM4 \| 176 TB/s
NVLink Switch Bandwidth	28.8 TB/s	28.8 TB/s
NVIDIA NVLink	Sixth Generation	Sixth Generation
Networking Bandwidth	1.6 TB/s	1.6 TB/s

Individual GPU Specifications	NVIDIA Rubin GPU<sup>1</sup>
NVFP4 Inference	50 PFLOPS
NVFP4 Training<sup>2</sup>	35 PFLOPS
FP8/FP6 Training<sup>2</sup>	17.5 PFLOPS
INT8<sup>2</sup>	250 TOPS
FP16/BF16<sup>2</sup>	4 PFLOPS
TF32<sup>2</sup>	2 PFLOPS
FP32	130 TFLOPS
FP64	33 TFLOPS
FP32 SGEMM<sup>3</sup>	400 TFLOPS
FP64 DGEMM<sup>3</sup>	200 TFLOPS
NVLink Bandwidth	3.6 TB/s
NVIDIA NVLink	Sixth Generation
GPU Memory \| Bandwidth	288 GB HBM4 \| 22 TB/s

1. Preliminary information. All values are "up to" figures and are subject to change. NVFP4 Inference specification is sparse.
2. Dense specification.
3. Peak performance using Tensor Core-based emulation algorithms.
4. CPU and memory specs are defined by OEM offerings.
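
The system-level figures in the first table are consistent with eight times the per-GPU figures in the second. The following Python sketch illustrates that arithmetic for a subset of rows; the `PER_GPU` dictionary and the `system_totals` helper are illustrative names chosen for this example, and the values are copied from the tables above.

```python
GPUS_PER_BASEBOARD = 8  # HGX Rubin NVL8 carries eight Rubin SXM GPUs

# Per-GPU values copied from the "Individual GPU Specifications" table.
PER_GPU = {
    "NVFP4 Inference (PFLOPS)": 50,
    "NVFP4 Training (PFLOPS)": 35,
    "FP8/FP6 Training (PFLOPS)": 17.5,
    "FP32 (TFLOPS)": 130,
    "NVLink Bandwidth (TB/s)": 3.6,
    "GPU Memory (GB)": 288,
    "GPU Memory Bandwidth (TB/s)": 22,
}


def system_totals(per_gpu: dict, gpu_count: int = GPUS_PER_BASEBOARD) -> dict:
    """Scale each per-GPU figure by the number of GPUs on the baseboard."""
    return {name: value * gpu_count for name, value in per_gpu.items()}


totals = system_totals(PER_GPU)

for name, value in totals.items():
    print(f"{name}: {value}")
```

The published system values are rounded, so a few rows differ slightly from a straight multiplication: 8 x 288 GB is 2,304 GB (listed as 2.3 TB), and 8 x 33 TFLOPS of FP64 is 264 TFLOPS (listed as 265 TFLOPS).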

Read the NVIDIA Vera Rubin Datasheet

NVIDIA Blackwell

System Specifications	HGX B300<sup>4</sup>	HGX B200<sup>4</sup>
Form Factor	8x NVIDIA Blackwell Ultra SXM	8x NVIDIA Blackwell SXM
FP4 Tensor Core<sup>1</sup>	144 PFLOPS \| 108 PFLOPS	144 PFLOPS \| 72 PFLOPS
FP8/FP6 Tensor Core<sup>2</sup>	72 PFLOPS	72 PFLOPS
INT8 Tensor Core<sup>2</sup>	3 POPS	72 POPS
FP16/BF16 Tensor Core<sup>2</sup>	36 PFLOPS	36 PFLOPS
TF32 Tensor Core<sup>2</sup>	18 PFLOPS	18 PFLOPS
FP32	600 TFLOPS	600 TFLOPS
FP64/FP64 Tensor Core	10 TFLOPS	296 TFLOPS
Total Memory	2.1 TB	1.4 TB
NVIDIA NVLink	Fifth generation	Fifth generation
NVIDIA NVLink Switch™	NVLink 5 Switch	NVLink 5 Switch
NVLink GPU-to-GPU Bandwidth	1.8 TB/s	1.8 TB/s
Total NVLink Bandwidth	14.4 TB/s	14.4 TB/s
Networking Bandwidth	1.6 TB/s	0.8 TB/s
Attention Performance<sup>3</sup>	2x	1x

1. Specifications are listed as Sparse | Dense.
2. Sparse specification. Dense is one-half of the sparse value shown.
3. vs. NVIDIA Blackwell.
4. HGX B300 and HGX B200 shipping now.
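
Footnote 2 implies that the dense value for those rows can be obtained by halving the listed sparse value. The following Python sketch applies that rule to the footnote-2 rows of the table above; the `SPARSE_SPECS` dictionary and the `dense_from_sparse` helper are illustrative names, not an NVIDIA API.

```python
# Sparse values copied from the rows marked with footnote 2 in the table above.
SPARSE_SPECS = {
    "HGX B300": {
        "FP8/FP6 (PFLOPS)": 72,
        "INT8 (POPS)": 3,
        "FP16/BF16 (PFLOPS)": 36,
        "TF32 (PFLOPS)": 18,
    },
    "HGX B200": {
        "FP8/FP6 (PFLOPS)": 72,
        "INT8 (POPS)": 72,
        "FP16/BF16 (PFLOPS)": 36,
        "TF32 (PFLOPS)": 18,
    },
}


def dense_from_sparse(sparse_value: float) -> float:
    """Apply footnote 2: the dense figure is one-half of the sparse figure."""
    return sparse_value / 2


dense_specs = {
    system: {name: dense_from_sparse(value) for name, value in rows.items()}
    for system, rows in SPARSE_SPECS.items()
}

for system, rows in dense_specs.items():
    print(system, rows)
```

The FP4 row is excluded because footnote 1 already lists its sparse and dense values explicitly, and they do not follow the one-half rule for HGX B300.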

Read the NVIDIA Blackwell Ultra Datasheet

Read the NVIDIA Blackwell Datasheet

Learn more about the NVIDIA Vera Rubin Platform.
