The NVIDIA HGX™ platform brings together the full power of NVIDIA GPUs, NVIDIA NVLink™, NVIDIA networking, and fully optimized AI and high-performance computing (HPC) software stacks to provide the highest application performance and drive the fastest time to insights for every data center.
The NVIDIA HGX Rubin NVL8 integrates eight NVIDIA Rubin GPUs with sixth-generation high-speed NVLink interconnects, delivering 5.5x more NVFP4 FLOPS than HGX B200 to propel the data center into a new era of accelerated computing and generative AI.
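As a quick arithmetic check (a sketch, not NVIDIA's stated methodology), the 5.5x figure is consistent with comparing the NVL8's 400 PFLOPS NVFP4 inference figure against HGX B200's 72 PFLOPS dense FP4 Tensor Core figure, both taken from the tables below; which figures the claim actually compares is an assumption here.

```python
# Hedged sanity check of the headline 5.5x claim. Assumption: the
# comparison is HGX Rubin NVL8's NVFP4 inference figure vs. HGX B200's
# dense FP4 Tensor Core figure, both from the spec tables below.
rubin_nvl8_nvfp4_pflops = 400   # HGX Rubin NVL8, NVFP4 inference
hgx_b200_fp4_dense_pflops = 72  # HGX B200, FP4 Tensor Core (dense)

speedup = rubin_nvl8_nvfp4_pflops / hgx_b200_fp4_dense_pflops
print(f"NVFP4 speedup vs. HGX B200: {speedup:.2f}x")  # ~5.56x, quoted as 5.5x
```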
AI factories and supercomputing centers span thousands of GPUs as a single distributed computing engine. To keep accelerators fully utilized, AI and scientific workloads demand deterministic latency, lossless throughput, stable iteration times, and the ability to scale not only within a data center but also across multiple sites.
NVIDIA networking delivers the full-stack fabric that makes this possible, combining NVIDIA NVLink scale-up, NVIDIA Quantum InfiniBand and Spectrum-X™ Ethernet scale-out, Spectrum-XGS Ethernet multi-data-center scale-across, NVIDIA® BlueField® DPU and DOCA™ for infrastructure services, and next-generation silicon-photonics platforms—enabling the world’s most demanding AI data centers.
NVIDIA HGX is available on a single baseboard with eight NVIDIA Rubin, NVIDIA Blackwell, or NVIDIA Blackwell Ultra SXM GPUs. These powerful combinations of hardware and software lay the foundation for unprecedented AI supercomputing performance.
| | HGX Rubin NVL8* |
|---|---|
| Form Factor | 8x NVIDIA Rubin SXM |
| NVFP4 Inference | 400 PFLOPS |
| NVFP4 Training | 280 PFLOPS |
| FP8/FP6 Training | 140 PFLOPS |
| INT8 Tensor Core<sup>1</sup> | 2 POPS |
| FP16/BF16 Tensor Core<sup>1</sup> | 32 PFLOPS |
| TF32 Tensor Core<sup>1</sup> | 16 PFLOPS |
| FP32 | 1040 TFLOPS |
| FP64/FP64 Tensor Core | 264 TFLOPS |
| FP32 SGEMM \| FP64 DGEMM<sup>2</sup> | 3200 TFLOPS \| 1600 TFLOPS |
| Total Memory | 2.3 TB |
| NVIDIA NVLink | Sixth generation |
| NVIDIA NVLink Switch | NVLink 6 Switch |
| NVLink GPU-to-GPU Bandwidth | 3.6 TB/s |
| Total NVLink Switch Bandwidth | 28.8 TB/s |
| Networking Bandwidth | 1.6 TB/s |
* Preliminary specification, subject to change
1. Dense specification shown.
2. Peak performance using tensor core-based emulation algorithms.
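The per-GPU figures implied by these baseboard totals can be recovered by dividing by the eight SXM modules. The minimal sketch below assumes an even split across GPUs, which the preliminary specification does not state explicitly.

```python
# A minimal sketch deriving per-GPU figures from the eight-GPU baseboard
# totals above. Assumption: resources split evenly across the 8x NVIDIA
# Rubin SXM modules; preliminary specs are subject to change.
NUM_GPUS = 8

totals = {
    "NVFP4 inference (PFLOPS)": 400,
    "Total memory (TB)": 2.3,
    "Total NVLink switch bandwidth (TB/s)": 28.8,
    "Networking bandwidth (TB/s)": 1.6,
}

for name, total in totals.items():
    print(f"{name}: {total} total -> {total / NUM_GPUS:g} per GPU")

# Consistency check: 28.8 TB/s of total NVLink switch bandwidth across
# eight GPUs matches the table's 3.6 TB/s GPU-to-GPU NVLink row.
assert abs(totals["Total NVLink switch bandwidth (TB/s)"] / NUM_GPUS - 3.6) < 1e-12
```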
| | HGX B300 | HGX B200 |
|---|---|---|
| Form Factor | 8x NVIDIA Blackwell Ultra SXM | 8x NVIDIA Blackwell SXM |
| FP4 Tensor Core<sup>1</sup> | 144 \| 108 PFLOPS | 144 \| 72 PFLOPS |
| FP8/FP6 Tensor Core<sup>2</sup> | 72 PFLOPS | 72 PFLOPS |
| INT8 Tensor Core<sup>2</sup> | 3 POPS | 72 POPS |
| FP16/BF16 Tensor Core<sup>2</sup> | 36 PFLOPS | 36 PFLOPS |
| TF32 Tensor Core<sup>2</sup> | 18 PFLOPS | 18 PFLOPS |
| FP32 | 600 TFLOPS | 600 TFLOPS |
| FP64/FP64 Tensor Core | 10 TFLOPS | 296 TFLOPS |
| Total Memory | 2.1 TB | 1.4 TB |
| NVIDIA NVLink | Fifth generation | Fifth generation |
| NVIDIA NVLink Switch™ | NVLink 5 Switch | NVLink 5 Switch |
| NVLink GPU-to-GPU Bandwidth | 1.8 TB/s | 1.8 TB/s |
| Total NVLink Bandwidth | 14.4 TB/s | 14.4 TB/s |
| Networking Bandwidth | 1.6 TB/s | 0.8 TB/s |
| Attention Performance<sup>3</sup> | 2x | 1x |
1. Specifications shown as sparse | dense.
2. Sparse specification shown; dense is half the sparse value.
3. vs. NVIDIA Blackwell.
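Footnote 2's halving rule can be applied mechanically to recover the dense figures for the rows it marks. This is a minimal sketch using the HGX B200 column, assuming the rule applies uniformly to every footnoted row and to HGX B300 as well.

```python
# A small sketch applying footnote 2: the marked rows show sparse
# figures, and the dense figure is half the sparse value. Values are
# copied from the HGX B200 column of the table above.
sparse_specs_b200 = {
    "FP8/FP6 Tensor Core (PFLOPS)": 72,
    "INT8 Tensor Core (POPS)": 72,
    "FP16/BF16 Tensor Core (PFLOPS)": 36,
    "TF32 Tensor Core (PFLOPS)": 18,
}

for name, sparse in sparse_specs_b200.items():
    print(f"{name}: {sparse} sparse -> {sparse / 2:g} dense")
```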
Learn more about the NVIDIA Rubin Platform.