NVIDIA HGX AI Supercomputer

The most powerful end-to-end AI supercomputing platform.

Purpose-Built for the Convergence of Simulation, Data Analytics, and AI

Massive datasets, exploding model sizes, and complex simulations require multiple GPUs with extremely fast interconnections and a fully accelerated software stack. The NVIDIA HGX AI supercomputing platform brings together the full power of NVIDIA GPUs, NVIDIA® NVLink®, NVIDIA InfiniBand networking, and a fully optimized NVIDIA AI and HPC software stack from the NVIDIA NGC catalog to provide the highest application performance. With its end-to-end performance and flexibility, NVIDIA HGX enables researchers and scientists to combine simulation, data analytics, and AI to drive scientific progress.

Unmatched End-to-End Accelerated Computing Platform

NVIDIA HGX combines NVIDIA A100 Tensor Core GPUs with high-speed interconnects to form the world's most powerful servers. With 16 A100 GPUs, HGX has up to 1.3 terabytes (TB) of GPU memory and over 2 terabytes per second (TB/s) of memory bandwidth for unprecedented acceleration.

Compared to previous generations, HGX provides up to a 20X AI speedup out of the box with Tensor Float 32 (TF32) and a 2.5X HPC speedup with FP64. NVIDIA HGX delivers a staggering 10 petaFLOPS, forming the world’s most powerful accelerated scale-up server platform for AI and HPC.
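TF32 achieves this speedup by keeping FP32's 8-bit exponent (and therefore its full dynamic range) while reducing the mantissa to 10 bits, so existing FP32 code runs on Tensor Cores without changes. As an illustrative sketch (not NVIDIA code), the precision reduction TF32 applies to matrix-multiply inputs can be emulated in pure Python; note this is a simplification that truncates, whereas the hardware rounds to nearest:

```python
import struct

def tf32_truncate(x: float) -> float:
    """Emulate TF32's reduced precision by cutting an IEEE 754
    float32 mantissa from 23 bits down to TF32's 10 bits.

    Simplification: real Tensor Cores round to nearest; here we
    simply clear the 13 low mantissa bits. The 8-bit exponent is
    untouched, so TF32 keeps FP32's full dynamic range.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)  # drop the 13 least-significant mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# 2**-10 is within TF32's 10-bit mantissa near 1.0 and survives;
# 2**-11 falls below it and is dropped.
print(tf32_truncate(1.0 + 2**-10))  # 1.0009765625
print(tf32_truncate(1.0 + 2**-11))  # 1.0
```

The coarser mantissa is what lets the Tensor Cores process many more multiply-accumulates per cycle than full FP32 while accumulating results in FP32.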

HGX Stack

NVIDIA HGX A100 8-GPU

NVIDIA HGX A100 4-GPU

Deep Learning Performance

Up to 3X Higher AI Training on Largest Models (DLRM Training)

Deep learning models are exploding in size and complexity, requiring a system with large amounts of memory, massive computing power, and fast interconnects for scalability. With NVIDIA NVSwitch providing high-speed, all-to-all GPU communications, HGX can handle the most advanced AI models. With A100 80GB GPUs, GPU memory is doubled, delivering up to 1.3TB of memory in a single HGX. Emerging workloads on the very largest models like deep learning recommendation models (DLRM), which have massive data tables, are accelerated up to 3X over HGX powered by A100 40GB GPUs.

Machine Learning Performance

2X Faster than A100 40GB on Big Data Analytics Benchmark

Machine learning models require loading, transforming, and processing extremely large datasets to glean critical insights. With up to 1.3TB of unified memory and all-to-all GPU communications with NVSwitch, HGX powered by A100 80GB GPUs has the capability to load and perform calculations on enormous datasets to derive actionable insights quickly.

On a big data analytics benchmark, A100 80GB delivered insights with 2X higher throughput over A100 40GB, making it ideally suited for emerging workloads with exploding dataset sizes.

HPC Performance

HPC applications need to perform an enormous number of calculations per second. Increasing the compute density of each server node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and space consumed in the data center. For simulations, high-dimensional matrix multiplication requires a processor to fetch data from many neighbors for computation, making GPUs connected by NVIDIA NVLink ideal. HPC applications can also leverage TF32 in A100 to deliver up to 11X higher throughput for single-precision, dense matrix-multiply operations than GPUs from four years earlier.

An HGX powered by A100 80GB GPUs delivers a 2X throughput increase over A100 40GB GPUs on Quantum Espresso, a materials simulation, reducing time to insight.

11X More HPC Performance in Four Years (Top HPC Apps)

Up to 1.8X Higher Performance for HPC Applications (Quantum Espresso)

NVIDIA HGX Specifications

NVIDIA HGX is available in single baseboards with four or eight A100 GPUs, each with 40GB or 80GB of GPU memory. The 4-GPU configuration is fully interconnected with NVIDIA NVLink®, and the 8-GPU configuration is interconnected with NVSwitch. Two NVIDIA HGX A100 8-GPU baseboards can be combined using an NVSwitch interconnect to create a powerful 16-GPU single node.

HGX is also available in a PCIe form factor for a modular, easy-to-deploy option, bringing the highest computing performance to mainstream servers, with either 40GB or 80GB of memory per GPU.

This powerful combination of hardware and software lays the foundation for the ultimate AI supercomputing platform.

|  | A100 PCIe | 4-GPU | 8-GPU | 16-GPU |
| --- | --- | --- | --- | --- |
| GPUs | 1x NVIDIA A100 PCIe | HGX A100 4-GPU | HGX A100 8-GPU | 2x HGX A100 8-GPU |
| Form factor | PCIe | 4x NVIDIA A100 SXM | 8x NVIDIA A100 SXM | 16x NVIDIA A100 SXM |
| HPC and AI compute (FP64 / TF32* / FP16* / INT8*) | 19.5TF / 312TF* / 624TF* / 1.2POPS* | 78TF / 1.25PF* / 2.5PF* / 5POPS* | 156TF / 2.5PF* / 5PF* / 10POPS* | 312TF / 5PF* / 10PF* / 20POPS* |
| Memory | 40 or 80GB per GPU | Up to 320GB | Up to 640GB | Up to 1,280GB |
| NVLink | Third generation | Third generation | Third generation | Third generation |
| NVSwitch | N/A | N/A | Second generation | Second generation |
| NVSwitch GPU-to-GPU bandwidth | N/A | N/A | 600GB/s | 600GB/s |
| Total aggregate bandwidth | 600GB/s | 2.4TB/s | 4.8TB/s | 9.6TB/s |

* TF32, FP16, and INT8 figures are with sparsity.
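The multi-GPU figures scale linearly from a single A100 SXM GPU: 19.5 TF of FP64 compute, up to 80GB of memory, and 600GB/s of total NVLink bandwidth per GPU. A quick arithmetic sketch, using only the per-GPU numbers above, reproduces the configuration totals:

```python
# Per-A100-SXM figures from the specification table above.
FP64_TFLOPS = 19.5   # FP64 compute per GPU
MEM_GB = 80          # A100 80GB variant
NVLINK_GBPS = 600    # total NVLink bandwidth per GPU

# 4-, 8-, and 16-GPU HGX configurations scale these linearly.
for gpus in (4, 8, 16):
    fp64 = gpus * FP64_TFLOPS                 # 78, 156, 312 TF
    mem = gpus * MEM_GB                       # 320, 640, 1280 GB
    bw_tbps = gpus * NVLINK_GBPS / 1000       # 2.4, 4.8, 9.6 TB/s
    print(f"{gpus:>2} GPUs: {fp64:g} TF FP64, "
          f"up to {mem} GB memory, {bw_tbps:g} TB/s aggregate bandwidth")
```

The 1,280GB (about 1.3TB) figure for the 16-GPU configuration is the same total GPU memory quoted earlier for a dual-baseboard HGX.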

Accelerating HGX with NVIDIA Networking

With HGX, it’s also possible to include NVIDIA networking to accelerate and offload data transfers and ensure the full utilization of computing resources. Smart adapters and switches reduce latency, increase efficiency, enhance security, and simplify data center automation to accelerate end-to-end application performance.

The data center is the new unit of computing, and HPC networking plays an integral role in scaling application performance across the entire data center. NVIDIA InfiniBand is paving the way with software-defined networking, In-Network Computing acceleration, remote direct-memory access (RDMA), and the fastest speeds and feeds.

HGX-1 and HGX-2 Reference Architectures

Powered by NVIDIA GPUs and NVLink

NVIDIA HGX-1 and HGX-2 are reference architectures that standardize the design of data centers accelerating AI and HPC. Built with NVIDIA V100 SXM2 boards and NVIDIA NVLink and NVSwitch interconnect technologies, HGX reference architectures have a modular design that works seamlessly in hyperscale and hybrid data centers to deliver up to 2 petaFLOPS of compute power for a quick, simple path to AI and HPC.

Specifications

|  | HGX-1 (8-GPU) | HGX-2 (16-GPU) |
| --- | --- | --- |
| GPUs | 8x NVIDIA V100 | 16x NVIDIA V100 |
| AI compute | 1 petaFLOPS (FP16) | 2 petaFLOPS (FP16) |
| Memory | 256 GB | 512 GB |
| NVLink | Second generation | Second generation |
| NVSwitch | N/A | Yes |
| NVSwitch GPU-to-GPU bandwidth | N/A | 300 GB/s |
| Total aggregate bandwidth | 2.4 TB/s | 4.8 TB/s |

Inside the NVIDIA Ampere Architecture

Read this technical deep dive to learn what's new with the NVIDIA Ampere architecture and its implementation in the NVIDIA A100 GPU.