NVIDIA HGX A100

The most powerful end-to-end AI supercomputing platform

Purpose-Built for the Convergence of Simulation, Data Analytics, and AI

Massive datasets, exploding model sizes, and complex simulations require multiple GPUs with extremely fast interconnects. The NVIDIA HGX platform brings together the full power of NVIDIA GPUs, NVIDIA® NVLink®, NVIDIA Mellanox® InfiniBand® networking, and a fully optimized NVIDIA AI and HPC software stack from NGC to deliver the highest application performance. With its end-to-end performance and flexibility, NVIDIA HGX enables researchers and scientists to combine simulation, data analytics, and AI to advance scientific progress.

Unmatched Accelerated Computing Platform

NVIDIA HGX A100 combines NVIDIA A100 Tensor Core GPUs with high-speed interconnects to form the world’s most powerful servers. With A100 80GB GPUs, a single HGX A100 has up to 1.3 terabytes (TB) of GPU memory and over 2 terabytes per second (TB/s) of memory bandwidth, delivering unprecedented acceleration.

HGX A100 delivers up to a 20X AI speedup out of the box compared to previous generations with Tensor Float 32 (TF32) and a 2.5X HPC speedup with FP64. Fully tested and easy to deploy, HGX A100 integrates into partner servers to provide guaranteed performance. NVIDIA HGX A100 with 16 GPUs delivers a staggering 10 petaFLOPS, forming the world’s most powerful accelerated scale-up server platform for AI and HPC.
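
The headline capacity and compute figures can be reproduced from per-GPU numbers. A quick illustrative check, assuming the published A100 per-GPU peaks of 80 GB of memory and 624 TFLOPS FP16 Tensor Core throughput with sparsity (the per-GPU rates are assumptions here, not stated in this document):

```python
GPUS = 16                   # two HGX A100 8-GPU baseboards joined via NVSwitch
MEM_PER_GPU_GB = 80         # A100 80GB
FP16_TFLOPS_PER_GPU = 624   # assumed A100 FP16 Tensor Core peak with sparsity

# Totals scale linearly across the 16 GPUs.
total_mem_tb = GPUS * MEM_PER_GPU_GB / 1000
total_fp16_pflops = GPUS * FP16_TFLOPS_PER_GPU / 1000

print(f"GPU memory: {total_mem_tb:.2f} TB")         # 1.28 TB, quoted as "up to 1.3 TB"
print(f"FP16 compute: {total_fp16_pflops:.2f} PF")  # 9.98 PF, quoted as "10 petaFLOPS"
```

Both results round to the marketing figures quoted above.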

NVIDIA HGX A100 8-GPU (8x A100 GPUs)

NVIDIA HGX A100 4-GPU (4x A100 GPUs)

Machine Learning Performance

Up to 83X Faster than CPU, 2X Faster than A100 40GB on Big Data Analytics Benchmark

Machine learning models require loading, transforming, and processing extremely large datasets to glean critical insights. With up to 1.3 TB of unified memory and all-to-all GPU communications with NVSwitch, HGX A100 powered by A100 80GB GPUs has the capability to load and perform calculations on enormous datasets to derive actionable insights quickly.

On a big data analytics benchmark, A100 80GB delivered insights with 83X higher throughput than CPUs and 2X higher performance over A100 40GB, making it ideally suited for emerging workloads with exploding dataset sizes.

HPC Performance

HPC applications need to perform an enormous number of calculations per second. Increasing the compute density of each server node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and space consumed in the data center. For simulations, high-dimension matrix multiplication requires a processor to fetch data from many neighbors for computation, making GPUs connected by NVIDIA NVLink ideal. HPC applications can also leverage TF32 in A100 to achieve up to 11X higher throughput for single-precision, dense matrix-multiply operations than was possible four years earlier.
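
The central role of dense matrix multiplication can be made concrete with a simple cost model. This is an illustrative sketch, not a benchmark: the 156 TFLOPS dense TF32 peak and the 50% efficiency factor below are assumed figures.

```python
def gemm_tflop(m: int, n: int, k: int) -> float:
    """A dense matrix multiply C = A @ B performs roughly 2*m*n*k floating-point ops."""
    return 2 * m * n * k / 1e12

def estimated_seconds(m: int, n: int, k: int,
                      peak_tflops: float, efficiency: float = 0.5) -> float:
    """Rough runtime estimate at an assumed fraction of peak throughput."""
    return gemm_tflop(m, n, k) / (peak_tflops * efficiency)

# Illustrative: a 32768 x 32768 x 32768 single-precision GEMM on one A100
# using TF32 Tensor Cores (156 TFLOPS dense peak -- an assumed figure).
t = estimated_seconds(32768, 32768, 32768, peak_tflops=156)
print(f"~{t:.2f} s per GEMM at 50% of peak")
```

The same model shows why compute density matters: a node with 8X the per-GPU throughput finishes the same GEMM in one-eighth the time, so fewer nodes cover the same workload.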

An HGX A100 powered by A100 80GB GPUs delivers a 2X throughput increase over A100 40GB GPUs on Quantum Espresso, a materials simulation, boosting time to insight.

11X More HPC Performance in Four Years (Top HPC Apps)

Up to 1.8X Higher Performance for HPC Applications (Quantum Espresso)

HGX A100 Specifications

HGX A100 is available in single baseboards with four or eight A100 GPUs. The four-GPU configuration is fully interconnected with NVIDIA NVLink, and the eight-GPU configuration is interconnected with NVSwitch. Two NVIDIA HGX A100 8-GPU baseboards can also be combined using an NVSwitch interconnect to create a powerful 16-GPU single node.

                                              4-GPU                        8-GPU                       16-GPU
GPUs                                          4x NVIDIA A100               8x NVIDIA A100              16x NVIDIA A100
HPC and AI Compute (FP64/TF32*/FP16*/INT8*)   78TF/1.25PF*/2.5PF*/5POPS*   156TF/2.5PF*/5PF*/10POPS*   312TF/5PF*/10PF*/20POPS*
Memory                                        Up to 320GB                  Up to 640GB                 Up to 1,280GB
NVLink                                        3rd generation               3rd generation              3rd generation
NVSwitch                                      N/A                          2nd generation              2nd generation
NVSwitch GPU-to-GPU Bandwidth                 N/A                          600 GB/s                    600 GB/s
Total Aggregate Bandwidth                     2.4 TB/s                     4.8 TB/s                    9.6 TB/s

* With sparsity
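
The aggregate-bandwidth figures follow directly from the 600 GB/s per-GPU bandwidth of third-generation NVLink quoted above, since total fabric bandwidth scales linearly with GPU count. An illustrative check:

```python
# Third-generation NVLink provides 600 GB/s of total bandwidth per GPU
# (the per-GPU figure quoted in the specifications above).
NVLINK3_PER_GPU_GBPS = 600

# Aggregate bandwidth scales linearly with the number of GPUs.
for gpus in (4, 8, 16):
    aggregate_tbps = gpus * NVLINK3_PER_GPU_GBPS / 1000
    print(f"{gpus}-GPU: {aggregate_tbps} TB/s")
# Prints 2.4, 4.8, and 9.6 TB/s -- matching the Total Aggregate Bandwidth row.
```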

HGX-1 and HGX-2 Reference Architectures

Powered by NVIDIA GPUs and NVLink

NVIDIA HGX-1 and HGX-2 are reference architectures that standardize the design of data centers accelerating AI and HPC. Built with NVIDIA SXM2 V100 boards, with NVIDIA NVLink and NVSwitch interconnect technologies, HGX reference architectures have a modular design that works seamlessly in hyperscale and hybrid data centers to deliver up to 2 petaFLOPS of compute power for a quick, simple path to AI and HPC.

Specifications

                                HGX-1 (8-GPU)        HGX-2 (16-GPU)
GPUs                            8x NVIDIA V100       16x NVIDIA V100
AI Compute                      1 petaFLOPS (FP16)   2 petaFLOPS (FP16)
Memory                          256 GB               512 GB
NVLink                          2nd generation       2nd generation
NVSwitch                        N/A                  Yes
NVSwitch GPU-to-GPU Bandwidth   N/A                  300 GB/s
Total Aggregate Bandwidth       2.4 TB/s             4.8 TB/s

Inside the NVIDIA Ampere Architecture

Read this technical deep dive to learn what's new with the NVIDIA Ampere architecture and its implementation in the NVIDIA A100 GPU.