A Need for Faster, More Scalable Interconnects

Increasing compute demands in AI and high-performance computing (HPC), including an emerging class of trillion-parameter models, are driving a need for multi-node, multi-GPU systems with seamless, high-speed communication between every GPU. Building an end-to-end computing platform that can keep pace with these workloads requires a fast, scalable interconnect.

NVIDIA A100 PCIe with NVLink GPU-to-GPU connection
NVIDIA A100 with NVLink GPU-to-GPU connections

NVLink Performance

Fully Connect GPUs with NVIDIA NVSwitch

The third generation of NVIDIA NVSwitch builds on the advanced communication capability of NVLink to deliver higher bandwidth and reduced latency for compute-intensive workloads. To enable high-speed, collective operations, each NVSwitch has 64 NVLink ports equipped with engines for NVIDIA Scalable Hierarchical Aggregation Reduction Protocol (SHARP) for in-network reductions and multicast acceleration.
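To give a feel for why in-network reductions matter, here is a minimal sketch comparing how much data each GPU sends during an all-reduce. It is an idealized traffic model, not a description of how SHARP is implemented: the function names are hypothetical, the ring formula is the textbook one, and the in-network case simply assumes each GPU sends its payload to the switch once.

```python
def ring_allreduce_bytes_sent(num_gpus: int, payload_bytes: int) -> float:
    """Bytes each GPU sends in a textbook ring all-reduce.

    The reduce-scatter and all-gather phases each take (N - 1) steps,
    and every step moves payload / N bytes.
    """
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes


def in_network_allreduce_bytes_sent(payload_bytes: int) -> float:
    """Bytes each GPU sends if the switch performs the reduction.

    Idealized assumption: each GPU sends its payload to the switch once
    and receives the reduced result once.
    """
    return float(payload_bytes)


payload = 1_000_000_000  # 1 GB of gradients, an arbitrary example size
for n in (2, 4, 8):
    ring = ring_allreduce_bytes_sent(n, payload)
    switch = in_network_allreduce_bytes_sent(payload)
    print(f"{n} GPUs: ring sends {ring / payload:.2f}x payload, "
          f"in-network sends {switch / payload:.2f}x payload")
```

Under these assumptions, the ring approach approaches twice the payload per GPU as the GPU count grows, while the in-network approach stays at one payload per GPU.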

How NVLink and NVSwitch Work Together

NVLink is a direct GPU-to-GPU interconnect that scales multi-GPU input/output (I/O) within the server. NVSwitch connects multiple NVLinks to provide all-to-all GPU communication at full NVLink speed, both within a single node and between nodes.
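The difference between direct links and a switched fabric can be sketched with a little arithmetic. The example below uses the fourth-generation NVLink figures from the specifications (900GB/s and 18 links per GPU) and compares a hypothetical direct mesh, in which each GPU splits its links evenly across its peers, with a switched fabric in which any pair can use the full per-GPU bandwidth. The even-split mesh is an illustrative assumption, not a topology described on this page, and the function names are made up for the example.

```python
def per_link_gbps(per_gpu_gbps: float, links_per_gpu: int) -> float:
    """Bandwidth of one link, derived from the per-GPU total."""
    return per_gpu_gbps / links_per_gpu


def direct_mesh_pair_gbps(num_gpus: int, per_gpu_gbps: float,
                          links_per_gpu: int) -> float:
    """Pair bandwidth in a hypothetical direct mesh.

    Assumes each GPU wires the same whole number of links to every peer;
    links that do not divide evenly are left unused in this model.
    """
    peers = num_gpus - 1
    links_per_peer = links_per_gpu // peers
    return links_per_peer * per_link_gbps(per_gpu_gbps, links_per_gpu)


def switched_pair_gbps(per_gpu_gbps: float) -> float:
    """Pair bandwidth through a switch: all of a GPU's links reach any peer."""
    return per_gpu_gbps


mesh = direct_mesh_pair_gbps(num_gpus=8, per_gpu_gbps=900, links_per_gpu=18)
switched = switched_pair_gbps(900)
print(f"direct mesh pair bandwidth: {mesh:.0f} GB/s")
print(f"switched pair bandwidth:    {switched:.0f} GB/s")
```

In this model, an 8-GPU direct mesh leaves each pair with 2 of the 18 links, while the switched fabric lets a pair use all 18.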

With the combination of NVLink and NVSwitch, NVIDIA won MLPerf 1.1, the first industry-wide AI benchmark.

Scale Up to Train Trillion-Parameter Models with NVLink Switch System

With NVSwitch, NVLink connections can be extended across nodes to create a seamless, high-bandwidth, multi-node GPU cluster, effectively forming a data center-sized GPU. By adding a second tier of NVLink Switches external to the servers, the NVLink Switch System can connect up to 256 GPUs and deliver 57.6 terabytes per second (TB/s) of all-to-all bandwidth, making it possible to rapidly solve even the largest AI jobs.
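As a quick sanity check on these figures, the sketch below divides the quoted all-to-all bandwidth by the GPU count and compares the result with the 900GB/s per-GPU NVLink bandwidth listed in the specifications. The function name is hypothetical, and the page does not explain the resulting ratio, so treat it as plain arithmetic on the published numbers rather than a statement about the fabric design.

```python
def per_gpu_share_gbps(total_gbps: int, num_gpus: int) -> float:
    """Evenly divide the system-wide all-to-all bandwidth across GPUs."""
    return total_gbps / num_gpus


TOTAL_ALL_TO_ALL_GBPS = 57_600  # 57.6 TB/s expressed in GB/s (decimal units)
NUM_GPUS = 256
PER_GPU_NVLINK_GBPS = 900       # fourth-generation NVLink bandwidth per GPU

share = per_gpu_share_gbps(TOTAL_ALL_TO_ALL_GBPS, NUM_GPUS)
fraction = share / PER_GPU_NVLINK_GBPS

print(f"all-to-all share per GPU: {share:.0f} GB/s")
print(f"fraction of per-GPU NVLink bandwidth: {fraction:.2f}")
```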

NVIDIA NVLink Switch

The NVIDIA NVLink Switch features 128 NVLink ports with a non-blocking switching capacity of 3.2 terabytes per second (TB/s). The rack switch is designed to provide high bandwidth and low latency in NVIDIA DGX and NVIDIA HGX systems supporting external fourth-generation NVLink connectivity.

Scaling from Enterprise to Exascale

Full Connection for Unparalleled Performance

NVSwitch is the first on-node switch architecture to support 8 to 16 fully connected GPUs in a single server node. The third-generation NVSwitch interconnects every GPU pair at 900GB/s with full all-to-all communication, so the GPUs can be used as a single high-performance accelerator with up to 15 petaFLOPS of deep learning compute power.

The Most Powerful AI and HPC Platform

NVLink and NVSwitch are essential building blocks of the complete NVIDIA data center solution that incorporates hardware, networking, software, libraries, and optimized AI models and applications from the NVIDIA AI Enterprise software suite and the NVIDIA NGC catalog. The most powerful end-to-end AI and HPC platform, it allows researchers to deliver real-world results and deploy solutions into production, driving unprecedented acceleration at every scale.

Specifications

NVLink

| | Second Generation | Third Generation | Fourth Generation |
|---|---|---|---|
| NVLink bandwidth per GPU | 300GB/s | 600GB/s | 900GB/s |
| Maximum number of links per GPU | 6 | 12 | 18 |
| Supported NVIDIA architectures | NVIDIA Volta architecture | NVIDIA Ampere architecture | NVIDIA Hopper architecture |

NVSwitch

| | First Generation | Second Generation | Third Generation |
|---|---|---|---|
| Number of GPUs with direct connection / node | Up to 8 | Up to 8 | Up to 8 |
| NVSwitch GPU-to-GPU bandwidth | 300GB/s | 600GB/s | 900GB/s |
| Total aggregate bandwidth | 2.4TB/s | 4.8TB/s | 7.2TB/s |
| Supported NVIDIA architectures | NVIDIA Volta architecture | NVIDIA Ampere architecture | NVIDIA Hopper architecture |

NVLink Switch System

| | NVLink Switch System |
|---|---|
| Number of GPUs with direct connection | Up to 256 |
| NVSwitch GPU-to-GPU bandwidth | 900GB/s |
| Total aggregate bandwidth | 57.6TB/s |
| In-network reductions | SHARP reductions in NVSwitch |
| Key software support | CUDA®, CUDA-X, Magnum IO |
| Supported NVIDIA architectures | NVIDIA Hopper architecture |
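The NVLink and NVSwitch figures above are internally consistent, which the following sketch checks. It copies the numbers from the specifications into hypothetical dictionaries and confirms two relationships: per-GPU NVLink bandwidth divided by link count gives the same per-link value in every generation, and the NVSwitch aggregate equals eight GPUs times the GPU-to-GPU bandwidth. The dictionary layout and function names are illustrative only.

```python
# Values copied from the specifications; bandwidths in GB/s (decimal units).
NVLINK = {
    "second": {"gbps_per_gpu": 300, "links": 6},
    "third": {"gbps_per_gpu": 600, "links": 12},
    "fourth": {"gbps_per_gpu": 900, "links": 18},
}

NVSWITCH = {
    "first": {"gpus": 8, "pair_gbps": 300, "aggregate_gbps": 2_400},
    "second": {"gpus": 8, "pair_gbps": 600, "aggregate_gbps": 4_800},
    "third": {"gpus": 8, "pair_gbps": 900, "aggregate_gbps": 7_200},
}


def nvlink_per_link_gbps(generation: str) -> float:
    """Per-GPU bandwidth divided by the maximum number of links."""
    spec = NVLINK[generation]
    return spec["gbps_per_gpu"] / spec["links"]


def nvswitch_expected_aggregate_gbps(generation: str) -> int:
    """GPU count times GPU-to-GPU bandwidth."""
    spec = NVSWITCH[generation]
    return spec["gpus"] * spec["pair_gbps"]


for gen in NVLINK:
    print(f"NVLink {gen} generation: {nvlink_per_link_gbps(gen):.0f} GB/s per link")

for gen, spec in NVSWITCH.items():
    expected = nvswitch_expected_aggregate_gbps(gen)
    matches = expected == spec["aggregate_gbps"]
    print(f"NVSwitch {gen} generation: {expected} GB/s aggregate, "
          f"matches table: {matches}")
```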