The Building Blocks of Advanced Multi-GPU Communication

How NVLink and NVSwitch Work Together


NVIDIA A100 PCIe with NVLink GPU-to-GPU connection
NVIDIA A100 with NVLink GPU-to-GPU connections


The NVSwitch topology diagram

Maximizing System Throughput

Third-Generation NVLink

NVIDIA NVLink technology addresses interconnect bottlenecks by providing higher bandwidth, more links, and improved scalability for multi-GPU system configurations. A single NVIDIA A100 Tensor Core GPU supports up to 12 third-generation NVLink connections for a total bandwidth of 600 gigabytes per second (GB/s), almost 10X the bandwidth of PCIe Gen 4.
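The bandwidth figures above can be sanity-checked with a little arithmetic. The sketch below is only illustrative: the link count and 600 GB/s total come from the text, while the PCIe Gen 4 figure (an x16 slot at roughly 64 GB/s bidirectional) is an assumption that the text does not state.

```python
# Back-of-the-envelope check of the NVLink figures quoted in the text.
NVLINK_LINKS_PER_A100 = 12   # from the text
NVLINK_TOTAL_GB_S = 600      # from the text (total across all links)

# Assumption (not stated in the text): a PCIe Gen 4 x16 slot moves roughly
# 32 GB/s in each direction, i.e. about 64 GB/s bidirectional.
PCIE_GEN4_X16_GB_S = 64

# Bandwidth implied for a single third-generation NVLink connection.
per_link_gb_s = NVLINK_TOTAL_GB_S / NVLINK_LINKS_PER_A100

# How many times faster the full NVLink complement is than the PCIe slot.
speedup_vs_pcie = NVLINK_TOTAL_GB_S / PCIE_GEN4_X16_GB_S

print(f"Per-link NVLink bandwidth: {per_link_gb_s:.0f} GB/s")
print(f"NVLink vs. PCIe Gen 4 x16: {speedup_vs_pcie:.1f}x")
```

Under that PCIe assumption, the ratio comes out a little under 10, which is consistent with the "almost 10X" claim.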

Servers like the NVIDIA DGX A100 take advantage of this technology to deliver greater scalability for ultrafast deep learning training. NVLink is also available in A100 PCIe two-GPU configurations.  

NVLink Performance

NVLink in NVIDIA A100


NVSwitch—The Fully Connected NVLink

The rapid adoption of deep learning has driven the need for a faster, more scalable interconnect, as PCIe bandwidth often creates a bottleneck at the multi-GPU-system level. For deep learning workloads to scale, dramatically higher bandwidth and reduced latency are needed.

NVIDIA NVSwitch builds on the advanced communication capability of NVLink to solve this problem. It takes deep learning performance to the next level with a GPU fabric that enables more GPUs in a single server and full-bandwidth connectivity between them. Each GPU connects to the NVSwitch fabric through its 12 NVLinks to enable high-speed, all-to-all communication.


The Most Powerful End-to-End AI and HPC Data Center Platform

NVLink and NVSwitch are essential building blocks of the complete NVIDIA data center solution that incorporates hardware, networking, software, libraries, and optimized AI models and applications from NGC. The most powerful end-to-end AI and HPC platform, it allows researchers to deliver real-world results and deploy solutions into production, driving unprecedented acceleration at every scale.

Full Connection for Unparalleled Performance

NVSwitch is the first on-node switch architecture to support 8 to 16 fully connected GPUs in a single server node. The second-generation NVSwitch drives simultaneous communication between all GPU pairs at 600 GB/s. It supports full all-to-all communication with direct GPU peer-to-peer memory addressing. These 16 GPUs can be used as a single high-performance accelerator with unified memory space and up to 10 petaFLOPS of deep learning compute power.
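To make "all-to-all" concrete, the following sketch enumerates the GPU pairs in a 16-GPU node and estimates the aggregate bandwidth. It assumes, as one plausible reading of the specifications, that the aggregate figure is simply the number of GPUs multiplied by the per-GPU bandwidth; the constant names are illustrative only.

```python
from itertools import combinations

NUM_GPUS = 16        # maximum fully connected GPUs, from the text
PER_GPU_GB_S = 600   # second-generation NVSwitch GPU-to-GPU bandwidth

# Every unordered pair of GPUs that can address each other directly
# in a fully connected (all-to-all) fabric.
gpu_pairs = list(combinations(range(NUM_GPUS), 2))

# Assumed model: aggregate bandwidth = GPUs x per-GPU bandwidth,
# converted from GB/s to TB/s.
aggregate_tb_s = NUM_GPUS * PER_GPU_GB_S / 1000

print(f"Direct GPU pairs: {len(gpu_pairs)}")
print(f"Aggregate bandwidth: {aggregate_tb_s} TB/s")
```

With these inputs the estimate matches the 9.6 TB/s total aggregate bandwidth listed for the second-generation NVSwitch in the specifications.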




NVIDIA NVLink

                                  Second Generation   Third Generation
Total NVLink Bandwidth            300 GB/s            600 GB/s
Maximum Number of Links per GPU   6                   12
Supported NVIDIA Architectures    NVIDIA Volta        NVIDIA Ampere Architecture

NVIDIA NVSwitch

                                        First Generation   Second Generation
Number of GPUs with Direct Connection   Up to 16           Up to 16
NVSwitch GPU-to-GPU Bandwidth           300 GB/s           600 GB/s
Total Aggregate Bandwidth               4.8 TB/s           9.6 TB/s
Supported NVIDIA Architectures          NVIDIA Volta       NVIDIA Ampere Architecture

Get Started

Experience NVIDIA DGX A100, the universal system for AI infrastructure and the world’s first AI system built on the NVIDIA A100 Tensor Core GPU.