NVLink and NVSwitch

The Building Blocks of Advanced Multi-GPU Communication



[Figure: Tesla V100 with NVLink GPU-to-GPU and GPU-to-CPU Connections]



NVLink Maximizes System Throughput

NVIDIA NVLink technology addresses interconnect issues by providing higher bandwidth, more links, and improved scalability for multi-GPU system configurations. A single NVIDIA Tesla® V100 GPU supports up to six NVLink connections for a total bandwidth of 300 gigabytes per second (GB/sec)—10X the bandwidth of PCIe Gen 3. Servers like the NVIDIA DGX-1™ and DGX-2 take advantage of this technology to give you greater scalability for ultrafast deep learning training.
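As a back-of-the-envelope check on these figures (a sketch, not from the source): six NVLink 2.0 links at 25 GB/s per direction each account for the quoted 300 GB/s bidirectional aggregate, roughly 10X a PCIe Gen 3 x16 slot.

```python
# Back-of-the-envelope check of the Tesla V100 NVLink bandwidth figures.
NVLINK2_PER_DIRECTION_GB_S = 25   # NVLink 2.0, per link, per direction
LINKS_PER_V100 = 6

# NVIDIA quotes the bidirectional aggregate: both directions, all links.
total_nvlink_gb_s = NVLINK2_PER_DIRECTION_GB_S * 2 * LINKS_PER_V100
print(total_nvlink_gb_s)          # 300 GB/s

# PCIe Gen 3 x16 carries roughly 16 GB/s per direction, ~32 GB/s bidirectional.
pcie3_x16_bidir_gb_s = 32
print(round(total_nvlink_gb_s / pcie3_x16_bidir_gb_s, 1))  # 9.4, i.e. ~10X
```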

[Figure: NVIDIA NVLink Performance Since 2014]
[Figure: NVLink Connecting Eight Tesla V100 Accelerators in a Hybrid Cube Mesh Topology, as Used in the DGX-1V Server]
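The hybrid cube mesh in the figure above can be sketched as a graph. This is an illustrative reconstruction, not NVIDIA's exact link map (a DGX-1V doubles some of these edges so each V100 uses all six NVLink connections): GPUs 0–3 and 4–7 each form a fully connected quad, and cube edges join GPU i to GPU i+4.

```python
# Illustrative hybrid cube mesh of 8 GPUs, as used in DGX-1-class servers.
# Assumption: single edges only; a real DGX-1V doubles some links.
from itertools import combinations

edges = set()
for quad in ([0, 1, 2, 3], [4, 5, 6, 7]):   # two fully connected quads
    edges.update(combinations(quad, 2))
edges.update((i, i + 4) for i in range(4))  # cube edges between the quads

# Every GPU has 4 direct neighbors; any pair is at most 2 hops apart.
degree = {g: sum(g in e for e in edges) for g in range(8)}
print(sorted(edges))
print(degree)  # {0: 4, 1: 4, ..., 7: 4}
```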

Highest Levels of GPU-to-GPU Acceleration

First introduced with the NVIDIA Pascal™ architecture, NVLink on Tesla V100 increases the per-link signaling rate from 20 to 25 GB/s in each direction. This direct GPU-to-GPU communication link lets high-performance computing (HPC) and AI workloads scale across multiple GPUs at speeds over an order of magnitude faster than PCIe.

NVLink Delivers Up To 70% Speedup vs PCIe

Benchmark configuration: GPU servers with dual Xeon Gold 6140 @ 2.30 GHz or E5-2698 v4 @ 3.6 GHz for PyTorch, comparing 8x V100 PCIe vs. 8x V100 NVLink. Benchmarks: MILC (APEX medium), HOOMD-blue (microsphere), LAMMPS (LJ 2.5).

New Levels of Performance

NVLink can bring up to 70 percent more performance to an otherwise identically configured server. Its dramatically higher bandwidth and reduced latency enable even larger deep learning workloads to scale in performance as they grow.


NVSwitch: The Fully Connected NVLink

The rapid adoption of deep learning has driven the need for a faster, more scalable interconnect, as PCIe bandwidth often creates a bottleneck at the multi-GPU system level.

NVIDIA NVSwitch builds on the advanced communication capability of NVLink to solve this problem. It takes deep learning performance to the next level with a GPU fabric that enables more GPUs in a single server and full-bandwidth connectivity between them.

Full Connection for Unparalleled Performance

NVSwitch is the first on-node switch architecture to support 16 fully connected GPUs in a single server node, driving simultaneous communication between all eight GPU pairs at an incredible 300 GB/s each. These 16 GPUs can be used as a single large-scale accelerator with 0.5 terabytes (TB) of unified memory space and 2 petaFLOPS of deep learning compute power. A single HGX-2 or DGX-2 system with NVSwitch delivers up to 2.7X more application performance than two HGX-1 or DGX-1 systems connected with InfiniBand.
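The headline figures for a 16-GPU NVSwitch node follow directly from the per-GPU specs (a sketch assuming the 32 GB HBM2 variant of the Tesla V100 and its ~125 TFLOPS of peak Tensor Core throughput):

```python
# Unified memory and deep learning compute of a 16-GPU NVSwitch node (DGX-2 class).
GPUS = 16
HBM2_PER_V100_GB = 32           # 32 GB variant of Tesla V100
TENSOR_TFLOPS_PER_V100 = 125    # peak mixed-precision Tensor Core throughput

unified_memory_tb = GPUS * HBM2_PER_V100_GB / 1000     # 512 GB, quoted as 0.5 TB
deep_learning_pflops = GPUS * TENSOR_TFLOPS_PER_V100 / 1000
print(unified_memory_tb, deep_learning_pflops)  # 0.512 2.0
```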

NVSwitch Delivers a >2X Speedup for Deep Learning and HPC

Benchmark configuration: two HGX-1V servers, each with dual-socket Xeon E5-2698 v4 processors and 8x V100 GPUs, connected via four 100 Gb InfiniBand ports (run on DGX-1), vs. one HGX-2 server with dual-socket Xeon Platinum 8168 processors, 16x V100 GPUs, and NVSwitch (run on DGX-2).


Explore the world’s most powerful accelerated server platform for deep learning, machine learning, and HPC.