NVLink Fabric

Advancing Multi-GPU Processing

NVIDIA NVLink Performance Since 2014

Maximizing System Throughput

NVIDIA® NVLink technology addresses the interconnect bottleneck by providing higher bandwidth, more links, and improved scalability for multi-GPU and multi-GPU/CPU system configurations. A single NVIDIA Tesla® V100 GPU supports up to six NVLink connections for a total bandwidth of 300 GB/sec, 10X the bandwidth of PCIe Gen 3. Servers like the NVIDIA DGX-1 take advantage of this technology to give you greater scalability for ultrafast deep learning training.
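
To make the GPU-to-GPU path concrete, here is a minimal CUDA sketch (not NVIDIA sample code) that enables peer access between two GPUs and copies a buffer directly from one to the other. On NVLink-connected GPUs the transfer travels over NVLink; otherwise it falls back to PCIe. Device numbering, buffer size, and the lack of error checking are assumptions made for brevity.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Minimal sketch (not NVIDIA sample code): copy a buffer directly between
// two GPUs. On NVLink-connected GPUs the transfer travels over NVLink;
// otherwise it falls back to PCIe. Error checking omitted for brevity.
int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount < 2) { std::printf("Need at least two GPUs.\n"); return 0; }

    const int src = 0, dst = 1;
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, src, dst);
    std::printf("GPU %d can access GPU %d directly: %s\n", src, dst, canAccess ? "yes" : "no");

    const size_t bytes = 256ull << 20;  // 256 MiB test buffer
    void *srcBuf = nullptr, *dstBuf = nullptr;

    cudaSetDevice(src);
    if (canAccess) cudaDeviceEnablePeerAccess(dst, 0);  // turn on the direct P2P path
    cudaMalloc(&srcBuf, bytes);

    cudaSetDevice(dst);
    cudaMalloc(&dstBuf, bytes);

    // Direct GPU-to-GPU copy; no staging through host memory when P2P is enabled.
    cudaMemcpyPeer(dstBuf, dst, srcBuf, src, bytes);
    cudaDeviceSynchronize();

    cudaFree(dstBuf);
    cudaSetDevice(src);
    cudaFree(srcBuf);
    return 0;
}
```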

New Levels Of GPU-to-GPU Acceleration

First introduced with the NVIDIA Pascal architecture, NVLink in Tesla V100 increases the signaling rate from 20 to 25 GB/sec in each direction per link. Each link can be used for GPU-to-GPU or GPU-to-CPU communication; in the DGX-1 with Tesla V100, NVLink connects the GPUs to one another.
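
A hedged sketch of how the per-pair link can be inspected from CUDA follows: cudaDeviceGetP2PAttribute reports whether peer access is supported and a relative performance rank for each GPU pair. The rank does not name NVLink explicitly, but NVLink-connected pairs typically rank better than pairs reachable only over PCIe; on a real system, `nvidia-smi topo -m` prints the same topology by name (NV1, NV2, PIX, and so on).

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: inspect the peer-to-peer link between every GPU pair.
// cudaDevP2PAttrPerformanceRank is CUDA's relative ranking of the link
// (lower is better); it does not name NVLink, but NVLink-connected pairs
// typically rank better than pairs reachable only over PCIe.
int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int access = 0, rank = 0, atomics = 0;
            cudaDeviceGetP2PAttribute(&access,  cudaDevP2PAttrAccessSupported,       i, j);
            cudaDeviceGetP2PAttribute(&rank,    cudaDevP2PAttrPerformanceRank,       i, j);
            cudaDeviceGetP2PAttribute(&atomics, cudaDevP2PAttrNativeAtomicSupported, i, j);
            std::printf("GPU %d -> GPU %d : P2P=%d rank=%d nativeAtomics=%d\n",
                        i, j, access, rank, atomics);
        }
    }
    return 0;
}
```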

Tesla V100 with NVLink GPU-to-GPU and GPU-to-CPU Connections
NVLink Connecting Eight Tesla V100 Accelerators in a Hybrid Cube Mesh Topology as Used in the DGX-1V Server

NVLink Delivers Up To 70% Speedup vs PCIe

NVLink GPU servers: dual Xeon Gold 6140 @ 2.30 GHz, or E5-2698 v4 @ 3.6 GHz for PyTorch, with 8x V100 PCIe vs. 8x V100 NVLink. SW benchmarks: MILC (APEX medium), HOOMD-Blue (microsphere), LAMMPS (LJ 2.5).

New Levels Of Performance

NVIDIA NVLink can bring up to 70% more performance to an otherwise identically configured server. Its dramatically higher bandwidth and reduced latency let ever-larger deep learning workloads keep scaling in performance as they grow.
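
As a rough way to see the bandwidth difference on your own system, the sketch below times repeated peer copies between GPU 0 and GPU 1 and reports effective bandwidth. The buffer size, repeat count, and use of the default stream are arbitrary assumptions; on an NVLink-connected pair the result should sit well above the roughly 16 GB/s per direction that a PCIe Gen 3 x16 link tops out at.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Rough micro-benchmark sketch: time repeated GPU 0 -> GPU 1 peer copies and
// report effective bandwidth. Buffer size, repeat count, and the use of the
// default stream are arbitrary choices; error checking is omitted.
int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    if (n < 2) return 0;

    const size_t bytes = 1ull << 30;  // 1 GiB per copy
    const int    reps  = 10;
    void *a = nullptr, *b = nullptr;

    cudaSetDevice(0); cudaDeviceEnablePeerAccess(1, 0); cudaMalloc(&a, bytes);
    cudaSetDevice(1); cudaDeviceEnablePeerAccess(0, 0); cudaMalloc(&b, bytes);

    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaMemcpyPeerAsync(b, 1, a, 0, bytes, 0);  // warm-up copy
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpyPeerAsync(b, 1, a, 0, bytes, 0);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("GPU0 -> GPU1 effective bandwidth: %.1f GB/s\n",
                (double)reps * bytes / (ms / 1000.0) / 1e9);
    return 0;
}
```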

NVSwitch: Fully Connected NVLink

The rapid growth in deep learning workloads has driven the need for a faster and more scalable interconnect, as PCIe bandwidth increasingly becomes the bottleneck at the multi-GPU system level.

NVLink is a major advance, enabling eight GPUs in a single server and accelerating performance beyond PCIe. But taking deep learning performance to the next level requires a GPU fabric that supports more GPUs in a single server, with full-bandwidth connectivity between them.

NVIDIA NVSwitch is the first on-node switch architecture to support 16 fully connected GPUs in a single server node and drive simultaneous communication between all eight GPU pairs at an incredible 300 GB/s each. These 16 GPUs can be used as a single large-scale accelerator with 0.5 terabytes of unified memory space and 2 petaFLOPS of deep learning compute power. A single HGX-2/DGX-2 system with NVSwitch delivers up to 2.7X more application performance than two HGX-1/DGX-1 systems connected via InfiniBand.
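
The kind of workload that benefits from a fully connected fabric is a dense collective such as all-reduce, where every GPU exchanges data with every other GPU. Below is a minimal, hedged NCCL sketch (single process, one communicator per visible GPU; buffer size and contents are arbitrary assumptions) that issues an all-reduce across all GPUs; NCCL chooses the transport (NVSwitch, NVLink, PCIe, or InfiniBand) on its own.

```cpp
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

// Hedged sketch: an NCCL all-reduce across all visible GPUs in one process.
// Buffer size and contents are arbitrary; error checking is omitted. NCCL
// chooses the transport (NVSwitch, NVLink, PCIe, or InfiniBand) on its own.
int main() {
    int n = 0;
    cudaGetDeviceCount(&n);

    std::vector<int> devs(n);
    for (int i = 0; i < n; ++i) devs[i] = i;

    std::vector<ncclComm_t> comms(n);
    ncclCommInitAll(comms.data(), n, devs.data());  // one communicator per GPU

    const size_t count = 32u << 20;  // 32M floats per GPU
    std::vector<float*> buf(n);
    std::vector<cudaStream_t> streams(n);
    for (int i = 0; i < n; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&buf[i], count * sizeof(float));
        cudaMemset(buf[i], 0, count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // Issue the collective for every GPU from a single thread; the group
    // calls let NCCL schedule all ranks together without deadlocking.
    ncclGroupStart();
    for (int i = 0; i < n; ++i)
        ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum, comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < n; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
    }
    for (int i = 0; i < n; ++i) ncclCommDestroy(comms[i]);
    std::printf("All-reduce across %d GPUs complete.\n", n);
    return 0;
}
```

Built with nvcc and linked against NCCL (-lnccl), this is the communication pattern behind data-parallel training, which is exactly where the fully connected fabric pays off.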

NVSwitch Delivers a >2X Speedup for Deep Learning and HPC

Two HGX-1V servers with dual-socket Xeon E5-2698 v4 processors and 8x V100 GPUs each, connected via 4x 100 Gb InfiniBand ports (run on DGX-1). HGX-2 server with dual-socket Xeon Platinum 8168 processors and 16x V100 GPUs (run on DGX-2).