Maximizing System Throughput

NVIDIA® NVLink technology addresses the interconnect bottleneck by providing higher bandwidth, more links, and improved scalability for multi-GPU and multi-GPU/CPU system configurations. A single NVIDIA Tesla® V100 GPU supports up to six NVLink connections for a total bandwidth of 300 GB/sec—10X the bandwidth of PCIe Gen 3. Servers like the NVIDIA DGX-1 take advantage of these technologies to give you greater scalability for ultrafast deep learning training.

New Levels Of GPU-to-GPU Acceleration

First introduced with the NVIDIA Pascal architecture, NVLink on Tesla V100 increases the signaling rate per link from 20 to 25 GB/sec in each direction. It can be used for GPU-to-CPU or GPU-to-GPU communication, as in the DGX-1V server with Tesla V100.
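The per-link and aggregate figures quoted above are consistent with each other; a quick back-of-envelope check (the 25 GB/sec per-direction link rate and six-link count come from the text, the PCIe Gen 3 x16 figure of roughly 32 GB/sec bidirectional is an assumption for the 10X comparison):

```python
# Sanity check of the Tesla V100 NVLink bandwidth figures quoted in the text.
PER_DIRECTION_GB_S = 25   # NVLink 2.0 rate per link, each direction (from the text)
LINKS = 6                 # NVLink connections on a Tesla V100 (from the text)

per_link_bidir = 2 * PER_DIRECTION_GB_S   # 50 GB/s bidirectional per link
total_bidir = LINKS * per_link_bidir      # 300 GB/s aggregate, matching the text

# Assumed for the "10X PCIe" comparison: PCIe Gen 3 x16 is ~32 GB/s bidirectional.
PCIE3_X16_BIDIR_GB_S = 32
speedup_vs_pcie = total_bidir / PCIE3_X16_BIDIR_GB_S  # ~9.4x, rounded to "10X"
```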

Tesla V100 with NVLink GPU-to-GPU and GPU-to-CPU connections
NVLink connecting eight Tesla V100 accelerators in a Hybrid Cube Mesh Topology as used in the DGX-1V server

New Levels Of Performance

NVIDIA NVLink can deliver up to 46% more performance in an otherwise identically configured server. Its dramatically higher bandwidth and reduced latency enable even larger deep learning workloads to scale in performance as they continue to grow.

NVSwitch: Fully Connected NVLink

The rapid growth in deep learning workloads has driven the need for a faster and more scalable interconnect, as PCIe bandwidth increasingly becomes the bottleneck at the multi-GPU system level.

NVLink was a great advance to enable eight GPUs in a single server, and accelerate performance beyond PCIe. But taking deep learning performance to the next level will require a GPU fabric that enables more GPUs in a single server, and full-bandwidth connectivity between them.

NVIDIA NVSwitch is the first on-node switch architecture to support 16 fully connected GPUs in a single server node and drive simultaneous communication between all eight GPU pairs at an incredible 300 GB/s each. These 16 GPUs can be used as a single large-scale accelerator with 0.5 terabytes of unified memory space and 2 petaFLOPS of deep learning compute power. A single HGX-2/DGX-2 system with NVSwitch delivers up to 2.7X more application performance than two HGX-1/DGX-1 systems connected via InfiniBand.
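The headline 0.5 TB and 2 petaFLOPS figures follow from per-GPU specs; the sketch below assumes the 32 GB HBM2 variant of Tesla V100 and its roughly 125 TFLOPS Tensor Core peak, neither of which is stated in the text:

```python
# Derivation of the DGX-2/HGX-2 headline figures quoted in the text.
GPUS = 16                 # fully connected via NVSwitch (from the text)
HBM_PER_GPU_GB = 32       # assumed: Tesla V100 32 GB variant
TENSOR_TFLOPS = 125       # assumed: per-V100 Tensor Core peak for deep learning

unified_memory_tb = GPUS * HBM_PER_GPU_GB / 1000   # ~0.5 TB of unified memory
total_pflops = GPUS * TENSOR_TFLOPS / 1000         # 2 petaFLOPS of DL compute
```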

NVSwitch Delivers a >2X Speedup for Deep Learning and HPC

Benchmark configuration: two HGX-1V servers, each with dual-socket Xeon E5-2698 v4 processors and 8X V100 GPUs, connected via 4X 100 Gb InfiniBand ports (run on DGX-1); one HGX-2 server with dual-socket Xeon Platinum 8168 processors and 16X V100 GPUs (run on DGX-2).