The Building Blocks of Advanced Multi-GPU Communication
Increasing compute demands in AI and high-performance computing (HPC) are driving a need for multi-GPU systems with seamless connections between GPUs, so they can act together as one gigantic accelerator. But while PCIe is the standard interconnect, its limited bandwidth often creates a bottleneck. To build the most powerful end-to-end computing platform, a faster, more scalable interconnect is needed.
NVIDIA® NVLink® is a high-speed, direct GPU-to-GPU interconnect. NVIDIA NVSwitch™ takes interconnectivity to the next level by incorporating multiple NVLinks to provide all-to-all GPU communication at full NVLink speed within a single node like NVIDIA HGX™ A100. The combination of NVLink and NVSwitch enabled NVIDIA to efficiently scale AI performance to multiple GPUs and win MLPerf 0.6, the first industry-wide AI benchmark.
NVIDIA A100 PCIe with NVLink GPU-to-GPU connection
NVIDIA A100 with NVLink GPU-to-GPU connections
The NVSwitch topology diagram shows the connection of two GPUs for simplicity. Eight or 16 GPUs connect all-to-all through NVSwitch in the same way.
NVIDIA NVLink technology addresses interconnect issues by providing higher bandwidth, more links, and improved scalability for multi-GPU system configurations. A single NVIDIA A100 Tensor Core GPU supports up to 12 third-generation NVLink connections at 50 GB/sec each, for a total bandwidth of 600 gigabytes per second (GB/sec), almost 10X the bandwidth of a PCIe Gen 4 x16 link (roughly 64 GB/sec).
Servers like the NVIDIA DGX™ A100 take advantage of this technology to deliver greater scalability for ultrafast deep learning training. NVLink is also available for A100 PCIe GPUs, where an NVLink bridge connects two cards.
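From an application's point of view, these direct GPU-to-GPU links are exercised through CUDA peer-to-peer (P2P) access. The sketch below is a minimal illustration, not code from this page: it enumerates the GPUs in a node and enables peer access between every pair that reports support. Whether a given peer path actually runs over NVLink or falls back to PCIe depends on the system topology, which can be inspected with nvidia-smi topo -m.

```cpp
// Minimal sketch: query and enable CUDA peer-to-peer access between GPUs.
// File layout and output format are illustrative choices.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    printf("Found %d CUDA device(s)\n", deviceCount);

    // Check every ordered pair of GPUs for direct peer access.
    for (int src = 0; src < deviceCount; ++src) {
        for (int dst = 0; dst < deviceCount; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            printf("GPU %d -> GPU %d : peer access %s\n",
                   src, dst, canAccess ? "supported" : "not supported");
            if (canAccess) {
                // Allow direct copies and loads/stores from src into dst's memory.
                cudaSetDevice(src);
                cudaDeviceEnablePeerAccess(dst, 0);
            }
        }
    }
    return 0;
}
```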
NVLink in NVIDIA A100 doubles inter-GPU communication bandwidth compared to the previous generation, so researchers can use larger, more sophisticated applications to solve more complex problems.
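A quick way to see the inter-GPU bandwidth an application actually achieves is a device-to-device copy benchmark. The following sketch is illustrative only (the 1 GiB buffer size, device IDs 0 and 1, and iteration count are arbitrary assumptions): it times repeated cudaMemcpyPeerAsync transfers with CUDA events and reports the effective throughput, which on NVLink-connected A100s should come out well above what a PCIe path delivers.

```cpp
// Illustrative peer-to-peer copy benchmark between GPU 0 and GPU 1.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ULL << 30;   // 1 GiB per transfer (arbitrary)
    const int src = 0, dst = 1;        // assumes at least two GPUs
    const int iters = 20;

    // Allocate a buffer on each GPU and enable peer access in both directions.
    void *srcBuf = nullptr, *dstBuf = nullptr;
    cudaSetDevice(src);
    cudaMalloc(&srcBuf, bytes);
    cudaDeviceEnablePeerAccess(dst, 0);
    cudaSetDevice(dst);
    cudaMalloc(&dstBuf, bytes);
    cudaDeviceEnablePeerAccess(src, 0);

    // Time repeated peer-to-peer copies with CUDA events on GPU 0's stream.
    cudaSetDevice(src);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) {
        cudaMemcpyPeerAsync(dstBuf, dst, srcBuf, src, bytes, 0);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbPerSec = (double)bytes * iters / (ms / 1000.0) / 1e9;
    printf("GPU %d -> GPU %d : %.1f GB/s effective\n", src, dst, gbPerSec);

    cudaFree(srcBuf);
    cudaSetDevice(dst);
    cudaFree(dstBuf);
    return 0;
}
```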
The rapid adoption of deep learning has driven the need for a faster, more scalable interconnect, as PCIe bandwidth often creates a bottleneck at the multi-GPU-system level. For deep learning workloads to scale, dramatically higher bandwidth and reduced latency are needed.
NVIDIA NVSwitch builds on the advanced communication capability of NVLink to solve this problem. It takes deep learning performance to the next level with a GPU fabric that enables more GPUs in a single server and full-bandwidth connectivity between them. Each GPU connects to the NVSwitch fabric through 12 NVLinks, enabling high-speed, all-to-all communication.
NVLink and NVSwitch are essential building blocks of the complete NVIDIA data center solution that incorporates hardware, networking, software, libraries, and optimized AI models and applications from NGC™. The most powerful end-to-end AI and HPC platform, it allows researchers to deliver real-world results and deploy solutions into production, driving unprecedented acceleration at every scale.
NVSwitch is the first on-node switch architecture to support eight to 16 fully connected GPUs in a single server node. The second-generation NVSwitch drives simultaneous communication between all GPU pairs at an incredible 600 GB/s. It supports full all-to-all communication with direct GPU peer-to-peer memory addressing. These 16 GPUs can be used as a single high-performance accelerator with unified memory space and up to 10 petaFLOPS of deep learning compute power.
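In practice, deep learning frameworks reach the NVLink and NVSwitch fabric through a communication library such as NCCL, which automatically routes collectives over the fastest interconnect it detects. The single-process sketch below is a hedged illustration rather than official sample code: it creates one NCCL communicator per visible GPU with ncclCommInitAll and runs a summing all-reduce across all of them. The buffer size is an arbitrary choice, and the program must be linked against the NCCL library.

```cpp
// Illustrative single-process all-reduce across all visible GPUs with NCCL.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
    int nDev = 0;
    cudaGetDeviceCount(&nDev);

    const size_t count = 1 << 24;  // elements per GPU (arbitrary size)
    std::vector<float*> buf(nDev);
    std::vector<cudaStream_t> stream(nDev);
    std::vector<ncclComm_t> comm(nDev);

    // One buffer and one stream per GPU.
    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&buf[i], count * sizeof(float));
        cudaMemset(buf[i], 0, count * sizeof(float));
        cudaStreamCreate(&stream[i]);
    }

    // One communicator per GPU, all owned by this process.
    ncclCommInitAll(comm.data(), nDev, nullptr);

    // Sum-reduce the buffers across every GPU; NCCL moves the traffic over
    // the fastest path it finds, e.g. NVLink/NVSwitch on an HGX/DGX system.
    ncclGroupStart();
    for (int i = 0; i < nDev; ++i) {
        ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                      comm[i], stream[i]);
    }
    ncclGroupEnd();

    // Wait for completion and clean up.
    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(stream[i]);
        ncclCommDestroy(comm[i]);
        cudaFree(buf[i]);
    }
    printf("All-reduce across %d GPU(s) complete\n", nDev);
    return 0;
}
```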
Experience NVIDIA DGX A100, the universal system for AI infrastructure and the world’s first AI system built on the NVIDIA A100 Tensor Core GPU.