Servers running distributed AI training workloads that transfer data between GPUs on different hosts often hit performance, scalability, and density limits. Typical enterprise servers don’t include a PCIe switch, so the CPU becomes a bottleneck for this traffic, especially for virtual machines. Data transfers are bound by the speed of the host PCIe backplane, and contention can arise from an imbalance between the number of GPUs and NICs. Although a one-to-one ratio is ideal, the number of PCIe lanes and slots in the server can limit the total number of devices.
The H100 CNX alleviates these problems. With a dedicated path from the network to the GPU, it allows GPUDirect® RDMA to operate at near line speed, and data transfers occur at PCIe Gen5 speeds regardless of the host PCIe backplane. GPU power in a host can be scaled up in a balanced manner, since the ideal one-to-one GPU-to-NIC ratio is maintained. A server can also be equipped with more acceleration power, because converged accelerators require fewer PCIe lanes and device slots than discrete cards.
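To make the data path concrete, the following minimal sketch shows the conventional way GPU memory is registered for GPUDirect RDMA using CUDA and libibverbs. It is not specific to the H100 CNX: it assumes a CUDA-capable GPU, an RDMA-capable NIC, and the nvidia-peermem kernel module loaded; the device selection and buffer size are illustrative, and queue-pair setup and work-request posting are omitted.

```c
/* Minimal sketch: register GPU memory for GPUDirect RDMA.
 * Assumes a CUDA GPU, an RDMA-capable NIC, and nvidia-peermem loaded. */
#include <cuda_runtime.h>
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const size_t len = 1 << 20;          /* 1 MiB transfer buffer */
    void *gpu_buf = NULL;

    /* Allocate the buffer directly in GPU memory. */
    if (cudaMalloc(&gpu_buf, len) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return EXIT_FAILURE;
    }

    /* Open the first RDMA device found (illustrative choice). */
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs || num_devices == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return EXIT_FAILURE;
    }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Registering the GPU pointer lets the NIC DMA straight to/from GPU
     * memory, so payloads never bounce through host RAM or the CPU. */
    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        fprintf(stderr, "ibv_reg_mr on GPU memory failed "
                        "(is nvidia-peermem loaded?)\n");
        return EXIT_FAILURE;
    }
    printf("GPU buffer registered for RDMA: lkey=0x%x rkey=0x%x\n",
           mr->lkey, mr->rkey);

    /* A real application would now create queue pairs and post
     * RDMA work requests that reference this memory region. */
    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    cudaFree(gpu_buf);
    return EXIT_SUCCESS;
}
```

Because the NIC writes and reads the registered GPU buffer directly, payloads bypass host memory and the CPU, which is what allows a converged card with its own network-to-GPU path to sustain transfers at near line rate independent of the host PCIe backplane.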