NVIDIA H100 CNX Converged Accelerator

Unprecedented performance for GPU-powered, IO-intensive workloads.

Unified Network and Compute Acceleration

Experience the unprecedented performance of converged acceleration. NVIDIA H100 CNX combines the power of the NVIDIA H100 Tensor Core GPU with the advanced networking capabilities of the NVIDIA® ConnectX®-7 smart network interface card (SmartNIC) to accelerate GPU-powered, input/output (IO)-intensive workloads, such as distributed AI training in the enterprise data center and 5G processing at the edge.

Better I/O Performance

NVIDIA H100 and ConnectX-7 are connected via an integrated PCIe Gen5 switch, which provides a dedicated high-speed path for data transfers between the GPU and the network. This eliminates the bottleneck of routing data through the host and delivers low, predictable latency, which is critical for time-sensitive applications such as 5G signal processing.
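As a back-of-the-envelope check (a sketch using only the figures from the specifications table below; real throughput depends on protocol and encoding overhead), a dedicated PCIe Gen5 x16 link has headroom to carry a 400Gb/s port at line rate:

```python
# Rough sanity check: can a dedicated PCIe Gen5 x16 link carry a
# 400Gb/s network port at line rate? Overheads are ignored.

nic_line_rate_gbit = 400                      # one 400Gb/s ConnectX-7 port
nic_gbyte_per_s = nic_line_rate_gbit / 8      # = 50 GB/s of traffic

pcie_gen5_x16_bidir = 128                     # GB/s, bidirectional (see spec table)
pcie_per_direction = pcie_gen5_x16_bidir / 2  # ~64 GB/s in each direction

print(f"NIC at line rate:       {nic_gbyte_per_s:.0f} GB/s")
print(f"PCIe Gen5 x16, one way: {pcie_per_direction:.0f} GB/s")
print(f"Headroom per direction: {pcie_per_direction - nic_gbyte_per_s:.0f} GB/s")
```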

Balanced, Optimized Design

The integration of a GPU and a SmartNIC into a single device results in a balanced architecture by design. In systems where multiple GPUs are desired, a converged accelerator card enforces the optimal one-to-one ratio of GPU to NIC. The design also avoids contention on the server’s PCIe bus, so performance scales linearly with additional devices.

Cost Savings

Because the GPU and SmartNIC are connected directly, customers can leverage mainstream PCIe Gen4 or even Gen3 servers to achieve a level of performance otherwise only possible with high-end or purpose-built systems. Using a single card also saves on power, space, and PCIe device slots, enabling further cost savings by allowing a greater number of accelerators per server.

Application-Ready

Core acceleration software libraries such as the NVIDIA Collective Communications Library (NCCL) and Unified Communication X (UCX®) automatically make use of the best-performing path for data transfers to GPUs. As a result, existing accelerated multi-node applications can take advantage of the H100 CNX without any modification, delivering immediate benefits.
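To illustrate (a minimal sketch, not NVIDIA sample code; the buffer size and launcher are placeholders), a stock PyTorch all-reduce over the NCCL backend runs unchanged on H100 CNX, because NCCL probes the PCIe topology at startup and routes GPU-to-network transfers over the best path it finds:

```python
# Minimal multi-node all-reduce using PyTorch's NCCL backend.
# Nothing here is H100 CNX-specific: NCCL detects that the GPU and NIC
# share a PCIe switch and uses GPUDirect RDMA on its own.
import os
import torch
import torch.distributed as dist

def main() -> None:
    # Rank and world size come from the launcher (e.g. torchrun).
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", "0")))

    # A gradient-sized buffer, summed across every rank in the job.
    grad = torch.ones(64 * 1024 * 1024, device="cuda")
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    torch.cuda.synchronize()

    if dist.get_rank() == 0:
        print(f"all_reduce done across {dist.get_world_size()} ranks")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with a standard tool such as `torchrun`, the same script runs on systems with discrete GPUs and NICs and on converged accelerators alike; only the data path underneath changes.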

Faster and More Efficient AI Systems

Distributed Multi-node AI Training

Distributed AI training workloads that transfer data between GPUs on different hosts often run into performance, scalability, and density limits. Typical enterprise servers don’t include a PCIe switch, so the CPU becomes a bottleneck for this traffic, especially with virtual machines, and transfers are bound by the speed of the host PCIe backplane. An imbalance between the number of GPUs and NICs adds contention; although a one-to-one ratio is ideal, the number of PCIe lanes and slots in the server can limit the total number of devices.

The H100 CNX alleviates these problems. With a dedicated path from the network to the GPU, it allows GPUDirect® RDMA to operate at near line rate, and data transfers occur at PCIe Gen5 speeds regardless of the host PCIe backplane. Scaling up GPU power in a host can be done in a balanced manner, since the ideal GPU-to-NIC ratio is maintained. A server can also be equipped with more acceleration power, because converged accelerators require fewer PCIe lanes and device slots than discrete cards.
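One way to see which path is being used (a common NCCL tuning practice, not something this page prescribes) is to raise NCCL’s log level; `NCCL_NET_GDR_LEVEL` additionally controls how close a GPU and NIC must be for GPUDirect RDMA to be chosen:

```python
# Sketch: surface NCCL's transport selection. On a converged accelerator
# the INFO log should show GPUDirect RDMA in use (lines mentioning "GDRDMA").
# Both variables must be set before the NCCL process group is initialized.
import os

os.environ["NCCL_DEBUG"] = "INFO"         # log topology detection and transport choice
os.environ["NCCL_NET_GDR_LEVEL"] = "PIX"  # allow GDR when GPU and NIC share a PCIe switch
# ... then initialize the process group and run collectives as usual.
```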

Accelerating Edge AI-on-5G

NVIDIA AI-on-5G is made up of the NVIDIA EGX enterprise platform, the NVIDIA Aerial SDK for software-defined 5G virtual radio access networks (vRANs), and enterprise AI frameworks, including SDKs such as NVIDIA Isaac and NVIDIA Metropolis. The platform enables edge devices such as video cameras, industrial sensors, and robots to use AI and communicate with servers over 5G.

NVIDIA converged accelerators provide the highest-performing platform for running 5G applications. Because data doesn’t need to go through the host PCIe system, processing latency is greatly reduced. The same converged accelerator used to accelerate 5G signal processing can also be used for edge AI with NVIDIA’s Multi-Instance GPU (MIG) technology, which makes it possible to share a GPU among several different applications. The H100 CNX provides all this functionality in a single enterprise server, without having to deploy more costly purpose-built systems.
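As a sketch of what that sharing looks like from software (using NVML through the `pynvml` bindings; the actual slice layout is whatever the administrator has configured), each MIG instance is enumerable as its own device, and its UUID can be handed to a workload via `CUDA_VISIBLE_DEVICES`:

```python
# List the MIG instances on GPU 0 via NVML (pip install nvidia-ml-py).
# Each instance has its own UUID, so a 5G signal-processing job and an
# edge-AI job can each be pinned to a separate slice of the same GPU.
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

current, _pending = pynvml.nvmlDeviceGetMigMode(gpu)
if current == pynvml.NVML_DEVICE_MIG_ENABLE:
    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
        except pynvml.NVMLError:
            continue  # slot not populated in the current layout
        print(pynvml.nvmlDeviceGetUUID(mig))  # e.g. "MIG-<uuid>"
else:
    print("MIG is disabled on this GPU")

pynvml.nvmlShutdown()
```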

H100 CNX Specifications

GPU Memory: 80GB HBM2e
Memory Bandwidth: >2.0TB/s
MIG Instances: 7 @ 10GB each, 3 @ 20GB each, or 2 @ 40GB each
Interconnect: PCIe Gen5, 128GB/s
NVLink Bridge: Two-way
Networking: 1x 400Gb/s or 2x 200Gb/s ports, Ethernet or InfiniBand
Form Factor: Dual-slot, full-height, full-length (FHFL)
Max Power: 350W

Take a Deeper Dive into the NVIDIA Hopper Architecture