The IO Subsystem for the Modern, GPU-Accelerated Data Center

Maximizing Data Center Storage and Network IO Performance

The new unit of computing is the data center and at its core are NVIDIA GPUs and NVIDIA networks. Accelerated computing requires accelerated input/output (IO) to maximize performance. NVIDIA® Magnum IO™, the IO subsystem of the modern data center, is the architecture for parallel, asynchronous, and intelligent data center IO, maximizing storage and network IO performance for multi-GPU, multi-node acceleration.

Magnum IO Key Benefits

Optimized IO Performance

Bypasses the CPU to enable direct IO among GPU memory, network, and storage, resulting in 10X higher bandwidth.

System Balance and Utilization

Relieves CPU contention to create a more balanced GPU-accelerated system that delivers peak IO bandwidth, resulting in up to 10X fewer CPU cores and 30X lower CPU utilization.

Seamless Integration

Provides optimized implementations for current and future platforms, whether data transfers are fine-grained and latency-sensitive, coarse-grained and bandwidth-sensitive, or collective operations.

Magnum IO Optimization Stack

Magnum IO utilizes storage IO, network IO, in-network compute, and IO management to simplify and speed up data movement, access, and management for multi-GPU, multi-node systems. Magnum IO supports NVIDIA CUDA-X™ libraries and makes the best use of a range of NVIDIA GPU and NVIDIA networking hardware topologies to achieve optimal throughput and low latency.

[Developer Blog] Magnum IO - Accelerating IO in the Modern Data Center

Storage IO

In multi-node, multi-GPU systems, slow, single-threaded CPU performance sits in the critical path of data access from local or remote storage devices. With storage IO acceleration, the GPU bypasses the CPU and system memory and accesses remote storage directly through 8X 200 Gb/s NICs, achieving up to 1.6 Tb/s of raw storage bandwidth.
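As a sanity check on the figure above, the quoted 1.6 Tb/s is simply the aggregate line rate of eight 200 Gb/s NICs. The short Python sketch below does the arithmetic (the helper name is ours, not an NVIDIA API):

```python
# Back-of-envelope check of the aggregate storage bandwidth quoted above:
# eight NICs at 200 Gb/s each. The figures come from the text; the helper
# function is an illustrative name of our own.

def aggregate_bandwidth_gbps(num_nics: int, gbps_per_nic: int) -> int:
    """Raw aggregate line rate in gigabits per second."""
    return num_nics * gbps_per_nic

total_gbps = aggregate_bandwidth_gbps(8, 200)  # 1600 Gb/s
total_tbps = total_gbps / 1000                 # 1.6 Tb/s, as quoted
total_gBps = total_gbps / 8                    # 200 gigabytes/s raw

print(total_tbps, total_gBps)  # 1.6 200.0
```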

Network IO

NVIDIA NVLink® fabric and RDMA-based network IO acceleration reduce IO overhead, bypassing the CPU and enabling direct GPU-to-GPU data transfers at line rate.
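Collective operations such as all-reduce are the workhorse of this direct GPU-to-GPU traffic; libraries like NCCL commonly schedule them as a ring over NVLink/RDMA links. The following pure-Python toy simulates that ring all-reduce schedule under simplifying assumptions of our own (one scalar chunk per rank, no real GPUs or NCCL calls):

```python
# Toy simulation of the ring all-reduce pattern used for GPU-to-GPU
# collectives. With n ranks and n chunks, a reduce-scatter phase of n-1
# steps leaves each rank holding one fully reduced chunk, and an
# all-gather phase of n-1 steps circulates those chunks to every rank.
# Pure illustration only -- no GPU, NVLink, or NCCL API involved.

def ring_allreduce(buffers):
    n = len(buffers)
    assert all(len(b) == n for b in buffers), "one chunk per rank in this toy"
    chunks = [list(b) for b in buffers]  # chunks[rank][chunk_index]

    # Phase 1: reduce-scatter. After n-1 steps, rank r owns the fully
    # reduced chunk (r + 1) % n. Sends are snapshotted first so every
    # rank in a step transmits its pre-step value, as on real hardware.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n]) for r in range(n)]
        for r, c, value in sends:
            chunks[(r + 1) % n][c] += value

    # Phase 2: all-gather. Circulate each finished chunk around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n]) for r in range(n)]
        for r, c, value in sends:
            chunks[(r + 1) % n][c] = value

    return chunks

print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# [[12, 15, 18], [12, 15, 18], [12, 15, 18]]
```

Because each rank only ever talks to its ring neighbor, every link carries traffic concurrently, which is why this schedule approaches line rate on large messages.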

In-Network Compute

In-network computing performs processing within the network itself, eliminating the latency introduced by traversing to the endpoints and any hops along the way. Data processing units (DPUs) introduce software-defined, network-hardware-accelerated computing, including pre-configured data processing engines and programmable engines.
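A minimal sketch of why moving a reduction into the network helps, assuming a simple switch tree in the spirit of SHARP (the function names and radix are ours, not a SHARP API):

```python
# Toy comparison of endpoint-based vs. in-network reduction.
# Endpoint-based: every node sends its value to one root host, so n-1
# messages converge on a single link and the root does all the math.
# In-network: each switch sums its children's partial results and
# forwards one value upward, so cost scales with tree depth instead of
# endpoint count. Illustrative only; no real switch or SHARP involved.

def endpoint_reduction_messages(n_endpoints: int) -> int:
    """Messages arriving at the root when endpoints reduce at a host."""
    return n_endpoints - 1

def in_network_reduction_depth(n_endpoints: int, switch_radix: int) -> int:
    """Switch levels needed when each switch aggregates its children."""
    depth, covered = 0, 1
    while covered < n_endpoints:
        covered *= switch_radix
        depth += 1
    return depth

print(endpoint_reduction_messages(1024))     # 1023 messages at one host
print(in_network_reduction_depth(1024, 32))  # 2 switch levels
```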

IO Management

To deliver IO optimizations across compute, network, and storage, users need advanced telemetry and deep troubleshooting techniques. Magnum IO management platforms empower research and industrial data center operators to efficiently provision, monitor, manage, and preventatively maintain the modern data center fabric.

Accelerating IO Across Applications

Magnum IO interfaces with NVIDIA CUDA-X high performance computing (HPC) and artificial intelligence (AI) libraries to speed up IO for a broad range of use cases—from AI to scientific visualization.

Data Analytics

Today, data science and machine learning (ML) are the world's largest compute segments, and modest improvements in the accuracy of predictive ML models can translate into billions of dollars to the bottom line. To accelerate these workloads, the RAPIDS Accelerator library includes a built-in accelerated Apache Spark shuffle based on UCX that can be configured to leverage GPU-to-GPU communication and RDMA capabilities. Combined with NVIDIA networking, Magnum IO software, GPU-accelerated Spark 3.0, and RAPIDS, the NVIDIA data center platform is uniquely positioned to speed up these huge workloads with unprecedented performance and efficiency.
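As one concrete illustration, the UCX-based shuffle mentioned above is enabled through Spark configuration. The sketch below is hedged: the shuffle-manager class name is version-specific ("spark330" is a placeholder of ours), so consult the RAPIDS Accelerator documentation for the values matching your Spark release:

```python
# Illustrative Spark properties for the RAPIDS accelerated UCX shuffle.
# Key caveats: the RapidsShuffleManager class name embeds the Spark
# version and changes per release, and the shuffle-mode key has evolved
# across plugin versions -- treat these values as placeholders to verify
# against the RAPIDS Accelerator docs, not as a definitive recipe.
rapids_ucx_shuffle_conf = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
    # Version-specific class; "spark330" is a placeholder.
    "spark.shuffle.manager": "com.nvidia.spark.rapids.spark330.RapidsShuffleManager",
    # Selects the UCX transport so shuffle blocks can move over
    # RDMA and direct GPU-to-GPU paths.
    "spark.rapids.shuffle.mode": "UCX",
}

for key, value in sorted(rapids_ucx_shuffle_conf.items()):
    print(f"--conf {key}={value}")
```

These properties would typically be passed via `--conf` flags to `spark-submit` or set on the `SparkSession` builder.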

 Adobe Achieves 7X Speedup in Model Training with Spark 3.0 on Databricks for a 90% Cost Savings

 19.5X Faster TPCx-BB Performance, UCX and RAPIDS Data Science Software Surges on NVIDIA DGX™ A100

High Performance Computing

HPC is a fundamental pillar of modern science. To unlock next-generation discoveries, scientists rely on simulation to better understand complex molecules for drug discovery, physics for potential new sources of energy, and atmospheric data to better predict and prepare for extreme weather. Magnum IO exposes hardware-level acceleration engines and smart offloads, such as RDMA, NVIDIA GPUDirect®, and NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™, while leveraging the high bandwidth and ultra-low latency of HDR 200 Gb/s InfiniBand. The result is the highest-performance, most efficient HPC and ML deployments at any scale.

Largest Interactive Volume Visualization - 150TB NASA Mars Lander Simulation

Deep Learning

AI models continue to explode in complexity as they take on next-level challenges such as conversational AI and deep recommender systems. A conversational AI model like NVIDIA's Megatron-BERT requires over 3,000X more compute to train than an image classification model like ResNet-50. Enabling researchers to keep pushing the envelope of what's possible with AI requires powerful performance and massive scalability. The combination of HDR 200 Gb/s InfiniBand networking and the Magnum IO software stack delivers efficient scalability to thousands of GPUs in a single cluster.

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems
