MLPerf™ benchmarks—developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry—are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services. They’re all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.
MLPerf Inference v6.0 measures inference performance across a wide variety of model architectures, including dense and mixture-of-expert (MoE) large language models (LLMs), vision language models, text-to-video models, generative recommenders, and more.
MLPerf Training v5.1 measures the time to train models to a specified quality level across various model types, including LLMs, text-to-image, recommenders, graph neural networks, and object detection.
Built to accelerate the next generation of agentic AI, NVIDIA Blackwell Ultra delivers breakthrough inference performance with dramatically lower cost. Cloud providers such as Microsoft, CoreWeave, and Oracle Cloud Infrastructure are deploying NVIDIA GB300 NVL72 systems at scale for low-latency and long-context use cases, such as agentic coding and coding assistants.
This is enabled by deep co-design across NVIDIA Blackwell, NVLink™, and NVLink Switch for scale-out; NVFP4 for low-precision accuracy; and NVIDIA Dynamo and TensorRT™ LLM for speed and flexibility—as well as development with community frameworks SGLang, vLLM, and more.
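NVFP4 is publicly described as a 4-bit floating-point format that relies on fine-grained block scaling to preserve accuracy. As an illustrative sketch only, and not NVIDIA's implementation, the following simulates the core idea: quantizing a block of values onto the magnitude grid of a 4-bit float (2-bit exponent, 1-bit mantissa) using one shared per-block scale. The block size, scaling rule, and rounding here are simplifying assumptions.

```python
# Illustrative sketch of blockwise 4-bit float quantization. E2M1_GRID lists
# the non-negative magnitudes representable by a 4-bit float with a 2-bit
# exponent and 1-bit mantissa; the shared-scale scheme below is a simplifying
# assumption for illustration, not NVIDIA's NVFP4 implementation.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block of values to a shared-scale 4-bit grid: choose a
    scale so the block's absolute maximum lands on 6.0 (the largest
    representable magnitude), then round each value to the nearest grid point."""
    amax = max(abs(v) for v in block) or 1e-12  # avoid divide-by-zero on all-zero blocks
    scale = amax / 6.0
    quantized = []
    for v in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        quantized.append(mag * scale if v >= 0 else -mag * scale)
    return quantized

# Values that land exactly on the scaled grid survive quantization;
# off-grid values snap to the nearest representable point.
print(quantize_block([6.0, 3.0, 1.5, 0.5]))
print(quantize_block([6.0, 2.4]))
```

The per-block scale is what makes low-bit formats workable in practice: a single outlier only distorts its own small block rather than the whole tensor.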
The NVIDIA platform achieved the fastest time to train on all seven MLPerf Training v5.1 benchmarks. Blackwell Ultra made its debut, delivering major gains in LLM pretraining and fine-tuning, enabled by architectural enhancements and breakthrough NVFP4 training methods that increase performance while meeting strict MLPerf accuracy requirements. NVIDIA also increased Blackwell Llama 3.1 405B pretraining performance at scale by 2.7x, combining twice the submission scale with large per-GPU performance gains enabled by NVFP4. And NVIDIA set performance records on both newly added benchmarks, Llama 3.1 8B and FLUX.1, while continuing to hold records on the existing recommender, object detection, and graph neural network benchmarks.
MLPerf™ Training v5.0 and v5.1 results retrieved from www.mlcommons.org on November 12, 2025, from the following entries: 4.1-0050, 5.0-0014, 5.0-0067, 5.0-0076, 5.1-0058, 5.1-0060. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
The NVIDIA platform delivered the fastest time to train on every MLPerf Training v5.1 benchmark, with innovations across chips, systems, and software enabling sustained training performance leadership, as demonstrated by industry-standard, peer-reviewed performance data.
| Benchmark | Time to Train |
|---|---|
| LLM Pretraining (Llama 3.1 405B) | 10 minutes |
| LLM Pretraining (Llama 3.1 8B) | 5.2 minutes |
| LLM Fine-Tuning (Llama 2 70B LoRA) | 0.40 minutes |
| Image Generation (FLUX.1) | 12.5 minutes |
| Recommender (DLRM-DCNv2) | 0.71 minutes |
| Graph Neural Network (R-GAT) | 0.84 minutes |
| Object Detection (RetinaNet) | 1.4 minutes |
MLPerf™ Training v5.0 and v5.1 results retrieved from www.mlcommons.org on November 12, 2025, from the following entries: 5.0-0082, 5.1-0002, 5.1-0004, 5.1-0060, 5.1-0070, 5.1-0072. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
Blackwell Ultra GPUs powered the highest-performing submissions across the broadest range of models and scenarios in MLPerf Inference v6.0, and only the NVIDIA platform submitted on every newly added benchmark. Through software optimizations alone, the throughput of the GB300 NVL72 increased by up to 2.7x in just one round. And, for the first time, NVIDIA submitted MLPerf Inference results using 288 Blackwell Ultra GPUs across four GB300 NVL72 systems interconnected with NVIDIA Quantum-X800 InfiniBand—the largest submission scale in the benchmark’s history—to deliver record reasoning inference throughput of 2.5 million tokens per second.
MLPerf Inference v5.1 and v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entries: 5.1-0072 and 6.0-0082. Per-chip performance derived by dividing total throughput by number of reported chips. Per-chip performance is not a primary metric of MLPerf Inference v5.1 or v6.0. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
MLPerf Inference v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entries: 6.0-0076. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
288 NVIDIA Blackwell Ultra GPUs
Up to 2.5 million tokens/second, DeepSeek-R1¹
¹ Offline scenario
The complexity of AI demands a tight integration between all aspects of the platform. As demonstrated in MLPerf’s benchmarks, the NVIDIA AI platform delivers leadership performance with the world’s most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software—an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge with amazing results.
NVIDIA Dynamo is an open-source distributed inference-serving framework for deploying models across multi-node environments at AI-factory scale. It streamlines distributed serving by disaggregating inference, optimizing routing, and extending memory through data caching to cost-effective storage tiers.
Dynamo works by disaggregating (separating) the prefill and decoding phases of LLM inference across different GPUs, allowing for independent optimization and higher throughput. It was featured prominently in the MLPerf Inference v5.1 benchmarks, demonstrating superior performance in Llama 3.1 405B Interactive and DeepSeek-R1 reasoning tests.
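The split described above can be sketched schematically. This is not the Dynamo API; it is a minimal illustration, with in-process queues standing in for GPU-to-GPU KV-cache transfer, of why separating the compute-bound prefill phase from the memory-bound decode phase lets each worker pool be scaled and optimized independently.

```python
from dataclasses import dataclass, field
from queue import Queue
from threading import Thread

@dataclass
class Request:
    prompt: str
    kv_cache: str = ""                          # stand-in for the prefill KV cache
    tokens: list = field(default_factory=list)  # stand-in for generated output

def prefill_worker(inbox: Queue, outbox: Queue):
    """Compute-bound phase: process the whole prompt once, hand off KV state."""
    while (req := inbox.get()) is not None:
        req.kv_cache = f"kv({req.prompt})"      # placeholder for real tensor state
        outbox.put(req)

def decode_worker(inbox: Queue, done: Queue, max_tokens: int = 3):
    """Memory-bound phase: generate tokens one step at a time from KV state."""
    while (req := inbox.get()) is not None:
        for i in range(max_tokens):
            req.tokens.append(f"tok{i}")        # placeholder for sampled tokens
        done.put(req)

# Wire the two pools together; in a disaggregated deployment each side runs
# on different GPUs (or nodes) and is sized independently for its bottleneck.
prefill_q, decode_q, done_q = Queue(), Queue(), Queue()
Thread(target=prefill_worker, args=(prefill_q, decode_q), daemon=True).start()
Thread(target=decode_worker, args=(decode_q, done_q), daemon=True).start()

prefill_q.put(Request("Explain disaggregated serving"))
result = done_q.get()
print(result.kv_cache, result.tokens)
```

Because the queues decouple the phases, adding more decode workers (the usual bottleneck for long generations) requires no change to the prefill side, which is the scheduling freedom disaggregation buys.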
Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform delivered leading performance powered by the NVIDIA Blackwell and Blackwell Ultra platforms, including the NVIDIA GB300 NVL72 and GB200 NVL72 systems, NVLink and NVLink Switch, and Quantum InfiniBand. These are at the heart of AI factories powered by the NVIDIA data center platform, the engine behind our benchmark performance.
In addition, NVIDIA DGX™ systems offer the scalability, rapid deployment, and incredible compute power that enable every enterprise to build leadership-class AI infrastructure.
Learn more about our data center training and inference performance.