MLPerf™ benchmarks are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services. Developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry, these evaluations are all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.
MLPerf Inference v6.0 measures inference performance across a wide variety of model architectures, including dense and mixture-of-experts (MoE) large language models (LLMs), vision-language models, text-to-video models, generative recommenders, and more.
MLPerf Training v6.0 measures the time to train models to a specified quality level across various types of models, including LLMs, text-to-image, and recommenders.
Built to accelerate the next generation of agentic AI, NVIDIA Blackwell Ultra delivers breakthrough inference performance with dramatically lower cost. Cloud providers such as Microsoft, CoreWeave, and Oracle Cloud Infrastructure are deploying NVIDIA GB300 NVL72 systems at scale for low-latency and long-context use cases, such as agentic coding and coding assistants.
This is enabled by deep co-design across NVIDIA Blackwell, NVLink™, and NVLink Switch for scale-up; NVFP4 for low-precision accuracy; and NVIDIA Dynamo and TensorRT™ LLM for speed and flexibility, as well as development with community frameworks such as SGLang and vLLM.
The NVIDIA platform delivered the fastest time to train and the highest performance per GPU in the MLPerf Training v6.0 benchmarks. This round, NVIDIA submitted results on both GB200 NVL72 and GB300 NVL72 systems. At the same scale, GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72. This round added two new MoE pretraining workloads, DeepSeek-V3 671B and GPT-OSS-20B, and NVIDIA set performance records on both. On DeepSeek-V3 671B, NVIDIA scaled to 8,192 GPUs using GB200 NVL72 systems, the largest-scale NVIDIA Blackwell-based submission in MLPerf Training to date.
MLPerf Training v5.0, v5.1, and v6.0 results retrieved from www.mlcommons.org on June 16, 2026. MLPerf GPU scale results from entries 5.0-0004, 5.1-004, 6.0-0001, 6.0-0005, and 6.0-0014. MLPerf Blackwell training comparison from the following entries: 6.0-0006, 6.0-0013, 6.0-0017, 6.0-0018, 6.0-0078, and 5.1-0072. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
The NVIDIA platform delivered the fastest time to train on every MLPerf Training v6.0 benchmark. Innovations across chips, systems, and software enable sustained training performance leadership, as shown by industry-standard, peer-reviewed performance data.
| Benchmark | Time to Train |
|---|---|
| DeepSeek-V3 671B | 2.02 minutes |
| GPT-OSS-20B | 7.43 minutes |
| Llama 3.1 405B | 7.07 minutes |
| Llama 2 70B LoRA | 0.40 minutes |
| Llama 3.1 8B | 4.46 minutes |
| FLUX.1 | 17.1 minutes |
| DLRM-dcnv2 | 0.67 minutes |
MLPerf™ Training v6.0 results retrieved from www.mlcommons.org on June 16, 2026, from the following entries: 6.0-0001, 6.0-0005, 6.0-0015, 6.0-0062, 6.0-0100, and 6.0-0101. The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
NVIDIA Blackwell Ultra GPUs powered the highest-performing submissions across the broadest range of models and scenarios in MLPerf Inference v6.0, and only the NVIDIA platform submitted on every newly added benchmark. Through software optimizations alone, the throughput of the GB300 NVL72 increased by up to 2.7x in just one round, slashing the cost per million tokens. And, for the first time, NVIDIA submitted MLPerf Inference results using 288 Blackwell Ultra GPUs across four GB300 NVL72 systems interconnected with NVIDIA Quantum-X800 InfiniBand—the largest submission scale in the benchmark’s history. This delivered record reasoning inference throughput of 2.5 million tokens per second.
MLPerf Inference v5.1 and v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entries: 5.1-0072 and 6.0-0082. Per-chip performance is derived by dividing total throughput by the number of reported chips. Per-chip performance is not a primary metric of MLPerf Inference v5.1 or v6.0. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
MLPerf Inference v5.1 and v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entries: 5.1-0072 and 6.0-0082. Token cost is not an official MLPerf metric. The baseline is the reciprocal of reported token throughput, and the February 2026 value is derived by dividing the reciprocal of its reported token throughput by the baseline. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
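The two derived figures described in the footnotes above (per-chip performance and relative token cost) are simple ratios. The following is a minimal sketch of that arithmetic, not an official MLPerf calculation. The function names are hypothetical, and the inputs are illustrative: a normalized baseline of 1.0 with the "up to 2.7x" throughput gain, and the "up to 2.5 million tokens per second" figure spread across 288 GPUs. Neither MLPerf nor the entries cited report these derived values directly.

```python
def per_chip_throughput(total_tokens_per_second: float, num_chips: int) -> float:
    """Total reported throughput divided by the number of reported chips."""
    if num_chips <= 0:
        raise ValueError("num_chips must be positive")
    return total_tokens_per_second / num_chips


def relative_token_cost(baseline_tokens_per_second: float,
                        new_tokens_per_second: float) -> float:
    """Reciprocal of the new throughput divided by the baseline reciprocal."""
    if baseline_tokens_per_second <= 0 or new_tokens_per_second <= 0:
        raise ValueError("throughputs must be positive")
    baseline_cost = 1.0 / baseline_tokens_per_second  # cost proxy for the baseline round
    new_cost = 1.0 / new_tokens_per_second            # cost proxy for the later round
    return new_cost / baseline_cost


# Illustrative, normalized inputs: baseline throughput 1.0, later round 2.7x higher.
print(f"{relative_token_cost(1.0, 2.7):.2f}")         # 0.37

# Illustrative upper bound: 2.5 million tokens/second spread over 288 GPUs.
print(f"{per_chip_throughput(2_500_000, 288):,.0f}")  # 8,681
```

Under these assumptions, a 2.7x throughput gain corresponds to a relative token cost of roughly 0.37 of the baseline, because the cost proxy is just the inverse of throughput on the same hardware.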
288 NVIDIA Blackwell Ultra GPUs
Up to 2.5 million tokens/second on DeepSeek-R1¹
MLPerf Inference v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entries: 6.0-0076. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
¹ Offline scenario
The complexity of AI demands a tight integration between all aspects of the platform. As demonstrated in MLPerf’s benchmarks, the NVIDIA AI platform delivers leadership performance with the world’s most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software—an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge with amazing results.
NVIDIA Dynamo is an open source distributed inference-serving framework to deploy models in multi-node environments at AI-factory-scale. It streamlines distributed serving by disaggregating inference, optimizing routing, and extending memory through data caching to cost-effective storage tiers.
Dynamo works by disaggregating (separating) the prefill and decoding phases of LLM inference across different GPUs, allowing for independent optimization and higher throughput. It was featured prominently in the MLPerf Inference v5.1 benchmarks, demonstrating superior performance in Llama 3.1 405B Interactive and DeepSeek-R1 reasoning tests.
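The idea of disaggregated serving can be illustrated with a toy sketch. This is not Dynamo's API or implementation: every name below (`Request`, `prefill_worker`, `decode_worker`, `serve`) is hypothetical, the "KV cache" is a plain list standing in for real attention state, and both phases run in one process rather than on separate GPUs. It only shows the shape of the handoff, where one stage processes the full prompt once and a separate stage generates tokens from the transferred state.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: int
    prompt_tokens: list[str]
    max_new_tokens: int
    kv_cache: list[str] = field(default_factory=list)       # stand-in for attention state
    output_tokens: list[str] = field(default_factory=list)


def prefill_worker(request: Request) -> Request:
    """Prefill phase: process the entire prompt in one pass and build the cache."""
    request.kv_cache = list(request.prompt_tokens)
    return request


def decode_worker(request: Request) -> Request:
    """Decode phase: generate one placeholder token per step, extending the cache."""
    for step in range(request.max_new_tokens):
        token = f"tok{step}"  # placeholder; a real model would sample a token here
        request.output_tokens.append(token)
        request.kv_cache.append(token)
    return request


def serve(requests: list[Request]) -> list[Request]:
    """Run prefill and decode as separate stages joined by a handoff queue."""
    handoff: deque[Request] = deque()
    for request in requests:
        handoff.append(prefill_worker(request))  # stage 1: prefill pool
    finished = []
    while handoff:
        finished.append(decode_worker(handoff.popleft()))  # stage 2: decode pool
    return finished


done = serve([Request(0, ["What", "is", "MLPerf", "?"], max_new_tokens=3)])
print(done[0].output_tokens)  # ['tok0', 'tok1', 'tok2']
```

Because the two stages only share the handoff queue, each could in principle be sized and tuned independently, which is the property the paragraph above attributes to disaggregation.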
Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform delivered leading performance powered by the NVIDIA Blackwell and Blackwell Ultra platforms, including the NVIDIA GB300 NVL72 and GB200 NVL72 systems, NVLink and NVLink Switch, and NVIDIA Quantum InfiniBand and NVIDIA Spectrum-X Ethernet scale-out networking. These are at the heart of AI factories powered by the NVIDIA data center platform, the engine behind our benchmark performance.
In addition, NVIDIA DGX™ systems offer the scalability, rapid deployment, and incredible compute power that enable every enterprise to build leadership-class AI infrastructure.
Learn more about our data center training and inference performance.