MLPerf Benchmarks

The NVIDIA AI platform achieves world-class performance and versatility in both training and inference, enabled by extreme co-design.

What Is MLPerf?

MLPerf™ benchmarks are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services. Developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry, these evaluations are all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.

Inside the MLPerf Benchmarks

MLPerf Inference v6.0 measures inference performance across a wide variety of model architectures, including dense and mixture-of-experts (MoE) large language models (LLMs), vision language models, text-to-video models, generative recommenders, and more.

MLPerf Training v6.0 measures the time to train models to a specified quality level across various types of models, including LLMs, text-to-image, and recommenders.
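The time-to-train metric described above can be sketched as a loop that trains until an evaluation metric reaches a target and then reports the elapsed wall-clock time. This is a minimal illustration rather than the MLPerf reference harness: `time_to_train`, `ToyModel`, and the higher-is-better quality metric are assumptions made for the example, and the official rules define further details such as what falls inside the timed region.

```python
import time
from typing import Callable, Optional


def time_to_train(
    train_step: Callable[[], None],
    evaluate: Callable[[], float],
    target_quality: float,
    max_steps: int,
    eval_every: int = 1,
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return the seconds elapsed until evaluate() reaches target_quality.

    Hypothetical helper: assumes a higher-is-better quality metric.
    Returns None if the target is not reached within max_steps.
    """
    start = clock()
    for step in range(1, max_steps + 1):
        train_step()
        # Evaluate periodically; stop the clock at the first passing evaluation.
        if step % eval_every == 0 and evaluate() >= target_quality:
            return clock() - start
    return None


class ToyModel:
    """Toy stand-in for a model: each step improves quality by a fixed amount."""

    def __init__(self) -> None:
        self.quality = 0.0

    def step(self) -> None:
        self.quality += 0.25

    def evaluate(self) -> float:
        return self.quality
```

Passing the clock as a parameter keeps the sketch testable with a fake timer; a real run would use the default `time.perf_counter`.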

Reasoning LLMs

AI models that generate intermediate “thinking” tokens to enhance response accuracy.


Vision Language Models

Multimodal, generative AI models capable of understanding and processing video, images, and text.


LLMs

Deep learning algorithms trained on large-scale datasets that can recognize, summarize, translate, predict, and generate content for a variety of use cases.


Text-to-Video

Generative AI models that generate video outputs based on text inputs.


Text-to-Image

Generates images based on text prompts.


Recommender

Delivers personalized results in user-facing services such as social media or ecommerce websites by understanding interactions between users and service items, like products or ads.


Graph Neural Network

Uses neural networks designed to work with data structured as graphs.


Speech-to-Text

Converts spoken language into written text.


NVIDIA Blackwell Ultra Delivers up to 50x Better Performance and 35x Lower Cost for Agentic AI

Built to accelerate the next generation of agentic AI, NVIDIA Blackwell Ultra delivers breakthrough inference performance with dramatically lower cost. Cloud providers such as Microsoft, CoreWeave, and Oracle Cloud Infrastructure are deploying NVIDIA GB300 NVL72 systems at scale for low-latency and long-context use cases, such as agentic coding and coding assistants.

This is enabled by deep co-design across NVIDIA Blackwell, NVLink™, and NVLink Switch for scale-out; NVFP4 for low-precision accuracy; and NVIDIA Dynamo and TensorRT™ LLM for speed and flexibility—as well as development with community frameworks SGLang, vLLM, and more.

NVIDIA MLPerf Benchmark Results

The NVIDIA platform delivered the fastest time to train and the highest performance per GPU in the MLPerf Training v6.0 benchmarks. This round, NVIDIA submitted results on both GB200 NVL72 and GB300 NVL72 systems. At the same scale, GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72. This round added two new MoE pretraining workloads, DeepSeek-V3 671B and GPT-OSS-20B, and NVIDIA set performance records on both. On DeepSeek-V3 671B, NVIDIA scaled to 8,192 GPUs using GB200 NVL72 systems, the largest-scale NVIDIA Blackwell-based submission in MLPerf Training to date.

NVIDIA Blackwell Platform Raises the Bar for Performance and Scale

MLPerf Training v5.0, v5.1, and v6.0 results retrieved from www.mlcommons.org on June 16, 2026. MLPerf GPU scale results from entries 5.0-0004, 5.1-004, 6.0-0001, 6.0-0005, and 6.0-0014. MLPerf Blackwell training comparison from the following entries: 6.0-0006, 6.0-0013, 6.0-0017, 6.0-0018, 6.0-0078, and 5.1-0072. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.

Annual Rhythm and Extreme Co-Design for Sustained Training Leadership

The NVIDIA platform delivered the fastest time to train on every MLPerf Training v6.0 benchmark, with innovations across chips, systems, and software enabling sustained training performance leadership, as shown by industry-standard, peer-reviewed performance data.

Max-Scale Performance

Benchmark            Time to Train
DeepSeek-V3 671B     2.02 minutes
GPT-OSS-20B          7.43 minutes
Llama 3.1 405B       7.07 minutes
Llama 2 70B LoRA     0.40 minutes
Llama 3.1 8B         4.46 minutes
FLUX.1               17.1 minutes
DLRM-dcnv2           0.67 minutes

MLPerf™ Training v6.0 results retrieved from www.mlcommons.org on June 16, 2026, from the following entries: 6.0-0001, 6.0-0005, 6.0-0015, 6.0-0062, 6.0-0100, and 6.0-0101.

NVIDIA Delivers Highest Inference Performance, Unmatched Versatility

NVIDIA Blackwell Ultra GPUs powered the highest-performing submissions across the broadest range of models and scenarios in MLPerf Inference v6.0, and only the NVIDIA platform submitted on every newly added benchmark. Through software optimizations alone, the throughput of the GB300 NVL72 increased by up to 2.7x in just one round, slashing the cost per million tokens. And, for the first time, NVIDIA submitted MLPerf Inference results using 288 Blackwell Ultra GPUs across four GB300 NVL72 systems interconnected with NVIDIA Quantum-X800 InfiniBand—the largest submission scale in the benchmark’s history. This delivered record reasoning inference throughput of 2.5 million tokens per second.

MLPerf Inference v5.1 and v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entries: 5.1-0072 and 6.0-0082. Per-chip performance is derived by dividing total throughput by the number of reported chips. Per-chip performance is not a primary metric of MLPerf Inference v5.1 or v6.0.
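The per-chip derivation in the note above is a single division. The sketch below applies it to the rounded headline figures quoted earlier in this section (2.5 million tokens per second on 288 GPUs) purely as an illustration; `per_chip_throughput` is a hypothetical helper name, and the result is not an official MLPerf metric.

```python
def per_chip_throughput(total_tokens_per_second: float, num_chips: int) -> float:
    """Divide system-level throughput by the number of reported chips."""
    if num_chips <= 0:
        raise ValueError("num_chips must be a positive integer")
    return total_tokens_per_second / num_chips


# Rounded headline figures from the text above, used only for illustration.
HEADLINE_TOKENS_PER_SECOND = 2_500_000
HEADLINE_GPU_COUNT = 4 * 72  # four GB300 NVL72 systems with 72 GPUs each

approx_per_gpu = per_chip_throughput(HEADLINE_TOKENS_PER_SECOND, HEADLINE_GPU_COUNT)
print(f"{approx_per_gpu:.2f} tokens/second per GPU")  # prints 8680.56 tokens/second per GPU
```

Because the inputs are rounded, the per-GPU value is approximate.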

Higher Token Throughput and Lower Token Cost From Software Optimization

MLPerf Inference v5.1 and v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entries: 5.1-0072 and 6.0-0082. Token cost is not an official MLPerf metric. The baseline is the reciprocal of the reported token throughput, and the February 2026 value is derived by dividing the reciprocal of its reported token throughput by the baseline.
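The relative token cost described in the note above can be worked through as follows. This is a sketch under stated assumptions: `relative_token_cost` is a hypothetical helper, the baseline throughput of 1,000 tokens per second is a made-up placeholder, and only the "up to 2.7x" throughput ratio comes from the text. The derivation uses throughput alone and does not model hardware or energy prices.

```python
def relative_token_cost(baseline_throughput: float, new_throughput: float) -> float:
    """Return the new token cost as a fraction of the baseline token cost.

    Cost is modeled as the reciprocal of throughput, so the ratio
    simplifies to baseline_throughput / new_throughput.
    """
    if baseline_throughput <= 0 or new_throughput <= 0:
        raise ValueError("throughput values must be positive")
    baseline_cost = 1.0 / baseline_throughput
    new_cost = 1.0 / new_throughput
    return new_cost / baseline_cost


# Placeholder baseline; only the 2.7x ratio is taken from the text.
baseline = 1_000.0
improved = 2.7 * baseline

ratio = relative_token_cost(baseline, improved)
print(f"relative cost: {ratio:.3f}")  # prints relative cost: 0.370
```

Under this model, a 2.7x throughput gain corresponds to a token cost of roughly 37% of the baseline.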

NVIDIA GB300 NVL72 and NVIDIA Quantum-X800 InfiniBand Power Largest-Ever MLPerf Inference Submission

Record Scale

288 NVIDIA Blackwell Ultra GPUs 

Highest Token Throughput

Up to 2.5 million tokens/second on DeepSeek-R1 (Offline scenario)

MLPerf Inference v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entry: 6.0-0076.

The Technology Behind the Results

The complexity of AI demands tight integration between all aspects of the platform. As demonstrated in the MLPerf benchmarks, the NVIDIA AI platform delivers leadership performance with the world’s most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software. The result is an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge.

Optimized Software That Accelerates AI Workflows

NVIDIA Dynamo is an open-source distributed inference-serving framework for deploying models in multi-node environments at AI factory scale. It streamlines distributed serving by disaggregating inference, optimizing routing, and extending memory through data caching to cost-effective storage tiers.

Dynamo works by disaggregating (separating) the prefill and decoding phases of LLM inference across different GPUs, allowing for independent optimization and higher throughput. It was featured prominently in the MLPerf Inference v5.1 benchmarks, demonstrating superior performance in Llama 3.1 405B Interactive and DeepSeek-R1 reasoning tests.
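The idea of separating prefill from decode can be illustrated with a toy, single-process model. This sketch does not use or represent the Dynamo API: `Request`, `prefill_worker`, `decode_worker`, and `serve` are hypothetical names, the "KV cache" is a plain list, and the two stages run sequentially here, whereas a real deployment would place them on separate GPU pools and transfer the cache between them.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: int
    prompt_tokens: list[str]
    max_new_tokens: int
    kv_cache: list[str] = field(default_factory=list)
    output_tokens: list[str] = field(default_factory=list)


def prefill_worker(request: Request) -> Request:
    """Prefill stage: process the whole prompt in one pass and build the cache."""
    request.kv_cache = list(request.prompt_tokens)
    return request


def decode_worker(request: Request) -> Request:
    """Decode stage: generate tokens one at a time, extending the cache each step."""
    for i in range(request.max_new_tokens):
        token = f"tok{i}"  # placeholder for a sampled token
        request.output_tokens.append(token)
        request.kv_cache.append(token)
    return request


def serve(requests: list[Request]) -> list[Request]:
    """Route every request through the prefill pool, then hand it to the decode pool."""
    prefill_queue = deque(requests)
    decode_queue: deque[Request] = deque()

    # Stage 1: the prefill pool drains its queue and hands off the cache.
    while prefill_queue:
        decode_queue.append(prefill_worker(prefill_queue.popleft()))

    # Stage 2: the decode pool consumes the handed-off requests.
    finished = []
    while decode_queue:
        finished.append(decode_worker(decode_queue.popleft()))
    return finished
```

Keeping the stages behind separate queues is what allows each pool to be sized and tuned independently, since prefill is typically compute-bound while decode is typically bound by memory bandwidth.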

Leadership-Class AI Infrastructure

Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform delivered leading performance powered by the NVIDIA Blackwell and Blackwell Ultra platforms, including the NVIDIA GB300 NVL72 and GB200 NVL72 systems, NVLink and NVLink Switch, and NVIDIA Quantum InfiniBand and NVIDIA Spectrum-X Ethernet scale-out networking. These are at the heart of AI factories powered by the NVIDIA data center platform, the engine behind our benchmark performance.

In addition, NVIDIA DGX™ systems offer the scalability, rapid deployment, and incredible compute power that enable every enterprise to build leadership-class AI infrastructure. 

Learn more about our data center training and inference performance.

Reasoning LLMs

MLPerf Inference uses: 

DeepSeek-R1 with samples sourced from the AIME, MATH500, GPQA Diamond, MMLU-Pro, and LiveCodeBench datasets.

GPT-OSS-120B with samples from the AIME 2024, LiveCodeBench v6, and GPQA Diamond datasets.

Vision Language Models

MLPerf Inference uses the Qwen3-VL-235B-A22B-Instruct model with the Shopify Product Catalog dataset.

LLMs

MLPerf Inference uses:

Llama 3.1 405B with samples sourced from the LongBench, LongDataCollection, RULER, and GovReport summary datasets.

Llama 2 70B with the OpenOrca dataset.

Llama 3.1 8B with the CNN/DailyMail dataset.

Mixtral 8x7B with samples sourced from the OpenOrca, GSM8K, and MBXP datasets.

MLPerf Training uses:

Llama 3.1 405B with the C4 (v3.0.1) dataset and a sequence length of 8,192 for the LLM pretraining workload.

Llama 2 70B with the GovReport dataset and a sequence length of 8,192 for the LLM fine-tuning test.

Llama 3.1 8B with the C4 dataset and a sequence length of 8,192.

Text-to-Video

MLPerf Inference uses Wan-2.2-T2V-A14B with the VBench dataset.

Text-to-Image

MLPerf Training uses the FLUX.1 text-to-image model trained on the CC12M dataset, with the COCO 2014 dataset used for evaluation.

Recommender

MLPerf Inference uses DLRMv3 with a Synthetic Streaming 100B dataset.

MLPerf Training and Inference use the Deep Learning Recommendation Model v2 (DLRMv2), which employs DCNv2 cross layers and a multi-hot dataset synthesized from the Criteo dataset.

Graph Neural Network

MLPerf Inference uses the Illinois Graph Benchmark (IGB) heterogeneous dataset.

MLPerf Training uses R-GAT with the Illinois Graph Benchmark (IGB) heterogeneous dataset.

Speech-to-Text

MLPerf Inference uses Whisper-Large-V3 with the LibriSpeech dataset.


AI Superchip: 208B transistors

2nd Gen Transformer Engine: FP4/FP6 Tensor Core

5th Generation NVLink: scales to 576 GPUs

RAS Engine: 100% in-system self-test

Secure AI: full-performance encryption and TEE

Decompression Engine: 800 GB/sec