MLPerf Benchmarks

The NVIDIA AI platform achieves world-class performance and versatility in MLPerf Training and Inference benchmarks, enabled by extreme co-design.

What Is MLPerf?

MLPerf™ benchmarks—developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry—are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services, all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.

Inside the MLPerf Benchmarks

MLPerf Inference v6.0 measures inference performance across a wide variety of model architectures, including dense and mixture-of-experts (MoE) large language models (LLMs), vision language models, text-to-video models, generative recommenders, and more.

MLPerf Training v5.1 measures the time to train models to a specified quality level across various model types, including LLMs, text-to-image, recommenders, graph neural networks, and object detection.
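MLPerf Training's core metric can be sketched in a few lines of Python. Everything below is a hypothetical stand-in (the toy training loop, the quality curve, and the 0.75 target are invented for illustration), not an MLPerf reference implementation:

```python
import time

def time_to_train(train_one_epoch, evaluate, target_quality, max_epochs=100):
    """MLPerf-style time-to-train: wall-clock time until the model
    first reaches a prescribed quality target (hypothetical API)."""
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if evaluate() >= target_quality:
            return epoch, time.perf_counter() - start
    raise RuntimeError("quality target not reached")

# Toy stand-in: "quality" improves by 0.1 per epoch.
state = {"quality": 0.0}
def train_one_epoch():
    state["quality"] += 0.1
def evaluate():
    return state["quality"]

epochs, seconds = time_to_train(train_one_epoch, evaluate, target_quality=0.75)
print(epochs)  # 8: first epoch at which quality reaches 0.75
```

The key property the metric captures: faster convergence to the same quality bar counts, raw throughput alone does not.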

Reasoning LLMs

AI models that generate intermediate “thinking” tokens to enhance response accuracy.

Vision Language Models

Multimodal, generative AI models capable of understanding and processing video, image, and text.

LLMs

Deep learning algorithms trained on large-scale datasets that can recognize, summarize, translate, predict, and generate content for a breadth of use cases.

Text-to-Video

Generative AI models that generate video outputs based on text inputs.

Text-to-Image

Generates images based on text prompts.

Recommender

Delivers personalized results in user-facing services such as social media or ecommerce websites by understanding interactions between users and service items, like products or ads.

Graph Neural Network

Uses neural networks designed to work with data structured as graphs.

Speech-to-Text

Converts spoken language into written text.

NVIDIA Blackwell Ultra Delivers up to 50x Better Performance and 35x Lower Cost for Agentic AI

Built to accelerate the next generation of agentic AI, NVIDIA Blackwell Ultra delivers breakthrough inference performance with dramatically lower cost. Cloud providers such as Microsoft, CoreWeave, and Oracle Cloud Infrastructure are deploying NVIDIA GB300 NVL72 systems at scale for low-latency and long-context use cases, such as agentic coding and coding assistants.

This is enabled by deep co-design across NVIDIA Blackwell, NVLink™, and NVLink Switch for scale-out; NVFP4 for low-precision accuracy; and NVIDIA Dynamo and TensorRT™ LLM for speed and flexibility—as well as development with community frameworks SGLang, vLLM, and more.
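As a rough illustration of the block-scaled low-precision idea behind NVFP4: groups of values share one scale factor, so each 4-bit value only needs to cover a small dynamic range. This sketch is simplified (real NVFP4 packs 4-bit E2M1 values with compact per-block scale factors; the grid, block size, and float scale here are toy stand-ins):

```python
# Representable magnitudes of a 4-bit E2M1 value (sign handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(xs):
    """Toy block-scaled FP4 quantizer in the spirit of NVFP4: one shared
    scale per block so the largest magnitude maps to 6.0. (Simplified:
    real NVFP4 stores the per-block scale itself in a compact format.)"""
    scale = max(abs(v) for v in xs) / FP4_GRID[-1] or 1.0
    def snap(v):
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        return mag * scale * (1 if v >= 0 else -1)
    return [snap(v) for v in xs]

block = [0.03, -0.41, 1.20, -2.90, 0.55, 0.00, 0.75, -0.10,
         0.20, -0.33, 0.90, 1.05, -0.60, 0.48, -1.70, 0.12]
q = quantize_block(block)
# The largest value in the block is reproduced exactly; every other
# value lands within half a grid step times the shared scale.
print(max(abs(a - b) for a, b in zip(block, q)))
```

The design trade-off: per-block scales keep quantization error small locally, which is what lets 4-bit training and inference meet accuracy targets that a single tensor-wide scale would miss.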

NVIDIA MLPerf Benchmark Results

The NVIDIA platform achieved the fastest time to train on all seven MLPerf Training v5.1 benchmarks. Blackwell Ultra made its debut, delivering large leaps in LLM pretraining and fine-tuning, enabled by architectural enhancements and breakthrough NVFP4 training methods that increase performance while meeting strict MLPerf accuracy requirements. NVIDIA also increased Blackwell Llama 3.1 405B pretraining performance at scale by 2.7x through a combination of twice the scale and large gains in per-GPU performance enabled by NVFP4. NVIDIA also set performance records on both newly added benchmarks—Llama 3.1 8B and FLUX.1—while continuing to hold records on the existing recommender, object detection, and graph neural network benchmarks.

NVIDIA Blackwell Ultra Delivers Large Leap in MLPerf Training Debut

MLPerf™ Training v5.0 and v5.1 results retrieved from www.mlcommons.org on November 12, 2025, from the following entries: 4.1-0050, 5.0-0014, 5.0-0067, 5.0-0076, 5.1-0058, 5.1-0060. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.

Annual Rhythm and Extreme Co-Design for Sustained Training Leadership

The NVIDIA platform delivered the fastest time to train on every MLPerf Training v5.1 benchmark, with innovations across chips, systems, and software enabling sustained training performance leadership, as shown by industry-standard, peer-reviewed performance data.

Max-Scale Performance

Benchmark Time to Train
LLM Pretraining (Llama 3.1 405B) 10 minutes
LLM Pretraining (Llama 3.1 8B) 5.2 minutes
LLM Fine-Tuning (Llama 2 70B LoRA) 0.40 minutes
Image Generation (FLUX.1) 12.5 minutes
Recommender (DLRM-DCNv2) 0.71 minutes
Graph Neural Network (R-GAT) 0.84 minutes
Object Detection (RetinaNet) 1.4 minutes

MLPerf™ Training v5.0 and v5.1 results retrieved from www.mlcommons.org on November 12, 2025, from the following entries: 5.0-0082, 5.1-0002, 5.1-0004, 5.1-0060, 5.1-0070, 5.1-0072. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.

NVIDIA Delivers Highest Inference Performance, Unmatched Versatility

Blackwell Ultra GPUs powered the highest-performing submissions across the broadest range of models and scenarios in MLPerf Inference v6.0, and only the NVIDIA platform submitted on every newly added benchmark. Through software optimizations alone, the throughput of the GB300 NVL72 increased by up to 2.7x in just one round. And, for the first time, NVIDIA submitted MLPerf Inference results using 288 Blackwell Ultra GPUs across four GB300 NVL72 systems interconnected with NVIDIA Quantum-X800 InfiniBand—the largest submission scale in the benchmark’s history—to deliver record reasoning inference throughput of 2.5 million tokens per second.

MLPerf Inference v5.1 and v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entries: 5.1-0072 and 6.0-0082. Per-chip performance derived by dividing total throughput by number of reported chips. Per-chip performance is not a primary metric of MLPerf Inference v5.1 or v6.0. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
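Using the round-number figures quoted above, the per-chip derivation described in the note is simple arithmetic:

```python
# Per-chip throughput as derived in the MLPerf notes: total tokens/s
# divided by the number of reported chips (not a primary MLPerf metric).
total_tokens_per_s = 2_500_000   # record DeepSeek-R1 reasoning throughput
num_gpus = 288                   # four GB300 NVL72 systems
per_gpu = total_tokens_per_s / num_gpus
print(round(per_gpu))  # 8681 tokens/s per Blackwell Ultra GPU
```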

Same GB300 NVL72, Up to 2.7x More Performance From Software Alone

MLPerf Inference v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entries: 6.0-0076. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.

NVIDIA GB300 NVL72 and NVIDIA Quantum-X800 Power Largest-Ever MLPerf Inference Submission

Record Scale

288 NVIDIA Blackwell Ultra GPUs 

Highest Token Throughput

Up to 2.5 million tokens/second on DeepSeek-R1¹

MLPerf Inference v6.0, Closed Division. Results retrieved from www.mlcommons.org on April 1, 2026. NVIDIA platform results from the following entries: 6.0-0076. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.

¹ Offline scenario

The Technology Behind the Results

The complexity of AI demands a tight integration between all aspects of the platform. As demonstrated in MLPerf’s benchmarks, the NVIDIA AI platform delivers leadership performance with the world’s most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software—an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge with amazing results.

Optimized Software That Accelerates AI Workflows

NVIDIA Dynamo is an open-source distributed inference-serving framework for deploying models in multi-node environments at AI-factory scale. It streamlines distributed serving by disaggregating inference, optimizing routing, and extending memory through data caching to cost-effective storage tiers.

Dynamo works by disaggregating (separating) the prefill and decoding phases of LLM inference across different GPUs, allowing for independent optimization and higher throughput. It was featured prominently in the MLPerf Inference v5.1 benchmarks, demonstrating superior performance in Llama 3.1 405B Interactive and DeepSeek-R1 reasoning tests.
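The disaggregation idea can be illustrated with a toy two-stage pipeline. This is a sketch of the concept only, not Dynamo's actual API: the worker functions, the queue handoff, and the fake KV cache are all hypothetical stand-ins for separate GPU pools connected by a fast interconnect.

```python
from queue import Queue
from threading import Thread

def prefill_worker(prompts, handoff):
    # Compute-bound phase: process the whole prompt once to build a KV cache.
    for prompt in prompts:
        kv_cache = prompt.split()        # stand-in for real KV tensors
        handoff.put((prompt, kv_cache))
    handoff.put(None)                    # sentinel: no more work

def decode_worker(handoff, results):
    # Bandwidth-bound phase: generate tokens using the transferred KV cache.
    while (item := handoff.get()) is not None:
        prompt, kv_cache = item
        results.append((prompt, len(kv_cache) * 2))  # pretend token count

handoff, results = Queue(), []
prompts = ["what is mlperf", "explain nvlink switch topology"]
p = Thread(target=prefill_worker, args=(prompts, handoff))
d = Thread(target=decode_worker, args=(handoff, results))
p.start(); d.start(); p.join(); d.join()
print(results)  # each request decoded with a KV cache built elsewhere
```

Because the two phases stress different resources, running them on separate pools lets each be sized and batched independently, which is where the throughput gain comes from.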

Leadership-Class AI Infrastructure

Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform delivered leading performance powered by the NVIDIA Blackwell and Blackwell Ultra platforms, including the NVIDIA GB300 NVL72 and GB200 NVL72 systems, NVLink and NVLink Switch, and Quantum InfiniBand. These are at the heart of AI factories powered by the NVIDIA data center platform, the engine behind our benchmark performance.

In addition, NVIDIA DGX™ systems offer the scalability, rapid deployment, and incredible compute power that enable every enterprise to build leadership-class AI infrastructure. 

Learn more about our data center training and inference performance.

Reasoning LLMs

MLPerf Inference uses: 

DeepSeek-R1 with samples sourced from the AIME, MATH500, GPQA Diamond, MMLU-Pro, and LiveCodeBench datasets.

GPT-OSS-120B with samples from the AIME 2024, LivecodeBench v6, and GPQA Diamond datasets.

Vision Language Model

MLPerf Inference uses the Qwen3-VL-235B-A22B-Instruct model with the Shopify Product Catalog dataset.

LLMs

MLPerf Inference uses:

Llama 3.1 405B with samples sourced from the LongBench, LongDataCollection, RULER, and GovReport datasets.

Llama 2 70B with the OpenOrca dataset.

Llama 3.1 8B with the CNN/DailyMail dataset.

Mixtral 8x7B with samples sourced from the OpenOrca, GSM8K, and MBXP datasets.

MLPerf Training uses the 405-billion-parameter Llama 3.1 generative language model with a sequence length of 8,192 for the LLM pretraining workload, trained on the C4 (v3.0.1) dataset. For the LLM fine-tuning test, it uses the Llama 2 70B model with the GovReport dataset at a sequence length of 8,192. Llama 3.1 8B pretraining also uses the C4 dataset with a sequence length of 8,192.

Text-to-Video

MLPerf Inference uses Wan-2.2-T2V-A14B with the VBench dataset.

Text-to-Image

MLPerf Training uses the FLUX.1 text-to-image model trained on the CC12M dataset, with the COCO 2014 dataset used for evaluation.

Recommender

MLPerf Inference uses DLRMv3 with a Synthetic Streaming 100B dataset.

MLPerf Training and Inference use the Deep Learning Recommendation Model v2 (DLRMv2), which employs DCNv2 cross layers and a multi-hot dataset synthesized from the Criteo dataset.
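The DCNv2 cross layer that distinguishes this model computes x_{l+1} = x0 * (W xl + b) + xl, where * is elementwise multiplication, so stacking layers raises the order of explicit feature interactions. A minimal sketch with toy dimensions and random weights (not the benchmark implementation):

```python
import random

def cross_layer(x0, xl, W, b):
    """One DCNv2 cross layer: x_{l+1} = x0 * (W @ xl + b) + xl.
    x0 is the original input; * is elementwise multiplication."""
    Wx = [sum(W[i][j] * xl[j] for j in range(len(xl))) + b[i]
          for i in range(len(x0))]
    return [x0[i] * Wx[i] + xl[i] for i in range(len(x0))]

random.seed(0)
d = 4                                              # toy embedding dimension
x0 = [random.uniform(-1, 1) for _ in range(d)]     # concatenated feature embeddings
W = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d)]
b = [0.0] * d
x1 = cross_layer(x0, x0, W, b)   # first cross layer takes xl = x0
x2 = cross_layer(x0, x1, W, b)   # stacking raises the interaction order
print(len(x2))  # 4: output keeps the input dimensionality
```

Note the residual term `+ xl`: with zero weights the layer is an identity, which keeps deep stacks trainable.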

Graph Neural Network

MLPerf Inference uses the Illinois Graph Benchmark (IGB) heterogeneous dataset.

MLPerf Training uses R-GAT with the Illinois Graph Benchmark (IGB) heterogeneous dataset.

Speech-to-Text

MLPerf Inference uses Whisper-Large-V3 with the LibriSpeech dataset.

[Chart: up to 4x higher performance in the Server scenario and 3.7x in the Offline scenario.]

NVIDIA Blackwell Architecture Highlights

AI Superchip: 208B transistors
2nd-Gen Transformer Engine: FP4/FP6 Tensor Core
5th-Generation NVLink: scales to 576 GPUs
RAS Engine: 100% in-system self-test
Secure AI: full-performance encryption and TEE
Decompression Engine: 800 GB/s