MLPerf Benchmarks

The NVIDIA AI platform showcases leading performance and versatility in MLPerf Training, Inference, and HPC for the most demanding, real-world AI workloads.

What Is MLPerf?

MLPerf™ benchmarks—developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry—are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services. They’re all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.

Chalmers University is one of the leading research institutions in Sweden, specializing in multiple areas from nanotechnology to climate studies. As we incorporate AI to advance our research endeavors, we find that the MLPerf benchmark provides a transparent apples-to-apples comparison across multiple AI platforms to showcase actual performance in diverse real-world use cases.

— Chalmers University of Technology, Sweden

TSMC is driving the cutting edge of global semiconductor manufacturing, like our latest 5nm node which leads the market in process technology. Innovations like machine learning based lithography and etch modeling dramatically improve our optical proximity correction (OPC) and etch simulation accuracy. To fully realize the potential of machine learning in model training and inference, we are working with the NVIDIA engineering team to port our Maxwell simulation and inverse lithography technology (ILT) engine to GPUs and see very significant speedups. The MLPerf benchmark is an important factor in our decision making.

— Dr. Danping Peng, Director, OPC Department, TSMC, San Jose, CA, USA

Computer vision and imaging are at the core of AI research, driving scientific discovery and readily representing core components of medical care. We've worked closely with NVIDIA to bring innovations like 3DUNet to the healthcare market. Industry-standard MLPerf benchmarks provide relevant performance data to the benefit of IT organizations and developers to get the right solution to accelerate their specific projects and applications.

— Prof. Dr. Klaus Maier-Hein, Head of Medical Image Computing, Deutsches Krebsforschungszentrum (DKFZ, German Cancer Research Center)

As the preeminent leader in research and manufacturing, Samsung uses AI to dramatically boost product performance and manufacturing productivity. Productizing these AI advances requires us to have the best computing platform available. The MLPerf benchmark streamlines our selection process by providing us with an open, direct evaluation method to assess uniformly across platforms.

— Samsung Electronics

Inside the MLPerf Benchmarks

MLPerf Training v3.1 measures the time to train models across nine different use cases, including large language models (LLMs), image generation, computer vision, medical image segmentation, speech recognition, and recommendation.

MLPerf Inference v3.1 measures inference performance using seven different kinds of neural networks, including LLMs, natural language processing, computer vision, and medical image segmentation.

MLPerf HPC v3.0 measures training performance across four different scientific computing use cases, including climate atmospheric river identification, cosmology parameter prediction, quantum molecular modeling, and protein structure prediction. 

Large Language Model (LLM)

Deep learning algorithms trained on large-scale datasets that can recognize, summarize, translate, predict, and generate content for a breadth of use cases.

Text-to-Image

Generates images from text prompts.

Recommendation

Delivers personalized results in user-facing services such as social media or ecommerce websites by understanding interactions between users and service items, like products or ads.

Object Detection (Lightweight)

Finds instances of real-world objects such as faces, bicycles, and buildings in images or videos and specifies a bounding box around each.

Object Detection (Heavyweight)

Detects distinct objects of interest appearing in an image and identifies a pixel mask for each.

Image Classification

Assigns a label from a fixed set of categories to an input image; a foundational computer vision task.

Natural Language Processing (NLP)

Understands text by using the relationships between different words in a block of text, enabling question answering, sentence paraphrasing, and many other language-related use cases.

Automatic Speech Recognition (ASR)

Recognizes and transcribes audio in real time.

Biomedical Image Segmentation

Performs volumetric segmentation of dense 3D images for medical use cases.

Climate Atmospheric River Identification

Identifies hurricanes and atmospheric rivers in climate simulation data.

Cosmology Parameter Prediction

Solves a 3D image regression problem on cosmological data.

Quantum Molecular Modeling

Predicts energies or molecular configurations.

Protein Structure Prediction

Predicts three-dimensional protein structure based on one-dimensional amino acid connectivity.

NVIDIA MLPerf Benchmark Results

Training

The NVIDIA accelerated computing platform, powered by NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking, shattered large language model (LLM) training performance records in MLPerf Training v3.1, powering two submissions on the GPT-3 175B benchmark at an unprecedented scale of 10,752 H100 GPUs with near-linear scaling efficiency. And on the newly added text-to-image test based on Stable Diffusion v2, the NVIDIA platform set the standard, achieving the highest performance and demonstrating unrivaled scalability. Through relentless full-stack engineering at data center scale, NVIDIA continues to accelerate AI training performance at the speed of light.

NVIDIA Sets a New Large Language Model Training Record With Largest MLPerf Submission Ever

Per-Accelerator Records (NVIDIA H100 Tensor Core GPU)

Benchmark: Time to Train
Large Language Model (GPT-3 175B): 548 hours (23 days)
Natural Language Processing (BERT): 0.71 hours
Recommendation (DLRM-DCNv2): 0.56 hours
Speech Recognition (RNN-T): 2.2 hours
Image Classification (ResNet-50 v1.5): 1.8 hours
Object Detection, Heavyweight (Mask R-CNN): 2.6 hours
Object Detection, Lightweight (RetinaNet): 4.9 hours
Image Segmentation (3D U-Net): 1.6 hours

NVIDIA AI Platform Achieves Highest Performance on Every MLPerf Training Test

In addition to breakthrough performance at scale on the cutting-edge large language model and text-to-image tests, NVIDIA also achieved new performance records on the recommender, object detection, medical image segmentation, and natural language processing workloads in MLPerf Training v3.1. With NVIDIA H100 GPUs and NVIDIA Quantum-2, the NVIDIA platform continues to deliver the fastest time to train on every benchmark, demonstrating its unmatched performance, scalability, and versatility to handle the full range of AI workloads. 

Max-Scale Performance

Benchmark: Time to Train
GPT-3: 3.92 minutes
Stable Diffusion v2: 2.47 minutes
DLRM-DCNv2: 1.0 minutes
BERT-large: 0.12 minutes
ResNet-50 v1.5: 0.18 minutes
Mask R-CNN: 1.5 minutes
RetinaNet: 0.92 minutes
3D U-Net: 0.77 minutes
RNN-T: 1.7 minutes
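The two tables above imply a rough scaling-efficiency figure for the GPT-3 run. The sketch below is a back-of-envelope estimate, assuming the per-accelerator time approximates an ideal single-GPU-normalized training time (a simplification of how MLPerf actually normalizes results):

```python
# Back-of-envelope scaling-efficiency estimate from the two tables above.
# Assumes the per-accelerator figure approximates ideal single-GPU time,
# which is a simplification of MLPerf's normalization rules.

def scaling_efficiency(per_accel_hours, max_scale_minutes, num_gpus):
    """Ratio of ideal GPU-minutes to actual GPU-minutes at scale (1.0 = linear)."""
    ideal_gpu_minutes = per_accel_hours * 60
    actual_gpu_minutes = max_scale_minutes * num_gpus
    return ideal_gpu_minutes / actual_gpu_minutes

# GPT-3 175B: 548 hours per accelerator; 3.92 minutes on 10,752 H100 GPUs.
eff = scaling_efficiency(548, 3.92, 10_752)
print(f"GPT-3 scaling efficiency: {eff:.0%}")
```

Under that assumption, the 10,752-GPU run retains roughly 78% of ideal linear scaling, which is consistent with the near-linear scaling efficiency described above.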

Inference

The NVIDIA H100 Tensor Core GPU powered the highest-throughput systems across every MLPerf Inference v3.1 data center workload and scenario. In its MLPerf debut, the NVIDIA GH200 Grace Hopper™ Superchip ran every workload and extended the exceptional performance of H100. The NVIDIA L4 Tensor Core GPU, optimized to be the most efficient NVIDIA accelerator for mainstream servers, also achieved great results across the board. For energy-efficient edge AI and robotics applications, NVIDIA Jetson AGX Orin™ and Jetson Orin NX continued to demonstrate outstanding system-on-module inference capabilities.

Offline Scenario for Data Center and Edge (Single GPU)

| Benchmark | NVIDIA GH200 Grace Hopper Superchip (Inferences/Second) | NVIDIA H100 (Inferences/Second) | NVIDIA L4 (Inferences/Second) | NVIDIA Jetson AGX Orin (Max Inferences/Query) | NVIDIA Jetson Orin NX (Max Inferences/Query) |
| --- | --- | --- | --- | --- | --- |
| GPT-J (Large Language Model) | 13.34 | 13.29 | 1.30 | N/A | N/A |
| DLRMv2 (Recommender) | 49,002 | 42,856 | 3,673 | N/A* | N/A* |
| BERT (Natural Language Processing)** | 8,646 | 7,878 | 631 | 554 | 195 |
| ResNet-50 v1.5 (Image Classification) | 93,198 | 88,526 | 12,882 | 6,424 | 2,641 |
| RetinaNet (Object Detection) | 1,849 | 1,761 | 226 | 149 | 67 |
| RNN-T (Speech Recognition) | 25,975 | 23,307 | 3,899 | 1,170 | 432 |
| 3D U-Net (Medical Imaging) | 6.8 | 6.5 | 1.07 | 0.51 | 0.20 |
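One way to read the table is to compare accelerators workload by workload. The short script below, with throughput numbers transcribed from the table, computes the GH200's offline throughput relative to the H100:

```python
# Relative throughput of GH200 vs. H100 in the offline scenario,
# with numbers transcribed from the table above.
gh200 = {"GPT-J": 13.34, "DLRMv2": 49_002, "BERT": 8_646,
         "ResNet-50 v1.5": 93_198, "RetinaNet": 1_849, "RNN-T": 25_975}
h100 = {"GPT-J": 13.29, "DLRMv2": 42_856, "BERT": 7_878,
        "ResNet-50 v1.5": 88_526, "RetinaNet": 1_761, "RNN-T": 23_307}

for workload in gh200:
    ratio = gh200[workload] / h100[workload]
    print(f"{workload}: GH200 delivers {ratio:.2f}x H100 throughput")
```

The ratios show the GH200 matching or exceeding the H100 on every workload, with the largest gain on the DLRMv2 recommender.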

HPC

The NVIDIA H100 Tensor Core GPU supercharged the NVIDIA platform for HPC and AI in its MLPerf HPC v3.0 debut, delivering up to 16X faster time to train in just three years and the highest performance on all workloads across both time-to-train and throughput metrics. The NVIDIA platform was also the only one to submit results for every MLPerf HPC workload, which span climate segmentation, cosmology parameter prediction, quantum molecular modeling, and the latest addition, protein structure prediction. The unmatched performance and versatility of the NVIDIA platform make it the instrument of choice to power the next wave of AI-powered scientific discovery.

Up to 16X More Performance in 3 Years

NVIDIA Full-Stack Innovation Fuels Performance Gains

The Technology Behind the Results

The complexity of AI demands a tight integration between all aspects of the platform. As demonstrated in MLPerf’s benchmarks, the NVIDIA AI platform delivers leadership performance with the world’s most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software—an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge with amazing results.

Pre-trained models and Optimized Software from NVIDIA NGC

Optimized Software that Accelerates AI Workflows

An essential component of NVIDIA’s platform and MLPerf training and inference results, the NGC™ catalog is a hub for GPU-optimized AI, HPC, and data analytics software that simplifies and accelerates end-to-end workflows. With over 150 enterprise-grade containers—including workloads for generative AI, conversational AI, and recommender systems; hundreds of AI models; and industry-specific SDKs that can be deployed on premises, in the cloud, or at the edge—NGC enables data scientists, researchers, and developers to build best-in-class solutions, gather insights, and deliver business value faster than ever.

Leadership-Class AI Infrastructure

Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform delivered leading performance powered by the NVIDIA GH200 Grace Hopper Superchip, the NVIDIA H100 Tensor Core GPU, the NVIDIA L4 Tensor Core GPU, and the scalability and flexibility of NVIDIA interconnect technologies—NVIDIA NVLink®, NVSwitch™, and Quantum-2 InfiniBand. These are at the heart of the NVIDIA data center platform, the engine behind our benchmark performance.

In addition, NVIDIA DGX™ systems offer the scalability, rapid deployment, and incredible compute power that enable every enterprise to build leadership-class AI infrastructure. 

Learn more about our data center training and inference product performance.