MLPerf Benchmarks

The NVIDIA AI platform showcases leading performance and versatility in MLPerf Training, Inference, and HPC for the most demanding, real-world AI workloads.

What Is MLPerf?

MLPerf™ is a consortium of AI leaders from academia, research labs, and industry whose mission is to “build fair and useful benchmarks” that provide unbiased evaluations of training and inference performance for hardware, software, and services—all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.

Chalmers University is one of the leading research institutions in Sweden, specializing in multiple areas from nanotechnology to climate studies. As we incorporate AI to advance our research endeavors, we find that the MLPerf benchmark provides a transparent apples-to-apples comparison across multiple AI platforms to showcase actual performance in diverse real-world use cases.

— Chalmers University of Technology, Sweden

TSMC is driving the cutting edge of global semiconductor manufacturing, like our latest 5nm node which leads the market in process technology. Innovations like machine learning based lithography and etch modeling dramatically improve our optical proximity correction (OPC) and etch simulation accuracy. To fully realize the potential of machine learning in model training and inference, we are working with the NVIDIA engineering team to port our Maxwell simulation and inverse lithography technology (ILT) engine to GPUs and see very significant speedups. The MLPerf benchmark is an important factor in our decision making.

— Dr. Danping Peng, Director, OPC Department, TSMC, San Jose, CA, USA

Computer vision and imaging are at the core of AI research, driving scientific discovery and readily representing core components of medical care. We've worked closely with NVIDIA to bring innovations like 3DUNet to the healthcare market. Industry-standard MLPerf benchmarks provide relevant performance data to the benefit of IT organizations and developers to get the right solution to accelerate their specific projects and applications.

— Prof. Dr. Klaus Maier-Hein, Head of Medical Image Computing, Deutsches Krebsforschungszentrum (DKFZ, German Cancer Research Center)

As the pre-eminent leader in research and manufacturing, Samsung uses AI to dramatically boost product performance and manufacturing productivity. Productizing these AI advances requires us to have the best computing platform available. The MLPerf benchmark streamlines our selection process by providing us with an open, direct evaluation method to assess uniformly across platforms.

— Samsung Electronics

MLPerf Submission Categories

MLPerf Training v2.1 is the seventh iteration for training and consists of eight different workloads covering a broad diversity of use cases, including vision, language, recommenders, and reinforcement learning.

MLPerf Inference v3.0 is the seventh iteration for inference and tested seven different use cases across seven different kinds of neural networks. Three of these use cases were for computer vision, one was for recommender systems, two were for language processing, and one was for medical imaging.

MLPerf HPC v2.0 is the third iteration for HPC and tested three different scientific computing use cases, including climate atmospheric river identification, cosmology parameter prediction, and quantum molecular modeling.

Image Classification

Assigns a label from a fixed set of categories to an input image; a core computer vision task.
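In code, the task reduces to picking the highest-scoring category for an image. A minimal, dependency-free sketch (the model, labels, and scoring rule here are hypothetical stand-ins, not the MLPerf reference model, which is ResNet-50 v1.5 on ImageNet):

```python
# Toy image classifier: assign one label from a fixed set of categories.
LABELS = ["cat", "dog", "bicycle"]  # hypothetical fixed category set

def toy_model(image):
    """Stand-in for a trained network: returns one score per label."""
    # Score each class with a trivial hand-picked rule on mean pixel value.
    mean = sum(image) / len(image)
    return [mean, 1.0 - mean, 0.1]

def classify(image):
    scores = toy_model(image)
    # The predicted label is the argmax over the per-class scores.
    best = max(range(len(LABELS)), key=lambda i: scores[i])
    return LABELS[best]

print(classify([0.9, 0.8, 0.7]))  # high mean pixel value -> "cat"
```

A real submission replaces `toy_model` with a trained network, but the label-selection step is the same argmax.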

Object Detection (Lightweight)

Finds instances of real-world objects such as faces, bicycles, and buildings in images or videos and specifies a bounding box around each.

Object Detection (Heavyweight)

Detects distinct objects of interest appearing in an image and identifies a pixel mask for each.

Biomedical Image Segmentation

Performs volumetric segmentation of dense 3D images for medical use cases.

Translation (Recurrent)

Translates text from one language to another using a recurrent neural network (RNN).

Automatic Speech Recognition (ASR)

Recognizes and transcribes spoken audio in real time.

Natural Language Processing (NLP)

Understands text by using the relationships between different words in a block of text. Enables question answering, sentence paraphrasing, and many other language-related use cases.
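Models like BERT capture those word-to-word relationships with attention, which weights every word by its relevance to every other word. A toy, dependency-free sketch of scaled dot-product attention (the 2-D "word vectors" are made up for illustration; real models use learned embeddings with hundreds of dimensions):

```python
import math

def softmax(xs):
    """Normalize scores into weights that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over toy word vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this word to every word in the block.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Blend the value vectors by relevance.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Three "words" as 2-D vectors; each attends to all of them.
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(vecs, vecs, vecs)
print([round(x, 2) for x in mixed[0]])  # -> [0.8, 0.6]
```

Each output vector is a relevance-weighted mixture of the whole block, which is how the model "uses the relationships between different words."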



Recommendation

Delivers personalized results in user-facing services such as social media or e-commerce websites by understanding interactions between users and service items, like products or ads.

Reinforcement Learning

Evaluates different possible actions to maximize reward using the strategy game Go, played on a 19x19 grid.

Climate Atmospheric River Identification

Identifies hurricanes and atmospheric rivers in climate simulation data.

Cosmology Parameter Prediction

Solves a 3D image regression problem on cosmological data.

Quantum Molecular Modeling

Predicts energies or molecular configurations.

NVIDIA’s MLPerf Benchmark Results

  • Training
  • Inference
  • HPC

The NVIDIA AI platform delivered leading performance across all MLPerf Training v2.1 tests, both per chip and at scale. This breakthrough performance came from the tight integration of hardware, software, and system-level technologies. NVIDIA’s relentless investment across the entire stack has driven performance improvements with each MLPerf submission. The NVIDIA platform is unmatched in overall performance and versatility, delivering a single platform for training and inference that’s available everywhere—from the data center to the edge to the cloud.

A Nearly 7X Performance Boost in 2.5 Years

NVIDIA's Full-Stack Innovation Fuels Continuous Gains

MLPerf Training Performance Benchmarks

Leading Performance and Versatility at Scale

The NVIDIA Selene AI supercomputer, based on the NVIDIA DGX SuperPOD™ reference architecture, delivered leading performance in the MLPerf Training 2.1 suite at scale, demonstrating the performance and versatility of the full-stack NVIDIA AI platform for all workloads.    

Benchmark At-Scale (Min) Per-Accelerator (Min)
Recommendation (DLRM) 0.59 (DGX SuperPOD) 12.78 (A100)
NLP (BERT) 0.21 (DGX SuperPOD) 126.95 (A100)
Speech Recognition—Recurrent (RNN-T) 2.15 (DGX SuperPOD) 230.07 (A100)
Object Detection—Heavyweight (Mask R-CNN) 3.09 (DGX SuperPOD) 327.34 (A100)
Object Detection—Lightweight (RetinaNet) 4.25 (DGX SuperPOD) 675.18 (A100)
Image Classification (ResNet-50 v1.5) 0.32 (DGX SuperPOD) 217.82 (A100)
Image Segmentation (3D U-Net) 1.22 (DGX SuperPOD) 170.23 (A100)
Reinforcement Learning (MiniGo) 16.23 (DGX SuperPOD) 2,045.40 (A100)
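Dividing the per-accelerator time by the at-scale time gives the rough end-to-end speedup the full DGX SuperPOD achieved over a single A100 on each workload. A quick sketch using three rows from the table above (this is simple arithmetic on the published minutes and ignores scaling-efficiency details):

```python
# Implied at-scale speedup from the MLPerf Training v2.1 table:
# per-accelerator minutes / at-scale minutes.
results = {
    "NLP (BERT)": (0.21, 126.95),                         # (at-scale, per-accelerator)
    "Image Classification (ResNet-50 v1.5)": (0.32, 217.82),
    "Recommendation (DLRM)": (0.59, 12.78),
}

for name, (at_scale, per_accel) in results.items():
    speedup = per_accel / at_scale
    print(f"{name}: {speedup:,.0f}x faster at scale")
```

For BERT, for example, 126.95 / 0.21 works out to roughly a 605x reduction in time to train.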

Powered by the NVIDIA H100 Tensor Core GPU, the NVIDIA platform took inference to new heights in MLPerf Inference v3.0, delivering performance leadership across all workloads and scenarios in the data center category. And the NVIDIA L4 Tensor Core GPU delivered over 3X more performance than the prior generation in its MLPerf Inference debut. For edge AI and robotics, NVIDIA Jetson AGX Orin™ continued to deliver leadership system-on-a-chip inference performance.

Offline Scenario for Data Center and Edge (Single GPU)

Benchmark (Max Inferences/Query) NVIDIA H100 NVIDIA A100 NVIDIA L4 NVIDIA Jetson AGX Orin NVIDIA Jetson Orin NX
DLRM (Recommendation) 745,480 282,771 94,603 N/A* N/A*
BERT (Natural Language Processing)** 8,007 1,828 631 544 164
ResNet-50 v1.5 (Image Classification) 91,826 40,577 13,158 6,438 2,518
RetinaNet (Object Detection) 1,479 725 179 92 36
RNN-T (Speech Recognition) 23,106 13,278 3,980 1,170 405
3D U-Net (Medical Imaging) 7 4 1 0.5 0.2
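In the offline scenario, MLPerf delivers the entire query set at once with no latency constraint and measures aggregate throughput, so a submission is free to batch as aggressively as it likes. A minimal sketch of that measurement style (the model and workload here are trivial stand-ins, not the MLPerf LoadGen harness):

```python
import time

def measure_offline_throughput(run_batch, samples, batch_size=64):
    """Offline-style metric: process every sample, report samples/sec."""
    start = time.perf_counter()
    for i in range(0, len(samples), batch_size):
        # No per-query latency bound in offline mode: just drain the set.
        run_batch(samples[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed

# Stand-in "model": sums each sample's values.
throughput = measure_offline_throughput(lambda batch: [sum(s) for s in batch],
                                        [[1.0] * 8 for _ in range(4096)])
print(f"{throughput:,.0f} samples/sec")
```

The server and edge scenarios differ mainly in adding latency constraints and single-stream query patterns on top of this basic loop.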

In MLPerf HPC v2.0, the NVIDIA Selene supercomputer, built on the NVIDIA DGX SuperPOD reference architecture, demonstrated leading performance across all three workloads spanning climate segmentation, cosmology parameter prediction, and quantum molecular modeling. The NVIDIA AI platform delivered the fastest time to train as measured by the strong scaling metric, as well as the highest throughput as measured by the weak scaling metric. Through full-stack innovation, the NVIDIA AI platform continues to raise the bar to meet the increasing compute demands fueled by the convergence of HPC and AI, delivering up to a 9X performance boost in just two years of MLPerf HPC.
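The two HPC metrics can be stated simply: strong scaling measures the wall-clock time to train one model using the whole machine, while weak scaling measures how many independent models the machine can train per unit time. A small sketch of both (the numbers are illustrative placeholders, not MLPerf results):

```python
def strong_scaling_time(train_minutes):
    """Strong scaling: time to train a single model end to end
    (lower is better)."""
    return train_minutes

def weak_scaling_throughput(concurrent_models, minutes_per_model):
    """Weak scaling: models trained per minute when filling the machine
    with many independent training runs (higher is better)."""
    return concurrent_models / minutes_per_model

# Illustrative only: one big run finishes in 2 minutes; alternatively,
# the machine trains 16 smaller-partition models of 8 minutes each.
print(strong_scaling_time(2.0))          # minutes per model
print(weak_scaling_throughput(16, 8.0))  # models per minute
```

A system can lead on one metric but not the other, which is why MLPerf HPC reports both.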

Up to 9X More Performance in Two Years

NVIDIA's Full-Stack Innovation Delivers Continuous Improvements

MLPerf HPC v2.0 Performance

The Technology Behind the Results

The complexity of AI demands a tight integration between all aspects of the platform. As demonstrated in MLPerf’s benchmarks, the NVIDIA AI platform delivers leadership performance with the world’s most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software—an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge with amazing results.

Pre-Trained Models and Optimized Software from NVIDIA NGC

Optimized Software that Accelerates AI Workflows

An essential component of NVIDIA’s platform and MLPerf training and inference results, the NGC catalog is a hub for GPU-optimized AI, high-performance computing (HPC), and data analytics software that simplifies and accelerates end-to-end workflows. With over 150 enterprise-grade containers—including workloads for conversational AI and recommender systems, hundreds of AI models, and industry-specific SDKs that can be deployed on premises, in the cloud, or at the edge—NGC enables data scientists, researchers, and developers to build best-in-class solutions, gather insights, and deliver business value faster than ever before.

Leadership-Class AI Infrastructure

Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform delivered leading performance using the power of the NVIDIA H100 Tensor Core GPU, the NVIDIA A100 Tensor Core GPU, the NVIDIA L4 Tensor Core GPU, and the scalability and flexibility of NVIDIA interconnect technologies—NVIDIA® NVLink®, NVIDIA NVSwitch™, and NVIDIA ConnectX® smart network interface cards (SmartNICs). These are at the heart of the NVIDIA data center platform, the engine behind our benchmark performance.

In addition, NVIDIA DGX systems offer the scalability, rapid deployment, and incredible compute power that enables every enterprise to build leadership-class AI infrastructure. 

Learn more about our data center training and inference product performance.