Setting a New Bar in MLPerf

NVIDIA training and inference solutions deliver record-setting performance in MLPerf, the leading industry benchmark for AI performance.

What is MLPerf?

MLPerf is a consortium of AI leaders from academia, research labs, and industry whose mission is to “build fair and useful benchmarks” that provide unbiased evaluations of training and inference performance for hardware, software, and services—all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.

MLPerf Submission Categories

MLPerf Training v0.7 is the third round of the training benchmark and consists of eight different workloads covering a broad range of use cases, including vision, language, recommenders, and reinforcement learning.

MLPerf Inference v0.5 tested three different use cases across five different kinds of neural networks. Four of these networks were for computer vision, and the fifth was for language translation.

Image Classification

Assigns a label from a fixed set of categories to an input image, a core computer vision task in areas such as autonomous vehicles.
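As a minimal illustration of the task (not the MLPerf harness itself), image classification reduces to picking the highest-scoring class from a model's output vector. The labels and logits below are invented for the sketch; a real submission runs a network such as ResNet-50 v1.5 over ImageNet images:

```python
# Toy sketch of image classification: pick the top class from model scores.
# Labels and logits are invented for illustration only.
import math

LABELS = ["cat", "dog", "truck", "traffic light"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, prob = classify([1.2, 0.3, 4.1, -0.5])
print(label)  # truck
```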

Object Detection (Lightweight)

Finds instances of real-world objects such as faces, bicycles, and buildings in images or videos and specifies a bounding box around each.
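Detector quality is commonly scored by how well predicted bounding boxes overlap the ground truth, using intersection over union (IoU). A minimal sketch, with boxes as `(x1, y1, x2, y2)` tuples and values invented for illustration:

```python
# Intersection-over-union (IoU) between two axis-aligned bounding boxes,
# the standard overlap measure used to score object detectors.
# Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Identical boxes overlap perfectly; disjoint boxes score 0.
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # 1.0
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0
```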

Object Detection (Heavyweight)

Detects distinct objects of interest appearing in an image and identifies a pixel mask for each.

Translation (Recurrent)

Translates text from one language to another using a recurrent neural network (RNN).

Translation (Non-recurrent)

Translates text from one language to another using a feed-forward neural network.

Natural Language Processing (NLP)

Understands text by using the relationship between different words in a block of text. Allows for question answering, sentence paraphrasing, and many other language-related use cases.
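The "relationship between different words" that models like BERT exploit is computed with attention. Below is a toy scaled dot-product self-attention over made-up 2-D word vectors (illustrative only, not BERT's actual architecture or weights):

```python
# Toy scaled dot-product self-attention: each word's output is a weighted mix
# of all word vectors, with weights from query-key similarity. Vectors invented.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(vectors):
    """Self-attention with queries = keys = values = the input vectors."""
    d_k = len(vectors[0])
    out = []
    for q in vectors:
        # Similarity of this word to every word, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in vectors]
        weights = softmax(scores)
        # Weighted mix of all word vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, vectors))
                    for i in range(d_k)])
    return out

# Three "word" vectors; each output row is a convex combination of the inputs.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(words)
```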

Recommendation

Delivers personalized results in user-facing services such as social media or e-commerce websites by understanding interactions between users and service items, like products or ads.

Reinforcement Learning

Evaluates different possible actions to maximize reward using the strategy game Go played on a 19x19 grid.

NVIDIA’s MLPerf Benchmark Results

Training

The NVIDIA A100 Tensor Core GPU and the NVIDIA DGX SuperPOD set all 16 training performance records, both in per-chip and at-scale workloads for commercially available systems. This breakthrough performance came from the tight integration of hardware, software, and system-level technologies. NVIDIA's continuous investment in full-stack performance has yielded up to 4x more throughput across its three MLPerf training submissions.

UP TO 4X THE PERFORMANCE IN 1.5 YEARS OF MLPERF

NVIDIA's Full Stack Innovation Delivers Continuous Improvements

NVIDIA sets all 16 records

For Commercially Available Solutions

Benchmark                                    Max Scale Records   Per-Accelerator Records
Recommendation (DLRM)                        3.33 min            0.44 hrs
NLP (BERT)                                   0.81 min            6.53 hrs
Reinforcement Learning (MiniGo)              17.07 min           39.96 hrs
Translation (Non-recurrent) (Transformer)    0.62 min            1.05 hrs
Translation (Recurrent) (GNMT)               0.71 min            1.04 hrs
Object Detection (Heavyweight) (Mask R-CNN)  10.46 min           10.95 hrs
Object Detection (Lightweight) (SSD)         0.82 min            1.36 hrs
Image Classification (ResNet-50 v1.5)        0.76 min            5.30 hrs

Inference

NVIDIA achieves top results in all four scenarios (Server, Offline, Single-Stream, and Multi-Stream). In addition, we deliver the best per-accelerator performance among commercially available products across all five benchmark tests. These results are a testament not only to NVIDIA's inference performance leadership but also to the versatility of our inference platform.
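The scenarios measure different things: Offline reports raw throughput over a batch of samples, while Single-Stream reports a high-percentile latency for one query at a time. A toy sketch of how those two metrics fall out of a list of per-sample latencies (the numbers are invented, and this is not the official MLPerf LoadGen harness):

```python
# Toy illustration of two MLPerf Inference metrics derived from per-sample
# latencies in seconds. Not the official LoadGen; latencies are invented.
import math

def offline_throughput(latencies):
    """Offline scenario: samples per second over the whole run, assuming
    the samples were processed back to back."""
    return len(latencies) / sum(latencies)

def single_stream_latency(latencies, percentile=0.90):
    """Single-Stream scenario: the 90th-percentile latency of one-at-a-time
    queries (simple nearest-rank percentile for the sketch)."""
    ordered = sorted(latencies)
    rank = max(0, math.ceil(percentile * len(ordered)) - 1)
    return ordered[rank]

lat = [0.010, 0.012, 0.011, 0.010, 0.050, 0.011, 0.010, 0.012, 0.011, 0.013]
print(round(offline_throughput(lat), 1))   # 66.7 samples/second
print(single_stream_latency(lat))          # 0.013 seconds
```

Note that one slow outlier (0.050 s) barely moves the throughput number but would dominate a worst-case latency metric, which is why the scenarios are reported separately.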

Results for the Server scenario (data center GPUs) and the Multi-Stream scenario (edge)

NVIDIA Turing Architecture

Network            NVIDIA T4             NVIDIA TITAN RTX      NVIDIA Jetson Xavier
                   (Inferences/Second)   (Inferences/Second)   (Max Inferences/Query)
MobileNet-v1       16,884                47,775                302
ResNet-50 v1.5     5,193                 15,008                100
SSD MobileNet-v1   7,078                 20,501                102
SSD ResNet-34      126                   338                   2
GNMT               198                   645                   N/A

The Technology Behind the Results

The complexity of AI demands a tight integration between all aspects of the platform. As demonstrated in MLPerf's benchmarks, the NVIDIA AI platform delivers leadership performance with the world's most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software—an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge with incredible results.

Optimized Software that Accelerates AI Workflows

An essential component of NVIDIA’s platform and MLPerf training and inference results, NGC is a hub for GPU-optimized AI, high-performance computing (HPC), and data analytics software that simplifies and accelerates end-to-end workflows. With over 150 enterprise-grade containers, including workloads for Conversational AI and recommender systems, over 100 models, and industry-specific SDKs that can be deployed on-premises, in the cloud, or at the edge, NGC enables data scientists, researchers, and developers to build best-in-class solutions, gather insights, and deliver business value faster than ever before.

Leadership-Class AI Infrastructure

Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform is delivered using the power of the NVIDIA A100 Tensor Core GPU, the NVIDIA T4 Tensor Core GPU, and the scalability and flexibility of NVIDIA interconnect technologies: NVLink®, NVSwitch, and the Mellanox ConnectX-6 VPI. These are at the heart of the NVIDIA DGX A100, the engine behind our benchmark performance.

NVIDIA DGX systems offer the scalability, rapid deployment, and incredible compute power that can enable every enterprise to build leadership-class AI infrastructure.

Learn more about our data center training and inference product performance.