AI Training Solutions
From pretraining to post-training, NVIDIA accelerated computing optimizes time to train, cost to train, and training goodput at production scale.
Overview
Frontier open, closed, and industry models are built on NVIDIA. Using extreme co-design across compute, networking, storage, memory, and software, NVIDIA delivers industry-leading performance per watt and continuous optimizations. This shortens time to train, reduces cost to train, and enables agentic capabilities for model builders.
Whether pretraining a trillion-parameter mixture-of-experts (MoE) model, post-training a reasoning agent with reinforcement learning, or fine-tuning a domain-specific researcher, the NVIDIA accelerated computing platform delivers production-scale training for the world’s leading AI companies.
Benchmarks
The NVIDIA platform delivered the fastest time to train and the highest performance per GPU in the MLPerf Training 6.0 benchmarks. NVIDIA submitted results on both GB200 NVL72 and GB300 NVL72 systems; at the same scale, GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72. This round also added two new mixture-of-experts pretraining workloads, DeepSeek-V3 671B and GPT-OSS-20B, and NVIDIA set performance records on both. On DeepSeek-V3 671B, NVIDIA scaled to 8,192 GPUs using GB200 NVL72 systems, the largest-scale Blackwell-based submission in MLPerf Training to date.
MLPerf Training v5.0, v5.1, and v6.0 results retrieved from www.mlcommons.org on June 16, 2026. MLPerf GPU scale results from entries 5.0-0004, 5.1-004, 6.0-0001, 6.0-0005, and 6.0-0014. MLPerf Blackwell training comparison from the following entries: 6.0-0006, 6.0-0013, 6.0-0017, 6.0-0018, 6.0-0078, and 5.1-0072. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
Benefits
The NVIDIA platform delivers faster time to train, high training goodput, and a full software stack that is continuously optimized to improve performance over time.
NVIDIA Blackwell delivers up to 3x faster training and nearly 2x higher training performance per dollar than the previous generation. NVIDIA wins every MLPerf Training benchmark, from LLM pretraining to fine-tuning to graph neural networks, and continues to extend leadership with each software release.
NVIDIA NeMo™ RL with end-to-end FP8 precision delivers higher rollout throughput and stable convergence for production-scale reinforcement learning (RL). Train multi-environment, multi-turn agents that learn continuously and deliver the same accuracy as BF16 at a fraction of the cost.
Hybrid expert parallelism, NVFP4 precision, and NVL72 scale-up make NVIDIA the platform of choice for the world's most intelligent MoE models, from open-source models to frontier proprietary systems. Train more parameters faster within the same power envelope.
From NVIDIA libraries such as Megatron-Core and NeMo to industry frameworks, including PyTorch, JAX, vLLM, SGLang, and Ray, every major training, post-training, and inference framework runs natively on NVIDIA. Open weights, open recipes, open contributions: 1.5 million AI models on Hugging Face run on NVIDIA® CUDA®.
Products
NVIDIA delivers extreme co-design across compute, networking, storage, memory, and AI software, the only vertically integrated, horizontally open AI training platform in the industry.
Use Cases
Pretrain AI models to give them foundational capabilities, such as understanding language, recognizing audio, video, or images, or identifying relationships in data, before adapting them for specific tasks. Maximize AI training goodput by enabling system-wide health checks, quickly detecting faults at runtime, and resuming training automatically.
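To make the goodput idea concrete, here is a minimal sketch of how faster fault detection and automatic resume can raise it. It assumes one common working definition, goodput as the share of wall-clock time spent on training progress that is kept, and a simple loss model in which each fault costs detection time, restart time, and on average half a checkpoint interval of redone work. The function names, parameters, and example numbers are hypothetical and illustrative only; they are not taken from any NVIDIA tool or benchmark.

```python
def lost_hours_per_fault(detect_hours: float,
                         restart_hours: float,
                         checkpoint_interval_hours: float) -> float:
    """Estimated hours lost to a single fault (assumed model).

    A fault costs the time to detect it, the time to restart the job,
    and, on average, half a checkpoint interval of work that is redone.
    """
    return detect_hours + restart_hours + checkpoint_interval_hours / 2


def estimate_goodput(wall_clock_hours: float,
                     faults: int,
                     detect_hours: float,
                     restart_hours: float,
                     checkpoint_interval_hours: float) -> float:
    """Fraction of wall-clock time spent on progress that is kept."""
    if wall_clock_hours <= 0:
        raise ValueError("wall_clock_hours must be positive")
    lost = faults * lost_hours_per_fault(
        detect_hours, restart_hours, checkpoint_interval_hours
    )
    # Productive time cannot go below zero, even with many faults.
    productive = max(wall_clock_hours - lost, 0.0)
    return productive / wall_clock_hours


# Hypothetical 100-hour run with 4 faults.
slow = estimate_goodput(100, 4, detect_hours=0.5, restart_hours=0.5,
                        checkpoint_interval_hours=2.0)
fast = estimate_goodput(100, 4, detect_hours=0.1, restart_hours=0.1,
                        checkpoint_interval_hours=1.0)
print(f"slow detection and resume: {slow:.3f}")
print(f"fast detection and resume: {fast:.3f}")
```

Under these assumed numbers, cutting detection and restart time and checkpointing more often moves estimated goodput from 0.920 to 0.972. Real clusters would also need to account for checkpoint write overhead, which this sketch leaves out.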
Train reasoning agents and tool-using models with NeMo RL, which includes end-to-end FP8 rollouts, speculative decoding, and multi-environment training. NeMo RL is used by frontier AI labs to advance the era of agentic AI.
Adapt open foundation models, such as Nemotron, Meta Llama, Mistral, and Google Gemma 4, to domain-specific applications with NeMo AutoModel's recipes and curated datasets. NeMo AutoModel is Hugging Face-native, PyTorch-native, and supported on every NVIDIA cloud and on-premises cluster.
Resources
Explore NVIDIA technical blogs, on-demand sessions, and developer videos to go deeper on AI training infrastructure, benchmarks, and best practices.
Case Studies
From frontier AI labs to leading enterprises, the world's most ambitious teams use NVIDIA to train smarter models, faster.
The NVIDIA full-stack platform supports pretraining, reinforcement learning post-training, and supervised fine-tuning at production scale, optimizing time to train, cost to train, and goodput across frontier, open, and enterprise AI models.
NVIDIA’s accelerated computing platform includes NVIDIA Blackwell Ultra and Rubin GPUs, Vera and Grace CPUs, NVLink and NVSwitch scale-up fabric, ConnectX and BlueField DPU SuperNICs, and Spectrum-X networking, all with extreme co-design for training trillion-parameter frontier AI models.
NVIDIA supports PyTorch, JAX, vLLM, SGLang, and Ray natively, alongside its own libraries, including Megatron-Core, NeMo RL, NeMo AutoModel, and NCCL. Over 1.5 million Hugging Face models run on CUDA.
Leading AI labs, AI natives, startups, and enterprises—including OpenAI, xAI, Arcee AI, Roche, Eli Lilly, and Hudson River Trading—train frontier and domain-specific models on NVIDIA Blackwell-based systems.
NVIDIA Blackwell delivers up to 3x faster training and nearly 2x higher training performance per dollar versus the prior generation, winning every MLPerf Training benchmark across LLM pretraining, fine-tuning, and graph neural networks.
Train frontier models with validated AI infrastructure solutions and blueprints from NVIDIA partners.