AI Training Solutions

The Full-Stack Platform for Frontier Model Training

From pretraining to post-training, NVIDIA accelerated computing optimizes time to train, cost to train, and training goodput at production scale.

Get Started

Read Series | Performance Benchmarks | For Developers

Overview
Benefits
Products
Use Cases
Resources
Success Stories
FAQs
Next Steps

Overview

Leading Model Builders Train on NVIDIA

Frontier open, closed, and industry models are built on NVIDIA. Using extreme co-design across compute, networking, storage, memory, and software, NVIDIA delivers industry-leading performance per watt and continuous optimizations. This accelerates time to train, reduces cost to train, and enables agentic capabilities for model builders.

Whether pretraining a trillion-parameter mixture-of-experts (MoE) model, post-training a reasoning agent with reinforcement learning, or fine-tuning a domain-specific researcher, the NVIDIA accelerated computing platform delivers production-scale training for the world’s leading AI companies.

OpenAI’s New GPT-5.5 Powers Codex on NVIDIA Infrastructure—and NVIDIA Is Already Putting It to Work

OpenAI GPT‑5.5 was co-designed for, trained with, and served on NVIDIA GB200 and GB300 NVL72 systems. Codex, OpenAI’s agentic coding application, is powered by GPT-5.5.

Read the GPT-5.5 Blog Post

Fastest, Largest, Strongest: NVIDIA Blackwell Sweeps MLPerf Training 6.0

In MLPerf Training 6.0, the NVIDIA platform led in every category: the fastest time to train on every benchmark, the largest-scale training (8,192 GPUs on NVIDIA Blackwell NVL72 systems), and the only platform with submissions across all seven benchmarks.

Read MLPerf Training Blog

Benchmarks

Industry-Leading AI Training Performance

The NVIDIA platform excelled in delivering the fastest time to train and the highest performance per GPU in MLPerf Training 6.0 benchmarks. This round, NVIDIA submitted results on both GB200 NVL72 and GB300 NVL72 systems. At the same scale, GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72. This round added two new mixture-of-experts pretraining workloads, DeepSeek-V3 671B and GPT-OSS-20B, and NVIDIA set performance records on both. On DeepSeek-V3 671B, NVIDIA scaled to 8,192 GPUs using GB200 NVL72 systems, the largest-scale Blackwell-based submission in MLPerf Training to date.

NVIDIA Blackwell Platform Raises the Bar for Performance and Scale

MLPerf Training v5.0, v5.1, and v6.0 results retrieved from www.mlcommons.org on June 16, 2026. MLPerf GPU scale results from entries 5.0-0004, 5.1-004, 6.0-0001, 6.0-0005, and 6.0-0014. MLPerf Blackwell training comparison from the following entries: 6.0-0006, 6.0-0013, 6.0-0017, 6.0-0018, 6.0-0078, and 5.1-0072. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

Benefits

Production-Scale AI Training With High Reliability

The NVIDIA platform delivers faster time to train, high training goodput, and a full software stack whose continuous optimizations keep improving performance.

Frontier-Scale AI Training Performance

NVIDIA Blackwell delivers up to 3x faster training and nearly 2x higher training performance per dollar than the previous generation. NVIDIA wins every MLPerf Training benchmark, from LLM pretraining to fine-tuning to graph neural networks, and continues to extend leadership with each software release.

Fast, Low-Cost Reinforcement Learning Post-Training

NVIDIA NeMo™ RL with end-to-end FP8 precision delivers higher rollout throughput and stable convergence for production-scale reinforcement learning (RL). Train multi-environment, multi-turn agents that learn continuously and match BF16 accuracy at a fraction of the cost.
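To make the FP8 idea concrete, here is a minimal Python sketch of per-tensor scaled quantization to the FP8 E4M3 format (4 exponent bits, 3 mantissa bits, largest finite value 448). It is an illustration under stated assumptions, not NeMo RL's implementation: the function names are hypothetical, the scaling recipe (one scale per tensor, derived from the absolute maximum) is only one common choice, and real kernels do this in hardware on GPU tensors rather than on Python lists.

```python
import math

E4M3_MAX = 448.0        # largest finite FP8 E4M3 magnitude
E4M3_MIN_EXP = -6       # exponent of the smallest normal E4M3 value (2**-6)
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448 (hypothetical helper)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # Values below 2**-6 are subnormal and share the spacing of the smallest exponent.
    exp = max(math.floor(math.log2(mag)), E4M3_MIN_EXP)
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)   # gap between neighbouring E4M3 values
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_per_tensor(values: list[float]) -> tuple[list[float], float]:
    """Scale so the largest magnitude maps to E4M3_MAX, then round each element to FP8."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale


def dequantize(q: list[float], scale: float) -> list[float]:
    """Undo the scale to recover approximate higher-precision values."""
    return [v / scale for v in q]


if __name__ == "__main__":
    activations = [0.1, -0.5, 2.0, 3.5]
    q, scale = quantize_per_tensor(activations)
    print(scale)                  # 128.0
    print(dequantize(q, scale))   # [0.1015625, -0.5, 2.0, 3.5]
```

The per-tensor scale is what lets a format with only 3 mantissa bits cover a tensor's useful range: values are stretched to fill the representable interval before rounding, so the relative rounding error for normal values stays at most 2**-4.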

Built for MoE Model Training

Hybrid expert parallelism, NVFP4 precision, and NVL72 scale-up make NVIDIA the platform of choice for the world's most intelligent MoE models, from open models to frontier proprietary systems. Train more parameters faster within the same power envelope.
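The sketch below illustrates, under simplifying assumptions, the routing step that expert parallelism has to serve: each token picks its top-k experts, and tokens are grouped by the GPU rank that hosts each chosen expert before an all-to-all exchange. The softmax-over-top-k gate, the contiguous expert-to-rank placement, and all function names are hypothetical choices for illustration; production MoE models and frameworks differ in many details (for example, gating functions and load-balancing terms).

```python
import math


def top_k_gate(logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts for one token and softmax-normalize their scores."""
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    m = max(logits[e] for e in top)                 # subtract the max for numerical stability
    weights = [math.exp(logits[e] - m) for e in top]
    total = sum(weights)
    return [(e, w / total) for e, w in zip(top, weights)]


def expert_rank(expert: int, num_experts: int, ep_size: int) -> int:
    """Contiguous placement: rank r hosts experts [r*E/ep, (r+1)*E/ep)."""
    return expert // (num_experts // ep_size)


def build_dispatch(token_logits: list[list[float]], num_experts: int = 8,
                   ep_size: int = 4, k: int = 2) -> dict[int, list[tuple[int, int, float]]]:
    """Return, per destination rank, the (token, expert, gate_weight) triples it must receive."""
    assert num_experts % ep_size == 0, "experts must divide evenly across ranks"
    sends: dict[int, list[tuple[int, int, float]]] = {r: [] for r in range(ep_size)}
    for token, logits in enumerate(token_logits):
        for expert, weight in top_k_gate(logits, k):
            sends[expert_rank(expert, num_experts, ep_size)].append((token, expert, weight))
    return sends


if __name__ == "__main__":
    router_logits = [
        [0.0, 5.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0],  # token 0 prefers experts 1 and 3
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 3.0],  # token 1 prefers experts 7 and 6
    ]
    for rank, items in build_dispatch(router_logits).items():
        print(rank, [(t, e, round(w, 3)) for t, e, w in items])
```

Each token activates only k of the experts, which is why MoE models can add parameters without a proportional increase in per-token compute; the cost moves into the all-to-all traffic that the dispatch lists represent.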

The Largest AI Training Ecosystem

From NVIDIA libraries such as Megatron-Core and NeMo to industry frameworks, including PyTorch, JAX, vLLM, SGLang, and Ray, every major training, post-training, and inference framework runs natively on NVIDIA. Open weights, open recipes, open contributions: 1.5 million AI models on Hugging Face run on NVIDIA® CUDA®.

Products

Full-Stack AI Training, Built for Every Scale

NVIDIA delivers extreme co-design across compute, networking, storage, memory, and AI software in the industry's only vertically integrated, horizontally open AI training platform.

AI Compute for Frontier Model Training

NVIDIA Rubin GPU: Designed for frontier agentic model training and inference, including complex, multi-modal workflows.
NVIDIA Blackwell GPU: Delivers up to 3x faster LLM training versus the prior generation with NVFP4 support and record-setting MLPerf Training performance.
NVIDIA Vera CPU: Purpose-built CPU for agentic AI and reinforcement learning, including code execution, tool use, sandboxing, analytics, data pipelines, and orchestration beyond the model.
NVIDIA Grace™ CPU: Arm®-based CPU purpose-built for AI and high-performance computing (HPC), delivering high-bandwidth coherent memory that tightly couples to GPU clusters for scalable training workloads.

Explore NVIDIA Vera Rubin NVL72

Scale Up and Scale Out

NVIDIA NVLink™ and NVLink Switch: High-speed GPU interconnect fabric delivering up to 1.8 terabytes per second (TB/s) of bandwidth, enabling coherent multi-GPU scaling for training the largest frontier models.
NVIDIA ConnectX® SuperNIC: Ultra-low-latency network adapters accelerating distributed AI training across thousands of GPUs with hardware-offloaded remote direct-memory access (RDMA) and intelligent congestion management.
NVIDIA BlueField® DPU: Data processing unit for offloading networking, storage, and security from GPUs, maximizing training goodput and cluster efficiency without compromising accelerator utilization.
NVIDIA Spectrum-X™ Ethernet: NVIDIA's AI-optimized Ethernet platform, delivering congestion-free, predictable bandwidth across large GPU clusters for high-throughput distributed model training.
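A large share of the traffic these links carry in data-parallel training is gradient all-reduce. As a rough illustration, the Python sketch below simulates the classic ring all-reduce (a reduce-scatter phase followed by an all-gather phase) on plain lists and adds the idealized bandwidth estimate that follows from it. It is a hypothetical teaching aid: the helper names are made up, the estimate ignores latency and topology, and NCCL chooses among ring, tree, and other algorithms at runtime. In PyTorch, the real collective is `torch.distributed.all_reduce` on a process group initialized with the `"nccl"` backend.

```python
def ring_allreduce(buffers: list[list[float]]) -> list[list[float]]:
    """Sum-all-reduce n equal-length buffers as n ranks on a ring would, step by step."""
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "buffer length must split into n equal chunks"
    c = size // n
    data = [list(b) for b in buffers]   # one private copy per simulated rank

    def chunk(i: int) -> slice:
        i %= n
        return slice(i * c, (i + 1) * c)

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) to rank r + 1,
    # which adds it to its own copy. Afterwards rank r owns the full sum of chunk r + 1.
    for s in range(n - 1):
        msgs = [(r, (r - s) % n, data[r][chunk(r - s)]) for r in range(n)]  # snapshot before receiving
        for r, idx, payload in msgs:
            dst = (r + 1) % n
            data[dst][chunk(idx)] = [a + b for a, b in zip(data[dst][chunk(idx)], payload)]

    # Phase 2, all-gather: in step s, rank r forwards the finished chunk (r + 1 - s)
    # to rank r + 1, which overwrites its copy.
    for s in range(n - 1):
        msgs = [(r, (r + 1 - s) % n, data[r][chunk(r + 1 - s)]) for r in range(n)]
        for r, idx, payload in msgs:
            data[(r + 1) % n][chunk(idx)] = payload
    return data


def ring_allreduce_seconds(num_bytes: float, n: int, bytes_per_second: float) -> float:
    """Idealized bandwidth term: each rank sends 2 * (n - 1) / n of the buffer; latency ignored."""
    return 2 * (n - 1) / n * num_bytes / bytes_per_second


if __name__ == "__main__":
    print(ring_allreduce([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]))
```

Because each rank sends `2 * (n - 1) / n` times the buffer size, which approaches twice the buffer as `n` grows, the bandwidth term of a large ring all-reduce is set mainly by per-GPU link bandwidth rather than by the number of GPUs.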

Explore NVLink

AI Training Libraries and Data Formats

NVFP4: NVIDIA's 4-bit floating-point format that doubles training throughput on NVIDIA Blackwell Ultra while satisfying strict MLPerf accuracy requirements for frontier LLM pretraining.
Megatron-Core: Distributed training library for implementing tensor, pipeline, and sequence parallelism to efficiently scale transformer model training to trillions of parameters.
nvCOMP: GPU-accelerated compression library that reduces input and output (IO) bottlenecks when streaming large training datasets, maximizing GPU utilization and overall cluster throughput.
NVIDIA Collective Communications Library (NCCL): Optimized communications library enabling high-bandwidth all-reduce and gradient synchronization across GPU clusters for scalable distributed model training.
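As a rough illustration of the tensor parallelism that Megatron-Core implements, the Python sketch below splits a two-layer MLP using the column-then-row partition commonly described for Megatron-style tensor parallelism: the first weight matrix is split by columns and the second by rows, so each simulated rank computes its slice independently and a single sum all-reduce (the collective NCCL would perform on real GPUs) recovers the full output. Plain lists stand in for GPU tensors, ReLU stands in for GeLU, and the function names are hypothetical, not Megatron-Core's API.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def split_columns(w: Matrix, parts: int) -> list[Matrix]:
    n = len(w[0]) // parts
    return [[row[p * n:(p + 1) * n] for row in w] for p in range(parts)]


def split_rows(w: Matrix, parts: int) -> list[Matrix]:
    n = len(w) // parts
    return [w[p * n:(p + 1) * n] for p in range(parts)]


def tensor_parallel_mlp(x: Matrix, w1: Matrix, w2: Matrix, tp: int) -> Matrix:
    """Compute relu(x @ w1) @ w2 with w1 column-sharded and w2 row-sharded across tp ranks."""
    w1_shards, w2_shards = split_columns(w1, tp), split_rows(w2, tp)
    partials = []
    for rank in range(tp):
        h = relu(matmul(x, w1_shards[rank]))   # elementwise activation needs no communication
        partials.append(matmul(h, w2_shards[rank]))
    # Sum all-reduce across ranks: the only communication in this forward pass.
    return [[sum(p[i][j] for p in partials) for j in range(len(partials[0][0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, -2.0]]
    w1 = [[1.0, 0.5, -1.0, 2.0], [0.0, 1.0, 1.0, -1.0]]
    w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    print(tensor_parallel_mlp(x, w1, w2, tp=2), matmul(relu(matmul(x, w1)), w2))
```

Splitting the first matrix by columns is what lets the nonlinearity run locally on each rank; splitting it by rows instead would require communication before the activation, because the nonlinearity does not distribute over a sum of partial results.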

Explore NVFP4

Open Models and Frameworks for Post-Training

NeMo RL and NeMo Gym: Open reinforcement learning post-training libraries that enable scalable reinforcement learning from human feedback (RLHF) and reward-modeling pipelines to build reasoning and agentic capabilities into frontier models.
NeMo Data Designer: Generates high-quality synthetic training datasets at scale, reducing labeling costs and accelerating data pipelines for domain-specific and frontier model development.
NeMo AutoModel: Hugging Face-native, PyTorch-native library that provides recipes and curated datasets for fine-tuning open foundation models, supported on every NVIDIA cloud and on-premises cluster.
NVIDIA Nemotron™ 3: NVIDIA's open family of enterprise-grade foundation models, optimized for fine-tuning and post-training on NVIDIA infrastructure with full production support.

Explore NeMo

AI Infrastructure Observability

NVIDIA Run:ai: AI workload orchestration platform that maximizes GPU utilization and accelerates training throughput through intelligent scheduling and dynamic cluster resource management.
NVIDIA Mission Control™: Centralized platform providing real-time visibility, diagnostics, and control across AI training clusters to ensure peak reliability and operational efficiency at scale.

Explore Mission Control

Use Cases

AI Model Training: Pretraining to Domain-Specific Agents

Pretraining
Reinforcement Learning Post-Training
Supervised Fine-Tuning

Pretraining

Pretrain AI models to give them foundational capabilities, such as understanding language; recognizing audio, video, or images; or identifying relationships in data, before adapting them for specific tasks. Maximize AI training goodput by enabling system-wide health checks, quickly detecting faults at runtime, and resuming training automatically.
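Goodput can be read as the fraction of executed work that ends up in the final model. The toy Python simulation below, which is hypothetical and not the NVIDIA Resiliency Extension API, makes that concrete: a job checkpoints every few steps, a random fault rolls it back to the last checkpoint, and it resumes automatically. It ignores fault-detection latency and restart time and charges each checkpoint a configurable cost, so its numbers only show the trend, not real cluster behavior.

```python
import random


def simulate_goodput(total_steps: int, ckpt_every: int, fault_prob: float,
                     ckpt_cost: float = 0.0, seed: int = 0) -> float:
    """Return useful_steps / executed_work for a job that checkpoints and auto-resumes.

    All parameters are illustrative; work is measured in units of one training step.
    fault_prob must be below 1.0 or the job never finishes.
    """
    rng = random.Random(seed)
    step = 0          # progress since the start, including unsaved work
    saved = 0         # last step persisted in a checkpoint
    executed = 0.0    # all work the cluster did, including work lost to faults
    while step < total_steps:
        executed += 1
        if rng.random() < fault_prob:
            step = saved              # fault detected: discard unsaved steps and resume
            continue
        step += 1
        if step % ckpt_every == 0:
            saved = step
            executed += ckpt_cost     # checkpointing itself is not useful training work
    return total_steps / executed


if __name__ == "__main__":
    for every in (1, 10, 100):
        print(every, round(simulate_goodput(2_000, every, fault_prob=0.01, ckpt_cost=0.5), 3))
```

Sweeping `ckpt_every` exposes the trade-off: checkpointing too often spends time on saves, while checkpointing too rarely loses more work per fault, which is why fast fault detection and automatic resumption matter at scale.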

Explore NeMo for Pretraining

Explore Megatron Core for Pretraining

Explore NVIDIA Resiliency Extension for Pretraining

Reinforcement Learning Post-Training

Train reasoning agents and tool-using models with NeMo RL, which includes end-to-end FP8 rollouts, speculative decoding, and multi-environment training. NeMo RL is used by frontier AI labs to advance the era of agentic AI.
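Speculative decoding speeds up generation by letting a cheap draft model propose several tokens that the larger model then verifies in one pass. The Python sketch below shows the greedy variant with toy stand-in "models" (functions from a token prefix to the next token); it is a hypothetical illustration, not NeMo RL code. Sampling-based rollouts instead use a rejection-sampling acceptance rule, which this sketch does not implement.

```python
from typing import Callable

# A toy stand-in for a language model: maps a token prefix to its greedy next token.
Model = Callable[[list[int]], int]


def greedy_decode(model: Model, prompt: list[int], n_new: int) -> list[int]:
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: list[int],
                       n_new: int, k: int = 4) -> tuple[list[int], int]:
    """Greedy speculative decoding; returns (tokens, number of target verification passes)."""
    seq, passes = list(prompt), 0
    while len(seq) - len(prompt) < n_new:
        proposal: list[int] = []
        for _ in range(k):                          # draft proposes k tokens autoregressively
            proposal.append(draft(seq + proposal))
        passes += 1                                 # on a GPU, the k checks below are one batched pass
        accepted: list[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)               # keep the target's token either way
            if tok != expected:
                break                               # first disagreement ends this round
        else:
            accepted.append(target(seq + accepted))  # every guess matched: one bonus token
        seq.extend(accepted)
    return seq[:len(prompt) + n_new], passes


if __name__ == "__main__":
    target = lambda s: (3 * s[-1] + 1) % 10             # hypothetical "large" model
    draft = lambda s: target(s) if len(s) % 4 else 0    # hypothetical draft that is sometimes wrong
    tokens, passes = speculative_decode(target, draft, [7], n_new=12)
    assert tokens == greedy_decode(target, [7], 12)     # identical to target-only decoding
    print(tokens, passes)
```

Every emitted token is the target model's own choice, so the output matches plain greedy decoding exactly; the speedup comes from verifying up to `k` draft tokens per target pass instead of generating one token per pass.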

Explore NeMo RL for Post-Training

Supervised Fine-Tuning (SFT)

Adapt open foundation models, such as Nemotron, Meta Llama, Mistral, and Google Gemma 4, to domain-specific applications with NeMo AutoModel's recipes and curated datasets. NeMo AutoModel is Hugging Face-native, PyTorch-native, and supported on every NVIDIA cloud and on-premises cluster.

Explore NeMo AutoModel for SFT

Resources

Dive Deeper Into AI Training

Explore NVIDIA technical blogs, on-demand sessions, and developer videos to go deeper on AI training infrastructure, benchmarks, and best practices.

Blogs
Developer Sessions
Developer Videos

View Blogs

View Technical Blogs

View All Sessions

Understanding Reinforcement Learning With Prime Intellect and Unsloth | Nemotron Labs

Learn how RL is transforming agentic intelligence, and how to demystify the RL workflow: build practical pipelines, bridge environment gaps, and integrate verifiable feedback loops for autonomous agents.

Watch Livestream

Reinforcement Learning at Scale: Engineering the Next Generation of Intelligence

Hear from Applied Compute, Humans&, Periodic Labs, and NVIDIA on scaling RL and on emerging RL paradigms that unlock scientific discovery, multimodal reasoning, collaborative agents, and continual learning.

Watch Panel

Get Started With Unsloth Studio: Generate Data and Fine-Tune LLMs Locally

Get started with Unsloth Studio to easily generate synthetic datasets, fine-tune, test, and export your own AI models locally on your NVIDIA GPU, all without writing a single line of code.

Watch Session

Case Studies

AI Training Success Stories: Built and Trained on NVIDIA

From frontier AI labs to leading enterprises, the world's most ambitious teams use NVIDIA to train smarter models, faster.

Thinking Machines Lab

Thinking Machines Lab is expanding its use of Google Cloud's AI Hypercomputer, running A4X Max virtual machines powered by NVIDIA GB300 NVL72 systems to double training and serving speeds, accelerate frontier model research, and scale continuous AI model training for its Tinker platform.

Read Google Cloud Blog

Higgsfield

Higgsfield scales cinematic, social-first video creation for millions of users by training and deploying its Soul 2.0 generative video models on the NVIDIA accelerated computing platform. Using NVIDIA HGX™ B300 and B200 systems, Higgsfield speeds AI model training by 30 percent, maximizes inference throughput, and delivers high-quality video at lower cost.

Read Higgsfield Case Study

Roche

Roche is scaling NVIDIA-powered AI factories globally, using more than 3,500 NVIDIA Blackwell GPUs and Omniverse™-based digital twins to train biological and diagnostic models, unify R&D and manufacturing, and accelerate discovery, clinical insights, and production on a full-stack AI platform.

Read Roche Blog

Hudson River Trading

Hudson River Trading built an NVIDIA AI factory that unifies data ingestion, AI model training, large-scale simulation, and deployment on energy-efficient NVIDIA Blackwell and Spectrum-X infrastructure. The AI factory accelerates algorithmic trading while tightly coupling researchers to high-performance AI compute.

Read Hudson River Trading Blog

Eli Lilly

Lilly's NVIDIA-powered LillyPod AI factory uses more than 1,000 NVIDIA Blackwell Ultra GPUs on a secure, full-stack NVIDIA DGX SuperPOD™ to train large-scale protein, small-molecule, and genomics foundation models, accelerating drug discovery, development, and manufacturing.

Read Eli Lilly Blog

FAQs About the NVIDIA AI Training Platform

The NVIDIA full-stack platform supports pretraining, reinforcement learning post-training, and supervised fine-tuning at production scale, optimizing time to train, cost to train, and goodput across frontier, open, and enterprise AI models.

NVIDIA’s accelerated computing platform includes NVIDIA Blackwell Ultra and Rubin GPUs, Vera and Grace CPUs, the NVLink and NVLink Switch scale-up fabric, ConnectX SuperNICs, BlueField DPUs, and Spectrum-X networking, all built with extreme co-design for training trillion-parameter frontier AI models.

NVIDIA supports PyTorch, JAX, vLLM, SGLang, and Ray natively, alongside NVIDIA libraries, including Megatron Core, NeMo RL, NeMo AutoModel, and NCCL. Over 1.5 million Hugging Face models run on CUDA.

Leading AI labs, AI natives, startups, and enterprises—including OpenAI, xAI, Arcee AI, Roche, Eli Lilly, and Hudson River Trading—train frontier and domain-specific models on NVIDIA Blackwell-based systems.

NVIDIA Blackwell delivers up to 3x faster training and nearly 2x higher training performance per dollar versus the prior generation, winning every MLPerf Training benchmark across LLM pretraining, fine-tuning, and graph neural networks.

Next Steps

Ready to Train What's Next?

Get hands-on with NVIDIA NeMo and the world's leading AI training infrastructure.

Start Pretraining | Start Post-Training

Build AI Infrastructure for Model Training

Train frontier models with validated AI infrastructure solutions and blueprints from NVIDIA partners.

Browse NVIDIA Marketplace

Build Models on NVIDIA Exemplar Clouds

Stay Up to Date on NVIDIA AI Infrastructure

Get the latest research, benchmark results, and customer stories delivered to your inbox.

Stay Informed