NVIDIA Mission Control™ streamlines every aspect of the AI factory, from developer workload scheduling and orchestration to monitoring and autonomous recovery, while empowering platform teams to operate efficiently and scale confidently with fully supported software. It powers NVIDIA Blackwell and NVIDIA Rubin data centers for the newest frontiers of AI, combining real-time visibility, precise control over performance, power, and cooling, and always-on resilience to maximize AI factory ROI. Mission Control lets every enterprise run AI with the efficiency of today’s hyperscalers, accelerating AI token production.
Simplify how AI factories are deployed and operated throughout the entire cluster life cycle.
NVIDIA Mission Control 2.3 is fully integrated across the NVIDIA ecosystem, with support for NVIDIA GB200 NVL72 and NVIDIA GB300 NVL72. It adds unified authentication across services and an optional virtualized control plane for greater flexibility and scalability. Mission Control can now also be deployed in air-gapped environments and provides leak detection validation checks. NVIDIA DGX™ systems based on the NVIDIA Blackwell architecture now have access to the full scope of Mission Control capabilities, including the autonomous recovery engine suite.
NVIDIA Mission Control includes access to NVIDIA’s latest power optimization innovations in a validated workflow, with easy-to-use graphical interfaces for monitoring and managing actions at the cluster, system, and workload levels. With Mission Control, administrators can access the domain power service and set cluster-wide, dynamic, job-aware policies for optimizing power.
Bring agility to AI factory operations with seamless multi-node training and inference orchestration, flexibility to integrate with third-party software, and advanced power and cooling automation.
Gain deep visibility into workload uptime, cluster infrastructure, and facilities with integrated, ready-to-use Grafana dashboards and always-on health checks that reduce alert fatigue and optimize performance.
Redefine modern data center resiliency with an end-to-end autonomous recovery engine that spans from anomaly detection to isolation to fast job restart and automated hardware remediation.
Maximize AI factory output with end-to-end validated workflows, continuous operations for improved revenue potential, and NVIDIA Enterprise Support for a new standard of enterprise AI at scale.
Partners
Configure, validate, and operate AI factories built on NVIDIA Grace™ Blackwell NVL72 from leading system providers who have tested and validated NVIDIA Mission Control for their systems.
NVIDIA delivers all the building blocks for an AI factory. Together, NVIDIA Mission Control and NVIDIA AI Enterprise provide state-of-the-art infrastructure and workload management plus developer tools for production AI, allowing enterprises to harness the transformative power of AI at unprecedented, practical scale.