Physical AI
Develop world foundation models to advance physical AI.
Overview
NVIDIA Cosmos™ is a platform with open world foundation models (WFMs), guardrails, and data processing libraries to accelerate the development of physical AI for autonomous vehicles (AVs), robots, and video analytics AI agents.
Models
Open and fully customizable pretrained models for world generation and understanding.
Predict future states of dynamic environments for robotics and AI agent planning.
Cosmos Predict is a world generation model that produces up to 30 seconds of high-fidelity video from multimodal prompts.
Accelerate synthetic data generation across various environments and lighting conditions.
Cosmos Transfer is a multicontrol model that transforms 3D or spatial inputs from physical AI simulation frameworks, such as CARLA or NVIDIA Isaac Sim™, into fully controlled, high-fidelity video.
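To make the multicontrol idea concrete, here is a minimal sketch of how simulator exports might be paired with a text prompt as conditioning inputs. The file layout, field names, and the `build_transfer_request` helper are illustrative assumptions, not the Cosmos Transfer API.

```python
# Hypothetical request structure for a multicontrol video model: one text prompt
# plus control videos exported from a simulator (CARLA, Isaac Sim, etc.).
from pathlib import Path

def build_transfer_request(sim_dir: Path, prompt: str) -> dict:
    """Pair a text prompt with control signals rendered by the simulator."""
    return {
        "prompt": prompt,
        "controls": {
            "depth": str(sim_dir / "depth.mp4"),            # per-frame depth render
            "segmentation": str(sim_dir / "semantic.mp4"),  # semantic class masks
            "edge": str(sim_dir / "edges.mp4"),             # optional edge/outline video
        },
        # Per-control weights trade structural fidelity against visual diversity.
        "control_weights": {"depth": 1.0, "segmentation": 0.7, "edge": 0.3},
    }

request = build_transfer_request(
    Path("sim_export/scene_042"),
    prompt="The same driving scene at dusk in heavy rain, with wet asphalt reflections.",
)
print(request)
```

In a real pipeline, a request like this would be handed to whichever Cosmos Transfer inference entry point you deploy, with the weights tuned per use case.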
Enable robots and vision AI agents to reason like humans.
Cosmos Reason is a multimodal vision language model (VLM) that leverages prior knowledge, physics understanding, and common sense to comprehend the real world and interact with it.
Accelerate dataset processing and generation.
Quickly filter, annotate, and deduplicate large amounts of sensor data necessary for physical AI development with Cosmos Curator.
You can also instantly query these datasets and retrieve scenarios with NVIDIA Cosmos Dataset Search (CDS).
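As a rough illustration of what that curation pass involves, the sketch below deduplicates and filters a folder of clips and writes a manifest with an annotation placeholder. It uses only the Python standard library and is not the Cosmos Curator or CDS API; the size threshold and file layout are assumptions.

```python
# Toy filter/annotate/deduplicate pass over a folder of video clips.
import hashlib
import json
from pathlib import Path

MIN_CLIP_BYTES = 1_000_000  # assumed threshold: drop clips too small to be useful

def curate(clip_dir: Path, manifest_path: Path) -> None:
    seen_hashes: set[str] = set()
    manifest = []
    for clip in sorted(clip_dir.glob("*.mp4")):
        data = clip.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen_hashes:       # deduplicate exact copies
            continue
        if len(data) < MIN_CLIP_BYTES:  # filter out degenerate clips
            continue
        seen_hashes.add(digest)
        manifest.append({
            "path": str(clip),
            "sha256": digest,
            # Annotation placeholder: a production pipeline would attach captions,
            # detected objects, and weather/lighting tags here.
            "annotations": {},
        })
    manifest_path.write_text(json.dumps(manifest, indent=2))

curate(Path("raw_clips"), Path("curated_manifest.json"))
```

Cosmos Curator performs this kind of work at the scale physical AI development requires, and CDS then lets you query the resulting datasets for specific scenarios.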
Use Cases
Robots need vast, diverse training data to effectively perceive and interact with their environments. Cosmos WFMs address this need in multiple ways.
Diverse, high-fidelity sensor data is critical for safely training, testing, and validating autonomous vehicles, but it’s difficult, time-consuming, and costly to scale. Cosmos WFMs post-trained on vehicle data help close that gap.
Enhance automation, safety, and operational efficiency across industrial and urban environments.
With Cosmos Reason, AI agents can analyze, summarize, and interact with real-time or recorded video streams.
Starting Options
AI Infrastructure
NVIDIA RTX PRO 6000 Blackwell Series Servers accelerate physical AI development for robots, autonomous vehicles, and AI agents across training, synthetic data generation, simulation, and inference.
Unlock peak performance for Cosmos world foundation models on NVIDIA Blackwell GB200 for industrial post-training and inference workloads.
Ecosystem
Model developers from the robotics, autonomous vehicles, and vision AI industries are using Cosmos to accelerate physical AI development.
Resources
Cosmos WFMs are available to everyone under the NVIDIA Open Model License.
Refer to the new Cosmos Cookbook, which contains step-by-step recipes and post-training scripts to quickly build, customize, and deploy NVIDIA’s Cosmos world foundation models for robotics and autonomous systems.
Yes, you can use Cosmos to build from scratch with your preferred foundation model or model architecture. Start by using Cosmos Curator for video data preprocessing, then compress and decode your data with the Cosmos Tokenizer. Once the data is processed, you can train or fine-tune your model.
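The sketch below outlines that flow as three placeholder stages. None of the function bodies are real Cosmos APIs; they only mark where Cosmos Curator, the Cosmos Tokenizer, and your own training code would plug in.

```python
# Skeleton of the curate -> tokenize -> train workflow, with placeholder stages.
from pathlib import Path

def curate_clips(raw_dir: Path) -> list[Path]:
    """Stage 1: filter, annotate, and deduplicate raw video (Cosmos Curator's job)."""
    return sorted(raw_dir.glob("*.mp4"))  # placeholder: keep everything

def tokenize_clip(clip: Path) -> dict:
    """Stage 2: compress the clip into tokens (the Cosmos Tokenizer's job).
    The return value is a placeholder; a real tokenizer yields token tensors."""
    return {"clip": str(clip), "tokens": None}

def train(token_stream) -> None:
    """Stage 3: train or fine-tune your chosen model architecture on the tokens."""
    for _batch in token_stream:
        pass  # placeholder training loop

clips = curate_clips(Path("raw_videos"))
train(tokenize_clip(c) for c in clips)
```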
Using NVIDIA NIM™ microservices, you can easily integrate your physical AI models into your applications across cloud, data centers, and workstations.
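Many NIM microservices expose an OpenAI-compatible HTTP endpoint, so a minimal client can look like the sketch below. The port, path, and model identifier here are assumptions for illustration; use the values documented for the specific NIM you deploy.

```python
# Minimal client for a locally deployed NIM microservice (assumed endpoint and model name).
import requests

NIM_URL = "http://localhost:8000/v1/chat/completions"  # assumed local NIM endpoint
MODEL = "nvidia/cosmos-reason"                          # assumed model identifier

def ask_vlm(question: str) -> str:
    response = requests.post(
        NIM_URL,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": question}],
            "max_tokens": 256,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

print(ask_vlm("Describe the hazards a forklift operator should watch for in this warehouse."))
```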
You can also use NVIDIA DGX Cloud to train AI models and deploy them anywhere at scale.
Cosmos Predict, Cosmos Transfer, and Cosmos Reason are all WFMs with distinct roles:
Cosmos Reason can generate new and diverse text prompts from one starting video for Cosmos Predict, or critique and annotate synthetic data from Predict and Transfer.
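A minimal sketch of that handoff is shown below, with every Cosmos call replaced by a placeholder function; captioning, prompt variation, and world generation are all stubs here, not real APIs.

```python
# Reason-to-Predict handoff: a reasoning VLM drafts diverse prompts from one seed
# clip, and each prompt seeds a separate world generation run.
def caption_video(video_path: str) -> str:
    """Placeholder for asking Cosmos Reason to describe the seed clip."""
    return "A delivery robot crosses a busy warehouse aisle."

def propose_prompt_variations(caption: str, n: int = 3) -> list[str]:
    """Placeholder for asking Cosmos Reason for diverse variations of the scene."""
    conditions = ["at night under sparse lighting", "with a blocked aisle", "during a fire drill"]
    return [f"{caption} Variation: {c}." for c in conditions[:n]]

def generate_world_video(prompt: str) -> str:
    """Placeholder for a Cosmos Predict call that returns a generated clip path."""
    return f"generated/{abs(hash(prompt)) % 10_000}.mp4"

seed_caption = caption_video("seed_clip.mp4")
for prompt in propose_prompt_variations(seed_caption):
    print(prompt, "->", generate_world_video(prompt))
```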
Omniverse creates realistic 3D simulations of real-world tasks using generative APIs, SDKs, and NVIDIA RTX rendering technology.
Developers can input Omniverse simulations as instruction videos to Cosmos Transfer models to generate controllable photoreal synthetic data.
Together, Omniverse provides the simulation environment before and after training, while Cosmos provides the foundation models that generate video data to train physical AI models.
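Sketched as code, that loop might look like the following; `render_in_omniverse` and `transfer_to_photoreal` are placeholders standing in for an Omniverse render job and a Cosmos Transfer inference call, not real APIs.

```python
# Render a scene once in simulation, then request photoreal variations of it.
def render_in_omniverse(scene: str) -> str:
    """Placeholder: export a ground-truth render (RGB, depth, segmentation) of the scene."""
    return f"renders/{scene}.mp4"

def transfer_to_photoreal(instruction_video: str, condition: str) -> str:
    """Placeholder: run Cosmos Transfer with the render as the instruction video."""
    return f"photoreal/{condition.replace(' ', '_')}.mp4"

instruction = render_in_omniverse("loading_dock")
dataset = [
    transfer_to_photoreal(instruction, condition)
    for condition in ["overcast noon", "heavy rain at dusk", "night with headlights"]
]
print(dataset)
```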
Learn more about NVIDIA Omniverse.