Cosmos World Foundation Models Get Major Updates
NVIDIA has released significant updates to its Cosmos world foundation models (WFMs), one year after their initial introduction. The latest versions (Cosmos Transfer 2.5, Cosmos Predict 2.5, and Cosmos Reason 2) advance synthetic data generation and physical AI reasoning for robotics and autonomous vehicle development.
Cosmos Transfer 2.5: Photorealistic Simulation
Cosmos Transfer 2.5 enables faster, more scalable data augmentation from simulation and 3D spatial inputs. Using a ControlNet architecture, it generates high-fidelity photorealistic video sequences from structural inputs while preserving precise spatial alignment and scene composition.
Key features:
- Accepts diverse inputs: segmentation maps, depth maps, edge maps, human motion keypoints, LiDAR scans, trajectories, HD maps, and 3D bounding boxes
- Generates photorealistic videos with controlled layout, object placement, and motion dynamics
- Supports greater diversity across environments, lighting conditions, and scene variations
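As a rough illustration of how a pipeline might organize these control inputs before handing them to the model, the sketch below checks a request against the modalities listed above. The `ControlRequest` class, the modality identifiers, and the file paths are all hypothetical names chosen for this example; they are not part of any Cosmos API.

```python
from dataclasses import dataclass, field

# Modalities named in the feature list above. The string identifiers are
# illustrative labels for this sketch, not official Cosmos parameter names.
SUPPORTED_CONTROLS = frozenset({
    "segmentation", "depth", "edge", "keypoints",
    "lidar", "trajectory", "hd_map", "bbox_3d",
})


@dataclass
class ControlRequest:
    """Hypothetical container for one augmentation request."""
    prompt: str
    controls: dict = field(default_factory=dict)  # modality -> input file path

    def unsupported(self) -> list:
        """Return, in sorted order, any modalities not in the supported set."""
        return sorted(set(self.controls) - SUPPORTED_CONTROLS)


request = ControlRequest(
    prompt="rainy urban street at dusk",
    controls={"depth": "scene_depth.mp4", "edge": "scene_edge.mp4"},
)
print(request.unsupported())  # empty list: both modalities are recognized
```

Catching an unrecognized modality up front is cheaper than discovering it after a generation job has been queued.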
Cosmos Predict 2.5: Long-Horizon Scenario Generation
Cosmos Predict 2.5 improves generation of long-tail (rare) scenarios and produces video sequences of up to 30 seconds. When post-trained on proprietary or domain-specific data, it delivers up to 10x higher accuracy. The model now supports multiview outputs, custom camera layouts, and alternate policy outputs such as action simulation.
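To make the 30-second ceiling concrete, the following sketch converts a requested clip duration into a frame count and rejects requests beyond the limit. The frame rate is an assumption for illustration (the article does not state one), and `frames_for_clip` is a hypothetical helper, not a Cosmos function.

```python
MAX_CLIP_SECONDS = 30.0  # upper bound quoted in the article


def frames_for_clip(seconds: float, fps: int = 24) -> int:
    """Frames in a clip; fps=24 is an assumed value, not a Cosmos spec."""
    if seconds <= 0 or seconds > MAX_CLIP_SECONDS:
        raise ValueError(
            f"clip length must be in (0, {MAX_CLIP_SECONDS}] seconds"
        )
    return round(seconds * fps)


print(frames_for_clip(30))  # 720 frames at the assumed 24 fps
```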
Cosmos Reason 2: Advanced Physical AI Reasoning
Cosmos Reason 2 introduces improved spatiotemporal understanding and advanced chain-of-thought reasoning for complex physical AI tasks. Notable improvements include:
- Object detection with 2D/3D point localization and bounding box coordinates
- Reasoning explanations and semantic labels
- Expanded long-context support up to 256K input tokens for more complex reasoning tasks
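One practical consequence of a fixed context window is budgeting: a long video plus its prompt has to fit within the limit. The sketch below estimates this under two assumptions that should be read as such: that "256K" means 256 × 1024 tokens, and that each sampled frame costs a fixed, made-up number of tokens. Neither figure comes from NVIDIA documentation, and `max_frames` is a hypothetical helper.

```python
CONTEXT_LIMIT = 256 * 1024  # assumes "256K" means 262,144 tokens


def max_frames(prompt_tokens: int, tokens_per_frame: int, reserve: int = 0) -> int:
    """Largest number of frames that fits alongside the prompt.

    tokens_per_frame is a hypothetical per-frame cost; reserve holds back
    room for any extra input such as system instructions.
    """
    if tokens_per_frame <= 0:
        raise ValueError("tokens_per_frame must be positive")
    available = CONTEXT_LIMIT - prompt_tokens - reserve
    return max(available // tokens_per_frame, 0)


print(max_frames(prompt_tokens=2_144, tokens_per_frame=260))  # 1000
```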
Integration with NVIDIA Omniverse
Developers can use NVIDIA Omniverse (built on OpenUSD) to create 3D scenes that simulate real-world environments. These simulations serve as ground-truth inputs for Cosmos Transfer, which enhances photorealism while varying environment and lighting conditions. The NVIDIA Cosmos Cookbook provides step-by-step workflows and concrete examples for building and deploying these models.
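The workflow described here, one simulated scene rendered under many conditions, amounts to a sweep over environment and lighting combinations. A minimal sketch, assuming a text prompt is how each variation is requested; the condition lists, the prompt template, the `build_jobs` helper, and the control file name are placeholders invented for this example.

```python
from itertools import product

# Placeholder condition lists; a real sweep would use project-specific values.
environments = ["warehouse", "city street", "rural road"]
lighting = ["midday sun", "overcast", "night"]


def build_jobs(control_video: str) -> list:
    """Pair one ground-truth control input with every condition combination."""
    return [
        {"control": control_video, "prompt": f"{env}, {light}, photorealistic"}
        for env, light in product(environments, lighting)
    ]


jobs = build_jobs("omniverse_render_depth.mp4")
print(len(jobs))  # 9 combinations (3 environments x 3 lighting conditions)
```

Because every job shares the same control input, the scene layout stays fixed while only appearance changes, which is the property that makes the simulation usable as ground truth.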