NVIDIA Cosmos World Foundation Models Updated
NVIDIA has announced significant updates to its Cosmos world foundation models platform, one year after the platform was first introduced. The three updated models (Cosmos Transfer 2.5, Cosmos Predict 2.5, and Cosmos Reason 2) aim to accelerate synthetic data generation and physical AI development for robotics and autonomous vehicles.
Cosmos Transfer 2.5: Photorealistic Synthetic Data
Cosmos Transfer 2.5 generates photorealistic video at scale from structured inputs such as segmentation maps, depth maps, LiDAR scans, and 3D bounding boxes. It uses a ControlNet architecture, which preserves the base model's pretrained knowledge while keeping the output spatially aligned with the input and faithful to its scene composition. Key improvements include:
- Faster and more scalable data augmentation from simulation and 3D spatial inputs
- Greater diversity across environments, lighting conditions, and scene variations
- Direct integration with NVIDIA Omniverse for ground truth video inputs
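
The ControlNet idea mentioned above can be illustrated with a toy sketch. This is not Cosmos Transfer code: it uses plain Python lists instead of a video diffusion model, and `ControlledBlock`, `linear`, and `zero_proj` are hypothetical names chosen for illustration. It shows the general ControlNet pattern in simplified form, where a frozen pretrained block runs alongside a trainable copy whose contribution passes through a zero-initialized projection, so that before any training the output matches the pretrained block exactly.

```python
# Toy ControlNet-style conditioning on plain Python lists.
# Illustrative only: names and shapes are assumptions, not a Cosmos API.

def linear(weights, x):
    """Apply a dense layer; `weights` is a list of rows, `x` is a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    """Element-wise rectifier."""
    return [max(0.0, v) for v in x]


class ControlledBlock:
    def __init__(self, base_weights):
        # Frozen pretrained weights (square matrix); never updated.
        self.base = [row[:] for row in base_weights]
        # Trainable copy that starts from the pretrained weights.
        self.copy = [row[:] for row in base_weights]
        dim = len(base_weights)
        # Zero-initialized projection: the control branch adds nothing at first.
        self.zero_proj = [[0.0] * dim for _ in range(dim)]

    def forward(self, x, control):
        # Pretrained path, unaffected by the control signal.
        base_out = relu(linear(self.base, x))
        # Control path: the trainable copy sees input plus control features.
        mixed = [a + b for a, b in zip(x, control)]
        branch = relu(linear(self.copy, mixed))
        # The projection gates how much the control branch contributes.
        residual = linear(self.zero_proj, branch)
        return [b + r for b, r in zip(base_out, residual)]
```

In this sketch, only `copy` and `zero_proj` would be trained. The control signal (for example, features from a depth or segmentation map) starts to steer the output only as `zero_proj` moves away from zero, which is how the pretrained behavior is preserved early in training.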
Cosmos Predict 2.5: Enhanced Scenario Generation
Cosmos Predict 2.5 improves long-tail (rare-event) scenario generation and produces sequences up to 30 seconds long. Notable enhancements include:
- Up to 10x higher accuracy when post-trained on proprietary or domain-specific data
- Support for multiview outputs and custom camera layouts
- Alternate policy outputs, including action simulation
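
To make the multiview and clip-length limits concrete, here is a minimal sketch of how a client might describe and sanity-check a generation request before submitting it. Everything in it is an assumption made for illustration: `Camera`, `MultiviewRequest`, the yaw and field-of-view fields, and the frame rate are hypothetical and do not reflect the actual Cosmos Predict interface. Only the 30-second ceiling comes from the text above.

```python
from dataclasses import dataclass, field

MAX_CLIP_SECONDS = 30.0  # upper bound on sequence length quoted above


@dataclass(frozen=True)
class Camera:
    name: str
    yaw_deg: float  # hypothetical convention: heading relative to forward axis
    fov_deg: float  # horizontal field of view in degrees


@dataclass
class MultiviewRequest:
    prompt: str
    duration_s: float
    fps: int  # assumed frame rate; the real value depends on the model
    cameras: list = field(default_factory=list)

    def validate(self):
        """Raise ValueError if the request is inconsistent."""
        if not 0 < self.duration_s <= MAX_CLIP_SECONDS:
            raise ValueError("duration must be in (0, 30] seconds")
        if self.fps <= 0:
            raise ValueError("fps must be positive")
        if not self.cameras:
            raise ValueError("at least one camera is required")
        names = [cam.name for cam in self.cameras]
        if len(set(names)) != len(names):
            raise ValueError("camera names must be unique")
        for cam in self.cameras:
            if not 0 < cam.fov_deg <= 180:
                raise ValueError(f"invalid field of view for {cam.name}")

    def frames_per_view(self):
        """Number of frames each camera view would contain."""
        return round(self.duration_s * self.fps)

    def total_frames(self):
        """Frames across all views; a rough proxy for generation cost."""
        return self.frames_per_view() * len(self.cameras)
```

Checking the layout up front is a design choice: a three-camera, 30-second request at an assumed 10 frames per second already implies 900 frames, so catching a duplicate camera name or an out-of-range duration before generation avoids wasted compute.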
Cosmos Reason 2: Advanced Physical AI Reasoning
Cosmos Reason 2 introduces improved spatiotemporal understanding and advanced chain-of-thought reasoning for complex physical AI tasks:
- Object detection with 2D/3D point localization and bounding box coordinates
- Reasoning explanations and labels for interpretability
- Expanded long-context support up to 256K input tokens
- Enhanced timestamp precision for temporal reasoning
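
Outputs such as bounding boxes, labels, explanations, and timestamps usually need post-processing before downstream use. The sketch below assumes a purely hypothetical JSON response shape (a `detections` list with `label`, `box`, `time`, and `reason` fields, normalized box coordinates, and `MM:SS.mmm` timestamps); the real Cosmos Reason 2 output format may differ, so treat the schema and the helper names as placeholders.

```python
import json


def timestamp_to_seconds(stamp):
    """Convert an 'MM:SS.mmm' string to seconds as a float."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def to_pixel_box(box, width, height):
    """Scale a normalized [x_min, y_min, x_max, y_max] box to integer pixels."""
    x0, y0, x1, y1 = box
    if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
        raise ValueError(f"box out of range or degenerate: {box}")
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


def parse_detections(raw, width, height):
    """Parse the assumed JSON schema into dicts sorted by time."""
    detections = []
    for item in json.loads(raw)["detections"]:
        detections.append({
            "label": item["label"],
            "box_px": to_pixel_box(item["box"], width, height),
            "time_s": timestamp_to_seconds(item["time"]),
            # Keep the model's explanation for interpretability audits.
            "reason": item.get("reason", ""),
        })
    return sorted(detections, key=lambda d: d["time_s"])
```

Keeping the explanation text next to each box and timestamp makes it possible to review why a label was assigned, which is the point of the interpretability outputs listed above.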
Developer Integration
The models are available through GitHub repositories and integrate with NVIDIA Omniverse for building, adapting, and deploying world foundation models. The updates address the difficulty of collecting massive real-world datasets by generating synthetic data that remains physics-aware and applicable to real-world conditions.