Cosmos Foundation Model Updates
NVIDIA has released three major updates to its Cosmos world foundation models (WFMs), advancing capabilities for synthetic data generation and physical AI development:
Cosmos Transfer 2.5: Faster, more scalable data augmentation enabling photorealistic video generation from structural inputs (segmentation maps, depth maps, LiDAR scans, trajectories, and 3D bounding boxes). Supports diverse environments, lighting conditions, and scene variations with fine-grained control over object placement and motion dynamics.
Cosmos Predict 2.5: Enhanced scenario prediction generating realistic future world states from multimodal inputs. Extends sequence generation to 30 seconds and delivers up to 10x higher accuracy when post-trained on proprietary data. Adds multiview outputs, custom camera layouts, and alternate policy outputs such as action simulation.
Cosmos Reason 2: Advanced physical AI reasoning with improved spatiotemporal understanding and timestamp precision. Adds object detection with 2D/3D localization and bounding box coordinates, reasoning explanations, and expanded long-context support up to 256K input tokens.
Key Capabilities and Use Cases
Cosmos Transfer uses a ControlNet architecture to generate high-fidelity world scenes from structured inputs while preserving pretrained knowledge. It enables controllable, physics-grounded synthetic data generation, which addresses the difficulty of collecting massive real-world datasets for training autonomous systems such as humanoid robots and self-driving vehicles.
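To illustrate how a ControlNet-style design can add conditioning while preserving pretrained knowledge, here is a minimal sketch of the general idea in plain Python. It is not the Cosmos Transfer implementation: the block function, the class name ControlledBlock, and the per-unit zero_scale gate are hypothetical simplifications. The sketch assumes the usual ControlNet recipe, in which the pretrained weights stay frozen, a trainable copy receives the control signal, and the copy's contribution passes through a zero-initialized projection so that the combined output initially equals the pretrained output.

```python
import math
import random


def block(weights, inputs):
    """Stand-in for one pretrained layer: weighted sum per output unit, then tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in weights]


class ControlledBlock:
    """Hypothetical ControlNet-style wrapper around a single frozen layer."""

    def __init__(self, frozen_weights):
        self.frozen_weights = frozen_weights                       # pretrained, never updated
        self.control_weights = [row[:] for row in frozen_weights]  # trainable copy
        self.zero_scale = [0.0] * len(frozen_weights)              # zero-initialized gate

    def forward(self, inputs, control):
        base = block(self.frozen_weights, inputs)
        # The trainable copy sees the input plus the control signal (e.g. depth features).
        conditioned = [v + c for v, c in zip(inputs, control)]
        branch = block(self.control_weights, conditioned)
        # The gated branch output is added to the frozen output.
        return [b + s * r for b, s, r in zip(base, self.zero_scale, branch)]


random.seed(0)
dim = 4
frozen_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
inputs = [random.uniform(-1, 1) for _ in range(dim)]
control = [random.uniform(-1, 1) for _ in range(dim)]

layer = ControlledBlock(frozen_weights)
pretrained_out = block(frozen_weights, inputs)

# Before any training the gate is zero, so the pretrained behavior is reproduced.
out_before = layer.forward(inputs, control)

# Simulate the effect of training by opening the gate slightly.
layer.zero_scale = [0.1] * dim
out_after = layer.forward(inputs, control)
```

Because the gate starts at zero, out_before is identical to pretrained_out, and the control signal only begins to influence the output once training moves the gate away from zero. Real ControlNet implementations apply the same principle with zero-initialized convolutions inside a much larger network.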
Cosmos Predict generates realistic future world states. Its improved accuracy on long-tail scenarios is critical for testing edge cases and rare driving or robotics situations.
Cosmos Reason advances spatiotemporal reasoning for complex tasks including object localization, motion prediction, and context-aware decision-making. These capabilities are essential for physical AI applications that must reason about dynamic environments.
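As a rough illustration of how timestamped 2D localization results might be represented and checked, the sketch below defines a hypothetical Detection record and scores a predicted box against a reference box with intersection over union (IoU). The record layout, field names, and sample values are assumptions made for this example; the actual output format of Cosmos Reason is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one localized object in a video frame."""
    label: str
    timestamp_s: float  # time of the frame within the clip, in seconds
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


# Made-up example values: a predicted box shifted 50 pixels from the reference.
predicted = Detection("forklift", 12.5, (100, 100, 200, 200))
reference = Detection("forklift", 12.5, (150, 100, 250, 200))
overlap = iou(predicted.box, reference.box)
```

Here the two boxes share half of their area, so the overlap works out to one third. A threshold on this value is a common way to decide whether a predicted box counts as a correct localization.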
Developer Access
Developers can access these models through the NVIDIA Cosmos Cookbook, which provides step-by-step workflows, technical recipes, and concrete examples for building, adapting, and deploying Cosmos WFMs. Integration with NVIDIA Omniverse (built on OpenUSD) lets developers create 3D simulation environments that serve as ground-truth inputs for Cosmos Transfer, accelerating the pipeline from simulation to photorealistic synthetic training data.
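The simulation-to-synthetic-data pipeline can be pictured as a planning step that pairs each simulated clip and its ground-truth control inputs with the scene variations to render. The sketch below is a hypothetical planning helper only: SimClip, AugmentationJob, plan_jobs, and the prompt format are illustrative names, not part of any NVIDIA API, and the sketch does not call Cosmos or Omniverse.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SimClip:
    """Hypothetical handle for one clip exported from a simulation."""
    clip_id: str
    control_inputs: tuple  # e.g. ("segmentation", "depth")


@dataclass(frozen=True)
class AugmentationJob:
    """Hypothetical description of one photorealistic variation to generate."""
    clip_id: str
    control_inputs: tuple
    prompt: str


def plan_jobs(clips, environments, lighting):
    """Create one job per (clip, environment, lighting) combination."""
    return [
        AugmentationJob(clip.clip_id, clip.control_inputs, f"{env}, {light}")
        for clip, env, light in product(clips, environments, lighting)
    ]


clips = [
    SimClip("sim_0001", ("segmentation", "depth")),
    SimClip("sim_0002", ("depth", "3d_bounding_boxes")),
]
environments = ["warehouse", "city street"]
lighting = ["daylight", "dusk", "night"]

jobs = plan_jobs(clips, environments, lighting)
```

With two clips, two environments, and three lighting conditions, the plan contains twelve jobs. Each job keeps the original control inputs, which is what lets a single simulated scene yield many visually distinct training videos with the same underlying structure.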