Open-H-Embodiment: Foundation for Physical AI in Healthcare
NVIDIA has released Open-H-Embodiment, an open-source dataset initiative created with 35 organizations including Johns Hopkins, Stanford, UC Berkeley, and major surgical robotics companies. This represents the first large-scale, standardized dataset designed specifically for training embodied AI systems in healthcare robotics—moving beyond perception-only models to systems that can perform physical tasks with precision.
The Dataset
The dataset comprises 778 hours of CC-BY-4.0 licensed training data, including:
- Real surgical procedures from commercial robots (CMR Surgical, Rob Surgical, Tuodao)
- Research platforms (dVRK, Franka, Kuka)
- Simulation environments, benchtop exercises (e.g., suturing), and clinical procedures
- Vision, force, and kinematics data synchronized across multiple robot embodiments
- Ultrasound and colonoscopy autonomy data alongside surgical robotics
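Synchronizing vision, force, and kinematics across embodiments implies aligning streams that log at different rates. As a minimal sketch (the stream names, rates, and alignment strategy here are illustrative assumptions, not the dataset's actual schema), nearest-timestamp alignment can look like:

```python
import numpy as np

def align_streams(ref_ts, other_ts, max_gap=None):
    """For each reference timestamp, find the index of the nearest
    timestamp in another (sorted) stream. Returns indices, with -1
    where the nearest sample is farther than max_gap."""
    # np.searchsorted gives the insertion point; the nearest neighbor
    # is either that index or the one just before it.
    pos = np.searchsorted(other_ts, ref_ts)
    pos = np.clip(pos, 1, len(other_ts) - 1)
    left, right = other_ts[pos - 1], other_ts[pos]
    idx = np.where(ref_ts - left <= right - ref_ts, pos - 1, pos)
    if max_gap is not None:
        gap = np.abs(other_ts[idx] - ref_ts)
        idx = np.where(gap <= max_gap, idx, -1)
    return idx

# Hypothetical rates: 30 Hz video, 100 Hz kinematics, 500 Hz force.
video_ts = np.arange(0, 1, 1 / 30)
kin_ts = np.arange(0, 1, 1 / 100)
force_ts = np.arange(0, 1, 1 / 500)

# Each video frame now maps to its nearest kinematics and force sample.
kin_idx = align_streams(video_ts, kin_ts)
force_idx = align_streams(video_ts, force_ts)
```

Anchoring the slower stream (video) and looking up the faster ones keeps one aligned tuple per frame, which is the natural unit for training a vision-conditioned policy.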
GR00T-H: Surgical Policy Model
GR00T-H is a Vision-Language-Action (VLA) model trained on ~600 hours of the dataset, designed to overcome the unique challenges of surgical robotics:
- Embodiment Projectors: Learnable mappings that translate each robot's specific kinematics to a shared, normalized action space
- State Dropout: Randomly masks proprioceptive input during training so the policy does not over-rely on robot state estimates, improving robustness on real hardware
- Relative End-Effector Actions: Uses a common delta-pose action space to absorb kinematic inconsistencies across different robots
- Task Metadata Injection: Embeds instrument names and control mappings directly into prompts
Early prototypes have successfully executed end-to-end suturing in the SutureBot benchmark, demonstrating long-horizon dexterity and precision control.
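The embodiment-projector and relative-action ideas above can be sketched together. In this illustrative example (the dimensions, normalization statistics, and class names are assumptions, not GR00T-H's actual implementation), each embodiment owns a small mapping into a shared, normalized delta-pose action space:

```python
import numpy as np

class EmbodimentProjector:
    """Maps one robot's native end-effector deltas into a shared,
    normalized action space (and back). A learned projector would
    replace the fixed statistics below with trained parameters."""
    def __init__(self, mean, std):
        self.mean = np.asarray(mean, dtype=float)
        self.std = np.asarray(std, dtype=float)

    def to_shared(self, native_delta):
        return (native_delta - self.mean) / self.std

    def to_native(self, shared_action):
        return shared_action * self.std + self.mean

def relative_action(pose_t, pose_t1):
    """Express the next pose as a delta from the current one, so the
    policy output does not depend on each robot's base frame."""
    return pose_t1 - pose_t

# Two hypothetical embodiments with very different motion scales (metres).
dvrk = EmbodimentProjector(mean=[0, 0, 0], std=[0.002, 0.002, 0.002])
franka = EmbodimentProjector(mean=[0, 0, 0], std=[0.05, 0.05, 0.05])

# A 2 mm move on the dVRK and a 5 cm move on the Franka both map to
# the same unit-scale action in the shared space.
a_dvrk = dvrk.to_shared(relative_action(np.zeros(3), np.array([0.002, 0, 0])))
a_franka = franka.to_shared(relative_action(np.zeros(3), np.array([0.05, 0, 0])))
```

The point of the shared space is exactly this collapse: one policy head can emit normalized actions, and each robot's projector rescales them back to its own native workspace.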
Cosmos-H-Surgical-Simulator: World Foundation Model
Cosmos-H-Surgical-Simulator is a generative world model that predicts realistic surgical video from robotic actions. Key advantages:
- Sim-to-Real Bridging: Learned implicitly from real data, handling complexities like soft tissue deformation, blood, smoke, and light reflections
- Efficiency: Completes 600 simulation rollouts in 40 minutes vs. ~2 days on real benchtop systems, roughly a 70x speedup
- Synthetic Data Generation: Creates realistic video-action pairs to augment sparse datasets
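The rollout pattern behind these numbers is a closed loop between a policy and the world model: the model predicts the next observation from the current one plus the commanded action, so no physical robot is in the loop. A toy sketch (the interfaces and the linear "world model" below are stand-in assumptions, not the Cosmos API):

```python
import numpy as np

def rollout(world_model, policy, init_frame, horizon):
    """Evaluate a policy entirely inside a generative world model:
    at each step the model predicts the next observation from the
    current observation and the policy's action."""
    frame, frames, actions = init_frame, [], []
    for _ in range(horizon):
        action = policy(frame)
        frame = world_model(frame, action)  # predicted next observation
        frames.append(frame)
        actions.append(action)
    return frames, actions

# Toy stand-ins: a "frame" is a state vector, the "world model" is a
# damped linear response to the action, the policy is proportional.
toy_model = lambda f, a: 0.9 * f + a
toy_policy = lambda f: -0.1 * f

# Closed loop contracts by 0.8 per step: 0.9*f + (-0.1*f) = 0.8*f.
frames, actions = rollout(toy_model, toy_policy, np.ones(4), horizon=20)
```

The (frame, action) pairs a real world model produces this way are exactly the synthetic video-action data described in the next bullet, which is why the same machinery serves both evaluation and data augmentation.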
The model was fine-tuned on all 32 Open-H-Embodiment datasets across 9 robot embodiments using 64 A100 GPUs over ~10,000 GPU-hours (roughly 6.5 days of wall-clock time at full utilization).
Getting Started
Developers can access the dataset and models immediately:
- Dataset: Hugging Face / GitHub
- GR00T-H Model: Available on Hugging Face
- Cosmos-H: Available for fine-tuning and deployment
The initiative invites community contributions toward version 2, which will focus on reasoning-capable autonomy for surgical robotics.