NVIDIA releases Open-H-Embodiment, 778-hour surgical robotics dataset with GR00T-H policy model
· release · feature · model · dataset · open-source · api · huggingface.co

Open-H-Embodiment: Foundation for Physical AI in Healthcare

NVIDIA has released Open-H-Embodiment, an open-source dataset initiative created with 35 organizations including Johns Hopkins, Stanford, UC Berkeley, and major surgical robotics companies. This represents the first large-scale, standardized dataset designed specifically for training embodied AI systems in healthcare robotics—moving beyond perception-only models to systems that can perform physical tasks with precision.

The Dataset

The dataset comprises 778 hours of CC-BY-4.0 licensed training data, including:

  • Real surgical procedures from commercial robots (CMR Surgical, Rob Surgical, Tuodao)
  • Research platforms (dVRK, Franka, Kuka)
  • Simulation environments, benchtop exercises (e.g., suturing), and clinical procedures
  • Vision, force, and kinematics data synchronized across multiple robot embodiments
  • Ultrasound and colonoscopy autonomy data alongside surgical robotics
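Pairing modalities recorded at different rates (camera frames, kinematics, force) requires resampling them onto a shared clock. The sketch below shows one common approach, nearest-neighbor alignment; the rates and field names are illustrative assumptions, not the dataset's actual schema.

```python
# Minimal sketch of synchronizing multi-rate streams onto a common clock.
# Rates (30/100/500 Hz) and stream names are illustrative assumptions.
import bisect

def nearest(timestamps, t):
    """Index of the timestamp in a sorted list closest to t."""
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def synchronize(streams, rate_hz, duration_s):
    """Resample each (timestamps, values) stream to a shared clock."""
    ticks = [k / rate_hz for k in range(int(duration_s * rate_hz))]
    return [
        {name: vals[nearest(ts, t)] for name, (ts, vals) in streams.items()}
        for t in ticks
    ]

# Toy streams: vision at 30 Hz, kinematics at 100 Hz, force at 500 Hz.
streams = {
    "vision":     ([k / 30 for k in range(30)],   list(range(30))),
    "kinematics": ([k / 100 for k in range(100)], list(range(100))),
    "force":      ([k / 500 for k in range(500)], list(range(500))),
}
# Resample everything to the slowest (vision) rate over one second.
frames = synchronize(streams, rate_hz=30, duration_s=1.0)
print(len(frames), frames[0])
```

In practice the lower-rate stream usually sets the shared clock, as here, and high-rate signals like force can instead be windowed rather than subsampled.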

GR00T-H: Surgical Policy Model

GR00T-H is a Vision-Language-Action (VLA) model trained on ~600 hours of the dataset, designed to overcome the unique challenges of surgical robotics:

  • Embodiment Projectors: Learnable mappings that translate each robot's specific kinematics to a shared, normalized action space
  • State Dropout: Randomly drops proprioceptive (state) input during training so the policy cannot over-rely on it, improving robustness at deployment
  • Relative End-Effector Actions: Represents actions as relative end-effector motions in a common action space, smoothing over kinematic inconsistencies across different robots
  • Task Metadata Injection: Embeds instrument names and control mappings directly into prompts
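The embodiment-projector idea can be illustrated with a toy example: each robot gets its own learnable map from its native action dimension into one shared space, so a single downstream policy serves every platform. Everything below (dimensions, robot names, random weights) is an assumption for illustration, not NVIDIA's implementation.

```python
# Illustrative sketch of per-robot "embodiment projectors" into a shared,
# normalized action space. Weights would be learned jointly with the
# policy; here they are random stand-ins.
import random

random.seed(0)

SHARED_DIM = 7  # assumed shared space, e.g. relative EE pose + gripper

class EmbodimentProjector:
    """Linear projection from a robot's native action space to the shared one."""
    def __init__(self, native_dim, shared_dim=SHARED_DIM):
        self.w = [[random.gauss(0, 0.1) for _ in range(native_dim)]
                  for _ in range(shared_dim)]

    def project(self, native_action):
        # Plain matrix-vector product: shared = W @ native
        return [sum(wi * a for wi, a in zip(row, native_action))
                for row in self.w]

# One projector per embodiment; all outputs land in the same 7-D space,
# so one policy can consume data from every robot.
projectors = {
    "dVRK":   EmbodimentProjector(native_dim=6),
    "Franka": EmbodimentProjector(native_dim=7),
    "Kuka":   EmbodimentProjector(native_dim=7),
}

shared = projectors["dVRK"].project([0.1] * 6)
print(len(shared))
```

The design choice this models: heterogeneity is pushed into small per-robot adapters at the edges, keeping the bulk of the VLA embodiment-agnostic.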

Early prototypes have successfully executed end-to-end suturing in the SutureBot benchmark, demonstrating long-horizon dexterity and precision control.

Cosmos-H-Surgical-Simulator: World Foundation Model

Cosmos-H-Surgical-Simulator is a generative world model that predicts realistic surgical video from robotic actions. Key advantages:

  • Sim-to-Real Bridging: Learned implicitly from real data, handling complexities like soft tissue deformation, blood, smoke, and light reflections
  • Efficiency: Completes 600 simulation rollouts in 40 minutes vs. 2 days on real benchtop systems
  • Synthetic Data Generation: Creates realistic video-action pairs to augment sparse datasets
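The efficiency claim comes from running policy rollouts inside the learned world model instead of on hardware: the policy proposes an action, the model predicts the next observation, and the loop never touches a robot. The sketch below uses trivial stand-in stubs for both models, not the Cosmos API.

```python
# Hedged sketch of closed-loop rollouts inside a learned world model.
# Both functions are stand-ins: the real world model predicts video
# frames, and the real policy is a VLA conditioned on images + language.
def world_model_step(obs, action):
    # Stand-in for video prediction of the next observation.
    return obs + action

def policy(obs):
    # Stand-in for the surgical policy's action output.
    return 1

def rollout(horizon=10, obs=0):
    trajectory = [obs]
    for _ in range(horizon):
        action = policy(obs)
        obs = world_model_step(obs, action)
        trajectory.append(obs)
    return trajectory

# 600 rollouts of this shape are the unit of comparison in the
# 40-minutes-vs-2-days figure: simulation parallelizes, hardware doesn't.
trajs = [rollout() for _ in range(600)]
print(len(trajs), len(trajs[0]))
```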

The model was fine-tuned on all 32 Open-H-Embodiment datasets across 9 robot embodiments using 64 A100 GPUs over ~10,000 GPU-hours.

Getting Started

Developers can access the dataset and models immediately:

  • Dataset: Hugging Face / GitHub
  • GR00T-H Model: Available on Hugging Face
  • Cosmos-H: Available for fine-tuning and deployment

The initiative invites community contributions toward version 2, which will focus on reasoning-capable autonomy for surgical robotics.