NVIDIA releases Open-H-Embodiment healthcare robotics dataset with 778 hours of surgical training data
Tags: release · model · dataset · platform · open-source · Source: huggingface.co

Open-H-Embodiment: A Foundational Dataset for Physical AI in Healthcare

Healthcare robotics has traditionally focused on perception tasks like classification and segmentation, but real clinical work requires physical interaction with tissues, precise force control, and closed-loop feedback. To address this gap, NVIDIA and a consortium of 35 organizations—including Johns Hopkins, Stanford, UC Berkeley, and leading surgical robotics companies—have released Open-H-Embodiment, the first large-scale, open-source dataset for training embodied AI models in healthcare robotics.

The dataset comprises 778 hours of CC-BY-4.0-licensed training data spanning:

  • Surgical robotics from commercial platforms (CMR Surgical, Rob Surgical, Tuodao) and research robots (dVRK, Franka, Kuka)
  • Benchtop exercises including suturing tasks
  • Real clinical procedures alongside simulation data
  • Diverse tasks including ultrasound and colonoscopy autonomy
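The article doesn't specify the on-disk format, but an embodiment dataset like this typically bundles each demonstration as an episode of synchronized observations, actions, and metadata. A minimal sketch of such a container, with illustrative field names that are assumptions rather than the actual Open-H-Embodiment schema:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """Illustrative container for one demonstration episode.

    Field names are hypothetical; the real Open-H-Embodiment
    schema may differ.
    """
    robot: str                  # e.g. "dVRK", "Franka"
    task: str                   # e.g. "suturing"
    license: str = "CC-BY-4.0"  # per the dataset release
    frames: list = field(default_factory=list)  # per-timestep records

    def add_frame(self, image, joint_state, action):
        """Append one synchronized (observation, action) timestep."""
        self.frames.append(
            {"image": image, "joint_state": joint_state, "action": action}
        )

    @property
    def length(self) -> int:
        return len(self.frames)

# Usage: build a tiny two-step suturing episode on a 7-DoF arm.
ep = Episode(robot="dVRK", task="suturing")
ep.add_frame(image=None, joint_state=[0.0] * 7, action=[0.1] * 7)
ep.add_frame(image=None, joint_state=[0.1] * 7, action=[0.0] * 7)
```

Grouping heterogeneous platforms (commercial and research robots, real and simulated procedures) under one episode abstraction is what lets a single policy consume all of them.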

GR00T-H: Vision-Language-Action Model for Surgical Robotics

The first foundation model trained on this data is GR00T-H, a Vision-Language-Action (VLA) policy model built on NVIDIA's Isaac GR00T architecture and trained on approximately 600 hours of Open-H-Embodiment data.

Key innovations in GR00T-H address surgical robotics' unique challenges:

  • Embodiment Projectors: Learnable MLPs map each robot's specific kinematics to a normalized, shared action space, enabling cross-robot generalization
  • Relative End-Effector Actions: Uses relative rather than absolute coordinates to overcome kinematic inconsistencies across different surgical platforms
  • State Dropout During Inference: Proprioceptive inputs are dropped to create learned bias terms for each system, improving real-world performance
  • Metadata-Aware Prompting: Instrument names and control mappings are injected into task prompts for improved precision
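To make the first two ideas concrete, here is a minimal NumPy sketch of a per-robot projector into a shared action space and of relative end-effector actions computed as deltas between consecutive poses. The dimensions and the linear projector form are assumptions for brevity; GR00T-H's actual projectors are learned MLPs.

```python
import numpy as np

rng = np.random.default_rng(0)

SHARED_DIM = 8  # assumed size of the normalized shared action space

# One projection per embodiment (random stand-ins for learned weights).
projectors = {
    "dVRK":   rng.standard_normal((SHARED_DIM, 7)),  # 7-DoF research arm
    "Franka": rng.standard_normal((SHARED_DIM, 9)),  # 7 joints + 2-DoF gripper
}

def to_shared_space(robot: str, native_action: np.ndarray) -> np.ndarray:
    """Map a robot-specific action vector into the shared action space,
    so one policy head can drive robots with different kinematics."""
    return projectors[robot] @ native_action

def relative_ee_action(pose_t: np.ndarray, pose_t1: np.ndarray) -> np.ndarray:
    """Relative end-effector action: the delta between consecutive poses,
    which sidesteps absolute-frame mismatches across platforms."""
    return pose_t1 - pose_t

shared = to_shared_space("dVRK", np.zeros(7))
delta = relative_ee_action(np.array([0.0, 0.0, 0.10]),
                           np.array([0.0, 0.0, 0.12]))
```

Because every robot's actions land in the same `SHARED_DIM`-dimensional space, a trajectory collected on one platform can supervise behavior transferred to another.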

A prototype has demonstrated complete end-to-end suture execution on the SutureBot benchmark, showcasing the long-horizon dexterity required for surgical tasks.

Cosmos-H-Surgical-Simulator: World Foundation Model for Simulation

The second model, Cosmos-H-Surgical-Simulator, is a World Foundation Model (WFM) trained to generate physically plausible surgical video conditioned on robot kinematics. This addresses the critical sim-to-real gap in surgical simulation by implicitly learning tissue deformation, tool interaction, and complex visual phenomena like reflections and fluid dynamics directly from training data.
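The conditioning interface can be pictured as an autoregressive loop: given the current frame and the next kinematic command, the model predicts the next frame. A toy NumPy sketch with a stand-in frame predictor (the real model is a learned video generator; this stub just illustrates the control flow):

```python
import numpy as np

def predict_next_frame(frame: np.ndarray, kinematics: np.ndarray) -> np.ndarray:
    """Stand-in for the world model's learned frame predictor.
    Here: brighten pixels by the commanded motion magnitude."""
    return np.clip(frame + np.linalg.norm(kinematics), 0.0, 1.0)

def rollout(initial_frame: np.ndarray, kinematic_plan: list) -> list:
    """Autoregressively generate a video conditioned on robot kinematics:
    each new frame is predicted from the previous frame and the next command."""
    frames = [initial_frame]
    for cmd in kinematic_plan:
        frames.append(predict_next_frame(frames[-1], cmd))
    return frames

# Roll out 3 steps of a constant commanded motion from a blank 4x4 frame.
video = rollout(np.zeros((4, 4)), [np.array([0.1, 0.0])] * 3)
```

The point of learning the predictor from real data is that tissue deformation, reflections, and fluid dynamics come along implicitly, rather than being hand-modeled in a physics engine.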

Performance gains are substantial: generating 600 simulation rollouts took 40 minutes versus 2 days with real benchtop methods. The model can also generate synthetic training data to augment underrepresented surgical tasks.
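The quoted figures work out to roughly a 72x wall-clock speedup:

```python
# Speedup implied by the article's numbers: 600 rollouts in 40 minutes
# with the world model vs. 2 days of real benchtop collection.
wfm_minutes = 40
benchtop_minutes = 2 * 24 * 60  # 2 days = 2880 minutes

speedup = benchtop_minutes / wfm_minutes
print(f"~{speedup:.0f}x faster")  # -> ~72x faster
```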

Getting Started

Both models are available open-source through NVIDIA's ecosystem. Developers and researchers can use Open-H-Embodiment and these foundation models to build and evaluate surgical AI systems without costly real-world experimentation.