H Company releases Holo2-235B-A22B Preview, reaching 78.5% accuracy on ScreenSpot-Pro
Tags: release, model, feature · Source: huggingface.co

Holo2-235B-A22B: New SOTA in UI Element Localization

H Company has released Holo2-235B-A22B Preview, its latest and largest UI localization model. The research release is available on Hugging Face and reports state-of-the-art results on major benchmarks:

  • 78.5% accuracy on ScreenSpot-Pro (3-step agentic mode)
  • 79.0% accuracy on OSWorld-G
  • 70.6% accuracy on ScreenSpot-Pro (single-step baseline)
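
To put the two ScreenSpot-Pro figures side by side, the short sketch below computes the absolute and relative gain of the 3-step agentic score over the single-step baseline. The helper name `gains` is illustrative only; the two input scores are the ones listed above.

```python
def gains(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved - baseline
    relative = absolute / baseline * 100.0
    return absolute, relative


# ScreenSpot-Pro scores from the list above: single-step vs. 3-step agentic.
absolute, relative = gains(70.6, 78.5)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
# prints "absolute: 7.9 points, relative: 11.2%"
```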

Agentic Localization for High-Resolution Interfaces

The model's key innovation is agentic localization: instead of committing to a single prediction, it refines its estimate of a UI element's position over several steps. This matters most on high-resolution 4K interfaces, where small UI elements are hard to pinpoint on a large display. With multi-step refinement, Holo2 achieves 10-20% relative accuracy improvements over single-pass prediction.
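
The release summary does not spell out how the refinement steps work, so the following is only a rough sketch of one plausible scheme: predict a point, crop a smaller region around it, and predict again inside the crop. The crop-and-zoom strategy, the `zoom` factor, and the `predict` callable (a stand-in for a model call that returns a normalized point within a region) are all assumptions made for illustration, not H Company's implementation. A toy predictor whose error scales with the size of the region it looks at replaces the real model.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in screen pixels
Point = Tuple[float, float]


def crop_around(center: Point, size: Tuple[float, float],
                screen: Tuple[float, float]) -> Box:
    """Box of the given size centered on `center`, shifted to stay on screen."""
    (cx, cy), (w, h), (sw, sh) = center, size, screen
    left = min(max(cx - w / 2, 0.0), sw - w)
    top = min(max(cy - h / 2, 0.0), sh - h)
    return (left, top, left + w, top + h)


def localize(predict: Callable[[Box], Point], screen: Tuple[float, float],
             steps: int = 3, zoom: float = 0.5) -> Point:
    """Iteratively refine a point estimate (assumed crop-and-zoom scheme)."""
    region: Box = (0.0, 0.0, screen[0], screen[1])
    point: Point = (screen[0] / 2, screen[1] / 2)
    for step in range(steps):
        nx, ny = predict(region)  # normalized (0..1) position inside `region`
        left, top, right, bottom = region
        point = (left + nx * (right - left), top + ny * (bottom - top))
        if step < steps - 1:
            # Shrink the search region around the current estimate.
            size = ((right - left) * zoom, (bottom - top) * zoom)
            region = crop_around(point, size, screen)
    return point


def make_toy_predictor(target: Point, bias: float = 0.02) -> Callable[[Box], Point]:
    """Hypothetical model stand-in: off by `bias` of the region size per axis."""
    def predict(region: Box) -> Point:
        left, top, right, bottom = region
        nx = (target[0] - left) / (right - left) + bias
        ny = (target[1] - top) / (bottom - top) + bias
        return (min(max(nx, 0.0), 1.0), min(max(ny, 0.0), 1.0))
    return predict


screen = (3840.0, 2160.0)   # 4K display
target = (3000.0, 500.0)    # true element position, known only to the toy predictor
predict = make_toy_predictor(target)

single = localize(predict, screen, steps=1)
refined = localize(predict, screen, steps=3)
print("single-step error (px):", abs(single[0] - target[0]), abs(single[1] - target[1]))
print("3-step error (px):     ", abs(refined[0] - target[0]), abs(refined[1] - target[1]))
```

In this toy setup the pixel error shrinks with every step because the predictor's error is proportional to the region it sees, which is the intuition behind refining on progressively smaller crops. A real model's error will not follow such a clean rule.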

Infrastructure and Deployment

H Company trained Holo2 models at scale using SkyPilot, a unified interface for launching training jobs across multiple cloud providers and Kubernetes clusters. This abstraction simplifies infrastructure management, allowing researchers to focus on model development rather than maintaining deployment configurations.

Access and Usage

Developers can access the model directly on Hugging Face for integration into UI automation, accessibility testing, and GUI grounding tasks. The agentic approach enables more accurate element localization in complex, information-dense interfaces.