Holo2-235B-A22B: New SOTA in UI Element Localization
H Company has released Holo2-235B-A22B Preview, its latest and largest UI localization model. The research release, available on Hugging Face, sets new state-of-the-art results on major UI localization benchmarks:
- 78.5% accuracy on ScreenSpot-Pro (3-step agentic mode)
- 79.0% accuracy on OSWorld-G
- 70.6% accuracy on ScreenSpot-Pro (single-step baseline)
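The two ScreenSpot-Pro figures can be compared directly. Going from the single-step baseline to the 3-step agentic mode is a 7.9-point absolute gain, which works out to roughly an 11% relative improvement:

```latex
\text{relative gain} = \frac{78.5 - 70.6}{70.6} = \frac{7.9}{70.6} \approx 0.112 \quad (\approx 11.2\%)
```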
Agentic Localization for High-Resolution Interfaces
The model introduces a key innovation: agentic localization, in which the model refines its prediction of a UI element's location over several steps instead of committing to a single guess. This addresses a difficulty specific to high-resolution 4K interfaces, where small UI elements occupy a tiny fraction of the display and are hard to pinpoint in a single pass. By refining its predictions across multiple steps, Holo2 achieves 10-20% relative accuracy improvements over single-pass prediction.
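The article does not spell out how the refinement steps work, so the sketch below assumes one common scheme, crop-and-zoom: predict a point, crop a smaller window around it, and predict again on that window. Everything in it is hypothetical: `Box`, `crop_around`, `agentic_localize`, the zoom factor of 2, and especially `make_grid_predictor`, a toy stand-in for the model whose error is a fixed fraction of whatever region it is shown. That toy captures why refinement helps on 4K screens: each zoom halves the window, so the same relative error costs fewer pixels. It is not Holo2's interface or H Company's actual method.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned region of the screen, in screen pixels."""
    left: float
    top: float
    width: float
    height: float


# Hypothetical model interface (assumption): given the region currently shown,
# return the target's position as normalized (u, v) coordinates in [0, 1].
Predictor = Callable[[Box], Tuple[float, float]]


def crop_around(center: Point, width: float, height: float, screen: Box) -> Box:
    """Return a width x height box centered on `center`, shifted to stay on screen."""
    left = min(max(center[0] - width / 2, screen.left), screen.left + screen.width - width)
    top = min(max(center[1] - height / 2, screen.top), screen.top + screen.height - height)
    return Box(left, top, width, height)


def agentic_localize(predict: Predictor, screen_w: float, screen_h: float,
                     steps: int = 3, zoom: float = 2.0) -> Point:
    """Assumed crop-and-zoom refinement: predict, zoom in around the guess, repeat."""
    screen = Box(0.0, 0.0, screen_w, screen_h)
    region = screen
    point: Point = (screen_w / 2, screen_h / 2)
    for step in range(steps):
        u, v = predict(region)
        # Map the normalized prediction back to full-screen pixel coordinates.
        point = (region.left + u * region.width, region.top + v * region.height)
        if step < steps - 1:
            # The next step sees a smaller window centered on the current guess.
            region = crop_around(point, region.width / zoom, region.height / zoom, screen)
    return point


def make_grid_predictor(target: Point, grid: int = 20) -> Predictor:
    """Toy stand-in for the model (hypothetical): snaps the true position to a
    coarse grid, so its pixel error is a fixed fraction of the region it sees."""
    def predict(region: Box) -> Tuple[float, float]:
        u = (target[0] - region.left) / region.width
        v = (target[1] - region.top) / region.height
        u, v = min(max(u, 0.0), 1.0), min(max(v, 0.0), 1.0)
        return round(u * grid) / grid, round(v * grid) / grid
    return predict


if __name__ == "__main__":
    target = (3001.0, 1503.0)  # hypothetical small icon on a 3840x2160 screen
    predict = make_grid_predictor(target)
    print("single pass:", agentic_localize(predict, 3840, 2160, steps=1))
    print("3 steps:    ", agentic_localize(predict, 3840, 2160, steps=3))
```

With this toy predictor, the 3-step result lands about 25 px from the target versus about 72 px for a single pass. These numbers only illustrate the mechanism and say nothing about Holo2's real accuracy.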
Infrastructure and Deployment
H Company trained the Holo2 models at scale using SkyPilot, an open-source framework that offers a single interface for launching training jobs across multiple cloud providers and Kubernetes clusters. Because one job definition can target any of these backends, researchers spend less time maintaining per-cluster deployment configurations and more time on model development.
Access and Usage
Developers can access the model directly on Hugging Face for integration into UI automation, accessibility testing, and GUI grounding tasks. The agentic approach enables more accurate element localization in complex, information-dense interfaces.