H Company releases Holotron-12B, 12B-parameter computer-use model with hybrid SSM architecture for efficient agent inference

Holotron-12B: Production-Ready Computer-Use Agent

H Company has released Holotron-12B, a 12-billion-parameter multimodal model designed to serve as the policy model for computer-use agents. Unlike general-purpose vision models, Holotron-12B is optimized for agents that must perceive, decide, and act in interactive environments, rather than for static image understanding or instruction following alone.

Architectural Advantages

The model uses a hybrid architecture that combines State-Space Model (SSM) layers with attention, built on NVIDIA's Nemotron-Nano-2 VL foundation. This design offers several advantages in production:

Reduced Memory Footprint: SSM layers keep a fixed-size state per layer for each sequence, whereas attention layers keep a KV cache that grows with every token. Using SSM layers in place of attention layers therefore shrinks inference memory and avoids much of the quadratic compute cost of full attention
Long-Context Handling: Designed for inputs that contain multiple images and lengthy interaction histories, where a growing KV cache would otherwise dominate memory
High-Throughput Inference: Because each sequence needs less memory, more requests can be batched together, which can improve serving efficiency compared with pure transformer-based approaches
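
The memory argument above can be made concrete with a rough back-of-the-envelope calculation. The sketch below compares per-sequence cache size for a pure-attention stack and a hybrid stack as context length grows. Every number in it (layer counts, head counts, head dimension, SSM state size, 16-bit storage) is a hypothetical value chosen for illustration and is not Holotron-12B's published configuration.

```python
from dataclasses import dataclass

BYTES_PER_VALUE = 2  # assumption: 16-bit storage (e.g. bf16)


@dataclass(frozen=True)
class StackConfig:
    """Hypothetical layer shapes, for illustration only."""
    attention_layers: int
    ssm_layers: int
    kv_heads: int
    head_dim: int
    ssm_state_values: int  # values held per SSM layer, per sequence


def kv_cache_bytes(cfg: StackConfig, seq_len: int) -> int:
    # Keys and values (factor of 2) are kept for every token in every attention layer,
    # so this term grows linearly with sequence length.
    return (2 * cfg.attention_layers * cfg.kv_heads * cfg.head_dim
            * seq_len * BYTES_PER_VALUE)


def ssm_state_bytes(cfg: StackConfig) -> int:
    # The recurrent state has a fixed size: it does not depend on sequence length.
    return cfg.ssm_layers * cfg.ssm_state_values * BYTES_PER_VALUE


def cache_bytes(cfg: StackConfig, seq_len: int) -> int:
    return kv_cache_bytes(cfg, seq_len) + ssm_state_bytes(cfg)


# Two made-up 48-layer stacks: one all attention, one mostly SSM.
pure_attention = StackConfig(attention_layers=48, ssm_layers=0,
                             kv_heads=8, head_dim=128, ssm_state_values=0)
hybrid = StackConfig(attention_layers=6, ssm_layers=42,
                     kv_heads=8, head_dim=128, ssm_state_values=1_048_576)

if __name__ == "__main__":
    for seq_len in (1_000, 10_000, 100_000):
        pure_mib = cache_bytes(pure_attention, seq_len) / 2**20
        hybrid_mib = cache_bytes(hybrid, seq_len) / 2**20
        print(f"{seq_len:>7} tokens: pure attention {pure_mib:10.1f} MiB, "
              f"hybrid {hybrid_mib:10.1f} MiB")
```

Under these assumptions the hybrid stack still pays a KV-cache cost for its remaining attention layers, but the SSM portion contributes a constant term, so the gap widens as screenshots and interaction history accumulate.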

Availability and Use Cases

Holotron-12B is now available on Hugging Face. The model targets production deployments where agents need to interact with computer interfaces efficiently, including multi-step task completion with visual perception, decision-making, and action execution.
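
To illustrate what "policy model" means in this setting, here is a minimal sketch of a perceive-decide-act loop. It is not H Company's agent code or the Holotron-12B API: `Action`, `Step`, `run_episode`, and `scripted_policy` are hypothetical names, and the policy is a scripted stub standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str            # hypothetical action vocabulary: "click", "type", "done"
    argument: str = ""


@dataclass
class Step:
    screenshot: bytes
    action: Action


# A policy maps (task, current screenshot, history so far) to the next action.
Policy = Callable[[str, bytes, List[Step]], Action]


def run_episode(task: str,
                capture_screen: Callable[[], bytes],
                policy: Policy,
                execute: Callable[[Action], None],
                max_steps: int = 10) -> List[Step]:
    history: List[Step] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                # perceive
        action = policy(task, screenshot, history)   # decide
        history.append(Step(screenshot, action))
        if action.kind == "done":
            break
        execute(action)                              # act
    return history


def scripted_policy(task: str, screenshot: bytes, history: List[Step]) -> Action:
    # Stub standing in for a model: replays a fixed script based on step count.
    script = [Action("click", "search box"), Action("type", task), Action("done")]
    return script[min(len(history), len(script) - 1)]


executed: List[Action] = []
history = run_episode("find flights", lambda: b"fake-screenshot",
                      scripted_policy, executed.append)
```

Note that the history passed to the policy grows by one screenshot per step, which is why long-context efficiency matters for multi-step tasks.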

H Company is part of the NVIDIA Inception Program, reflecting its close collaboration with NVIDIA in bringing this architecture to market.
