New Model Release: Holotron-12B
H Company has released Holotron-12B, a multimodal computer-use model optimized for autonomous agents that must perceive, decide, and act in interactive environments. The model was developed by H Company's research labs and is now available on Hugging Face.
Architecture and Performance
Holotron-12B is post-trained from NVIDIA's open-source Nemotron-Nano-2 VL model using H Company's proprietary data mixture. The key technical innovation is its hybrid State-Space Model (SSM) and attention architecture, which delivers significant advantages for production deployments:
- Efficient Long-Context Inference: Unlike purely transformer-based models, the SSM components avoid the quadratic compute cost of full attention, which is particularly beneficial when handling multiple images and lengthy interaction histories
- Reduced Memory Footprint: SSM layers keep only a constant-size state per sequence, dramatically reducing KV-cache requirements compared to vanilla attention
- High-Throughput Serving: The architecture is optimized for scalable production inference, making it practical for deployed agent systems
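The memory argument behind these points can be made concrete with a toy calculation. The sketch below is illustrative only: the layer counts and dimensions are made up and do not reflect Holotron-12B's actual configuration. It compares the per-sequence cache of vanilla attention, which grows linearly with context length, against the fixed-size recurrent state of an SSM layer:

```python
# Toy memory model: per-sequence cache size for vanilla attention vs. an SSM.
# All dimensions below are illustrative, not Holotron-12B's real configuration.

def attention_kv_cache_floats(seq_len: int, n_layers: int,
                              n_heads: int, head_dim: int) -> int:
    """Vanilla attention caches one key and one value vector
    per token, per head, per layer -- so it grows with seq_len."""
    return 2 * seq_len * n_layers * n_heads * head_dim

def ssm_state_floats(n_layers: int, state_dim: int) -> int:
    """An SSM layer carries a fixed-size recurrent state,
    independent of how many tokens have been processed."""
    return n_layers * state_dim

if __name__ == "__main__":
    layers, heads, hdim, sdim = 40, 32, 128, 4096  # hypothetical sizes
    for seq_len in (1_000, 10_000, 100_000):
        kv = attention_kv_cache_floats(seq_len, layers, heads, hdim)
        ssm = ssm_state_floats(layers, sdim)
        print(f"{seq_len:>7} tokens: KV cache = {kv:>14,} floats, "
              f"SSM state = {ssm:,} floats")
```

Doubling the context doubles the attention KV cache, while the SSM state stays the same size, which is why long multi-image interaction histories favor the hybrid design.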
Use Cases and Design Philosophy
Rather than optimizing primarily for static vision or instruction following, as most multimodal models do, Holotron-12B is designed specifically to serve as a policy model for computer-use agents. This fundamental difference shapes every aspect of the model's training and architecture.
The model handles long contexts with multiple images efficiently while maintaining strong performance on agent benchmarks, making it suitable for real-world deployment scenarios where throughput and latency matter.
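In agent terms, a policy model maps an observation (a screenshot plus the interaction history) to the next action. The minimal perceive-decide-act loop below sketches that role; the `policy` stub, the `Observation` structure, and the action strings are all hypothetical placeholders, not part of any Holotron-12B API:

```python
# Minimal sketch of a computer-use agent loop. The policy stub below stands
# in for a real model; all names and action strings here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Observation:
    screenshot: bytes                              # current screen pixels (stub)
    history: list = field(default_factory=list)    # prior (screenshot, action) pairs

def policy(obs: Observation) -> str:
    """Stub standing in for the policy model: returns the next UI action
    as text. A real policy would emit click/type/scroll commands inferred
    from the screenshot and the interaction history."""
    if not obs.history:
        return "click(search_box)"
    return "type('hello')" if len(obs.history) == 1 else "done"

def run_agent(max_steps: int = 5) -> list:
    """Perceive-decide-act loop: observe, query the policy, act, repeat."""
    obs = Observation(screenshot=b"")
    actions = []
    for _ in range(max_steps):
        action = policy(obs)
        if action == "done":
            break
        actions.append(action)
        # Executing the action and capturing a new screenshot is stubbed out;
        # here we only record it in the history the next decision will see.
        obs.history.append((obs.screenshot, action))
    return actions
```

Because every decision conditions on the full history of screenshots and actions, the context grows with each step, which is exactly where the efficient long-context inference described above matters.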
Availability
Holotron-12B is now available on Hugging Face. H Company is part of the NVIDIA Inception Program.