# Ollama v0.15.5 adds two new models and improves agentic coding support
## New Models
Ollama v0.15.5 introduces two new models to the library:
- Qwen3-Coder-Next: A coding-focused language model from Alibaba's Qwen team, optimized for agentic coding workflows and local development environments.
- GLM-OCR: A multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture for advanced text and layout recognition.
## Improvements to Agentic Features
The `ollama launch` command receives significant enhancements for agent-based workflows:
- Sub-agent support: `ollama launch` can now spawn and manage sub-agents for planning, deep research, and similar multi-step tasks.
- Flexible arguments: Arguments can now be passed through `ollama launch`, e.g., `ollama launch claude -- --resume`.
- Model-specific context tuning: Context limits are automatically set for specific models (e.g., `ollama launch opencode`).
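The `--` separator shown above follows the common CLI convention of splitting the launcher's own arguments from those passed through verbatim to the launched agent. A minimal sketch of that convention (illustrative only, not Ollama's actual implementation; the function name is hypothetical):

```python
def split_launch_args(argv):
    """Split an argument vector at the `--` separator.

    Everything before `--` is interpreted by the launcher itself;
    everything after is forwarded untouched to the launched tool.
    Illustrative sketch of the convention, not Ollama's code.
    """
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []

# e.g. `ollama launch claude -- --resume`
launch_args, passthrough = split_launch_args(["claude", "--", "--resume"])
print(launch_args, passthrough)  # ['claude'] ['--resume']
```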
## System Improvements
- Automatic context length tuning: Ollama now chooses a default context length based on available VRAM:
  - Less than 24 GiB: 4,096 tokens
  - 24–48 GiB: 32,768 tokens
  - 48 GiB or more: 262,144 tokens
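The VRAM tiers above amount to a simple threshold lookup. A minimal sketch of that mapping, using the tier values from these release notes (the function itself is illustrative, not Ollama's implementation):

```python
def default_context_length(vram_gib: float) -> int:
    """Pick a default context length from available VRAM in GiB.

    Tier boundaries are taken from the Ollama v0.15.5 release notes;
    this function is an illustrative sketch, not Ollama's code.
    """
    if vram_gib < 24:
        return 4_096       # less than 24 GiB
    if vram_gib < 48:
        return 32_768      # 24-48 GiB
    return 262_144         # 48 GiB or more

print(default_context_length(16))  # 4096
print(default_context_length(32))  # 32768
print(default_context_length(80))  # 262144
```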
- Simplified authentication: `ollama signin` now opens a browser window directly to the connection page.
- MLX engine expansion: Added support for GLM-4.7-Flash on the experimental MLX engine.
## Bug Fixes
- Fixed an off-by-one error when using the `num_predict` API parameter.
- Resolved an issue where tokens from previous sequences could be incorrectly returned when hitting `num_predict` limits.
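The class of bug behind the `num_predict` fix is a familiar one: a generation loop whose limit check runs at the wrong point emits one token too many. A minimal sketch of the correct check (illustrative only, not Ollama's actual implementation):

```python
def generate(tokens, num_predict):
    """Collect at most `num_predict` tokens from a token stream.

    The bound is tested *before* appending, so exactly `num_predict`
    tokens are returned at most; checking after the append, or using
    `<=`, would produce the classic off-by-one. Illustrative sketch,
    not Ollama's code.
    """
    out = []
    for tok in tokens:
        if len(out) >= num_predict:  # stop before exceeding the limit
            break
        out.append(tok)
    return out

print(len(generate(iter(range(100)), 10)))  # 10
```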