Ollama v0.17.5 adds Qwen3.5 small models, fixes GPU/CPU split crashes
New Models
Ollama now supports the Qwen3.5 small model series with four size options: 0.8B, 2B, 4B, and 9B parameters. These smaller variants offer more efficient options for resource-constrained environments while retaining much of the capability of the larger Qwen3.5 lineup.
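Assuming the new sizes follow Ollama's usual size-suffix tag convention (the exact tags may differ; check the model library before use), pulling and running one of the small variants would look like:

```shell
# Fetch one of the new small Qwen3.5 variants.
# The tag name (qwen3.5:2b) is an assumption based on Ollama's
# usual size-suffix convention; verify it in the model library.
ollama pull qwen3.5:2b

# Start an interactive session with the 2B model.
ollama run qwen3.5:2b
```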
Bug Fixes and Improvements
This patch release focuses on stability and performance improvements:
- GPU/CPU split crash: Fixed a critical crash that occurred when Qwen3.5 models were split between GPU and CPU memory
- Token repetition: Resolved an issue where Qwen3.5 models would repeat themselves due to a missing presence penalty. Note that users may need to redownload Qwen3.5 models (e.g., `ollama pull qwen3.5:35b`) to apply this fix
- Memory monitoring: The `ollama run --verbose` command now displays peak memory usage when using Ollama's MLX engine
- MLX stability: Fixed memory issues and crashes affecting the MLX runner
- GGUF compatibility: Resolved an issue preventing Ollama from running models imported from Qwen3.5 GGUF files
Action Items
Users experiencing Qwen3.5 model issues should pull the latest versions to receive the presence penalty fix. Developers testing Ollama's MLX engine will benefit from improved stability and better visibility into memory consumption via the verbose flag.
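Both action items can be sketched as follows (the model tag is illustrative; substitute whichever Qwen3.5 variant you actually use):

```shell
# Re-pull the model to pick up the presence-penalty fix;
# the 4b tag here is an illustrative placeholder.
ollama pull qwen3.5:4b

# Run with --verbose so the MLX engine reports peak memory
# usage alongside the usual timing statistics.
ollama run --verbose qwen3.5:4b "Summarize this release."
```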