Ollama v0.17.5 fixes Qwen 3.5 crashes and MLX runner memory issues
Bug Fixes and Improvements
This patch release focuses on stability and performance improvements across Ollama's model runners:
Qwen 3.5 Model Fixes
- Crash prevention: Fixed a critical crash that occurred when Qwen 3.5 models were split between GPU and CPU resources
- Output quality: Resolved an issue that caused Qwen 3.5 models to repeat themselves because of a missing presence penalty
- Model compatibility: Fixed support for models imported from Qwen 3.5 GGUF files
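For readers who import their own Qwen 3.5 GGUF files, the sketch below shows one common way to do that: write a one-line Modelfile whose FROM points at the file, then pass it to ollama create. The function name import_gguf, the file path, and the model name are hypothetical placeholders and do not come from the release notes.

```shell
#!/bin/sh
# Sketch: import a local GGUF file into Ollama through a temporary Modelfile.
# import_gguf is a hypothetical helper; both arguments are placeholders.
import_gguf() {
    gguf_path=$1
    model_name=$2
    # Make the path absolute so it does not depend on where the Modelfile lives.
    case $gguf_path in
        /*) ;;
        *) gguf_path=$PWD/$gguf_path ;;
    esac
    modelfile=$(mktemp) || return 1
    printf 'FROM %s\n' "$gguf_path" > "$modelfile"
    ollama create "$model_name" -f "$modelfile"
    status=$?
    rm -f "$modelfile"
    return "$status"
}
```

Calling import_gguf ./qwen3.5.gguf my-qwen3.5 would register the file under the name my-qwen3.5, after which it can be started with ollama run my-qwen3.5.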
MLX Engine Enhancements
- Improved memory management to eliminate crashes and resource leaks in the MLX runner
- Enhanced ollama run --verbose to display peak memory usage statistics when using Ollama's MLX engine
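As a minimal sketch of the verbose option mentioned above, the wrapper below runs a single prompt with --verbose. The function name run_verbose is hypothetical, and the model and prompt are whatever you pass in. Whether peak memory appears in the output depends on the model actually being served by the MLX engine, which this sketch does not check.

```shell
#!/bin/sh
# Sketch: run one prompt with verbose statistics enabled.
# run_verbose is a hypothetical helper: $1 is the model name, $2 is the prompt.
run_verbose() {
    ollama run --verbose "$1" "$2"
}
```

For example, run_verbose qwen3.5:35b "Explain unified memory in one sentence." prints the model's reply followed by the run statistics.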
Action Items for Users
Users currently running Qwen 3.5 models should redownload them to apply the presence penalty fix, which improves generation quality:
ollama pull qwen3.5:35b
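If you want to confirm that the installed Ollama is at least 0.17.5 before pulling again, a check along the following lines may help. It is only a sketch: the helper names are hypothetical, and the parsing assumes that ollama --version prints a line of the form "ollama version is X.Y.Z", which may differ on your build.

```shell
#!/bin/sh
# Sketch: pull a model only when the installed Ollama is new enough.
# All function names here are hypothetical helpers.

# Succeeds if dotted version $1 is greater than or equal to dotted version $2.
version_at_least() {
    have=$1
    want=$2
    old_ifs=$IFS
    IFS=.
    set -- $have
    h1=${1:-0}; h2=${2:-0}; h3=${3:-0}
    set -- $want
    w1=${1:-0}; w2=${2:-0}; w3=${3:-0}
    IFS=$old_ifs
    # Drop any non-numeric suffix such as "-rc1" from the patch field.
    h3=${h3%%[!0-9]*}; w3=${w3%%[!0-9]*}
    h3=${h3:-0}; w3=${w3:-0}
    if [ "$h1" -ne "$w1" ]; then [ "$h1" -gt "$w1" ]; return; fi
    if [ "$h2" -ne "$w2" ]; then [ "$h2" -gt "$w2" ]; return; fi
    [ "$h3" -ge "$w3" ]
}

# Assumes output like "ollama version is 0.17.5"; prints only the number.
installed_ollama_version() {
    ollama --version 2>/dev/null |
        sed -n 's/.*version is \([0-9][0-9.]*\).*/\1/p' |
        head -n 1
}

# Pulls model $1 if the installed version is 0.17.5 or newer.
repull_if_current() {
    model=$1
    installed=$(installed_ollama_version)
    if version_at_least "${installed:-0}" 0.17.5; then
        ollama pull "$model"
    else
        echo "Ollama ${installed:-unknown} is older than 0.17.5; upgrade before pulling." >&2
        return 1
    fi
}
```

After sourcing the file, repull_if_current qwen3.5:35b performs the check and then the same pull shown above.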
This release is available across all platforms, including macOS, Linux (AMD64, ARM64), and Windows with support for ROCm accelerators.