Ollama v0.17.5 fixes Qwen 3.5 crashes and MLX runner memory issues
release · bugfix · performance · github.com

Bug Fixes and Improvements

This patch release focuses on stability and performance improvements across Ollama's model runners:

Qwen 3.5 Model Fixes

  • Crash prevention: Fixed a critical crash that occurred when Qwen 3.5 models were split between GPU and CPU resources
  • Output quality: Resolved an issue that caused Qwen 3.5 models to repeat themselves because no presence penalty was applied
  • Model compatibility: Fixed support for models imported from Qwen 3.5 GGUF files

MLX Engine Enhancements

  • Improved memory management to eliminate crashes and resource leaks in the MLX runner
  • Enhanced ollama run --verbose to display peak memory usage statistics when using Ollama's MLX engine

Action Items for Users

Users currently running Qwen 3.5 models should redownload them to apply the presence penalty fix, which improves generation quality:

ollama pull qwen3.5:35b
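If several Qwen 3.5 tags are installed, the same step can be applied to each of them. The following is a minimal sketch, not an official procedure. It assumes that `ollama list` prints a header row followed by one model per line with the name in the first column, and that every Qwen 3.5 tag starts with `qwen3.5`. The helper name `qwen35_models` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print installed model names that start with "qwen3.5".
# Reads `ollama list` output on stdin (assumed: header row, NAME in column 1).
qwen35_models() {
  awk 'NR > 1 && $1 ~ /^qwen3\.5/ { print $1 }'
}

# Re-pull every matching model. Skipped when the ollama CLI is not installed.
if command -v ollama >/dev/null 2>&1; then
  ollama list | qwen35_models | while read -r model; do
    # Redirect stdin so the pull cannot consume the remaining model names.
    ollama pull "$model" </dev/null
  done
fi
```

Afterwards, `ollama show qwen3.5:35b --parameters` lists the parameters stored with the model, which is one way to inspect what the refreshed copy ships with.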

This release is available on all platforms, including macOS, Linux (AMD64, ARM64), and Windows, with support for ROCm accelerators.