New Models
Qwen 3.5 models are now available in Ollama in five parameter sizes:
- 0.8B
- 2B
- 4B
- 9B
- 35B
Key Bug Fixes
This release addresses several stability and correctness issues:
GPU/CPU Split Handling: Fixed crashes that occurred when Qwen 3.5 models were split across GPU and CPU resources, improving reliability for users with mixed hardware setups.
Output Quality: Resolved a critical issue where Qwen 3.5 models would repeat themselves due to missing presence penalty configuration. Users may need to redownload the Qwen 3.5 models to receive the corrected behavior.
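As a stopgap before redownloading, a presence penalty can also be set explicitly per request through the `options` field of Ollama's REST API. A minimal sketch of building such a request payload for the `/api/generate` endpoint; the model tag and penalty value below are illustrative, not defaults from this release:

```python
import json

def build_generate_request(model: str, prompt: str, presence_penalty: float = 1.5) -> dict:
    """Build a payload for Ollama's /api/generate endpoint.

    Setting presence_penalty explicitly in `options` discourages the
    model from reusing tokens it has already produced. The value 1.5
    and the model tag passed below are illustrative examples.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"presence_penalty": presence_penalty},
    }

payload = build_generate_request("qwen3.5", "Why is the sky blue?")
print(json.dumps(payload, indent=2))
```

POSTing this payload to a running Ollama server at `http://localhost:11434/api/generate` applies the penalty for that request only, without modifying the model.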
Memory Management: Fixed memory issues and crashes in the MLX runner, improving stability when running models on Apple Silicon and other supported hardware.
Model Import Compatibility: Addressed an issue preventing models imported from Qwen 3.5 GGUF files from running correctly.
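For reference, importing a local GGUF file uses a Modelfile whose `FROM` line points at the file. A minimal sketch, assuming an already-downloaded GGUF export (the filename and model name below are illustrative):

```
# Modelfile — import a local Qwen 3.5 GGUF export
FROM ./qwen3.5-4b.gguf
```

The model is then registered and run with `ollama create my-qwen3.5 -f Modelfile` followed by `ollama run my-qwen3.5`.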
Developer Improvements
The `ollama run --verbose` command now displays peak memory usage when using Ollama's MLX engine, giving better visibility into resource consumption during debugging and optimization.