New Models
Qwen 3.5 models are now available in Ollama in five parameter sizes:
- 0.8B
- 2B
- 4B
- 9B
- 35B
Key Bug Fixes
This release addresses several stability and correctness issues:
GPU/CPU Split Handling: Fixed crashes that occurred when Qwen 3.5 models were split across GPU and CPU resources, improving reliability for users with mixed hardware setups.
Output Quality: Resolved a critical issue where Qwen 3.5 models would repeat themselves due to missing presence penalty configuration. Users may need to redownload the Qwen 3.5 models to receive the corrected behavior.
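As a stopgap before redownloading, a presence penalty can also be set explicitly per request through the `options` field of Ollama's REST API. A minimal sketch of building such a request payload for the `/api/generate` endpoint; the model tag and penalty value below are illustrative, not defaults from this release:

```python
import json

def build_generate_request(model: str, prompt: str, presence_penalty: float = 1.5) -> dict:
    """Build a payload for Ollama's /api/generate endpoint.

    Setting presence_penalty explicitly in `options` discourages the
    model from reusing tokens it has already produced. The value 1.5
    and the model tag passed below are illustrative examples.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"presence_penalty": presence_penalty},
    }

payload = build_generate_request("qwen3.5", "Why is the sky blue?")
print(json.dumps(payload, indent=2))
```

POSTing this payload to a running Ollama server at `http://localhost:11434/api/generate` applies the penalty for that request only, without modifying the model.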
Memory Management: Fixed memory issues and crashes in the MLX runner, improving stability when running models on Apple Silicon and other supported hardware.
Model Import Compatibility: Addressed an issue preventing models imported from Qwen 3.5 GGUF files from running correctly.
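For reference, importing a local GGUF file uses a Modelfile whose `FROM` line points at the file. A minimal sketch, assuming an already-downloaded GGUF export (the filename and model name below are illustrative):

```
# Modelfile — import a local Qwen 3.5 GGUF export
FROM ./qwen3.5-4b.gguf
```

The model is then registered and run with `ollama create my-qwen3.5 -f Modelfile` followed by `ollama run my-qwen3.5`.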
Developer Improvements
The `ollama run --verbose` command now displays peak memory usage when using Ollama's MLX engine, giving better visibility into resource consumption during debugging and optimization.