# Alibaba releases Qwen3.5 model family with 8 sizes from 0.8B to 397B parameters
## Qwen3.5 Model Family Release
Alibaba has released Qwen3.5, a comprehensive model family with eight distinct sizes designed to serve different hardware constraints and use cases:
- Standard sizes: 27B, 35B-A3B, 122B-A10B, 397B-A17B
- Small series (new): 0.8B, 2B, 4B, 9B
## Key Features
All models in the family feature:
- Hybrid reasoning: Support for both thinking (extended reasoning) and non-thinking (fast response) modes
- Extended context: 256K-token context window (extendable to 1M via YaRN), with support for 201 languages
- Multimodal capabilities: Vision, coding, chat, and tool-calling optimizations
- Efficient deployment: the 4-bit 27B and 35B-A3B models fit in 22 GB of combined RAM/VRAM, including on Mac devices
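The YaRN-based context extension mentioned above amounts to stretching the model's RoPE positions by a scale factor. A minimal sketch of that arithmetic, assuming the simple ratio of target to native context (exact YaRN parameters should come from the model card):

```python
# Illustrative only: the linear scale factor a YaRN-style RoPE extension
# would use to stretch a 256K native window to 1M tokens.
NATIVE_CTX = 262_144      # 256K tokens
TARGET_CTX = 1_048_576    # 1M tokens

def yarn_scale_factor(native_ctx: int, target_ctx: int) -> float:
    """Scale factor s = target / native applied to rotary position frequencies."""
    return target_ctx / native_ctx

print(yarn_scale_factor(NATIVE_CTX, TARGET_CTX))  # 4.0
```

In practice this factor is passed to the inference runtime (for example via llama.cpp's rope-scaling options) rather than computed by hand.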
## Performance Improvements (Mar 5 Update)
Unsloth has released updated GGUF quantizations with notable enhancements:
- Improved quantization algorithm applied across all model sizes
- New imatrix calibration data, improving chat, coding, long-context, and tool-calling performance
- Fixed tool-calling with corrected chat templates (universal fix for all Qwen3.5 formats)
- Dynamic 2.0 quantization: 4-bit versions intelligently upcast important layers to 8-16 bit for better accuracy
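The "intelligent upcasting" idea can be sketched conceptually: keep the bulk of the model at 4-bit, but assign 8- or 16-bit precision to the layers most sensitive to quantization error. This is not Unsloth's actual algorithm, and the importance scores and thresholds below are hypothetical placeholders:

```python
# Conceptual sketch of mixed-precision ("dynamic") quantization:
# per-layer bit-widths chosen from an importance score in [0, 1].
# Scores and thresholds are illustrative assumptions, not Unsloth's values.

def assign_bits(importance: dict[str, float],
                hi_thresh: float = 0.9,
                mid_thresh: float = 0.6) -> dict[str, int]:
    """Map each layer name to a bit-width based on its importance score."""
    bits = {}
    for layer, score in importance.items():
        if score >= hi_thresh:
            bits[layer] = 16   # most sensitive layers stay in 16-bit
        elif score >= mid_thresh:
            bits[layer] = 8
        else:
            bits[layer] = 4    # the bulk of the model stays 4-bit
    return bits

scores = {"embed": 0.95, "attn.0": 0.70, "mlp.0": 0.30}
print(assign_bits(scores))  # {'embed': 16, 'attn.0': 8, 'mlp.0': 4}
```

The payoff is that the file stays close to 4-bit size while the few upcast layers recover most of the accuracy lost to uniform quantization.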
## Hardware Requirements
Memory requirements for inference (in GB, RAM + VRAM):
| Model | 4-bit | 8-bit | BF16 |
|---|---|---|---|
| 0.8B-2B | 3.5 | 7.5 | 9 |
| 4B | 5.5 | 10 | 14 |
| 9B | 6.5 | 13 | 19 |
| 27B | 17 | 30 | 54 |
| 35B-A3B | 22 | 38 | 70 |
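The table roughly follows a weights-only rule of thumb: memory ≈ parameters × bits / 8. A small sketch of that estimate (an approximation only; the table's figures also include KV cache and runtime overhead):

```python
# Rough rule of thumb for quantized model size, weights only:
# memory_GB ≈ params_in_billions * bits / 8.
# This deliberately ignores KV cache and runtime overhead.

def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Approximate weight-only memory in GB for a given quantization bit-width."""
    return params_billion * bits / 8

print(weight_memory_gb(27, 16))  # 54.0, matching the BF16 column for 27B
print(weight_memory_gb(27, 4))   # 13.5; the table's 17 GB includes overhead
```

For MoE variants like 35B-A3B, all parameters must still be resident in memory even though only ~3B are active per token, which is why the table sizes track total rather than active parameters.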
## Getting Started
GGUF variants are available on Hugging Face via Unsloth. Thinking mode is enabled by default for standard models but disabled for the Small series (0.8B-9B). Developers can toggle reasoning with `--chat-template-kwargs '{"enable_thinking": true}'` (or `false`).
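For OpenAI-compatible servers that accept per-request template options (llama-server exposes a `chat_template_kwargs` field mirroring the CLI flag; the exact field name should be checked against your server's docs), the same toggle can be sent in the request body. A minimal sketch with a placeholder model name:

```python
import json

# Hypothetical request body for an OpenAI-compatible /v1/chat/completions
# endpoint. "chat_template_kwargs" mirrors the CLI flag and is an
# assumption to verify against your inference server's documentation.
payload = {
    "model": "qwen3.5-27b",  # placeholder model name
    "messages": [{"role": "user", "content": "Solve 17 * 23 step by step."}],
    "chat_template_kwargs": {"enable_thinking": True},  # toggle reasoning mode
}

print(json.dumps(payload, indent=2))
```

Setting `enable_thinking` to `False` in the same field switches the model to fast, non-reasoning responses without restarting the server.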
Fine-tuning support is also available through Unsloth integration.