# Alibaba releases Qwen3.5 model family with 8 sizes from 0.8B to 397B parameters
## Qwen3.5 Model Family Release
Alibaba has released Qwen3.5, a comprehensive model family with eight distinct sizes designed to serve different hardware constraints and use cases:
- Standard sizes: 27B, 35B-A3B, 122B-A10B, 397B-A17B
- Small series (new): 0.8B, 2B, 4B, 9B
## Key Features
All models in the family feature:
- Hybrid reasoning: Support for both thinking (extended reasoning) and non-thinking (fast response) modes
- Extended context: 256K-token context window (extendable to 1M via YaRN), with support for 201 languages
- Multimodal capabilities: Vision, coding, chat, and tool-calling optimizations
- Efficient deployment: the 4-bit 27B and 35B-A3B models fit in 22 GB of combined RAM/VRAM, including on Mac devices
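The YaRN-based context extension mentioned above amounts to stretching the model's RoPE positions by a scale factor. A minimal sketch of that arithmetic, assuming the simple ratio of target to native context (exact YaRN parameters should come from the model card):

```python
# Illustrative only: the linear scale factor a YaRN-style RoPE extension
# would use to stretch a 256K native window to 1M tokens.
NATIVE_CTX = 262_144      # 256K tokens
TARGET_CTX = 1_048_576    # 1M tokens

def yarn_scale_factor(native_ctx: int, target_ctx: int) -> float:
    """Scale factor s = target / native applied to rotary position frequencies."""
    return target_ctx / native_ctx

print(yarn_scale_factor(NATIVE_CTX, TARGET_CTX))  # 4.0
```

In practice this factor is passed to the inference runtime (for example via llama.cpp's rope-scaling options) rather than computed by hand.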
## Performance Improvements (Mar 5 Update)
Unsloth has released updated GGUF quantizations with notable enhancements:
- Improved quantization algorithm applied across all model sizes
- New imatrix calibration data, improving chat, coding, long-context, and tool-calling performance
- Fixed tool-calling with corrected chat templates (universal fix for all Qwen3.5 formats)
- Dynamic 2.0 quantization: 4-bit versions intelligently upcast important layers to 8-16 bit for better accuracy
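The "intelligent upcasting" idea can be sketched conceptually: keep the bulk of the model at 4-bit, but assign 8- or 16-bit precision to the layers most sensitive to quantization error. This is not Unsloth's actual algorithm, and the importance scores and thresholds below are hypothetical placeholders:

```python
# Conceptual sketch of mixed-precision ("dynamic") quantization:
# per-layer bit-widths chosen from an importance score in [0, 1].
# Scores and thresholds are illustrative assumptions, not Unsloth's values.

def assign_bits(importance: dict[str, float],
                hi_thresh: float = 0.9,
                mid_thresh: float = 0.6) -> dict[str, int]:
    """Map each layer name to a bit-width based on its importance score."""
    bits = {}
    for layer, score in importance.items():
        if score >= hi_thresh:
            bits[layer] = 16   # most sensitive layers stay in 16-bit
        elif score >= mid_thresh:
            bits[layer] = 8
        else:
            bits[layer] = 4    # the bulk of the model stays 4-bit
    return bits

scores = {"embed": 0.95, "attn.0": 0.70, "mlp.0": 0.30}
print(assign_bits(scores))  # {'embed': 16, 'attn.0': 8, 'mlp.0': 4}
```

The payoff is that the file stays close to 4-bit size while the few upcast layers recover most of the accuracy lost to uniform quantization.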
## Hardware Requirements
Memory requirements for inference (in GB, RAM + VRAM):
| Model | 4-bit | 8-bit | BF16 |
|---|---|---|---|
| 0.8B-2B | 3.5 | 7.5 | 9 |
| 4B | 5.5 | 10 | 14 |
| 9B | 6.5 | 13 | 19 |
| 27B | 17 | 30 | 54 |
| 35B-A3B | 22 | 38 | 70 |
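The table roughly follows a weights-only rule of thumb: memory ≈ parameters × bits / 8. A small sketch of that estimate (an approximation only; the table's figures also include KV cache and runtime overhead):

```python
# Rough rule of thumb for quantized model size, weights only:
# memory_GB ≈ params_in_billions * bits / 8.
# This deliberately ignores KV cache and runtime overhead.

def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Approximate weight-only memory in GB for a given quantization bit-width."""
    return params_billion * bits / 8

print(weight_memory_gb(27, 16))  # 54.0, matching the BF16 column for 27B
print(weight_memory_gb(27, 4))   # 13.5; the table's 17 GB includes overhead
```

For MoE variants like 35B-A3B, all parameters must still be resident in memory even though only ~3B are active per token, which is why the table sizes track total rather than active parameters.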
## Getting Started
GGUF variants are available on Hugging Face via Unsloth. Thinking mode is enabled by default for standard models but disabled for the Small series (0.8B-9B). Developers can toggle reasoning with `--chat-template-kwargs '{"enable_thinking": true}'` (or `false`).
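For OpenAI-compatible servers that accept per-request template options (llama-server exposes a `chat_template_kwargs` field mirroring the CLI flag; the exact field name should be checked against your server's docs), the same toggle can be sent in the request body. A minimal sketch with a placeholder model name:

```python
import json

# Hypothetical request body for an OpenAI-compatible /v1/chat/completions
# endpoint. "chat_template_kwargs" mirrors the CLI flag and is an
# assumption to verify against your inference server's documentation.
payload = {
    "model": "qwen3.5-27b",  # placeholder model name
    "messages": [{"role": "user", "content": "Solve 17 * 23 step by step."}],
    "chat_template_kwargs": {"enable_thinking": True},  # toggle reasoning mode
}

print(json.dumps(payload, indent=2))
```

Setting `enable_thinking` to `False` in the same field switches the model to fast, non-reasoning responses without restarting the server.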
Fine-tuning support is also available through Unsloth integration.