Unsloth
Alibaba releases Qwen3.5 model family with 8 sizes from 0.8B to 397B parameters
release · model · feature · unsloth.ai

Qwen3.5 Model Family Release

Alibaba has released Qwen3.5, a comprehensive model family with eight distinct sizes designed to serve different hardware constraints and use cases:

  • Standard sizes: 27B, 35B-A3B, 122B-A10B, 397B-A17B
  • Small series (new): 0.8B, 2B, 4B, 9B

Key Features

All models in the family feature:

  • Hybrid reasoning: Support for both thinking (extended reasoning) and non-thinking (fast response) modes
  • Extended context: 256K context window (extendable to 1M via YaRN), with support for 201 languages
  • Multimodal capabilities: Vision, coding, chat, and tool-calling optimizations
  • Efficient deployment: the 27B and 35B-A3B models run at 4-bit on devices with 22 GB of memory, including Macs
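The 1M extension via YaRN works by rescaling the model's RoPE position frequencies so long positions fall back inside the trained range. A minimal sketch of the core idea in pure Python — the threshold rule is simplified (YaRN uses a smooth ramp) and the parameters here are illustrative, not Qwen's actual configuration:

```python
import math

def rope_inv_freqs(dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE inverse frequencies for a head dimension `dim`."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def yarn_scale(inv_freqs: list[float], scale: float, orig_ctx: int) -> list[float]:
    """YaRN-style interpolation sketch: low-frequency components (whose
    wavelength exceeds the original training context) are stretched by
    `scale`, while high-frequency components that encode local detail
    are left untouched. Real YaRN blends the two with a ramp."""
    out = []
    for f in inv_freqs:
        wavelength = 2 * math.pi / f
        if wavelength > orig_ctx:      # low frequency: interpolate
            out.append(f / scale)
        else:                          # high frequency: keep as-is
            out.append(f)
    return out
```

Scaling only the slow-rotating dimensions is what lets the extended model keep short-range precision while stretching its positional range.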

Performance Improvements (Mar 5 Update)

Unsloth has released updated GGUF quantizations with notable enhancements:

  • Improved quantization algorithm applied across all model sizes
  • New imatrix calibration data that improves chat, coding, long-context, and tool-calling performance
  • Fixed tool-calling with corrected chat templates (universal fix for all Qwen3.5 formats)
  • Dynamic 2.0 quantization: 4-bit versions intelligently upcast important layers to 8-16 bit for better accuracy
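The layer-wise upcasting behind Dynamic 2.0 can be illustrated with a toy mixed-precision scheme: quantize layers at 4-bit by default, but keep layers flagged as important at 8-bit. The sensitivity scores and threshold below are hypothetical placeholders, not Unsloth's actual selection algorithm:

```python
def quantize(values: list[float], bits: int) -> list[float]:
    """Symmetric uniform quantization of a list of floats to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / levels or 1.0
    return [round(v / scale) * scale for v in values]

def dynamic_quantize(layers: dict, sensitivity: dict, threshold: float = 0.5) -> dict:
    """Toy 'dynamic' scheme: layers whose sensitivity score exceeds
    `threshold` are kept at 8-bit; the rest drop to 4-bit.
    (Illustrative only -- real schemes measure per-layer error.)"""
    out = {}
    for name, weights in layers.items():
        bits = 8 if sensitivity[name] > threshold else 4
        out[name] = (bits, quantize(weights, bits))
    return out
```

Spending extra bits only on the few layers that dominate output error is why such mixed schemes recover accuracy with little added size.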

Hardware Requirements

Memory requirements for inference (in GB, RAM + VRAM):

Model      4-bit   8-bit   BF16
0.8B-2B    3.5     7.5     9
4B         5.5     10      14
9B         6.5     13      19
27B        17      30      54
35B-A3B    22      38      70
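The figures above roughly track the raw weight footprint (parameters × bits ÷ 8 bytes) plus runtime overhead for the KV cache and activations. A back-of-envelope helper for the weights-only term — the overhead is not modeled here, which is why the table's numbers run higher:

```python
def weights_gb(params_billion: float, bits: int) -> float:
    """Raw weight storage in GB: parameter count times bits per weight,
    divided by 8 bits per byte. Excludes KV cache and activations."""
    return params_billion * bits / 8

# e.g. a 27B model at 4-bit needs ~13.5 GB for weights alone;
# the table's 17 GB figure also covers runtime overhead.
```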

Getting Started

GGUF variants are available on Hugging Face via Unsloth. Thinking mode is enabled by default for the standard models but disabled for the Small series (0.8B-9B). Developers can toggle reasoning with --chat-template-kwargs '{"enable_thinking": true}' (or false to disable it).
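Under the hood, a switch like enable_thinking is typically just a branch in the chat template: with thinking disabled, the template pre-fills an empty reasoning block so the model skips straight to its answer. A toy illustration — the tag names follow the common <think>…</think> convention, and this is not the actual Qwen3.5 template:

```python
def render_prompt(user_msg: str, enable_thinking: bool) -> str:
    """Toy chat template: when thinking is disabled, pre-fill an empty
    <think></think> block so the model answers immediately.
    (Hypothetical role tokens, for illustration only.)"""
    prompt = f"<|user|>{user_msg}<|assistant|>"
    if not enable_thinking:
        prompt += "<think>\n\n</think>\n"
    return prompt
```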

Fine-tuning support is also available through Unsloth integration.