Alibaba releases Qwen3.5 model family with sizes from 0.8B to 397B parameters
· release · model · feature · unsloth.ai ↗

Qwen3.5 Model Family

Alibaba has launched Qwen3.5, a comprehensive model family designed to serve diverse deployment scenarios. The lineup includes:

  • Large models: 35B-A3B, 27B, 122B-A10B, and 397B-A17B parameters
  • Small models: 0.8B, 2B, 4B, and 9B parameters
  • Multimodal capabilities: Hybrid reasoning LLMs supporting vision, text, and agentic coding tasks

Key Features

Context & Language Support

  • 256K context window (extendable to 1M via YaRN)
  • Multilingual support across 201 languages
  • Supports up to 32,768 output tokens

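The 256K-to-1M extension via YaRN typically works by adding a rope-scaling entry to the model's configuration, stretching RoPE positions by the ratio of target to native context. A minimal sketch, assuming Qwen3.5 follows the `rope_scaling` pattern Qwen documented for earlier releases (the exact keys and `model_type` here are assumptions):

```python
# Sketch: extending the context window via YaRN by editing the model's
# config, following the pattern Qwen documented for earlier releases.
# The exact keys and model_type for Qwen3.5 are assumptions.
import json

def enable_yarn(config: dict, target_ctx: int, native_ctx: int = 262_144) -> dict:
    """Add a YaRN rope-scaling entry that stretches RoPE positions
    by target_ctx / native_ctx (256K native -> 1M gives factor 4.0)."""
    config = dict(config)
    config["rope_scaling"] = {
        "rope_type": "yarn",
        "factor": target_ctx / native_ctx,
        "original_max_position_embeddings": native_ctx,
    }
    config["max_position_embeddings"] = target_ctx
    return config

cfg = enable_yarn({"model_type": "qwen3_5"}, target_ctx=1_048_576)
print(json.dumps(cfg, indent=2))
```

Static YaRN scaling like this applies to all inputs, so it is usually enabled only when long-context processing is actually needed.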
Reasoning Capabilities

  • Hybrid thinking and non-thinking modes for flexible inference
  • Thinking mode optimized for complex reasoning tasks
  • Non-thinking (Instruct) mode for faster, direct responses
  • Reasoning disabled by default on Small models (0.8B-9B)

Hardware Requirements

The models support multiple quantization levels with varying memory footprints:

  • 35B-A3B: 22GB (4-bit) on compatible devices like high-end Macs
  • 27B: 17GB (4-bit)
  • Small models (0.8B-9B): As low as 3GB (3-bit) to 19GB (BF16)
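The footprints above track a simple rule of thumb: weight memory is roughly parameter count times bit-width divided by eight, plus some overhead for the KV cache and activations. A back-of-envelope estimator (illustrative only; real GGUF sizes vary with the quant mix):

```python
# Back-of-envelope memory estimate: parameter bytes at a given bit-width
# plus a rough fixed overhead for KV cache and activations. Illustrative
# only; actual GGUF sizes differ because dynamic quants keep some layers
# at 8- or 16-bit.
def estimate_gb(params_b: float, bits: float, overhead_gb: float = 2.0) -> float:
    weight_gb = params_b * bits / 8  # billions of params * bits -> GB
    return round(weight_gb + overhead_gb, 1)

print(estimate_gb(27, 4))     # -> 15.5, close to the 17GB quoted above
print(estimate_gb(9, 16, 1))  # -> 19.0, matching the BF16 small-model figure
```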

Deployment & Optimization

All model uploads use Unsloth Dynamic 2.0 quantization, which selectively upcasts the most important layers to 8- or 16-bit precision within an otherwise 4-bit quantization for better quality. GGUF variants are available for llama.cpp-compatible backends (currently not compatible with Ollama).
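Running a GGUF with llama.cpp might look like the following; the repository and quant names are illustrative assumptions, not confirmed upload paths:

```shell
# Hypothetical example: fetch a 4-bit GGUF from the Hugging Face Hub and
# serve it with llama.cpp's OpenAI-compatible server. Repo and quant
# names below are illustrative, not confirmed Qwen3.5 upload paths.
llama-server \
  -hf unsloth/Qwen3.5-27B-GGUF:Q4_K_XL \
  --ctx-size 32768 \
  --port 8080
```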

Fine-tuning support is available through Unsloth, and comprehensive inference tutorials are provided for each model size. Developers can control reasoning behavior via chat template parameters (enable_thinking flag).
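The enable_thinking toggle typically works at the chat-template level: with thinking disabled, an empty think block is pre-filled so the model answers directly instead of reasoning first. A minimal sketch modeled on Qwen3's published template behavior (Qwen3.5's exact template is an assumption here):

```python
# Sketch of how an enable_thinking flag can alter the rendered prompt,
# modeled on Qwen3's published chat template; Qwen3.5's actual template
# is an assumption. With thinking off, an empty <think> block is
# pre-filled so the model skips reasoning and responds directly.
def build_prompt(user_msg: str, enable_thinking: bool = True) -> str:
    prompt = f"<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n"
    if not enable_thinking:
        prompt += "<think>\n\n</think>\n\n"
    return prompt

print(build_prompt("What is 2+2?", enable_thinking=False))
```

In practice the flag is usually passed to the tokenizer rather than hand-built, e.g. `tokenizer.apply_chat_template(messages, enable_thinking=False)` as documented for Qwen3.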

Recent Updates

A March 2 update improved tool calling via chat-template fixes, with the benefits applying universally across all Qwen3.5 formats and uploaders. MXFP4 layers have been removed from select quantization variants (Q2_K_XL, Q3_K_XL, Q4_K_XL) based on quantization-sensitivity analysis.