Qwen3.5 Model Family Launch
Alibaba has released Qwen3.5, a new multimodal model family with four size variants:
- Qwen3.5-27B: Compact option for resource-constrained deployments
- Qwen3.5-35B-A3B: Balanced model with 3B active parameters
- Qwen3.5-122B-A10B: Mid-tier with 10B active parameters
- Qwen3.5-397B-A17B: Flagship 397B model with only 17B active parameters (mixture-of-experts)
The models support a 256K-token context window (extendable to 1M tokens), operate in 201 languages, and offer hybrid thinking and non-thinking modes for flexible reasoning.
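Qwen3 toggled its hybrid modes with an `enable_thinking` chat-template flag and a `/no_think` soft switch appended to the prompt; assuming Qwen3.5 keeps that convention, a request payload for a local OpenAI-compatible endpoint might be built like this (the model name is illustrative):

```python
def build_chat_request(prompt: str, thinking: bool = True,
                       model: str = "qwen3.5-397b-a17b") -> dict:
    """Build a chat-completion payload for an OpenAI-compatible server.

    The "/no_think" soft switch follows Qwen3's documented convention;
    whether Qwen3.5 retains it is an assumption, not confirmed here.
    """
    suffix = "" if thinking else " /no_think"
    return {
        "model": model,  # hypothetical served-model name
        "messages": [{"role": "user", "content": prompt + suffix}],
        "max_tokens": 32768,  # maximum recommended output per the release notes
    }

req = build_chat_request("Summarize this document.", thinking=False)
print(req["messages"][0]["content"])
```

The payload can be POSTed unchanged to a llama-server `/v1/chat/completions` endpoint; omitting the suffix (the default) leaves thinking mode on.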
Performance & Capabilities
The flagship Qwen3.5-397B-A17B delivers performance comparable to Gemini 3 Pro, Claude Opus 4.5, and GPT-5.2. Strengths include:
- Coding: Advanced programming task support
- Vision: Multimodal understanding capabilities
- Agents: Autonomous task execution with tool calling
- Long-context: Efficient processing of extended documents
Deployment & Quantization
Unsloth provides Dynamic 2.0 quantized versions optimized for local deployment:
- 4-bit MXFP4 (~214GB): Runs on a 256GB-RAM machine, such as a Mac M3 Ultra, at 25+ tokens/s
- 3-bit quantization (~192GB): Fits on devices with 192GB RAM
- Full precision (~807GB): Requires 512GB+ memory
All quantizations use dynamic casting, preserving critical layers at 8-to-16-bit precision. The complete models are available on Hugging Face, along with tutorials covering llama.cpp, llama-server, and the OpenAI-compatible API.
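As a rough sanity check on the sizes above, on-disk size scales with parameter count times average bits per weight; the `overhead` factor below is an assumption chosen to account for embeddings, metadata, and the higher-precision critical layers, not an Unsloth-published figure:

```python
def approx_quant_size_gb(total_params_b: float, bits_per_weight: float,
                         overhead: float = 0.08) -> float:
    """Back-of-envelope on-disk size estimate for a quantized model.

    total_params_b: parameter count in billions (e.g. 397 for the flagship).
    bits_per_weight: average bits per weight; dynamic schemes that keep
    critical layers at 8-16 bit push this above the nominal width.
    overhead: assumed fudge factor for embeddings and metadata.
    """
    bytes_total = total_params_b * 1e9 * bits_per_weight / 8
    return bytes_total * (1 + overhead) / 1e9

# Flagship 397B model at a ~4-bit average:
print(round(approx_quant_size_gb(397, 4)))  # → 214, matching the ~214GB figure
```

The 3-bit build lands above this formula's naive estimate because its effective average bit-width exceeds 3 once the 8-to-16-bit layers are counted.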
Developer Integration
The release includes comprehensive deployment guides for local inference with llama.cpp, including configurable parameters for thinking vs. non-thinking modes. Tool-calling support enables building autonomous agents with custom functions. The maximum recommended output length is 32,768 tokens per query.
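The tool-calling flow can be sketched as follows: the client advertises function schemas in the OpenAI function-calling format (which llama-server's OpenAI-compatible endpoint accepts), the model emits a tool call, and the client routes it to a local function. The `get_weather` function, its schema, and the canned forecast are all hypothetical, purely for illustration:

```python
import json

# Hypothetical local function the agent can call; not part of the release.
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "forecast": "sunny"})

# Tool schema in the OpenAI function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching Python function."""
    registry = {"get_weather": get_weather}
    fn = registry[tool_call["name"]]
    return fn(**json.loads(tool_call["arguments"]))

# Simulated tool call, shaped as the model would emit it:
print(dispatch({"name": "get_weather", "arguments": '{"city": "Hangzhou"}'}))
```

In a real agent loop, `TOOLS` is sent with each chat request, and `dispatch` runs on every `tool_calls` entry in the response before the result is fed back to the model as a `tool` message.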