Qwen3.5 Model Family Launch
Alibaba has released Qwen3.5, a new multimodal model family with four size variants:
- Qwen3.5-27B: Compact option for resource-constrained deployments
- Qwen3.5-35B-A3B: Balanced model with 3B active parameters
- Qwen3.5-122B-A10B: Mid-tier with 10B active parameters
- Qwen3.5-397B-A17B: Flagship 397B model with only 17B active parameters (mixture-of-experts)
The models support a 256K-token context window (extendable to 1M tokens), operate in 201 languages, and offer hybrid thinking and non-thinking modes for flexible reasoning.
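Qwen3 toggled its hybrid modes with an `enable_thinking` chat-template flag and a `/no_think` soft switch appended to the prompt; assuming Qwen3.5 keeps that convention, a request payload for a local OpenAI-compatible endpoint might be built like this (the model name is illustrative):

```python
def build_chat_request(prompt: str, thinking: bool = True,
                       model: str = "qwen3.5-397b-a17b") -> dict:
    """Build a chat-completion payload for an OpenAI-compatible server.

    The "/no_think" soft switch follows Qwen3's documented convention;
    whether Qwen3.5 retains it is an assumption, not confirmed here.
    """
    suffix = "" if thinking else " /no_think"
    return {
        "model": model,  # hypothetical served-model name
        "messages": [{"role": "user", "content": prompt + suffix}],
        "max_tokens": 32768,  # maximum recommended output per the release notes
    }

req = build_chat_request("Summarize this document.", thinking=False)
print(req["messages"][0]["content"])
```

The payload can be POSTed unchanged to a llama-server `/v1/chat/completions` endpoint; omitting the suffix (the default) leaves thinking mode on.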
Performance & Capabilities
The flagship Qwen3.5-397B-A17B delivers performance comparable to Gemini 3 Pro, Claude Opus 4.5, and GPT-5.2. Strengths include:
- Coding: Advanced programming task support
- Vision: Multimodal understanding capabilities
- Agents: Autonomous task execution with tool calling
- Long-context: Efficient processing of extended documents
Deployment & Quantization
Unsloth provides Dynamic 2.0 quantized versions optimized for local deployment:
- 4-bit MXFP4 (~214GB): Runs on a 256GB-RAM machine, such as a Mac M3 Ultra, at 25+ tokens/s
- 3-bit quantization (~192GB): Fits on devices with 192GB RAM
- Full precision (~807GB): Requires 512GB+ memory
All quantizations use dynamic casting, preserving critical layers at 8-to-16-bit precision. The complete models are available on Hugging Face, along with tutorials covering llama.cpp, llama-server, and the OpenAI-compatible API.
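As a rough sanity check on the sizes above, on-disk size scales with parameter count times average bits per weight; the `overhead` factor below is an assumption chosen to account for embeddings, metadata, and the higher-precision critical layers, not an Unsloth-published figure:

```python
def approx_quant_size_gb(total_params_b: float, bits_per_weight: float,
                         overhead: float = 0.08) -> float:
    """Back-of-envelope on-disk size estimate for a quantized model.

    total_params_b: parameter count in billions (e.g. 397 for the flagship).
    bits_per_weight: average bits per weight; dynamic schemes that keep
    critical layers at 8-16 bit push this above the nominal width.
    overhead: assumed fudge factor for embeddings and metadata.
    """
    bytes_total = total_params_b * 1e9 * bits_per_weight / 8
    return bytes_total * (1 + overhead) / 1e9

# Flagship 397B model at a ~4-bit average:
print(round(approx_quant_size_gb(397, 4)))  # → 214, matching the ~214GB figure
```

The 3-bit build lands above this formula's naive estimate because its effective average bit-width exceeds 3 once the 8-to-16-bit layers are counted.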
Developer Integration
The release includes comprehensive deployment guides for local inference with llama.cpp, including configurable parameters for thinking vs. non-thinking modes. Tool-calling support enables building autonomous agents with custom functions. The maximum recommended output length is 32,768 tokens per query.
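The tool-calling flow can be sketched as follows: the client advertises function schemas in the OpenAI function-calling format (which llama-server's OpenAI-compatible endpoint accepts), the model emits a tool call, and the client routes it to a local function. The `get_weather` function, its schema, and the canned forecast are all hypothetical, purely for illustration:

```python
import json

# Hypothetical local function the agent can call; not part of the release.
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "forecast": "sunny"})

# Tool schema in the OpenAI function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching Python function."""
    registry = {"get_weather": get_weather}
    fn = registry[tool_call["name"]]
    return fn(**json.loads(tool_call["arguments"]))

# Simulated tool call, shaped as the model would emit it:
print(dispatch({"name": "get_weather", "arguments": '{"city": "Hangzhou"}'}))
```

In a real agent loop, `TOOLS` is sent with each chat request, and `dispatch` runs on every `tool_calls` entry in the response before the result is fed back to the model as a `tool` message.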