Qwen3.5 Model Overview
Alibaba has released Qwen3.5, a new open-source vision-language model purpose-built for native multimodal agents. The model features a ~400B-parameter architecture that combines a mixture-of-experts (MoE) design with Gated Delta Networks, activating only 17B parameters per token during inference. Key capabilities include understanding and navigating user interfaces, visual reasoning across mobile and web applications, coding assistance, and complex search tasks.
Model Specifications:
- Total parameters: 397B, with a 4.28% activation rate
- Input context: 256K tokens (extensible to 1M)
- Language support: 200+ languages
- Architecture: 512 experts, with 11 active per token (10 routed + 1 shared)
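The activation figures above are self-consistent, as a quick back-of-the-envelope check shows:

```python
# Sanity-check the published activation figures: 397B total parameters
# at a 4.28% activation rate should yield roughly 17B active parameters,
# matching the "17B active parameters per token" headline number.
total_params_b = 397.0      # total parameters, in billions
activation_rate = 0.0428    # fraction of parameters active per token

active_params_b = total_params_b * activation_rate
print(f"Active parameters: ~{active_params_b:.1f}B")  # ~17.0B
```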
Access and Integration
Developers can immediately access Qwen3.5 through multiple channels:
- Free GPU endpoints on build.nvidia.com powered by NVIDIA Blackwell GPUs, enabling browser-based experimentation and real-world performance evaluation
- NVIDIA API integration available to NVIDIA Developer Program members with free registration
- NVIDIA NIM containerized deployment for production workloads on-premises, in the cloud, or across hybrid environments
The NVIDIA API provides OpenAI-compatible chat completions with support for tool calling and extended generation features like thinking modes.
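Because the endpoint is OpenAI-compatible, a request can be assembled with nothing beyond the standard library. The sketch below uses NVIDIA's standard API gateway URL; the model identifier is an assumption, so check the Qwen3.5 model page on build.nvidia.com for the exact string.

```python
# Minimal sketch of an OpenAI-compatible chat completions request to the
# NVIDIA API, using only the Python standard library. MODEL_ID is an
# assumed placeholder -- verify the real identifier on build.nvidia.com.
import json
import os
import urllib.request

API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "qwen/qwen3.5"  # assumed model id; verify before use

def build_request(prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat completion request (not yet sent)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # API key comes with free NVIDIA Developer Program registration
            "Authorization": f"Bearer {os.environ.get('NVIDIA_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

# To send: body = urllib.request.urlopen(build_request("Hello")).read()
```

Tool calling and thinking modes ride on the same payload shape, via the standard `tools` field and model-specific generation parameters.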
Customization and Fine-tuning
The NVIDIA NeMo framework enables specialized domain adaptation through NeMo Automodel, a PyTorch-native training library. Developers can perform supervised fine-tuning or apply memory-efficient adaptation methods such as LoRA without tedious model conversions. The framework supports multi-node deployment via Slurm and Kubernetes, with Hugging Face integration for direct checkpoint loading. A reference tutorial on Medical Visual QA demonstrates fine-tuning Qwen3.5 on radiological datasets for domain-specific reasoning.
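The memory savings behind LoRA come from training two low-rank factors instead of each full weight matrix. The arithmetic below uses hypothetical layer dimensions, not Qwen3.5's actual shapes, purely to illustrate the scale of the reduction:

```python
# Back-of-the-envelope illustration of why LoRA is memory-efficient:
# rather than updating a full d_out x d_in weight matrix, LoRA trains
# two low-rank factors A (d_out x r) and B (r x d_in) and freezes the
# base weights. Layer sizes here are hypothetical, for illustration only.
d_out, d_in = 4096, 4096   # hypothetical projection shape
rank = 16                  # LoRA rank (r)

full_params = d_out * d_in            # trainable params, full fine-tune
lora_params = rank * (d_out + d_in)   # trainable params with LoRA

print(f"Full fine-tune:  {full_params:,} trainable params per matrix")
print(f"LoRA (r={rank}): {lora_params:,} trainable params per matrix")
print(f"Reduction: {full_params / lora_params:.0f}x")
```

At these (assumed) dimensions the adapter trains roughly 128x fewer parameters per target matrix, which is what makes single-node adaptation of very large checkpoints tractable.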
Next Steps
Developers should visit the Qwen3.5 model page on Hugging Face and the build.nvidia.com platform to begin building multimodal agent applications with immediate GPU-accelerated access.