Qwen3.5 VLM Architecture
Alibaba has released Qwen3.5, a new open-source vision-language model designed for native multimodal agents. The model features a hybrid architecture that combines a mixture-of-experts (MoE) design with Gated Delta Networks, totaling 397B parameters with only 17B active per token (a 4.28% activation rate). This sparse design enables efficient reasoning while supporting 256K-token context windows (extensible to 1M) and more than 200 languages.
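The activation rate quoted above follows directly from the parameter counts, as this quick check shows:

```python
# MoE sparsity: only a fraction of the 397B parameters fire per token.
total_params_b = 397   # total parameters, in billions
active_params_b = 17   # parameters active per token, in billions

activation_rate = active_params_b / total_params_b * 100
print(f"Activation rate: {activation_rate:.2f}%")  # ≈ 4.28%
```

In other words, each token is processed by roughly 1/23 of the full model, which is why a 397B-parameter MoE can serve requests at a cost closer to that of a 17B dense model.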
Key Capabilities
Qwen3.5 is optimized for several advanced use cases:
- Visual reasoning: Understanding and navigating mobile and web user interfaces
- Coding tasks: Web development and code generation
- Agentic workflows: Complex multi-step reasoning and decision-making
- Search and QA: Complex information retrieval across modalities
The model outperforms previous generations of VLMs in UI navigation tasks, making it particularly suitable for automating workflows that require understanding visual layouts.
Developer Access and Deployment
Developers can start building immediately with free access to GPU-accelerated endpoints on build.nvidia.com, powered by NVIDIA Blackwell GPUs. The model is also available via API through the NVIDIA Developer Program, free of charge after registration. Code examples demonstrate OpenAI-compatible chat-completion API calls with tool-calling support.
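A minimal sketch of such a tool-calling request, using only the Python standard library. The model identifier and the `NVIDIA_API_KEY` environment variable are assumptions; check build.nvidia.com for the exact model name for your account. The base URL is NVIDIA's OpenAI-compatible API catalog endpoint.

```python
import json
import os
import urllib.request

# OpenAI-compatible chat-completion payload with one declared tool.
payload = {
    "model": "qwen/qwen3.5-vl",  # assumed identifier -- verify on build.nvidia.com
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin right now?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

api_key = os.environ.get("NVIDIA_API_KEY")  # set after registering
if api_key:  # only send the request when a key is configured
    req = urllib.request.Request(
        "https://integrate.api.nvidia.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # If the model decides to call the tool, the reply carries
        # a tool_calls entry instead of plain text content.
        print(json.load(resp)["choices"][0]["message"])
```

Because the endpoint is OpenAI-compatible, the same payload also works with the official `openai` Python client by pointing its `base_url` at the catalog endpoint.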
For production deployments, NVIDIA NIM provides containerized inference microservices with performance tuning and standardized APIs, enabling flexible deployment across on-premises, cloud, and hybrid environments.
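A self-hosted NIM deployment typically follows the standard NGC container pattern sketched below. The image path is illustrative (a Qwen3.5 NIM image name has not been confirmed here); consult the NGC catalog for the actual repository and tag.

```shell
# Authenticate against NVIDIA's container registry
# (username is the literal string '$oauthtoken'; password is your NGC API key).
docker login nvcr.io

# Launch the inference microservice; image path below is an assumption.
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -p 8000:8000 \
  nvcr.io/nim/qwen/qwen3.5-vl:latest

# The container exposes the same OpenAI-compatible API locally:
curl http://localhost:8000/v1/models
```

Because hosted and self-hosted endpoints share the same API surface, application code written against build.nvidia.com can be pointed at the local port with no changes beyond the base URL.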
Customization and Fine-Tuning
The NVIDIA NeMo framework enables domain-specific adaptation through the NeMo Automodel library, offering:
- PyTorch-native training with day-0 Hugging Face support
- Memory-efficient fine-tuning options including LoRA
- Large-scale multinode deployments via Slurm and Kubernetes
- Reference implementations such as Medical Visual QA for radiological datasets
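The memory savings behind LoRA, mentioned above, come from freezing the base weight matrix and learning only a low-rank additive update. The sketch below illustrates the parameter arithmetic with NumPy (the dimensions are illustrative, not taken from Qwen3.5's actual layer shapes):

```python
import numpy as np

# Illustrative layer dimensions and LoRA rank (assumptions, not model specs).
d, k, r = 4096, 4096, 16

W = np.zeros((d, k))              # frozen pretrained weight (not trained)
A = np.random.randn(r, k) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))              # trainable, zero-initialized so delta starts at 0

delta = B @ A                      # low-rank update: W_effective = W + B @ A

full_params = d * k                # parameters if W were fully fine-tuned
lora_params = d * r + r * k        # parameters LoRA actually trains
ratio = lora_params / full_params
print(f"Trainable fraction: {ratio:.2%}")  # ≈ 0.78% at rank 16
```

This is why LoRA fits on far less GPU memory than full fine-tuning: optimizer state is kept only for the small `A` and `B` factors, while `W` stays frozen.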
This combination of pre-built capabilities and customization tools positions Qwen3.5 as a comprehensive solution for enterprises deploying specialized multimodal agents.