Qwen3.5 Model Overview
Alibaba has released Qwen3.5, a new open-source vision-language model purpose-built for native multimodal agents. The model features a ~400B-parameter architecture that combines a mixture-of-experts (MoE) design with Gated Delta Networks, activating only 17B parameters per token during inference. Key capabilities include understanding and navigating user interfaces, visual reasoning across mobile and web applications, coding assistance, and complex search tasks.
Model Specifications:
- Total parameters: 397B, with a 4.28% activation rate
- Input context: 256K tokens (extensible to 1M)
- Language support: 200+ languages
- Architecture: 512 experts, with 11 active per token (10 routed + 1 shared)
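The activation figures above are self-consistent, as a quick back-of-the-envelope check shows:

```python
# Sanity-check the published activation figures: 397B total parameters
# at a 4.28% activation rate should yield roughly 17B active parameters,
# matching the "17B active parameters per token" headline number.
total_params_b = 397.0      # total parameters, in billions
activation_rate = 0.0428    # fraction of parameters active per token

active_params_b = total_params_b * activation_rate
print(f"Active parameters: ~{active_params_b:.1f}B")  # ~17.0B
```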
Access and Integration
Developers can immediately access Qwen3.5 through multiple channels:
- Free GPU endpoints on build.nvidia.com powered by NVIDIA Blackwell GPUs, enabling browser-based experimentation and real-world performance evaluation
- NVIDIA API integration available to NVIDIA Developer Program members with free registration
- NVIDIA NIM containerized deployment for production workloads on-premises, in the cloud, or across hybrid environments
The NVIDIA API provides OpenAI-compatible chat completions with support for tool calling and extended generation features like thinking modes.
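Because the endpoint is OpenAI-compatible, a request can be assembled with nothing beyond the standard library. The sketch below uses NVIDIA's standard API gateway URL; the model identifier is an assumption, so check the Qwen3.5 model page on build.nvidia.com for the exact string.

```python
# Minimal sketch of an OpenAI-compatible chat completions request to the
# NVIDIA API, using only the Python standard library. MODEL_ID is an
# assumed placeholder -- verify the real identifier on build.nvidia.com.
import json
import os
import urllib.request

API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "qwen/qwen3.5"  # assumed model id; verify before use

def build_request(prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat completion request (not yet sent)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # API key comes with free NVIDIA Developer Program registration
            "Authorization": f"Bearer {os.environ.get('NVIDIA_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

# To send: body = urllib.request.urlopen(build_request("Hello")).read()
```

Tool calling and thinking modes ride on the same payload shape, via the standard `tools` field and model-specific generation parameters.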
Customization and Fine-tuning
The NVIDIA NeMo framework enables specialized domain adaptation through NeMo Automodel, a PyTorch-native training library. Developers can perform supervised fine-tuning or apply memory-efficient adaptation methods such as LoRA without tedious model conversions. The framework supports multi-node deployment via Slurm and Kubernetes, with Hugging Face integration for direct checkpoint loading. A reference tutorial on Medical Visual QA demonstrates fine-tuning Qwen3.5 on radiological datasets for domain-specific reasoning.
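The memory savings behind LoRA come from training two low-rank factors instead of each full weight matrix. The arithmetic below uses hypothetical layer dimensions, not Qwen3.5's actual shapes, purely to illustrate the scale of the reduction:

```python
# Back-of-the-envelope illustration of why LoRA is memory-efficient:
# rather than updating a full d_out x d_in weight matrix, LoRA trains
# two low-rank factors A (d_out x r) and B (r x d_in) and freezes the
# base weights. Layer sizes here are hypothetical, for illustration only.
d_out, d_in = 4096, 4096   # hypothetical projection shape
rank = 16                  # LoRA rank (r)

full_params = d_out * d_in            # trainable params, full fine-tune
lora_params = rank * (d_out + d_in)   # trainable params with LoRA

print(f"Full fine-tune:  {full_params:,} trainable params per matrix")
print(f"LoRA (r={rank}): {lora_params:,} trainable params per matrix")
print(f"Reduction: {full_params / lora_params:.0f}x")
```

At these (assumed) dimensions the adapter trains roughly 128x fewer parameters per target matrix, which is what makes single-node adaptation of very large checkpoints tractable.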
Next Steps
Developers should visit the Qwen3.5 model page on Hugging Face and the build.nvidia.com platform to begin building multimodal agent applications with immediate GPU-accelerated access.