NVIDIA
Qwen3.5, 397B Vision-Language Model, Now Available on NVIDIA GPU-Accelerated Endpoints
Tags: feature, model, api, platform, integration · Source: developer.nvidia.com

Qwen3.5 Overview

Alibaba has released Qwen3.5, a ~400-billion-parameter native vision-language model designed for building multimodal agents. It combines a mixture-of-experts (MoE) architecture with Gated Delta Networks, supports 200+ languages, and offers a 256K-token context window (extensible to 1M tokens). With only 17B active parameters per token (a 4.28% activation rate), the model delivers efficient inference despite its massive scale.
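The sparse-activation idea can be illustrated with a toy top-k MoE gate: each token's router scores all experts, but only the k highest-scoring experts actually run. The expert count, logits, and k below are made up for illustration and are not Qwen3.5's actual router configuration.

```python
import math

def top_k_gate(logits, k):
    """Softmax gate over expert logits: return the k experts to activate
    and their renormalized routing weights (a standard MoE top-k gate)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

# Illustrative numbers only: per token, only the selected experts' weights
# participate, which is how ~17B of ~397B parameters end up active.
experts, weights = top_k_gate([0.2, 2.1, -0.5, 1.3, 0.0, 1.7, -1.2, 0.4], k=2)
print(experts)                    # indices of the 2 highest-scoring experts
print(round(17 / 397 * 100, 2))  # activation rate implied by 17B of 397B
```

Because the unselected experts never execute, compute per token scales with k rather than with the total expert count.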

Key Capabilities

Qwen3.5 excels at several critical use cases:

  • UI Navigation: Can understand and interact with mobile and web interfaces
  • Visual Reasoning: Handles complex visual understanding tasks across multiple domains
  • Code Generation: Supports web development and coding tasks
  • Multimodal Chat: Combines language and vision understanding for conversational AI
  • Complex Search: Enables reasoning-driven information retrieval

Getting Started: Free Access and API Integration

Developers can immediately access Qwen3.5 through multiple entry points:

  • Playground: Free browser-based testing on build.nvidia.com with NVIDIA Blackwell GPU acceleration
  • API Access: OpenAI-compatible REST API available free with NVIDIA Developer Program registration
  • Production Deployment: NVIDIA NIM provides containerized inference microservices for on-premises, cloud, or hybrid deployments

The API supports tool calling through the OpenAI-compatible tools parameter, enabling agentic workflows out of the box.
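A minimal sketch of such a tool-calling request, using only the Python standard library. The endpoint URL and model id below are assumptions for illustration; check build.nvidia.com for the exact values published for Qwen3.5.

```python
import json
import os
import urllib.request

# Assumed values -- verify against the build.nvidia.com model card.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "qwen/qwen3.5"  # hypothetical model id

# An OpenAI-style tool definition the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": MODEL_ID,
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": tools,
}

def chat(api_key):
    """POST the payload to the OpenAI-compatible endpoint. Requires an API
    key obtained via NVIDIA Developer Program registration."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Only issue the network call when a key is configured.
if os.environ.get("NVIDIA_API_KEY"):
    print(chat(os.environ["NVIDIA_API_KEY"]))
```

When the model decides to use the tool, the response's message carries a tool_calls entry with the function name and JSON arguments, which the calling application executes and feeds back as a tool-role message.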

Customization and Fine-Tuning

The NVIDIA NeMo framework enables domain-specific adaptation of Qwen3.5 with the NeMo Automodel library, offering:

  • PyTorch-native training with direct Hugging Face checkpoint support (no conversion required)
  • Flexible fine-tuning methods: Full supervised fine-tuning (SFT) or memory-efficient LoRA
  • Scalable training: Multinode deployment support via Slurm and Kubernetes for large-scale MoE optimization
  • Reference implementations: Technical tutorials (e.g., Medical Visual QA on radiological datasets) to guide domain adaptation
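The memory savings behind LoRA can be sketched in a few lines of NumPy. This is the general low-rank-adapter idea, not the NeMo Automodel interface, and the dimensions and scaling factor are illustrative.

```python
import numpy as np

# LoRA sketch: instead of updating a d_out x d_in weight W directly,
# train a low-rank delta B @ A of rank r, so far fewer parameters need
# gradients and optimizer state.
d_in, d_out, r = 4096, 4096, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable, zero-init so the delta starts at 0
alpha = 16                                  # LoRA scaling hyperparameter

def forward(x):
    # Base projection plus the scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size
lora_params = A.size + B.size
print(lora_params / full_params)  # fraction of parameters actually trained
```

For these dimensions the adapters hold under 0.4% of the layer's parameters, which is why LoRA is listed above as the memory-efficient alternative to full SFT.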

Developers can fine-tune Qwen3.5 for specialized reasoning and agentic workflows while preserving low-latency inference.