NVIDIA
Qwen3.5, 397B Vision-Language Model, Now Available on NVIDIA GPU-Accelerated Endpoints
Tags: feature, model, api, platform, integration · Source: developer.nvidia.com

Qwen3.5 Overview

Alibaba has released Qwen3.5, a ~400-billion-parameter native vision-language model designed for building multimodal agents. It combines a mixture-of-experts (MoE) architecture with Gated Delta Networks, supports 200+ languages, and offers a 256K-token context window (extensible to 1M tokens). With only 17B active parameters per token (a 4.28% activation rate), the model delivers efficient inference despite its massive scale.
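The sparse-activation idea can be illustrated with a toy top-k MoE gate: each token's router scores all experts, but only the k highest-scoring experts actually run. The expert count, logits, and k below are made up for illustration and are not Qwen3.5's actual router configuration.

```python
import math

def top_k_gate(logits, k):
    """Softmax gate over expert logits: return the k experts to activate
    and their renormalized routing weights (a standard MoE top-k gate)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

# Illustrative numbers only: per token, only the selected experts' weights
# participate, which is how ~17B of ~397B parameters end up active.
experts, weights = top_k_gate([0.2, 2.1, -0.5, 1.3, 0.0, 1.7, -1.2, 0.4], k=2)
print(experts)                    # indices of the 2 highest-scoring experts
print(round(17 / 397 * 100, 2))  # activation rate implied by 17B of 397B
```

Because the unselected experts never execute, compute per token scales with k rather than with the total expert count.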

Key Capabilities

Qwen3.5 excels at several critical use cases:

  • UI Navigation: Can understand and interact with mobile and web interfaces
  • Visual Reasoning: Handles complex visual understanding tasks across multiple domains
  • Code Generation: Supports web development and coding tasks
  • Multimodal Chat: Combines language and vision understanding for conversational AI
  • Complex Search: Enables reasoning-driven information retrieval

Getting Started: Free Access and API Integration

Developers can immediately access Qwen3.5 through multiple entry points:

  • Playground: Free browser-based testing on build.nvidia.com with NVIDIA Blackwell GPU acceleration
  • API Access: OpenAI-compatible REST API available free with NVIDIA Developer Program registration
  • Production Deployment: NVIDIA NIM provides containerized inference microservices for on-premises, cloud, or hybrid deployments

The API supports tool calling through the OpenAI-compatible tools parameter, enabling agentic workflows out of the box.
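A minimal sketch of such a tool-calling request, using only the Python standard library. The endpoint URL and model id below are assumptions for illustration; check build.nvidia.com for the exact values published for Qwen3.5.

```python
import json
import os
import urllib.request

# Assumed values -- verify against the build.nvidia.com model card.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "qwen/qwen3.5"  # hypothetical model id

# An OpenAI-style tool definition the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": MODEL_ID,
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": tools,
}

def chat(api_key):
    """POST the payload to the OpenAI-compatible endpoint. Requires an API
    key obtained via NVIDIA Developer Program registration."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Only issue the network call when a key is configured.
if os.environ.get("NVIDIA_API_KEY"):
    print(chat(os.environ["NVIDIA_API_KEY"]))
```

When the model decides to use the tool, the response's message carries a tool_calls entry with the function name and JSON arguments, which the calling application executes and feeds back as a tool-role message.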

Customization and Fine-Tuning

The NVIDIA NeMo framework enables domain-specific adaptation of Qwen3.5 with the NeMo Automodel library, offering:

  • PyTorch-native training with direct Hugging Face checkpoint support (no conversion required)
  • Flexible fine-tuning methods: Full supervised fine-tuning (SFT) or memory-efficient LoRA
  • Scalable training: Multinode deployment support via Slurm and Kubernetes for large-scale MoE optimization
  • Reference implementations: Technical tutorials (e.g., Medical Visual QA on radiological datasets) to guide domain adaptation
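The memory savings behind LoRA can be sketched in a few lines of NumPy. This is the general low-rank-adapter idea, not the NeMo Automodel interface, and the dimensions and scaling factor are illustrative.

```python
import numpy as np

# LoRA sketch: instead of updating a d_out x d_in weight W directly,
# train a low-rank delta B @ A of rank r, so far fewer parameters need
# gradients and optimizer state.
d_in, d_out, r = 4096, 4096, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable, zero-init so the delta starts at 0
alpha = 16                                  # LoRA scaling hyperparameter

def forward(x):
    # Base projection plus the scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size
lora_params = A.size + B.size
print(lora_params / full_params)  # fraction of parameters actually trained
```

For these dimensions the adapters hold under 0.4% of the layer's parameters, which is why LoRA is listed above as the memory-efficient alternative to full SFT.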

Developers can fine-tune Qwen3.5 for specialized reasoning and agentic workflows while preserving low-latency inference.