Qwen3.5 Now Available on NVIDIA Infrastructure
Alibaba has released Qwen3.5, a 397-billion-parameter native vision-language model designed for multimodal agents. The model uses a hybrid architecture combining mixture of experts (MoE) with Gated Delta Networks, activating only 17B parameters per token, a 4.28% activation rate. It supports 256K-token context windows (extensible to 1M), covers 200+ languages, and can understand and navigate complex user interfaces.
Immediate Access for Developers
Developers can start building with Qwen3.5 immediately through multiple channels:
- Free GPU-accelerated endpoints on build.nvidia.com powered by NVIDIA Blackwell GPUs, available to registered NVIDIA Developer Program members
- API access through NVIDIA's hosted endpoints with free usage tier
- Full code examples and OpenAI-compatible chat completion APIs for rapid integration
The model excels at coding tasks, visual reasoning over mobile and web interfaces, chat applications, and complex search scenarios.
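Because the hosted endpoints are OpenAI-compatible, a standard chat completion request is all an integration needs. A minimal standard-library sketch, assuming the common integrate.api.nvidia.com base URL and a hypothetical qwen/qwen3.5 model identifier (check build.nvidia.com for the exact values):

```python
import json
import urllib.request

# Assumptions: the base URL follows NVIDIA's usual OpenAI-compatible pattern,
# and the model id below is hypothetical -- verify both on build.nvidia.com.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "qwen/qwen3.5"  # hypothetical identifier

def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat completion request."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def send(req: urllib.request.Request) -> str:
    """Send the request (requires a valid API key) and return the reply text."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (not executed here -- needs a real key from the NVIDIA API portal):
# print(send(build_chat_request("Describe this UI screenshot.", "nvapi-...")))
```

The same request body works unchanged against any OpenAI-compatible SDK, so swapping in the official `openai` client is a one-line change to the base URL.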
Production Deployment and Customization
For production use, NVIDIA NIM provides containerized inference microservices with optimized performance, standardized APIs, and deployment flexibility across on-premises, cloud, and hybrid environments.
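As a rough illustration of the NIM workflow, a self-hosted deployment typically amounts to running the container and pointing clients at its OpenAI-compatible port. A deployment-config sketch; the image path below is hypothetical, so confirm the actual Qwen3.5 NIM in the NVIDIA NGC catalog:

```shell
# Authenticate against NGC (key placeholder -- substitute your own credential)
export NGC_API_KEY="nvapi-..."

# Launch the NIM microservice on all local GPUs; image tag is hypothetical
docker run -d --gpus all \
  -e NGC_API_KEY \
  -p 8000:8000 \
  nvcr.io/nim/qwen/qwen3.5:latest

# Once up, the container exposes the same standardized API locally
curl http://localhost:8000/v1/models
```

Because the local container serves the same API surface as the hosted endpoint, prototype code written against build.nvidia.com can be repointed at `localhost:8000` without changes.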
The NVIDIA NeMo framework enables fine-tuning for specialized domains. Key capabilities include:
- PyTorch-native training with Day 0 Hugging Face checkpoint support (no conversion needed)
- Memory-efficient methods like LoRA for cost-effective adaptation
- Multinode deployment on Slurm and Kubernetes for large-scale training
- A reference implementation for fine-tuning on medical visual QA and radiology datasets
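To see why LoRA is memory-efficient, compare parameter counts for a single projection layer: a rank-r adapter trains r × (d_in + d_out) parameters in place of the full d_in × d_out weight update. A small sketch (layer dimensions are illustrative, not Qwen3.5's actual shapes):

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Parameters in a full weight matrix vs. a rank-`rank` LoRA adapter."""
    full = d_in * d_out            # frozen base weight W
    lora = rank * (d_in + d_out)   # trainable factors A (d_in x r) and B (r x d_out)
    return full, lora

full, lora = lora_param_counts(4096, 4096, 16)
print(f"trainable fraction: {lora / full:.2%}")  # rank-16 trains under 1% of the layer
```

This is what makes adapter-based fine-tuning cost-effective: optimizer state and gradients are kept only for the small factors, while the base weights stay frozen.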
Getting Started
Developers can access Qwen3.5 immediately on build.nvidia.com, experiment with prompts, and test against their own data. Integration with existing NVIDIA infrastructure (Blackwell GPUs, NIM, NeMo) enables seamless scaling from prototyping to enterprise production workloads.