vLLM v0.16.0 brings async scheduling with pipeline parallelism, 30.8% throughput gains

vLLM's latest release, v0.16.0, combines async scheduling with pipeline parallelism for a reported 30.8% throughput gain, and adds WebSocket-based realtime audio streaming. The update also adds support for 12+ new model architectures and brings major enhancements to speculative decoding, RLHF workflows, and Intel XPU platforms.
release, feature, performance, api, sdk