vLLM Changelogs
vLLM v0.16.0 brings async scheduling with pipeline parallelism, 30.8% throughput gains

vLLM's latest release introduces async scheduling combined with pipeline parallelism, delivering significant performance improvements, along with new WebSocket-based real-time audio streaming capabilities. The update adds support for 12+ new model architectures and brings major enhancements to speculative decoding, RLHF workflows, and the Intel XPU platform.
Tags: release, feature, performance, api, sdk
vLLM v0.16.0rc3 fixes MTP accuracy issue with GLM-5 model

vLLM publishes a release-candidate patch addressing an accuracy bug in Multi-Token Prediction (MTP) for the GLM-5 model. The fix ensures more reliable inference results for users running GLM-5 on vLLM.
Tags: release, bugfix