vLLM v0.15.2rc0 Released
vLLM has released v0.15.2rc0, a release candidate focused on stability improvements for production deployments.
Key Fix
This release includes a critical bugfix that addresses a conflict between two performance optimization features:
- TRTLLM Attention: An optimized attention mechanism from TensorRT-LLM
- KV Cache Transfer: A feature for efficient key-value cache management
When both features were enabled simultaneously, they could cause incorrect behavior or performance degradation. The fix automatically disables TRTLLM attention when KV cache transfer is active, ensuring safe and predictable operation.
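The fallback behavior described above can be sketched as a small selection function. This is a hypothetical illustration of the logic, not vLLM's actual internal code; the function and return names are invented for clarity:

```python
def select_attention_backend(trtllm_requested: bool,
                             kv_transfer_enabled: bool) -> str:
    """Sketch of the conflict-resolution logic (names are illustrative).

    When KV cache transfer is active, fall back to the default
    attention path even if TRTLLM attention was requested, since
    the two features conflict when enabled together.
    """
    if trtllm_requested and not kv_transfer_enabled:
        return "TRTLLM"
    return "DEFAULT"
```

With this logic, requesting TRTLLM attention alongside KV cache transfer silently resolves to the default backend, which is the "automatic disable" behavior the fix provides.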
What Developers Need to Know
If you run vLLM with both TRTLLM attention and KV cache transfer enabled, upgrading to this release is recommended to avoid the conflict. The fix is transparent: no configuration changes are required, and the system will automatically select the appropriate attention mechanism based on your settings.
This is a release candidate (rc0), intended for testing before the final v0.15.2 release.
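To test the release candidate, you can pin the exact version with pip. This assumes the rc0 build is published to PyPI; if your deployment installs vLLM from source or a custom index, adjust accordingly:

```shell
# Pin the release candidate explicitly; a plain `pip install vllm`
# will not pick up pre-release versions by default.
pip install vllm==0.15.2rc0
```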