vLLM v0.15.2rc0 Released
vLLM has released v0.15.2rc0, a release candidate focused on stability improvements for production deployments.
Key Fix
This release includes a critical bugfix that addresses a conflict between two performance optimization features:
- TRTLLM Attention: An optimized attention mechanism from TensorRT-LLM
- KV Cache Transfer: A feature for efficient key-value cache management
When both features were enabled simultaneously, they could cause incorrect behavior or performance degradation. The fix automatically disables TRTLLM attention when KV cache transfer is active, ensuring safe and predictable operation.
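The fallback behavior described above can be sketched as a small selection function. This is a hypothetical illustration of the logic, not vLLM's actual internal code; the function and return names are invented for clarity:

```python
def select_attention_backend(trtllm_requested: bool,
                             kv_transfer_enabled: bool) -> str:
    """Sketch of the conflict-resolution logic (names are illustrative).

    When KV cache transfer is active, fall back to the default
    attention path even if TRTLLM attention was requested, since
    the two features conflict when enabled together.
    """
    if trtllm_requested and not kv_transfer_enabled:
        return "TRTLLM"
    return "DEFAULT"
```

With this logic, requesting TRTLLM attention alongside KV cache transfer silently resolves to the default backend, which is the "automatic disable" behavior the fix provides.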
What Developers Need to Know
If you run vLLM with both TRTLLM attention and KV cache transfer enabled, upgrading to this release is recommended to avoid the conflict. The fix is transparent: no configuration changes are required, and the system will automatically select the appropriate attention mechanism based on your settings.
This is a release candidate (rc0), intended for testing before the final v0.15.2 release.
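To test the release candidate, you can pin the exact version with pip. This assumes the rc0 build is published to PyPI; if your deployment installs vLLM from source or a custom index, adjust accordingly:

```shell
# Pin the release candidate explicitly; a plain `pip install vllm`
# will not pick up pre-release versions by default.
pip install vllm==0.15.2rc0
```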