CVE-2025-62426
📋 TL;DR
This vulnerability in vLLM allows attackers to send specially crafted requests to the /v1/chat/completions and /tokenize endpoints that can block API server processing for extended periods, causing denial of service. It affects vLLM deployments from version 0.5.5 to 0.11.0. Organizations using vLLM for LLM inference and serving are impacted.
💻 Affected Systems
- vLLM
📦 What is this software?
vLLM is an open-source library from the vllm-project for high-throughput, memory-efficient inference and serving of large language models, commonly deployed behind an OpenAI-compatible API server.
⚠️ Risk & Real-World Impact
Worst Case
Complete denial of service where the vLLM API server becomes unresponsive for extended periods, blocking all legitimate requests and disrupting LLM inference services.
Likely Case
Degraded service performance with delayed response times for legitimate requests, potentially causing timeouts and service disruptions.
If Mitigated
Minimal impact with proper rate limiting, input validation, and network segmentation in place.
🎯 Exploit Status
Exploitation requires sending specially crafted chat_template_kwargs parameters to vulnerable endpoints.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: 0.11.1
Vendor Advisory: https://github.com/vllm-project/vllm/security/advisories/GHSA-69j4-grxj-j64p
Restart Required: Yes
Instructions:
1. Update vLLM to version 0.11.1 or later using pip: pip install --upgrade "vllm>=0.11.1" (quote the specifier so the shell does not interpret >= as a redirection)
2. Restart the vLLM service
3. Verify the update was successful
🔧 Temporary Workarounds
Input Validation Filter
Implement middleware or a reverse proxy that filters the chat_template_kwargs parameter out of incoming requests
# Example nginx location block filtering the parameter. Note: $args only
# covers the query string; since these endpoints take JSON POST bodies,
# full body inspection requires njs, Lua (OpenResty), or app-level middleware.
location /v1/chat/completions {
    if ($args ~* "chat_template_kwargs") {
        return 400;
    }
    proxy_pass http://vllm_backend;
}
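Because the chat_template_kwargs parameter normally arrives in the JSON request body rather than the query string, a body-aware check is more reliable than query filtering. A minimal sketch of the predicate such middleware could apply (the helper name and behavior are illustrative, not part of vLLM):

```python
import json

def should_block(raw_body: bytes) -> bool:
    """Return True if a JSON request body carries a chat_template_kwargs key."""
    try:
        payload = json.loads(raw_body)
    except (ValueError, UnicodeDecodeError):
        # Let non-JSON bodies through; the API server rejects them anyway.
        return False
    return isinstance(payload, dict) and "chat_template_kwargs" in payload
```

A proxy or ASGI middleware would call this on each request to /v1/chat/completions and /tokenize and return HTTP 400 when it fires.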
Rate Limiting
Implement strict rate limiting on the affected endpoints to blunt DoS attempts
# Example using nginx rate limiting
limit_req_zone $binary_remote_addr zone=vllm_limit:10m rate=10r/s;
location /v1/chat/completions {
    limit_req zone=vllm_limit burst=20 nodelay;
    proxy_pass http://vllm_backend;
}
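If nginx is not in the request path, the same per-client limit (10 r/s with a burst of 20, mirroring the config above) can be approximated in application code. A minimal token-bucket sketch, with all names hypothetical:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-client token bucket: `rate` requests/second, `burst` capacity."""

    def __init__(self, rate: float = 10.0, burst: float = 20.0):
        self.rate, self.burst = rate, burst
        self.tokens = defaultdict(lambda: burst)   # client -> available tokens
        self.last = defaultdict(time.monotonic)    # client -> last refill time

    def allow(self, client: str) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens[client] = min(
            self.burst,
            self.tokens[client] + (now - self.last[client]) * self.rate,
        )
        self.last[client] = now
        if self.tokens[client] >= 1:
            self.tokens[client] -= 1
            return True
        return False
```

Keying the bucket on client IP matches nginx's $binary_remote_addr behavior; behind another proxy you would key on a forwarded-for header instead.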
🧯 If You Can't Patch
- Implement network-level controls to restrict access to vulnerable endpoints
- Deploy Web Application Firewall (WAF) with rules to block malicious chat_template_kwargs parameters
🔍 How to Verify
Check if Vulnerable:
Check vLLM version and verify it's between 0.5.5 and 0.11.0 inclusive
Check Version:
python -c "import vllm; print(vllm.__version__)"
Verify Fix Applied:
Confirm vLLM version is 0.11.1 or later and test endpoints with chat_template_kwargs parameter
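The version check above can be automated. A small sketch that tests whether an installed version string falls in the affected range (0.5.5 through 0.11.0 inclusive); it assumes a plain X.Y.Z version string, so pre-release suffixes would need extra handling:

```python
def parse(version: str) -> tuple:
    """Parse a plain dotted version like '0.11.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split(".")[:3])

def is_vulnerable(version: str) -> bool:
    """True if version is in the affected range: 0.5.5 <= v < 0.11.1."""
    return parse("0.5.5") <= parse(version) < parse("0.11.1")
```

Feed it the output of `python -c "import vllm; print(vllm.__version__)"` to decide whether the deployment needs patching.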
📡 Detection & Monitoring
Log Indicators:
- Unusually long request processing times
- Multiple requests with chat_template_kwargs parameter
- Increased error rates or timeouts
Network Indicators:
- High volume of requests to /v1/chat/completions or /tokenize endpoints
- Requests containing chat_template_kwargs parameter
SIEM Query:
source="vllm_logs" AND (uri_path="/v1/chat/completions" OR uri_path="/tokenize") AND request_params="*chat_template_kwargs*"
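Outside a SIEM, the same pattern can be applied directly to access logs. A sketch of a log scanner matching the indicators above; it assumes the request path and parameters appear on one log line, which depends on your logging configuration:

```python
import re

# Flag requests to the affected endpoints whose logged parameters
# mention chat_template_kwargs (single-line log format assumed).
SUSPECT = re.compile(r"(/v1/chat/completions|/tokenize).*chat_template_kwargs")

def suspicious_lines(log_lines):
    """Yield log lines matching the vulnerable-endpoint + parameter pattern."""
    for line in log_lines:
        if SUSPECT.search(line):
            yield line
```

Correlating these hits with the long-processing-time indicator above gives a stronger signal than either alone.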
🔗 References
- https://github.com/vllm-project/vllm/blob/2a6dc67eb520ddb9c4138d8b35ed6fe6226997fb/vllm/entrypoints/chat_utils.py#L1602-L1610
- https://github.com/vllm-project/vllm/blob/2a6dc67eb520ddb9c4138d8b35ed6fe6226997fb/vllm/entrypoints/openai/serving_engine.py#L809-L814
- https://github.com/vllm-project/vllm/commit/3ada34f9cb4d1af763fdfa3b481862a93eb6bd2b
- https://github.com/vllm-project/vllm/pull/27205
- https://github.com/vllm-project/vllm/security/advisories/GHSA-69j4-grxj-j64p