CVE-2025-62426

6.5 MEDIUM

📋 TL;DR

This vulnerability in vLLM allows attackers to send specially crafted requests to the /v1/chat/completions and /tokenize endpoints that can block API server processing for extended periods, causing a denial of service. It affects vLLM versions 0.5.5 through 0.11.0 inclusive. Organizations using vLLM for LLM inference and serving are impacted.

💻 Affected Systems

Products:
  • vLLM
Versions: 0.5.5 to 0.11.0
Operating Systems: All
Default Config Vulnerable: ⚠️ Yes
Notes: Only affects deployments with the vulnerable endpoints exposed and accessible.

📦 What is this software?

vLLM is an open-source, high-throughput inference and serving engine for large language models, commonly deployed to expose OpenAI-compatible API endpoints such as /v1/chat/completions.

⚠️ Risk & Real-World Impact

🔴

Worst Case

Complete denial of service where the vLLM API server becomes unresponsive for extended periods, blocking all legitimate requests and disrupting LLM inference services.

🟠

Likely Case

Degraded service performance with delayed response times for legitimate requests, potentially causing timeouts and service disruptions.

🟢

If Mitigated

Minimal impact with proper rate limiting, input validation, and network segmentation in place.

🌐 Internet-Facing: HIGH
🏢 Internal Only: MEDIUM

🎯 Exploit Status

Public PoC: ✅ No
Weaponized: UNKNOWN
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

Exploitation requires sending specially crafted chat_template_kwargs parameters to vulnerable endpoints.
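For context, the parameter sits at the top level of the JSON request body. A benign sketch of the request shape (placeholder model name and kwargs values; the actual malicious template arguments are deliberately not reproduced here):

```python
import json

# Illustrative request body for /v1/chat/completions showing where
# chat_template_kwargs appears; values below are placeholders only.
payload = {
    "model": "example-model",  # placeholder model name
    "messages": [{"role": "user", "content": "hello"}],
    "chat_template_kwargs": {  # attacker-controlled template arguments
        "custom_key": "custom_value"  # placeholder, not an exploit payload
    },
}
body = json.dumps(payload)
```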

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: 0.11.1

Vendor Advisory: https://github.com/vllm-project/vllm/security/advisories/GHSA-69j4-grxj-j64p

Restart Required: Yes

Instructions:

1. Update vLLM to version 0.11.1 or later using pip: pip install --upgrade "vllm>=0.11.1" (quote the version specifier so the shell does not treat >= as a redirection)
2. Restart the vLLM service
3. Verify the update was successful

🔧 Temporary Workarounds

Input Validation Filter

Platforms: all

Implement middleware or a reverse proxy that rejects requests carrying the chat_template_kwargs parameter. Note that nginx's $args variable matches only the query string; since the parameter is normally sent in the JSON request body, a body-inspecting proxy or WAF is needed for full coverage.

# Example nginx location block; filters chat_template_kwargs in the
# query string only (JSON bodies require body inspection, e.g. via a WAF)
location /v1/chat/completions {
    if ($args ~* "chat_template_kwargs") {
        return 400;
    }
    proxy_pass http://vllm_backend;
}
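Since the parameter normally arrives in the JSON body of a POST request, a body-inspecting check can complement a query-string filter. A minimal sketch (the should_reject helper is illustrative, not part of any framework or of vLLM itself):

```python
import json


def should_reject(raw_body: bytes) -> bool:
    """Return True if a JSON request body contains a chat_template_kwargs key."""
    try:
        data = json.loads(raw_body)
    except (ValueError, UnicodeDecodeError):
        return False  # not JSON; let downstream validation handle it
    return isinstance(data, dict) and "chat_template_kwargs" in data
```

Wire this into whatever reverse-proxy or middleware layer fronts vLLM and return HTTP 400 when it fires.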

Rate Limiting

Platforms: all

Implement strict rate limiting on vulnerable endpoints to prevent DoS attacks

# Example using nginx rate limiting
limit_req_zone $binary_remote_addr zone=vllm_limit:10m rate=10r/s;

location /v1/chat/completions {
    limit_req zone=vllm_limit burst=20 nodelay;
    proxy_pass http://vllm_backend;
}

🧯 If You Can't Patch

  • Implement network-level controls to restrict access to vulnerable endpoints
  • Deploy Web Application Firewall (WAF) with rules to block malicious chat_template_kwargs parameters

🔍 How to Verify

Check if Vulnerable:

Check vLLM version and verify it's between 0.5.5 and 0.11.0 inclusive

Check Version:

python -c "import vllm; print(vllm.__version__)"
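To script the check, the reported version can be compared against the affected range. A minimal sketch using a simple dotted-version parse (pre-release or dev suffixes such as "0.11.0rc1" are not handled):

```python
def parse(version: str) -> tuple:
    """Parse a plain dotted version string like '0.11.0' into an int tuple."""
    return tuple(int(part) for part in version.split("."))


def is_vulnerable(version: str) -> bool:
    # Affected range per the advisory: 0.5.5 up to and including 0.11.0
    return parse("0.5.5") <= parse(version) <= parse("0.11.0")
```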

Verify Fix Applied:

Confirm vLLM version is 0.11.1 or later and test endpoints with chat_template_kwargs parameter

📡 Detection & Monitoring

Log Indicators:

  • Unusually long request processing times
  • Multiple requests with chat_template_kwargs parameter
  • Increased error rates or timeouts
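The indicators above can be scanned for programmatically. A rough sketch, assuming a whitespace-delimited access-log line of the form "METHOD PATH STATUS DURATION_MS BODY_SNIPPET" (the log schema and the 5-second threshold are assumptions; adapt both to your environment):

```python
import re

SUSPICIOUS = re.compile(r"chat_template_kwargs")
SLOW_MS = 5000  # threshold for "unusually long"; tune to your baseline


def flag_line(line: str) -> bool:
    """Flag a log line that is unusually slow or mentions chat_template_kwargs."""
    parts = line.split()
    if len(parts) < 4:
        return False
    try:
        duration_ms = float(parts[3])
    except ValueError:
        return False
    return duration_ms > SLOW_MS or bool(SUSPICIOUS.search(line))
```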

Network Indicators:

  • High volume of requests to /v1/chat/completions or /tokenize endpoints
  • Requests containing chat_template_kwargs parameter

SIEM Query:

source="vllm_logs" AND (uri_path="/v1/chat/completions" OR uri_path="/tokenize") AND request_params="*chat_template_kwargs*"
