CVE-2024-8768
📋 TL;DR
A denial-of-service vulnerability exists in vLLM where sending an empty prompt to the completions API causes the API server to crash. This affects any system running a vulnerable version of vLLM with the completions API exposed. The vulnerability is simple to exploit and can disrupt AI inference services.
💻 Affected Systems
- vLLM
⚠️ Manual Verification Required
This CVE does not have specific version information in our database, so automatic vulnerability detection cannot determine if your system is affected.
Why? The CVE database entry doesn't specify which versions are vulnerable (no version ranges provided by the vendor/NVD).
- Review the CVE details at NVD
- Check vendor security advisories for your specific version
- Test if the vulnerability is exploitable in your environment
- Consider updating to the latest version as a precaution
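Since the database lacks version ranges, a quick local check against the first patched release (0.5.3, per the fix section below) can stand in. A minimal sketch, assuming plain "X.Y.Z" version strings; for pre-release or post-release tags, use `packaging.version` instead:

```python
# Minimal version check against the first patched release (0.5.3).
# Assumes plain dotted-integer version strings like "0.5.2".
def is_patched(version: str, fixed: str = "0.5.3") -> bool:
    """Return True if `version` is at or above the patched release."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(version) >= parse(fixed)
```

Run it against your installation with `python -c "import vllm; print(vllm.__version__)"` and pass the result to `is_patched`.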
⚠️ Risk & Real-World Impact
Worst Case
An attacker could repeatedly crash the vLLM API server, causing sustained service unavailability and disrupting all AI inference capabilities.
Likely Case
Accidental or malicious empty prompts cause intermittent service outages requiring manual server restarts.
If Mitigated
With proper input validation and rate limiting, the impact is limited to occasional crashes from which the service recovers automatically.
🎯 Exploit Status
Exploitation requires only sending a simple HTTP request with an empty prompt field.
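For illustration, the request shape looks like the following sketch. The URL (default vLLM port 8000) and model name are assumptions; the actual send is commented out because issuing it against a vulnerable server can crash it.

```python
import json
import urllib.request

# Illustrative payload only: an empty "prompt" field is the trigger.
# "my-model" is a placeholder; substitute your served model name.
payload = {"model": "my-model", "prompt": ""}
body = json.dumps(payload).encode()

req = urllib.request.Request(
    "http://localhost:8000/v1/completions",  # assumed default vLLM endpoint
    data=body,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # deliberately not sent: may crash the server
```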
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: v0.5.3
Vendor Advisory: https://github.com/vllm-project/vllm/pull/7746
Restart Required: Yes
Instructions:
1. Update vLLM to version 0.5.3 or later using pip: pip install --upgrade "vllm>=0.5.3"
2. Restart the vLLM API server
3. Verify the fix by testing with empty prompts
🔧 Temporary Workarounds
Input validation at proxy layer
Add request validation to reject empty prompts before they reach vLLM.
# Example nginx location block. Caveat: nginx evaluates "if" in the rewrite
# phase, before the request body has been read, so $request_body is usually
# empty at this point. Treat this as a sketch; in practice, validate the body
# with njs/Lua or at the application/proxy layer instead.
location /v1/completions {
    if ($request_body ~* '"prompt":\s*""') {
        return 400;
    }
    proxy_pass http://vllm_backend;
}
Rate limiting
Implement rate limiting to prevent repeated exploitation attempts.
# Using nginx rate limiting
limit_req_zone $binary_remote_addr zone=vllm_limit:10m rate=10r/s;
location /v1/completions {
    limit_req zone=vllm_limit burst=20;
    proxy_pass http://vllm_backend;
}
🧯 If You Can't Patch
- Implement a reverse proxy or WAF with request validation to filter out empty prompts
- Monitor vLLM process health and implement automatic restart mechanisms
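The health-monitoring point above can be sketched as a watchdog tick. This is a sketch, not a drop-in tool: the probe URL is an assumption, `check_and_restart` is a hypothetical helper name, and the restart action (e.g. a systemctl or container restart command) is supplied by the caller. Any HTTP response, even an error status, means the process is alive; a dropped connection suggests a crash.

```python
import urllib.error
import urllib.request

def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Probe the server: any HTTP response means the process is up;
    a connection failure (refused/reset) suggests it has crashed."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server answered, just with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused or reset: likely down

def check_and_restart(url: str, restart, probe=is_healthy) -> bool:
    """One watchdog tick: trigger `restart` (a caller-supplied callable)
    if the probe fails. Returns True if a restart was triggered."""
    if probe(url):
        return False
    restart()
    return True
```

Run `check_and_restart` in a loop with a sleep between ticks, alongside (not instead of) a process supervisor such as systemd's `Restart=on-failure`.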
🔍 How to Verify
Check if Vulnerable:
Send a POST request to /v1/completions with {"prompt": ""} in a non-production environment and observe whether the server crashes
Check Version:
python -c "import vllm; print(vllm.__version__)"
Verify Fix Applied:
After patching, send the same empty prompt request and verify the server responds with an error instead of crashing
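That verification step can be scripted. A sketch, assuming the default local endpoint and a placeholder model name; `verify_fix` and `send_empty_prompt` are hypothetical helper names. A patched server answers the empty prompt (typically with an HTTP error), while a vulnerable one crashes, which surfaces as a dropped connection.

```python
import json
import urllib.error
import urllib.request

def send_empty_prompt(url: str) -> None:
    """POST an empty prompt; raises on HTTP errors or connection failures."""
    body = json.dumps({"model": "my-model", "prompt": ""}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=5)

def verify_fix(url: str, send=send_empty_prompt) -> str:
    """Classify the server's reaction to an empty prompt."""
    try:
        send(url)
        return "responded"          # server handled the request normally
    except urllib.error.HTTPError:
        return "rejected"           # clean validation error: fix applied
    except (urllib.error.URLError, OSError):
        return "connection failed"  # possible crash: likely still vulnerable
```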
📡 Detection & Monitoring
Log Indicators:
- vLLM process crashes
- Connection resets on completions endpoint
- Error logs mentioning empty prompts or validation failures
Network Indicators:
- Multiple POST requests to /v1/completions with minimal payload size
- Sudden drop in successful completions responses
SIEM Query:
source="vllm.logs" AND ("crash" OR "segmentation fault" OR "empty prompt")
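For environments without a SIEM, the same query can be approximated as a log-line filter. A sketch: the patterns mirror the indicators listed above, and `suspicious_lines` is a hypothetical helper; adjust the regex to your actual log format.

```python
import re

# Indicators from the SIEM query above, as a case-insensitive pattern.
INDICATORS = re.compile(r"crash|segmentation fault|empty prompt", re.IGNORECASE)

def suspicious_lines(lines):
    """Return log lines matching any crash/empty-prompt indicator."""
    return [line for line in lines if INDICATORS.search(line)]
```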