CVE-2024-8768

CVSS 7.5 (HIGH)

📋 TL;DR

A denial-of-service vulnerability exists in vLLM where sending an empty prompt to the completions API causes the API server to crash. This affects any system running a vulnerable version of vLLM with the completions API exposed. The vulnerability is simple to exploit and can disrupt AI inference services.

💻 Affected Systems

Products:
  • vLLM
Versions: before v0.5.3
Operating Systems: All
Default Config Vulnerable: ⚠️ Yes
Notes: Only affects systems where the completions API endpoint is accessible.

⚠️ Manual Verification Required

This CVE's database entry does not include machine-readable version ranges, so automated vulnerability detection cannot determine whether your system is affected.

Why? The NVD record lacks CPE version ranges, so detection falls back to the affected-version information listed above; confirm your installed version manually.


Recommended Actions:
  1. Review the CVE details at NVD
  2. Check vendor security advisories for your specific version
  3. Test if the vulnerability is exploitable in your environment
  4. Consider updating to the latest version as a precaution

⚠️ Risk & Real-World Impact

🔴 Worst Case: An attacker could repeatedly crash the vLLM API server, causing sustained service unavailability and disrupting all AI inference capabilities.

🟠 Likely Case: Accidental or malicious empty prompts cause intermittent service outages requiring manual server restarts.

🟢 If Mitigated: With proper input validation and rate limiting, the impact is limited to occasional crashes that are recovered automatically.

🌐 Internet-Facing: HIGH
🏢 Internal Only: MEDIUM

🎯 Exploit Status

Public PoC: ⚠️ Yes
Weaponized: LIKELY
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

Exploitation requires only sending a simple HTTP request with an empty prompt field.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: v0.5.3

Vendor Advisory: https://github.com/vllm-project/vllm/pull/7746

Restart Required: Yes

Instructions:

1. Update vLLM to version 0.5.3 or later using pip: pip install --upgrade "vllm>=0.5.3"
2. Restart the vLLM API server
3. Verify the fix by testing with empty prompts

🔧 Temporary Workarounds

Input validation at proxy layer (applies to: all platforms)

Add request validation to reject empty prompts before they reach vLLM. Note that stock nginx cannot match on $request_body inside an "if" directive, because rewrite-phase directives run before the request body is read; a body-aware proxy such as OpenResty can perform the check:

# Example OpenResty (nginx + Lua) location block
location /v1/completions {
    access_by_lua_block {
        ngx.req.read_body()
        local body = ngx.req.get_body_data()
        -- reject JSON bodies containing an empty prompt field
        if body and body:find('"prompt"%s*:%s*""') then
            return ngx.exit(ngx.HTTP_BAD_REQUEST)
        end
    }
    proxy_pass http://vllm_backend;
}

Rate limiting (applies to: all platforms)

Implement rate limiting to prevent repeated exploitation attempts:

# nginx rate limiting; the limit_req_zone directive belongs in the http {} block
limit_req_zone $binary_remote_addr zone=vllm_limit:10m rate=10r/s;

location /v1/completions {
    limit_req zone=vllm_limit burst=20;
    proxy_pass http://vllm_backend;
}

🧯 If You Can't Patch

  • Implement a reverse proxy or WAF with request validation to filter out empty prompts
  • Monitor vLLM process health and implement automatic restart mechanisms
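The process-health bullet can be sketched as a small polling watchdog. This is a minimal sketch, assuming vLLM's OpenAI-compatible server is exposing its /health endpoint on the default port and that the server runs under a hypothetical systemd unit named "vllm" — adjust both for your deployment:

```python
import subprocess
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/health"      # default vLLM server port; adjust as needed
RESTART_CMD = ["systemctl", "restart", "vllm"]   # hypothetical systemd unit name

def healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True when the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def watchdog(interval: float = 30.0) -> None:
    """Poll the health endpoint and restart the service when it stops answering."""
    while True:
        if not healthy(HEALTH_URL):
            subprocess.run(RESTART_CMD, check=False)  # restart the crashed server
        time.sleep(interval)
```

This is a stopgap: it shortens outages but does not prevent the crash loop that a determined attacker can sustain, so combine it with the proxy-level validation above.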

🔍 How to Verify

Check if Vulnerable:

Send a POST request to /v1/completions with {"prompt": ""} and observe if the server crashes
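The check above can be scripted. A minimal sketch using only the Python standard library; the base URL and the model name in the payload are placeholders for your deployment:

```python
import json
import urllib.error
import urllib.request

def probe_empty_prompt(base_url: str) -> str:
    """POST an empty prompt to /v1/completions and classify the outcome."""
    payload = json.dumps({"model": "my-model", "prompt": ""}).encode()  # model name is a placeholder
    req = urllib.request.Request(
        base_url + "/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=10):
            return "handled"  # server accepted the request
    except urllib.error.HTTPError as exc:
        # A clean 4xx error response is the patched behavior
        return "rejected (patched)" if 400 <= exc.code < 500 else "server error"
    except (urllib.error.URLError, OSError):
        return "no response (possibly crashed)"
```

Only run this against systems you are authorized to test: on a vulnerable server the probe itself triggers the crash.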

Check Version:

python -c "import vllm; print(vllm.__version__)"
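To compare the reported version against the fixed release programmatically, a minimal sketch using plain numeric comparison of dotted version strings (it does not handle pre-release or post-release tags such as "0.5.3.post1", for which the packaging library's version parser would be more robust):

```python
def parse_version(v: str) -> tuple:
    """Turn a dotted version string like "0.5.3" into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def is_vulnerable(installed: str, fixed: str = "0.5.3") -> bool:
    """True when the installed version predates the fixed release."""
    return parse_version(installed) < parse_version(fixed)

print(is_vulnerable("0.5.2"))  # True: predates the fix
print(is_vulnerable("0.5.3"))  # False: fixed release
```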

Verify Fix Applied:

After patching, send the same empty prompt request and verify the server responds with an error instead of crashing

📡 Detection & Monitoring

Log Indicators:

  • vLLM process crashes
  • Connection resets on completions endpoint
  • Error logs mentioning empty prompts or validation failures

Network Indicators:

  • Multiple POST requests to /v1/completions with minimal payload size
  • Sudden drop in successful completions responses

SIEM Query:

source="vllm.logs" AND ("crash" OR "segmentation fault" OR "empty prompt")
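Outside a SIEM, the same log indicators can be scanned with a short script; the regular expression below mirrors the query above:

```python
import re

# Same indicators as the SIEM query above
INDICATORS = re.compile(r"crash|segmentation fault|empty prompt", re.IGNORECASE)

def suspicious_lines(log_lines):
    """Return log lines matching any crash/empty-prompt indicator."""
    return [line for line in log_lines if INDICATORS.search(line)]

sample = [
    "INFO request completed in 120ms",
    "ERROR worker died: Segmentation fault",
    "WARN rejected empty prompt from 10.0.0.5",
]
print(suspicious_lines(sample))  # the ERROR and WARN lines
```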
