CVE-2025-29770

6.5 MEDIUM

📋 TL;DR

This vulnerability in vLLM's outlines library cache allows denial of service attacks. A malicious user can send many short decoding requests, each with a unique schema, filling the local filesystem cache until the disk is exhausted and service is disrupted. This affects vLLM deployments using the V0 engine with the outlines backend enabled.
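The core issue can be illustrated with a toy model (hypothetical code, not vLLM's actual cache implementation): a filesystem cache keyed by schema content grows without bound when every request carries a never-before-seen schema, because no key is ever reused.

```python
import hashlib
import os
import tempfile

def cache_compiled_grammar(cache_dir: str, schema: str) -> str:
    """Toy stand-in for a schema-keyed on-disk cache: one file per unique schema."""
    key = hashlib.sha256(schema.encode()).hexdigest()
    path = os.path.join(cache_dir, key)
    if not os.path.exists(path):
        with open(path, "w") as f:
            f.write(schema)  # a real cache stores a compiled grammar, far larger
    return path

cache_dir = tempfile.mkdtemp()
# 1000 requests, each with a unique schema -> 1000 cache entries, zero cache hits
for i in range(1000):
    cache_compiled_grammar(cache_dir, '{"type": "string", "maxLength": %d}' % i)
print(len(os.listdir(cache_dir)))  # → 1000
```

Because entries are never evicted, an attacker's cost per request is tiny while the server's disk usage grows monotonically.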

💻 Affected Systems

Products:
  • vLLM
Versions: All versions before 0.8.0 using V0 engine
Operating Systems: All operating systems where vLLM is deployed
Default Config Vulnerable: ⚠️ Yes
Notes: Only affects the V0 engine; the outlines cache is enabled by default, and the outlines backend is reachable through the OpenAI-compatible API server.

📦 What is this software?

vLLM is an open-source, high-throughput inference and serving engine for large language models, commonly deployed behind an OpenAI-compatible API server. The outlines library provides guided (structured) decoding, constraining model output to a user-supplied schema or grammar; vLLM's V0 engine caches compiled outlines grammars on the local filesystem.

⚠️ Risk & Real-World Impact

🔴

Worst Case

Complete service outage due to filesystem exhaustion, requiring manual cleanup and service restart.

🟠

Likely Case

Degraded performance and eventual service disruption as cache fills, requiring administrative intervention.

🟢

If Mitigated

Minimal impact with proper monitoring and disk space management in place.

🌐 Internet-Facing: HIGH - The vulnerability is exploitable via the OpenAI compatible API server which is typically internet-facing.
🏢 Internal Only: MEDIUM - Internal users could still exploit this, but attack surface is smaller.

🎯 Exploit Status

Public PoC: ❌ No
Weaponized: LIKELY
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

Exploitation requires sending numerous API requests with unique schemas, which is straightforward to automate.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: 0.8.0

Vendor Advisory: https://github.com/vllm-project/vllm/security/advisories/GHSA-mgrm-fgjv-mhv8

Restart Required: Yes

Instructions:

1. Upgrade vLLM to version 0.8.0 or later.
2. Restart the vLLM service.
3. Verify the outlines cache behavior is properly configured.

🔧 Temporary Workarounds

Disable outlines cache

Applies to: all affected versions

Disable the outlines grammar cache to prevent filesystem exhaustion.

Set the environment variable OUTLINES_CACHE_DIR='' or configure outlines to disable its cache.

Use a different guided decoding backend

Applies to: all affected versions

Configure vLLM to use a backend other than outlines for guided decoding.

Set guided_decoding_backend to 'xgrammar', 'lm-format-enforcer', or another supported backend in the vLLM configuration.
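For example, when launching the OpenAI-compatible server (the flag name is taken from vLLM's CLI and the model name is a placeholder; verify both against your installed version's documentation):

```shell
# Disable the on-disk outlines cache and switch guided decoding away
# from the outlines backend. Model name is illustrative only.
export OUTLINES_CACHE_DIR=''
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --guided-decoding-backend xgrammar
```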

🧯 If You Can't Patch

  • Implement rate limiting on API endpoints to prevent rapid request flooding
  • Monitor filesystem usage and set up alerts for abnormal disk consumption
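A minimal disk-usage check of the sort the second bullet describes (the monitored path and the 90% threshold are assumptions; point it at the filesystem holding the outlines cache in your deployment):

```python
import shutil

def cache_disk_usage_pct(path: str = "/") -> float:
    """Return the percentage of the filesystem at `path` currently in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

# Alert when the filesystem holding the outlines cache passes 90% full
THRESHOLD_PCT = 90.0
pct = cache_disk_usage_pct("/")
if pct > THRESHOLD_PCT:
    print(f"ALERT: cache filesystem at {pct:.1f}% capacity")
```

Run it from cron or a monitoring agent and wire the alert into your paging system.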

🔍 How to Verify

Check if Vulnerable:

Check the installed vLLM version, and whether the deployment uses the V0 engine with the outlines backend enabled.

Check Version:

python -c "import vllm; print(vllm.__version__)"
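To gate on the patched release programmatically, a simple tuple comparison works (naive parser sketch; for production use a proper version library such as packaging):

```python
def is_patched(version_str: str, patched=(0, 8, 0)) -> bool:
    """Naively parse 'X.Y.Z' and compare against the fixed release (0.8.0)."""
    parts = []
    for token in version_str.split(".")[:3]:
        digits = ""
        for ch in token:
            if not ch.isdigit():
                break  # stop at suffixes like 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts) >= patched

print(is_patched("0.7.3"))  # → False
print(is_patched("0.8.0"))  # → True
print(is_patched("0.8.1"))  # → True
```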

Verify Fix Applied:

Verify vLLM version is 0.8.0 or later and check outlines cache configuration

📡 Detection & Monitoring

Log Indicators:

  • Rapid sequence of guided decoding requests with unique schemas
  • Filesystem space warnings or errors

Network Indicators:

  • High volume of short API requests to guided decoding endpoints
  • Requests with unique schema patterns
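The two indicator families above can be combined into a simple detector: hash each request's schema and alert when the number of distinct schemas seen in a short sliding window exceeds a threshold (the window length and threshold are illustrative assumptions):

```python
import hashlib
from collections import deque

class UniqueSchemaBurstDetector:
    """Alert when too many distinct schemas arrive within a sliding window."""

    def __init__(self, window_seconds: float = 60.0, max_unique: int = 100):
        self.window = window_seconds
        self.max_unique = max_unique
        self.events = deque()  # (timestamp, schema_hash) pairs, oldest first

    def observe(self, timestamp: float, schema: str) -> bool:
        """Record one guided-decoding request; return True if bursting."""
        digest = hashlib.sha256(schema.encode()).hexdigest()
        self.events.append((timestamp, digest))
        # Drop events that have aged out of the window
        while self.events and self.events[0][0] < timestamp - self.window:
            self.events.popleft()
        unique = {h for _, h in self.events}
        return len(unique) > self.max_unique

detector = UniqueSchemaBurstDetector(window_seconds=60, max_unique=100)
# 150 requests in 15 seconds, every schema distinct -> detector fires
fired = False
for i in range(150):
    fired = detector.observe(i * 0.1, '{"maxLength": %d}' % i)
print(fired)  # → True
```

Legitimate clients tend to reuse a small set of schemas, so a high unique-schema rate is a reasonable anomaly signal for this attack.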

SIEM Query:

source="vllm" ("guided_decoding" OR "outlines") | bin _time span=1m | stats count by _time | where count > 1000

(Splunk-style query; adapt the source, fields, and threshold to your SIEM.)
