CVE-2025-29770
📋 TL;DR
This vulnerability in vLLM's use of the outlines library's on-disk cache allows denial-of-service attacks. A malicious user can send many short decoding requests, each with a unique schema, filling the local filesystem cache until disk space is exhausted and service is disrupted. It affects vLLM deployments that use the V0 engine with the outlines guided-decoding backend enabled.
💻 Affected Systems
- vLLM
📦 What is this software?
vLLM by the vLLM Project — a high-throughput inference and serving engine for large language models.
⚠️ Risk & Real-World Impact
Worst Case
Complete service outage due to filesystem exhaustion, requiring manual cleanup and service restart.
Likely Case
Degraded performance and eventual service disruption as cache fills, requiring administrative intervention.
If Mitigated
Minimal impact with proper monitoring and disk space management in place.
🎯 Exploit Status
Exploitation requires sending numerous API requests with unique schemas, which is straightforward to automate.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: 0.8.0
Vendor Advisory: https://github.com/vllm-project/vllm/security/advisories/GHSA-mgrm-fgjv-mhv8
Restart Required: Yes
Instructions:
1. Upgrade vLLM to version 0.8.0 or later.
2. Restart the vLLM service.
3. Verify the outlines cache behavior is properly configured.
🔧 Temporary Workarounds
Disable outlines cache
Disable the outlines grammar cache to prevent filesystem exhaustion.
Set the environment variable OUTLINES_CACHE_DIR='' or otherwise configure outlines with caching disabled.
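The cache-disable workaround can be applied from Python before vLLM is imported; a minimal sketch, assuming the installed outlines release honors the OUTLINES_CACHE_DIR variable named above (confirm against your outlines version's documentation):

```python
import os

# Set the cache directory to an empty value BEFORE vLLM/outlines are imported,
# since the cache location is typically read at import time.
# Assumption: this variable is honored by your installed outlines release.
os.environ["OUTLINES_CACHE_DIR"] = ""

# import vllm  # import only after the environment is configured
```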
Use different guided decoding backend
Configure vLLM to use a guided-decoding backend other than outlines.
Set guided_decoding_backend to 'lm-format-enforcer' or another supported backend in the vLLM configuration.
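For server deployments, the backend can be selected at launch time. A hedged sketch of assembling such a launch command — the flag name `--guided-decoding-backend` and the `lm-format-enforcer` backend should be checked against your installed version's `--help` output, and MODEL_NAME is a placeholder:

```python
# Hypothetical launch command for the OpenAI-compatible server with a
# non-outlines guided-decoding backend.
cmd = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "MODEL_NAME",  # placeholder: substitute your model
    "--guided-decoding-backend", "lm-format-enforcer",
]
print(" ".join(cmd))
```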
🧯 If You Can't Patch
- Implement rate limiting on API endpoints to prevent rapid request flooding
- Monitor filesystem usage and set up alerts for abnormal disk consumption
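The disk-monitoring mitigation can be sketched with the standard library; the threshold and the path to check are assumptions to adapt to your deployment (point it at the filesystem holding the outlines cache):

```python
import shutil

def cache_disk_alert(path: str, max_used_fraction: float = 0.9) -> bool:
    """Return True when the filesystem holding `path` exceeds the usage
    threshold, signalling that the outlines cache may be filling the disk."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total > max_used_fraction

# Example: check the root filesystem (substitute your cache directory).
if cache_disk_alert("/"):
    print("ALERT: filesystem above threshold; inspect the outlines cache")
```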
🔍 How to Verify
Check if Vulnerable:
You are vulnerable if running a vLLM version earlier than 0.8.0 with the V0 engine and the outlines guided-decoding backend enabled.
Check Version:
python -c "import vllm; print(vllm.__version__)"
Verify Fix Applied:
Verify vLLM version is 0.8.0 or later and check outlines cache configuration
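The version comparison can be automated with a small helper; a sketch, assuming plain dotted release strings (no pre-release tags):

```python
def is_patched(installed: str, fixed: str = "0.8.0") -> bool:
    """Compare dotted version strings numerically; True if installed >= fixed."""
    def to_tuple(v: str):
        return tuple(int(p) for p in v.split(".")[:3])
    return to_tuple(installed) >= to_tuple(fixed)

# Feed it the output of:
#   python -c "import vllm; print(vllm.__version__)"
print(is_patched("0.7.3"))  # an affected release
```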
📡 Detection & Monitoring
Log Indicators:
- Rapid sequence of guided decoding requests with unique schemas
- Filesystem space warnings or errors
Network Indicators:
- High volume of short API requests to guided decoding endpoints
- Requests with unique schema patterns
SIEM Query:
source="vllm" ("guided_decoding" OR "outlines") | bin _time span=1m | stats count by _time | where count > 1000
(Splunk-style SPL; adapt the source name and threshold to your environment.)
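The per-minute threshold above can also be applied offline against raw logs; a minimal sketch, assuming each parsed log record yields a minute-granularity timestamp string:

```python
from collections import Counter

def burst_minutes(minute_stamps, threshold=1000):
    """Return the minutes whose guided-decoding request count exceeds the
    threshold, mirroring the per-minute SIEM aggregation above."""
    counts = Counter(minute_stamps)
    return [minute for minute, n in counts.items() if n > threshold]
```

Usage: pass minute-truncated timestamps extracted from vLLM access logs, e.g. `burst_minutes(["2025-03-19T12:00", ...])`.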