CVE-2025-62164

8.8 HIGH

📋 TL;DR

A memory corruption vulnerability in vLLM's Completions API endpoint allows attackers to send maliciously crafted prompt embeddings that are deserialized without integrity validation, triggering out-of-bounds memory writes. This can cause denial-of-service crashes and potentially remote code execution on servers running vulnerable vLLM versions. Organizations using vLLM versions 0.10.2 through 0.11.0 for LLM inference are affected.

💻 Affected Systems

Products:
  • vLLM
Versions: 0.10.2 to 0.11.0
Operating Systems: All
Default Config Vulnerable: ⚠️ Yes
Notes: Requires PyTorch 2.8.0 or later where sparse tensor integrity checks are disabled by default.

📦 What is this software?

vLLM is a popular open-source, high-throughput inference and serving engine for large language models, commonly deployed behind an OpenAI-compatible HTTP API.

⚠️ Risk & Real-World Impact

🔴

Worst Case

Remote code execution on the vLLM server, allowing complete system compromise and data exfiltration.

🟠

Likely Case

Denial-of-service crashes disrupting LLM inference services and potentially corrupting model states.

🟢

If Mitigated

Service disruption limited to the affected vLLM instance if proper network segmentation and monitoring are in place.

🌐 Internet-Facing: HIGH
🏢 Internal Only: MEDIUM

🎯 Exploit Status

Public PoC: ❌ No
Weaponized: UNKNOWN
Unauthenticated Exploit: ⚠️ Yes
Complexity: MEDIUM

Exploitation requires crafting malicious serialized tensors, but it leverages a known PyTorch behavior change: starting with PyTorch 2.8.0, sparse tensor integrity checks are disabled by default.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: 0.11.1

Vendor Advisory: https://github.com/vllm-project/vllm/security/advisories/GHSA-mrw7-hf4f-83pf

Restart Required: Yes

Instructions:

1. Update vLLM to version 0.11.1 or later using pip: pip install --upgrade vllm==0.11.1
2. Restart all vLLM services.
3. Verify the patch is applied by checking the version.
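Step 3 can also be done programmatically. A minimal sketch, assuming vLLM was installed via pip (the version parsing below is deliberately simplified for illustration; `packaging.version.parse` is more robust in practice):

```python
# Hedged sketch: check that the installed vLLM is at or past the patched
# release (0.11.1). Version parsing is simplified for illustration.
from importlib.metadata import version, PackageNotFoundError

PATCHED = (0, 11, 1)

def release(ver: str) -> tuple:
    """Keep only leading numeric segments, e.g. '0.11.1rc1' -> (0, 11, 1)."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def is_patched(installed: str) -> bool:
    return release(installed) >= PATCHED

if __name__ == "__main__":
    try:
        print("patched:", is_patched(version("vllm")))
    except PackageNotFoundError:
        print("vllm is not installed")
```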

🔧 Temporary Workarounds

Downgrade PyTorch (all platforms)

Revert to a PyTorch version before 2.8.0, where sparse tensor integrity checks are enabled by default:

pip install torch==2.7.1

Disable Completions API (all platforms)

Temporarily disable the vulnerable Completions API endpoint if it is not required.

🧯 If You Can't Patch

  • Implement strict network access controls to limit Completions API access to trusted sources only.
  • Deploy vLLM instances in isolated containers with minimal privileges to limit potential RCE impact.
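As an illustration of the first point, a reverse proxy in front of vLLM can restrict the endpoint to trusted sources. A minimal nginx sketch; the upstream address, endpoint path, and allowed CIDR are assumptions to adapt to your environment:

```nginx
# Hypothetical nginx fragment: allow only a trusted subnet to reach the
# Completions API. Adjust the CIDR, path, and upstream to your deployment.
location /v1/completions {
    allow 10.0.0.0/8;                  # assumed trusted internal range
    deny  all;
    proxy_pass http://127.0.0.1:8000;  # assumed vLLM listen address
}
```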

🔍 How to Verify

Check if Vulnerable:

Check both the vLLM and PyTorch versions: if vLLM is between 0.10.2 and 0.11.0 and PyTorch is 2.8.0 or later, the system is vulnerable.
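The combined condition can be sketched as a small helper (version parsing simplified for illustration; it ignores local suffixes such as `+cu121`):

```python
# Hedged sketch of the combined condition: vLLM 0.10.2-0.11.0 together
# with PyTorch >= 2.8.0 means the system is vulnerable.
def release(ver: str) -> tuple:
    """Keep only leading numeric segments, e.g. '2.8.0+cu121' -> (2, 8, 0)."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def is_vulnerable(vllm_ver: str, torch_ver: str) -> bool:
    v = release(vllm_ver)
    t = release(torch_ver)
    return (0, 10, 2) <= v <= (0, 11, 0) and t >= (2, 8, 0)
```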

Check Version:

python -c "import vllm; print(vllm.__version__)"

Verify Fix Applied:

Confirm vLLM version is 0.11.1 or later and test the Completions API with valid embeddings to ensure normal operation.

📡 Detection & Monitoring

Log Indicators:

  • Unexpected crashes or segmentation faults in vLLM logs
  • Errors related to torch.load() or tensor processing in Completions API requests

Network Indicators:

  • Unusual spikes in requests to the /completions endpoint
  • Requests with abnormally large or malformed payloads

SIEM Query:

source="vllm.logs" AND ("segmentation fault" OR "torch.load" OR "to_dense")
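For environments without a SIEM, the same indicators can be matched directly against log lines. A minimal sketch; real log formats vary, so treat the patterns as a starting point:

```python
# Hedged sketch mirroring the SIEM query above: flag vLLM log lines that
# mention the crash/deserialization indicators.
INDICATORS = ("segmentation fault", "torch.load", "to_dense")

def suspicious_lines(log_lines):
    """Return log lines containing any indicator (case-insensitive)."""
    return [line for line in log_lines
            if any(term in line.lower() for term in INDICATORS)]

sample = [
    "INFO 200 POST /v1/completions",
    "ERROR worker died: Segmentation fault (core dumped)",
    "WARN failure in torch.load while parsing prompt_embeds",
]
print(suspicious_lines(sample))
```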
