CVE-2025-6211

6.5 MEDIUM

📋 TL;DR

The run-llama/llama_index library's DocugamiReader uses MD5 hashing of chunk text to generate document chunk IDs, so different chunks that contain identical text receive the same ID and silently overwrite each other. This leads to data loss, broken document hierarchies, and inaccurate AI responses. Users of llama_index versions up to and including 0.12.28 who process documents with the DocugamiReader are affected.

💻 Affected Systems

Products:
  • run-llama/llama_index
Versions: Up to and including 0.12.28
Operating Systems: All
Default Config Vulnerable: ⚠️ Yes
Notes: Only affects systems using the DocugamiReader class for document processing.

📦 What is this software?

LlamaIndex (run-llama/llama_index) is an open-source Python data framework for connecting custom data sources to large language models. The affected DocugamiReader is a document loader that ingests documents processed by the Docugami service, preserving their chunk hierarchy for retrieval.

⚠️ Risk & Real-World Impact

🔴

Worst Case

Critical document content is permanently lost, legal documents become incomplete or inaccurate, AI systems generate completely hallucinated responses based on corrupted data.

🟠

Likely Case

Some document chunks are overwritten, causing partial data loss and occasional inaccurate AI responses, particularly with repetitive or templated documents.

🟢

If Mitigated

Minor data inconsistencies in non-critical documents with minimal impact on overall system functionality.

🌐 Internet-Facing: MEDIUM
🏢 Internal Only: MEDIUM

🎯 Exploit Status

Public PoC: ✅ No
Weaponized: NO
Unauthenticated Exploit: ✅ No
Complexity: LOW

Exploitation requires supplying documents whose chunks contain identical text, which triggers the ID collisions; no authentication or special privileges are needed.
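The collision mechanism can be illustrated with a minimal sketch. The `md5_chunk_id` helper below is a hypothetical reconstruction of the vulnerable scheme (ID derived solely from chunk text), not the library's exact code:

```python
import hashlib

def md5_chunk_id(text: str) -> str:
    """Hypothetical reconstruction of the vulnerable scheme:
    the chunk ID is derived solely from the chunk's text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Two distinct chunks from different sections that happen to share
# identical boilerplate text...
chunk_a = "This section intentionally left blank."
chunk_b = "This section intentionally left blank."

id_a = md5_chunk_id(chunk_a)
id_b = md5_chunk_id(chunk_b)

# ...receive the same ID, so the second silently overwrites the
# first in any store keyed by chunk ID.
store = {id_a: "Section 4 content", id_b: "Section 9 content"}
print(id_a == id_b)  # True
print(len(store))    # 1 -- one chunk was lost
```

This is why repetitive or templated documents (contracts, forms, reports with boilerplate sections) are the most likely to trigger the bug in practice.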

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: 0.3.1 (this appears to refer to the DocugamiReader integration package, llama-index-readers-docugami, which is versioned separately from the llama_index core package)

Vendor Advisory: https://github.com/run-llama/llama_index/commit/29b2e07e64ed7d302b1cc058185560b28eaa1352

Restart Required: No

Instructions:

1. Update the DocugamiReader integration package: pip install --upgrade "llama-index-readers-docugami>=0.3.1" (note: pip install llama_index==0.3.1 would downgrade the core package, not apply this fix)
2. Verify the update completed successfully: pip show llama-index-readers-docugami
3. Re-process any documents that were previously processed with vulnerable versions, so stored chunk IDs are regenerated

🔧 Temporary Workarounds

Avoid DocugamiReader (all versions)

Temporarily use alternative document readers that don't have this vulnerability.

Custom ID Generation (all versions)

Override the ID-generation method in DocugamiReader with a collision-resistant scheme. Note that swapping MD5 for a stronger hash alone does not help: two chunks with identical text still hash identically. The hash input must also include something that distinguishes the chunks, such as the source document ID and the chunk's position.
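A minimal sketch of such an ID scheme. The function name and fields are illustrative assumptions, not part of llama_index; adapt them to the metadata DocugamiReader actually exposes:

```python
import hashlib

def collision_resistant_chunk_id(text: str, doc_id: str, position: int) -> str:
    """Illustrative replacement ID scheme: hash the chunk text together
    with its source document ID and ordinal position, so identical text
    at different locations still yields distinct IDs."""
    payload = f"{doc_id}\x00{position}\x00{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Identical text at two positions now gets two distinct IDs.
id_1 = collision_resistant_chunk_id("Standard clause.", "contract-42", 3)
id_2 = collision_resistant_chunk_id("Standard clause.", "contract-42", 17)
print(id_1 != id_2)  # True
```

The NUL separators prevent ambiguous concatenations (e.g., doc_id "a1" + position 2 colliding with doc_id "a" + position 12).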

🧯 If You Can't Patch

  • Implement data validation checks to detect hash collisions in processed documents
  • Maintain backup copies of all source documents before processing with vulnerable versions
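The first mitigation above can be sketched as a simple post-processing check. This helper is illustrative and not part of llama_index; it assumes you can enumerate (chunk ID, text) pairs from your store:

```python
from collections import defaultdict

def find_id_collisions(chunks):
    """Group chunks by ID and report any ID shared by more than one
    chunk. `chunks` is an iterable of (chunk_id, text) pairs."""
    by_id = defaultdict(list)
    for chunk_id, text in chunks:
        by_id[chunk_id].append(text)
    return {cid: texts for cid, texts in by_id.items() if len(texts) > 1}

chunks = [
    ("abc123", "Section 1 body"),
    ("def456", "Section 2 body"),
    ("abc123", "Section 7 body"),  # same ID as Section 1 -> collision
]
collisions = find_id_collisions(chunks)
print(collisions)  # {'abc123': ['Section 1 body', 'Section 7 body']}
```

Run this before the ID-keyed store deduplicates the chunks; once a collision has overwritten data, only the backups from the second mitigation can recover it.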

🔍 How to Verify

Check if Vulnerable:

Check whether llama_index <= 0.12.28 is installed and whether DocugamiReader is used anywhere in your codebase
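The version comparison can be done programmatically. This helper is a sketch: it compares the first three numeric components and ignores pre-release suffixes (an assumption of this example):

```python
def version_is_affected(version: str) -> bool:
    """Return True if a llama_index version string falls in the
    affected range (<= 0.12.28). Simple three-part numeric compare."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts <= (0, 12, 28)

print(version_is_affected("0.12.28"))  # True  (last affected release)
print(version_is_affected("0.12.29"))  # False
```

For production use, prefer a proper version parser such as `packaging.version.Version` over ad-hoc string splitting.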

Check Version:

pip show llama_index | grep Version

Verify Fix Applied:

Verify the DocugamiReader integration package (llama-index-readers-docugami) is at version 0.3.1 or higher, then test document processing with content containing identical repeated chunks and confirm each chunk receives a distinct ID

📡 Detection & Monitoring

Log Indicators:

  • Repeated document chunk IDs in processing logs
  • Missing expected document sections in output

SIEM Query:

Search application logs for 'DocugamiReader' AND ('MD5' OR 'hash collision')
