CVE-2025-6211
📋 TL;DR
The DocugamiReader in the run-llama/llama_index library derives document chunk IDs from an MD5 hash of the chunk text, so distinct chunks with identical text receive the same ID and one overwrites the other. This leads to data loss, broken document hierarchies, and inaccurate AI responses. Users of llama_index versions up to 0.12.28 who process documents with the DocugamiReader are affected.
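The overwrite can be illustrated with a minimal sketch. This is not the library's code: the `md5_chunk_id` helper, the `path` field, and the dictionary store are hypothetical stand-ins, assuming only that the ID is an MD5 digest of the chunk text.

```python
import hashlib


def md5_chunk_id(text: str) -> str:
    """Derive an ID from the chunk text alone (illustrative, not the library's code)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two structurally distinct chunks that happen to share the same text.
chunks = [
    {"path": "/contract/section[1]/clause[3]", "text": "Not applicable."},
    {"path": "/contract/section[2]/clause[1]", "text": "Not applicable."},
]

store = {}
for chunk in chunks:
    # Both chunks hash to the same key, so the second write replaces the first.
    store[md5_chunk_id(chunk["text"])] = chunk

print(f"{len(chunks)} chunks in, {len(store)} stored")
```

Only the later chunk survives in `store`, which is why repetitive or templated documents are the most exposed.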
💻 Affected Systems
- run-llama/llama_index
📦 What is this software?
LlamaIndex by LlamaIndex
⚠️ Risk & Real-World Impact
Worst Case
Critical document content is permanently lost, legal documents become incomplete or inaccurate, and AI systems generate hallucinated responses based on corrupted data.
Likely Case
Some document chunks are overwritten, causing partial data loss and occasional inaccurate AI responses, particularly with repetitive or templated documents.
If Mitigated
Minor data inconsistencies in non-critical documents with minimal impact on overall system functionality.
🎯 Exploit Status
Exploitation requires feeding specific document structures to trigger hash collisions, but no authentication or special privileges are needed.
🛠️ Fix & Mitigation
✅ Official Fix
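Patch Version: 0.3.1 (this number is lower than the affected llama_index range of up to 0.12.28, so it most likely refers to a separately versioned package such as the DocugamiReader integration, llama-index-readers-docugami; confirm against the linked commit)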
Vendor Advisory: https://github.com/run-llama/llama_index/commit/29b2e07e64ed7d302b1cc058185560b28eaa1352
Restart Required: No
Instructions:
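1. Update to the patched release. The advisory lists the command as pip install llama_index==0.3.1; because 0.3.1 is lower than the affected 0.12.28 range, confirm which package that version number belongs to before running it, so that llama_index itself is not downgraded.
2. Verify that the update completed successfully.
3. Re-process any documents that were previously processed with vulnerable versions.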
🔧 Temporary Workarounds
Avoid DocugamiReader
Temporarily use alternative document readers that do not have this vulnerability
Custom ID Generation
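Override the ID generation method in DocugamiReader to use a collision-resistant hash function, and include data in the hash input that distinguishes chunks with identical text (for example, their position in the document)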
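One possible shape for such an ID function is sketched below. It assumes each chunk carries some structural locator (here a hypothetical `structural_path` string); the function name and the choice of SHA-256 are illustrative, not the upstream fix. Note that replacing MD5 with a stronger hash over the same text would not help on its own, because identical input always produces an identical digest.

```python
import hashlib


def stable_chunk_id(structural_path: str, text: str) -> str:
    """Hash the chunk's position together with its text (hypothetical helper)."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from producing the same input.
    payload = structural_path.encode("utf-8") + b"\x00" + text.encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Same text, different positions: the IDs now differ.
id_a = stable_chunk_id("/contract/section[1]/clause[3]", "Not applicable.")
id_b = stable_chunk_id("/contract/section[2]/clause[1]", "Not applicable.")
print(id_a != id_b)
```

The ID stays deterministic for a given position and text, so re-processing the same document yields the same IDs.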
🧯 If You Can't Patch
- Implement data validation checks to detect hash collisions in processed documents
- Maintain backup copies of all source documents before processing with vulnerable versions
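A validation check of this kind could look like the following sketch. It assumes you can collect `(chunk_id, source_location)` pairs before chunks are written to a store; the function name and the pair format are hypothetical.

```python
from collections import defaultdict


def find_id_collisions(chunks):
    """Return {chunk_id: [locations]} for every ID claimed by more than one location."""
    locations_by_id = defaultdict(set)
    for chunk_id, location in chunks:
        locations_by_id[chunk_id].add(location)
    return {
        chunk_id: sorted(locations)
        for chunk_id, locations in locations_by_id.items()
        if len(locations) > 1
    }


# Example input: "id1" is shared by two different locations, "id2" is a harmless repeat.
observed = [("id1", "/a"), ("id1", "/b"), ("id2", "/c"), ("id2", "/c")]
print(find_id_collisions(observed))
```

Any non-empty result indicates that at least one chunk would be overwritten, so the affected document can be flagged before it is indexed.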
🔍 How to Verify
Check if Vulnerable:
Check whether the installed llama_index version is 0.12.28 or earlier and whether DocugamiReader is used in your codebase
Check Version:
pip show llama_index | grep Version
Verify Fix Applied:
Verify that the installed version is the patched release (listed as 0.3.1) or higher, then test document processing with content known to trigger collisions, such as distinct sections that contain identical text
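Where `grep` is unavailable, the installed versions can also be read from Python. The sketch below uses the standard-library `importlib.metadata`; the distribution names in `CANDIDATES` are assumptions about how the packages are published, so adjust them to match your environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; not taken from the advisory.
CANDIDATES = ["llama-index", "llama-index-core", "llama-index-readers-docugami"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(CANDIDATES).items():
    print(f"{name}: {ver or 'not installed'}")
```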
📡 Detection & Monitoring
Log Indicators:
- Repeated document chunk IDs in processing logs
- Missing expected document sections in output
SIEM Query:
Search for application logs containing 'DocugamiReader' AND ('MD5' OR 'hash collision')
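If your pipeline logs chunk IDs, repeated IDs can be counted with a short script. The `chunk_id=<32 hex characters>` log format below is purely an assumption for illustration; llama_index does not necessarily emit such lines, so adapt the pattern to whatever your application actually logs.

```python
import re
from collections import Counter

# Hypothetical log format: "... chunk_id=<32 lowercase hex chars> ..." (MD5-length IDs).
CHUNK_ID_PATTERN = re.compile(r"chunk_id=([0-9a-f]{32})\b")


def repeated_chunk_ids(log_lines):
    """Return {chunk_id: count} for every ID that appears more than once."""
    counts = Counter(
        match.group(1)
        for line in log_lines
        for match in CHUNK_ID_PATTERN.finditer(line)
    )
    return {chunk_id: n for chunk_id, n in counts.items() if n > 1}


sample_log = [
    "INFO DocugamiReader stored chunk_id=" + "a" * 32,
    "INFO DocugamiReader stored chunk_id=" + "a" * 32,
    "INFO DocugamiReader stored chunk_id=" + "b" * 32,
]
print(repeated_chunk_ids(sample_log))
```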