CVE-2021-3878
📋 TL;DR
CVE-2021-3878 is an XML External Entity (XXE) vulnerability in Stanford CoreNLP that allows attackers to read arbitrary files from the server filesystem or conduct server-side request forgery (SSRF) attacks. It affects any system that runs a vulnerable version of CoreNLP and processes untrusted XML input. Where that input is reachable over the network, the vulnerability can be exploited remotely without authentication.
💻 Affected Systems
- Stanford CoreNLP
📦 What is this software?
CoreNLP by Stanford
⚠️ Risk & Real-World Impact
Worst Case
Complete server compromise through file disclosure of sensitive data (passwords, keys, configuration files) leading to further attacks, or denial of service through resource exhaustion.
Likely Case
Unauthorized reading of sensitive files from the server filesystem, potentially exposing credentials, configuration data, or other sensitive information.
If Mitigated
Limited impact with proper network segmentation, file system permissions, and input validation controls in place.
🎯 Exploit Status
XXE vulnerabilities are well understood, and exploitation tools are widely available. Exploitation requires that the attacker can supply XML input that CoreNLP parses.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: 4.3.0 and later
Vendor Advisory: https://github.com/stanfordnlp/corenlp/commit/e5bbe135a02a74b952396751ed3015e8b8252e99
Restart Required: Yes
Instructions:
1. Update Stanford CoreNLP to version 4.3.0 or later.
2. Download from the official repository: https://stanfordnlp.github.io/CoreNLP/
3. Replace the existing CoreNLP installation.
4. Restart all CoreNLP services.
🔧 Temporary Workarounds
Disable XML external entity processing
All platforms: Configure XML parsers to disable external entity resolution.
On Java XML parser factories, enable XMLConstants.FEATURE_SECURE_PROCESSING and set the feature http://apache.org/xml/features/disallow-doctype-decl to true.
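As a minimal sketch of this workaround, the Java snippet below configures a JAXP DocumentBuilderFactory so that DOCTYPE declarations and external entities are refused. The class name HardenedXmlParser and its helper methods are hypothetical and are not part of CoreNLP. The sketch only applies to XML parsing code you control and does not change parsers that CoreNLP creates internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration; not part of CoreNLP.
final class HardenedXmlParser {

    /** Builds a factory that refuses DOCTYPE declarations and external entities. */
    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting any DOCTYPE removes the place where entities are declared.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth in case DOCTYPE handling is re-enabled later.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    /** Parses the given XML string with the hardened factory. */
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return newHardenedFactory().newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
    }

    /** Returns true if the hardened parser refuses the document. */
    static boolean isRejected(String xml) {
        try {
            parse(xml);
            return false;
        } catch (SAXException e) {
            return true; // the parser raised a fatal error, e.g. on a DOCTYPE
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this configuration, a document that contains a DOCTYPE declaration fails with a SAXException, while XML without a DOCTYPE parses normally.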
Input validation and filtering
All platforms: Implement strict input validation to reject or sanitize XML containing external entity references.
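One way to sketch such a filter, assuming the XML arrives as an already decoded string, is to reject any input that contains a DOCTYPE or ENTITY declaration before it reaches CoreNLP. The class XmlInputFilter and its methods are hypothetical names used for illustration. A string check like this supplements parser hardening and does not replace it, because it cannot see content that is decoded differently by the parser.

```java
import java.util.regex.Pattern;

// Hypothetical pre-filter for illustration; not part of CoreNLP.
final class XmlInputFilter {

    // Matches the start of a DOCTYPE or ENTITY declaration. The match is
    // case-insensitive, which is stricter than XML itself requires.
    private static final Pattern DTD_MARKER =
            Pattern.compile("<!(DOCTYPE|ENTITY)", Pattern.CASE_INSENSITIVE);

    /** Returns true if the input contains a DOCTYPE or ENTITY declaration. */
    static boolean containsDtd(String xml) {
        return DTD_MARKER.matcher(xml).find();
    }

    /** Returns the input unchanged, or throws if it declares a DTD or entity. */
    static String requireNoDtd(String xml) {
        if (containsDtd(xml)) {
            throw new IllegalArgumentException("XML with DOCTYPE or ENTITY declarations is not accepted");
        }
        return xml;
    }
}
```

A caller would pass each incoming document through requireNoDtd and return an error to the client when it throws.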
🧯 If You Can't Patch
- Implement network segmentation to isolate CoreNLP instances from sensitive systems
- Apply strict file system permissions to limit what files CoreNLP can access
🔍 How to Verify
Check if Vulnerable:
Check the CoreNLP version; versions earlier than 4.3.0 are vulnerable: java -cp "stanford-corenlp-*.jar" edu.stanford.nlp.util.SystemUtils --version
Check Version:
java -cp "stanford-corenlp-*.jar" edu.stanford.nlp.util.SystemUtils --version
Verify Fix Applied:
Verify version is 4.3.0 or later and test with XXE payloads to confirm they are rejected
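A harmless way to test for XXE is a canary check: write a random marker to a temporary file, reference that file from an external entity, and see whether the marker comes back in the parsed output. The sketch below shows the idea against a local JAXP parser. The class XxeCanaryCheck and its methods are hypothetical, and how the payload is submitted to a CoreNLP instance depends on the deployment and is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical test helper for illustration; not part of CoreNLP.
final class XxeCanaryCheck {

    /** Writes the marker to a temporary file and returns its path. */
    static Path createCanary(String marker) {
        try {
            Path file = Files.createTempFile("xxe-canary", ".txt");
            Files.write(file, marker.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds an XML document whose external entity points at the canary file. */
    static String buildPayload(Path canary) {
        return "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE probe [<!ENTITY xxe SYSTEM \"" + canary.toUri() + "\">]>\n"
                + "<probe>&xxe;</probe>";
    }

    /** Returns true if parsing the payload exposes the marker; a parse failure counts as rejected. */
    static boolean leaks(DocumentBuilderFactory factory, String payload, String marker) {
        try {
            byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent().contains(marker);
        } catch (Exception e) {
            return false; // the parser refused the document
        }
    }

    /** Factory that refuses DOCTYPE declarations, standing in for a patched parser. */
    static DocumentBuilderFactory hardenedFactory() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return dbf;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String marker = "canary-" + UUID.randomUUID();
        String payload = buildPayload(createCanary(marker));
        System.out.println("hardened parser leaks canary: "
                + leaks(hardenedFactory(), payload, marker));
    }
}
```

When the same payload is sent to the system under test, the fix can be considered effective only if the marker never appears in any response or output.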
📡 Detection & Monitoring
Log Indicators:
- Unusual file access patterns from CoreNLP process
- XML parsing errors with external entity references
- Increased memory usage during XML processing
Network Indicators:
- XML payloads containing external entity declarations (DOCTYPE, SYSTEM, ENTITY)
- Outbound connections from CoreNLP to unexpected external systems
SIEM Query:
source="corenlp.log" AND ("DOCTYPE" OR "SYSTEM" OR "ENTITY")
🔗 References
- https://github.com/stanfordnlp/corenlp/commit/e5bbe135a02a74b952396751ed3015e8b8252e99
- https://huntr.dev/bounties/a11c889b-ccff-4fea-9e29-963a23a63dd2