CVE-2021-3869
📋 TL;DR
CVE-2021-3869 is an XML External Entity (XXE) vulnerability in Stanford CoreNLP that lets attackers read arbitrary files from the server filesystem or perform server-side request forgery (SSRF). It affects any system that runs a vulnerable CoreNLP version and processes untrusted XML, and the risk is highest where users can supply that XML directly.
💻 Affected Systems
- Stanford CoreNLP
📦 What is this software?
CoreNLP by Stanford: a Java natural language processing toolkit from the Stanford NLP Group.
⚠️ Risk & Real-World Impact
Worst Case
Complete server compromise through file disclosure of sensitive data (passwords, keys, configuration files) or SSRF leading to internal network reconnaissance and potential lateral movement.
Likely Case
Unauthorized file read access to server files, potentially exposing sensitive configuration data, credentials, or application source code.
If Mitigated
Limited impact with proper input validation and XML parser configuration, potentially no exploitation if external entity processing is disabled.
🎯 Exploit Status
XXE vulnerabilities are well-understood with many public exploit examples. The vulnerability requires XML input processing but doesn't require authentication.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: 4.3.0 and later
Vendor Advisory: https://github.com/stanfordnlp/corenlp/commit/5d83f1e8482ca304db8be726cad89554c88f136a
Restart Required: Yes
Instructions:
1. Update CoreNLP to version 4.3.0 or later.
2. Replace the corenlp.jar file with the patched version.
3. Restart any services using CoreNLP.
🔧 Temporary Workarounds
Disable XXE in XML parser
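Configure the XML parser to disable external entity processing
Set these JAXP parser features (each short name stands for the constant or feature URI in parentheses):
- FEATURE_SECURE_PROCESSING (XMLConstants.FEATURE_SECURE_PROCESSING) = true
- DISALLOW_DOCTYPE_DECL (http://apache.org/xml/features/disallow-doctype-decl) = true
- EXTERNAL_GENERAL_ENTITIES (http://xml.org/sax/features/external-general-entities) = false
- EXTERNAL_PARAMETER_ENTITIES (http://xml.org/sax/features/external-parameter-entities) = false
- LOAD_EXTERNAL_DTD (http://apache.org/xml/features/nonvalidating/load-external-dtd) = false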
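The feature settings above map onto the standard JAXP API roughly as follows. This is a minimal sketch, assuming the JDK's built-in Xerces-based parser; the class `HardenedXml` and its method names are hypothetical, and the sketch does not show where CoreNLP itself builds its parsers, so it applies to your own XML handling in front of CoreNLP rather than replacing the official fix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical helper: a DOM parser factory with XXE-related features locked down.
final class HardenedXml {

    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE outright is the strongest setting: no DTD, no entities.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth in case DOCTYPE has to be re-enabled later.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    // Parses a string with the hardened factory; throws SAXException on any DOCTYPE.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = newHardenedFactory().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<doc>hello</doc>").getDocumentElement().getTextContent());
        try {
            parse("<!DOCTYPE doc [<!ENTITY x SYSTEM \"file:///etc/hostname\">]><doc>&x;</doc>");
        } catch (org.xml.sax.SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With `disallow-doctype-decl` enabled, any document carrying a DOCTYPE fails to parse, so the remaining entity features only matter if that setting is relaxed.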
Input validation and sanitization
Validate and sanitize XML input before processing
- Implement XML schema validation
- Remove DOCTYPE declarations from input
- Use an allowlist for allowed XML elements
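The input checks listed above could be sketched as a small gate in front of the XML consumer. This is an illustration under assumptions, not CoreNLP's API: the class `XmlInputGate`, its methods, and the element names in `ALLOWED` are hypothetical. The sketch rejects input that contains a DOCTYPE or ENTITY declaration rather than trying to strip it, since removing declarations with string edits is easy to get wrong.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical pre-processing gate for untrusted XML.
final class XmlInputGate {

    // Assumed allowlist; replace with the elements your application really expects.
    static final Set<String> ALLOWED = Set.of("document", "sentence", "token");

    // Cheap textual pre-check on the already-decoded input string.
    static boolean hasDtdMarkers(String xml) {
        return xml.contains("<!DOCTYPE") || xml.contains("<!ENTITY");
    }

    // Recursively checks that every element name is in the allowlist.
    static boolean onlyAllowedElements(Node node, Set<String> allowed) {
        if (node.getNodeType() == Node.ELEMENT_NODE && !allowed.contains(node.getNodeName())) {
            return false;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (!onlyAllowedElements(children.item(i), allowed)) {
                return false;
            }
        }
        return true;
    }

    // Returns true only for well-formed XML with no DTD and only allowed elements.
    static boolean accept(String xml) {
        if (hasDtdMarkers(xml)) {
            return false;
        }
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // The parser enforces the DOCTYPE ban even if the text check is bypassed.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return onlyAllowedElements(doc, ALLOWED);
        } catch (Exception e) {
            return false; // malformed or rejected input
        }
    }
}
```

The textual check alone is not a security boundary, because alternate encodings can hide the markers from a byte-level filter; the parser-level setting inside `accept` is what actually enforces the rule.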
🧯 If You Can't Patch
- Implement network segmentation to isolate CoreNLP instances from sensitive systems
- Deploy WAF rules to block XML containing DOCTYPE declarations or external entity references
🔍 How to Verify
Check if Vulnerable:
Check CoreNLP version: java -cp corenlp.jar edu.stanford.nlp.util.SystemUtils
Check Version:
java -cp corenlp.jar edu.stanford.nlp.util.SystemUtils | grep 'Stanford CoreNLP version'
Verify Fix Applied:
Confirm the version is 4.3.0 or later, then submit a test XXE payload and confirm that it is rejected.
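One way to build a harmless test payload is to point the external entity at a temporary canary file and check whether its contents appear in the parsed output. The sketch below does this against a plain JAXP `DocumentBuilderFactory`; `XxeProbe` and `leaksFile` are hypothetical names, and the probe exercises only the parser configuration passed to it, not CoreNLP's own code paths. To test a CoreNLP deployment, the same payload shape would need to be sent through whatever interface accepts XML there.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical probe: does this parser configuration expand an external file entity?
final class XxeProbe {

    static boolean leaksFile(DocumentBuilderFactory dbf) throws Exception {
        String marker = "xxe-canary-" + System.nanoTime();
        Path canary = Files.createTempFile("xxe-canary", ".txt");
        try {
            Files.write(canary, marker.getBytes(StandardCharsets.UTF_8));
            // Classic XXE shape: an external general entity referenced in the body.
            String payload = "<?xml version=\"1.0\"?>"
                    + "<!DOCTYPE probe [<!ENTITY xxe SYSTEM \"" + canary.toUri() + "\">]>"
                    + "<probe>&xxe;</probe>";
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(payload)));
            // If the marker shows up, the parser read the file on our behalf.
            return doc.getDocumentElement().getTextContent().contains(marker);
        } catch (org.xml.sax.SAXException e) {
            return false; // parser refused the payload
        } finally {
            Files.deleteIfExists(canary);
        }
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory hardened = DocumentBuilderFactory.newInstance();
        hardened.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        System.out.println("default factory leaks:  " + leaksFile(DocumentBuilderFactory.newInstance()));
        System.out.println("hardened factory leaks: " + leaksFile(hardened));
    }
}
```

A result of `false` for the configuration under test is the outcome to look for after patching or hardening.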
📡 Detection & Monitoring
Log Indicators:
- XML parsing errors related to external entities
- Unexpected file read operations from CoreNLP process
- HTTP requests to internal resources from CoreNLP
Network Indicators:
- Outbound requests from CoreNLP to unexpected internal systems
- Large XML payloads containing DOCTYPE declarations
SIEM Query:
source="corenlp" AND (message="*DOCTYPE*" OR message="*ENTITY*" OR message="*external*" OR error="*XML*" OR error="*parse*")
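Where no SIEM is available, a similar filter can be run over exported log lines. The sketch below is an assumption-laden illustration: `XxeLogScan` is a hypothetical class and the log format is not taken from CoreNLP. It is deliberately narrower than the query above, matching only literal `<!DOCTYPE` and `<!ENTITY` declarations, because a broad match on "entity" would also hit routine named-entity-recognition messages from an NLP service.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical offline filter for log lines that contain DTD declarations.
final class XxeLogScan {

    // Case-sensitive on purpose: XML declaration keywords are upper case.
    private static final Pattern DTD_DECL = Pattern.compile("<!(DOCTYPE|ENTITY)\\b");

    static boolean suspicious(String line) {
        return DTD_DECL.matcher(line).find();
    }

    static List<String> filter(List<String> lines) {
        return lines.stream().filter(XxeLogScan::suspicious).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "INFO Loading named entity recognizer",
                "WARN XML parse failed near: <!DOCTYPE foo [",
                "WARN XML parse failed near: <!ENTITY xxe SYSTEM");
        filter(sample).forEach(System.out::println);
    }
}
```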
🔗 References
- https://github.com/stanfordnlp/corenlp/commit/5d83f1e8482ca304db8be726cad89554c88f136a
- https://huntr.dev/bounties/2f8baf6c-14b3-420d-8ede-9805797cd324