CVE-2022-0239
📋 TL;DR
CVE-2022-0239 is an XXE (XML External Entity) vulnerability in Stanford CoreNLP that lets attackers read arbitrary files from the server filesystem or conduct server-side request forgery (SSRF) attacks. It affects any system that runs a vulnerable version of CoreNLP and processes untrusted XML input. The vulnerability is particularly dangerous because it can be exploited remotely without authentication.
💻 Affected Systems
- Stanford CoreNLP
📦 What is this software?
CoreNLP by Stanford
⚠️ Risk & Real-World Impact
Worst Case
Disclosure of sensitive files (passwords, keys, configuration files) and SSRF attacks against internal services, which an attacker may be able to chain into a broader compromise of the server.
Likely Case
Arbitrary file read from the server filesystem, potentially exposing sensitive configuration files, credentials, or application data.
If Mitigated
Limited impact with proper input validation and XML parser configuration that disables external entity processing.
🎯 Exploit Status
Exploitation requires sending malicious XML payloads to CoreNLP endpoints. Public proof-of-concept examples demonstrate file disclosure attacks.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: 4.4.0 and later
Vendor Advisory: https://github.com/stanfordnlp/corenlp/commit/1940ffb938dc4f3f5bc5f2a2fd8b35aabbbae3dd
Restart Required: Yes
Instructions:
1. Upgrade to CoreNLP version 4.4.0 or later.
2. Download the release from the official GitHub releases page.
3. Replace the existing CoreNLP installation.
4. Restart any services that use CoreNLP.
🔧 Temporary Workarounds
Disable XXE in XML parser
Configure the XML parser to disable external entity processing before parsing untrusted XML
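// Java: harden the DocumentBuilderFactory before parsing untrusted XML
import javax.xml.parsers.DocumentBuilderFactory;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Do not resolve external general or parameter entities
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
// Reject any document that contains a DOCTYPE declaration
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);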
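As a minimal sketch of how the settings above behave, the following self-contained Java class builds a hardened factory and checks whether a typical XXE payload is refused. The class and method names (HardenedXmlDemo, hardenedFactory, rejects) are illustrative and not part of CoreNLP, and the two extra calls (setXIncludeAware, setExpandEntityReferences) are additional standard JAXP hardening assumed here rather than taken from the advisory.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative helper, not part of CoreNLP.
final class HardenedXmlDemo {

    // Builds a factory with the settings listed in the workaround above.
    static DocumentBuilderFactory hardenedFactory() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
            dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            // Assumption: extra defensive settings, not listed in the advisory.
            dbf.setXIncludeAware(false);
            dbf.setExpandEntityReferences(false);
            return dbf;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("XML parser does not support the required features", e);
        }
    }

    // Returns true when the hardened parser refuses the document.
    static boolean rejects(String xml) {
        try {
            DocumentBuilder builder = hardenedFactory().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true; // e.g. DOCTYPE is disallowed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xxe = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
                + "<foo>&xxe;</foo>";
        System.out.println("XXE payload rejected: " + rejects(xxe));
        System.out.println("Plain XML rejected: " + rejects("<foo>bar</foo>"));
    }
}
```

Because the DOCTYPE declaration is refused before any entity is declared, the parser never attempts to open the referenced file. The parser may still log the fatal error to standard error unless a custom ErrorHandler is installed.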
Input validation and sanitization
Validate and sanitize XML input before processing, rejecting suspicious content
// Implement input validation to reject XML containing DOCTYPE declarations
// or external entity references before passing to CoreNLP
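One way to implement such a check, sketched here under assumptions, is to scan the document prolog with the JDK's built-in StAX parser and refuse anything that declares a DOCTYPE. The class name XmlInputGate and the method isAcceptable are hypothetical, and the sketch assumes the JDK's default StAX implementation, which accepts the ACCESS_EXTERNAL_DTD property; other implementations may not.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative pre-validation gate, not part of CoreNLP.
final class XmlInputGate {

    // Returns true only for well-formed XML whose prolog has no DOCTYPE.
    static boolean isAcceptable(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Never fetch external entities or external DTDs while scanning.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.DTD) {
                    return false; // DOCTYPE declaration present
                }
                if (event == XMLStreamConstants.START_ELEMENT) {
                    return true; // reached the root element without a DOCTYPE
                }
            }
            return false; // no root element found
        } catch (XMLStreamException e) {
            return false; // malformed or blocked input is treated as unsafe
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do on close failure
                }
            }
        }
    }

    public static void main(String[] args) {
        String xxe = "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]><foo>&xxe;</foo>";
        System.out.println("Plain XML acceptable: " + isAcceptable("<foo>bar</foo>"));
        System.out.println("XXE payload acceptable: " + isAcceptable(xxe));
    }
}
```

Parsing the prolog, rather than searching the raw text for "<!DOCTYPE", avoids being fooled by unusual encodings or whitespace. The scan stops at the root element, since a DOCTYPE can only appear before it.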
🧯 If You Can't Patch
- Implement network segmentation to isolate CoreNLP instances from sensitive systems
- Deploy WAF rules to block XML payloads containing DOCTYPE declarations or external entity references
🔍 How to Verify
Check if Vulnerable:
Check the CoreNLP version; releases earlier than 4.4.0 are affected: java -cp "stanford-corenlp-*.jar" edu.stanford.nlp.util.SystemUtils --version
Check Version:
java -cp "stanford-corenlp-*.jar" edu.stanford.nlp.util.SystemUtils --version
Verify Fix Applied:
Confirm that the installed version is 4.4.0 or later, then submit a test XXE payload and confirm that it is rejected.
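A harmless way to test with an XXE payload is the canary technique: point the external entity at a temporary file that holds a random marker, then check whether the marker appears in the parsed output. The sketch below applies this to a plain JAXP DocumentBuilderFactory only; it does not call CoreNLP, and the names XxeCanaryCheck, leaksCanary and hardenedFactory are hypothetical. To test a real deployment, the same payload would have to be sent to whichever CoreNLP entry point parses XML in that setup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative canary-based XXE check, not part of CoreNLP.
final class XxeCanaryCheck {

    // Returns true if parsing with this factory exposes the canary file's content.
    static boolean leaksCanary(DocumentBuilderFactory factory) {
        String marker = "canary-" + UUID.randomUUID();
        Path canary;
        try {
            canary = Files.createTempFile("xxe-canary", ".txt");
            Files.write(canary, marker.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalStateException("could not create canary file", e);
        }
        try {
            String payload = "<?xml version=\"1.0\"?>"
                    + "<!DOCTYPE probe [<!ENTITY xxe SYSTEM \"" + canary.toUri() + "\">]>"
                    + "<probe>&xxe;</probe>";
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(payload)));
            String text = doc.getDocumentElement().getTextContent();
            return text != null && text.contains(marker);
        } catch (Exception e) {
            return false; // the parser refused the payload, so nothing leaked
        } finally {
            try {
                Files.deleteIfExists(canary);
            } catch (IOException ignored) {
                // best-effort cleanup of the temporary file
            }
        }
    }

    // Factory that refuses DOCTYPE declarations, as in the workaround section.
    static DocumentBuilderFactory hardenedFactory() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return dbf;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Default factory leaks canary: "
                + leaksCanary(DocumentBuilderFactory.newInstance()));
        System.out.println("Hardened factory leaks canary: "
                + leaksCanary(hardenedFactory()));
    }
}
```

Using a throwaway file with a random marker avoids reading any real sensitive file during the test. The result for the default factory depends on the JDK and its XML security configuration.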
📡 Detection & Monitoring
Log Indicators:
- XML parsing errors related to external entities
- Unexpected file access patterns from CoreNLP process
- HTTP requests containing XML with DOCTYPE declarations
Network Indicators:
- HTTP POST requests with XML content to CoreNLP endpoints
- Outbound connections from CoreNLP server to internal services
SIEM Query:
source="*corenlp*" AND ("DOCTYPE" OR "ENTITY" OR "SYSTEM")
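Where no SIEM is available, the same indicators can be approximated with a small log filter. The sketch below is an assumption-laden illustration: the class name XxeLogScan and the regular expression are hypothetical, and it only matches payloads that appear unencoded in the log line. It narrows the SYSTEM keyword to cases followed by a quoted identifier to reduce noise from ordinary log text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative log filter, not part of CoreNLP or any SIEM product.
final class XxeLogScan {

    // Matches DOCTYPE or ENTITY declarations and SYSTEM followed by a quoted identifier.
    private static final Pattern INDICATOR = Pattern.compile(
            "<!DOCTYPE|<!ENTITY|\\bSYSTEM\\s+[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the log lines that contain an XXE indicator.
    static List<String> suspicious(List<String> lines) {
        return lines.stream()
                .filter(line -> INDICATOR.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "INFO system ready",
                "body=<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>",
                "GET /health 200");
        suspicious(sample).forEach(System.out::println);
    }
}
```

Payloads that are URL-encoded or compressed in the logs would need to be decoded before matching.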
🔗 References
- https://github.com/stanfordnlp/corenlp/commit/1940ffb938dc4f3f5bc5f2a2fd8b35aabbbae3dd
- https://huntr.dev/bounties/a717aec2-5646-4a5f-ade0-dadc25736ae3