CVE-2022-0239

9.8 CRITICAL

📋 TL;DR

CVE-2022-0239 is an XXE (XML External Entity) vulnerability in Stanford CoreNLP that allows attackers to read arbitrary files from the server filesystem or conduct server-side request forgery attacks. This affects any system running vulnerable versions of CoreNLP that processes untrusted XML input. The vulnerability is particularly dangerous because it can be exploited remotely without authentication.

💻 Affected Systems

Products:
  • Stanford CoreNLP
Versions: All versions before 4.4.0
Operating Systems: All operating systems running Java
Default Config Vulnerable: ⚠️ Yes
Notes: Any CoreNLP installation that processes XML input is vulnerable. The vulnerability is in the XML parsing functionality.

⚠️ Risk & Real-World Impact

🔴 Worst Case: Complete server compromise including sensitive file disclosure (passwords, keys, configuration files), SSRF attacks to internal services, and potential remote code execution through file inclusion.

🟠 Likely Case: Arbitrary file read from the server filesystem, potentially exposing sensitive configuration files, credentials, or application data.

🟢 If Mitigated: Limited impact with proper input validation and XML parser configuration that disables external entity processing.

🌐 Internet-Facing: HIGH - The vulnerability can be exploited remotely without authentication when CoreNLP services are exposed to the internet.
🏢 Internal Only: MEDIUM - Internal attackers or compromised internal systems could exploit this to escalate privileges or access sensitive data.

🎯 Exploit Status

Public PoC: ⚠️ Yes
Weaponized: LIKELY
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

Exploitation requires sending malicious XML payloads to CoreNLP endpoints. Public proof-of-concept examples demonstrate file disclosure attacks.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: 4.4.0 and later

Vendor Advisory: https://github.com/stanfordnlp/corenlp/commit/1940ffb938dc4f3f5bc5f2a2fd8b35aabbbae3dd

Restart Required: Yes

Instructions:

1. Upgrade to CoreNLP version 4.4.0 or later.
2. Download from official GitHub releases.
3. Replace existing CoreNLP installation.
4. Restart any services using CoreNLP.

🔧 Temporary Workarounds

Disable XXE in XML parser

Platforms: all

Configure XML parser to disable external entity processing before parsing untrusted XML

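// Java: harden the factory before building any parser for untrusted XML
// (requires: import javax.xml.parsers.DocumentBuilderFactory;)
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Never resolve external general or parameter entities
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
// Reject any document that carries a DOCTYPE declaration
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);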
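
The same three features can be applied to other JAXP parser types. The following is a minimal sketch for a SAX parser, assuming the JDK's built-in XML implementation; `HardenedSax`, `newParser`, and `parses` are hypothetical names chosen for illustration and are not part of CoreNLP.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not a CoreNLP class): a SAX parser with XXE disabled.
final class HardenedSax {

    // Builds a SAX parser that rejects DOCTYPE declarations and external entities.
    static SAXParser newParser() {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
            spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            return spf.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            // The XML implementation on the classpath does not support a feature.
            throw new IllegalStateException("Cannot harden the SAX parser", e);
        }
    }

    // Returns true if the document parses, false if the parser rejects it.
    static boolean parses(String xml) {
        SAXParser parser = newParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<doc>plain text</doc>"));           // true
        System.out.println(parses("<!DOCTYPE doc []><doc>x</doc>"));   // false
    }
}
```

Rejecting the DOCTYPE outright is the strictest of the three settings; the two entity features act as defense in depth in case DOCTYPE declarations must be allowed later.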

Input validation and sanitization

Platforms: all

Validate and sanitize XML input before processing, rejecting suspicious content

// Implement input validation to reject XML containing DOCTYPE declarations
// or external entity references before passing to CoreNLP
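
The validation described above could be sketched as a coarse pre-filter. `XmlPreFilter` and `looksDangerous` are hypothetical names, not part of CoreNLP, and the check assumes the input has already been decoded to a `String`. It is deliberately conservative (it also flags the keywords inside comments or CDATA sections) and is meant to complement parser hardening, not replace it.

```java
import java.util.regex.Pattern;

// Hypothetical pre-filter (not a CoreNLP class): flags XML that declares
// a DOCTYPE or an ENTITY before it is handed to the XML parser.
final class XmlPreFilter {

    // XML keywords are case-sensitive, so only the upper-case forms are matched.
    private static final Pattern FORBIDDEN = Pattern.compile("<!(DOCTYPE|ENTITY)\\b");

    // Returns true if the input contains a DOCTYPE or ENTITY declaration.
    static boolean looksDangerous(String xml) {
        return FORBIDDEN.matcher(xml).find();
    }

    public static void main(String[] args) {
        String benign = "<doc><s>Hello world.</s></doc>";
        String hostile =
            "<!DOCTYPE doc [<!ENTITY x SYSTEM \"file:///etc/hostname\">]><doc>&x;</doc>";
        System.out.println(looksDangerous(benign));   // false
        System.out.println(looksDangerous(hostile));  // true
    }
}
```

A caller would reject the request when `looksDangerous` returns true and pass the input on to CoreNLP otherwise.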

🧯 If You Can't Patch

  • Implement network segmentation to isolate CoreNLP instances from sensitive systems
  • Deploy WAF rules to block XML payloads containing DOCTYPE declarations or external entity references

🔍 How to Verify

Check if Vulnerable:

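Any installation running a CoreNLP version earlier than 4.4.0 that parses untrusted XML is vulnerable. Check the version with the command below; the release JAR file name (stanford-corenlp-<version>.jar) also carries the version number.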

Check Version:

java -cp "stanford-corenlp-*.jar" edu.stanford.nlp.util.SystemUtils --version

Verify Fix Applied:

Verify version is 4.4.0 or higher and test with XXE payloads that should be rejected
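
What "rejected" looks like can be illustrated with a standalone check. This is only a sketch under stated assumptions: it exercises a JDK `DocumentBuilderFactory` configured with the features from the workaround above, not CoreNLP itself, and `XxeRejectionCheck` and `isRejected` are hypothetical names. Testing a real deployment depends on how CoreNLP is exposed, which this entry does not specify.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical check (not a CoreNLP class): confirms that a hardened DOM
// parser refuses a classic XXE payload instead of resolving the entity.
final class XxeRejectionCheck {

    // Returns true if the hardened parser refuses to parse the document.
    static boolean isRejected(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
            dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false; // parsed without complaint
        } catch (SAXException e) {
            return true;  // the parser refused the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The file URL is never read: parsing stops at the DOCTYPE declaration.
        String payload =
            "<!DOCTYPE doc [<!ENTITY x SYSTEM \"file:///etc/hostname\">]><doc>&x;</doc>";
        System.out.println(isRejected(payload));            // true
        System.out.println(isRejected("<doc>hello</doc>")); // false
    }
}
```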

📡 Detection & Monitoring

Log Indicators:

  • XML parsing errors related to external entities
  • Unexpected file access patterns from CoreNLP process
  • HTTP requests containing XML with DOCTYPE declarations

Network Indicators:

  • HTTP POST requests with XML content to CoreNLP endpoints
  • Outbound connections from CoreNLP server to internal services

SIEM Query:

source="*corenlp*" AND ("DOCTYPE" OR "ENTITY" OR "SYSTEM")
