CVE-2021-3878

9.8 CRITICAL
XXE

📋 TL;DR

CVE-2021-3878 is an XML External Entity (XXE) vulnerability in Stanford CoreNLP that allows attackers to read arbitrary files from the server filesystem or conduct server-side request forgery attacks. This affects all systems running vulnerable versions of CoreNLP that process XML input. The vulnerability is particularly dangerous because it can be exploited remotely without authentication.

💻 Affected Systems

Products:
  • Stanford CoreNLP
Versions: All versions prior to 4.3.0
Operating Systems: All operating systems running Java
Default Config Vulnerable: ⚠️ Yes
Notes: Any CoreNLP instance that processes XML input is vulnerable. The vulnerability is in the XML parsing functionality.

⚠️ Risk & Real-World Impact

🔴 Worst Case

Complete server compromise through file disclosure of sensitive data (passwords, keys, configuration files) leading to further attacks, or denial of service through resource exhaustion.

🟠 Likely Case

Unauthorized reading of sensitive files from the server filesystem, potentially exposing credentials, configuration data, or other sensitive information.

🟢 If Mitigated

Limited impact with proper network segmentation, file system permissions, and input validation controls in place.

🌐 Internet-Facing: HIGH - The vulnerability can be exploited remotely without authentication, so any internet-facing instance that accepts XML input is directly exposed.
🏢 Internal Only: MEDIUM - Internal systems are still vulnerable but may have additional network controls that reduce attack surface.

🎯 Exploit Status

Public PoC: ⚠️ Yes
Weaponized: LIKELY
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

XXE vulnerabilities are well understood and exploitation tools are widely available. Exploitation requires that the target instance parse attacker-supplied XML.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: 4.3.0 and later

Vendor Advisory: https://github.com/stanfordnlp/corenlp/commit/e5bbe135a02a74b952396751ed3015e8b8252e99

Restart Required: Yes

Instructions:

1. Update Stanford CoreNLP to version 4.3.0 or later.
2. Download from the official repository: https://stanfordnlp.github.io/CoreNLP/.
3. Replace the existing CoreNLP installation.
4. Restart all CoreNLP services.

🔧 Temporary Workarounds

Disable XML external entity processing

all

Configure XML parsers to disable external entity resolution

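Set XML parser features: XMLConstants.FEATURE_SECURE_PROCESSING = true and http://apache.org/xml/features/disallow-doctype-decl = true. The second is a parser feature URI, not a named Java constant.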
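
As a minimal sketch of the parser settings above, the following shows how a JAXP DocumentBuilderFactory can be configured to refuse DOCTYPE declarations and external entities. The class and method names (HardenedXml, newHardenedBuilder, isRejected) are hypothetical, and the sketch assumes the JDK's built-in parser; it applies to parsers your own code constructs and does not change how an unpatched CoreNLP builds its internal parsers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of CoreNLP.
final class HardenedXml {
    private HardenedXml() {}

    /** Builds a parser that refuses DOCTYPE declarations and external entities. */
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE outright removes the place where entities are declared.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth in case DOCTYPE ever has to be re-enabled.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }

    /** Returns true if the hardened parser refuses the document, false if it parses. */
    static boolean isRejected(String xml) {
        try {
            newHardenedBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }
}
```

With this configuration a document that contains a DOCTYPE is rejected before any entity is resolved, while ordinary XML still parses. A different XML implementation on the classpath may not recognize every feature URI and would then throw ParserConfigurationException.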

Input validation and filtering

all

Implement strict input validation to reject or sanitize XML containing external entity references
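
One way such a filter could look is a coarse pre-check that refuses any input declaring a DOCTYPE or an ENTITY before it reaches CoreNLP. This is only a sketch: XmlPrefilter and its methods are hypothetical names, and it assumes the input has already been decoded to a Java String. A string match like this can be sidestepped by unusual encodings, so it complements parser hardening and patching and does not replace them.

```java
// Hypothetical pre-filter, not part of CoreNLP.
final class XmlPrefilter {
    private XmlPrefilter() {}

    /** Coarse check: true if the text declares a DOCTYPE or an ENTITY. */
    static boolean containsDtdMarkup(String xml) {
        // XML keywords are case-sensitive, so an exact match is sufficient here.
        return xml.contains("<!DOCTYPE") || xml.contains("<!ENTITY");
    }

    /** Returns the input unchanged, or throws if it carries DTD markup. */
    static String requireNoDtd(String xml) {
        if (containsDtdMarkup(xml)) {
            throw new IllegalArgumentException(
                "XML with DOCTYPE or ENTITY declarations is not accepted");
        }
        return xml;
    }
}
```

Rejecting the input outright is simpler and safer than trying to strip the declarations, because a sanitizer has to parse the same constructs the attacker is abusing.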

🧯 If You Can't Patch

  • Implement network segmentation to isolate CoreNLP instances from sensitive systems
  • Apply strict file system permissions to limit what files CoreNLP can access

🔍 How to Verify

Check if Vulnerable:

Check the installed CoreNLP version; any version prior to 4.3.0 is affected. The version is normally also visible in the jar file name (stanford-corenlp-<version>.jar).

Check Version:

java -cp "stanford-corenlp-*.jar" edu.stanford.nlp.util.SystemUtils --version

Verify Fix Applied:

Verify version is 4.3.0 or later and test with XXE payloads to confirm they are rejected
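
The version half of this check can be automated. The sketch below compares a dotted version string against 4.3.0, the patch version stated above. CoreNlpVersionCheck is a hypothetical helper, and it assumes the version string is purely numeric (for example "4.2.2"); a suffix such as "-beta" would need extra handling.

```java
// Hypothetical helper, not part of CoreNLP.
final class CoreNlpVersionCheck {
    // First fixed release according to this advisory.
    private static final int[] FIRST_FIXED = {4, 3, 0};

    private CoreNlpVersionCheck() {}

    /** True if the dotted numeric version is 4.3.0 or later. */
    static boolean isAtLeastFixed(String version) {
        String[] parts = version.trim().split("\\.");
        for (int i = 0; i < FIRST_FIXED.length; i++) {
            // Missing components count as zero, so "4.3" is treated as 4.3.0.
            int have = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (have != FIRST_FIXED[i]) {
                return have > FIRST_FIXED[i];
            }
        }
        return true;
    }
}
```

Comparing components as integers rather than as strings matters: a string comparison would rank "4.10.0" below "4.3.0".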

📡 Detection & Monitoring

Log Indicators:

  • Unusual file access patterns from CoreNLP process
  • XML parsing errors with external entity references
  • Increased memory usage during XML processing

Network Indicators:

  • XML payloads containing external entity declarations (DOCTYPE, SYSTEM, ENTITY)
  • Outbound connections from CoreNLP to unexpected external systems

SIEM Query:

source="corenlp.log" AND ("DOCTYPE" OR "SYSTEM" OR "ENTITY")
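
Where no SIEM is available, the same search can be approximated directly over the log lines. The sketch below flags lines containing any of the three markers from the query; XxeLogScan is a hypothetical name, and case-sensitive matching is an assumption. As with the query itself, a bare "SYSTEM" match can produce false positives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log scanner mirroring the SIEM query above.
final class XxeLogScan {
    private static final String[] MARKERS = {"DOCTYPE", "SYSTEM", "ENTITY"};

    private XxeLogScan() {}

    /** Returns the lines that contain at least one marker, in their original order. */
    static List<String> suspiciousLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add(line);
                    break; // one hit per line is enough
                }
            }
        }
        return hits;
    }
}
```

A typical call would pass the result of java.nio.file.Files.readAllLines for the corenlp.log file named in the query.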
