CVE-2021-23901

9.1 CRITICAL

📋 TL;DR

This XXE vulnerability in Apache Nutch's DmozParser allows attackers to read arbitrary files from the server filesystem and potentially perform server-side request forgery. It affects all Nutch versions before 1.18 that process XML data through the vulnerable parser.

💻 Affected Systems

Products:
  • Apache Nutch
Versions: All versions < 1.18
Operating Systems: All
Default Config Vulnerable: ⚠️ Yes
Notes: Only affects systems that use Nutch's DmozParser to process XML data. The flaw is in the parser itself and does not depend on any specific configuration.

⚠️ Risk & Real-World Impact

🔴 Worst Case

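Disclosure of any sensitive file readable by the Nutch process (passwords, keys, configs), plus SSRF against internal services. If an internal service reachable this way is itself exploitable, the attack could escalate to remote code execution and complete server compromise.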

🟠 Likely Case

Unauthorized file system access leading to information disclosure of configuration files, source code, or sensitive data stored on the server.

🟢 If Mitigated

Limited impact if XML parsing is disabled or external entity processing is blocked at network/application layers.

🌐 Internet-Facing: HIGH
🏢 Internal Only: MEDIUM

🎯 Exploit Status

Public PoC: ⚠️ Yes
Weaponized: LIKELY
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

XXE vulnerabilities are well-understood with many public exploit examples. The vulnerability requires XML input to the DmozParser but doesn't require authentication.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: Apache Nutch 1.18

Vendor Advisory: https://lists.apache.org/thread.html/r5e2f7737b42c73a3325f3c2c8cdee1ec27631b3a0e144104d84d70e6%40%3Cannounce.apache.org%3E

Restart Required: Yes

Instructions:

1. Download Nutch 1.18 or later from the Apache website.
2. Stop Nutch services.
3. Replace the existing Nutch installation with the patched version.
4. Restart Nutch services.
5. Verify the version is 1.18+.

🔧 Temporary Workarounds

Disable DTD processing

Platform: all

Configure the XML parser to disable Document Type Definition (DTD) processing, which prevents XXE attacks.

Set these XML parser features:

  • setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
  • setFeature("http://xml.org/sax/features/external-general-entities", false);
  • setFeature("http://xml.org/sax/features/external-parameter-entities", false);
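As a minimal sketch of how these three features might be applied, the class below configures a JAXP `SAXParserFactory` and checks whether a document is rejected. The class name `HardenedSaxFactory` and the helper `isRejected` are hypothetical names chosen for illustration. This is not the code of the official Nutch 1.18 patch, and wiring such a factory into DmozParser would require changing and rebuilding Nutch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Nutch: applies the three features listed above.
final class HardenedSaxFactory {

    /** Builds a SAXParserFactory with DOCTYPE declarations and external entities disabled. */
    static SAXParserFactory newHardenedFactory()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Reject any document that carries a DOCTYPE declaration.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth: never resolve external general or parameter entities.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        return factory;
    }

    /** Returns true if the hardened parser refuses to parse the given XML. */
    static boolean isRejected(String xml) {
        try {
            newHardenedFactory().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false; // parsed without error
        } catch (SAXException e) {
            return true; // parser raised a fatal error, e.g. DOCTYPE disallowed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        String xxe = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/hostname\">]><r>&x;</r>";
        System.out.println("XXE document rejected: " + isRejected(xxe));
        System.out.println("Plain document rejected: " + isRejected("<r>ok</r>"));
    }
}
```

The first feature alone stops any document that contains a DOCTYPE. The other two act as a fallback for parsers that do not recognize the Xerces-specific `disallow-doctype-decl` feature; in that case `setFeature` throws and the failure is visible rather than silent.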

Input validation and filtering

Platform: all

Filter or reject XML input containing DOCTYPE declarations or external entity references

Implement input validation to reject XML containing: <!DOCTYPE, <!ENTITY, SYSTEM, PUBLIC
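A rough sketch of such a pre-parse filter follows, assuming the XML is already available as a decoded string. `DoctypeFilter` and `containsDtdMarkup` are hypothetical names. The sketch checks only the `<!DOCTYPE` and `<!ENTITY` markers, on the assumption that `SYSTEM` and `PUBLIC` identifiers matter only inside those declarations and that matching the bare words would reject harmless text.

```java
// Hypothetical pre-parse filter, not part of Nutch.
final class DoctypeFilter {

    // XML keywords are case-sensitive, so an exact match is intentional.
    private static final String[] BLOCKED_MARKERS = {"<!DOCTYPE", "<!ENTITY"};

    /** Returns true if the decoded XML text contains DTD or entity declarations. */
    static boolean containsDtdMarkup(String xml) {
        for (String marker : BLOCKED_MARKERS) {
            if (xml.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String suspicious = "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><r>&x;</r>";
        System.out.println("suspicious blocked: " + containsDtdMarkup(suspicious));
        System.out.println("plain blocked: " + containsDtdMarkup("<r>SYSTEM status is PUBLIC</r>"));
    }
}
```

String matching of this kind is fragile: a filter that runs on raw bytes can be bypassed by delivering the payload in a different encoding such as UTF-16. Treat it as a stopgap alongside parser hardening, not as a replacement for it.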

🧯 If You Can't Patch

  • Implement network segmentation to isolate Nutch instances from sensitive internal systems
  • Deploy web application firewall (WAF) with XXE protection rules to block malicious XML payloads

🔍 How to Verify

Check if Vulnerable:

Check the Nutch version: the system is vulnerable if the version is earlier than 1.18 and DmozParser is used to process XML.

Check Version:

nutch version (or check version in Nutch configuration files)

Verify Fix Applied:

Verify Nutch version is 1.18 or higher and test with XXE payloads to confirm they are rejected
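One harmless way to shape such a test is a canary probe: write a known marker to a temporary file, reference that file through an external entity, and check whether the marker appears in the parsed text. The sketch below runs this against a plain JAXP `SAXParserFactory` only. It does not exercise Nutch or DmozParser, and the names `XxeProbe`, `leaksCanary` and the canary string are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical probe, not part of Nutch: tests a parser configuration with a benign payload.
final class XxeProbe {

    static final String CANARY = "xxe-canary-7f3a";

    /** Returns true if the parser resolved the external entity and exposed the canary. */
    static boolean leaksCanary(SAXParserFactory factory)
            throws IOException, ParserConfigurationException {
        Path canaryFile = Files.createTempFile("xxe-probe", ".txt");
        try {
            Files.write(canaryFile, CANARY.getBytes(StandardCharsets.UTF_8));
            String xml = "<?xml version=\"1.0\"?>"
                    + "<!DOCTYPE probe [<!ENTITY leak SYSTEM \"" + canaryFile.toUri() + "\">]>"
                    + "<probe>&leak;</probe>";

            // Collect all character data the parser reports.
            StringBuilder text = new StringBuilder();
            DefaultHandler collector = new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            };
            try {
                factory.newSAXParser().parse(new InputSource(new StringReader(xml)), collector);
            } catch (SAXException rejected) {
                // A rejected document is acceptable; still check what was read before the error.
            }
            return text.indexOf(CANARY) >= 0;
        } finally {
            Files.deleteIfExists(canaryFile);
        }
    }

    public static void main(String[] args) throws Exception {
        // Result for the default factory depends on the JDK and its XML security settings.
        System.out.println("default factory leaks canary: "
                + leaksCanary(SAXParserFactory.newInstance()));
    }
}
```

A `false` result means the entity was either rejected or silently skipped. The same payload shape, fed to the real DmozParser input path in a test environment, is what actually confirms the fix.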

📡 Detection & Monitoring

Log Indicators:

  • Unusual file access patterns from Nutch process
  • XML parsing errors containing external entity references
  • Large XML payloads with DOCTYPE declarations

Network Indicators:

  • XML requests containing SYSTEM or PUBLIC entities
  • HTTP requests to internal resources from Nutch server
  • file:// or other non-HTTP URI schemes in XML payloads

SIEM Query:

source="nutch.log" AND ("DOCTYPE" OR "ENTITY" OR "SYSTEM" OR "PUBLIC")
