CVE-2021-23901
📋 TL;DR
An XML external entity (XXE) vulnerability in Apache Nutch's DmozParser allows attackers to read arbitrary files from the server filesystem and potentially perform server-side request forgery (SSRF). It affects Nutch versions before 1.18 when XML data is processed through the vulnerable parser.
💻 Affected Systems
- Apache Nutch
📦 What is this software?
Nutch by Apache
⚠️ Risk & Real-World Impact
Worst Case
Disclosure of sensitive files (passwords, keys, configs) and SSRF against internal services, which in some environments could be chained into wider server compromise.
Likely Case
Unauthorized file system access leading to information disclosure of configuration files, source code, or sensitive data stored on the server.
If Mitigated
Limited impact if XML parsing is disabled or external entity processing is blocked at network/application layers.
🎯 Exploit Status
XXE vulnerabilities are well-understood with many public exploit examples. The vulnerability requires XML input to the DmozParser but doesn't require authentication.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: Apache Nutch 1.18
Vendor Advisory: https://lists.apache.org/thread.html/r5e2f7737b42c73a3325f3c2c8cdee1ec27631b3a0e144104d84d70e6%40%3Cannounce.apache.org%3E
Restart Required: Yes
Instructions:
1. Download Nutch 1.18 or later from the Apache website.
2. Stop Nutch services.
3. Replace the existing Nutch installation with the patched version.
4. Restart Nutch services.
5. Verify the version is 1.18+.
🔧 Temporary Workarounds
Disable DTD processing
Configure the XML parser to disable Document Type Definition (DTD) processing, which prevents XXE attacks
Set XML parser properties:
setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
setFeature("http://xml.org/sax/features/external-general-entities", false);
setFeature("http://xml.org/sax/features/external-parameter-entities", false);
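The parser properties above can be applied to a JAXP `SAXParserFactory` roughly as follows. This is a minimal sketch, not the change shipped in Nutch 1.18: the class name `HardenedSaxExample` and the helper methods are hypothetical, and it assumes the JDK's default Xerces-based parser, which recognizes the `disallow-doctype-decl` feature.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of Apache Nutch.
public class HardenedSaxExample {

    /** Builds a factory with DOCTYPE declarations and external entities disabled. */
    static SAXParserFactory hardenedFactory()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        return factory;
    }

    /** Returns true if the document parses, false if the parser rejects it. */
    static boolean parses(String xml) {
        SAXParser parser;
        try {
            parser = hardenedFactory().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            // The parser implementation does not support one of the features.
            throw new IllegalStateException("could not harden parser", e);
        }
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Raised for a DOCTYPE declaration, or for XML that is not well-formed.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String benign = "<Topic><Title>ok</Title></Topic>";
        String xxe = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/passwd\">]>"
                + "<r>&x;</r>";
        System.out.println("benign parses: " + parses(benign));
        System.out.println("xxe parses: " + parses(xxe));
    }
}
```

With DOCTYPE declarations disallowed, the second document should be rejected before any entity is resolved. The two entity features act as a second layer in case DOCTYPE handling has to be re-enabled for legitimate input.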
Input validation and filtering
Filter or reject XML input containing DOCTYPE declarations or external entity references
Implement input validation to reject XML containing: <!DOCTYPE, <!ENTITY, SYSTEM, PUBLIC
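A pre-filter of this kind might look like the sketch below. The class and method names are hypothetical. It checks only for the `<!DOCTYPE` and `<!ENTITY` openers, on the assumption that `SYSTEM` and `PUBLIC` identifiers matter only inside those declarations and that matching them as bare words would reject ordinary text. It also assumes the input has already been decoded to a Java `String`.

```java
import java.util.Locale;

// Hypothetical example class; a coarse stopgap, not a replacement for parser hardening.
public class XmlPrefilter {

    // Declaration openers that introduce a DTD or an entity definition.
    private static final String[] BLOCKED = {"<!doctype", "<!entity"};

    /** Returns true if the XML text contains a DOCTYPE or ENTITY declaration. */
    static boolean looksUnsafe(String xml) {
        // Lowercasing is deliberately conservative: it also catches odd casing.
        String lower = xml.toLowerCase(Locale.ROOT);
        for (String token : BLOCKED) {
            if (lower.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String benign = "<Topic><Title>Operating SYSTEM news</Title></Topic>";
        String xxe = "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><r>&x;</r>";
        System.out.println("benign unsafe: " + looksUnsafe(benign));
        System.out.println("xxe unsafe: " + looksUnsafe(xxe));
    }
}
```

String matching like this is easy to get wrong (for example, if it runs on raw bytes in an unexpected encoding), so it is best treated as a temporary layer in front of a properly configured parser.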
🧯 If You Can't Patch
- Implement network segmentation to isolate Nutch instances from sensitive internal systems
- Deploy web application firewall (WAF) with XXE protection rules to block malicious XML payloads
🔍 How to Verify
Check if Vulnerable:
Check the Nutch version: if it is below 1.18 and DmozParser is used for XML processing, the system is vulnerable
Check Version:
nutch version (or check version in Nutch configuration files)
Verify Fix Applied:
Verify Nutch version is 1.18 or higher and test with XXE payloads to confirm they are rejected
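The version comparison used in these checks can be sketched as follows. The class and method names are hypothetical, and the sketch assumes a plain dotted numeric version string such as 1.17. Components are compared numerically, since a plain string comparison would wrongly rank 1.9 above 1.18.

```java
// Hypothetical example class; assumes versions like "1.17" or "1.18.1".
public class NutchVersionCheck {

    /** Splits a dotted version string into its numeric components. */
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] nums = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            nums[i] = Integer.parseInt(parts[i]);
        }
        return nums;
    }

    /** Returns true if the version is lower than the fixed release 1.18. */
    static boolean isAffected(String version) {
        int[] v = parse(version);
        int[] fixed = {1, 18};
        int len = Math.max(v.length, fixed.length);
        for (int i = 0; i < len; i++) {
            int a = i < v.length ? v[i] : 0;      // missing components count as 0
            int b = i < fixed.length ? fixed[i] : 0;
            if (a != b) {
                return a < b;
            }
        }
        return false; // exactly 1.18
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.9", "1.17", "1.18", "1.18.1"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```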
📡 Detection & Monitoring
Log Indicators:
- Unusual file access patterns from Nutch process
- XML parsing errors containing external entity references
- Large XML payloads with DOCTYPE declarations
Network Indicators:
- XML requests containing SYSTEM or PUBLIC entities
- HTTP requests to internal resources from Nutch server
- file:// or other URI schemes in XML payloads
SIEM Query:
source="nutch.log" AND ("DOCTYPE" OR "ENTITY" OR "SYSTEM" OR "PUBLIC")
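Where no SIEM is available, the same keyword match can be approximated with a small scan over log lines. This is a rough sketch: the class name and marker list are assumptions taken from the indicators above, and bare keywords such as SYSTEM will produce false positives that need manual review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; mirrors the keyword match in the query above.
public class XxeLogScan {

    private static final String[] MARKERS = {"DOCTYPE", "ENTITY", "SYSTEM", "PUBLIC"};

    /** Returns the log lines that contain at least one XXE-related marker. */
    static List<String> suspicious(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add(line);
                    break; // record each line once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "INFO  parsed 120 topics",
                "ERROR parse failure near <!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/passwd\">]>");
        for (String hit : suspicious(sample)) {
            System.out.println(hit);
        }
    }
}
```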
🔗 References
- https://issues.apache.org/jira/browse/NUTCH-2841
- https://lists.apache.org/thread.html/r090321840b44cc91086c4e317bf2baffa270749dde6c1273b6567f7c%40%3Cdev.nutch.apache.org%3E
- https://lists.apache.org/thread.html/r5e2f7737b42c73a3325f3c2c8cdee1ec27631b3a0e144104d84d70e6%40%3Cannounce.apache.org%3E
- https://lists.apache.org/thread.html/r7ddfd680aa7ea001ca8da63bb23e3f8caa095a8b4f2261e46bade5c7%40%3Cdev.nutch.apache.org%3E
- https://security.netapp.com/advisory/ntap-20210513-0003/