CVE-2024-3572

7.5 HIGH

📋 TL;DR

This vulnerability in the Scrapy web scraping framework allows attackers to perform XML External Entity (XXE) attacks by submitting malicious XML data. It affects systems using Scrapy to parse untrusted XML input, potentially leading to denial of service, local file access, or network reconnaissance. Any application using vulnerable Scrapy versions with XML parsing functionality is at risk.

💻 Affected Systems

Products:
  • Scrapy
Versions: All versions before the fix commit 809bfac4890f75fc73607318a04d2ccba71b3d9f
Operating Systems: All
Default Config Vulnerable: ⚠️ Yes
Notes: Only affects systems using Scrapy's XML parsing functionality with untrusted input.

⚠️ Risk & Real-World Impact

🔴 Worst Case

File exfiltration, denial of service, or internal network reconnaissance via XXE attacks. Exfiltrated credentials or keys could in turn lead to wider system compromise.

🟠 Likely Case

Denial of service that destabilizes the application, or limited read access to files on the server's filesystem.

🟢 If Mitigated

Minimal impact with proper input validation and XML parser configuration in place.

🌐 Internet-Facing: HIGH
🏢 Internal Only: MEDIUM

🎯 Exploit Status

Public PoC: ⚠️ Yes
Weaponized: LIKELY
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

XXE attacks are well-documented and easy to weaponize with available tools.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: Versions including commit 809bfac4890f75fc73607318a04d2ccba71b3d9f

Vendor Advisory: https://github.com/scrapy/scrapy/commit/809bfac4890f75fc73607318a04d2ccba71b3d9f

Restart Required: No

Instructions:

1. Update Scrapy to the latest version.
2. Verify that commit 809bfac4890f75fc73607318a04d2ccba71b3d9f is included.
3. Test XML parsing functionality.

🔧 Temporary Workarounds

Disable XXE in lxml parser

Platforms: all

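Configure the lxml parser so that it neither resolves entities nor fetches resources over the network, and pass it to every call that parses untrusted XML.

from lxml import etree
parser = etree.XMLParser(resolve_entities=False, no_network=True)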
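As a minimal sketch of how this workaround might be applied in application code, the helper below wraps lxml parsing with the parser settings shown above. The name `parse_untrusted_xml` is hypothetical and is not part of Scrapy or lxml; only the `XMLParser` options come from the workaround itself.

```python
from lxml import etree


def parse_untrusted_xml(data: bytes) -> etree._Element:
    """Parse XML without resolving entities or touching the network.

    Hypothetical helper for illustration; not a Scrapy or lxml API.
    """
    # Build a fresh parser per call so no parser state is shared between inputs.
    parser = etree.XMLParser(resolve_entities=False, no_network=True)
    return etree.fromstring(data, parser=parser)


if __name__ == "__main__":
    root = parse_untrusted_xml(b"<items><item>ok</item></items>")
    print(root.findtext("item"))
```

This only hardens code paths that call the helper. Parsing done inside an unpatched Scrapy installation is not affected by it.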

Input validation

Platforms: all

Validate and sanitize XML input before parsing
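One possible form of such validation is to refuse any document that carries a DOCTYPE declaration, since XXE payloads declare their entities there. The sketch below uses the standard library's expat bindings for the check. The names `DoctypeForbidden` and `reject_doctype` are hypothetical, and the approach assumes that legitimate input never needs a DTD.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def reject_doctype(data: bytes) -> None:
    """Raise DoctypeForbidden if the document declares a DOCTYPE.

    Malformed XML raises expat.ExpatError instead, so callers may want
    to treat both exceptions as a rejection.
    """
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort as soon as the declaration starts, before any entity
        # definitions inside it are processed.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # True marks this as the final chunk
```

A caller would run `reject_doctype(body)` first and hand `body` to the real parser only if no exception is raised. Letting a streaming parser do the detection avoids the encoding pitfalls of searching the raw bytes for `<!DOCTYPE`.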

🧯 If You Can't Patch

  • Implement strict input validation for all XML data
  • Use alternative XML parsers with XXE protection enabled

🔍 How to Verify

Check if Vulnerable:

Check whether the installed Scrapy version predates commit 809bfac4890f75fc73607318a04d2ccba71b3d9f and whether it is used to parse untrusted XML; the vulnerable code path relies on lxml.etree.fromstring.

Check Version:

pip show scrapy | grep Version
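Where `grep` is unavailable, or the check has to run inside the same Python environment as the spiders, the version could also be read programmatically. This is a small sketch using the standard library; `installed_version` is a hypothetical helper name. The version string alone does not prove that the fix commit is included, so it still has to be compared against the Scrapy release notes.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "scrapy") -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Scrapy version:", installed_version() or "not installed")
```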

Verify Fix Applied:

Verify that the installed Scrapy version includes the security commit, then test with XXE payloads and confirm that they are rejected.
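A harmless probe for this kind of test can point an external entity at a temporary file with known contents and check whether those contents appear in the parsed output. The sketch below does this for an lxml parser object. `expands_external_entities` and `MARKER` are hypothetical names, and the probe exercises only the parser configuration passed to it, not Scrapy's internal code paths.

```python
import os
import tempfile
from pathlib import Path

from lxml import etree

MARKER = "XXE-PROBE-MARKER"  # arbitrary string written to the probe file


def expands_external_entities(parser: etree.XMLParser) -> bool:
    """Return True if the parser inlines the content of a local file entity."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(MARKER)
        payload = (
            '<!DOCTYPE r [<!ENTITY xxe SYSTEM "%s">]><r>&xxe;</r>'
            % Path(path).as_uri()
        ).encode("utf-8")
        try:
            root = etree.fromstring(payload, parser=parser)
        except etree.XMLSyntaxError:
            return False  # the parser rejected the document outright
        # If the entity was resolved, the file content appears in the tree.
        return MARKER in etree.tostring(root, encoding="unicode")
    finally:
        os.remove(path)


if __name__ == "__main__":
    hardened = etree.XMLParser(resolve_entities=False, no_network=True)
    print("hardened parser expands entities:", expands_external_entities(hardened))
```

A result of `False` for the parser configuration in use indicates that this particular payload is not expanded. It does not rule out other XML-based attacks such as entity expansion bombs.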

📡 Detection & Monitoring

Log Indicators:

  • Unusual XML parsing errors
  • Large XML file processing
  • External entity resolution attempts

Network Indicators:

  • Outbound connections from XML parsing processes
  • Unexpected file read operations by those processes (a host-level indicator rather than a network one)

SIEM Query:

source="application.log" AND "XML" AND ("entity" OR "external" OR "DOCTYPE")
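Without a SIEM, the same keyword logic can be approximated by scanning the log file directly. The sketch below mirrors the query's AND/OR structure in Python. The `suspicious_lines` name is hypothetical, and case-insensitive substring matching is an assumption that may differ from how a given SIEM tokenizes search terms.

```python
import re
from typing import Iterable, Iterator

# "XML" AND ("entity" OR "external" OR "DOCTYPE"), matched case-insensitively.
_XML = re.compile(r"xml", re.IGNORECASE)
_SUSPECT = re.compile(r"entity|external|doctype", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines that satisfy the keyword query above."""
    for line in lines:
        if _XML.search(line) and _SUSPECT.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # "application.log" is the source name used in the query above.
    with open("application.log", encoding="utf-8", errors="replace") as log:
        for hit in suspicious_lines(log):
            print(hit)
```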
