CVE-2021-41124

7.4 HIGH

📋 TL;DR

Scrapy-splash versions before 0.8.0 can send Splash authentication credentials to unintended targets when Splash authentication is configured through Scrapy's HttpAuthMiddleware. The credentials are attached to requests that do not go through Splash, including robots.txt requests. This affects users who configure Splash authentication globally rather than per request.

💻 Affected Systems

Products:
  • scrapy-splash
Versions: All versions before 0.8.0
Operating Systems: all
Default Config Vulnerable: ✅ No
Notes: Only affects users who configure Splash authentication using HttpAuthMiddleware (http_user and http_pass attributes) instead of the proper Splash-specific authentication methods.

⚠️ Risk & Real-World Impact

🔴 Worst Case

Authentication credentials are exposed to external servers during web scraping operations, potentially allowing credential harvesting and unauthorized access to Splash instances.

🟠 Likely Case

Credentials are unintentionally sent to robots.txt servers and other non-Splash targets during normal scraping operations, exposing authentication information to third parties.

🟢 If Mitigated

No credential exposure occurs when using proper authentication methods or when all requests are correctly routed through Splash.

🌐 Internet-Facing: HIGH
🏢 Internal Only: MEDIUM

🎯 Exploit Status

Public PoC: ✅ No
Weaponized: UNKNOWN
Unauthenticated Exploit: ✅ No
Complexity: LOW

Exploitation occurs passively through normal scraping operations when misconfigured; no active attack required.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: 0.8.0

Vendor Advisory: https://github.com/scrapy-plugins/scrapy-splash/security/advisories/GHSA-823f-cwm9-4g74

Restart Required: No

Instructions:

1. Upgrade scrapy-splash to version 0.8.0 or later, for example with pip install --upgrade "scrapy-splash>=0.8.0"
2. Replace the http_user and http_pass spider attributes with the SPLASH_USER and SPLASH_PASS settings in your Scrapy configuration
3. Verify that authentication against Splash still works with the new settings
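
The configuration change in step 2 could look roughly like the following settings.py fragment. This is a minimal sketch: SPLASH_USER and SPLASH_PASS are the settings named above, while the environment variable names and the localhost URL are placeholder assumptions, not values from the advisory.

```python
# settings.py (sketch): Splash credentials via the scrapy-splash >= 0.8.0 settings
import os

# Address of the Splash instance. The default below is a placeholder.
SPLASH_URL = os.environ.get("SPLASH_URL", "http://localhost:8050")

# Credentials used only for requests sent to Splash.
# Reading them from environment variables is an assumption of this sketch;
# the variable names are hypothetical.
SPLASH_USER = os.environ.get("SPLASH_USER", "")
SPLASH_PASS = os.environ.get("SPLASH_PASS", "")
```

With this in place, the http_user and http_pass attributes should be removed from the spider classes that previously carried the Splash credentials.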

🔧 Temporary Workarounds

Per-request authentication

all

Set Splash authentication credentials on individual requests instead of globally

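Use the splash_headers key in each request's Splash metadata, where base64_encoded_credentials stands for the Base64 encoding of "user:password": yield scrapy.Request(url, meta={'splash': {'args': {}, 'splash_headers': {'Authorization': 'Basic base64_encoded_credentials'}}})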
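
As an illustration of the per-request approach, the sketch below builds the Authorization value and the meta dictionary with the standard library only. The helper names basic_auth_value and splash_meta are hypothetical and not part of scrapy-splash; the actual request is shown as a comment because it needs Scrapy and scrapy-splash installed.

```python
import base64


def basic_auth_value(user: str, password: str) -> str:
    """Return an HTTP Basic Authorization header value for user:password."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def splash_meta(user: str, password: str, args=None) -> dict:
    """Build request meta that carries the credentials in splash_headers only."""
    return {
        "splash": {
            "args": args or {},
            "splash_headers": {"Authorization": basic_auth_value(user, password)},
        }
    }


# Inside a spider callback (requires Scrapy and scrapy-splash):
#     yield scrapy.Request(url, meta=splash_meta("user", "pass"))
```

Because the header lives in splash_headers, it is meant for the request to Splash rather than for the target website.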

Disable robots.txt middleware

all

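Prevent the robots.txt requests, which bypass Splash and would carry the credentials. This closes only that path; any other request that does not go through Splash can still expose them.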

Set ROBOTSTXT_OBEY = False in Scrapy settings

🧯 If You Can't Patch

  • Ensure all requests go through Splash by configuring middleware appropriately
  • Monitor outgoing traffic for credential leakage to non-Splash targets

🔍 How to Verify

Check if Vulnerable:

You are vulnerable if you run scrapy-splash older than 0.8.0 AND use the http_user/http_pass attributes for Splash authentication
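
One way to check for the second condition is to search spider source code for the two attribute names. The helper below is a rough sketch with a hypothetical name (find_http_auth_attrs); it only flags lines for manual review and cannot tell whether the credentials are meant for Splash.

```python
import re

# Matches the attribute names read by HttpAuthMiddleware.
HTTP_AUTH_ATTR = re.compile(r"\bhttp_(user|pass)\b")


def find_http_auth_attrs(source: str):
    """Return (line_number, stripped_line) pairs that mention http_user or http_pass."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if HTTP_AUTH_ATTR.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Example: scan one spider file (the path is a placeholder).
#     with open("myproject/spiders/example.py", encoding="utf-8") as fh:
#         print(find_http_auth_attrs(fh.read()))
```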

Check Version:

pip show scrapy-splash | grep Version

Verify Fix Applied:

Confirm that the scrapy-splash version is >=0.8.0 AND that the SPLASH_USER/SPLASH_PASS settings are used instead of http_user/http_pass
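
The version half of this check can also be scripted. The following is a minimal sketch using importlib.metadata from the standard library; the function names are hypothetical, and the simplified parser treats pre-release strings such as "0.8.0rc1" as 0.8.0, so such versions need a manual look.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_RELEASE = (0, 8, 0)  # first release with SPLASH_USER / SPLASH_PASS


def parse_release(ver: str) -> tuple:
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints."""
    numbers = []
    for piece in ver.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_fixed(ver: str) -> bool:
    """True if the given version is 0.8.0 or later."""
    return parse_release(ver) >= FIXED_RELEASE


def installed_is_fixed(package: str = "scrapy-splash"):
    """Return True/False for the installed package, or None if it is not installed."""
    try:
        return is_fixed(version(package))
    except PackageNotFoundError:
        return None
```

A passing version check is not sufficient on its own: the spiders must also have stopped using http_user and http_pass for Splash.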

📡 Detection & Monitoring

Log Indicators:

  • Authentication failures on Splash server
  • Unexpected credential usage in access logs

Network Indicators:

  • HTTP Basic Auth headers sent to non-Splash endpoints
  • Credentials in robots.txt requests

SIEM Query:

http.request.method="GET" AND http.request.uri="robots.txt" AND http.headers.authorization EXISTS
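
Where no SIEM is available, the same idea can be applied to exported proxy or access-log records. The sketch below assumes a hypothetical record format (dictionaries with "host", "path" and "authorization" keys) and a known set of Splash hostnames; neither comes from the advisory.

```python
def leaked_credential_requests(records, splash_hosts):
    """Return records that carry an Authorization header but target a non-Splash host."""
    splash_hosts = set(splash_hosts)
    return [
        record
        for record in records
        if record.get("authorization") and record.get("host") not in splash_hosts
    ]


# Hypothetical sample records for illustration only.
records = [
    {"host": "splash.internal", "path": "/render.html", "authorization": "Basic ..."},
    {"host": "example.com", "path": "/robots.txt", "authorization": "Basic ..."},
    {"host": "example.com", "path": "/page", "authorization": None},
]
suspicious = leaked_credential_requests(records, {"splash.internal"})
```

In this sample only the robots.txt request to example.com is flagged, which matches the network indicators listed above.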
