CVE-2021-41124
📋 TL;DR
scrapy-splash versions before 0.8.0 expose Splash authentication credentials to unintended targets when Scrapy's HttpAuthMiddleware (the http_user and http_pass spider attributes) is used for Splash authentication. The credentials are attached to every non-Splash request, including the robots.txt requests Scrapy sends when ROBOTSTXT_OBEY is enabled. This affects users who configure Splash authentication globally rather than per-request.
💻 Affected Systems
- scrapy-splash
⚠️ Risk & Real-World Impact
Worst Case
Authentication credentials are exposed to external servers during web scraping operations, potentially allowing credential harvesting and unauthorized access to Splash instances.
Likely Case
Credentials are unintentionally sent to the hosts that serve robots.txt files and to other non-Splash targets during normal scraping operations, exposing authentication information to third parties.
If Mitigated
No credential exposure occurs when using proper authentication methods or when all requests are correctly routed through Splash.
🎯 Exploit Status
Exploitation occurs passively through normal scraping operations when misconfigured; no active attack required.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: 0.8.0
Vendor Advisory: https://github.com/scrapy-plugins/scrapy-splash/security/advisories/GHSA-823f-cwm9-4g74
Restart Required: No
Instructions:
1. Upgrade scrapy-splash to version 0.8.0 or later using pip install --upgrade "scrapy-splash>=0.8.0"
2. Replace http_user and http_pass attributes with SPLASH_USER and SPLASH_PASS settings in your Scrapy configuration
3. Verify authentication works correctly with the new settings
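As an illustration of step 2, a minimal settings.py sketch might look like the following. SPLASH_USER and SPLASH_PASS are the settings named in the advisory; the Splash URL and the environment variable names are assumptions chosen for this example, not values from the advisory.

```python
# settings.py (sketch): Splash credentials via the settings added in 0.8.0.
import os

# Assumption: a Splash instance listening locally on the default port.
SPLASH_URL = "http://localhost:8050"

# Assumption: credentials are supplied through environment variables
# (hypothetical names) instead of being hard-coded in the project.
SPLASH_USER = os.environ.get("SPLASH_USER", "")
SPLASH_PASS = os.environ.get("SPLASH_PASS", "")
```

After switching to these settings, remove the http_user and http_pass attributes from the spider so that HttpAuthMiddleware no longer attaches the credentials to other requests.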
🔧 Temporary Workarounds
Per-request authentication
(All platforms) Set Splash authentication credentials on individual requests instead of globally
Use the splash_headers parameter in each request, replacing base64_encoded_credentials with the Base64 encoding of user:password: yield scrapy.Request(url, meta={'splash': {'args': {}, 'splash_headers': {'Authorization': 'Basic base64_encoded_credentials'}}})
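The Authorization value in the per-request workaround has to be built from the actual credentials. A minimal sketch of that step, using only the standard library, is shown here; the helper names basic_auth_header and splash_meta are hypothetical and not part of scrapy-splash.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value from user:password."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def splash_meta(user: str, password: str, args=None) -> dict:
    """Return a request meta dict that sends the credentials to Splash only."""
    return {
        "splash": {
            "args": args or {},
            # splash_headers are sent to the Splash server, not to the target site.
            "splash_headers": {"Authorization": basic_auth_header(user, password)},
        }
    }
```

In a spider this could be used as yield scrapy.Request(url, meta=splash_meta(user, password)), assuming the scrapy-splash middlewares are enabled.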
Disable robots.txt middleware
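(All platforms) Prevent robots.txt requests that would expose credentials
Set ROBOTSTXT_OBEY = False in Scrapy settings. This only removes the robots.txt leak; every other request must still be routed through Splash.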
🧯 If You Can't Patch
- Ensure all requests go through Splash by configuring middleware appropriately
- Monitor outgoing traffic for credential leakage to non-Splash targets
🔍 How to Verify
Check if Vulnerable:
Check if using scrapy-splash <0.8.0 AND using http_user/http_pass attributes for Splash authentication
Check Version:
pip show scrapy-splash | grep Version
Verify Fix Applied:
Confirm scrapy-splash version is >=0.8.0 AND using SPLASH_USER/SPLASH_PASS settings instead of http_user/http_pass
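The version part of these checks can be automated. The following is a rough sketch that compares a version string against the fixed release 0.8.0; the function names are hypothetical, and pre-release or local version tags are ignored for simplicity.

```python
import re

# First release that contains the fix, per the advisory.
FIXED_VERSION = (0, 8, 0)


def parse_version(text: str) -> tuple:
    """Turn '0.7.2' into (0, 7, 2), padding missing components with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def is_vulnerable_version(installed: str) -> bool:
    """Return True when the installed scrapy-splash version predates the fix."""
    return parse_version(installed) < FIXED_VERSION
```

The installed version string can be obtained with importlib.metadata.version("scrapy-splash") on Python 3.8 or later. A fixed version is only half of the check: the spider must also stop using http_user and http_pass.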
📡 Detection & Monitoring
Log Indicators:
- Authentication failures on Splash server
- Unexpected credential usage in access logs
Network Indicators:
- HTTP Basic Auth headers sent to non-Splash endpoints
- Credentials in robots.txt requests
SIEM Query:
http.request.method="GET" AND http.request.uri="/robots.txt" AND http.headers.authorization EXISTS
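The same idea as the network indicators above can be applied to captured request records outside a SIEM. This is a minimal sketch, assuming each record provides a URL and a header mapping and that the Splash endpoint is known by host and port; the function name and record shape are hypothetical.

```python
from urllib.parse import urlsplit


def leaks_credentials(url: str, headers: dict, splash_netloc: str) -> bool:
    """Flag a request that carries an Authorization header to a non-Splash host."""
    has_auth = any(name.lower() == "authorization" for name in headers)
    is_splash = urlsplit(url).netloc.lower() == splash_netloc.lower()
    return has_auth and not is_splash
```

For example, a request to https://example.com/robots.txt with an Authorization header would be flagged when the Splash endpoint is localhost:8050, while a request to http://localhost:8050/render.html would not.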
🔗 References
- https://github.com/scrapy-plugins/scrapy-splash/commit/2b253e57fe64ec575079c8cdc99fe2013502ea31
- https://github.com/scrapy-plugins/scrapy-splash/security/advisories/GHSA-823f-cwm9-4g74