CVE-2024-52595
📋 TL;DR
This vulnerability in lxml_html_clean allows attackers to bypass HTML sanitization by exploiting differences between how browsers and the library parse context-switching tags such as <svg>, <math>, and <noscript>. The cleaner ignores content inside CSS comments, but a browser may interpret that same content differently, so a script hidden there can survive cleaning and execute as a cross-site scripting (XSS) attack. Anyone who processes untrusted HTML with lxml_html_clean versions before 0.4.0 is affected.
💻 Affected Systems
- lxml_html_clean
📦 What is this software?
Lxml Html Clean by Fedoralovespython
⚠️ Risk & Real-World Impact
Worst Case
Successful XSS attacks leading to session hijacking, credential theft, or complete compromise of user accounts in web applications.
Likely Case
Limited XSS attacks affecting users who interact with malicious content, potentially stealing cookies or performing actions on their behalf.
If Mitigated
No impact once the library is upgraded to 0.4.0 or later, or if <svg>, <math>, and <noscript> are stripped from untrusted input before it is rendered.
🎯 Exploit Status
Exploitation requires crafting HTML with malicious CSS comments in special tags, which is straightforward for attackers familiar with XSS techniques.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: 0.4.0
Vendor Advisory: https://github.com/fedora-python/lxml_html_clean/security/advisories/GHSA-5jfw-gq64-q45f
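Restart Required: No system restart; long-running Python processes (for example, web application workers) must be restarted to load the upgraded package.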
Instructions:
1. Upgrade lxml_html_clean to version 0.4.0 or later using pip: pip install --upgrade lxml_html_clean
2. Verify the upgrade: pip show lxml_html_clean
🔧 Temporary Workarounds
Configure tag restrictions
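Use lxml_html_clean configuration options to remove the context-switching tags. kill_tags drops each listed tag together with its content, whereas remove_tags keeps the content by moving it into the parent element. Alternatively, set allow_tags to an explicit list of safe tags that excludes these three.
from lxml_html_clean import Cleaner
cleaner = Cleaner(kill_tags=['svg', 'math', 'noscript'])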
🧯 If You Can't Patch
- Implement additional HTML sanitization layers using alternative libraries or custom validation.
- Disable processing of untrusted HTML input until the patch can be applied.
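One way to add a custom validation layer is to reject untrusted input that contains any of the affected tags before it reaches the cleaner. The following is a minimal sketch using only the standard library; the function names `risky_tags` and `require_no_risky_tags` are hypothetical, and the regular expression is deliberately conservative, so it also rejects tags that appear inside HTML comments or that merely start with one of these names followed by a hyphen.

```python
import re

# Tags the advisory names as context-switching; matched case-insensitively.
# Closing tags (</svg>) are matched as well.
_RISKY_TAG = re.compile(r"<\s*/?\s*(svg|math|noscript)\b", re.IGNORECASE)


def risky_tags(html_text):
    """Return the sorted, lowercased context-switching tag names found in html_text."""
    return sorted({m.group(1).lower() for m in _RISKY_TAG.finditer(html_text)})


def require_no_risky_tags(html_text):
    """Return html_text unchanged, or raise ValueError if a risky tag is present."""
    found = risky_tags(html_text)
    if found:
        raise ValueError("rejected untrusted HTML containing: " + ", ".join(found))
    return html_text
```

Calling `require_no_risky_tags(untrusted)` before passing the result to the cleaner turns this into a gate. It is a stopgap that complements the upgrade and is not a sanitizer in its own right.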
🔍 How to Verify
Check if Vulnerable:
Check the installed version of lxml_html_clean; if it's below 0.4.0, the system is vulnerable.
Check Version:
pip show lxml_html_clean | grep Version
Verify Fix Applied:
Confirm the version is 0.4.0 or higher and test with sample HTML containing CSS comments in <svg>, <math>, or <noscript> tags to ensure sanitization works.
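The version check can also be done from Python, for example in a startup check or a CI job. This is a sketch under stated assumptions: the helper names are hypothetical, the comparison looks only at the leading numeric components of the version string (so a pre-release such as `0.4.0rc1` would count as patched), and the distribution name is assumed to be `lxml_html_clean` as shown in the pip commands above.

```python
import re
from importlib import metadata

PATCHED = (0, 4, 0)  # first fixed release, per the advisory


def parse_version(text):
    """Turn '0.3.4' or '0.4.0.post1' into a tuple of its leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(version_text):
    """True if version_text is at or above the first fixed release."""
    parts = parse_version(version_text)
    # Pad short versions so that '0.4' compares equal to '0.4.0'.
    padded = parts + (0,) * (len(PATCHED) - len(parts))
    return padded >= PATCHED


def installed_version(dist_name="lxml_html_clean"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

A caller might combine the two as `v = installed_version()` followed by `v is not None and is_patched(v)`.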
📡 Detection & Monitoring
Log Indicators:
- Unusual HTML input patterns with CSS comments in special tags
- Errors or warnings from HTML sanitization processes
Network Indicators:
- HTTP requests containing crafted HTML with CSS comments in <svg>, <math>, or <noscript> tags
SIEM Query:
source="web_logs" AND (html_content CONTAINS "/*" AND html_content CONTAINS "*/" AND (html_content CONTAINS "<svg" OR html_content CONTAINS "<math" OR html_content CONTAINS "<noscript"))
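Where no SIEM is available, the same heuristic can be approximated over raw log lines. The sketch below is illustrative only: `is_suspicious` and `suspicious_lines` are hypothetical names, the input is assumed to be one request payload per line, and payloads are URL-decoded first because request bodies and query strings are often logged in encoded form. Unlike the query above, it requires the `/*` to appear before the `*/`.

```python
import re
from urllib.parse import unquote_plus

# Opening tags named in the advisory, with or without attributes.
_TAG = re.compile(r"<(svg|math|noscript)\b", re.IGNORECASE)
# A complete CSS comment: /* ... */
_CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def is_suspicious(payload):
    """True if payload holds both a CSS comment and a context-switching tag."""
    decoded = unquote_plus(payload)  # undo %3C-style and '+' encoding
    return bool(_TAG.search(decoded)) and bool(_CSS_COMMENT.search(decoded))


def suspicious_lines(lines):
    """Yield (line_number, line) for each log line that matches the heuristic."""
    for number, line in enumerate(lines, start=1):
        if is_suspicious(line):
            yield number, line
```

Like the query, this will produce false positives on legitimate SVG or MathML content that includes stylesheets, so matches are leads to review and not confirmed attacks.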