CVE-2024-12391
📋 TL;DR
This vulnerability allows an attacker to cause a denial of service by supplying a specially crafted regular expression to the '解析项目源码(手动指定和筛选源码文件类型)' ("Parse project source code (manually specify and filter source file types)") function in binary-husky/gpt_academic. Python's re engine can be forced into exponential-time backtracking, which hangs the server indefinitely. Any deployment of a vulnerable gpt_academic version that passes user-controlled regex input to this function is affected.
💻 Affected Systems
- binary-husky/gpt_academic
📦 What is this software?
Gpt Academic by Binary Husky
⚠️ Risk & Real-World Impact
Worst Case
Complete server unavailability for extended periods, disrupting all services using the affected application.
Likely Case
Service degradation or temporary unavailability when malicious regex patterns are processed.
If Mitigated
Minimal impact with proper input validation and regex timeout mechanisms in place.
🎯 Exploit Status
Exploitation requires control over both the regex pattern and the string it is matched against; the affected function's normal use often gives an attacker both.
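To make the exponential behavior concrete, here is a minimal sketch that times a failing search as the input grows. The pattern `(a+)+$` and the input shape are textbook illustrations chosen for this example; they are assumptions, not the payload from the original report.

```python
import re
import time

# Hypothetical pattern and input, chosen only to illustrate backtracking.
EVIL_PATTERN = re.compile(r"(a+)+$")

def probe(n):
    """Time one failed search against n 'a' characters followed by '!'."""
    text = "a" * n + "!"
    start = time.perf_counter()
    match = EVIL_PATTERN.search(text)
    return match, time.perf_counter() - start

if __name__ == "__main__":
    # Keep n small: the running time should grow roughly exponentially with n.
    for n in range(12, 21, 2):
        _, seconds = probe(n)
        print(f"n={n:2d}  {seconds:.4f}s")
```

The search never matches because of the trailing `!`, so the engine tries every way of splitting the run of `a` characters between the inner and outer quantifier before giving up.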
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: Versions after commit 310122f
Vendor Advisory: https://huntr.com/bounties/70b3f4f0-6b1b-4563-a18c-fe46502e6ba0
Restart Required: Yes
Instructions:
1. Update to the latest version of gpt_academic.
2. Confirm the deployed checkout is newer than commit 310122f, not at or before it.
3. Restart the application service.
🔧 Temporary Workarounds
Implement regex timeout
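Add a timeout to regex execution using Python's signal module, or use the timeout argument of the third-party regex package. The signal approach works only on Unix and only in the main thread. Here pattern, text, and handle_timeout are placeholders for your own values.
import re
import signal
class TimeoutException(Exception): pass
def timeout_handler(signum, frame): raise TimeoutException()
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(5)  # 5 second timeout
try:
    result = re.search(pattern, text)
except TimeoutException:
    handle_timeout()
finally:
    signal.alarm(0)  # always cancel the pending alarm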
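Where SIGALRM is unavailable (Windows, or code running outside the main thread), one alternative is to run the match in a short-lived child process and kill it when the time budget runs out. The sketch below is not part of gpt_academic; the name `search_with_timeout` and the JSON hand-off are assumptions made for illustration.

```python
import json
import subprocess
import sys

# Child program: reads {"pattern": ..., "text": ...} as JSON on stdin and
# prints true/false depending on whether re.search finds a match.
_CHILD = (
    "import json, re, sys\n"
    "job = json.load(sys.stdin)\n"
    "found = re.search(job['pattern'], job['text']) is not None\n"
    "print(json.dumps(found))\n"
)

def search_with_timeout(pattern, text, seconds=5.0):
    """Return True/False for a match, or None if the time budget is exceeded."""
    payload = json.dumps({"pattern": pattern, "text": text})
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=payload, capture_output=True, text=True,
            timeout=seconds, check=True,
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run has already killed the child
    return json.loads(done.stdout)
```

An invalid pattern makes the child exit with an error, which surfaces here as `subprocess.CalledProcessError`. Starting an interpreter per match is slow, so this suits occasional user-supplied patterns rather than hot loops.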
Input validation and sanitization
Validate and sanitize user-provided regex patterns before execution
import re
def sanitize_regex(pattern, max_length=100):
    if len(pattern) > max_length:
        raise ValueError('Pattern too long')
    # Add additional validation logic
    return pattern
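As one possible shape for the "additional validation logic", the sketch below adds a syntax check and a coarse nested-quantifier heuristic on top of the length cap. The function name `validate_user_regex` and the heuristic itself are assumptions for illustration. The heuristic misses many dangerous patterns and may reject some harmless ones, so treat it as a complement to a timeout, not a replacement.

```python
import re

MAX_PATTERN_LENGTH = 100  # same cap as the snippet above; tune as needed

# Coarse heuristic: a group containing + or * that is itself followed by a
# quantifier, e.g. (a+)+ or (\w*)*. It misses many dangerous patterns.
_NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")

def validate_user_regex(pattern):
    """Return the compiled pattern, or raise ValueError if it is rejected."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("Pattern too long")
    if _NESTED_QUANTIFIER.search(pattern):
        raise ValueError("Nested quantifiers are not allowed")
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"Invalid regular expression: {exc}") from exc
```

The length check runs first so that the heuristic only ever scans short input.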
🧯 If You Can't Patch
- Implement strict input validation for regex patterns
- Add rate limiting to the vulnerable endpoint
- Monitor for abnormal CPU usage patterns
- Isolate the vulnerable service in a container with resource limits
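For the rate-limiting item, a minimal in-process sketch is a per-client sliding window. The class name, default limits, and the idea of keying on a client identifier are assumptions; a real deployment might enforce this at a reverse proxy instead.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_calls per client within any `window` seconds."""

    def __init__(self, max_calls=5, window=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self._calls = defaultdict(deque)  # client id -> call timestamps

    def allow(self, client_id):
        now = self.clock()
        calls = self._calls[client_id]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True
```

A handler would call `limiter.allow(client_id)` before running any user-supplied regex and reject the request when it returns False. Rate limiting only slows an attacker down: a single malicious pattern can still hang a worker unless a timeout is also in place.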
🔍 How to Verify
Check if Vulnerable:
Check whether your gpt_academic checkout is at commit 310122f or earlier. Review whether the '解析项目源码(手动指定和筛选源码文件类型)' function accepts user-supplied regex input without a timeout.
Check Version:
git log --oneline | head -20 # Check commit history for 310122f
Verify Fix Applied:
Verify the application version is newer than commit 310122f. Test that regex execution now has timeout protection.
📡 Detection & Monitoring
Log Indicators:
- Unusually long processing times for regex operations
- High CPU usage spikes from Python processes
- Timeout errors from the affected function
Network Indicators:
- Repeated requests to the vulnerable endpoint with similar payloads
- Abnormal request patterns targeting regex functionality
SIEM Query:
source="application.logs" AND "解析项目源码" AND (duration>10s OR "timeout" OR "CPU")
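The log indicators above only exist if the application records how long regex calls take. A minimal sketch of such instrumentation follows; the logger name `regex_audit`, the function `timed_search`, and the message format are hypothetical and are not something gpt_academic emits today.

```python
import logging
import re
import time

logger = logging.getLogger("regex_audit")  # hypothetical logger name

def timed_search(pattern, text, warn_after=1.0):
    """Run re.search and warn if it takes longer than warn_after seconds."""
    start = time.perf_counter()
    try:
        return re.search(pattern, text)
    finally:
        elapsed = time.perf_counter() - start
        if elapsed > warn_after:
            logger.warning(
                "slow regex: duration=%.3fs pattern_len=%d text_len=%d",
                elapsed, len(pattern), len(text),
            )
```

The warning is written only after the search returns, so a search that never finishes produces no log line. Pair this with a timeout so that hung matches are interrupted and recorded.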