CVE-2024-12391

6.5 MEDIUM

📋 TL;DR

This vulnerability allows an attacker to cause a denial of service by supplying a specially crafted regular expression to the '解析项目源码(手动指定和筛选源码文件类型)' function ("Parse project source code (manually specify and filter source file types)") in binary-husky/gpt_academic. Python's backtracking re engine can be forced into exponential-time matching, which hangs the server until the match finishes or the process is killed. Any deployment of a vulnerable version that passes user-controlled regex input to this function is affected.

💻 Affected Systems

Products:
  • binary-husky/gpt_academic
Versions: All versions up to and including commit 310122f
Operating Systems: All platforms running Python
Default Config Vulnerable: ⚠️ Yes
Notes: Vulnerability exists when the '解析项目源码(手动指定和筛选源码文件类型)' function accepts user-provided regular expressions.

⚠️ Risk & Real-World Impact

🔴 Worst Case: Complete server unavailability for extended periods, disrupting all services using the affected application.
🟠 Likely Case: Service degradation or temporary unavailability when malicious regex patterns are processed.
🟢 If Mitigated: Minimal impact with proper input validation and regex timeout mechanisms in place.

🌐 Internet-Facing: HIGH - If the vulnerable function is exposed to external users, attackers can easily trigger DoS.
🏢 Internal Only: MEDIUM - Internal users could still cause disruption, but attack surface is smaller.

🎯 Exploit Status

Public PoC: ⚠️ Yes
Weaponized: LIKELY
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

Exploitation requires control over both the regex pattern and the string it is matched against. The affected function's normal use case typically gives the user both.
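To make the exponential behavior concrete, here is a minimal sketch that times a failed match. The pattern `(a+)+$` and the input are textbook illustrations chosen for this example, not the payload from the advisory, and `time_failed_match` is a hypothetical helper name.

```python
import re
import time

# Illustrative pattern only: a quantified group that itself contains a quantifier.
EVIL_PATTERN = r"(a+)+$"


def time_failed_match(n):
    """Return the seconds spent failing to match n 'a' characters followed by 'b'."""
    text = "a" * n + "b"  # the trailing 'b' makes an overall match impossible
    start = time.perf_counter()
    matched = re.match(EVIL_PATTERN, text)
    elapsed = time.perf_counter() - start
    assert matched is None  # the engine gives up only after exhausting every split
    return elapsed


if __name__ == "__main__":
    # Keep n small: the running time grows exponentially with n.
    for n in (12, 14, 16, 18, 20):
        print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```

On CPython each additional character should roughly double the running time, so an input of only a few dozen characters is enough to occupy a worker for hours.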

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: Versions after commit 310122f

Vendor Advisory: https://huntr.com/bounties/70b3f4f0-6b1b-4563-a18c-fe46502e6ba0

Restart Required: Yes

Instructions:

1. Update to the latest version of gpt_academic.
2. Confirm that the deployed checkout is newer than commit 310122f (HEAD must not be 310122f or any earlier commit).
3. Restart the application service.

🔧 Temporary Workarounds

Implement regex timeout

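Platforms: all (the signal-based variant below is Unix-only)

Python's built-in re module has no timeout parameter. Either interrupt the match with an alarm from the signal module, or switch to the third-party regex package, whose matching functions accept a timeout argument.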

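import re
import signal
class TimeoutException(Exception): pass
def timeout_handler(signum, frame): raise TimeoutException()
signal.signal(signal.SIGALRM, timeout_handler)  # main thread only; SIGALRM is Unix-only
signal.alarm(5)  # 5 second timeout
try:
    result = re.search(pattern, text)  # pattern and text are the user-supplied values
except TimeoutException:
    handle_timeout()  # placeholder: report the failure to the caller
finally:
    signal.alarm(0)  # always cancel the pending alarm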
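Signal handlers can only be installed from the main thread, so the alarm-based snippet above may not fit a web application that serves requests from worker threads. A minimal alternative sketch follows, assuming it is acceptable to start one short-lived Python process per search. The names `search_with_timeout` and `_CHILD_SCRIPT` are hypothetical and are not part of gpt_academic.

```python
import json
import subprocess
import sys

# Code run by the child interpreter: read a JSON request from stdin,
# run the search, and write the match span (or null) to stdout.
_CHILD_SCRIPT = r"""
import json, re, sys
request = json.load(sys.stdin)
match = re.search(request["pattern"], request["text"])
json.dump(list(match.span()) if match else None, sys.stdout)
"""


def search_with_timeout(pattern, text, timeout=2.0):
    """Return the (start, end) span of the first match, or None if there is none.

    Raises TimeoutError if matching exceeds `timeout` seconds, and
    ValueError if the child process fails (for example on an invalid pattern).
    """
    request = json.dumps({"pattern": pattern, "text": text})
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD_SCRIPT],
            input=request,
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child when this expires
            check=True,
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError(f"regex search exceeded {timeout} s") from None
    except subprocess.CalledProcessError as exc:
        raise ValueError(f"regex search failed: {exc.stderr.strip()}") from None
    span = json.loads(done.stdout)
    return tuple(span) if span is not None else None
```

Because the match runs in a separate process, a runaway pattern is killed outright instead of continuing to consume CPU in the server. The cost is interpreter start-up time on every call, so this suits low-volume endpoints better than hot paths.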

Input validation and sanitization

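Platforms: all

Validate and sanitize user-provided regex patterns before compiling or executing them. A length cap alone does not prevent catastrophic backtracking, because very short patterns such as (a+)+$ are already exponential, so combine validation with a timeout.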

import re
def sanitize_regex(pattern, max_length=100):
    if len(pattern) > max_length: raise ValueError('Pattern too long')
    # Add additional validation logic
    return pattern
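One way the "additional validation logic" placeholder above could be filled in is sketched below. It adds a syntax check and a crude heuristic that rejects a quantified group whose body also contains a quantifier. The heuristic is an assumption made for illustration: it misses many dangerous patterns (for example overlapping alternations) and can reject harmless ones, so it complements a timeout rather than replacing it. `validate_regex` is a hypothetical name.

```python
import re

MAX_PATTERN_LENGTH = 100  # same arbitrary cap as sanitize_regex above

# Crude heuristic: a group that contains a quantifier and is itself quantified,
# such as (a+)+ or (\w*)*. It does not understand escapes or nested groups.
_NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*}][^()]*\)[+*{]")


def validate_regex(pattern):
    """Return `pattern` unchanged if it passes all checks, else raise ValueError."""
    # Check the length first so the checks below only ever see short input.
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("Pattern too long")
    try:
        re.compile(pattern)  # compiling does not run a match, so it is safe here
    except re.error as exc:
        raise ValueError(f"Invalid pattern: {exc}") from None
    if _NESTED_QUANTIFIER.search(pattern):
        raise ValueError("Nested quantifier rejected")
    return pattern
```

For example, `validate_regex(r"\.py$")` returns the pattern unchanged, while `validate_regex(r"(a+)+$")` raises `ValueError`.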

🧯 If You Can't Patch

  • Implement strict input validation for regex patterns
  • Add rate limiting to the vulnerable endpoint
  • Monitor for abnormal CPU usage patterns
  • Isolate the vulnerable service in a container with resource limits

🔍 How to Verify

Check if Vulnerable:

Check whether your gpt_academic checkout is at commit 310122f or any earlier commit. Then review whether the '解析项目源码(手动指定和筛选源码文件类型)' function passes user-supplied regex input to the re module without a timeout.

Check Version:

git log --oneline | head -20  # 310122f on the first line means HEAD is the last vulnerable commit

Verify Fix Applied:

Verify that the deployed checkout is newer than commit 310122f. Then confirm on a test instance that a pathological pattern submitted to the function no longer hangs it.
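The "newer than commit 310122f" check can be automated. The sketch below assumes the deployment is a git checkout with the upstream history available locally. `run_git` and `checkout_status` are hypothetical helper names; the git subcommands they call (`rev-parse` and `merge-base --is-ancestor`) are standard.

```python
import subprocess

LAST_VULNERABLE = "310122f"  # last vulnerable commit named in this advisory


def run_git(repo, *args):
    """Run git inside `repo`; return (exit code, stripped stdout)."""
    try:
        done = subprocess.run(
            ["git", "-C", repo, *args], capture_output=True, text=True
        )
    except FileNotFoundError:  # git is not installed
        return 127, ""
    return done.returncode, done.stdout.strip()


def checkout_status(repo, git=run_git):
    """Classify the checkout at `repo` as 'vulnerable', 'newer', or 'unknown'."""
    code_head, head = git(repo, "rev-parse", "HEAD")
    code_vuln, vuln = git(repo, "rev-parse", LAST_VULNERABLE + "^{commit}")
    if code_head != 0 or code_vuln != 0:
        return "unknown"  # not a git checkout, or the commit is missing locally
    if head == vuln:
        return "vulnerable"  # HEAD is exactly the last vulnerable commit
    code, _ = git(repo, "merge-base", "--is-ancestor", LAST_VULNERABLE, "HEAD")
    if code == 0:
        return "newer"  # 310122f is in HEAD's history and HEAD differs from it
    code, _ = git(repo, "merge-base", "--is-ancestor", "HEAD", LAST_VULNERABLE)
    if code == 0:
        return "vulnerable"  # HEAD is an ancestor of 310122f, so it is older
    return "unknown"  # diverged history (for example a fork); inspect manually


if __name__ == "__main__":
    print(checkout_status("."))
```

A result of 'newer' only establishes the commit ordering. The behavioral check with a slow pattern is still worth running on a test instance.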

📡 Detection & Monitoring

Log Indicators:

  • Unusually long processing times for regex operations
  • High CPU usage spikes from Python processes
  • Timeout errors from the affected function

Network Indicators:

  • Repeated requests to the vulnerable endpoint with similar payloads
  • Abnormal request patterns targeting regex functionality

SIEM Query:

source="application.logs" AND "解析项目源码" AND (duration>10s OR "timeout" OR "CPU")
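A query like the one above only works if the application logs how long each regex operation took. The following is a minimal sketch of such instrumentation. The logger name, the threshold, and the log line format are assumptions made for illustration and do not describe gpt_academic's existing logging.

```python
import logging
import re
import time

logger = logging.getLogger("regex_audit")  # hypothetical logger name
SLOW_THRESHOLD_SECONDS = 1.0  # hypothetical alerting threshold


def timed_search(pattern, text):
    """Run re.search and log its duration; slow searches log at WARNING level."""
    start = time.perf_counter()
    try:
        return re.search(pattern, text)
    finally:
        duration = time.perf_counter() - start
        slow = duration > SLOW_THRESHOLD_SECONDS
        logger.log(
            logging.WARNING if slow else logging.DEBUG,
            "regex_search duration=%.3fs pattern_len=%d text_len=%d",
            duration,
            len(pattern),
            len(text),
        )
```

The log line is written only after the search returns, so a match that never finishes produces no entry. Pair this with a timeout so that hung searches are cut off and recorded.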
