CVE-2021-43854

CVSS v3.1: 7.5 (HIGH)

📋 TL;DR

CVE-2021-43854 is a regular expression denial of service (ReDoS) vulnerability in NLTK's tokenization functions. Attackers can craft malicious input to cause excessive CPU consumption and service degradation. Users of NLTK's PunktSentenceTokenizer, sent_tokenize, or word_tokenize functions with untrusted input are affected.

💻 Affected Systems

Products:
  • NLTK (Natural Language Toolkit)
Versions: All versions prior to 3.6.6
Operating Systems: All platforms running Python
Default Config Vulnerable: ⚠️ Yes
Notes: Only affects systems using PunktSentenceTokenizer, sent_tokenize, or word_tokenize functions with untrusted input.

📦 What is this software?

NLTK (Natural Language Toolkit) is a widely used open-source Python library for natural language processing, providing tokenizers, taggers, parsers, and corpus resources. Its sent_tokenize and word_tokenize functions are common entry points in text-preprocessing pipelines, which is why this vulnerability frequently sits directly behind user-facing input.

⚠️ Risk & Real-World Impact

🔴 Worst Case

Complete service unavailability due to resource exhaustion, potentially affecting downstream applications that depend on NLTK tokenization.

🟠 Likely Case

Degraded performance and increased response times for applications processing user-provided text, leading to partial service disruption.

🟢 If Mitigated

Minimal impact with proper input validation and version upgrades, maintaining normal service functionality.

🌐 Internet-Facing: HIGH
🏢 Internal Only: MEDIUM

🎯 Exploit Status

Public PoC: ⚠️ Yes
Weaponized: LIKELY
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

Exploitation requires only submitting a crafted input string to a vulnerable tokenization function; no authentication or special privileges are needed.
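The mechanics of ReDoS can be illustrated with a generic backtracking-prone pattern. This is a simplified stand-in for demonstration, not NLTK's actual Punkt regex:

```python
import re

# Hypothetical illustration of catastrophic backtracking: the nested
# quantifier in '(a+)+' forces a backtracking regex engine to try
# exponentially many ways of splitting a run of 'a's before it can
# conclude that no match exists.
EVIL_PATTERN = re.compile(r"^(a+)+$")

def never_matches(n: int) -> bool:
    """An input of n 'a's followed by 'b' cannot match, but rejecting it
    costs roughly 2**n backtracking attempts."""
    return EVIL_PATTERN.match("a" * n + "b") is not None

# Small n returns quickly; each additional character roughly doubles the
# work, so attacker-controlled input length translates directly into CPU time.
print(never_matches(15))  # False
```

This is why a length cap (see the workaround below) is an effective stopgap: it bounds the exponent an attacker can reach.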

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: 3.6.6 and later

Vendor Advisory: https://github.com/nltk/nltk/security/advisories/GHSA-f8m6-h2c7-8h9x

Restart Required: No

Instructions:

1. Upgrade NLTK using pip: pip install --upgrade "nltk>=3.6.6" (quote the requirement so the shell does not treat >= as redirection)
2. Verify installation: pip show nltk
3. Test tokenization functions with sample inputs to ensure functionality.
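For deployments that should refuse to start on a vulnerable NLTK, a startup check along these lines can help. This is a sketch: the parsing assumes plain dotted numeric version strings, and the commented-out usage assumes nltk is importable in your environment:

```python
def nltk_is_patched(version: str, fixed=(3, 6, 6)) -> bool:
    """Return True if the given NLTK version string is at or past the
    fixed release. Assumes standard X.Y.Z numeric versions."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    # Pad short versions like "3.7" so tuple comparison is well-defined.
    parts += (0,) * (3 - len(parts))
    return parts >= fixed

# Hypothetical usage at application startup:
# import nltk
# if not nltk_is_patched(nltk.__version__):
#     raise RuntimeError("Vulnerable NLTK version; upgrade to >= 3.6.6")
```

Tuple comparison keeps the check dependency-free; for pre-release or non-numeric version strings, prefer packaging.version.parse instead.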

🔧 Temporary Workarounds

Input Length Limitation

Applies to: all platforms

Limit the maximum input length passed to the vulnerable tokenization functions to bound worst-case execution time.

# Python example
MAX_INPUT_LENGTH = 10000  # tune to your application's needs

def safe_tokenize(user_input):
    if len(user_input) > MAX_INPUT_LENGTH:
        raise ValueError('Input too long')
    from nltk import word_tokenize  # or sent_tokenize
    return word_tokenize(user_input)

🧯 If You Can't Patch

  • Implement strict input validation and length limits on all user-provided text before tokenization.
  • Deploy rate limiting and monitoring to detect abnormal resource consumption patterns.
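Beyond validation and rate limiting, a hard time cap on each tokenization call bounds the damage from any input that slips through. A minimal sketch using SIGALRM follows; it is Unix-only, must run in the main thread, and the tokenizer shown in the usage comment is a placeholder for your actual sent_tokenize/word_tokenize call:

```python
import signal

class TokenizeTimeout(Exception):
    """Raised when a tokenization call exceeds its time budget."""

def _on_alarm(signum, frame):
    raise TokenizeTimeout("tokenization exceeded time budget")

def with_timeout(func, arg, seconds=2):
    """Run func(arg), aborting with TokenizeTimeout after `seconds`.
    Unix-only: relies on SIGALRM and must be called from the main thread."""
    old = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return func(arg)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old)

# Hypothetical usage: tokens = with_timeout(nltk.word_tokenize, user_text)
```

On platforms without SIGALRM (e.g. Windows), running tokenization in a worker process with a join timeout achieves the same bound.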

🔍 How to Verify

Check if Vulnerable:

Check NLTK version: python -c "import nltk; print(nltk.__version__)" - if the version is below 3.6.6, the system is vulnerable.

Check Version:

python -c "import nltk; print('NLTK version:', nltk.__version__)"

Verify Fix Applied:

After upgrading, verify the version is >= 3.6.6 and test the tokenization functions with representative inputs to confirm normal performance.

📡 Detection & Monitoring

Log Indicators:

  • Unusually long execution times for tokenization functions
  • High CPU usage by Python processes running NLTK
  • Application timeouts or degraded performance

Network Indicators:

  • Increased response times for text processing endpoints
  • Service degradation patterns

SIEM Query:

Processes with high CPU usage AND command line containing 'python' AND (nltk OR tokenize)

🔗 References

  • GitHub Security Advisory: https://github.com/nltk/nltk/security/advisories/GHSA-f8m6-h2c7-8h9x
  • NVD entry: https://nvd.nist.gov/vuln/detail/CVE-2021-43854
