CVE-2021-29421

7.5 HIGH

📋 TL;DR

This vulnerability in pikepdf allows XML External Entity (XXE) attacks when parsing XMP metadata in PDF files. Attackers can exploit this to read arbitrary files from the server filesystem or conduct server-side request forgery. Any Python application using vulnerable versions of pikepdf to process untrusted PDF files is affected.

💻 Affected Systems

Products:
  • pikepdf
Versions: 1.3.0 through 2.9.2
Operating Systems: All operating systems running Python
Default Config Vulnerable: ⚠️ Yes
Notes: Only affects applications that process PDF files with XMP metadata from untrusted sources.

⚠️ Risk & Real-World Impact

🔴

Worst Case

Complete server compromise through file disclosure leading to credential theft, SSRF attacks on internal services, or denial of service via resource exhaustion.

🟠

Likely Case

Unauthorized file disclosure from the server filesystem, potentially exposing sensitive configuration files, credentials, or application source code.

🟢

If Mitigated

Limited impact with proper input validation and file processing restrictions in place, potentially only partial file disclosure.

🌐 Internet-Facing: HIGH
🏢 Internal Only: MEDIUM

🎯 Exploit Status

Public PoC: ⚠️ Yes
Weaponized: LIKELY
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

Exploitation requires only a malicious PDF file with crafted XMP metadata. XXE attacks are well-documented and easily weaponized.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: 2.10.0 and later

Vendor Advisory: https://github.com/pikepdf/pikepdf/blob/v2.10.0/docs/release_notes.rst#v2100

Restart Required: No

Instructions:

1. Upgrade pikepdf to version 2.10.0 or later using pip: pip install --upgrade "pikepdf>=2.10.0" (the quotes stop the shell from treating >= as an output redirection)
2. Verify the upgrade with: pip show pikepdf
3. Test PDF processing functionality after upgrade.

🔧 Temporary Workarounds

Disable XML external entity processing

all

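Disable external entity resolution in any lxml parser your own code uses on untrusted XML. pikepdf creates its XMP parser internally, so this setting cannot be passed into a vulnerable pikepdf release; it hardens the surrounding application only and does not replace the upgrade.

# Hardened lxml parser for XML that your own code parses.
# pikepdf builds its own parser internally, so this does not patch pikepdf itself.
from lxml import etree
parser = etree.XMLParser(resolve_entities=False, no_network=True)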

Input validation and sanitization

all

Validate PDF files before processing and reject files with XMP metadata

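# Reject PDFs that carry an XMP metadata stream.
# Run this check before any call to pdf.open_metadata(), which is where XMP gets parsed.
import pikepdf
with pikepdf.open('file.pdf') as pdf:
    # The document catalog is always present, so only the /Metadata key needs testing.
    if '/Metadata' in pdf.Root:
        raise ValueError('PDF contains XMP metadata; refusing to process')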
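
Rejecting every PDF that has XMP metadata can be stricter than needed, since many legitimate files carry it. A narrower check, sketched here with only the Python standard library, flags XMP packets that declare a DOCTYPE, which an XXE payload needs in order to define entities. The function name xmp_is_suspicious is hypothetical, and the packet bytes are assumed to be obtained without XML parsing, for example via pdf.Root.Metadata.read_bytes(). Treat this as a heuristic pre-filter, not a substitute for upgrading.

```python
import xml.parsers.expat


class DoctypeFound(Exception):
    """Raised by the handler below when the document declares a DOCTYPE."""


def xmp_is_suspicious(xmp_bytes: bytes) -> bool:
    """Return True if the XMP packet declares a DOCTYPE or is not well-formed.

    Hypothetical helper: no external entity handler is installed, and parsing
    is aborted as soon as a DOCTYPE declaration is reported.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Legitimate XMP packets are not expected to need a DTD.
        raise DoctypeFound(name)

    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(xmp_bytes, True)
    except DoctypeFound:
        return True
    except xml.parsers.expat.ExpatError:
        # Malformed XML: this sketch errs on the side of rejecting it.
        return True
    return False
```

A caller would skip metadata handling, or reject the file, whenever the function returns True. Counting malformed packets as suspicious is a deliberate design choice that trades some false positives for a simpler rule.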

🧯 If You Can't Patch

  • Implement strict file upload validation to reject PDFs with XMP metadata
  • Run pikepdf in isolated containers with minimal filesystem access and network restrictions

🔍 How to Verify

Check if Vulnerable:

Check pikepdf version: pip show pikepdf | grep Version. If version is between 1.3.0 and 2.9.2 inclusive, the system is vulnerable.

Check Version:

pip show pikepdf | grep Version

Verify Fix Applied:

Verify pikepdf version is 2.10.0 or later: pip show pikepdf | grep Version. Test with a PDF containing XMP metadata to ensure proper handling.
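
The version check above can also be scripted, which helps when many environments need auditing. The following is a minimal sketch that uses only the Python standard library (Python 3.8 or later); the helper names are hypothetical, and the bounds are the affected range stated in this advisory. Versions are compared as integer tuples because a plain string comparison would wrongly sort "2.10.0" before "2.9.2".

```python
import re
from importlib import metadata

FIRST_AFFECTED = (1, 3, 0)  # first vulnerable release per this advisory
FIRST_FIXED = (2, 10, 0)    # first patched release per this advisory


def parse_version(text):
    """Turn '2.9.2' or '2.9.2.post1' into (2, 9, 2).

    Only the leading digits of the first three components are used, so
    pre-release and post-release suffixes are ignored (a simplification).
    """
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def is_vulnerable(version_text):
    """Return True if the version lies in the affected range."""
    return FIRST_AFFECTED <= parse_version(version_text) < FIRST_FIXED


def installed_pikepdf_is_vulnerable():
    """Return True/False for the installed pikepdf, or None if it is absent."""
    try:
        return is_vulnerable(metadata.version("pikepdf"))
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_pikepdf_is_vulnerable() inside each virtual environment reports the status of that environment only, so it needs to run once per interpreter that processes PDFs.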

📡 Detection & Monitoring

Log Indicators:

  • Unusual file access patterns from PDF processing service
  • A high volume of XML parsing errors in application logs
  • Outbound network connections initiated during PDF processing

Network Indicators:

  • HTTP requests to internal services from PDF processing host
  • DNS queries for internal hostnames during file processing

SIEM Query:

source="application.log" AND "pikepdf" AND ("XML" OR "XMP" OR "metadata") AND ("error" OR "exception" OR "failed")
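
Where no SIEM is available, the same boolean logic can be applied directly to a log file. The sketch below assumes case-insensitive substring matching, which may differ from the semantics of a particular SIEM, and the function names are hypothetical.

```python
PRODUCT = "pikepdf"
TOPIC_TERMS = ("xml", "xmp", "metadata")
FAILURE_TERMS = ("error", "exception", "failed")


def matches_query(line):
    """Mirror the query: pikepdf AND (XML|XMP|metadata) AND (error|exception|failed)."""
    text = line.lower()
    return (
        PRODUCT in text
        and any(term in text for term in TOPIC_TERMS)
        and any(term in text for term in FAILURE_TERMS)
    )


def filter_log(lines):
    """Return only the log lines that satisfy the query."""
    return [line for line in lines if matches_query(line)]
```

For example, filter_log(open("application.log", encoding="utf-8")) would return the matching lines from a file with that name. Substring matching is broad, so expect some noise on busy logs.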
