CVE-2025-23333

5.9 MEDIUM

📋 TL;DR

CVE-2025-23333 is an out-of-bounds read vulnerability in NVIDIA Triton Inference Server's Python backend that allows attackers to read memory beyond allocated bounds by manipulating shared memory data. This could lead to information disclosure of sensitive data from the server's memory. Organizations using NVIDIA Triton Inference Server with Python backend on Windows or Linux are affected.

💻 Affected Systems

Products:
  • NVIDIA Triton Inference Server
Versions: Prior to 24.09
Operating Systems: Windows, Linux
Default Config Vulnerable: ⚠️ Yes
Notes: Only affects deployments using the Python backend with shared memory enabled. Other backends (TensorRT, ONNX Runtime, etc.) are not affected.

📦 What is this software?

NVIDIA Triton Inference Server is open-source inference-serving software for deploying trained AI models from multiple frameworks (TensorRT, ONNX Runtime, PyTorch, Python, and others) behind HTTP and gRPC endpoints. The Python backend lets teams serve models, or pre- and post-processing logic, written in Python; it communicates with the main server process over shared memory, which is the component at issue in this CVE.

⚠️ Risk & Real-World Impact

🔴

Worst Case

An attacker could read sensitive information from server memory, potentially exposing model weights, inference data, credentials, or other confidential information stored in adjacent memory regions.

🟠

Likely Case

Information disclosure of random memory contents, potentially including fragments of inference data or server state information, but not reliable extraction of specific sensitive data.

🟢

If Mitigated

Limited impact with proper network segmentation and access controls, potentially only exposing non-sensitive memory fragments.

🌐 Internet-Facing: MEDIUM - While the vulnerability requires specific conditions and manipulation of shared memory, internet-facing inference servers could be targeted for information disclosure attacks.
🏢 Internal Only: LOW - Internal-only deployments with proper access controls have reduced exposure, though the vulnerability could still be exploited by authenticated internal users.

🎯 Exploit Status

Public PoC: ✅ No
Weaponized: UNKNOWN
Unauthenticated Exploit: ✅ No
Complexity: MEDIUM

Exploitation requires the ability to manipulate shared-memory data used by the Python backend, which typically means some level of access to the server or the ability to submit specially crafted inference requests.
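To make the attack surface concrete: Triton's system shared-memory protocol extension lets clients register a shared-memory region over the HTTP API, declaring a key, offset, and byte size. The sketch below is illustrative only, not a working exploit; the region name, key, and size are hypothetical values.

```shell
# Illustrative only — shows the shared-memory registration surface the
# advisory describes, not a working exploit. Region name, key, and
# byte_size are hypothetical; do not run anything like this outside a lab.
PAYLOAD='{"key": "/demo_shm", "offset": 0, "byte_size": 1048576}'

# Triton's system shared-memory extension exposes registration over HTTP:
#   curl -s -X POST \
#     http://localhost:8000/v2/systemsharedmemory/region/demo_region/register \
#     -d "$PAYLOAD"
# A mismatch between the declared byte_size/offset and the real size of the
# underlying region is the kind of inconsistency an out-of-bounds read in
# the Python backend could stem from.

printf '%s\n' "$PAYLOAD"
```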

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: 24.09 or later

Vendor Advisory: https://nvidia.custhelp.com/app/answers/detail/a_id/5687

Restart Required: Yes (the server process, or container, must be restarted on the new version)

Instructions:

1. Download NVIDIA Triton Inference Server version 24.09 or later from NVIDIA NGC.
2. Replace the existing Triton installation with the updated version (for container deployments, pull the updated image).
3. Restart the server so the patched binaries take effect; in container deployments this means recreating the serving container.
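For container deployments (the most common way Triton is shipped), the upgrade amounts to pulling the patched image from NGC and recreating the serving container. A minimal sketch, assuming the standard nvcr.io image path and YY.MM-py3 tag scheme; the container name, ports, and model-repository path are placeholders:

```shell
# Patched image from NVIDIA NGC (YY.MM-py3 tag scheme assumed).
IMAGE="nvcr.io/nvidia/tritonserver:24.09-py3"

# Typical upgrade sequence (requires Docker and registry access):
#   docker pull "$IMAGE"
#   docker stop triton && docker rm triton
#   docker run --gpus all -d --name triton \
#     -p 8000:8000 -p 8001:8001 -p 8002:8002 \
#     -v /path/to/models:/models \
#     "$IMAGE" tritonserver --model-repository=/models

# Sanity check that the tag is the patched release.
TAG="${IMAGE##*:}"   # strip image path -> "24.09-py3"
REL="${TAG%%-*}"     # strip variant suffix -> "24.09"
printf '%s\n' "$REL"
```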

🔧 Temporary Workarounds

Disable Python Backend Shared Memory

Platforms: all

Disable shared memory usage in Python backend configuration to prevent the vulnerability from being exploited.

If your deployment exposes a shared-memory toggle for the Python backend, disable it; consult the Python backend documentation for the exact parameter name in your Triton release.

Use Alternative Backends

Platforms: all

Switch to non-Python backends (TensorRT, ONNX Runtime, etc.) for inference workloads where possible.

Configure model repositories to use non-Python backends
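As an illustration, a model already exported to ONNX can be pointed at the ONNX Runtime backend with a `config.pbtxt` along these lines (the model name, tensor names, shapes, and datatypes are placeholders; only the `backend` field is the point here):

```protobuf
# config.pbtxt — hypothetical model; the backend line selects ONNX Runtime
# instead of the Python backend.
name: "my_model"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```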

🧯 If You Can't Patch

  • Implement strict network access controls to limit who can submit inference requests to the Triton server
  • Monitor for unusual memory access patterns or abnormal inference request patterns

🔍 How to Verify

Check if Vulnerable:

Check the Triton server version with 'tritonserver --version' and confirm whether it is below 24.09.

Check Version:

tritonserver --version

Verify Fix Applied:

Confirm version is 24.09 or higher with: 'tritonserver --version'
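The check above can be scripted. A small helper, assuming release tags in the YY.MM form used for NGC containers; extracting that tag from the version output is best-effort, since the format can vary between builds:

```shell
# Returns success (0) when the given release tag predates the 24.09 fix.
is_vulnerable() {
  ver="$1"           # e.g. "24.08"
  patched="24.09"
  # Version-sort the two tags; if the older one is $ver and it differs
  # from the patched tag, the installed release predates the fix.
  oldest=$(printf '%s\n%s\n' "$ver" "$patched" | sort -V | head -n1)
  [ "$oldest" = "$ver" ] && [ "$ver" != "$patched" ]
}

# Against a live install (tritonserver must be on PATH):
#   rel=$(tritonserver --version | grep -oE '[0-9]{2}\.[0-9]{2}' | head -n1)
#   is_vulnerable "$rel" && echo "update to 24.09 or later"
```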

📡 Detection & Monitoring

Log Indicators:

  • Unusual Python backend errors related to memory access
  • Abnormal shared memory operations in system logs

Network Indicators:

  • Unusual patterns of inference requests targeting Python backend
  • Multiple rapid requests with varying shared memory parameters

SIEM Query:

source="triton" AND (error="memory" OR error="out of bounds" OR error="shared memory")
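Outside a SIEM, the same indicators can be swept from collected Triton logs with grep. A sketch: the log path and message format are assumptions, so it is demonstrated here against a fabricated sample line rather than real output.

```shell
# Indicator pattern matching the log indicators listed above.
PATTERN='out of bounds|shared memory'

# Against collected logs (path is an assumption; adjust to your setup):
#   grep -RiE "$PATTERN" /var/log/triton/

# Demo against a fabricated sample line:
SAMPLE='E0101 12:00:00 python_be.cc:123] shared memory region read past end'
printf '%s\n' "$SAMPLE" | grep -qiE "$PATTERN" && echo "match"
```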

🔗 References

  • NVIDIA Security Bulletin: https://nvidia.custhelp.com/app/answers/detail/a_id/5687
