CVE-2025-33202

CVSS Base Score: 6.5 (MEDIUM)

📋 TL;DR

NVIDIA Triton Inference Server contains a stack overflow vulnerability that a remote attacker can trigger by sending an oversized payload, crashing the inference service and causing denial of service. The issue affects Triton Inference Server deployments on both Linux and Windows.

💻 Affected Systems

Products:
  • NVIDIA Triton Inference Server
Versions: All versions prior to the fix
Operating Systems: Linux, Windows
Default Config Vulnerable: ⚠️ Yes
Notes: All deployments using default configuration are vulnerable. The vulnerability affects both HTTP and gRPC endpoints.

📦 What is this software?

NVIDIA Triton Inference Server is open-source inference-serving software for deploying trained AI models from multiple frameworks (such as TensorRT, PyTorch, TensorFlow, and ONNX Runtime) and serving them over HTTP/REST and gRPC endpoints.

⚠️ Risk & Real-World Impact

🔴 Worst Case

Complete denial of service of the Triton Inference Server, disrupting all AI inference workloads and potentially requiring a service restart.

🟠 Likely Case

Service crash and temporary unavailability of AI inference capabilities until manual intervention restarts the service.

🟢 If Mitigated

No impact if payload size limits are properly configured or the service is patched.

🌐 Internet-Facing: HIGH - Remote attackers can exploit this without authentication by sending malicious payloads to exposed endpoints.
🏢 Internal Only: MEDIUM - Internal attackers or compromised systems could still exploit this, but requires network access to the Triton service.

🎯 Exploit Status

Public PoC: ❌ No
Weaponized: UNKNOWN
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

Exploitation only requires sending specially crafted oversized payloads to exposed Triton HTTP or gRPC endpoints; no authentication is needed.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: Check NVIDIA advisory for specific patched versions

Vendor Advisory: https://nvidia.custhelp.com/app/answers/detail/a_id/5723

Restart Required: Yes

Instructions:

1. Review NVIDIA advisory for affected versions
2. Download and install the latest Triton Inference Server version
3. Restart the Triton service
4. Verify the fix by testing with normal payloads

🔧 Temporary Workarounds

Configure Payload Size Limits (applies to: all platforms)

Configure maximum payload size limits in the Triton configuration to reject oversized requests: set the 'max_request_size' and 'max_response_size' parameters in config.pbtxt.

Network Segmentation (applies to: all platforms)

Restrict network access to Triton endpoints using firewalls or network policies.

🧯 If You Can't Patch

  • Implement network-level rate limiting and payload size filtering using WAF or reverse proxy
  • Monitor Triton service health and implement automatic restart mechanisms for crash recovery
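The payload-size filtering idea above can be sketched as a simple pre-forwarding check. This is a hedged illustration of the technique, not Triton or WAF code; the 4 MiB limit and the header handling are assumptions you would tune to your models and proxy.

```python
# Sketch of a Content-Length guard a reverse proxy or WAF rule could apply
# before forwarding requests to Triton. The limit is illustrative only.
MAX_BODY_BYTES = 4 * 1024 * 1024  # 4 MiB; tune to your largest legitimate request

def should_reject(headers: dict) -> bool:
    """Return True if the request should be dropped before reaching Triton."""
    raw = headers.get("content-length")
    if raw is None:
        return True  # no declared length: refuse rather than buffer blindly
    try:
        length = int(raw)
    except ValueError:
        return True  # malformed header
    return length < 0 or length > MAX_BODY_BYTES
```

A real deployment would enforce the same limit in the proxy itself (most reverse proxies expose a maximum-body-size setting), so oversized requests are rejected before any application code runs.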

🔍 How to Verify

Check if Vulnerable:

Check your Triton version against the affected versions listed in the NVIDIA advisory. You can also test by sending large payloads to inference endpoints and watching for a crash, but only do this against a non-production instance.

Check Version:

Run tritonserver --version, or check the tag of the Triton container image in use.
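If you script this check across a fleet, a minimal version-comparison helper might look like the following. The fixed version is deliberately left as a parameter, since the actual patched releases come from the NVIDIA advisory.

```python
# Hypothetical version gate: compares dotted version strings numerically.
# The first-fixed version is a placeholder; take the real value from the advisory.
def parse_version(v: str) -> tuple:
    """Turn '2.41.0' into (2, 41, 0) so comparisons are numeric, not lexical."""
    return tuple(int(part) for part in v.strip().split("."))

def is_vulnerable(installed: str, first_fixed: str) -> bool:
    """True if the installed version predates the first fixed release."""
    return parse_version(installed) < parse_version(first_fixed)
```

Numeric comparison matters here: a lexical string compare would wrongly rank "2.9.0" above "2.10.0".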

Verify Fix Applied:

After patching, attempt to send large payloads and verify service remains stable. Check version matches patched release.
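One way to automate that stability check is to treat any HTTP response, even an error status, as "server still alive", and only a failed connection as a possible crash. The endpoint URL and payload size below are assumptions; run this only against a non-production instance.

```python
import urllib.error
import urllib.request

def server_survives(url: str, payload_bytes: int, timeout: float = 10.0) -> bool:
    """POST an oversized body and report whether the server stayed reachable."""
    body = b"\x00" * payload_bytes
    req = urllib.request.Request(url, data=body, method="POST")
    try:
        urllib.request.urlopen(req, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server rejected the request but is still answering
    except (urllib.error.URLError, OSError):
        return False  # connection refused/reset/timed out: possible crash
```

On a patched server the expected outcome is an error response (request too large) rather than a dropped connection.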

📡 Detection & Monitoring

Log Indicators:

  • Triton service crash logs
  • Out-of-memory errors in system logs
  • Abnormal termination of tritonserver process
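The log indicators above can be matched mechanically. The marker strings in this sketch are drawn from the list plus common crash phrasings, and would need adapting to your actual log format.

```python
# Case-insensitive scan for crash indicators in Triton/system log lines.
# Marker list is illustrative; extend it to match your logging setup.
CRASH_MARKERS = ("segmentation fault", "stack overflow", "core dumped", "killed process")

def find_crash_lines(log_lines):
    """Return the subset of log lines that contain any crash marker."""
    return [
        line for line in log_lines
        if any(marker in line.lower() for marker in CRASH_MARKERS)
    ]
```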

Network Indicators:

  • Unusually large HTTP/gRPC requests to Triton endpoints
  • Sudden drop in inference request success rates

SIEM Query:

(source="triton" AND ("crash" OR "segmentation fault" OR "stack overflow")) OR (dest_port=8000 AND http_request_size > threshold)

🔗 References

  • NVIDIA Security Bulletin: https://nvidia.custhelp.com/app/answers/detail/a_id/5723
