CVE-2025-33202
📋 TL;DR
NVIDIA Triton Inference Server contains a stack overflow vulnerability in which a remote attacker can send an oversized payload to trigger a denial of service. The issue affects Triton Inference Server deployments on both Linux and Windows. A successful attack crashes the inference service; no code execution is reported.
💻 Affected Systems
- NVIDIA Triton Inference Server
📦 What is this software?
NVIDIA Triton Inference Server is open-source inference-serving software that hosts trained models from frameworks such as TensorRT, TensorFlow, PyTorch, and ONNX Runtime, exposing them over HTTP/REST and gRPC endpoints.
⚠️ Risk & Real-World Impact
Worst Case
Complete denial of service of the Triton Inference Server, disrupting all AI inference workloads and potentially requiring service restart.
Likely Case
Service crash and temporary unavailability of AI inference capabilities until manual intervention restarts the service.
If Mitigated
No impact if payload size limits are properly configured or the service is patched.
🎯 Exploit Status
Exploitation requires sending specially crafted large payloads to Triton endpoints. No authentication is required.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: Check NVIDIA advisory for specific patched versions
Vendor Advisory: https://nvidia.custhelp.com/app/answers/detail/a_id/5723
Restart Required: Yes
Instructions:
1. Review NVIDIA advisory for affected versions
2. Download and install the latest Triton Inference Server version
3. Restart the Triton service
4. Verify the fix by testing with normal payloads
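Step 1 can be partly scripted as a version gate. This is a minimal sketch: the patched version is deliberately a parameter, not a hardcoded value, because the NVIDIA advisory must be consulted for the actual fixed release.

```python
# Minimal version gate: compare the installed Triton version against the
# patched release listed in the NVIDIA advisory (passed in by the caller).

def parse_version(v: str) -> tuple:
    """Parse a dotted version string like '2.52.0' into a comparable tuple."""
    return tuple(int(part) for part in v.strip().split("."))

def is_patched(installed: str, minimum_fixed: str) -> bool:
    """True if the installed version is at or above the patched release.

    Both strings should use the same number of dotted components
    (e.g. '2.50.0' vs '2.51.0'), or the comparison may mislead.
    """
    return parse_version(installed) >= parse_version(minimum_fixed)
```

Feed it the version reported by tritonserver --version (or the container image tag) together with the patched version from the advisory.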
🔧 Temporary Workarounds
Configure Payload Size Limits
Configure maximum payload size limits in Triton's configuration to block oversized requests.
Set request/response size limits (e.g. the 'max_request_size' and 'max_response_size' parameters in config.pbtxt); confirm the exact option names and location for your Triton version in the official documentation.
Network Segmentation
Restrict network access to Triton endpoints using firewalls or network policies
🧯 If You Can't Patch
- Implement network-level rate limiting and payload size filtering using WAF or reverse proxy
- Monitor Triton service health and implement automatic restart mechanisms for crash recovery
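Payload-size filtering normally lives in the WAF or reverse proxy, but the gate itself can be sketched as WSGI middleware placed in front of a forwarding layer. The 4 MiB limit below is an assumed example value, not a Triton default; tune it to your largest legitimate request.

```python
MAX_BODY_BYTES = 4 * 1024 * 1024  # assumed example limit, not a Triton default

def payload_limit_middleware(app, max_bytes=MAX_BODY_BYTES):
    """Reject requests whose declared body size exceeds max_bytes
    before they ever reach Triton."""
    def middleware(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = max_bytes + 1  # malformed Content-Length: treat as oversized
        if length > max_bytes:
            start_response("413 Payload Too Large",
                           [("Content-Type", "text/plain")])
            return [b"payload exceeds configured limit\n"]
        return app(environ, start_response)
    return middleware
```

In production the equivalent control is usually a one-line proxy setting (for example nginx's client_max_body_size) rather than custom middleware.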
🔍 How to Verify
Check if Vulnerable:
Check Triton version against NVIDIA advisory. Test by sending large payloads to inference endpoints and monitoring for crashes.
Check Version:
Run tritonserver --version on the host, or check the container/image tag.
Verify Fix Applied:
After patching, attempt to send large payloads and verify service remains stable. Check version matches patched release.
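The stability check can be partly automated by polling Triton's readiness route after sending test traffic. This sketch assumes the default HTTP endpoint on localhost port 8000 and the KServe v2 health route; adjust both for your deployment.

```python
import urllib.error
import urllib.request

TRITON_URL = "http://localhost:8000"  # assumed default HTTP endpoint

def classify_health(status_code: int) -> str:
    """Map the readiness endpoint's HTTP status to a label (200 = ready)."""
    return "ready" if status_code == 200 else "not-ready"

def check_ready(base_url: str = TRITON_URL, timeout: float = 5.0) -> str:
    """Poll Triton's KServe v2 readiness route.

    An 'unreachable' result after a large-payload test suggests the
    service crashed and the fix is not in place.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/v2/health/ready",
                                    timeout=timeout) as resp:
            return classify_health(resp.status)
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

Run this only against a test instance you are authorized to probe, not a production deployment.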
📡 Detection & Monitoring
Log Indicators:
- Triton service crash logs
- Out-of-memory errors in system logs
- Abnormal termination of tritonserver process
Network Indicators:
- Unusually large HTTP/gRPC requests to Triton endpoints
- Sudden drop in inference request success rates
SIEM Query:
(source="triton" AND ("crash" OR "segmentation fault" OR "stack overflow")) OR (dest_port=8000 AND http_request_size > threshold)
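The log indicators above can also be scanned programmatically. The indicator strings below mirror this section's list and may need tuning for your log format:

```python
# Substrings matching the crash indicators listed above (case-insensitive).
CRASH_INDICATORS = ("segmentation fault", "stack overflow",
                    "core dumped", "out of memory")

def find_crash_indicators(lines):
    """Return (line_number, line) pairs whose text matches a crash indicator."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(indicator in lowered for indicator in CRASH_INDICATORS):
            hits.append((number, line))
    return hits
```

Pair this with a service watchdog so a hit triggers both an alert and the automatic-restart mechanism described earlier.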