CVE-2025-23335
📋 TL;DR
NVIDIA Triton Inference Server contains an integer underflow vulnerability in its TensorRT backend that could allow attackers to cause a denial of service. The vulnerability affects both Windows and Linux deployments when specific model configurations are used with particular inputs. Organizations running NVIDIA Triton Inference Server with the TensorRT backend are potentially affected.
💻 Affected Systems
- NVIDIA Triton Inference Server
- TensorRT backend
📦 What is this software?
NVIDIA Triton Inference Server is NVIDIA's open-source inference serving software for deploying trained AI models in production. TensorRT is NVIDIA's high-performance deep learning inference SDK, which Triton can use as a backend to execute optimized models.
⚠️ Risk & Real-World Impact
Worst Case
Complete service disruption of Triton Inference Server, causing unavailability of AI inference services and potential cascading failures in dependent applications.
Likely Case
Temporary service interruption requiring server restart, with potential data loss for in-flight inference requests.
If Mitigated
Minimal impact with proper input validation and monitoring; service may experience brief degradation but remains operational.
🎯 Exploit Status
Exploitation requires knowledge of vulnerable model configurations and the ability to send specially crafted inputs; it is not trivial, but it is achievable by a determined attacker.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: Check NVIDIA advisory for specific patched versions
Vendor Advisory: https://nvidia.custhelp.com/app/answers/detail/a_id/5687
Restart Required: No
Instructions:
1. Review the NVIDIA advisory for affected versions.
2. Update NVIDIA Triton Inference Server to a patched version.
3. Update the TensorRT backend if it is installed separately.
4. Test with existing models to ensure compatibility.
🔧 Temporary Workarounds
Input validation and sanitization (all platforms)
Implement strict input validation for model inference requests to reject malformed or malicious inputs.
Model configuration review (all platforms)
Audit model configurations to identify potentially vulnerable settings.
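The input-validation workaround above can be sketched as a pre-filter in front of the inference endpoint. The snippet below is a minimal illustration, assuming requests follow the KServe v2 inference protocol JSON that Triton exposes; the allowed datatypes and the size cap are placeholder policy choices, not values from the advisory.

```python
# Illustrative pre-filter for Triton inference requests (KServe v2 shape).
# ALLOWED_DTYPES and MAX_ELEMENTS are hypothetical policy values.
ALLOWED_DTYPES = {"FP32", "FP16", "INT32", "INT64", "UINT8", "BOOL"}
MAX_ELEMENTS = 1 << 24  # illustrative cap on tensor element count

def validate_request(payload: dict) -> bool:
    """Return True only for requests whose input tensors are well-formed."""
    inputs = payload.get("inputs")
    if not isinstance(inputs, list) or not inputs:
        return False
    for tensor in inputs:
        shape = tensor.get("shape")
        if not isinstance(shape, list) or not shape:
            return False
        # Reject non-positive dimensions: a malformed shape is exactly the
        # kind of input an integer-underflow bug is most likely to hit.
        if not all(isinstance(d, int) and d > 0 for d in shape):
            return False
        n = 1
        for d in shape:
            n *= d
        if n > MAX_ELEMENTS:
            return False
        if tensor.get("datatype") not in ALLOWED_DTYPES:
            return False
    return True
```

A gateway or sidecar would call this before forwarding the request to Triton, returning HTTP 400 on failure.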
🧯 If You Can't Patch
- Implement network segmentation and restrict access to Triton Inference Server
- Deploy rate limiting and input validation at API gateway or load balancer level
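Rate limiting at a gateway can be approximated with a token bucket. The sketch below is illustrative only; `rate` and `capacity` are hypothetical tuning parameters you would set per deployment, and a production gateway would normally provide this natively.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative, not production-ready)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Requests that fail `allow()` would receive an HTTP 429 instead of reaching the inference server.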
🔍 How to Verify
Check if Vulnerable:
Compare the running Triton Server version against the NVIDIA advisory, and confirm whether the deployment uses the TensorRT backend with custom models.
Check Version:
`tritonserver --version` (Linux) or check the service properties (Windows)
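On Linux, the version check can be scripted. The sketch below pulls the first dotted version number out of the command output and compares it to a threshold; `PATCHED` is a hypothetical placeholder, so substitute the patched version for your release branch from the NVIDIA advisory.

```python
import re

# Hypothetical patched version -- replace with the value from the
# NVIDIA advisory for your release branch.
PATCHED = (2, 54, 0)

def parse_version(text: str) -> tuple:
    """Extract the first dotted x.y.z version number from command output."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if not m:
        raise ValueError("no version string found")
    return tuple(int(g) for g in m.groups())

def is_patched(version: tuple, patched: tuple = PATCHED) -> bool:
    """Tuple comparison handles multi-digit components correctly."""
    return version >= patched

# Example usage (commented out; requires tritonserver on PATH):
# import subprocess
# out = subprocess.run(["tritonserver", "--version"],
#                      capture_output=True, text=True).stdout
# print(is_patched(parse_version(out)))
```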
Verify Fix Applied:
Confirm the Triton Server version matches the patched version listed in the NVIDIA advisory; then re-test with previously vulnerable configurations.
📡 Detection & Monitoring
Log Indicators:
- Unexpected server crashes
- Error messages related to TensorRT underflow
- Abnormal termination of inference processes
Network Indicators:
- Sudden drop in inference request success rate
- Increased error responses from inference API
SIEM Query:
`source="triton_server" AND (error OR crash OR termination) AND (tensorrt OR underflow)`
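Outside a SIEM, the same query logic can be approximated with a small log scan. This sketch mirrors the keyword pairing in the query above; the keyword lists are illustrative and should be tuned to your actual log format.

```python
import re

# Flag lines that combine a crash/error keyword with a
# TensorRT/underflow keyword, like the SIEM query above.
SEVERITY = re.compile(r"error|crash|terminat|abort", re.I)
CONTEXT = re.compile(r"tensorrt|underflow", re.I)

def suspicious_lines(log_lines):
    """Return log lines matching both keyword groups."""
    return [line for line in log_lines
            if SEVERITY.search(line) and CONTEXT.search(line)]
```

Feeding Triton's server log through `suspicious_lines` gives a quick triage list of candidate crash events to investigate.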