CVE-2025-23335

4.4 MEDIUM

📋 TL;DR

NVIDIA Triton Inference Server contains an integer underflow vulnerability in its TensorRT backend that could allow attackers to cause denial of service. The vulnerability affects both Windows and Linux deployments when using specific model configurations with particular inputs. Organizations running NVIDIA Triton Inference Server with TensorRT backend are potentially affected.

💻 Affected Systems

Products:
  • NVIDIA Triton Inference Server
  • TensorRT backend
Versions: Specific version range not specified in CVE; check NVIDIA advisory for affected versions
Operating Systems: Windows, Linux
Default Config Vulnerable: ⚠️ Yes
Notes: Vulnerability requires specific model configuration and specific input to trigger; not all deployments may be vulnerable.

📦 What is this software?

NVIDIA Triton Inference Server is NVIDIA's open-source inference serving platform. It serves trained models from multiple frameworks behind common HTTP/gRPC APIs using pluggable backends; the TensorRT backend executes models optimized with NVIDIA TensorRT for GPU inference.

⚠️ Risk & Real-World Impact

🔴 Worst Case

Complete service disruption of Triton Inference Server, causing unavailability of AI inference services and potential cascading failures in dependent applications.

🟠 Likely Case

Temporary service interruption requiring a server restart, with potential loss of in-flight inference requests.

🟢 If Mitigated

Minimal impact with proper input validation and monitoring; the service may experience brief degradation but remains operational.

🌐 Internet-Facing: MEDIUM - While exploitation requires specific conditions, internet-facing inference servers could be targeted for DoS attacks.
🏢 Internal Only: LOW - Internal deployments with controlled access and input validation face reduced risk.

🎯 Exploit Status

Public PoC: ✅ No
Weaponized: UNKNOWN
Unauthenticated Exploit: ✅ No
Complexity: MEDIUM

Exploitation requires knowledge of vulnerable model configurations and ability to send specific inputs; not trivial but achievable by determined attackers.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: Check NVIDIA advisory for specific patched versions

Vendor Advisory: https://nvidia.custhelp.com/app/answers/detail/a_id/5687

Restart Required: No

Instructions:

1. Review the NVIDIA advisory for affected versions.
2. Update NVIDIA Triton Inference Server to a patched version.
3. Update the TensorRT backend if it is installed separately.
4. Test with existing models to ensure compatibility.

🔧 Temporary Workarounds

  • Input validation and sanitization (applies to: all platforms)
    Implement strict input validation for model inference requests to prevent malicious inputs.

  • Model configuration review (applies to: all platforms)
    Audit and review model configurations to identify potentially vulnerable configurations.
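As a sketch of the first workaround, a gateway in front of Triton could reject inference requests whose declared tensor shapes look malformed before they reach the server. The rank and size limits below are hypothetical placeholders; real bounds should come from your model configuration, and the request layout assumed here is the KServe-v2-style JSON body.

```python
# Hedged sketch: pre-validate a KServe-v2-style inference request body
# before forwarding it to Triton. The limits below are illustrative only.

MAX_DIMS = 8            # hypothetical upper bound on tensor rank
MAX_DIM_SIZE = 1 << 20  # hypothetical per-dimension size cap

def validate_request(body: dict) -> tuple[bool, str]:
    """Return (ok, reason); reject malformed or oversized tensor shapes."""
    inputs = body.get("inputs")
    if not isinstance(inputs, list) or not inputs:
        return False, "missing inputs"
    for tensor in inputs:
        shape = tensor.get("shape")
        if not isinstance(shape, list) or len(shape) == 0:
            return False, "missing shape"
        if len(shape) > MAX_DIMS:
            return False, "too many dimensions"
        for d in shape:
            # Non-positive dimensions are a classic trigger for integer
            # over/underflow in downstream size computations.
            if not isinstance(d, int) or d <= 0 or d > MAX_DIM_SIZE:
                return False, f"suspicious dimension: {d!r}"
    return True, "ok"
```

For example, `validate_request({"inputs": [{"shape": [1, 3, 224, 224]}]})` passes, while a request declaring a zero or negative dimension is rejected before it can reach the backend.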

🧯 If You Can't Patch

  • Implement network segmentation and restrict access to Triton Inference Server
  • Deploy rate limiting and input validation at API gateway or load balancer level
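The rate-limiting mitigation could be sketched as a simple token bucket placed in front of the inference API; the capacity and refill rate below are placeholder values to tune for your traffic, not recommended settings.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter; illustrative values only."""

    def __init__(self, capacity: float = 10.0, refill_per_sec: float = 5.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice this logic would live in the API gateway or load balancer (most support rate limiting natively); the sketch only illustrates the mechanism.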

🔍 How to Verify

Check if Vulnerable:

Check Triton Server version and compare against NVIDIA advisory; review if using TensorRT backend with custom models

Check Version:

tritonserver --version (Linux) or check service properties (Windows)
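Once the advisory's patched release is known, the reported version string should be compared numerically rather than lexically (e.g. "2.9" sorts after "2.10" as text). A minimal sketch, where the "2.54.0" threshold in the test is a placeholder and not the actual patched release:

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a dotted version like '2.41.0' into a comparable tuple."""
    return tuple(int(part) for part in v.strip().split("."))

def is_patched(installed: str, patched: str) -> bool:
    """True if the installed version is at or above the patched version."""
    return parse_version(installed) >= parse_version(patched)
```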

Verify Fix Applied:

Verify Triton Server version matches patched version from NVIDIA advisory; test with known vulnerable configurations
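Without shell access, the running server's version can also be read from Triton's KServe-v2 HTTP metadata endpoint (`GET /v2` on the HTTP port, 8000 by default), which returns JSON including a `version` field. The base URL below assumes a default local deployment.

```python
import json
from urllib.request import urlopen

def fetch_server_metadata(base_url: str = "http://localhost:8000") -> dict:
    """Fetch Triton's server metadata (KServe v2 'GET /v2' endpoint)."""
    with urlopen(f"{base_url}/v2") as resp:  # network call; needs a live server
        return json.loads(resp.read())

def reported_version(metadata: dict) -> str:
    """Extract the version string from a v2 server-metadata document."""
    return metadata.get("version", "unknown")
```

The returned string can then be fed into a version comparison against the patched release listed in the NVIDIA advisory.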

📡 Detection & Monitoring

Log Indicators:

  • Unexpected server crashes
  • Error messages related to TensorRT underflow
  • Abnormal termination of inference processes
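The log indicators above can be approximated with a regex scan over server logs. The patterns here are heuristics derived from that list, not signatures published by NVIDIA, and will need tuning to your actual log format.

```python
import re

# Heuristic patterns mirroring the indicators above; tune for your logs.
PATTERNS = [
    re.compile(r"(?i)tensorrt.*(underflow|overflow)"),
    re.compile(r"(?i)(segfault|core dumped|terminated unexpectedly)"),
    re.compile(r"(?i)inference.*(crash|abort)"),
]

def suspicious_lines(log_text: str) -> list[str]:
    """Return log lines matching any heuristic crash/underflow pattern."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in PATTERNS)]
```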

Network Indicators:

  • Sudden drop in inference request success rate
  • Increased error responses from inference API

SIEM Query:

source="triton_server" AND (error OR crash OR termination) AND (tensorrt OR underflow)

🔗 References

  • NVIDIA Security Bulletin: https://nvidia.custhelp.com/app/answers/detail/a_id/5687
