CVE-2025-0453
📋 TL;DR
This vulnerability in MLflow's GraphQL endpoint allows attackers to cause denial of service by sending specially crafted queries that consume excessive server resources. Attackers can tie up all worker processes by requesting large batches of experiment run data, making the application unresponsive to legitimate requests. This affects all organizations running vulnerable MLflow versions with the GraphQL endpoint enabled.
💻 Affected Systems
- mlflow/mlflow
📦 What is this software?
MLflow by LF Projects
⚠️ Risk & Real-World Impact
Worst Case
Complete service outage in which MLflow stops responding to all requests, disrupting machine learning workflows, experiment tracking, and model deployment pipelines across the organization.
Likely Case
Degraded performance and intermittent service disruptions during attack periods, causing delays in ML operations and potentially leaving in-progress experiment runs with missing or incomplete logged data.
If Mitigated
Minimal impact with proper rate limiting and resource controls in place, potentially causing temporary slowdowns but preventing complete outages.
🎯 Exploit Status
Exploitation requires knowledge of GraphQL query structure but is straightforward once understood; no authentication needed to access the vulnerable endpoint.
🛠️ Fix & Mitigation
✅ Official Fix
Patch Version: 2.18.0 or later
Vendor Advisory: https://github.com/mlflow/mlflow/security/advisories
Restart Required: Yes
Instructions:
1. Upgrade MLflow to version 2.18.0 or later using pip: pip install --upgrade "mlflow>=2.18.0" (quote the requirement so the shell does not treat >= as a redirect)
2. Restart all MLflow services
3. Verify that the GraphQL endpoint now rejects or limits oversized query batches
🔧 Temporary Workarounds
Disable GraphQL endpoint
Completely disable the vulnerable GraphQL endpoint if it is not required for your workflows
Set environment variable: export MLFLOW_DISABLE_GRAPHQL=true
Or modify the MLflow configuration to disable GraphQL
Implement rate limiting
Add rate limiting at the network or application layer to prevent excessive query batches
Use nginx rate limiting (in the http block): limit_req_zone $binary_remote_addr zone=mlflow:10m rate=10r/s;
Add to nginx location block: limit_req zone=mlflow burst=20 nodelay;
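Where nginx is not available, the same idea can be approximated at the application layer. The following is a minimal sketch of a per-client token bucket with the same numbers as the nginx example (10 requests per second, burst of 20). nginx itself uses a leaky-bucket algorithm, so this is an analog and not an exact reimplementation; the class names TokenBucket and PerClientLimiter are hypothetical and not part of MLflow.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self):
        """Consume one token if available; return True when the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client identifier (for example, the source IP).

    Not thread-safe, and the bucket table grows without bound; a real
    deployment would add locking and eviction of idle clients.
    """

    def __init__(self, rate=10, capacity=20, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets = {}

    def allow(self, client_id):
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[client_id] = bucket
        return bucket.allow()
```

A request handler would call limiter.allow(source_ip) before processing a /graphql request and return HTTP 429 when it yields False.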
🧯 If You Can't Patch
- Implement Web Application Firewall (WAF) rules to block suspicious GraphQL queries with large batch sizes
- Deploy MLflow behind a reverse proxy with strict request size limits and timeout configurations
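If a reverse proxy cannot be introduced, a request size limit can also be enforced in front of the application itself. The sketch below is a generic WSGI middleware that rejects POST requests to a path ending in /graphql when the declared body is missing a length or exceeds a cap. It assumes the MLflow server is launched through a WSGI entry point you control; the class name GraphQLBodyLimit and the 16 KiB default are hypothetical choices, and body size is only a rough proxy for batch size.

```python
class GraphQLBodyLimit:
    """WSGI middleware that caps the declared body size of GraphQL POSTs."""

    def __init__(self, app, max_bytes=16 * 1024, path_suffix="/graphql"):
        self.app = app
        self.max_bytes = max_bytes      # hypothetical default; tune per deployment
        self.path_suffix = path_suffix

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        is_graphql_post = (
            environ.get("REQUEST_METHOD") == "POST"
            and path.endswith(self.path_suffix)
        )
        if is_graphql_post:
            raw_length = environ.get("CONTENT_LENGTH") or ""
            if not raw_length.isdigit():
                # No usable Content-Length (for example, chunked upload): refuse.
                return self._reject(start_response, "411 Length Required")
            if int(raw_length) > self.max_bytes:
                return self._reject(start_response, "413 Payload Too Large")
        # Everything else passes through to the wrapped application untouched.
        return self.app(environ, start_response)

    @staticmethod
    def _reject(start_response, status):
        body = status.encode("ascii")
        headers = [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ]
        start_response(status, headers)
        return [body]
```

The middleware only inspects headers, so it adds negligible overhead, but it should complement rather than replace upgrading to the patched release.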
🔍 How to Verify
Check if Vulnerable:
Check whether the GraphQL endpoint responds without restriction to queries requesting large numbers of experiment runs
Check Version:
python -c "import mlflow; print(mlflow.__version__)"
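To turn the version check into a pass/fail result, the installed version can be compared against the patched release named in this advisory. The sketch below uses only the standard library; the function names are hypothetical. It compares numeric release components only, so a pre-release such as 2.18.0rc0 would be treated as patched, and it looks up the distribution named mlflow, which may miss installs that use a differently named package.

```python
import re
from importlib import metadata

PATCHED = (2, 18, 0)  # fixed release according to this advisory


def parse_version(text):
    """Return the leading numeric release components as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_patched(version_text, patched=PATCHED):
    """True when version_text is at or above the patched release."""
    # Tuple comparison avoids the string-comparison trap ("2.9" > "2.18").
    return parse_version(version_text) >= patched


def installed_mlflow_version():
    """Return the installed mlflow version string, or None if not installed."""
    try:
        return metadata.version("mlflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_mlflow_version()
    if version is None:
        print("mlflow is not installed in this environment")
    else:
        status = "patched" if is_patched(version) else "VULNERABLE"
        print(f"mlflow {version}: {status}")
```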
Verify Fix Applied:
Test that GraphQL queries with excessive batch sizes now return error responses or are properly rate-limited
📡 Detection & Monitoring
Log Indicators:
- Multiple large GraphQL queries in short timeframes
- High CPU/memory usage by MLflow workers
- Timeout errors in MLflow logs
Network Indicators:
- Unusually large HTTP POST requests to the /graphql endpoint
- Sustained high traffic to the GraphQL endpoint from a single source
SIEM Query:
source="mlflow.log" AND "graphql" AND ("timeout" OR "memory" OR "worker") | stats count by src_ip
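Without a SIEM, a similar per-source count can be produced directly from access logs. The sketch below assumes the server or proxy writes access lines in the common/combined log format (client IP first, then a quoted request line and a status code); that format, the function names, and the threshold of 100 are assumptions for illustration, not something MLflow guarantees.

```python
import re
from collections import Counter

# Assumed common/combined log format:
#   IP - - [timestamp] "METHOD /path HTTP/x.y" status size ...
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def graphql_posts_by_ip(lines, path_suffix="/graphql"):
    """Count POST requests to the GraphQL path, grouped by client IP."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0]  # ignore any query string
        if match.group("method") == "POST" and path.endswith(path_suffix):
            counts[match.group("ip")] += 1
    return counts


def noisy_clients(lines, threshold=100):
    """Return (ip, count) pairs at or above threshold, busiest first."""
    counts = graphql_posts_by_ip(lines)
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]
```

Running noisy_clients over a time-bounded slice of the log (for example, the last minute) gives a rough equivalent of the per-source count in the SIEM query.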