CVE-2025-0453

7.5 HIGH

📋 TL;DR

This vulnerability in MLflow's GraphQL endpoint allows attackers to cause denial of service by sending specially crafted queries that consume excessive server resources. Attackers can tie up all worker processes by requesting large batches of experiment run data, making the application unresponsive to legitimate requests. This affects all organizations running vulnerable MLflow versions with the GraphQL endpoint enabled.

💻 Affected Systems

Products:
  • mlflow/mlflow
Versions: 2.17.2 (the version named in the report); earlier versions with a similar GraphQL implementation may also be affected
Operating Systems: All platforms running MLflow
Default Config Vulnerable: ⚠️ Yes
Notes: Only affects deployments with GraphQL endpoint enabled; MLflow installations using only REST API may not be vulnerable.

⚠️ Risk & Real-World Impact

🔴 Worst Case

Complete service outage: MLflow stops responding, disrupting machine learning workflows, experiment tracking, and model deployment pipelines across the organization.

🟠 Likely Case

Degraded performance and intermittent service disruptions during attack periods, causing delays in ML operations and potentially corrupting ongoing experiments.

🟢 If Mitigated

Minimal impact with proper rate limiting and resource controls in place, potentially causing temporary slowdowns but preventing complete outages.

🌐 Internet-Facing: HIGH - If the GraphQL endpoint is exposed to the internet, attackers can easily exploit this without authentication to cause service disruption.
🏢 Internal Only: MEDIUM - Internal attackers or compromised internal systems can still exploit this, but they need network access to the endpoint and may be detected by internal monitoring.

🎯 Exploit Status

Public PoC: ✅ No
Weaponized: UNKNOWN
Unauthenticated Exploit: ⚠️ Yes
Complexity: LOW

Exploitation requires knowledge of GraphQL query structure but is straightforward once understood; no authentication needed to access the vulnerable endpoint.

🛠️ Fix & Mitigation

✅ Official Fix

Patch Version: Version 2.18.0 or later

Vendor Advisory: https://github.com/mlflow/mlflow/security/advisories

Restart Required: Yes

Instructions:

1. Upgrade MLflow to version 2.18.0 or later using pip: pip install --upgrade "mlflow>=2.18.0" (quote only the requirement, so the shell does not treat >= as a redirect)
2. Restart all MLflow services
3. Verify that the GraphQL endpoint now enforces query complexity limits

🔧 Temporary Workarounds

Disable GraphQL endpoint

Platform: all

Completely disable the vulnerable GraphQL endpoint if not required for your workflows

Set environment variable: export MLFLOW_DISABLE_GRAPHQL=true
Or modify mlflow configuration to disable GraphQL

Implement rate limiting

Platform: all

Add rate limiting at the network or application layer to prevent excessive query batches

Define a rate-limit zone in the nginx http block: limit_req_zone $binary_remote_addr zone=mlflow:10m rate=10r/s;
Apply it in the location block that proxies MLflow: limit_req zone=mlflow burst=20 nodelay;
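
Where a proxy is not available, the same idea can be applied at the application layer. The following is a minimal sketch of a per-client token bucket whose defaults loosely mirror the rate=10r/s and burst=20 values above. It is not nginx's own algorithm (nginx uses a leaky bucket), and the names TokenBucket and allow_request are hypothetical, not part of MLflow.

```python
import time


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, stores at most `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so the logic can be tested
        self.tokens = capacity        # start full, so an initial burst is allowed
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client address (hypothetical in-memory store; it is not
# shared between worker processes and never evicts old entries).
_buckets = {}


def allow_request(client_ip: str, rate: float = 10.0, burst: float = 20.0) -> bool:
    """Return True if this client may proceed, False if it should be throttled."""
    if client_ip not in _buckets:
        _buckets[client_ip] = TokenBucket(rate, burst)
    return _buckets[client_ip].allow()
```

A request handler in front of the GraphQL route would call allow_request with the client address and answer with HTTP 429 when it returns False.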

🧯 If You Can't Patch

  • Implement Web Application Firewall (WAF) rules to block suspicious GraphQL queries with large batch sizes
  • Deploy MLflow behind reverse proxy with strict request size limits and timeout configurations
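
As an illustration of what such a filtering rule might check, here is a rough sketch of a request-body heuristic. It assumes GraphQL requests arrive as JSON with a "query" string, optionally batched as a JSON array. The three limits and the function name is_suspicious are hypothetical values chosen for the example and should be tuned against legitimate traffic. Counting "{" characters is only a crude proxy for query size, not a real GraphQL parser.

```python
import json

# Assumed limits for illustration only.
MAX_BODY_BYTES = 16_384      # reject very large request bodies outright
MAX_OPERATIONS = 5           # maximum operations in one batched request
MAX_SELECTION_SETS = 20      # maximum "{" characters in a single query string


def is_suspicious(body: bytes) -> bool:
    """Heuristically flag GraphQL request bodies that look like oversized batches."""
    if len(body) > MAX_BODY_BYTES:
        return True
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return True

    # A batched request is a JSON array of operations; a single one is an object.
    operations = payload if isinstance(payload, list) else [payload]
    if len(operations) > MAX_OPERATIONS:
        return True

    for op in operations:
        if not isinstance(op, dict):
            return True
        query = op.get("query", "")
        if not isinstance(query, str):
            return True
        # Each selection set or input object opens with "{", so a high count
        # suggests many repeated fields or deep nesting.
        if query.count("{") > MAX_SELECTION_SETS:
            return True
    return False
```

A proxy hook or middleware could call is_suspicious on each POST body and reject flagged requests before they reach an MLflow worker.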

🔍 How to Verify

Check if Vulnerable:

Check whether the GraphQL endpoint answers queries that request large numbers of experiment runs without any restriction

Check Version:

python -c "import mlflow; print(mlflow.__version__)"
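
To compare the reported version against the patched release automatically, a small helper along these lines may be enough. It is a sketch that assumes 2.18.0 (from the Patch Version entry) is the first fixed release, and it ignores pre-release suffixes such as rc0 or .dev0. The helper names are hypothetical.

```python
import re

# First fixed release, taken from the "Patch Version" entry of this advisory.
FIXED_VERSION = (2, 18, 0)


def parse_release(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '2.17.2' becomes (2, 17, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_patched(version: str) -> bool:
    """Return True if `version` is at or above FIXED_VERSION.

    Pre-release suffixes are ignored, so '2.18.0rc0' counts as 2.18.0.
    """
    release = parse_release(version)
    # Pad short versions such as '2.18' so the tuple comparison is well defined.
    release += (0,) * (len(FIXED_VERSION) - len(release))
    return release >= FIXED_VERSION
```

In an environment with MLflow installed, calling is_patched(mlflow.__version__) gives the answer for the running installation.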

Verify Fix Applied:

Test that GraphQL queries with excessive batch sizes now return error responses or are properly rate-limited

📡 Detection & Monitoring

Log Indicators:

  • Multiple large GraphQL queries in short timeframes
  • High CPU/memory usage by MLflow workers
  • Timeout errors in MLflow logs

Network Indicators:

  • Unusually large HTTP POST requests to /graphql endpoint
  • Sustained high traffic to GraphQL endpoint from single source

SIEM Query:

source="mlflow.log" AND "graphql" AND ("timeout" OR "memory" OR "worker") | stats count by src_ip
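
Without a SIEM, a similar per-source count can be produced directly from proxy access logs. The sketch below assumes log lines in the common/combined access-log format with the client address as the first field. That format, and the function name graphql_posts_by_ip, are assumptions for illustration, so adjust the pattern to the logs actually in use.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape: <ip> ... "POST <path> HTTP/x.y" ...
_LINE = re.compile(r'^(?P<ip>\S+) .*?"POST (?P<path>\S+) HTTP/[\d.]+"')


def graphql_posts_by_ip(lines: Iterable[str], path: str = "/graphql") -> Counter:
    """Count POST requests to the GraphQL endpoint per client address."""
    counts = Counter()
    for line in lines:
        match = _LINE.match(line)
        if match is None:
            continue  # not a POST request, or not in the assumed format
        # Ignore any query string when comparing the request path.
        if match.group("path").split("?")[0] == path:
            counts[match.group("ip")] += 1
    return counts
```

Calling graphql_posts_by_ip on an open log file and then most_common(10) on the result lists the sources that send the most GraphQL traffic, which can be compared against the network indicators above.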
