Summary
CVE-2025-23266 is a critical flaw in NVIDIA Container Toolkit (up to version 1.17.7) and GPU Operator (up to 25.3.0) caused by improper handling of OCI hooks, allowing attackers to inject malicious code via the LD_PRELOAD environment variable. This lets them execute code with root privileges on the host, breaking container isolation and compromising the entire system and workloads, especially in multi-tenant AI and cloud GPU environments. NVIDIA has patched this in Container Toolkit 1.17.8 and GPU Operator 25.3.1, and urges immediate updates and monitoring to prevent exploitation.
Urgent Actions Required
- Upgrade NVIDIA Container Toolkit to version 1.17.8 or later immediately.
- Upgrade NVIDIA GPU Operator for Linux to version 25.3.1 or later.
- Disable the enable-cuda-compat hook in legacy runtimes by setting disable-cuda-compat-lib-hook = true in /etc/nvidia-container-toolkit/config.toml.
- For GPU Operator deployed with Helm, set the toolkit environment variable NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES to disable-cuda-compat-lib-hook during deployment or upgrade.
- Monitor containers for suspicious LD_PRELOAD usage and unexpected shared library loading.
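To decide which hosts still need the upgrades listed above, the installed version can be compared against the fixed releases. The following is a minimal sketch: the helper names `parse_version` and `needs_upgrade` are hypothetical, and how the version string is obtained on a given host (for example, from the output of `nvidia-ctk --version`) is an assumption left to the reader.

```python
import re

# Fixed releases named in this advisory.
FIXED_TOOLKIT = (1, 17, 8)
FIXED_OPERATOR = (25, 3, 1)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first x.y.z version in text, e.g. 'v1.17.7-1' -> (1, 17, 7)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(installed: str, fixed: tuple[int, ...]) -> bool:
    """Return True when the installed version is older than the fixed release."""
    return parse_version(installed) < fixed


print(needs_upgrade("1.17.7", FIXED_TOOLKIT))    # True
print(needs_upgrade("v25.3.1", FIXED_OPERATOR))  # False
```

Comparing integer tuples avoids the string-comparison pitfall where "1.17.10" would sort before "1.17.8".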
Which Systems Are Vulnerable to CVE-2025-23266?
Technical Overview
- Vulnerability Type: Container Escape via OCI Hook Environment Variable Injection
- Affected Software/Versions:
  - NVIDIA Container Toolkit versions up to and including 1.17.7
  - NVIDIA GPU Operator versions up to and including 25.3.0 (Linux, CDI mode)
- Attack Vector: Adjacent Network
- CVSS Score: 9.0
- Exploitability Metrics:
  - Attack Vector: Adjacent Network
  - Attack Complexity: Low
  - Privileges Required: Low (container run privileges)
  - User Interaction: None
- Patch Availability: Yes, patches released in Container Toolkit 1.17.8+ and GPU Operator 25.3.1+[1]
How Does the CVE-2025-23266 Exploit Work?
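The attack typically follows these steps:
- The attacker builds a container image that sets LD_PRELOAD to a shared library shipped inside the image (for example, /proc/self/cwd/poc.so).
- The image is started on a host running a vulnerable NVIDIA Container Toolkit or GPU Operator.
- During container creation, the toolkit’s createContainer hook picks up the environment variables defined by the image, including LD_PRELOAD.
- The hook process, which runs as root on the host, loads the attacker’s library, giving the attacker code execution outside the container.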
What Causes CVE-2025-23266?
Vulnerability Root Cause:
The NVIDIA Container Toolkit does not sanitize the environment variables that a container image passes to its createContainer hook. An attacker can set LD_PRELOAD in the image so that the hook loads a malicious shared library. Because the hook runs with root privileges on the host, this breaks container isolation and gives the attacker full control of the host.
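The general defensive pattern against this class of bug is to strip dynamic-linker variables from untrusted input before starting a privileged process. The sketch below illustrates that idea only; it is not NVIDIA's actual patch, and the function name `sanitize_hook_env` and the variable list are assumptions chosen for illustration.

```python
# Dynamic-linker variables that can make a process load attacker-chosen code.
# Illustrative list, not exhaustive.
UNSAFE_LOADER_VARS = {"LD_PRELOAD", "LD_LIBRARY_PATH", "LD_AUDIT"}


def sanitize_hook_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env without variables that influence the dynamic linker."""
    return {key: value for key, value in env.items() if key not in UNSAFE_LOADER_VARS}


# Environment as an untrusted image might define it.
untrusted = {
    "PATH": "/usr/bin:/bin",
    "LD_PRELOAD": "/proc/self/cwd/poc.so",
    "NVIDIA_VISIBLE_DEVICES": "all",
}
clean = sanitize_hook_env(untrusted)
print(sorted(clean))  # ['NVIDIA_VISIBLE_DEVICES', 'PATH']
```

A privileged helper would then be launched with `clean` as its environment rather than inheriting the container's variables unchanged.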
How Can You Mitigate CVE-2025-23266?
If immediate patching is delayed or not possible:
- Disable the enable-cuda-compat hook by setting disable-cuda-compat-lib-hook = true in /etc/nvidia-container-toolkit/config.toml for legacy container runtimes.
- For GPU Operator deployments, add disable-cuda-compat-lib-hook to the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable via Helm arguments.
- Manually upgrade NVIDIA Container Toolkit to version 1.17.8 on affected hosts.
- Monitor container images and workloads for suspicious activity or unexpected environment variable changes related to OCI hooks.
- Restrict running untrusted or arbitrary container images on shared GPU infrastructure.
Which Assets and Systems Are at Risk?
Asset Types Affected:
- AI/ML Compute Nodes – Systems running vulnerable NVIDIA Container Toolkit (≤ v1.17.7) or GPU Operator (≤ v25.3.0), especially in GPU-accelerated workloads.
- Multi-tenant Cloud & Kubernetes Environments – Shared GPU clusters where different users deploy their own containers on the same host, so a single malicious container can compromise the host and every other tenant’s workloads.
Will Patching CVE-2025-23266 Cause Downtime?
Patch application impact: Low. The fixed releases are:
- NVIDIA Container Toolkit v1.17.8
- GPU Operator v25.3.1
These updates are straightforward and limited in scope. If you’re using CI/CD for deployments (e.g., Helm in Kubernetes), applying the updates should result in minimal to no downtime.
How Can You Detect CVE-2025-23266 Exploitation?
Exploitation Signatures:
- Containers setting LD_PRELOAD to unusual .so files, especially paths like /proc/self/cwd/poc.so.
- Execution of shared libraries from container filesystems at the host level during startup.
- nvidia-ctk running the enable-cuda-compat hook and showing unexpected behavior, such as loading libraries from a container filesystem.
- Creation or changes to .so files in container-accessible host paths.
- OCI createContainer hook triggered with suspicious environment variables.
- Containers writing unexpected files to the host (e.g., /owned), signaling root access.
- GPU containers performing unauthorized actions or showing elevated privileges.
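The first signature above can be checked before a container ever runs by inspecting the environment an image declares. The sketch below scans a list of "KEY=VALUE" strings, the form used by the `Env` array in an OCI image configuration, and returns any LD_PRELOAD values. The function name `suspicious_preloads` is hypothetical, and fetching the `Env` array from a registry or runtime is left out.

```python
def suspicious_preloads(env: list[str]) -> list[str]:
    """Return every non-empty LD_PRELOAD value found in 'KEY=VALUE' entries."""
    hits = []
    for entry in env:
        key, sep, value = entry.partition("=")
        if key == "LD_PRELOAD" and sep and value:
            hits.append(value)
    return hits


image_env = [
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin",
    "LD_PRELOAD=/proc/self/cwd/poc.so",
]
print(suspicious_preloads(image_env))  # ['/proc/self/cwd/poc.so']
```

Any non-empty result deserves review, since legitimate GPU images rarely need to set LD_PRELOAD at the image level.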
Remediation & Response
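Patch/Upgrade Instructions:
- Upgrade NVIDIA Container Toolkit to version 1.17.8 or later on every affected host.
- Upgrade NVIDIA GPU Operator to version 25.3.1 or later.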
Mitigation Steps if No Patch:
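- Disable the enable-cuda-compat hook for legacy runtimes by setting disable-cuda-compat-lib-hook = true in /etc/nvidia-container-toolkit/config.toml.
- For GPU Operator, use your deployment tool (e.g., Helm) to add disable-cuda-compat-lib-hook to NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES so the vulnerable hook is not loaded.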
Incident Response Considerations:
- Watch for containers that set LD_PRELOAD to unusual libraries—especially files placed inside the container.
- Be alert to shared libraries being loaded from container file systems during hook execution.
- Monitor any suspicious behavior by the NVIDIA hook process (nvidia-ctk) that deviates from normal activity.
Compliance & Governance Notes
Audit Trail Requirement:
- Log all upgrades to NVIDIA Container Toolkit 1.17.8 and GPU Operator 25.3.1, including date, time, environment (e.g., production or staging), and person responsible.
- Track any configuration changes that disable the enable-cuda-compat hook, with timestamps and approval information.
Where Can I Find More Information on CVE-2025-23266?
- Security Bulletin: NVIDIA Container Toolkit – July 2025 | NVIDIA
- CVE-2025-23266 – Red Hat Customer Portal
- CVE-2025-23266 : NVIDIA Container Toolkit for all platforms contains a vulnerability in some hook
- CVE-2025-23266
- CVE-2025-23266 – NVIDIA Container Toolkit Privilege Escalation Vulnerability
- NVIDIA Container Toolkit for all platforms contains a… · CVE-2025-23266 · GitHub Advisory Database · GitHub
CVSS Breakdown Table
| Metric | Value | Description | 
|---|---|---|
| Base Score | 9.0 | Critical severity, indicating both high impact and ease of exploitation | 
| Attack Vector | Adjacent Network | Requires access to shared GPU environment—not fully remote | 
| Attack Complexity | Low | Straightforward exploitation without special conditions | 
| Privileges Required | Low (container run privileges) | No elevated privileges; ability to run a container suffices | 
| User Interaction | None | Exploit doesn’t rely on user input | 
| Scope | Changed | Host is compromised beyond the initial container boundary | 
| Confidentiality Impact | High | Allows unauthorized access to host data and other workloads | 
| Integrity Impact | High | Full host-level control permits altering critical files and configurations | 
| Availability Impact | High | Attack can disrupt GPU services and host operations |
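The base score in the table can be reproduced from the individual metrics. The sketch below applies the CVSS v3.1 base-score equations for the Scope: Changed case, using the numeric weights the CVSS v3.1 specification assigns to the metric values listed above. The function and constant names are illustrative.

```python
import math

# CVSS v3.1 numeric weights for the metric values in the table.
AV_ADJACENT = 0.62
AC_LOW = 0.77
PR_LOW_SCOPE_CHANGED = 0.68  # Privileges Required: Low, when Scope is Changed
UI_NONE = 0.85
CIA_HIGH = 0.56


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest number with one decimal that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_changed(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector whose Scope metric is Changed."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability), 10))


score = base_score_scope_changed(
    AV_ADJACENT, AC_LOW, PR_LOW_SCOPE_CHANGED, UI_NONE,
    CIA_HIGH, CIA_HIGH, CIA_HIGH,
)
print(score)  # 9.0
```

The Changed scope is what lifts the score to 9.0: it raises the Privileges Required weight and applies the 1.08 multiplier to the combined impact and exploitability.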