docker 使用 Nvidia 顯示卡

張首富發表於2020-05-17

docker19.03讀取NVIDIA顯示卡

作者: 張首富
時間: 2019-09-06
w x: y18163201

前言

2019年7月的docker 19.03已經正式釋出了,這次釋出對我來說有兩大亮點。
1,就是docker不需要root許可權來啟動喝執行了
2,就是支援GPU的增強功能,我們在docker裡面想讀取nvidia顯示卡再也不需要額外的安裝nvidia-docker

安裝nvidia驅動

確認已檢測到NVIDIA卡:

$ lspci -vv | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
        Subsystem: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB]
        Kernel modules: nvidiafb

這裡不再詳細介紹:如果不知道請移步ubuntu離線安裝TTS服務

安裝NVIDIA Container Runtime

$ cat nvidia-container-runtime-script.sh
 
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update

執行指令碼

sh nvidia-container-runtime-script.sh
OK
deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
Hit:1 http://archive.canonical.com/ubuntu bionic InRelease
Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  InRelease [1139 B]                
Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  InRelease [1136 B]           
Hit:4 http://security.ubuntu.com/ubuntu bionic-security InRelease                                       
Get:5 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  Packages [4076 B]                 
Get:6 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  Packages [3084 B]            
Hit:7 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic InRelease
Hit:8 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:9 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-backports InRelease
Fetched 9435 B in 1s (17.8 kB/s)                   
Reading package lists... Done
$ apt-get install nvidia-container-runtime
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  grub-pc-bin libnuma1
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
Get:1 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  libnvidia-container1 1.0.2-1 [59.1 kB]
Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  libnvidia-container-tools 1.0.2-1 [15.4 kB]
Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  nvidia-container-runtime-hook 1.4.0-1 [575 kB]

...
Unpacking nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
Setting up libnvidia-container1:amd64 (1.0.2-1) ...
Setting up libnvidia-container-tools (1.0.2-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up nvidia-container-runtime-hook (1.4.0-1) ...
Setting up nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
which nvidia-container-runtime-hook
/usr/bin/nvidia-container-runtime-hook

安裝docker-19.03

# step 1: 安裝必要的一些系統工具
yum install -y yum-utils device-mapper-persistent-data lvm2
# Step 2: 新增軟體源資訊
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Step 3: 更新並安裝 Docker-CE
yum makecache fast
yum -y install docker-ce-19.03.2
# Step 4: 開啟Docker服務
systemctl start docker && systemctl enable docker

驗證docker版本是否安裝正常

$ docker version
Client: Docker Engine - Community
 Version:           19.03.2
 API version:       1.40
 Go version:        go1.12.8
 Git commit:        6a30dfc
 Built:             Thu Aug 29 05:28:55 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.2
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.8
  Git commit:       6a30dfc
  Built:            Thu Aug 29 05:27:34 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

驗證下-gpus選項

$ docker run --help | grep -i gpus
      --gpus gpu-request               GPU devices to add to the container ('all' to pass all GPUs)

執行利用GPU的Ubuntu容器

 $ docker run -it --rm --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
f476d66f5408: Pull complete 
8882c27f669e: Pull complete 
d9af21273955: Pull complete 
f5029279ec12: Pull complete 
Digest: sha256:d26d529daa4d8567167181d9d569f2a85da3c5ecaf539cace2c6223355d69981
Status: Downloaded newer image for ubuntu:latest
Tue May  7 15:52:15 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    22W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
:~$ 

故障排除

您是否遇到以下錯誤訊息:

$ docker run -it --rm --gpus all debian
docker: Error response from daemon: linux runtime spec devices: could not select device driver "" with capabilities: [[gpu]].

上述錯誤意味著Nvidia無法正確註冊Docker。它實際上意味著驅動程式未正確安裝在主機上。這也可能意味著安裝了nvidia容器工具而無需重新啟動docker守護程式:您需要重新啟動docker守護程式。

我建議你回去驗證是否安裝了nvidia-container-runtime或者重新啟動Docker守護程式。

列出GPU裝置

$ docker run -it --rm --gpus all ubuntu nvidia-smi -L
GPU 0: Tesla P4 (UUID: GPU-fa974b1d-3c17-ed92-28d0-805c6d089601)
$ docker run -it --rm --gpus all ubuntu nvidia-smi  --query-gpu=index,name,uui
d,serial --format=csv
index, name, uuid, serial
0, Tesla P4, GPU-fa974b1d-3c17-ed92-28d0-805c6d089601, 0325017070224

原文轉載至: https://collabnix.com/introducing-new-docker-cli-api-support-for-nvidia-gpus-under-docker-engine-19-03-0-beta-release/

相關文章