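- Relationship between the GPU, NVIDIA Graphics Drivers, CUDA, CUDA Toolkit and cuDNN
- Deciding what you need
  - Using PyTorch only
  - Using third-party torch extensions
- Installing NVIDIA Graphics Drivers (optional)
  - Preface
  - Linux
    - Method 1: GUI installation (recommended)
    - Method 2: download manually and install from the command line (not recommended)
  - Windows
    - Method 1: automatic installation via GeForce Experience
    - Method 2: manual installation
  - Verifying the installation
- Installing the CUDA Toolkit
  - Checking the driver version
  - Linux
  - Windows
  - Verifying the installation
  - Switching versions
    - Linux
    - Windows
  - Uninstalling the CUDA Toolkit on Linux
- Installing PyTorch
  - Checking the CUDA version supported by the driver
  - Downloading PyTorch
- Installing cuDNN
  - Linux
    - Method 1: extract the tar archive (recommended)
    - Method 2: install the deb package (not recommended)
  - Windows
  - Verifying the installation
    - Linux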
Relationship between the GPU, NVIDIA Graphics Drivers, CUDA, CUDA Toolkit and cuDNN
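- GPU: the physical graphics card.
- NVIDIA Graphics Drivers: the driver for the physical graphics card.
- CUDA: a general-purpose parallel computing architecture from NVIDIA, i.e. a parallel computing platform and programming model that lets the GPU solve complex computational problems. The driver-side part of CUDA is bundled with NVIDIA Graphics Drivers, so it does not need to be installed separately.
- CUDA Toolkit: contains the CUDA runtime API, nvcc (the compiler for CUDA code; CUDA has its own language, and the code must be compiled before it can run), debugging tools and more. In short, the CUDA Toolkit is the kit for developing CUDA programs. You have to download and install it yourself. Its installer can also optionally install NVIDIA Graphics Drivers, which saves a step.
- cuDNN: a GPU-accelerated library built on the CUDA Toolkit and designed specifically for the basic operations of deep neural networks. You have to download and install it yourself; "installing" it really just means copying a few library files to the right paths.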
Deciding what you need
Using PyTorch only
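If you only use torch, you need neither the CUDA Toolkit nor cuDNN, only the graphics driver: conda or pip takes care of everything else, because the PyTorch packages bring their own CUDA runtime and cuDNN.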
The installation order is: NVIDIA Graphics Drivers -> PyTorch
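Since the PyTorch packages bundle their own CUDA runtime and cuDNN, you can ask PyTorch which versions it brought along. A minimal sketch that also works before PyTorch is installed; the helper name torch_cuda_report is hypothetical, while torch.version.cuda, torch.backends.cudnn.version() and torch.cuda.is_available() are public PyTorch attributes:

```python
from typing import Any, Dict


def torch_cuda_report() -> Dict[str, Any]:
    """Summarize the CUDA pieces shipped with the installed PyTorch (hypothetical helper)."""
    try:
        import torch  # present only after PyTorch has been installed
    except ImportError:
        return {"torch": None}
    return {
        "torch": torch.__version__,
        # CUDA runtime the package was built against; None for CPU-only builds
        "cuda_runtime": torch.version.cuda,
        # bundled cuDNN as an integer (e.g. 8700 for 8.7.0); None if unavailable
        "cudnn": torch.backends.cudnn.version(),
        # True only when a working driver and a visible GPU are present
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_cuda_report())
```

If cuda_runtime is set but gpu_available is False, the driver is the first thing to check.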
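Using third-party torch extensions
The CUDA Toolkit is required.
This applies when installing third-party modules built on torch, such as tiny-cuda-nn, nvdiffrast and simple-knn. Without the CUDA Toolkit, building them fails in torch/utils/cpp_extension.py with an error like this: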
File "....../torch/utils/cpp_extension.py", line 1076, in CUDAExtension
library_dirs += library_paths(cuda=True)
File "....../torch/utils/cpp_extension.py", line 1203, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
File "....../torch/utils/cpp_extension.py", line 2416, in _join_cuda_home
raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
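The error means PyTorch cannot locate a CUDA Toolkit installation. Besides the CUDA_HOME variable, PyTorch also looks for nvcc on PATH and for the default /usr/local/cuda directory, and none of these exist until the CUDA Toolkit is installed.
This does not matter when you only use PyTorch, because the PyTorch packages already bring everything they need and do not depend on a CUDA installation on the system. To build other extension modules, however, the problem has to be solved.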
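To see why the Toolkit matters, here is a simplified Python sketch of the lookup order that, as far as I understand, torch.utils.cpp_extension follows when resolving the CUDA root: CUDA_HOME or CUDA_PATH, then the directory above nvcc on PATH, then /usr/local/cuda if it exists. It is an approximation (the Windows fallback is left out), not PyTorch's actual code, and find_cuda_home is a hypothetical name; with PyTorch installed, the value it resolved can be read from torch.utils.cpp_extension.CUDA_HOME.

```python
import os
import shutil
from typing import Mapping, Optional


def find_cuda_home(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Approximate how PyTorch extensions locate the CUDA Toolkit root (Linux only)."""
    # 1. An explicit environment variable always wins.
    home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if home:
        return home
    # 2. Otherwise derive the root from nvcc: <root>/bin/nvcc -> <root>.
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Fall back to the conventional install location of the CUDA Toolkit.
    default = "/usr/local/cuda"
    return default if os.path.exists(default) else None


if __name__ == "__main__":
    # None here corresponds to the "CUDA_HOME environment variable is not set" error.
    print(find_cuda_home())
```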
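Deep learning researchers run into such extension modules all the time, so I recommend installing the CUDA Toolkit directly and letting its installer install the graphics driver at the same time.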
The installation order is therefore: NVIDIA Graphics Drivers (optional; can be installed together with the CUDA Toolkit) -> CUDA Toolkit -> PyTorch -> cuDNN
Installing NVIDIA Graphics Drivers (optional)
Preface
The CUDA Toolkit installer can install NVIDIA Graphics Drivers along with the toolkit, so this step can be skipped entirely. I describe it here anyway.
Linux
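Method 1: GUI installation (recommended)
After switching to a suitable package mirror, update and upgrade the system. The upgrade is mandatory: otherwise the NVIDIA driver will not take effect, and after the next reboot Linux will not even show its graphical desktop!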
sudo apt update
sudo apt upgrade
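Install the required dependencies. gcc, g++ and cmake are mandatory; otherwise the NVIDIA driver will not take effect, and after the next reboot the graphical desktop will not even appear!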
sudo apt install gcc cmake
sudo apt install g++
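Then choose the driver in the graphical interface (on Ubuntu, the Additional Drivers tab of Software & Updates) and install it.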
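Before installing the driver, a quick Python check can confirm that the build tools listed above are actually on PATH. A minimal sketch using only the standard library; the function name missing_tools is hypothetical:

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("gcc", "g++", "cmake")) -> List[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install first: sudo apt install " + " ".join(missing))
    else:
        print("gcc, g++ and cmake are all available")
```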
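Method 2: download manually and install from the command line (not recommended)
I have never used this method and do not recommend it; when a graphical interface is available, there is no need for the extra effort.
Windows
Method 1: automatic installation via GeForce Experience
Download GeForce Experience from the NVIDIA website. Once it is installed, you can download the latest driver directly within the app.
Method 2: manual installation
On the same NVIDIA download page, search for the driver that matches your graphics card model, then download and install it.
Verifying the installation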
nvidia-smi
If the output looks similar to the following, NVIDIA Graphics Drivers are installed:
Sat Jan 27 14:35:37 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| N/A 35C P5 23W / 115W | 1320MiB / 8192MiB | 13% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3719 G /usr/lib/xorg/Xorg 489MiB |
| 0 N/A N/A 3889 G /usr/bin/gnome-shell 53MiB |
| 0 N/A N/A 4218 C+G fantascene-dynamic-wallpaper 406MiB |
| 0 N/A N/A 8052 G gnome-control-center 2MiB |
| 0 N/A N/A 8397 G ...--variations-seed-version 282MiB |
| 0 N/A N/A 13242 G ...RendererForSitePerProcess 59MiB |
| 0 N/A N/A 47273 G ...--variations-seed-version 18MiB |
+-----------------------------------------------------------------------------+
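When scripting this check, the driver version and the highest supported CUDA version can be pulled out of the header row shown above. A sketch assuming the header keeps the "Driver Version: ... CUDA Version: ..." layout seen in this output; parse_nvidia_smi is a hypothetical helper. (For the driver version alone, nvidia-smi --query-gpu=driver_version --format=csv,noheader also works.)

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches e.g. "Driver Version: 525.147.05   CUDA Version: 12.0" in the nvidia-smi header.
_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_nvidia_smi(text: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, max_cuda_version) from nvidia-smi output, or None."""
    match = _HEADER.search(text)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    try:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi failed: the driver is probably not installed")
    else:
        print(parse_nvidia_smi(out))
```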
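Installing the CUDA Toolkit
Checking the driver version
Each CUDA Toolkit release requires a minimum driver version; the requirements are listed in the compatibility table of NVIDIA's CUDA Toolkit release notes (I checked them in January 2024).
If you skipped the driver installation, simply download the latest CUDA Toolkit; its installer can install a matching driver.
If a driver is already installed, check its version with: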
nvidia-smi
The header shows that my driver version is 525.147.05:
Sat Jan 27 14:35:37 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |
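Comparing your driver against the release-notes table is a numeric, component-by-component version comparison. A minimal sketch; the two table entries are illustrative values that I believe match the release notes for CUDA 11.8 and 12.0 on Linux, so verify them against the current table before relying on them. version_tuple and driver_supports are hypothetical helpers.

```python
from typing import Dict, Tuple

# ASSUMPTION: illustrative minimum Linux driver versions per toolkit release;
# confirm them in the CUDA Toolkit release notes before use.
MIN_LINUX_DRIVER: Dict[str, str] = {
    "11.8": "520.61.05",
    "12.0": "525.60.13",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """'525.147.05' -> (525, 147, 5) so that versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(driver: str, toolkit: str, table: Dict[str, str] = MIN_LINUX_DRIVER) -> bool:
    """True if `driver` is at least the minimum listed in `table` for `toolkit`."""
    return version_tuple(driver) >= version_tuple(table[toolkit])


if __name__ == "__main__":
    print(driver_supports("525.147.05", "11.8"))
```

Comparing tuples rather than strings matters here: as plain strings, "525.147.05" sorts before "525.60.13" because "1" comes before "6".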
Linux
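Go to the CUDA Toolkit download page on the NVIDIA website, select the version you need, then choose the options that match your system step by step.
Once everything is selected, the page shows the installation instructions; just follow them.
The page offers three installer types. I first tried the second one, whose instructions were: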
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
The apt-get -y install cuda step then failed with:
han@ASUS-TUF-Gaming-F15-FX507ZR:~$ sudo apt-get -y install cuda
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
libnvidia-common-525 : Conflicts: libnvidia-common
libnvidia-common-545 : Conflicts: libnvidia-common
nvidia-kernel-common-525 : Conflicts: nvidia-kernel-common
nvidia-kernel-common-545 : Conflicts: nvidia-kernel-common
E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.
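The message shows that the driver packages libnvidia-common-525 and nvidia-kernel-common-525 are already installed, while the cuda metapackage also wants the 545-series driver packages (libnvidia-common-545 and nvidia-kernel-common-545). Both series declare a conflict with libnvidia-common and nvidia-kernel-common, so they cannot be installed side by side, and the existing packages would have to be replaced. In theory there are two ways to solve this:
- Replace the packages: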
sudo apt-get remove libnvidia-common-525 nvidia-kernel-common-525
sudo apt-get install libnvidia-common nvidia-kernel-common
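- Give up on apt and install with aptitude instead (aptitude may first need to be installed with sudo apt install aptitude):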
sudo aptitude install cuda
I tried neither and switched to another installation method offered on the website, the runfile:
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run
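Running the runfile first spends a while extracting itself, so be patient. The installer then walks you through the setup; accept the prompts until you reach the component selection screen.
- Note 1: if you skipped the driver installation, tick the first item, Driver. I had already installed the driver, so I did not tick Driver and simply went ahead with the installation.
- Note 2: ticking Kernel Objects here makes the installation fail. I do not know the exact reason; it may be because a driver was already installed. Either way, do not tick Kernel Objects.
After I chose Install, the installation also failed with an error saying dkms was not installed, so I ran sudo apt install dkms and tried again. This time it succeeded and printed: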
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.8/
Please make sure that
- PATH includes /usr/local/cuda-11.8/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.8/lib64, or, add /usr/local/cuda-11.8/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.8/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 520.00 is required for CUDA 11.8 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
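Following these hints, add the environment variables to ~/.bashrc. Pointing them at the /usr/local/cuda symlink instead of /usr/local/cuda-11.8 keeps them valid after switching versions, and the single quotes write $PATH and $LD_LIBRARY_PATH literally so they are only expanded when ~/.bashrc runs:
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
The installation is now complete.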
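After opening a new terminal (or running source ~/.bashrc), a small Python check can confirm that both variables now contain the CUDA directories. A sketch assuming the /usr/local/cuda prefix used above and Linux's ":" separator; check_cuda_env is a hypothetical helper:

```python
import os
from typing import Dict, Mapping, Set


def check_cuda_env(env: Mapping[str, str] = os.environ, prefix: str = "/usr/local/cuda") -> Dict[str, bool]:
    """Report whether PATH has <prefix>/bin and LD_LIBRARY_PATH has <prefix>/lib64."""

    def entries(name: str) -> Set[str]:
        # Linux separates entries with ':'; normpath drops trailing slashes.
        return {os.path.normpath(p) for p in env.get(name, "").split(":") if p}

    return {
        "PATH": os.path.join(prefix, "bin") in entries("PATH"),
        "LD_LIBRARY_PATH": os.path.join(prefix, "lib64") in entries("LD_LIBRARY_PATH"),
    }


if __name__ == "__main__":
    print(check_cuda_env())
```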
Windows
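On Windows it is simpler: go to the website, select the appropriate version, download the exe installer and follow the setup wizard.
- Note: again, tick or untick the driver component depending on whether you have already installed a driver.
The environment variables are configured automatically, so nothing needs to be set by hand. The installer adds these two environment variables, CUDA_PATH_V11_8 and CUDA_PATH: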
CUDA_PATH_V11_8=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
The installer also adds these two entries to the Path environment variable:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp
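The CUDA_PATH_Vxx_y naming also makes it easy to see from Python which toolkits the Windows installer has registered. A sketch based on the variable names shown above; installed_cuda_from_env is a hypothetical helper:

```python
import os
import re
from typing import List, Mapping


def installed_cuda_from_env(env: Mapping[str, str]) -> List[str]:
    """List versions from CUDA_PATH_V<major>_<minor> variables, e.g. ['11.8', '12.0']."""
    versions = []
    for name in env:
        # Windows variable names are case-insensitive, so compare in upper case.
        match = re.fullmatch(r"CUDA_PATH_V(\d+)_(\d+)", name.upper())
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    return [f"{major}.{minor}" for major, minor in sorted(versions)]


if __name__ == "__main__":
    print("registered:", installed_cuda_from_env(os.environ))
    print("active CUDA_PATH:", os.environ.get("CUDA_PATH"))
```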
Verifying the installation
Open a new terminal and run:
nvcc -V
You should see output similar to this:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
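To compare the installed toolkit with the driver programmatically, the release number can be read from the nvcc -V output. A sketch assuming the "release X.Y" wording shown above; nvcc_release is a hypothetical helper:

```python
import re
import subprocess
from typing import Optional


def nvcc_release(text: str) -> Optional[str]:
    """Extract the toolkit release (e.g. '11.8') from `nvcc -V` output."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvcc not found: check PATH or the CUDA Toolkit installation")
    else:
        print(nvcc_release(out))
```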
Switching versions
Linux
The cuda symlink lives in /usr/local/. Inspect it with:
ls -l /usr/local/
The output should look similar to this:
han@ASUS-TUF-Gaming-F15-FX507ZR:~$ ls -l /usr/local/
total 40
lrwxrwxrwx 1 root root 21 1月 27 16:43 cuda -> /usr/local/cuda-11.8/
drwxr-xr-x 17 root root 4096 1月 27 16:44 cuda-11.8
drwxr-xr-x 2 root root 4096 8月 9 2022 etc
drwxr-xr-x 2 root root 4096 8月 9 2022 games
drwxr-xr-x 2 root root 4096 8月 9 2022 include
drwxr-xr-x 2 root root 4096 1月 27 16:38 kernelobjects
drwxr-xr-x 3 root root 4096 1月 22 15:26 lib
lrwxrwxrwx 1 root root 9 1月 22 14:10 man -> share/man
drwxr-xr-x 3 root root 4096 1月 23 21:52 Qt-5.6.3
drwxr-xr-x 2 root root 4096 8月 9 2022 sbin
drwxr-xr-x 8 root root 4096 1月 23 22:09 share
drwxr-xr-x 2 root root 4096 8月 9 2022 src
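Here cuda points to the cuda-11.8 directory I just installed. One machine can hold several CUDA Toolkit versions, and switching between them only requires changing this symlink.
For example, if CUDA Toolkit 12.0 were also installed, you could switch to it as follows (-s creates a symbolic link, -n replaces an existing link to a directory instead of creating the new link inside that directory, and -f overwrites it):
sudo ln -snf /usr/local/cuda-12.0 /usr/local/cuda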
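The same inspection and switch can be scripted. A Python sketch assuming toolkits live in cuda-X.Y directories next to the cuda symlink, as in the listing above; cuda_versions, switch_cuda and demo are hypothetical helpers. Instead of calling ln, switch_cuda creates a temporary link and moves it over the old one with os.replace, so the swap is atomic. On the real /usr/local it needs root; demo runs on a throwaway directory.

```python
import os
import re
import tempfile
from typing import List, Optional, Tuple


def _key(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def cuda_versions(root: str = "/usr/local") -> Tuple[Optional[str], List[str]]:
    """Return (active_version, installed_versions) from root/cuda and root/cuda-X.Y."""
    installed = sorted(
        (name[len("cuda-"):] for name in os.listdir(root) if re.fullmatch(r"cuda-\d+\.\d+", name)),
        key=_key,
    )
    link = os.path.join(root, "cuda")
    active = None
    if os.path.islink(link):
        target = os.path.basename(os.path.realpath(link))  # e.g. "cuda-11.8"
        if target.startswith("cuda-"):
            active = target[len("cuda-"):]
    return active, installed


def switch_cuda(version: str, root: str = "/usr/local") -> None:
    """Point root/cuda at root/cuda-<version>, like `ln -snf`, but atomically."""
    target = os.path.join(root, f"cuda-{version}")
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    link = os.path.join(root, "cuda")
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename(2) swaps the old symlink for the new one in one step


def demo() -> Tuple[Tuple[Optional[str], List[str]], Tuple[Optional[str], List[str]]]:
    """Exercise both helpers on a temporary directory instead of /usr/local."""
    with tempfile.TemporaryDirectory() as root:
        for version in ("11.8", "12.0"):
            os.mkdir(os.path.join(root, f"cuda-{version}"))
        os.symlink(os.path.join(root, "cuda-11.8"), os.path.join(root, "cuda"))
        before = cuda_versions(root)
        switch_cuda("12.0", root)
        return before, cuda_versions(root)


if __name__ == "__main__":
    print(demo())
```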
Windows
Change the automatically added CUDA_PATH variable so it points to the matching directory:
CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0
Also change the two entries that were added to the Path variable:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\libnvvp
Uninstalling the CUDA Toolkit on Linux
Recall the summary printed at the end of the installation:
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.8/bin
So uninstalling only takes a command like the following (adjust the version number):
sudo /usr/local/cuda-11.8/bin/cuda-uninstaller
Installing PyTorch
Checking the CUDA version supported by the driver
Use the same command:
nvidia-smi
The header shows that the highest CUDA version my driver supports is 12.0:
Sat Jan 27 14:35:37 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |
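Downloading PyTorch
Open the PyTorch website and run the install command generated for your environment. Pay close attention to the highest CUDA version supported by the driver you just installed: the CUDA version of the PyTorch build you choose must not be higher than it. For example, my driver supports at most CUDA 12.0, so I chose the CUDA 11.8 build.
Installing with conda works as well.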
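The selection rule above (take the newest PyTorch CUDA build that is not higher than the driver's limit) is easy to express in code. A sketch in which the list of builds is an example input, not a statement of what pytorch.org currently offers; pick_torch_cuda and wheel_index are hypothetical helpers, and the index URL follows PyTorch's https://download.pytorch.org/whl/cuXYZ pattern:

```python
from typing import Iterable, Optional, Tuple


def _key(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def pick_torch_cuda(driver_max: str, builds: Iterable[str]) -> Optional[str]:
    """Newest CUDA build in `builds` that is not higher than `driver_max`."""
    usable = [b for b in builds if _key(b) <= _key(driver_max)]
    return max(usable, key=_key) if usable else None


def wheel_index(cuda_version: str) -> str:
    """'11.8' -> 'https://download.pytorch.org/whl/cu118' (value for pip --index-url)."""
    return "https://download.pytorch.org/whl/cu" + cuda_version.replace(".", "")


if __name__ == "__main__":
    choice = pick_torch_cuda("12.0", ["11.8", "12.1"])  # example build list
    if choice:
        print(f"pip3 install torch torchvision torchaudio --index-url {wheel_index(choice)}")
```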
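Installing cuDNN
Each cuDNN release supports specific CUDA versions. On NVIDIA's cuDNN download page, select a release that matches the installed CUDA Toolkit and download it. The installation steps are documented in NVIDIA's official cuDNN installation guide.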
Linux
Following the official guide, first install the dependency:
sudo apt-get install zlib1g
Method 1: extract the tar archive (recommended)
After downloading, extract the archive:
tar -xvf cudnn-linux-*-archive.tar.xz
Then copy the files and make them readable, and you are done:
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
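To confirm which cuDNN version ended up in /usr/local/cuda/include, read the version macros from cudnn_version.h, the header where cuDNN 8 and later define CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. A sketch; cudnn_version_from_header is a hypothetical helper, and the header path assumes the copy step above:

```python
import os
import re
from typing import Optional

HEADER = "/usr/local/cuda/include/cudnn_version.h"  # location used by the copy step above


def cudnn_version_from_header(text: str) -> Optional[str]:
    """Join CUDNN_MAJOR/MINOR/PATCHLEVEL from the header text into e.g. '8.9.7'."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{macro}\s+(\d+)", text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    if os.path.exists(HEADER):
        with open(HEADER) as f:
            print(cudnn_version_from_header(f.read()))
    else:
        print(f"{HEADER} not found: the headers were not copied")
```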
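Method 2: install the deb package (not recommended)
The deb package is actually more complicated.
- After downloading, install the local repository package with dpkg: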
sudo dpkg -i cudnn-local-repo-*.deb
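- Import the CUDA GPG key and refresh the package lists:
sudo cp /var/cudnn-local-repo-*/cudnn-local-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
- Install the runtime, developer and samples packages: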
sudo apt-get install libcudnn8=x.x.x.x-1+cudaX.Y
sudo apt-get install libcudnn8-dev=x.x.x.x-1+cudaX.Y
sudo apt-get install libcudnn8-samples=x.x.x.x-1+cudaX.Y
Replace x.x.x.x and X.Y with your cuDNN and CUDA versions.
Windows
On Windows only the archive method is available. Download and extract the zip file, then copy the library files as follows:
- Copy bin\cudnn*.dll to C:\Program Files\NVIDIA\CUDNN\v8.x\bin.
- Copy include\cudnn*.h to C:\Program Files\NVIDIA\CUDNN\v8.x\include.
- Copy lib\cudnn*.lib to C:\Program Files\NVIDIA\CUDNN\v8.x\lib.
Then add this entry to the PATH environment variable:
C:\Program Files\NVIDIA\CUDNN\v8.x\bin
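Verifying the installation
Linux
The CUDA Toolkit ships demo programs that check the driver and the CUDA runtime; note that they do not exercise cuDNN itself. Run /usr/local/cuda/extras/demo_suite/deviceQuery; the output should look similar to this: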
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 3070 Laptop GPU"
CUDA Driver Version / Runtime Version 12.0 / 11.8
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 7952 MBytes (8337752064 bytes)
(40) Multiprocessors, (128) CUDA Cores/MP: 5120 CUDA Cores
GPU Max Clock rate: 1560 MHz (1.56 GHz)
Memory Clock rate: 7001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, CUDA Runtime Version = 11.8, NumDevs = 1, Device0 = NVIDIA GeForce RTX 3070 Laptop GPU
Result = PASS
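The last two lines are the useful part when scripting this check: the driver's CUDA version must be at least the runtime version, and the result must be PASS. A sketch that parses them from the output format shown above; parse_device_query is a hypothetical helper:

```python
import re
from typing import Dict, Tuple, Union


def _key(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def parse_device_query(text: str) -> Dict[str, Union[str, bool, None]]:
    """Extract driver/runtime CUDA versions and the PASS/FAIL result from deviceQuery output."""
    driver = re.search(r"CUDA Driver Version = ([\d.]+)", text)
    runtime = re.search(r"CUDA Runtime Version = ([\d.]+)", text)
    result = re.search(r"Result = (\w+)", text)
    info: Dict[str, Union[str, bool, None]] = {
        "driver": driver.group(1) if driver else None,
        "runtime": runtime.group(1) if runtime else None,
        "result": result.group(1) if result else None,
    }
    # The driver must support at least the runtime version the toolkit was built with.
    info["ok"] = (
        info["result"] == "PASS"
        and driver is not None
        and runtime is not None
        and _key(driver.group(1)) >= _key(runtime.group(1))
    )
    return info
```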
Run /usr/local/cuda/extras/demo_suite/bandwidthTest; the output should look similar to this:
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: NVIDIA GeForce RTX 3070 Laptop GPU
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12499.4
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12843.0
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 384586.5
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
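To exercise cuDNN itself, one option is a small PyTorch convolution, which PyTorch dispatches to cuDNN when cuDNN is enabled. Note that pip and conda builds of PyTorch load the cuDNN bundled with them, so this confirms PyTorch's GPU path rather than the files copied into /usr/local/cuda. A sketch; cudnn_conv_check is a hypothetical helper and the tensor sizes are arbitrary:

```python
from typing import Optional, Tuple


def cudnn_conv_check() -> Optional[Tuple[int, ...]]:
    """Run one small GPU convolution with cuDNN enabled; None if that is not possible."""
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        return None  # PyTorch is not installed
    if not (torch.cuda.is_available() and torch.backends.cudnn.is_available()):
        return None
    torch.backends.cudnn.enabled = True  # the default, set explicitly for clarity
    x = torch.randn(1, 3, 32, 32, device="cuda")  # one 3-channel 32x32 image
    w = torch.randn(8, 3, 3, 3, device="cuda")  # eight 3x3 filters
    y = F.conv2d(x, w, padding=1)  # padding=1 keeps the 32x32 spatial size
    torch.cuda.synchronize()  # surface any asynchronous CUDA error here
    return tuple(y.shape)


if __name__ == "__main__":
    print(cudnn_conv_check())  # (1, 8, 32, 32) when the GPU path works
```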
This article was published with OpenWrite, a multi-platform blog publishing tool!