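Building Caffe with GPU support on CentOS: build process and error summary
Introduction
This post records the problems I ran into while installing Caffe. In an earlier post on building OpenCV for CPU, I said the GPU build of OpenCV was the slowest thing I had compiled. Caffe turns out to be just as bad, and its version issues in particular are harder to deal with than OpenCV's.
Installation notes
This post assumes a few prerequisites are already in place. The environment is: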
- opencv 3.4.6
- cuda 10.2與cudnn 7.6.5
- anaconda3 / python 3.6
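All three setups are covered in my earlier blog posts.
OpenCV was built against Python 3, which is why I took the harder route of building Caffe for Python 3 instead of Python 2. Caffe has not been updated in years and the GitHub version still recommends Python 2. Although neither is maintained any more, some frameworks still depend on this environment.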
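Project dependencies
On CentOS (gflags, glog, lmdb and leveldb come from the EPEL repository, so run sudo yum install epel-release first if it is not enabled):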
sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel
sudo yum install gflags-devel glog-devel lmdb-devel
For reference, on Ubuntu:
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libopenblas-dev liblapack-dev libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
Noted; everything below uses CentOS.
Next, clone the Caffe source from GitHub:
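Before building, it can help to confirm that the libraries behind these packages are actually visible to the system. The following Python sketch uses the standard-library `ctypes.util.find_library`; the list of linker names is my own assumption about which libraries the packages above provide, and libraries installed under non-standard prefixes (for example a source-built HDF5) may not be reported.

```python
from ctypes.util import find_library

# Linker names (no "lib" prefix, no ".so" suffix) assumed for the packages above.
CAFFE_DEPS = [
    "protobuf", "leveldb", "snappy", "boost_system", "hdf5", "hdf5_hl",
    "gflags", "glog", "lmdb",
]


def check_libraries(names):
    """Map each linker name to the soname find_library resolves, or None."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, soname in check_libraries(CAFFE_DEPS).items():
        print(f"{name:14} {soname or 'NOT FOUND'}")
```

A `NOT FOUND` entry points at a package that did not install, or one that lives outside the default loader paths.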
git clone https://github.com/BVLC/caffe.git
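If needed, create a virtual environment first, then install the Python dependencies from the caffe/python directory, where requirements.txt lives: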
for req in $(cat requirements.txt); do pip install $req; done
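This is the installation method Caffe recommends. When it finishes, run pip install -r requirements.txt once more; if every package reports "already satisfied", the Python dependencies are complete. You may also need the Boost and HDF5 shared libraries: the HDF5 installed with yum or apt above is probably fine, but check whether the Boost shared libraries are installed.
Installation process
First, open Makefile.config in the caffe directory (or copy Makefile.config.example to Makefile.config) and edit it with vim: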
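The shell loop installs the requirements one at a time so that a single failing package does not stop the rest. A rough Python equivalent, as a sketch: unlike the shell loop it also skips blank lines and `#` comments, and `parse_requirements` / `install_each` are hypothetical helper names.

```python
import subprocess
import sys


def parse_requirements(text):
    """Split requirements.txt content into specifiers, skipping blanks and comments."""
    reqs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            reqs.append(line)
    return reqs


def install_each(reqs, runner=subprocess.run):
    """Run `python -m pip install <req>` once per requirement; return the failures."""
    failed = []
    for req in reqs:
        result = runner([sys.executable, "-m", "pip", "install", req])
        if result.returncode != 0:
            failed.append(req)
    return failed


if __name__ == "__main__":
    # Run from caffe/python, where Caffe's requirements.txt lives.
    with open("requirements.txt") as fh:
        failures = install_each(parse_requirements(fh.read()))
    print("failed:", failures or "none")
```

Using `sys.executable -m pip` ties the install to the interpreter running the script, which matters when several Pythons (system and Anaconda) are present.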
## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!
# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1
# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1
# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0
# This code is taken from https://github.com/sh1r0/caffe-android-lib
# USE_HDF5 := 0
# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
# You should not set this flag if you will be reading LMDBs with any
# possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1
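# Uncomment if you're using OpenCV 3
# OpenCV 3 (3.4.x) is installed here. Caffe's Makefile only adds opencv_imgcodecs when this
# value is exactly 3, so use 3 rather than the full version number.
OPENCV_VERSION := 3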
# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# Linux uses g++ by default; OSX uses clang++.
# CUSTOM_CXX := g++
# CUDA directory contains bin/ and lib/ directories that we need.
# CUDA install path
CUDA_DIR := /usr/local/cuda-10.2
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
# Compute-capability codes for the target GPUs; choose them to match your GPU.
# With CUDA >= 9.0 the *_20 and *_21 lines must be commented out, without their trailing backslashes.
# -gencode arch=compute_20,code=sm_20
# -gencode arch=compute_20,code=sm_21
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_52,code=sm_52 \
-gencode arch=compute_60,code=sm_60 \
-gencode arch=compute_61,code=sm_61 \
-gencode arch=compute_61,code=compute_61
# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
# Set atlas for ATLAS, mkl for MKL, or open for OpenBLAS. In my install BLAS had to be open:
# with the default (atlas) the shared libraries could not be found.
BLAS := open
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
BLAS_INCLUDE := /usr/include/atlas
BLAS_LIB := /usr/lib64/atlas
# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib
# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# Whether to build the MATLAB interface of Caffe
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app
# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
# PYTHON_INCLUDE := /usr/include/python2.7 \
# /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# Caffe supports building against Anaconda: comment out the system Python lines, uncomment the Anaconda ones, and fix the paths
ANACONDA_HOME := $(HOME)/anaconda3
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
$(ANACONDA_HOME)/include/python3.6m \
$(ANACONDA_HOME)/lib/python3.6/site-packages/numpy/core/include
# Uncomment to use Python 3 (default is Python 2)
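# Linux defaults to Python 2.7, and the Boost in the yum and Ubuntu repositories was no newer than 1.55 for me,
# so Boost had to be rebuilt for Python 3; enable this line to make sure Python 2 is not used.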
PYTHON_LIBRARIES := boost_python3 python3.6m
# PYTHON_INCLUDE := /usr/include/python3.5m \
# /usr/lib/python3.5/dist-packages/numpy/core/include
# We need to be able to find libpythonX.X.so or .dylib.
# PYTHON_LIB := /usr/lib
# Python library path (it can also be added to LIBRARY_DIRS below)
PYTHON_LIB := $(ANACONDA_HOME)/lib
# Homebrew installs numpy in a non standard path (keg only)
# Locate the current environment's numpy with a one-line Python script
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib
# Uncomment to support layers written in Python (will link against Python libs)
# Enable this to write layers in Python
WITH_PYTHON_LAYER := 1
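# Whatever else you find you need goes here.
# If Caffe cannot find a header, add its directory here; each listed directory must really exist
# and contain the headers, or the build still fails.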
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/local/cuda-10.2/include /usr/include/hdf5/serial /xxxx/Opencv2/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/local/cuda-10.2/lib /usr/lib/x86_64-linux-gnu/hdf5/serial /xxxx/Opencv2/lib
# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib
# NCCL acceleration switch (uncomment to build with NCCL)
# https://github.com/NVIDIA/nccl (last tested version: v1.2.3-1+cuda8.0)
# USE_NCCL := 1
# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# Usually not needed: pkg-config was already set up when I installed OpenCV, so Caffe does not need it; leave it commented out.
# USE_PKG_CONFIG := 1
# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute
# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1
# The ID of the GPU that 'make runtest' will use to run unit tests.
# ID of the GPU to use
TEST_GPUID := 0
# enable pretty build (comment to see full commands)
Q ?= @
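Many of the errors in the summary come down to a path in `INCLUDE_DIRS` or `LIBRARY_DIRS` that does not exist. A small Python sketch that reads `Makefile.config`, expands `$(VAR)` references (falling back to environment variables such as `HOME`), and lists the directories that are missing. It only understands simple `:=`, `=`, `?=` and `+=` assignments and does not evaluate `$(shell ...)`, so treat it as a rough check rather than a Make parser.

```python
import os
import re
import sys

ASSIGN = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*(:=|\?=|\+=|=)\s*(.*)$")
REF = re.compile(r"\$\((\w+)\)")


def parse_config(text):
    """Collect simple Makefile variable assignments, joining backslash continuations."""
    logical, buf = [], ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if line.endswith("\\"):
            buf += line[:-1] + " "
            continue
        logical.append(buf + line)
        buf = ""
    if buf:
        logical.append(buf)
    variables = {}
    for line in logical:
        m = ASSIGN.match(line.strip())
        if not m:
            continue
        name, op, value = m.group(1), m.group(2), m.group(3).strip()
        if op == "+=":
            variables[name] = (variables.get(name, "") + " " + value).strip()
        elif op == "?=":
            variables.setdefault(name, value)
        else:
            variables[name] = value
    return variables


def expand(value, variables, max_passes=10):
    """Replace $(NAME) with Makefile variables, then environment variables."""
    for _ in range(max_passes):
        new = REF.sub(lambda m: variables.get(m.group(1), os.environ.get(m.group(1), "")), value)
        if new == value:
            break
        value = new
    return value


def missing_dirs(text, keys=("INCLUDE_DIRS", "LIBRARY_DIRS")):
    """For each key, list the expanded directories that do not exist."""
    variables = parse_config(text)
    return {
        key: [d for d in expand(variables.get(key, ""), variables).split() if not os.path.isdir(d)]
        for key in keys
    }


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "Makefile.config"
    with open(path) as fh:
        for key, dirs in missing_dirs(fh.read()).items():
            print(key, "OK" if not dirs else "missing: " + " ".join(dirs))
```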
If any of the settings above are wrong, the resulting errors are listed in the error summary below. Once Makefile.config is correct, edit the Makefile:
# Replace:
NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
# with:
NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
Also add opencv_imgcodecs to the OpenCV libraries:
LIBRARIES += glog gflags protobuf leveldb \
snappy lmdb boost_system hdf5_hl hdf5 m \
opencv_core opencv_highgui opencv_imgproc
# change to
LIBRARIES += glog gflags protobuf leveldb \
snappy lmdb boost_system hdf5_hl hdf5 m \
opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs
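Both Makefile edits can also be applied with a script. This is a sketch under two assumptions: the NVCCFLAGS line starts with `NVCCFLAGS += -ccbin=$(CXX)` (with or without the space after `+=`), and the first line mentioning `opencv_imgproc` without `opencv_imgcodecs` is the one to extend. It keeps a `Makefile.bak` copy before writing.

```python
import re
import shutil

NVCC_OLD = re.compile(r"^NVCCFLAGS\s*\+=\s*-ccbin=\$\(CXX\).*$", re.MULTILINE)
NVCC_NEW = "NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)"


def patch_makefile(text):
    """Rewrite the NVCCFLAGS line and add opencv_imgcodecs after opencv_imgproc."""
    text = NVCC_OLD.sub(lambda _m: NVCC_NEW, text)
    lines = text.split("\n")
    for i, line in enumerate(lines):
        # Extend only the first OpenCV library line that still lacks imgcodecs.
        if "opencv_imgproc" in line and "opencv_imgcodecs" not in line:
            lines[i] = line.replace("opencv_imgproc", "opencv_imgproc opencv_imgcodecs", 1)
            break
    return "\n".join(lines)


if __name__ == "__main__":
    shutil.copyfile("Makefile", "Makefile.bak")  # keep the original
    with open("Makefile") as fh:
        patched = patch_makefile(fh.read())
    with open("Makefile", "w") as fh:
        fh.write(patched)
```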
Then build:
make all -j12
make test -j12
make runtest
make pycaffe
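The four build steps can be chained from Python so that the sequence stops at the first failing target. A minimal sketch, assuming Caffe was cloned to `~/caffe` and using `os.cpu_count()` in place of the fixed `-j12`:

```python
import os
import subprocess


def build_commands(jobs):
    """The make targets from above, in order."""
    return [
        ["make", "all", f"-j{jobs}"],
        ["make", "test", f"-j{jobs}"],
        ["make", "runtest"],
        ["make", "pycaffe"],
    ]


def run_build(caffe_root, jobs=None, runner=subprocess.run):
    """Run each target in caffe_root; check=True raises on the first failure."""
    jobs = jobs or os.cpu_count() or 1
    for cmd in build_commands(jobs):
        print("+", " ".join(cmd), flush=True)
        runner(cmd, cwd=caffe_root, check=True)


if __name__ == "__main__":
    run_build(os.path.expanduser("~/caffe"))
```

`make runtest` is left without `-j` as in the original steps, since the GPU tests run on a single device.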
A healthy make runtest run includes output like this:
[----------] 1 test from PlatformTest
[ RUN ] PlatformTest.TestInitialization
Major revision number: 7
Minor revision number: 5
Name: Quadro RTX 5000
Total global memory: 16908615680
Total shared memory per block: 49152
Total registers per block: 65536
Warp size: 32
Maximum memory pitch: 2147483647
Maximum threads per block: 1024
Maximum dimension 0 of block: 1024
Maximum dimension 1 of block: 1024
Maximum dimension 2 of block: 64
Maximum dimension 0 of grid: 2147483647
Maximum dimension 1 of grid: 65535
Maximum dimension 2 of grid: 65535
Clock rate: 1815000
Total constant memory: 65536
Texture alignment: 512
Concurrent copy and execution: Yes
Number of multiprocessors: 48
Kernel execution timeout: No
Unified virtual addressing: Yes
[ OK ] PlatformTest.TestInitialization (0 ms)
[----------] 1 test from PlatformTest (0 ms total)
Once everything passes, add Caffe's Python directory to the environment:
export PYTHONPATH=~/caffe/python:$PYTHONPATH
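At this point Caffe (pycaffe) can be imported from anywhere; add the export to ~/.bashrc to keep it in new shells.
Error summary
src/caffe/data_transformer.cpp:2:33: fatal error: opencv2/core/core.hpp: No such file or directory
LIBRARY_DIRS and INCLUDE_DIRS must include the corresponding paths (here, OpenCV's), for example: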
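To confirm that the export took effect and that pycaffe can drive the GPU, a short sketch. It assumes Caffe lives in `~/caffe` and uses pycaffe's `caffe.set_mode_gpu()` and `caffe.set_device()`; device 0 matches `TEST_GPUID` in the config.

```python
import importlib
import os


def pycaffe_on_path(caffe_root="~/caffe", env=None):
    """True if <caffe_root>/python is listed in PYTHONPATH."""
    env = os.environ if env is None else env
    target = os.path.normpath(os.path.join(os.path.expanduser(caffe_root), "python"))
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    return any(os.path.normpath(os.path.expanduser(p)) == target for p in entries if p)


def try_gpu(device_id=0):
    """Import pycaffe and switch it to GPU mode; return (ok, message)."""
    try:
        caffe = importlib.import_module("caffe")
    except ImportError as exc:
        return False, f"import caffe failed: {exc}"
    caffe.set_mode_gpu()
    caffe.set_device(device_id)
    return True, f"pycaffe {getattr(caffe, '__version__', '?')} using GPU {device_id}"


if __name__ == "__main__":
    print("PYTHONPATH contains caffe/python:", pycaffe_on_path())
    print(try_gpu(0))
```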
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial /xxxx/Opencv2/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial /xxxx/Opencv2/lib
Without them, the build fails with: src/caffe/data_transformer.cpp:2:33: fatal error: opencv2/core/core.hpp: No such file or directory.
Unsupported gpu architecture 'compute_20'
The full error is:
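To see which, if any, of the configured include directories really contains the header the compiler is looking for, a small Python sketch. The directory list mirrors the config, and `/xxxx/Opencv2/include` stays a placeholder for your own OpenCV prefix.

```python
import os

# Mirrors INCLUDE_DIRS; /xxxx/Opencv2/include is a placeholder for your OpenCV prefix.
INCLUDE_DIRS = ["/usr/local/include", "/usr/include", "/xxxx/Opencv2/include"]


def find_header(relative, include_dirs):
    """Return the first directory d such that d/relative is a file, else None."""
    for d in include_dirs:
        if os.path.isfile(os.path.join(d, relative)):
            return d
    return None


if __name__ == "__main__":
    hit = find_header("opencv2/core/core.hpp", INCLUDE_DIRS)
    print(hit or "opencv2/core/core.hpp is in none of the INCLUDE_DIRS")
```

The directory that must be listed is the one *containing* `opencv2/`, not `opencv2/` itself, which is why the header is looked up by its relative path.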
NVCC src/caffe/layers/bnll_layer.cu
nvcc fatal : Unsupported gpu architecture 'compute_20'
Makefile:594: recipe for target '.build_release/cuda/src/caffe/layers/bnll_layer.o' failed
make: *** [.build_release/cuda/src/caffe/layers/bnll_layer.o] Error 1
make: *** Waiting for unfinished jobs....
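Since CUDA 9.0, the first two gencode lines in the config above (compute_20 with sm_20 and sm_21) must be commented out: CUDA 9.0 removed support for the Fermi architecture (compute capability 2.x). compute_30 is still accepted by CUDA 10.2.
/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory
The full error is: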
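Rather than editing the gencode lines by hand, the `CUDA_ARCH` value can be generated from the installed CUDA version. A sketch that parses `nvcc --version` (which prints a line like `Cuda compilation tools, release 10.2, V10.2.89`) and drops the Fermi entries for CUDA 9.0 and later. The capability list in the usage part is an assumption: 75 is included because the Quadro RTX 5000 in the test log reports compute capability 7.5.

```python
import re
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", output)
    if not m:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(m.group(1)), int(m.group(2))


def cuda_arch_lines(cuda_version, capabilities):
    """Build a CUDA_ARCH assignment; capabilities are ints such as 30, 61, 75."""
    caps = sorted(set(capabilities))
    if cuda_version >= (9, 0):
        caps = [c for c in caps if c >= 30]  # Fermi (2.x) was removed in CUDA 9.0
    if not caps:
        raise ValueError("no architectures left for this CUDA version")
    flags = []
    for c in caps:
        virtual = 20 if c == 21 else c  # sm_21 is compiled from compute_20
        flags.append(f"-gencode arch=compute_{virtual},code=sm_{c}")
    newest = 20 if caps[-1] == 21 else caps[-1]
    # Also embed PTX for the newest architecture so newer GPUs can JIT-compile it.
    flags.append(f"-gencode arch=compute_{newest},code=compute_{newest}")
    return "CUDA_ARCH := " + " \\\n\t\t".join(flags)


if __name__ == "__main__":
    # Python 3.6 compatible (no capture_output / text arguments).
    out = subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    # Assumed target list; 75 matches the Quadro RTX 5000 (compute capability 7.5).
    print(cuda_arch_lines(parse_nvcc_release(out), [30, 35, 50, 52, 60, 61, 75]))
```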
/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory
compilation terminated.
Makefile:582: recipe for target '.build_release/tools/caffe.o' failed
make: *** [.build_release/tools/caffe.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from /usr/include/boost/python/detail/prefix.hpp:13:0,
from /usr/include/boost/python/args.hpp:8,
from /usr/include/boost/python.hpp:11,
from src/caffe/layer_factory.cpp:4:
/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory
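There are two likely causes:
- Check that the paths in the config are correct: every directory listed in PYTHON_INCLUDE must actually exist.
- If the paths are fine and nothing else helps, reinstall the Python development headers: sudo apt-get install python-dev python3-dev (on CentOS: sudo yum install python-devel python3-devel).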
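The include directories can be read from the interpreter itself instead of guessed. A sketch using the standard `sysconfig` module and `numpy.get_include()`; run it with the Anaconda Python that Caffe should build against and paste the output into `Makefile.config`.

```python
import os
import sysconfig


def python_include_dirs():
    """Headers pycaffe needs: Python.h / pyconfig.h and numpy/arrayobject.h."""
    dirs = [sysconfig.get_paths()["include"]]
    config_dir = os.path.dirname(sysconfig.get_config_h_filename())
    if config_dir not in dirs:
        dirs.append(config_dir)  # usually the same directory; added if it differs
    try:
        import numpy
    except ImportError:
        return dirs  # numpy missing: install Caffe's requirements.txt first
    dirs.append(numpy.get_include())
    return dirs


def makefile_lines():
    """PYTHON_INCLUDE / PYTHON_LIB lines for the running interpreter."""
    include = " \\\n\t\t".join(python_include_dirs())
    return f"PYTHON_INCLUDE := {include}\nPYTHON_LIB := {sysconfig.get_config_var('LIBDIR')}"


if __name__ == "__main__":
    print(makefile_lines())
```

For the Anaconda 3.6 setup used here, this should point at `$(ANACONDA_HOME)/include/python3.6m` and the numpy `core/include` directory, the same paths written by hand in the config.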
libhdf5_hl.so.100 (XXX): cannot open shared object file: No such file or directory, Error 127
The hdf5-devel install above may have failed, or its version may not provide libhdf5_hl.so.100. This matters because in my configuration HDF5 was built from source, as follows:
$ gunzip < hdf5-X.Y.Z.tar.gz | tar xf - # unpack
$ cd hdf5-X.Y.Z
$ ./configure --prefix=/usr/local/hdf5 # install prefix
$ make
$ make check # run test suite.
$ make install
$ make check-install # verify installation.
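Then add the installed paths to the config above: with --prefix=/usr/local/hdf5, that means /usr/local/hdf5/include in INCLUDE_DIRS and /usr/local/hdf5/lib in LIBRARY_DIRS.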
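To check whether the loader can find the source-built library before rerunning Caffe, a Python sketch using `ctypes.CDLL`. The soname comes from the error message and `/usr/local/hdf5/lib` from the `--prefix` used in this build. The loader reads `LD_LIBRARY_PATH` only when a process starts, so the helper just prints the value to export in the shell.

```python
import ctypes
import os

SONAME = "libhdf5_hl.so.100"       # from the error message above
HDF5_LIB = "/usr/local/hdf5/lib"   # the --prefix used in the build, plus /lib


def can_load(soname):
    """True if the dynamic loader can open soname in this process."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def ld_library_path_with(lib_dir, env=None):
    """LD_LIBRARY_PATH value with lib_dir prepended once."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    if lib_dir not in parts:
        parts.insert(0, lib_dir)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    if can_load(SONAME):
        print(SONAME, "loads fine")
    else:
        print(f"{SONAME} not found; try:")
        print("export LD_LIBRARY_PATH=" + ld_library_path_with(HDF5_LIB))
```

To make the path permanent system-wide instead, put `/usr/local/hdf5/lib` in a file under `/etc/ld.so.conf.d/` and run `sudo ldconfig`.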
undefined symbol: _ZN5boost6python6detail11init_moduleER11PyModuleDefPFvvE
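This symbol is Boost.Python's Python 3 module-initialization entry point (it takes a PyModuleDef), so the error typically means pycaffe was compiled against Python 3 headers but linked against a Boost.Python built for Python 2 or for a different Python. The fix is to rebuild Boost from source for your Python 3; that is covered in my earlier posts, so I won't repeat it here.
undefined reference to `cv::imencode', `cv::imdecode', `cv::imread'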
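Since the library name has to match both the Boost build and the Python ABI, a sketch that derives the `PYTHON_LIBRARIES` line. It assumes the Boost convention that, from 1.67 on, Boost.Python is named with the Python version (e.g. `boost_python36`), while older builds commonly used `boost_python3`; the Python part comes from `sysconfig`'s `LDVERSION` (e.g. `3.6m`). The header path in the usage part is an assumption for a source-built Boost under `/usr/local`.

```python
import re
import sys
import sysconfig


def boost_version_from_header(text):
    """(major, minor) from boost/version.hpp; BOOST_VERSION = major*100000 + minor*100 + patch."""
    m = re.search(r"#define\s+BOOST_VERSION\s+(\d+)", text)
    if not m:
        raise ValueError("BOOST_VERSION not found")
    value = int(m.group(1))
    return value // 100000, value // 100 % 1000


def boost_python_name(boost_version, py_version):
    """Boost.Python library name; versioned (boost_python36) from Boost 1.67 on."""
    major, minor = py_version
    if boost_version >= (1, 67):
        return f"boost_python{major}{minor}"
    return "boost_python3" if major == 3 else "boost_python"


def python_libraries_line(boost_version, py_version=None, ldversion=None):
    """PYTHON_LIBRARIES line, e.g. 'boost_python36 python3.6m'."""
    py_version = py_version or sys.version_info[:2]
    ldversion = ldversion or sysconfig.get_config_var("LDVERSION")
    return f"PYTHON_LIBRARIES := {boost_python_name(boost_version, py_version)} python{ldversion}"


if __name__ == "__main__":
    # Assumed location for a source-built Boost installed under /usr/local.
    with open("/usr/local/include/boost/version.hpp") as fh:
        print(python_libraries_line(boost_version_from_header(fh.read())))
```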
.build_debug/lib/libcaffe.so: undefined reference to `cv::imencode(cv::String const&, cv::_InputArray const&, std::vector<unsigned char, std::allocator<unsigned char> >&, std::vector<int, std::allocator<int> > const&)'
.build_debug/lib/libcaffe.so: undefined reference to `cv::imdecode(cv::_InputArray const&, int)'
make: *** [.build_debug/tools/extract_features.bin] Error 1collect2: error: ld returned 1 exit status
make: *** [.build_debug/tools/compute_image_mean.bin] Error 1
.build_debug/lib/libcaffe.so: undefined reference to `cv::imread(cv::String const&, int)'
.build_debug/lib/libcaffe.so: undefined reference to `cv::imencode(cv::String const&, cv::_InputArray const&, std::vector<unsigned char, std::allocator<unsigned char> >&, std::vector<int, std::allocator<int> > const&)'
.build_debug/lib/libcaffe.so: undefined reference to `cv::imdecode(cv::_InputArray const&, int)'
collect2: error: ld returned 1 exit status
make: *** [.build_debug/tools/convert_imageset.bin] Error 1
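According to GitHub issue https://github.com/BVLC/caffe/issues/2348, the fix is to change this line in the Makefile to:
LIBRARIES += glog gflags protobuf leveldb \
snappy lmdb boost_system hdf5_hl hdf5 m \
opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs
Make sure the last entry, opencv_imgcodecs, is present: in OpenCV 3, imread, imencode and imdecode live in the imgcodecs module rather than highgui. The stock Makefile appends opencv_imgcodecs on its own only when OPENCV_VERSION is set to exactly 3.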
GPU device_alternate.hpp:34:10: fatal error: cublas_v2.h: No such file or directory
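This means the CUDA installation is broken and some headers are missing. Search the whole system for the file; if it is not there, reinstall CUDA.
/usr/bin/ld: cannot find -lopencv_core
With OpenCV there is usually more than one missing library; the most I hit at once was: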
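The system-wide search can be done with `find`, or with the Python sketch below. The search roots are assumptions: depending on how CUDA 10.x was installed, the cuBLAS headers may sit under the toolkit directory or elsewhere (for example `/usr/include`), so several roots are walked and symlinked duplicates are collapsed.

```python
import os

# Assumed search roots; adjust to where CUDA 10.2 is installed.
ROOTS = ["/usr/local/cuda-10.2", "/usr/local/cuda", "/usr/include"]


def find_files(filename, roots):
    """Walk each root and return the distinct real paths whose basename is filename."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                path = os.path.realpath(os.path.join(dirpath, filename))
                if path not in hits:
                    hits.append(path)
    return hits


if __name__ == "__main__":
    found = find_files("cublas_v2.h", ROOTS)
    print("\n".join(found) if found else "cublas_v2.h not found: the CUDA install looks incomplete")
```

If the header turns up outside `/usr/local/cuda-10.2/include`, adding that directory to `INCLUDE_DIRS` may be enough; if it is nowhere, reinstalling CUDA is the remaining option.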
CXX src/caffe/util/math_functions.cpp
CXX src/caffe/util/signal_handler.cpp
CXX src/caffe/util/upgrade_proto.cpp
AR -o .build_release/lib/libcaffe.a
LD -o .build_release/lib/libcaffe.so.1.0.0
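/usr/bin/ld: cannot find -lopencv_core
/usr/bin/ld: cannot find -lopencv_highgui
/usr/bin/ld: cannot find -lopencv_imgproc
/usr/bin/ld: cannot find -lopencv_imgcodecs
/usr/bin/ld: cannot find -lboost_python3
/usr/bin/ld: cannot find -lcblas
/usr/bin/ld: cannot find -latlas
collect2: error: ld returned 1 exit status
These have to be fixed one by one. For OpenCV built from source, its shared libraries may not be on the linker path, so add the OpenCV lib directory (/xxxx/Opencv2/lib above) to LIBRARY_DIRS manually. For boost_python3, search the system for libboost_python36 and point the build at it (via PYTHON_LIBRARIES or a libboost_python3.so symlink). -lcblas and -latlas come from the ATLAS BLAS setting: install OpenBLAS (sudo apt-get install libopenblas-dev on Ubuntu, sudo yum install openblas-devel on CentOS) and set BLAS := open in Makefile.config.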
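For the `cannot find -l...` errors, a Python sketch that pulls the library names out of the linker output and looks for matching files. It encodes one detail of how `ld` resolves `-lNAME`: it needs an unversioned `libNAME.so` (or `libNAME.a`), so a directory holding only `libNAME.so.X` still fails, which is what the `-devel` packages or a manual symlink fix. The search directories and the script name in the usage comment are assumptions; add your OpenCV and Boost prefixes.

```python
import glob
import os
import re
import sys

# Assumed search directories; add your OpenCV / Boost / OpenBLAS prefixes.
SEARCH_DIRS = ["/usr/lib64", "/usr/lib", "/usr/local/lib", "/usr/local/lib64"]


def missing_link_libs(ld_output):
    """Library names from 'cannot find -lNAME' lines, in order of appearance."""
    return re.findall(r"cannot find -l([^\s:]+)", ld_output)


def locate(libname, search_dirs):
    """All files in search_dirs that look like lib<libname>.so* or lib<libname>.a."""
    hits = []
    for d in search_dirs:
        hits += sorted(glob.glob(os.path.join(d, f"lib{libname}.so*")))
        hits += glob.glob(os.path.join(d, f"lib{libname}.a"))
    return hits


def diagnose(libname, search_dirs):
    """One-line hint for a library the linker could not find."""
    hits = locate(libname, search_dirs)
    linkable = [h for h in hits if h.endswith((".so", ".a"))]
    if linkable:
        return f"-l{libname}: {linkable[0]} exists; add {os.path.dirname(linkable[0])} to LIBRARY_DIRS"
    if hits:
        return f"-l{libname}: only {hits[0]}; install the -devel package or symlink lib{libname}.so to it"
    return f"-l{libname}: not found in the searched directories"


if __name__ == "__main__":
    # Usage (hypothetical file name): make all 2>&1 | python3 find_libs.py
    for name in missing_link_libs(sys.stdin.read()):
        print(diagnose(name, SEARCH_DIRS))
```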