Building Caffe with GPU Support on CentOS: Walkthrough and Error Summary

Published by submarineas on 2020-10-12

Introduction

This post records the problems I ran into while installing caffe. In my earlier post on building opencv with CPU support, I said the GPU build of opencv was the longest compile I had done, but caffe turns out to be every bit as bad, especially where version conflicts are concerned; those are even harder to untangle than opencv's.

Installation Notes

This post assumes a few prerequisites are already in place. The environment is:

  • opencv 3.4.6
  • cuda 10.2 with cudnn 7.6.5
  • anaconda3 / python 3.6

All three are covered in my earlier posts:

Setting up and testing a GPU environment on Linux from scratch

Compiling opencv with CPU support on Ubuntu 18.04

The opencv there was built against python3, which pushed me down the harder python3 route for caffe instead of python2. caffe has not been updated in years, and the GitHub repo still recommends python2 most strongly; both interpreters are end-of-life now, but some frameworks still depend on this environment.

Project Dependencies

On CentOS:

sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel
sudo yum install gflags-devel glog-devel lmdb-devel

For reference, on Ubuntu:

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libopenblas-dev liblapack-dev libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev

The rest of this post uses the CentOS commands.

Next, clone the caffe source from GitHub:

git clone https://github.com/BVLC/caffe.git

If necessary, create a virtual environment first, then install the python dependencies:

for req in $(cat requirements.txt); do pip install $req; done

This is the installation method caffe recommends. When it finishes, run pip install -r requirements.txt once more; if every package reports "already satisfied", the python dependencies are complete. You may also need the boost and hdf5 shared libraries: the hdf5 installed via yum or apt above is usually fine, but check whether the boost libraries are present (building boost from source is covered in the error summary below).
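The per-package loop matters: `pip install -r` aborts at the first package that fails to build, while the loop keeps going and installs everything it can. A small runnable sketch, with a made-up package name (`brokenpkg`) standing in for a real build failure:

```shell
# Simulate installing a requirements file where one entry fails.
cat > /tmp/requirements_demo.txt <<'EOF'
numpy
brokenpkg
protobuf
EOF

try_install() {          # stand-in for `pip install $1`
  if [ "$1" = "brokenpkg" ]; then
    echo "FAILED: $1"; return 1
  fi
  echo "installed: $1"
}

# the loop keeps going past the failure, unlike `pip install -r`:
for req in $(cat /tmp/requirements_demo.txt); do try_install "$req"; done
```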

Installation Steps

First go into the caffe directory and edit Makefile.config, or cp Makefile.config.example to Makefile.config, then modify the file:

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0
# This code is taken from https://github.com/sh1r0/caffe-android-lib
# USE_HDF5 := 0

# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
#	You should not set this flag if you will be reading LMDBs with any
#	possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1

# Uncomment if you're using OpenCV 3
# the installed opencv version is 3.4.7
OPENCV_VERSION := 3.4.7

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
# CUDA path
CUDA_DIR := /usr/local/cuda-10.2
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
# These encode GPU compute capabilities; for CUDA >= 9.0 the *_20 and *_21 lines must be removed
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
		-gencode arch=compute_35,code=sm_35 \
		-gencode arch=compute_50,code=sm_50 \
		-gencode arch=compute_52,code=sm_52 \
		-gencode arch=compute_60,code=sm_60 \
		-gencode arch=compute_61,code=sm_61 \
		-gencode arch=compute_61,code=compute_61

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
# Set to atlas for ATLAS, mkl for MKL, or open for OpenBLAS. In my build the
# default atlas shared libraries could not be found, so BLAS had to be open
BLAS := open
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
BLAS_INCLUDE := /usr/include/atlas
BLAS_LIB := /usr/lib64/atlas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# uncomment to build the MATLAB interface for caffe
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
# PYTHON_INCLUDE := /usr/include/python2.7 \
# 		/usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# caffe can build against Anaconda: comment out the system python include above,
# uncomment these lines, and fix the path
ANACONDA_HOME := $(HOME)/anaconda3
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
		$(ANACONDA_HOME)/include/python3.6m \
		$(ANACONDA_HOME)/lib/python3.6/site-packages/numpy/core/include

# Uncomment to use Python 3 (default is Python 2)
# Linux defaults to python 2.7, and neither the yum nor the ubuntu repos ship a
# boost newer than 1.55, so boost has to be rebuilt from source and this option
# enabled to make sure python2 is not used
PYTHON_LIBRARIES := boost_python3 python3.6m
# PYTHON_INCLUDE := /usr/include/python3.5m \
#                 /usr/lib/python3.5/dist-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
# PYTHON_LIB := /usr/lib
# This could also be appended to LIBRARY_DIRS below; it is the python library path
PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# a shell snippet that locates numpy in the current environment
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
# enable this to write layers in Python
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
# if caffe cannot find a header, add its path here; note that listing a
# directory that does not exist will itself cause an error
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/local/cuda-10.2/include /usr/include/hdf5/serial /xxxx/Opencv2/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/local/cuda-10.2/lib /usr/lib/x86_64-linux-gnu/hdf5/serial /xxxx/Opencv2/lib

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# NCCL acceleration switch (uncomment to build with NCCL)
# https://github.com/NVIDIA/nccl (last tested version: v1.2.3-1+cuda8.0)
# USE_NCCL := 1

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# Usually unnecessary: pkg-config was already installed before building opencv,
# so caffe does not need the paths specified this way
# USE_PKG_CONFIG := 1

# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
# ID of the GPU used by 'make runtest'
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @

If any of the settings above are wrong, the resulting errors are listed in the error summary below. Once Makefile.config is correct, edit the Makefile itself:

# change:
NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
# to:
NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
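For scripted setups the same edit can be applied with sed. This sketch runs on a scratch copy of the line rather than the real Makefile (keep a backup when editing the real one in place):

```shell
# Write the original NVCCFLAGS line to a scratch file, then splice in
# -D_FORCE_INLINES the same way the manual edit does.
printf 'NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)\n' > /tmp/makefile_demo
sed -i 's/NVCCFLAGS += -ccbin/NVCCFLAGS += -D_FORCE_INLINES -ccbin/' /tmp/makefile_demo
cat /tmp/makefile_demo
```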

Also add opencv_imgcodecs to LIBRARIES:

LIBRARIES += glog gflags protobuf leveldb \
	snappy lmdb boost_system hdf5_hl hdf5 m \
	opencv_core opencv_highgui opencv_imgproc
# change to:
LIBRARIES += glog gflags protobuf leveldb \
	snappy lmdb boost_system hdf5_hl hdf5 m \
	opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs

Then build:

make all -j12
make test -j12
make runtest
make pycaffe
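The -j12 matches my machine's thread count; a portable alternative is to derive the job count from the CPU count with nproc:

```shell
# Derive make's parallelism from the number of available CPUs
# instead of hard-coding -j12:
jobs=$(nproc)
echo "make all -j${jobs}"
```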

A successful run of the tests looks something like this:

[----------] 1 test from PlatformTest
[ RUN      ] PlatformTest.TestInitialization
Major revision number:         7
Minor revision number:         5
Name:                          Quadro RTX 5000
Total global memory:           16908615680
Total shared memory per block: 49152
Total registers per block:     65536
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     1024
Maximum dimension 0 of block:  1024
Maximum dimension 1 of block:  1024
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   2147483647
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   65535
Clock rate:                    1815000
Total constant memory:         65536
Texture alignment:             512
Concurrent copy and execution: Yes
Number of multiprocessors:     48
Kernel execution timeout:      No
Unified virtual addressing:    Yes
[       OK ] PlatformTest.TestInitialization (0 ms)
[----------] 1 test from PlatformTest (0 ms total)

Once everything passes, add pycaffe to the environment:

export PYTHONPATH=~/caffe/python:$PYTHONPATH

At this point caffe is usable globally.
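The export prepends the caffe python directory so that `import caffe` resolves first. A minimal check of the path logic, plus the usual smoke test (commented out, since it needs the finished build):

```shell
# Prepend the caffe bindings to PYTHONPATH; adjust the path to your checkout.
export PYTHONPATH="$HOME/caffe/python:$PYTHONPATH"
echo "$PYTHONPATH" | cut -d: -f1    # first entry should be the caffe dir

# once the build is done, this is the real smoke test:
#   python -c "import caffe; print(caffe.__version__)"
```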

Error Summary

src/caffe/data_transformer.cpp:2:33: fatal error: opencv2/core/core.hpp:

First, LIBRARY_DIRS and INCLUDE_DIRS must contain the relevant paths, otherwise the build fails:

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial /xxxx/Opencv2/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial /xxxx/Opencv2/lib

src/caffe/data_transformer.cpp:2:33: fatal error: opencv2/core/core.hpp: No such file or directory.


Unsupported gpu architecture 'compute_20'

The full error:

NVCC src/caffe/layers/bnll_layer.cu
nvcc fatal   : Unsupported gpu architecture 'compute_20'
Makefile:594: recipe for target '.build_release/cuda/src/caffe/layers/bnll_layer.o' failed
make: *** [.build_release/cuda/src/caffe/layers/bnll_layer.o] Error 1
make: *** Waiting for unfinished jobs....

For CUDA 9.0 and later, the first two lines of the CUDA_ARCH setting above must be removed; I assume this is because CUDA 9.0 dropped support for the compute_20 and compute_21 architectures.

/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or dir

The full error:

/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory
compilation terminated.
Makefile:582: recipe for target '.build_release/tools/caffe.o' failed
make: *** [.build_release/tools/caffe.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from /usr/include/boost/python/detail/prefix.hpp:13:0,
                 from /usr/include/boost/python/args.hpp:8,
                 from /usr/include/boost/python.hpp:11,
                 from src/caffe/layer_factory.cpp:4:
/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory

There are two likely causes:

  • Check the paths in the config file; every directory listed in PYTHON_INCLUDE must actually exist
  • If the paths are fine and nothing else helps, reinstall the python headers: sudo apt-get install python-dev python3-dev (on CentOS the equivalent package is python3-devel)
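To find the include directory that actually contains pyconfig.h for the interpreter you are building against, the standard-library sysconfig module reports it directly; PYTHON_INCLUDE should contain this path:

```shell
# Print the C include dir (the directory holding pyconfig.h / Python.h)
# of the python3 on PATH:
python3 -c "import sysconfig; print(sysconfig.get_path('include'))"
```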

libhdf5_hl.so.100 (XXX): cannot open shared object file: No such file or directory, Error 127

The hdf5-devel package above may fail to install, or may provide a version other than libhdf5_hl.so.100. This library matters: the configuration file above works because I installed hdf5 from source, as follows:

$ gunzip < hdf5-X.Y.Z.tar.gz | tar xf -   # extract
$ cd hdf5-X.Y.Z
$ ./configure --prefix=/usr/local/hdf5  # install prefix
$ make
$ make check                # run test suite.
$ make install
$ make check-install        # verify installation.

Then add the resulting include and lib paths to the config file above, since the prefix here puts everything under /usr/local/hdf5.
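With a non-standard prefix, the runtime linker also needs to find the new libraries. One option, assuming the /usr/local/hdf5 prefix above, is LD_LIBRARY_PATH (the permanent fix is a file under /etc/ld.so.conf.d followed by ldconfig):

```shell
# Make the source-installed hdf5 libraries visible at run time:
export LD_LIBRARY_PATH="/usr/local/hdf5/lib:$LD_LIBRARY_PATH"
echo "$LD_LIBRARY_PATH" | cut -d: -f1
```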


undefined symbol: _ZN5boost6python6detail11init_moduleER11PyModuleDefPFvvE

Rebuilding boost from source fixes this; the process is documented in the link below, so I won't repeat it here:

Notes on compiling libboost_python3.6


undefined reference to `cv::imencode(cv::String const&, cv::_InputArray const&, std::vector<unsigned

.build_debug/lib/libcaffe.so: undefined reference to `cv::imencode(cv::String const&, cv::_InputArray const&, std::vector<unsigned char, std::allocator<unsigned char> >&, std::vector<int, std::allocator<int> > const&)'
.build_debug/lib/libcaffe.so: undefined reference to `cv::imdecode(cv::_InputArray const&, int)'
make: *** [.build_debug/tools/extract_features.bin] Error 1collect2: error: ld returned 1 exit status

make: *** [.build_debug/tools/compute_image_mean.bin] Error 1
.build_debug/lib/libcaffe.so: undefined reference to `cv::imread(cv::String const&, int)'
.build_debug/lib/libcaffe.so: undefined reference to `cv::imencode(cv::String const&, cv::_InputArray const&, std::vector<unsigned char, std::allocator<unsigned char> >&, std::vector<int, std::allocator<int> > const&)'
.build_debug/lib/libcaffe.so: undefined reference to `cv::imdecode(cv::_InputArray const&, int)'
collect2: error: ld returned 1 exit status
make: *** [.build_debug/tools/convert_imageset.bin] Error 1

Per https://github.com/BVLC/caffe/issues/2348 on GitHub, the Makefile needs this change:

LIBRARIES += glog gflags protobuf leveldb \
	snappy lmdb boost_system hdf5_hl hdf5 m \
	opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs

Make sure opencv_imgcodecs is present at the end: in OpenCV 3 the image codecs were split out of highgui into the separate imgcodecs module, so it has to be linked explicitly.

GPU device_alternate.hpp:34:10: fatal error: cublas_v2.h: No such file or directory

The cuda installation is broken and some headers are missing. Search the system for the header; if it is nowhere to be found, consider reinstalling cuda.
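A quick existence check, assuming the cuda-10.2 path used in the config above:

```shell
# Does the configured CUDA directory actually ship the cublas header?
if [ -f /usr/local/cuda-10.2/include/cublas_v2.h ]; then
  echo "cublas_v2.h found"
else
  echo "cublas_v2.h missing from /usr/local/cuda-10.2/include"
fi
```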


/usr/bin/ld: cannot find -lopencv_core

These linker errors rarely come alone; for opencv there can be many at once. The worst batch I hit was:

CXX src/caffe/util/math_functions.cpp
CXX src/caffe/util/signal_handler.cpp
CXX src/caffe/util/upgrade_proto.cpp
AR -o .build_release/lib/libcaffe.a
LD -o .build_release/lib/libcaffe.so.1.0.0
/usr/bin/ld: cannot find -lopencv_core
/usr/bin/ld: cannot find -lopencv_highgui
/usr/bin/ld: cannot find -lopencv_imgproc
/usr/bin/ld: cannot find -lopencv_imgcodecs
/usr/bin/ld: cannot find -lboost_python3
/usr/bin/ld: cannot find -lcblas
/usr/bin/ld: cannot find -latlas
collect2: error: ld returned 1 exit status

These have to be fixed one by one. For the opencv ones: if opencv was built from source, its shared libraries may not be on the linker search path and must be added manually. For boost_python3, search the system for libboost_python36 and link it in. For cblas/atlas, install OpenBLAS (sudo apt-get install libopenblas-dev on Ubuntu, sudo yum install openblas-devel on CentOS) and set BLAS to open in the config file.
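Before rerunning make, a small loop (a sketch; the library names are the ones from my error batch) can show which of the missing libraries the dynamic linker already knows about:

```shell
export PATH="$PATH:/sbin:/usr/sbin"   # ldconfig often lives in /sbin
# For each -lNAME the linker complained about, check the linker cache:
for lib in opencv_core boost_python36 cblas; do
  if ldconfig -p 2>/dev/null | grep -q "lib${lib}\.so"; then
    echo "found: ${lib}"
  else
    echo "MISSING: ${lib}"
  fi
done
```

Anything reported MISSING needs its package installed or its directory registered (via LIBRARY_DIRS, LD_LIBRARY_PATH, or ldconfig) before the link step can succeed.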
