Building Caffe with GPU support on CentOS: build process and error summary

Posted by submarineas on 2020-10-12

Original URL: https://blog.csdn.net/submarineas/article/details/108986843


Introduction

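This post records the problems I ran into while installing Caffe. In an earlier post on building OpenCV for CPU, I said the GPU build of OpenCV was the one that took longest to compile. Caffe now turns out to be just as bad, and its version issues in particular are harder to deal with than OpenCV's.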

Installation notes

This post assumes the following prerequisites are already in place:

opencv 3.4.6
CUDA 10.2 and cuDNN 7.6.5
anaconda3 / python 3.6

I have written about all three in earlier posts:

Setting up a GPU environment on Linux from scratch, with startup tests

Building OpenCV with CPU on Ubuntu 18.04: the full process

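OpenCV was built against Python 3, which pushed me to take the harder road of building Caffe with Python 3 as well, rather than Python 2. Caffe has not been updated in years and the GitHub version still recommends Python 2; even though both are no longer maintained, some frameworks still depend on this environment.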

Project dependencies

On CentOS:

sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel
sudo yum install gflags-devel glog-devel lmdb-devel

For reference, on Ubuntu:

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libopenblas-dev liblapack-dev libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev

Noted here for reference; the rest of this post uses CentOS.

Then clone the Caffe code from GitHub:

git clone https://github.com/BVLC/caffe.git

If needed, create a virtual environment first, then install the Python dependencies from the python/ directory of the checkout:

for req in $(cat requirements.txt); do pip install $req; done

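This is the installation method Caffe recommends. When it finishes, run pip install -r requirements.txt once more; if every requirement reports "already satisfied", the Python dependencies are in place. You may also need the Boost and HDF5 shared libraries. The HDF5 installed with yum or apt above is probably fine, but check whether the Boost shared library is present; it can be installed as follows: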
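Whether the Boost.Python and HDF5 shared libraries are already visible to the dynamic linker can be checked with a small helper. This is a minimal sketch, assuming its input comes from ldconfig -p (usually /sbin/ldconfig); lib_in_cache is a hypothetical name, and it only checks, it does not install anything.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a library whose file name starts with the given prefix
# appears in `ldconfig -p`-style output read from stdin.
lib_in_cache() {
  local prefix="$1"
  # ldconfig -p lines look like:
  #   "	libfoo.so.1 (libc6,x86-64) => /usr/lib64/libfoo.so.1"
  awk -v p="$prefix" '
    { name = $1 }                       # first field is the library file name
    index(name, p) == 1 { found = 1 }   # prefix match
    END { exit (found ? 0 : 1) }
  '
}

# Usage (run `sudo ldconfig` first if you just installed something):
# /sbin/ldconfig -p | lib_in_cache libboost_python3 && echo "boost_python3 found"
# /sbin/ldconfig -p | lib_in_cache libhdf5_hl       && echo "hdf5_hl found"
```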

Installation process

First open Makefile.config in the caffe directory (or copy Makefile.config.example to Makefile.config) and edit it with vim:

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0
# This code is taken from https://github.com/sh1r0/caffe-android-lib
# USE_HDF5 := 0

# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
#	You should not set this flag if you will be reading LMDBs with any
#	possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1

# Uncomment if you're using OpenCV 3
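# OpenCV 3.4.x is installed here. Caffe's Makefile compares this value with
# exactly "3" (and only then links opencv_imgcodecs), so use 3, not 3.4.7.
OPENCV_VERSION := 3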

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
# (CUDA install path)
CUDA_DIR := /usr/local/cuda-10.2
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
# Compute-capability codes for the GPUs to build for. With CUDA >= 9.0 the two
# compute_20 entries must be taken out of the value entirely: a "#" inside this
# backslash-continued assignment comments out everything after it.
# -gencode arch=compute_20,code=sm_20
# -gencode arch=compute_20,code=sm_21
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
		-gencode arch=compute_35,code=sm_35 \
		-gencode arch=compute_50,code=sm_50 \
		-gencode arch=compute_52,code=sm_52 \
		-gencode arch=compute_60,code=sm_60 \
		-gencode arch=compute_61,code=sm_61 \
		-gencode arch=compute_61,code=compute_61

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
# ATLAS -> atlas, MKL -> mkl, OpenBLAS -> open. In this build the default
# ATLAS setting could not find its shared libraries, so OpenBLAS was used.
BLAS := open
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
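# Note: these two paths point at ATLAS. With BLAS := open, point them at the
# OpenBLAS headers and libraries instead if cblas.h or -lopenblas is not found.
BLAS_INCLUDE := /usr/include/atlas
BLAS_LIB := /usr/lib64/atlas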

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# (Whether to build Caffe's MATLAB interface)
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
# PYTHON_INCLUDE := /usr/include/python2.7 \
# 		/usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# Caffe supports building against Anaconda: comment out the system Python
# paths above, enable the Anaconda ones, and adjust the paths.
ANACONDA_HOME := $(HOME)/anaconda3
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
		$(ANACONDA_HOME)/include/python3.6m \
		$(ANACONDA_HOME)/lib/python3.6/site-packages/numpy/core/include

# Uncomment to use Python 3 (default is Python 2)
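# Linux uses Python 2.7 by default, and the Boost packages in the yum and
# Ubuntu repositories I checked were no newer than 1.55, so Boost had to be
# rebuilt from source, and this option enabled, to make sure Python 2 is not used.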
PYTHON_LIBRARIES := boost_python3 python3.6m
# PYTHON_INCLUDE := /usr/include/python3.5m \
#                 /usr/lib/python3.5/dist-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
# PYTHON_LIB := /usr/lib
# (Python library path; it can also be added to LIBRARY_DIRS below)
PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# (Locates numpy in the current environment with a one-line Python script)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
# (Needed if you want to write layers in Python)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
# If Caffe cannot find a header or library, add its directory here; if a listed
# directory does not actually contain the file, the build still fails.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/local/cuda-10.2/include /usr/include/hdf5/serial /xxxx/Opencv2/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/local/cuda-10.2/lib64 /usr/lib/x86_64-linux-gnu/hdf5/serial /xxxx/Opencv2/lib

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# NCCL acceleration switch (uncomment to build with NCCL)
# https://github.com/NVIDIA/nccl (last tested version: v1.2.3-1+cuda8.0)
# USE_NCCL := 1

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# Usually not needed: I installed pkg-config before building OpenCV, so Caffe
# does not have to be told about it. (The option itself is off by default.)
# USE_PKG_CONFIG := 1

# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
# (ID of the GPU to use)
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @
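The uncomment-style edits above (USE_CUDNN, WITH_PYTHON_LAYER, OPENCV_VERSION and so on) can also be scripted. A minimal sketch, assuming GNU sed and the "# NAME := value" layout used by Makefile.config.example; enable_option is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: uncomment a "# NAME := value" line in Makefile.config in place.
# Only lines starting with "#", optional spaces, then "NAME :=" are touched.
enable_option() {
  local file="$1" name="$2"
  # -E: extended regex; \1 keeps "NAME :=" and drops the leading "# ".
  sed -i -E "s/^#[[:space:]]*(${name}[[:space:]]*:=)/\1/" "$file"
}

# Usage, from the caffe checkout:
# cp Makefile.config.example Makefile.config
# enable_option Makefile.config USE_CUDNN
# enable_option Makefile.config WITH_PYTHON_LAYER
# enable_option Makefile.config OPENCV_VERSION
```

Values that differ from the example file (paths, BLAS := open) still need a manual edit.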

If the configuration above is edited incorrectly, the resulting errors are covered in the error summary below. Once Makefile.config is correct, edit the Makefile:

# Change:
NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
# to:
NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
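The same Makefile change can be applied with sed. A minimal sketch, assuming GNU sed and a NVCCFLAGS line of the form shown above; patch_nvccflags is a hypothetical helper, and it does nothing if -D_FORCE_INLINES is already present.

```shell
#!/usr/bin/env bash
# Sketch: add -D_FORCE_INLINES to the NVCCFLAGS line in caffe's Makefile and
# make sure "-Xcompiler" and "-fPIC" are separate arguments.
patch_nvccflags() {
  local makefile="$1"
  # Already patched: leave the file alone so the function can be rerun safely.
  grep -q -- '-D_FORCE_INLINES' "$makefile" && return 0
  sed -i -E \
    -e '/^NVCCFLAGS[[:space:]]*\+=/ s/-ccbin=/-D_FORCE_INLINES -ccbin=/' \
    -e '/^NVCCFLAGS[[:space:]]*\+=/ s/-Xcompiler[[:space:]]*-fPIC/-Xcompiler -fPIC/' \
    "$makefile"
}

# Usage, from the caffe checkout:
# patch_nvccflags Makefile
```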

Also add opencv_imgcodecs to LIBRARIES:

LIBRARIES += glog gflags protobuf leveldb snappy \
	lmdb boost_system hdf5_hl hdf5 m \
	opencv_core opencv_highgui opencv_imgproc
# change to:
LIBRARIES += glog gflags protobuf leveldb snappy \
	lmdb boost_system hdf5_hl hdf5 m \
	opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs

Then build:

make all -j12
make test -j12
make runtest
make pycaffe

Normal output looks like this (excerpt from the test run):

[----------] 1 test from PlatformTest
[ RUN      ] PlatformTest.TestInitialization
Major revision number:         7
Minor revision number:         5
Name:                          Quadro RTX 5000
Total global memory:           16908615680
Total shared memory per block: 49152
Total registers per block:     65536
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     1024
Maximum dimension 0 of block:  1024
Maximum dimension 1 of block:  1024
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   2147483647
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   65535
Clock rate:                    1815000
Total constant memory:         65536
Texture alignment:             512
Concurrent copy and execution: Yes
Number of multiprocessors:     48
Kernel execution timeout:      No
Unified virtual addressing:    Yes
[       OK ] PlatformTest.TestInitialization (0 ms)
[----------] 1 test from PlatformTest (0 ms total)

Once everything passes, add Caffe's Python module to PYTHONPATH:

export PYTHONPATH=~/caffe/python:$PYTHONPATH

At this point, Caffe can be used from anywhere.
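The export above only lasts for the current shell. A minimal sketch for persisting it, assuming bash with ~/.bashrc and the checkout at ~/caffe (both are parameters); add_caffe_pythonpath is a hypothetical name. It appends the line only if an identical line is not already present.

```shell
#!/usr/bin/env bash
# Sketch: append the PYTHONPATH export for pycaffe to an rc file, once.
add_caffe_pythonpath() {
  local rcfile="${1:-$HOME/.bashrc}" caffe_root="${2:-$HOME/caffe}"
  # \$PYTHONPATH stays literal so it is expanded when the rc file is sourced.
  local line="export PYTHONPATH=${caffe_root}/python:\$PYTHONPATH"
  # -x: whole-line match, -F: fixed string, so the duplicate check is exact.
  grep -qxF "$line" "$rcfile" 2>/dev/null || printf '%s\n' "$line" >> "$rcfile"
}

# Usage:
# add_caffe_pythonpath ~/.bashrc ~/caffe
# source ~/.bashrc
# python -c "import caffe; print(caffe.__file__)"
```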

Error summary

src/caffe/data_transformer.cpp:2:33: fatal error: opencv2/core/core.hpp:

First, INCLUDE_DIRS and LIBRARY_DIRS must contain the corresponding paths (here, the OpenCV include and lib directories):

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial /xxxx/Opencv2/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial /xxxx/Opencv2/lib

Otherwise the build fails with:

src/caffe/data_transformer.cpp:2:33: fatal error: opencv2/core/core.hpp: No such file or directory

Unsupported gpu architecture 'compute_20'

The full error:

NVCC src/caffe/layers/bnll_layer.cu
nvcc fatal   : Unsupported gpu architecture 'compute_20'
Makefile:594: recipe for target '.build_release/cuda/src/caffe/layers/bnll_layer.o' failed
make: *** [.build_release/cuda/src/caffe/layers/bnll_layer.o] Error 1
make: *** Waiting for unfinished jobs....

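This happens because CUDA 9.0 and later dropped support for the Fermi architecture (compute_20, i.e. sm_20 and sm_21), so the two compute_20 lines in the CUDA_ARCH setting of the configuration file have to be removed.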
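Rather than editing the list by hand, the CUDA_ARCH value can be generated from the capabilities you want, leaving out 20/21 for CUDA >= 9.0. A minimal sketch; gencode_flags is a hypothetical helper. The Quadro RTX 5000 in the test output above reports compute capability 7.5, and CUDA 10.x supports sm_75, so appending 75 would give native code instead of relying on JIT compilation of the compute_61 PTX.

```shell
#!/usr/bin/env bash
# Sketch: print a CUDA_ARCH value for the given compute capabilities.
gencode_flags() {
  [ "$#" -gt 0 ] || { echo "usage: gencode_flags CC..." >&2; return 1; }
  local cc out=""
  for cc in "$@"; do
    # SASS (native machine code) for each listed architecture
    out+="-gencode arch=compute_${cc},code=sm_${cc} "
  done
  local last="${!#}"   # last argument, expected to be the highest capability
  # PTX for the last entry, so newer GPUs can JIT-compile it
  out+="-gencode arch=compute_${last},code=compute_${last}"
  printf '%s\n' "$out"
}

# Usage: the list from the configuration above, without the Fermi entries
# echo "CUDA_ARCH := $(gencode_flags 30 35 50 52 60 61)"
```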

/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory

The full error:

/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory
compilation terminated.
Makefile:582: recipe for target '.build_release/tools/caffe.o' failed
make: *** [.build_release/tools/caffe.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from /usr/include/boost/python/detail/prefix.hpp:13:0,
                 from /usr/include/boost/python/args.hpp:8,
                 from /usr/include/boost/python.hpp:11,
                 from src/caffe/layer_factory.cpp:4:
/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory

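There are two likely causes:

Check the paths in Makefile.config: every directory listed in PYTHON_INCLUDE must actually exist.
If the paths are fine and nothing else works, reinstall the Python development headers: sudo apt-get install python-dev python3-dev (on CentOS the corresponding packages are python-devel and python3-devel).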
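For the first cause, a quick way to see which configured directories are missing. A minimal sketch; missing_dirs is a hypothetical helper, and you pass it the expanded paths yourself (it does not parse Makefile.config).

```shell
#!/usr/bin/env bash
# Sketch: print each argument that is not an existing directory;
# return 1 if any are missing, 0 otherwise.
missing_dirs() {
  local d status=0
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      printf 'missing: %s\n' "$d"
      status=1
    fi
  done
  return "$status"
}

# Usage, with the Anaconda paths from the Makefile.config above:
# A=$HOME/anaconda3
# missing_dirs "$A/include" "$A/include/python3.6m" \
#   "$A/lib/python3.6/site-packages/numpy/core/include"
# ls "$A/include/python3.6m/pyconfig.h"   # the header the error complains about
```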

libhdf5_hl.so.100(XXX): cannot open shared object file: No such file or directory, Error 127

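The hdf5-devel package from the dependency step may fail to install, or may provide a version whose library is not libhdf5_hl.so.100. This matters because the configuration above relies on an HDF5 I built from source. The build steps were: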

$ gunzip < hdf5-X.Y.Z.tar.gz | tar xf -   # extract
$ cd hdf5-X.Y.Z
$ ./configure --prefix=/usr/local/hdf5  # install prefix
$ make
$ make check                # run test suite.
$ make install
$ make check-install        # verify installation.

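Then add the installed paths to Makefile.config; because the prefix above is /usr/local/hdf5, they are /usr/local/hdf5/include and /usr/local/hdf5/lib.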
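A minimal sketch of that step, assuming the /usr/local/hdf5 prefix used above; add_hdf5_prefix is a hypothetical helper. It appends INCLUDE_DIRS += and LIBRARY_DIRS += lines to the end of Makefile.config (after the := definitions, which is where += has to go) and prints an LD_LIBRARY_PATH export, because a library outside the default search path is also what makes the dynamic loader fail at run time (exit status 127).

```shell
#!/usr/bin/env bash
# Sketch: register an HDF5 installed under a custom prefix with Caffe's build.
add_hdf5_prefix() {
  local config="$1" prefix="${2:-/usr/local/hdf5}"
  local inc="INCLUDE_DIRS += ${prefix}/include"
  local lib="LIBRARY_DIRS += ${prefix}/lib"
  # Append each line only once, so rerunning is harmless.
  grep -qxF "$inc" "$config" || printf '%s\n' "$inc" >> "$config"
  grep -qxF "$lib" "$config" || printf '%s\n' "$lib" >> "$config"
  # Runtime search path for the built binaries; $LD_LIBRARY_PATH stays literal.
  printf 'export LD_LIBRARY_PATH=%s/lib:$LD_LIBRARY_PATH\n' "$prefix"
}

# Usage:
# add_hdf5_prefix Makefile.config /usr/local/hdf5 >> ~/.bashrc
```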

undefined symbol: _ZN5boost6python6detail11init_moduleER11PyModuleDefPFvvE

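This symbol is the Python 3 form of Boost.Python's module initializer, so the error usually means the libboost_python that actually gets loaded was built for Python 2. Rebuilding Boost from source against Python 3 is covered in a separate post, so I won't repeat it here: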

Notes on building libboost_python3.6

undefined reference to `cv::imencode(cv::String const&, cv::_InputArray const&, std::vector<unsigned

.build_debug/lib/libcaffe.so: undefined reference to `cv::imencode(cv::String const&, cv::_InputArray const&, std::vector<unsigned char, std::allocator<unsigned char> >&, std::vector<int, std::allocator<int> > const&)'
.build_debug/lib/libcaffe.so: undefined reference to `cv::imdecode(cv::_InputArray const&, int)'
make: *** [.build_debug/tools/extract_features.bin] Error 1collect2: error: ld returned 1 exit status

make: *** [.build_debug/tools/compute_image_mean.bin] Error 1
.build_debug/lib/libcaffe.so: undefined reference to `cv::imread(cv::String const&, int)'
.build_debug/lib/libcaffe.so: undefined reference to `cv::imencode(cv::String const&, cv::_InputArray const&, std::vector<unsigned char, std::allocator<unsigned char> >&, std::vector<int, std::allocator<int> > const&)'
.build_debug/lib/libcaffe.so: undefined reference to `cv::imdecode(cv::_InputArray const&, int)'
collect2: error: ld returned 1 exit status
make: *** [.build_debug/tools/convert_imageset.bin] Error 1

According to the GitHub issue https://github.com/BVLC/caffe/issues/2348, the fix is to change the Makefile to:

LIBRARIES += glog gflags protobuf leveldb snappy \
	lmdb boost_system hdf5_hl hdf5 m \
	opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs

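Check that the final opencv_imgcodecs entry is present: in OpenCV 3, imread, imencode and imdecode moved out of highgui into the separate imgcodecs module, so without it the linker cannot resolve them. Caffe's Makefile appends opencv_imgcodecs automatically only when OPENCV_VERSION in Makefile.config is exactly 3.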

GPU device_alternate.hpp:34:10: fatal error: cublas_v2.h: No such file or directory

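This points to a problem with the CUDA installation: some headers are missing. Search the whole system for the file; if it is not there, reinstall CUDA.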
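A minimal sketch of that search, assuming GNU find (for -printf); find_header_dirs is a hypothetical name and the search roots are only examples. If the header turns up outside /usr/local/cuda-10.2/include, adding that directory to INCLUDE_DIRS may be enough; if it is nowhere, reinstalling CUDA is the remaining option.

```shell
#!/usr/bin/env bash
# Sketch: print every directory (under the given roots) containing a file
# with the given name, one per line, without duplicates.
find_header_dirs() {
  local header="$1"; shift
  # %h prints the directory part of each match; unreadable or missing
  # roots produce errors on stderr, which are discarded.
  find "$@" -type f -name "$header" -printf '%h\n' 2>/dev/null | sort -u
}

# Usage:
# find_header_dirs cublas_v2.h /usr/local /usr/include /opt
```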

/usr/bin/ld: cannot find -lopencv_core

When OpenCV is involved there is usually more than one missing library; the most I got at once was:

CXX src/caffe/util/math_functions.cpp
CXX src/caffe/util/signal_handler.cpp
CXX src/caffe/util/upgrade_proto.cpp
AR -o .build_release/lib/libcaffe.a
LD -o .build_release/lib/libcaffe.so.1.0.0
/usr/bin/ld: cannot find -lopencv_core
/usr/bin/ld: cannot find -lopencv_highgui
/usr/bin/ld: cannot find -lopencv_imgproc
/usr/bin/ld: cannot find -lopencv_imgcodecs
/usr/bin/ld: cannot find -lboost_python3
/usr/bin/ld: cannot find -lcblas
/usr/bin/ld: cannot find -latlas
collect2: error: ld returned 1 exit status

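These have to be fixed one at a time:

OpenCV: if it was installed from source, its shared libraries may not be on the linker's search path, so add the OpenCV lib directory manually (LIBRARY_DIRS in Makefile.config).
boost_python3: search the system for libboost_python36 and link it under the name the linker is looking for.
cblas and atlas: these come from the default BLAS := atlas setting. Install OpenBLAS (sudo apt-get install libopenblas-dev on Ubuntu; on CentOS, yum install openblas-devel) and set BLAS := open in Makefile.config.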
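For the boost_python3 case, linking the library over can be done with a symlink under the name the linker asks for. A minimal sketch, assuming the library is libboost_python36.so in a directory you pass in; link_boost_python3 is a hypothetical helper. An alternative that avoids the symlink is to name the real library in Makefile.config, e.g. PYTHON_LIBRARIES := boost_python36 python3.6m.

```shell
#!/usr/bin/env bash
# Sketch: make -lboost_python3 resolvable when only libboost_python36.so exists.
link_boost_python3() {
  local libdir="$1"
  local src="$libdir/libboost_python36.so" dst="$libdir/libboost_python3.so"
  [ -e "$dst" ] && return 0          # already resolvable, nothing to do
  [ -e "$src" ] || { echo "no $src" >&2; return 1; }
  ln -s "$(basename "$src")" "$dst"  # relative symlink inside libdir
}

# Usage (run as root for system directories), then refresh the loader cache:
# link_boost_python3 /usr/local/lib
# ldconfig
```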
