Installing Kaldi with CUDA

Posted by AliceXingFree on 2019-05-26

1. Download Kaldi

2. In the tools directory, install the dependencies by following the steps given there

3. Edit .bashrc with vim to set the environment variables

export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH

Run source .bashrc to apply the environment variables
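The two exports can also be written so that an empty LD_LIBRARY_PATH does not leave a stray colon at the end. This is a minimal sketch, assuming CUDA 8.0 is installed under /usr/local/cuda-8.0 as above; prepend_dir and CUDA_HOME are hypothetical names introduced here for illustration.

```shell
# prepend_dir LIST DIR: print DIR placed in front of the colon-separated LIST.
# An empty LIST yields just DIR, so no trailing ":" is left behind.
prepend_dir() {
  if [ -z "$1" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$2:$1"
  fi
}

# Assumed install location, matching the exports above.
CUDA_HOME=/usr/local/cuda-8.0

PATH=$(prepend_dir "$PATH" "$CUDA_HOME/bin")
LD_LIBRARY_PATH=$(prepend_dir "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")
export PATH LD_LIBRARY_PATH

# nvcc is only found if the toolkit really is installed under CUDA_HOME.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version
else
  echo "nvcc not found on PATH"
fi
```

If nvcc is reported as missing, the CUDA path is probably different on that machine and the exports need to be adjusted.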

4. In the src directory, edit the configure file, about 26% of the way through it

for base in /usr/local/share/cuda /usr/local/cuda-8.0 /usr/; do

This makes configure pick up your own CUDA installation path
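Before editing configure, it can help to see which candidate directory actually holds a toolkit. The sketch below is a simplified stand-in for the idea of that loop, not the code from configure: find_cuda_base is a hypothetical helper, and stopping at the first match is an assumption of this sketch.

```shell
# find_cuda_base DIR...: print the first DIR that contains an executable
# bin/nvcc, or return non-zero if none of them does.
find_cuda_base() {
  for base in "$@"; do
    if [ -x "$base/bin/nvcc" ]; then
      printf '%s\n' "$base"
      return 0
    fi
  done
  return 1
}

# Candidate list taken from the edited loop above.
find_cuda_base /usr/local/share/cuda /usr/local/cuda-8.0 /usr \
  || echo "no CUDA toolkit found in the candidate directories"
```

The directory it prints is the one that belongs in the for base in ... list.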

5. Build in the src directory

make clean

./configure --shared

make depend -j 8

make -j 8
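The four commands can be wrapped so that a failed step stops the build instead of being followed by the next command. This is a sketch under stated assumptions: job_count and build_src are hypothetical helper names, and the example path is only a guess at where Kaldi was cloned.

```shell
# job_count: number of parallel make jobs. Uses nproc when available,
# otherwise falls back to 8, the value used in the commands above.
job_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    echo 8
  fi
}

# build_src SRC_DIR: run the build steps in order inside SRC_DIR and
# stop at the first step that fails.
build_src() {
  (
    cd "$1" || exit 1
    # A failing "make clean" is not treated as fatal here (assumption:
    # it may fail on a tree that has never been configured).
    make clean || true
    ./configure --shared || exit 1
    make depend -j "$(job_count)" || exit 1
    make -j "$(job_count)"
  )
}

# Example call; the path is an assumption about where Kaldi was cloned:
#   build_src "$HOME/kaldi/src"
```

Running the steps in a subshell keeps the caller's working directory unchanged.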

 

6. Build cudamatrix separately

cd kaldi/src/cudamatrix/

In the Makefile, change TESTFILES to BINFILES
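The rename can be done by hand in an editor, or with sed as sketched here. rename_testfiles is a hypothetical helper; it assumes the variable is assigned on a line that starts with TESTFILES, and it keeps a backup copy next to the Makefile.

```shell
# rename_testfiles MAKEFILE: rename the TESTFILES variable to BINFILES
# at the start of a line, keeping the original as MAKEFILE.bak.
rename_testfiles() {
  sed -i.bak 's/^TESTFILES/BINFILES/' "$1"
}

# Example call from kaldi/src/cudamatrix/:
#   rename_testfiles Makefile
```

To undo the change, copy Makefile.bak back over Makefile.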

make all

./cu-vector-test

If it runs without errors and prints output like the following, CUDA is being used for the matrix computations:

./cu-vector-test 
LOG (cu-vector-test[5.4.105~2-4fda]:SelectGpuId():cu-device.cc:123) Manually selected to compute on CPU.
-1.05384e+09 -1.05384e+09
-2.15126e+08 -2.15126e+08
LOG (cu-vector-test[5.4.105~2-4fda]:main():cu-vector-test.cc:859) Tests without GPU use succeeded.
WARNING (cu-vector-test[5.4.105~2-4fda]:SelectGpuId():cu-device.cc:196) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG (cu-vector-test[5.4.105~2-4fda]:SelectGpuIdAuto():cu-device.cc:315) Selecting from 1 GPUs
LOG (cu-vector-test[5.4.105~2-4fda]:SelectGpuIdAuto():cu-device.cc:330) cudaSetDevice(0): Tesla K40c    free:412M, used:11028M, total:11441M, free/total:0.0360703
LOG (cu-vector-test[5.4.105~2-4fda]:SelectGpuIdAuto():cu-device.cc:379) Trying to select device: 0 (automatically), mem_ratio: 0.0360703
LOG (cu-vector-test[5.4.105~2-4fda]:SelectGpuIdAuto():cu-device.cc:398) Success selecting device 0 free mem ratio: 0.0360703
LOG (cu-vector-test[5.4.105~2-4fda]:FinalizeActiveGpu():cu-device.cc:247) The active GPU is [0]: Tesla K40c    free:366M, used:11074M, total:11441M, free/total:0.0320389 version 3.5
4.52132e+08 4.52132e+08
1.38749e+09 1.38749e+09
LOG (cu-vector-test[5.4.105~2-4fda]:main():cu-vector-test.cc:861) Tests with GPU use (if available) succeeded.
LOG (cu-vector-test[5.4.105~2-4fda]:PrintProfile():cu-device.cc:449) -----
[cudevice profile]
CuVectorBase::ApplyCeiling    0.0205135s
CuVectorBase::MulTp    0.0229831s
AddTpVec    0.0262518s
Sum    0.0361161s
CuVector::CopyFromVecH2D    0.0428603s
CopyRowsFromVec    0.061161s
CuVectorBase::CopyColFromMat    0.077713s
AddVec    0.0862093s
CopyToVec    0.101046s
CopyFromVec    0.15156s
CuMatrix::Resize    0.159565s
VecVec    0.252803s
CuVector::SetZero    0.55053s
CuVector::Resize    0.782922s
RandGaussian    6.31372s
Total GPU time:    8.93733s (may involve some double-counting)
-----
LOG (cu-vector-test[5.4.105~2-4fda]:PrintMemoryUsage():cu-allocator.cc:127) Memory usage: 16257160 bytes currently allocated (max: 16348884); 0 currently in use by user (max: 12585152); 1292/2299 calls to Malloc* resulted in CUDA calls.
LOG (cu-vector-test[5.4.105~2-4fda]:PrintMemoryUsage():cu-allocator.cc:136) Time taken in cudaMallocPitch=0.0663958, in cudaMalloc=0.106217, in cudaFree=0.113429, in this->MallocPitch()=0.346224
LOG (cu-vector-test[5.4.105~2-4fda]:PrintMemoryUsage():cu-device.cc:422) Memory used (according to the device): 20447232 bytes.
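The final success message says "(if available)", so on its own it does not appear to prove that a GPU was selected. In the output above, the line that shows it is "The active GPU is [0]". A minimal sketch that checks a saved log for both; gpu_was_used and the log file name are hypothetical, and capturing stderr assumes the LOG lines are written there.

```shell
# gpu_was_used LOGFILE: succeed only if the log shows both an active GPU
# and the final success message.
gpu_was_used() {
  grep -qF 'The active GPU is' "$1" &&
    grep -qF 'Tests with GPU use (if available) succeeded' "$1"
}

# Example use from kaldi/src/cudamatrix/ (file name is an assumption):
#   ./cu-vector-test > cu-vector-test.log 2>&1
#   gpu_was_used cu-vector-test.log && echo "GPU path exercised"
```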
 

 

 

    
