Prerequisites: install OpenBLAS and MPICH (see the posts on the blog home page)
-
Download the tarball from the official site into /opt
cd /opt && wget https://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
-
Extract it in /opt
tar -xzf hpl-2.3.tar.gz
-
Copy Make.Linux_PII_CBLAS and rename it
cd /opt/hpl-2.3 && cp setup/Make.Linux_PII_CBLAS Make.Linux
-
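Edit Make.Linux
vim Make.Linux
Change the following settings:
ARCH         = Linux
# HPL install directory
TOPdir       = /opt/hpl-2.3
# MPICH install directory
MPdir        = /opt/mpich
# MPI library to link against
MPlib        = $(MPdir)/lib/libmpi.a
# OpenBLAS install directory
LAdir        = /opt/OpenBLAS
# OpenBLAS library to link against
LAlib        = $(LAdir)/lib/libopenblas.a
# compiler
CC           = /opt/mpich/bin/mpicc
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -pthread
# linker
LINKER       = /opt/mpich/bin/mpif77
Adjust the paths above to match your own installation. Keep the comments on their own lines: make treats the whitespace before an inline # as part of the variable's value, which can break the paths.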
-
Build HPL
make arch=Linux
If the build succeeds, HPL.dat and xhpl are generated in /opt/hpl-2.3/bin/Linux
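A quick way to confirm both files are there before moving on. This is only a sketch: `check_hpl_build` is a hypothetical helper name, and the directory in the usage line is the one assumed in this post.

```shell
# check_hpl_build: hypothetical helper; succeeds only if the given directory
# contains an executable xhpl and a readable HPL.dat.
check_hpl_build() {
    dir=$1
    [ -x "$dir/xhpl" ] && [ -r "$dir/HPL.dat" ]
}

# Usage, with the directory assumed in this post:
# check_hpl_build /opt/hpl-2.3/bin/Linux && echo "build looks complete"
```

If the check fails, the make output usually shows which compile or link step went wrong, most often a path in Make.Linux.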
-
Test HPL
cd /opt/hpl-2.3/bin/Linux
-
Single-node test
mpiexec -np 4 ./xhpl
-
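Multi-node test
The firewall must be stopped on every node
systemctl stop firewalld
Edit the node file and enter the hostname or IP address of each node, one per line
vim nodes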
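As an illustration of what the node file might contain, here is a sketch that writes one from the shell. The hostnames node1 and node2 are placeholders, not taken from this post, and `write_nodes_file` is a hypothetical helper. The optional `:2` suffix is the Hydra (MPICH process manager) way of saying how many processes to place on that host.

```shell
# write_nodes_file: hypothetical helper that writes a two-node host file.
# node1 and node2 are placeholder hostnames; replace them with your own
# hostnames or IP addresses. ":2" means 2 processes on that host.
write_nodes_file() {
    cat > "$1" <<'EOF'
node1:2
node2:2
EOF
}

# Usage:
# write_nodes_file ./nodes
```

With two hosts at two processes each, this would line up with the 4 processes used in this post.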
Edit HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
1200         Ns
1            # of NBs
232          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
4            Qs
16.0         threshold
1            # of panel fact
0            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
Run HPL
mpiexec -np 4 -machinefile ./nodes ./xhpl
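HPL needs at least P x Q MPI processes for each process grid in HPL.dat, so the value passed to -np has to cover the largest grid (here 1 x 4 = 4). A small sketch that prints P x Q for every grid in a file; `grid_sizes` is a hypothetical helper and it assumes the standard HPL.dat layout, where the values on a line are followed by the label Ps or Qs.

```shell
# grid_sizes: hypothetical helper; prints P*Q for each process grid in an
# HPL.dat file. Assumes the values on the "Ps" and "Qs" lines come first
# and are followed by the literal label "Ps" or "Qs".
grid_sizes() {
    awk '
    {
        for (i = 2; i <= NF; i++) {
            if ($i == "Ps") {            # remember every P value
                for (j = 1; j < i; j++) p[j] = $j
                break
            }
            if ($i == "Qs") {            # pair each Q with its P
                for (j = 1; j < i; j++) print p[j] * $j
                break
            }
        }
    }' "$1"
}

# Usage:
# grid_sizes ./HPL.dat    # compare each printed value with the -np argument
```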
-
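HPL.dat settings explained
HPLinpack benchmark input file                                 # file header, descriptive only
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)                         # file name to use if the results are written to a file
6            device out (6=stdout,7=stderr,file)               # where the output goes (stdout, stderr, or a file)
2            # of problems sizes (N)                           # how many matrix sizes to run
1960 2048    Ns                                                # the value of each matrix size
2            # of NBs                                          # how many block sizes to use
60 80        NBs                                               # the value of each block size
0            PMAP process mapping (0=Row-,1=Column-major)      # how processes are mapped onto the grid
2            # of process grids (P x Q)                        # how many process grids to use
2 4          Ps                                                # P and Q of each grid (here 2 x 2 and 4 x 1)
2 1          Qs
16.0         threshold                                         # threshold for the residual check
1            # of panel fact                                   # how many panel factorization methods to use
1            PFACTs (0=left, 1=Crout, 2=Right)                 # which ones (here 1 = Crout)
1            # of recursive stopping criterium                 # how many recursion stopping criteria
4            NBMINs (>= 1)                                     # their values (must be at least 1)
1            # of panels in recursion                          # how many subdivision settings to use in the recursion
2            NDIVs                                             # one NDIV value, 2: each recursion splits the panel in two
1            # of recursive panel fact.                        # how many recursive factorization methods
2            RFACTs (0=left, 1=Crout, 2=Right)                 # which ones (here 2 = Right)
1            # of broadcast                                    # how many broadcast algorithms
3            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)      # which ones: 1-ring, 1-ring Modified, 2-ring, 2-ring Modified, Long, Long Modified (here 3 = 2-ring Modified)
1            # of lookahead depth                              # how many lookahead depths
1            DEPTHs (>=0)                                      # their values (must be 0 or greater)
2            SWAP (0=bin-exch,1=long,2=mix)                    # swap algorithm (binary exchange, long, or a mix of the two)
64           swapping threshold                                # threshold used by the mixed swap algorithm
0            L1 in (0=transposed,1=no-transposed) form         # whether L1 is stored in transposed form
0            U  in (0=transposed,1=no-transposed) form         # whether U is stored in transposed form
1            Equilibration (0=no,1=yes)                        # whether to apply equilibration
8            memory alignment in double (> 0)                  # memory alignment used at run time, in units of doubles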
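For picking Ns, a commonly cited rule of thumb (not from this post) is to let the N x N matrix of 8-byte doubles fill roughly 80% of total memory and to round N down to a multiple of NB. A rough sketch under that assumption; `suggest_n` is a hypothetical helper, and the 0.8 fraction is only a starting point to tune.

```shell
# suggest_n: hypothetical helper. Prints the largest multiple of NB whose
# N x N matrix of 8-byte doubles fits in a fraction of the given memory.
# Arguments: total memory in GiB, block size NB, optional fraction (default 0.8).
suggest_n() {
    awk -v mem="$1" -v nb="$2" -v frac="${3:-0.8}" 'BEGIN {
        bytes = mem * 1024 * 1024 * 1024 * frac   # memory budget in bytes
        n = int(sqrt(bytes / 8) / nb) * nb        # round down to a multiple of NB
        printf "%d\n", n
    }'
}

# Usage: total memory across all nodes in GiB, then NB
# suggest_n 16 232
```

For a multi-node run the memory figure should be the sum over all nodes, since the matrix is distributed across them.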
-