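There are many tools for stress-testing disk performance. Simple tests can be done with the dd command, but for a more professional and powerful disk I/O testing tool, FIO is the first choice. This post gives a brief introduction to using FIO. Before that, two concepts need explaining: sequential I/O and random I/O.
     Random reads and writes access data scattered across the disk, so consecutive requests may not fall on contiguous disk space. Each request carries its own target address, and on a mechanical disk the head has to seek to that address every time, so seek time is long and the load is high.
     Sequential reads and writes access data laid out contiguously on the disk. Because each request continues where the previous one ended, the head rarely has to reposition and seek time is short.
     In general, mechanical hard disks are judged mainly on sequential read/write performance, and SSDs mainly on random read/write performance.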

    1) Downloading FIO


    2) Installing the FIO dependencies
   [root@node3 fio-master]# rpm -qa |grep libaio
   [root@node3 fio-master]# yum install libaio libaio-devel -y
    3) Installing FIO from source
   [root@node3 yc]# unzip fio-master.zip
   [root@node3 yc]# cd fio-master
   [root@node3 fio-master]# ./configure --prefix=/usr/local/fio
   [root@node3 fio-master]# make -j 8
   [root@node3 fio-master]# make install
   [root@node3 fio]# cd /usr/local/fio/
   [root@node3 fio]# ln -s /usr/local/fio/bin/* /usr/bin/
    4) Using FIO
   How to get the Linux filesystem block size and page size:
   [root@node3 fio]# tune2fs -l /dev/sda1|grep "Block size"
   [root@node3 fio]# getconf PAGE_SIZE

   Random read test:
   [root@node3 fio]# fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=4k -size=10G -numjobs=10 -runtime=300 -group_reporting -name=iotest

   Random write test:
   [root@node3 fio]# fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=10G -numjobs=10 -runtime=300 -group_reporting -name=iotest

   Sequential read test:
   [root@node3 fio]# fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=4k -size=10G -numjobs=10 -runtime=300 -group_reporting -name=iotest

   Sequential write test:
   [root@node3 fio]# fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=4k -size=10G -numjobs=10 -runtime=300 -group_reporting -name=iotest

   Mixed random read/write test:
   [root@node3 fio]# fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=4k -size=10G -numjobs=10 -runtime=300 -group_reporting -name=iotest -ioscheduler=deadline

   Mixed sequential read/write test:
   [root@node3 fio]# fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=rw -rwmixread=70 -ioengine=psync -bs=4k -size=10G -numjobs=10 -runtime=300 -group_reporting -name=iotest -ioscheduler=deadline
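
   The six tests above differ only in the rw mode (plus rwmixread for the mixed cases), so they can be generated from one template. The following is a minimal dry-run sketch, not part of the original post: the helper name fio_cmd and the scratch path /tmp/fio-scratch.dat are assumptions, it uses fio's long --option=value spelling, and it only prints the command lines rather than running them.

```shell
#!/bin/sh
# Dry run: print (do not execute) one fio command line per access pattern.
# fio_cmd and the scratch path are hypothetical names used for illustration.
fio_cmd() {
    # $1 = rw mode, $2 = target file or device
    extra=""
    case "$1" in
        randrw|rw) extra=" --rwmixread=70" ;;   # mixed modes: 70% reads
    esac
    echo "fio --name=iotest --filename=$2 --direct=1 --iodepth=1 --thread --rw=$1$extra --ioengine=psync --bs=4k --size=10G --numjobs=10 --runtime=300 --group_reporting"
}

for mode in randread randwrite read write randrw rw; do
    fio_cmd "$mode" /tmp/fio-scratch.dat
done
```

   Printing the commands first makes it easy to check the target before anything is written to it.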

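   Parameter descriptions:
   filename=/dev/sda1 The file or block device to test, normally one that lives on the disk under test. Write tests against a raw device such as /dev/sda1 overwrite its contents.
   direct=1 Bypass the operating system's buffer cache during the test (the same effect as O_DIRECT), so the results better reflect the device itself. ZFS on Solaris does not support direct I/O, and on Windows the synchronous I/O engines do not support direct I/O.
   iodepth Number of I/O units to keep in flight against the file. The default is 1 per file; a larger value gives more concurrency. Raising iodepth above 1 has no effect on synchronous I/O engines (unless the verify_async option is set). Even async engines may be subject to OS restrictions that prevent the desired depth from being reached. This can happen on Linux when using libaio without direct=1, because buffered I/O is not asynchronous on that OS. Watch the queue depth with an external tool such as iostat to confirm that the achieved depth is the intended one.
   thread   By default fio creates jobs with fork(); if this option is set, fio creates threads with pthread_create instead.
   rw=randwrite Test random-write I/O
   rw=randrw Test mixed random-write and random-read I/O
   bs=4k The block size of a single I/O is 4k
   bsrange=512-2048 Like bs, but specifies a range of block sizes. For reference, the filesystem block size can be obtained with tune2fs -l /dev/sda1|grep "Block size".
   size=10g The total amount of I/O for each job in this test is 10g, issued 4k at a time.
   numjobs=10 Run this test with 10 threads.
   runtime=300 Run the test for 300 seconds. If no runtime is given, the test continues until the whole 10g has been processed in 4k I/Os.
   ioengine=psync Use the psync I/O engine
   rwmixwrite=30 In mixed read/write mode, writes make up 30%
   rwmixread=70 In mixed read/write mode, reads make up 70%
   group_reporting Controls how results are displayed: report totals for the group instead of per job.
   lockmem=1g Pin 1g of memory with mlock(2), which can be used to simulate a machine with less available memory.
   zero_buffers Initialize the I/O buffers with zeros.
   nrfiles=8 Number of files used by each job.
   write_bw_log=str Write a bandwidth log for this job. It can be used to store bandwidth data over the lifetime of the job. The bundled fio_generate_plots script uses gnuplot to turn these text files into graphs.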
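
   The same parameters can also be kept in a job file instead of a long command line. Below is a minimal sketch that emits such a file for the mixed random read/write case. The helper name make_job and the scratch path are assumptions made for illustration; the option names are the ones described above.

```shell
#!/bin/sh
# Sketch: emit a fio job file equivalent to the mixed random read/write test.
# make_job and the scratch path below are hypothetical names for illustration.
make_job() {
    # $1 = target file or device to test
    cat <<EOF
[global]
ioengine=psync
direct=1
iodepth=1
thread
bs=4k
size=10G
numjobs=10
runtime=300
group_reporting

[iotest]
filename=$1
rw=randrw
rwmixread=70
EOF
}

# Print the job file; redirect it to e.g. iotest.fio and run "fio iotest.fio".
make_job /tmp/fio-scratch.dat
```

   Options in the [global] section apply to every job in the file, so further job sections only need to state what differs, such as rw.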

   Status shown while the test is running:
   Starting 10 threads
   Jobs: 10 (f=10): [m(10)] [100.0% done] [4512KB/1908KB/0KB /s] [1128/477/0 iops] [eta 00m:00s]

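   Explanation of the result data:
    For sequential reads and writes look mainly at bw; for random reads and writes look mainly at iops.

    read : io=1183.9MB, bw=4040.7KB/s, iops=1010, runt=300011msec     ## reads: 1183.9MB of I/O, bandwidth 4040.7KB/s, 1010 IOPS, run time 300011 ms
    write: io=518284KB, bw=1727.6KB/s, iops=431, runt=300011msec       ## writes: 518284KB of I/O, bandwidth 1727.6KB/s, 431 IOPS, run time 300011 ms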
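
   With a fixed block size, bandwidth and IOPS are two views of the same number: IOPS is roughly bw divided by bs. The sketch below pulls the fields out of the read line above and cross-checks them. It assumes the classic "key=value" summary format shown in this post (other fio versions may format the line differently), and get_field is a hypothetical helper name.

```shell
#!/bin/sh
# Pull one "key=value" field out of a classic fio summary line.
# get_field is a hypothetical helper name used only for this sketch.
get_field() {
    # $1 = field name (e.g. bw or iops), $2 = summary line
    printf '%s\n' "$2" | sed -n "s/.*[ ,]$1=\([0-9.]*\).*/\1/p"
}

line='read : io=1183.9MB, bw=4040.7KB/s, iops=1010, runt=300011msec'
bw_kb=$(get_field bw "$line")      # bandwidth in KB/s
iops=$(get_field iops "$line")     # IOPS as reported by fio

# Cross-check: bandwidth divided by the 4 KB block size approximates IOPS.
awk -v bw="$bw_kb" -v bs=4 -v iops="$iops" \
    'BEGIN { printf "reported=%s derived=%.1f\n", iops, bw / bs }'
```

   For the read line this gives a derived value of about 1010.2 against the reported 1010.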

   Run status group 0 (all jobs):
       READ: io=1183.9MB, aggrb=4040KB/s, minb=4040KB/s, maxb=4040KB/s, mint=300011msec, maxt=300011msec ## read I/O volume, aggregate bandwidth, minimum bandwidth, maximum bandwidth, shortest thread run time, longest thread run time
      WRITE: io=518284KB, aggrb=1727KB/s, minb=1727KB/s, maxb=1727KB/s, mint=300011msec, maxt=300011msec ## the same fields for writes

   Disk stats (read/write):
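      sda: ios=303136/129596, merge=18/25, ticks=2943520/45552, in_queue=2989060, util=100.00% ## I/Os performed: 303136 reads / 129596 writes; I/O merges: 18 reads / 25 writes; ticks the disk was kept busy: 2943520 for reads / 45552 for writes; total time spent in the disk queue: 2989060 ms; disk utilization: 100%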

   Output reference
   While running, fio prints the status of the jobs it has created:
   Threads: 1: [_r] [24.8% done] [ 13509/ 8334 kb/s] [eta 00h:01m:31s]
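   Each character inside the first pair of brackets shows where a thread is in its life cycle:
   P Thread set up, but not started
   C Thread created
   I Thread initialized, waiting
   p Thread running, pre-reading files
   R Sequential reads
   r Random reads
   W Sequential writes
   w Random writes
   M Mixed sequential reads/writes
   m Mixed random reads/writes
   F Waiting for fsync()
   V Running, verifying written data
   E Thread exited, not yet reaped by the main thread
   _ Thread reaped
   X Thread reaped, exited with an error.
   K Thread reaped, exited due to signal.
   The other values are fairly self-explanatory:
   The number of I/O threads currently running.
   The I/O rate since the last check (read rate / write rate).
   The estimated completion percentage.
   The estimated time to completion for the whole group.
   When fio finishes (or is interrupted with ctrl-c), it prints the data for each thread, for each group, and for the disks.
   io= Amount of I/O performed, in MB
   bw= Average I/O bandwidth
   iops=   IOPS
   runt= Run time of the thread
   slat Submission latency
   clat Completion latency
   lat Total latency (response time)
   bw Bandwidth
   cpu CPU usage
   IO depths= Distribution of I/O queue depths
   IO submit= How many I/Os were submitted in a single submit call
   IO complete= Like the above submit number, but for completions instead.
   IO issued= The number of read/write requests issued, and how many of them were short.
   IO latencies= Distribution of I/O completion latencies
   io= Total amount of I/O performed by the group
   aggrb= Aggregate bandwidth of the group
   minb= Minimum average bandwidth.
   maxb= Maximum average bandwidth.
   mint= Shortest run time of the threads in the group.
   maxt= Longest run time of the threads in the group.
   ios= Number of I/Os performed by all groups.
   merge= Total number of I/O merges.
   ticks= Number of ticks we kept the disk busy.
   in_queue= Total time spent in the disk queue.
   util= Disk utilization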

    5) FIO appendix
   5-1) fio command-line options:
   --debug           Enable some debugging options (see below)
   --parse-only       Parse options only, don't start any IO
   --output       Write output to file
   --runtime       Runtime in seconds
   --bandwidth-log       Generate per-job bandwidth logs
   --minimal       Minimal (terse) output
   --output-format=type   Output format (terse,json,json+,normal); the default is normal
   --terse-version=type   Terse version output format (default 3, or 2 or 4).
   --version       Print version info and exit
   --help           Print this page
   --cpuclock-test       Perform test/validation of CPU clock
   --crctest[=test]   Test speed of checksum functions
   --cmdhelp=cmd       Print command help, "all" for all of them
   --enghelp=engine   Print ioengine help, or list available ioengines
   --enghelp=engine,cmd   Print help for an ioengine cmd
   --showcmd       Turn a job file into command line options
   --readonly       Turn on safety read-only checks, preventing
               writes
   --eta=when       When ETA estimate should be printed
               May be "always", "never" or "auto"
   --eta-newline=time   Force a new line for every 'time' period passed
   --status-interval=t   Force full status dump every 't' period passed
   --section=name       Only run specified section in job file.
               Multiple sections can be specified.
   --alloc-size=kb       Set smalloc pool to this size in kb (def 1024)
   --warnings-fatal   Fio parser warnings are fatal
   --max-jobs       Maximum number of threads/processes to support
   --server=args       Start backend server. See Client/Server section.
   --client=host       Connect to specified backend(s).
   --remote-config=file   Tell fio server to load this local file
   --idle-prof=option   Report cpu idleness on a system or percpu basis
               (option=system,percpu) or run unit work
               calibration only (option=calibrate).
   --inflate-log=log   Inflate and output compressed log
   --trigger-file=file   Execute trigger cmd when file exists
   --trigger-timeout=t   Execute trigger after this time
   --trigger=cmd       Set this command as local trigger
   --trigger-remote=cmd   Set this command as remote trigger
   --aux-path=path       Use this path for fio state generated files

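   5-2) Single-disk IOPS calculation:
   Typical average physical seek times:
   A 7200 rpm SATA disk has an average physical seek time of 9 ms
   A 10000 rpm SATA disk has an average physical seek time of 6 ms
   A 15000 rpm SAS disk has an average physical seek time of 4 ms

   Typical rotational latencies (on average the platter has to turn half a revolution):
   7200 rpm disk: average rotational latency of about 60*1000/7200/2 = 4.17 ms
   10000 rpm disk: average rotational latency of about 60*1000/10000/2 = 3 ms
   15000 rpm disk: average rotational latency of about 60*1000/15000/2 = 2 ms

   Theoretical maximum IOPS
   --------------------------------------
   IOPS = 1000 ms / (seek time + rotational latency). The data transfer time can be ignored.

   7200 rpm disk: IOPS = 1000 / (9 + 4.17) ≈ 76 IOPS
   10000 rpm disk: IOPS = 1000 / (6 + 3) ≈ 111 IOPS
   15000 rpm disk: IOPS = 1000 / (4 + 2) ≈ 167 IOPS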
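
   The arithmetic above is easy to reproduce for other drives. The following sketch computes the same theoretical figure from the spindle speed and the average seek time; disk_iops is a hypothetical helper name, and the inputs are the seek times quoted above.

```shell
#!/bin/sh
# Theoretical maximum IOPS of a spinning disk, following the formula above:
#   rotational_latency_ms = 60 * 1000 / rpm / 2
#   IOPS = 1000 / (seek_ms + rotational_latency_ms)
# disk_iops is a hypothetical helper name.
disk_iops() {
    # $1 = spindle speed in rpm, $2 = average seek time in ms
    awk -v rpm="$1" -v seek="$2" 'BEGIN {
        rot = 60 * 1000 / rpm / 2
        printf "%.1f\n", 1000 / (seek + rot)
    }'
}

for spec in "7200 9" "10000 6" "15000 4"; do
    set -- $spec                      # split "rpm seek" into $1 and $2
    echo "$1 rpm: $(disk_iops "$1" "$2") IOPS"
done
```

   This prints about 75.9, 111.1 and 166.7 IOPS for the three drives, in line with the hand-calculated figures.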

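   5-3) ioengine=str defines how the job issues I/O to the file
   sync Basic read and write; lseek is used for positioning
   psync Basic pread and pwrite
   vsync Basic readv and writev
   libaio Linux native asynchronous I/O. Linux only supports queued behavior with non-buffered I/O.
   posixaio glibc POSIX asynchronous I/O
   solarisaio Solaris native asynchronous I/O
   windowsaio Windows native asynchronous I/O
   mmap The file is memory-mapped into user space, and data is written and read with memcpy
   splice Uses splice and vmsplice to transfer data between user space and the kernel
   syslet-rw Uses the syslet system calls to make regular read/write asynchronous
   sg SCSI generic sg v3 I/O. It can be synchronous using the SG_IO ioctl, or, if the target is an sg character device, read and write are used to perform asynchronous I/O
   null Transfers no data, it only pretends to. Mainly used to exercise fio itself, or for basic debugging and testing.
   net Transfers data over the network to the given host:port. Depending on the protocol, the hostname, port, listen and filename options specify what kind of connection to make, and the protocol option decides which protocol is used.
   netsplice Like net, but uses splice/vmsplice to map data and to send/receive it.
   cpuio Transfers no data, but burns CPU cycles according to the cpuload= and cpucycle= options. For example, cpuload=85 makes the job do no real I/O and use 85% of the CPU. On SMP machines, use numjobs=<no_of_cpu> to load the desired number of CPUs, because cpuload only loads a single CPU at the requested rate.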
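    5-4) The dd command
    dd command syntax:
    dd [options]
    Options:
    if=FILE Input file (or device name).
    of=FILE Output file (or device name).
    ibs=BYTES Read BYTES bytes at a time, i.e. the size of the input buffer.
    skip=BLOCKS Skip BLOCKS blocks of ibs bytes at the start of the input.
    obs=BYTES Write BYTES bytes at a time, i.e. the size of the output buffer.
    bs=BYTES Set both the read and write buffer sizes (equivalent to setting ibs and obs).
    cbs=BYTES Convert BYTES bytes at a time.
    count=BLOCKS Copy only BLOCKS input blocks.
    conv=ascii Convert EBCDIC to ASCII.
    conv=ebcdic Convert ASCII to EBCDIC.
    conv=ibm Convert ASCII to alternate EBCDIC.
    conv=block Convert variable-length records to fixed-length ones.
    conv=unblock Convert fixed-length records to variable-length ones.
    conv=ucase Change lowercase letters to uppercase.
    conv=lcase Change uppercase letters to lowercase.
    conv=notrunc Do not truncate the output file.
    conv=swab Swap every pair of input bytes.
    conv=noerror Do not stop processing on errors.
    conv=sync Pad every input block to the ibs size.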
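
    The conversion options are easiest to understand on a small input. The sketch below feeds short strings through dd on a pipe, so no file or device is touched; the sample strings are invented for illustration.

```shell
#!/bin/sh
# Small demonstrations of dd conversions on piped input.
# dd writes its transfer statistics to stderr, so they are discarded here.

# conv=ucase: lowercase letters become uppercase
printf 'hello' | dd conv=ucase 2>/dev/null
echo

# conv=swab: each pair of input bytes is swapped
printf 'abcd' | dd conv=swab 2>/dev/null
echo
```

    The first command prints HELLO and the second prints badc.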

    Check the filesystem block size
   tune2fs -l /dev/sda1 |grep 'Block size'
   Block size:               4096

    Check the OS page size
   getconf PAGESIZE

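    Disk management
    (1) Finding the most suitable block size
    Comparing the execution times that dd reports for each of the following runs shows which block size works best on the system: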
    dd if=/dev/zero bs=1024 count=1000000 of=/root/1Gb.file
    dd if=/dev/zero bs=4096 count=250000 of=/root/1Gb.file
    dd if=/dev/zero bs=8192 count=125000 of=/root/1Gb.file
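    (2) Testing disk read/write speed
    The execution times reported by the following two commands can be used to work out the read and write speed of the disk under test: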
    dd if=/root/1Gb.file bs=64k of=/dev/null
    dd if=/dev/zero of=/root/1Gb.file bs=1024 count=1000000
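
    Turning a dd timing into a speed is a single division: bytes copied over elapsed seconds. A minimal sketch follows, where throughput_mb_s is a hypothetical helper and the 8-second elapsed time is an invented example value, not a measurement.

```shell
#!/bin/sh
# Convert a dd run (bytes copied, elapsed seconds) into MB/s, with 1 MB = 10^6 bytes.
# throughput_mb_s is a hypothetical helper; the 8-second timing is an invented example.
throughput_mb_s() {
    # $1 = bytes copied, $2 = elapsed time in seconds
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

bytes=$((1024 * 1000000))             # bs=1024 count=1000000, as in the write test
echo "$(throughput_mb_s "$bytes" 8) MB/s"
```

    With those example inputs the result is 128.0 MB/s. Substitute the elapsed time that dd actually reports.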
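    (3) Repairing a disk
    When a hard disk has been left unused for a long time (a year or two, say), magnetic flux points can develop on the platters. The heads have trouble reading
    these areas, which may lead to I/O errors. If this affects the first sector of the disk, the disk may
    become unusable. The command below may be able to bring such data back to life. It reads each block and writes it back to the same location, so the contents are left unchanged.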
    dd if=/dev/sda of=/dev/sda
