Shell in Practice: Monitoring a Linux Host

Posted by 秦無殤 on 2019-04-05

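1. System Monitoring Overview

The script collects the following metrics: memory usage, CPU usage, currently logged-in users, mounted filesystems and their disk space usage, and the average inbound and outbound network traffic per second. It also collects disk I/O: the average rate at which data is read from disk into memory and the average rate at which data is written from memory to disk.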

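2. Monitoring Principles

2.1 CPU Usage

How it works:

CPU statistics are recorded in the file /proc/stat. The first line, cpu, aggregates all cores; after the label, its columns are user, nice, system, idle, iowait, irq, softirq, steal, guest and guest_nice, each a cumulative tick count since boot. Because the counters only grow, usage over an interval is the increase in busy ticks divided by the increase in total ticks. For details, see this blog post: https://blog.csdn.net/ustclu/article/details/1721673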

stephen@stephen-K55VD:~/shell$ cat  /proc/stat
cpu  348229 906 98356 7304276 81726 0 2821 0 0 0
cpu0 95033 273 22980 1803962 33023 0 1721 0 0 0
cpu1 79735 255 24756 1836717 17035 0 454 0 0 0
cpu2 84045 211 25742 1831963 16753 0 582 0 0 0
cpu3 89415 166 24876 1831633 14913 0 62 0 0 0
intr 10306028 7 28486 0 0 0 0 0 0 1 825 0 0 50130 0 0 0 76 284421 0 213811 0 0 0 29 795993 19 0 81 766580 15 648 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 51268973
btime 1554444493
processes 14526
procs_running 1
procs_blocked 0
softirq 9059312 7 2712077 5 5478 204089 0 1245879 2780432 0 2111345

Implementation:

# Sample the total and busy CPU tick counts from the aggregate "cpu" line.
# printf "%.0f" keeps large counters out of exponent notation in some awk builds.
cpuTotalStart=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{printf "%.0f\n", total}' /proc/stat)
cpuUsedStart=$(awk '/^cpu /{used=$2+$3+$4+$7+$8} END{printf "%.0f\n", used}' /proc/stat)
# Sample again 30 s later and take the differences
sleep 30
cpuTotalEnd=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{printf "%.0f\n", total}' /proc/stat)
cpuUsedEnd=$(awk '/^cpu /{used=$2+$3+$4+$7+$8} END{printf "%.0f\n", used}' /proc/stat)
usedCPU=$((cpuUsedEnd - cpuUsedStart))
totalCPU=$((cpuTotalEnd - cpuTotalStart))
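
The delta calculation can also be written as two small helpers that work on captured text, which makes the arithmetic easy to check without waiting 30 seconds. This is a minimal sketch rather than the author's script: cpu_busy_total and cpu_usage_percent are hypothetical names, and "busy" is assumed to mean user + nice + system + irq + softirq, the same fields the snippet above adds up.

```shell
#!/bin/bash
# Sketch only: cpu_busy_total and cpu_usage_percent are hypothetical helpers.

# Read /proc/stat-style text on stdin and print "<busy> <total>" for the
# aggregate "cpu" line.
#   busy  = user + nice + system + irq + softirq (fields 2, 3, 4, 7, 8)
#   total = sum of every numeric field on that line
cpu_busy_total() {
    awk '/^cpu /{
        total = 0
        for (i = 2; i <= NF; i++) total += $i
        printf "%.0f %.0f\n", $2 + $3 + $4 + $7 + $8, total
    }'
}

# Usage: cpu_usage_percent BUSY_START TOTAL_START BUSY_END TOTAL_END
# Prints the busy share of the elapsed ticks with two decimals.
cpu_usage_percent() {
    awk -v b0="$1" -v t0="$2" -v b1="$3" -v t1="$4" 'BEGIN {
        dt = t1 - t0
        if (dt <= 0) { print "0.00"; exit }   # no ticks elapsed between samples
        printf "%.2f\n", (b1 - b0) * 100 / dt
    }'
}
```

In a live check one would run cpu_busy_total < /proc/stat twice with a sleep in between and pass the four numbers to cpu_usage_percent.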

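2.2 Memory Usage

How it works:

Memory statistics are recorded in the file /proc/meminfo. MemTotal is the total amount of memory, in kB, and MemFree is the amount of completely unused memory. Memory usage = (total memory - free memory) / total memory. MemFree does not include memory held by buffers and the page cache, which the kernel can reclaim, so this figure counts reclaimable cache as used and tends to read high.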

stephen@stephen-K55VD:~/shell$ cat /proc/meminfo
MemTotal:        3922884 kB
MemFree:          139108 kB
MemAvailable:     317700 kB
Buffers:           31792 kB
Cached:           538160 kB
SwapCached:        10012 kB
Active:          2615652 kB

Implementation:

# Get memory usage
function memUsage(){
    logInfo "Begin to get mem usage of Host [${ip}]"
    # Total memory in kB
    totalMem=$(awk '/^MemTotal:/{print $2}' /proc/meminfo)
    # Free memory in kB
    freeMem=$(awk '/^MemFree:/{print $2}' /proc/meminfo)
    usedMem=$((totalMem - freeMem))
    logInfo "Host [${ip}] total mem is :  $(kbToGb ${totalMem}) GB"
    # Compute the memory usage and write it to the log
    logInfo "Host [${ip}] mem usage is :  $(usagePercent ${usedMem} ${totalMem}) %"
    logInfo "End to get mem usage of Host [${ip}]"
}
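
The /proc/meminfo sample above also lists MemAvailable, the kernel's own estimate of how much memory can be handed to new workloads. As a hedged alternative to the MemFree formula, the sketch below computes usage from MemAvailable instead. mem_usage_percent is a hypothetical helper, not part of the original script, and the fallback to MemFree assumes an older kernel that does not report MemAvailable.

```shell
#!/bin/bash
# Sketch only: mem_usage_percent is a hypothetical helper.
# It reads a meminfo-format file (default: /proc/meminfo) and prints
# (MemTotal - MemAvailable) / MemTotal as a percentage with two decimals.
mem_usage_percent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "MemAvailable:" { avail = $2; have_avail = 1 }
        END {
            if (total <= 0) { print "0.00"; exit }
            # Assumption: fall back to MemFree when MemAvailable is missing
            if (!have_avail) avail = free
            printf "%.2f\n", (total - avail) * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}
```

With the sample values shown earlier (MemTotal 3922884 kB, MemAvailable 317700 kB) this gives 91.90, lower than the MemFree-based figure because reclaimable cache is no longer counted as used.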

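2.3 Network Traffic

How it works:

On Linux, per-interface traffic counters are recorded in the file /proc/net/dev. The rate is the number of bytes received or sent during an interval divided by the length of that interval. When a row is split on whitespace, field 1 is the interface name, field 2 is the number of bytes received, and field 10 is the number of bytes sent.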

stephen@stephen-K55VD:~/shell/sysMonitor$ cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
wlp3s0: 19595253   41163    0    0    0     0          0         0 34741446   49185    0    0    0     0       0          0
enp4s0f2:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
docker0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
    lo:  907275    5032    0    0    0     0          0         0   907275    5032    0    0    0     0       0          0

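Implementation:

# ethName is the name of the network interface; matching "name:" exactly
# avoids picking up other interfaces whose names merely contain ethName
receiveByteStart=$(awk -v ifc="${ethName}:" '$1 == ifc {print $2}' /proc/net/dev)
sendByteStart=$(awk -v ifc="${ethName}:" '$1 == ifc {print $10}' /proc/net/dev)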
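
Reading the two counters and turning a pair of samples into a rate can be sketched as follows. The helper names net_bytes and rate_kb_per_s are hypothetical. The parser splits on the colon instead of relying on field positions, on the assumption that some older kernels print the byte count directly after the colon with no space, which would shift the whitespace-separated fields by one.

```shell
#!/bin/bash
# Sketch only: net_bytes and rate_kb_per_s are hypothetical helpers.

# Usage: net_bytes IFACE [FILE]
# Prints "<rx_bytes> <tx_bytes>" for exactly one interface from a
# /proc/net/dev-format file (default: /proc/net/dev).
net_bytes() {
    awk -v ifc="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)                 # drop leading padding
            n = index(line, ":")
            if (n == 0) next                         # header rows have no "name:"
            if (substr(line, 1, n - 1) != ifc) next  # exact interface name only
            split(substr(line, n + 1), f, " ")
            # f[1] = bytes received, f[9] = bytes sent (after 8 receive columns)
            printf "%.0f %.0f\n", f[1], f[9]
        }
    ' "${2:-/proc/net/dev}"
}

# Usage: rate_kb_per_s START_BYTES END_BYTES SECONDS
# Prints the average rate in KB/s (1 KB = 1024 bytes) with two decimals.
rate_kb_per_s() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
        if (s <= 0) { print "0.00"; exit }
        printf "%.2f\n", (b - a) / s / 1024
    }'
}
```

A live measurement would call net_bytes twice, a known number of seconds apart, and feed each counter pair to rate_kb_per_s.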

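2.4 Disk I/O

How it works:

Disk I/O counters are recorded in the file /proc/vmstat. pgpgin is the cumulative amount of data read in from disk, and pgpgout is the cumulative amount written out to disk, both in kB. Sample a counter twice and divide the increase by the elapsed time to get the rate.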

Implementation:

# disk IO in
function diskIOIn(){
    # Sample the disk read counter (kB) 30 s apart
    inIoStart=$(awk '$1 == "pgpgin" {print $2}' /proc/vmstat)
    sleep 30
    inIoEnd=$(awk '$1 == "pgpgin" {print $2}' /proc/vmstat)
    # kB -> MB per second; integer arithmetic, so rates below 1 MB/s are logged as 0
    inIo=$(((inIoEnd-inIoStart)/(30*1024)))
    logInfo "Host [${ip}] in IO is :  ${inIo} MB / s"
}
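
Shell arithmetic with $(( )) is integer-only, so any rate below 1 MB/s comes out as 0. If fractional rates are wanted, the division can be handed to awk. The sketch below assumes the counters are in kB as described above. vmstat_counter and io_rate_mb are hypothetical names.

```shell
#!/bin/bash
# Sketch only: vmstat_counter and io_rate_mb are hypothetical helpers.

# Usage: vmstat_counter NAME [FILE]
# Prints the value of one counter from a /proc/vmstat-format file.
vmstat_counter() {
    awk -v name="$1" '$1 == name { printf "%.0f\n", $2 }' "${2:-/proc/vmstat}"
}

# Usage: io_rate_mb START_KB END_KB SECONDS
# Prints the average rate in MB/s with two decimals.
io_rate_mb() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
        if (s <= 0) { print "0.00"; exit }
        printf "%.2f\n", (b - a) / 1024 / s
    }'
}
```

For example, 15360 kB read in 30 s is reported as 0.50 MB/s by io_rate_mb, whereas the integer expression $(( 15360 / (30 * 1024) )) evaluates to 0.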

3. Script Code

  • hostLists: the list of IP addresses of the hosts to monitor.
  • sysMonitor.sh*: the script that collects each monitoring metric.
#!/bin/bash
# Monitor system information of Linux hosts
# Load the helper module
source utils

# Get CPU usage
function cpuUsage()
{
    # Number of physical CPUs
    phyCPUNums=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
    # Number of logical CPUs
    lgCPUNums=$(grep -c "^processor" /proc/cpuinfo)
    # Cores per physical CPU
    cores=$(grep "cpu cores" /proc/cpuinfo | uniq | awk '{print $4}')
    logInfo "Host [${ip}] physical CPU nums is :  ${phyCPUNums}"
    logInfo "Host [${ip}] logic CPU nums is :  ${lgCPUNums}"
    logInfo "Host [${ip}] core nums is :  ${cores}"
    # CPU usage
    # Sample the total and busy CPU tick counts from the aggregate "cpu" line
    cpuTotalStart=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{printf "%.0f\n", total}' /proc/stat)
    cpuUsedStart=$(awk '/^cpu /{used=$2+$3+$4+$7+$8} END{printf "%.0f\n", used}' /proc/stat)
    # Sample again 30 s later and take the differences
    sleep 30
    cpuTotalEnd=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{printf "%.0f\n", total}' /proc/stat)
    cpuUsedEnd=$(awk '/^cpu /{used=$2+$3+$4+$7+$8} END{printf "%.0f\n", used}' /proc/stat)
    usedCPU=$((cpuUsedEnd - cpuUsedStart))
    totalCPU=$((cpuTotalEnd - cpuTotalStart))
    logInfo "Host [${ip}] CPU usage is :  $(usagePercent ${usedCPU} ${totalCPU}) %"
}

# Get memory usage
function memUsage(){
    logInfo "Begin to get mem usage of Host [${ip}]"
    # Total memory in kB
    totalMem=$(awk '/^MemTotal:/{print $2}' /proc/meminfo)
    # Free memory in kB
    freeMem=$(awk '/^MemFree:/{print $2}' /proc/meminfo)
    usedMem=$((totalMem - freeMem))
    logInfo "Host [${ip}] total mem is :  $(kbToGb ${totalMem}) GB"
    # Compute the memory usage and write it to the log
    logInfo "Host [${ip}] mem usage is :  $(usagePercent ${usedMem} ${totalMem}) %"
    logInfo "End to get mem usage of Host [${ip}]"
}

# Average network traffic per second for one interface
function netData(){
    logInfo "Begin to get  net data of Host [${ip}]"
    ethName=$1
    receiveByteStart=$(awk -v ifc="${ethName}:" '$1 == ifc {print $2}' /proc/net/dev)
    sendByteStart=$(awk -v ifc="${ethName}:" '$1 == ifc {print $10}' /proc/net/dev)
    sleep 10
    receiveByteEnd=$(awk -v ifc="${ethName}:" '$1 == ifc {print $2}' /proc/net/dev)
    sendByteEnd=$(awk -v ifc="${ethName}:" '$1 == ifc {print $10}' /proc/net/dev)
    # Bytes over 10 s, divided by 1024 so that the value matches the KB/s label
    inDataRate=$(echo "scale=2;(${receiveByteEnd}-${receiveByteStart})/10/1024" | bc)
    outDataRate=$(echo "scale=2;(${sendByteEnd}-${sendByteStart})/10/1024" | bc)
    logInfo "Host [${ip}] in data is :  ${inDataRate} kb / s"
    logInfo "Host [${ip}] out data is :  ${outDataRate} kb / s"
    logInfo "End to get  net data of Host [${ip}]"
}

# Disk space usage
function diskUsage(){
    logInfo "Begin to get disk usage of Host [${ip}]"
    noTimeLogInfo "$(df -h)"
    logInfo "End to get disk usage of Host [${ip}]"
}

# disk IO in
function diskIOIn(){
    # Sample the disk read counter (kB) 30 s apart
    inIoStart=$(awk '$1 == "pgpgin" {print $2}' /proc/vmstat)
    sleep 30
    inIoEnd=$(awk '$1 == "pgpgin" {print $2}' /proc/vmstat)
    # kB -> MB per second; integer arithmetic, so rates below 1 MB/s are logged as 0
    inIo=$(((inIoEnd-inIoStart)/(30*1024)))
    logInfo "Host [${ip}] in IO is :  ${inIo} MB / s"
}

# disk IO out
function diskIOout(){
    # Sample the disk write counter (kB) 60 s apart
    outIoStart=$(awk '$1 == "pgpgout" {print $2}' /proc/vmstat)
    sleep 60
    outIoEnd=$(awk '$1 == "pgpgout" {print $2}' /proc/vmstat)
    outIo=$(((outIoEnd-outIoStart)/(60*1024)))
    logInfo "Host [${ip}] out IO is :  ${outIo} MB / s"
}

# Currently logged-in users
function onlineUser(){
    # Skip the two header lines printed by w (uptime summary and column titles)
    user=$(w | awk 'NR>2 {print $1 "\t\t" $4}')
    userCount=$(w | awk 'NR>2' | wc -l)
    logInfo "There are [${userCount}] users online now."
    noTimeLogInfo "UserName        loginAt"
    noTimeLogInfo "${user}"
}

# Check whether each host is reachable, then collect the metrics.
# Note: the functions above read the local /proc and run local commands,
# so the values describe the machine on which this script runs.
function isAlive(){
    for ip in $(cat hostLists)
    do
        if ping -c 3 "${ip}" >/dev/null
        then
            logInfo "${ip} is reachable"
            # Logged-in users
            onlineUser
            # CPU information
            cpuUsage
            # Memory information
            memUsage
            # Disk I/O
            diskIOIn
            diskIOout
            # Disk space usage
            diskUsage
            # Average inbound and outbound traffic per second
            netData wlp3s0
        else
            logInfo "ERROR ${ip} is unreachable,try login in see more details.."
        fi
    done
}

while true
do
    isAlive
    sleep 60
done
  • utils: helper functions such as log printing.
#!/bin/bash
# Log printing
curr_path=$(pwd)
log_file=${curr_path}/system_status.log

function logInfo()
{
    local curr_time
    curr_time=$(date "+%Y-%m-%d %H:%M:%S")
    # Check whether the log file exists
    if [ -e "${log_file}" ]
    then
        # If the file is not writable, grant permission with chmod
        if [ ! -w "${log_file}" ]
        then
            chmod 770 "${log_file}"
        fi
    else
        # Create the log file if it does not exist
        touch "${log_file}"
    fi
    # Write the log entry
    local info=$1
    echo "${curr_time}  $(whoami) [Info] ${info}" >> "${log_file}"
}

# Append a message to the log without a timestamp prefix
function noTimeLogInfo(){
    msg=$1
    echo "${msg}" >> "${log_file}"
}

# Convert kB to GB with 3 decimal places (bc is used because expr only supports integers)
function kbToGb(){
    kbVal=$1
    gbVal=$(echo "scale=3;${kbVal}/1024/1024" | bc)
    echo "${gbVal}"
}

# Usage as a percentage
# First argument: amount used; second argument: total
function usagePercent(){
    used=$1
    total=$2
    usedPercent=$(echo "scale=2;${used}*100/${total}" | bc)
    echo "${usedPercent}"
}
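
Two properties of bc are worth keeping in mind with usagePercent: results below 1 are printed without a leading zero (for example .50), and a total of 0 makes bc report a division-by-zero error. The sketch below is one possible drop-in variant that formats the value with awk instead. usage_percent_safe is a hypothetical name, and returning 0.00 for a zero total is an assumption about the desired behavior. Unlike bc with scale=2, printf rounds the last digit rather than truncating it.

```shell
#!/bin/bash
# Sketch only: usage_percent_safe is a hypothetical alternative to usagePercent.
# Usage: usage_percent_safe USED TOTAL
# Prints USED / TOTAL as a percentage with two decimals and a leading zero.
usage_percent_safe() {
    awk -v used="$1" -v total="$2" 'BEGIN {
        # Assumption: report 0.00 instead of failing when the total is 0
        if (total <= 0) { print "0.00"; exit }
        printf "%.2f\n", used * 100 / total
    }'
}
```

For instance, usage_percent_safe 5 1000 prints 0.50, where the bc version prints .50.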

Script layout:

-rw-r--r-- 1 stephen stephen   30 Apr   5 18:33 hostLists
-rwxrwxr-x 1 stephen stephen 4164 Apr   5 18:50 sysMonitor.sh*
-rw-r--r-- 1 stephen stephen  951 Apr   5 15:23 utils
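
The main loop reads hostLists by word-splitting the output of cat. If the file is meant to hold one address per line, a slightly more forgiving reader can skip blank lines and comment lines. This is only a sketch: read_hosts is a hypothetical helper, and treating lines that start with # as comments is an assumption, not something the original file format defines.

```shell
#!/bin/bash
# Sketch only: read_hosts is a hypothetical helper.
# Usage: read_hosts [FILE]
# Prints one host per line from FILE (default: hostLists).
read_hosts() {
    # NF skips blank lines, the regex skips "#" comment lines,
    # and printing $1 drops surrounding whitespace.
    awk 'NF && $1 !~ /^#/ { print $1 }' "${1:-hostLists}"
}

# Example use in a loop (commented out so the sketch has no side effects):
# read_hosts hostLists | while read -r ip; do
#     ping -c 3 "${ip}" >/dev/null && echo "${ip} is reachable"
# done
```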

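4. Results

The monitoring information is written to the log file system_status.log. A sample run produced the output below. In this capture the user list includes the header row printed by w (USER / LOGIN@), which is also why the count reads 2 while only one user is logged in.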

2019-04-05 19:44:42  stephen [Info] 192.168.1.109 is reachable
2019-04-05 19:44:42  stephen [Info] There are [2] users online now.
UserName        loginAt
USER        LOGIN@
stephen        14:09
2019-04-05 19:44:42  stephen [Info] Host [192.168.1.109] physical CPU nums is :  1
2019-04-05 19:44:42  stephen [Info] Host [192.168.1.109] logic CPU nums is :  4
2019-04-05 19:44:42  stephen [Info] Host [192.168.1.109] core nums is :  2
2019-04-05 19:45:12  stephen [Info] Host [192.168.1.109] CPU usage is :  10.12 %
2019-04-05 19:45:12  stephen [Info] Begin to get mem usage of Host [192.168.1.109]
2019-04-05 19:45:12  stephen [Info] Host [192.168.1.109] total mem is :  3.741 GB
2019-04-05 19:45:12  stephen [Info] Host [192.168.1.109] mem usage is :  95.83 %
2019-04-05 19:45:12  stephen [Info] End to get mem usage of Host [192.168.1.109]
2019-04-05 19:45:42  stephen [Info] Host [192.168.1.109] in IO is :  0 MB / s
2019-04-05 19:46:42  stephen [Info] Host [192.168.1.109] out IO is :  0 MB / s
2019-04-05 19:46:42  stephen [Info] Begin to get disk usage of Host [192.168.1.109]
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G    0% /dev
tmpfs           384M  2.0M  382M    1% /run
/dev/sda10       42G   20G   20G   51% /
tmpfs           1.9G   20M  1.9G    2% /dev/shm
tmpfs           5.0M  4.0K  5.0M    1% /run/lock
tmpfs           1.9G     0  1.9G    0% /sys/fs/cgroup
/dev/loop0      3.8M  3.8M     0  100% /snap/notepad-plus-plus/202
/dev/loop2       54M   54M     0  100% /snap/core18/782
/dev/loop4      441M  441M     0  100% /snap/wine-platform/111
/dev/loop5      441M  441M     0  100% /snap/wine-platform/105
/dev/loop7      3.8M  3.8M     0  100% /snap/notepad-plus-plus/199
/dev/loop3       90M   90M     0  100% /snap/core/6673
/dev/loop1      274M  274M     0  100% /snap/wps-office-multilang/1
/dev/loop6       91M   91M     0  100% /snap/core/6405
/dev/loop8       92M   92M     0  100% /snap/core/6531
/dev/loop9       36M   36M     0  100% /snap/gtk-common-themes/1198
/dev/loop10     3.8M  3.8M     0  100% /snap/notepad-plus-plus/195
/dev/loop11     441M  441M     0  100% /snap/wine-platform/103
tmpfs           384M   16K  384M    1% /run/user/125
tmpfs           384M   52K  384M    1% /run/user/1000
2019-04-05 19:46:42  stephen [Info] End to get disk usage of Host [192.168.1.109]
2019-04-05 19:46:42  stephen [Info] Begin to get  net data of Host [192.168.1.109]
2019-04-05 19:46:52  stephen [Info] Host [192.168.1.109] in data is :  42.90 kb / s
2019-04-05 19:46:52  stephen [Info] Host [192.168.1.109] out data is :  7.00 kb / s
2019-04-05 19:46:52  stephen [Info] End to get  net data of Host [192.168.1.109]
2019-04-05 19:47:04  stephen [Info] ERROR 255.255.255.254 is unreachable,try login in see more details..

5. References

5.1 ifstat network traffic monitoring and the /proc/net/dev file

https://blog.csdn.net/kongshuai19900505/article/details/80676607

5.2 The awk command

http://man.linuxde.net/awk

5.3 Using shell scripts to collect system CPU, memory, disk, and network information

https://www.jb51.net/article/50436.htm
