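1. System Monitoring Overview
The script collects these metrics: memory usage, CPU usage, the users currently logged in, mounted disks and their space usage, average inbound network traffic per second, and average outbound network traffic per second. For disk IO it collects two per-second averages: the rate at which data is read from disk into memory, and the rate at which data is written from memory to disk.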
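2. Monitoring Principles
2.1 CPU Usage
How it works:
CPU time counters are recorded in /proc/stat. For details, see: https://blog.csdn.net/ustclu/article/details/1721673
The first line (cpu) aggregates all CPUs. Its columns are cumulative times since boot, in clock ticks (USER_HZ, usually 1/100 s), spent in user, nice, system, idle, iowait, irq, softirq, steal, guest and guest_nice mode. CPU usage is the share of busy time in the total time elapsed between two samples.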
stephen@stephen-K55VD:~/shell$ cat /proc/stat
cpu  348229 906 98356 7304276 81726 0 2821 0 0 0
cpu0 95033 273 22980 1803962 33023 0 1721 0 0 0
cpu1 79735 255 24756 1836717 17035 0 454 0 0 0
cpu2 84045 211 25742 1831963 16753 0 582 0 0 0
cpu3 89415 166 24876 1831633 14913 0 62 0 0 0
intr 10306028 7 28486 0 0 0 0 0 0 1 825 0 0 50130 0 0 0 76 284421 0 213811 0 0 0 29 795993 19 0 81 766580 15 648 0 0 0 ... (remaining per-interrupt counters, all 0, omitted)
ctxt 51268973
btime 1554444493
processes 14526
procs_running 1
procs_blocked 0
softirq 9059312 7 2712077 5 5478 204089 0 1245879 2780432 0 2111345
Implementation:
# Total and used CPU time from the aggregate "cpu" line
# (used = user + nice + system + irq + softirq)
cpuTotalStart=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{print total}' /proc/stat)
cpuUsedStart=$(awk '/^cpu /{print $2+$3+$4+$7+$8}' /proc/stat)
# Sample again 30 s later and take the differences
sleep 30
cpuTotalEnd=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{print total}' /proc/stat)
cpuUsedEnd=$(awk '/^cpu /{print $2+$3+$4+$7+$8}' /proc/stat)
usedCPU=$((cpuUsedEnd - cpuUsedStart))
totalCPU=$((cpuTotalEnd - cpuTotalStart))
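The delta method above can be packaged as a function that takes two snapshots of the aggregate cpu line and prints the busy percentage. This is a minimal sketch, and the names cpu_busy_percent and read_cpu_line are hypothetical. Like the original script, it counts user, nice, system, irq and softirq ($2, $3, $4, $7, $8) as busy time and the sum of all columns as total time.

```shell
# Hypothetical helper: CPU busy percentage from two snapshots of the
# aggregate "cpu " line of /proc/stat.
# $1: first snapshot, $2: second snapshot (each a full "cpu ..." line).
cpu_busy_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= NF; i++) total += $i   # sum every time column
            busy = $2 + $3 + $4 + $7 + $8           # user+nice+system+irq+softirq
            if (NR == 1) { t0 = total; b0 = busy }
            else         { t1 = total; b1 = busy }
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print "0.00"; exit }     # no ticks elapsed
            printf "%.2f\n", (b1 - b0) * 100 / dt
        }'
}

# Hypothetical helper: print the aggregate line ("cpu" followed by a space).
read_cpu_line() {
    grep '^cpu ' /proc/stat
}

# Usage sketch:
#   start=$(read_cpu_line); sleep 30; end=$(read_cpu_line)
#   echo "CPU usage: $(cpu_busy_percent "$start" "$end") %"
```

Passing the snapshots as arguments keeps the arithmetic separate from the sleep. This makes it easy to check against recorded lines such as the cpu line shown in the /proc/stat output above.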
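2.2 Memory Usage
How it works:
Memory information is recorded in /proc/meminfo. MemTotal is the total memory and MemFree is the free memory, both in kB. Memory usage = (total memory - free memory) / total memory.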
stephen@stephen-K55VD:~/shell$ cat /proc/meminfo
MemTotal:        3922884 kB
MemFree:          139108 kB
MemAvailable:     317700 kB
Buffers:           31792 kB
Cached:           538160 kB
SwapCached:        10012 kB
Active:          2615652 kB
...
Implementation:
# Get memory usage
function memUsage(){
    logInfo "Begin to get mem usage of Host [${ip}]"
    # Total memory, in kB
    totalMem=$(awk '/^MemTotal:/{print $2}' /proc/meminfo)
    # Free memory, in kB
    freeMem=$(awk '/^MemFree:/{print $2}' /proc/meminfo)
    usedMem=$((totalMem - freeMem))
    logInfo "Host [${ip}] total mem is : $(kbToGb ${totalMem}) GB"
    # Compute memory usage and write it to the log
    logInfo "Host [${ip}] mem usage is : $(usagePercent ${usedMem} ${totalMem}) %"
    logInfo "End to get mem usage of Host [${ip}]"
}
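MemFree does not count page cache and buffers that the kernel can reclaim on demand. As a result, (MemTotal - MemFree) / MemTotal tends to overstate memory pressure; the sample run in section 4 reports 95.83 %. Since Linux 3.14, /proc/meminfo also has a MemAvailable field, which is the kernel's estimate of memory available for new workloads without swapping. The following is a minimal sketch that assumes a meminfo-format file. The function name mem_usage_percent and its optional path argument are hypothetical. The function prefers MemAvailable and falls back to the original MemFree formula when that field is absent.

```shell
# Hypothetical helper: memory usage percentage from a meminfo-format file.
# Uses MemAvailable when present (Linux >= 3.14), otherwise MemFree,
# which matches the original formula.
# $1: path to the file (defaults to /proc/meminfo).
mem_usage_percent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "MemAvailable:" { avail = $2; has_avail = 1 }
        END {
            if (total <= 0) exit 1                 # malformed input
            unused = has_avail ? avail : free
            printf "%.2f\n", (total - unused) * 100 / total
        }' "${1:-/proc/meminfo}"
}
```

The sample values shown in this section are MemTotal 3922884 kB, MemFree 139108 kB and MemAvailable 317700 kB. With these values the function reports 91.90 %. The MemFree formula reports 96.45 %.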
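2.3 Network Traffic
How it works:
On Linux, per-interface traffic counters are recorded in /proc/net/dev. The rate is computed from the number of bytes received and transmitted over a time interval. With whitespace-separated fields, the first column is the interface name, the second is the cumulative number of bytes received, and the tenth is the cumulative number of bytes transmitted.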
stephen@stephen-K55VD:~/shell/sysMonitor$ cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  wlp3s0: 19595253   41163    0    0    0     0          0         0 34741446   49185    0    0    0     0       0          0
enp4s0f2:        0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 docker0:        0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
      lo:   907275    5032    0    0    0     0          0         0   907275    5032    0    0    0     0       0          0
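Implementation:
# ethName is the interface name, e.g. wlp3s0
# Turn "name:" into "name " so the interface name is always $1, then match it
# exactly (a substring match would also hit names that merely contain it)
receiveByteStart=$(awk -v dev="${ethName}" '{sub(/:/," ")} $1==dev {print $2}' /proc/net/dev)
sendByteStart=$(awk -v dev="${ethName}" '{sub(/:/," ")} $1==dev {print $10}' /proc/net/dev)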
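Matching the interface name as a substring can pick up other interfaces whose names contain it, such as container veth devices. In addition, some kernel versions print a large counter directly after the colon (eth0:123456), which shifts the whitespace-separated columns. The following is a minimal sketch that normalizes the colon and compares the name exactly. The functions net_bytes and net_rate_kbs are hypothetical, and rates are reported in KB/s (1 KB = 1024 bytes).

```shell
# Hypothetical helper: print "<rx_bytes> <tx_bytes>" for one interface.
# $1: interface name, e.g. wlp3s0; $2: file (defaults to /proc/net/dev).
net_bytes() {
    awk -v dev="$1" '
        { sub(/:/, " ") }                  # "eth0:123" -> "eth0 123"; fields re-split
        $1 == dev { print $2, $10; found = 1 }
        END { if (!found) exit 1 }         # unknown interface
    ' "${2:-/proc/net/dev}"
}

# Hypothetical helper: average KB/s between two "<rx> <tx>" samples.
# $1: first sample, $2: second sample, $3: interval in seconds.
net_rate_kbs() {
    printf '%s %s %s\n' "$1" "$2" "$3" | awk '
        { printf "in %.2f KB/s, out %.2f KB/s\n", ($3 - $1) / $5 / 1024, ($4 - $2) / $5 / 1024 }'
}

# Usage sketch:
#   s=$(net_bytes wlp3s0); sleep 10; e=$(net_bytes wlp3s0)
#   net_rate_kbs "$s" "$e" 10
```

The header lines of /proc/net/dev contain no colon, so they are never mistaken for an interface.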
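2.4 Disk IO
How it works:
Disk IO counters are recorded in /proc/vmstat. pgpgin is the cumulative amount of data read in from block devices and pgpgout is the amount written out, both in kilobytes since boot. The rate is the difference between two samples of a counter divided by the sampling interval.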
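Implementation:
# Disk IO in: average rate of reading from disk into memory
function diskIOIn(){
    # pgpgin: cumulative kB read from disk since boot
    inIoStart=$(awk '/^pgpgin /{print $2}' /proc/vmstat)
    sleep 30
    inIoEnd=$(awk '/^pgpgin /{print $2}' /proc/vmstat)
    # kB over 30 s -> MB/s; awk keeps two decimals instead of truncating to an integer
    inIo=$(awk -v s="${inIoStart}" -v e="${inIoEnd}" 'BEGIN{printf "%.2f", (e-s)/(30*1024)}')
    logInfo "Host [${ip}] in IO is : ${inIo} MB / s"
}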
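In sysMonitor.sh, diskIOIn and diskIOout each take their own pair of samples (30 s and 60 s), so every pass spends 90 s on disk IO. Both counters can instead be read from the same two snapshots of /proc/vmstat. The following is a minimal sketch that assumes, as the original code does, that pgpgin and pgpgout are cumulative kilobytes. The function name disk_io_rates is hypothetical.

```shell
# Hypothetical helper: disk read/write rates in MB/s from two snapshots of
# /proc/vmstat, where pgpgin/pgpgout are cumulative kilobytes since boot.
# $1: first snapshot file, $2: second snapshot file, $3: interval (seconds).
disk_io_rates() {
    awk -v secs="$3" '
        FNR == 1 { snap++ }                          # 1 = first file, 2 = second
        $1 == "pgpgin"  { pin[snap] = $2 }
        $1 == "pgpgout" { pout[snap] = $2 }
        END {
            printf "in %.2f MB/s, out %.2f MB/s\n",
                (pin[2] - pin[1]) / secs / 1024,
                (pout[2] - pout[1]) / secs / 1024
        }' "$1" "$2"
}

# Usage sketch:
#   a=$(mktemp); b=$(mktemp)
#   cat /proc/vmstat > "$a"; sleep 30; cat /proc/vmstat > "$b"
#   disk_io_rates "$a" "$b" 30; rm -f "$a" "$b"
```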
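3. Script Code
- hostLists: the IP addresses of the hosts to monitor, one per line. The script only pings each address. Every metric is then read from the /proc files of the machine running the script, so the values logged under an IP describe the local machine, not the remote host.
- sysMonitor.sh: the script that collects each metric.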
#!/bin/bash
# Monitor Linux host system information
# Run from the script's own directory so that utils and hostLists are found
cd "$(dirname "$0")" || exit 1
# Import the helper functions (logging, unit conversion)
source ./utils

# CPU information and usage
function cpuUsage()
{
    # Number of physical CPUs (sockets)
    phyCPUNums=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
    # Number of logical CPUs
    lgCPUNums=$(grep -c "^processor" /proc/cpuinfo)
    # Cores per physical CPU
    cores=$(grep "cpu cores" /proc/cpuinfo | uniq | awk '{print $4}')
    logInfo "Host [${ip}] physical CPU nums is : ${phyCPUNums}"
    logInfo "Host [${ip}] logic CPU nums is : ${lgCPUNums}"
    logInfo "Host [${ip}] core nums is : ${cores}"
    # CPU usage: total and used CPU time from the aggregate "cpu" line
    # (used = user + nice + system + irq + softirq)
    cpuTotalStart=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{print total}' /proc/stat)
    cpuUsedStart=$(awk '/^cpu /{print $2+$3+$4+$7+$8}' /proc/stat)
    # Sample again 30 s later and take the differences
    sleep 30
    cpuTotalEnd=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{print total}' /proc/stat)
    cpuUsedEnd=$(awk '/^cpu /{print $2+$3+$4+$7+$8}' /proc/stat)
    usedCPU=$((cpuUsedEnd - cpuUsedStart))
    totalCPU=$((cpuTotalEnd - cpuTotalStart))
    logInfo "Host [${ip}] CPU usage is : $(usagePercent ${usedCPU} ${totalCPU}) %"
}

# Memory usage
function memUsage(){
    logInfo "Begin to get mem usage of Host [${ip}]"
    # Total and free memory, in kB
    totalMem=$(awk '/^MemTotal:/{print $2}' /proc/meminfo)
    freeMem=$(awk '/^MemFree:/{print $2}' /proc/meminfo)
    usedMem=$((totalMem - freeMem))
    logInfo "Host [${ip}] total mem is : $(kbToGb ${totalMem}) GB"
    # Compute memory usage and write it to the log
    logInfo "Host [${ip}] mem usage is : $(usagePercent ${usedMem} ${totalMem}) %"
    logInfo "End to get mem usage of Host [${ip}]"
}

# Average per-second traffic of one network interface
function netData(){
    logInfo "Begin to get net data of Host [${ip}]"
    ethName=$1
    # Turn "name:" into "name " so the interface name is always $1, then match it exactly
    receiveByteStart=$(awk -v dev="${ethName}" '{sub(/:/," ")} $1==dev {print $2}' /proc/net/dev)
    sendByteStart=$(awk -v dev="${ethName}" '{sub(/:/," ")} $1==dev {print $10}' /proc/net/dev)
    sleep 10
    receiveByteEnd=$(awk -v dev="${ethName}" '{sub(/:/," ")} $1==dev {print $2}' /proc/net/dev)
    sendByteEnd=$(awk -v dev="${ethName}" '{sub(/:/," ")} $1==dev {print $10}' /proc/net/dev)
    # Bytes over 10 s -> KB/s (the counters are in bytes)
    inDataRate=$(awk -v s="${receiveByteStart}" -v e="${receiveByteEnd}" 'BEGIN{printf "%.2f", (e-s)/10/1024}')
    outDataRate=$(awk -v s="${sendByteStart}" -v e="${sendByteEnd}" 'BEGIN{printf "%.2f", (e-s)/10/1024}')
    logInfo "Host [${ip}] in data is : ${inDataRate} KB / s"
    logInfo "Host [${ip}] out data is : ${outDataRate} KB / s"
    logInfo "End to get net data of Host [${ip}]"
}

# Disk space usage
function diskUsage(){
    logInfo "Begin to get disk usage of Host [${ip}]"
    noTimeLogInfo "$(df -h)"
    logInfo "End to get disk usage of Host [${ip}]"
}

# Disk IO in: average rate of reading from disk into memory
function diskIOIn(){
    # pgpgin: cumulative kB read from disk since boot
    inIoStart=$(awk '/^pgpgin /{print $2}' /proc/vmstat)
    sleep 30
    inIoEnd=$(awk '/^pgpgin /{print $2}' /proc/vmstat)
    # kB over 30 s -> MB/s; awk keeps two decimals instead of truncating to an integer
    inIo=$(awk -v s="${inIoStart}" -v e="${inIoEnd}" 'BEGIN{printf "%.2f", (e-s)/(30*1024)}')
    logInfo "Host [${ip}] in IO is : ${inIo} MB / s"
}

# Disk IO out: average rate of writing from memory to disk
function diskIOout(){
    # pgpgout: cumulative kB written to disk since boot
    outIoStart=$(awk '/^pgpgout /{print $2}' /proc/vmstat)
    sleep 60
    outIoEnd=$(awk '/^pgpgout /{print $2}' /proc/vmstat)
    outIo=$(awk -v s="${outIoStart}" -v e="${outIoEnd}" 'BEGIN{printf "%.2f", (e-s)/(60*1024)}')
    logInfo "Host [${ip}] out IO is : ${outIo} MB / s"
}

# Users currently logged in
function onlineUser(){
    # w -h omits the header lines, so every remaining line is one session
    user=$(w -h | awk '{print $1 "\t" "\t" $4}')
    userCount=$(w -h | wc -l)
    logInfo "There are [${userCount}] users online now."
    noTimeLogInfo "UserName loginAt"
    noTimeLogInfo "${user}"
}

# Check each host's reachability, then collect the metrics
function isAlive(){
    for ip in $(cat hostLists)
    do
        if ping -c 3 "${ip}" >/dev/null; then
            logInfo "${ip} is reachable"
            # Online users
            onlineUser
            # CPU information
            cpuUsage
            # Memory information
            memUsage
            # Disk IO
            diskIOIn
            diskIOout
            # Disk space usage
            diskUsage
            # Average per-second inbound/outbound traffic (the interface name is host-specific)
            netData wlp3s0
        else
            logInfo "ERROR ${ip} is unreachable,try login in see more details.."
        fi
    done
}

# Collect every 60 seconds, forever
while true
do
    isAlive
    sleep 60
done
- utils: helper functions for logging and unit conversion.
#!/bin/bash
# Logging helpers
curr_path=$(pwd)
# Log file path; global so that noTimeLogInfo can use it too
log_file=${curr_path}/system_status.log

function logInfo()
{
    local curr_time=$(date "+%Y-%m-%d %H:%M:%S")
    # Check whether the log file exists
    if [ -e "${log_file}" ]
    then
        # If the file is not writable, grant permissions with chmod
        if [ ! -w "${log_file}" ]
        then
            chmod 770 "${log_file}"
        fi
    else
        # Create the log file if it does not exist
        touch "${log_file}"
    fi
    # Write the log entry
    local info=$1
    echo "${curr_time} $(whoami) [Info] ${info}" >> "${log_file}"
}

# Write a message to the log without a timestamp
function noTimeLogInfo(){
    local msg=$1
    echo "${msg}" >> "${log_file}"
}

# Convert kB to GB with a precision of 3 decimals (expr only supports integers, so use bc)
function kbToGb(){
    local kbVal=$1
    local gbVal=$(echo "scale=3;${kbVal}/1024/1024" | bc)
    echo ${gbVal}
}

# Usage as a percentage
# $1: amount used, $2: total
function usagePercent(){
    local used=$1
    local total=$2
    local usedPercent=$(echo "scale=2;${used}*100/${total}" | bc)
    echo ${usedPercent}
}
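kbToGb and usagePercent rely on bc, which is not installed on some minimal systems. Because the script already depends on awk, the same conversions can be done there. The following is a minimal sketch with the hypothetical names kbToGb_awk and usagePercent_awk. awk's printf rounds, whereas bc's scale truncates, so the last digit can differ: 2 out of 3 gives 66.67 here and 66.66 with bc.

```shell
# Hypothetical bc-free variants of the utils helpers.
# awk printf rounds to the requested precision (bc's scale truncates).

# Convert kB to GB with 3 decimals.
kbToGb_awk() {
    awk -v kb="$1" 'BEGIN { printf "%.3f\n", kb / 1024 / 1024 }'
}

# Percentage of used over total with 2 decimals; prints 0.00 if total is 0.
usagePercent_awk() {
    awk -v used="$1" -v total="$2" 'BEGIN {
        if (total == 0) { print "0.00"; exit }
        printf "%.2f\n", used * 100 / total
    }'
}
```

For the sample host's MemTotal of 3922884 kB, kbToGb_awk prints 3.741, the same value as the "total mem" line in section 4.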
Script files:
-rw-r--r-- 1 stephen stephen   30 Apr  5 18:33 hostLists
-rwxrwxr-x 1 stephen stephen 4164 Apr  5 18:50 sysMonitor.sh*
-rw-r--r-- 1 stephen stephen  951 Apr  5 15:23 utils
4. Results
The monitoring data is written to the log file system_status.log. Sample output:
2019-04-05 19:44:42 stephen [Info] 192.168.1.109 is reachable
2019-04-05 19:44:42 stephen [Info] There are [2] users online now.
UserName loginAt
USER		LOGIN@
stephen		14:09
2019-04-05 19:44:42 stephen [Info] Host [192.168.1.109] physical CPU nums is : 1
2019-04-05 19:44:42 stephen [Info] Host [192.168.1.109] logic CPU nums is : 4
2019-04-05 19:44:42 stephen [Info] Host [192.168.1.109] core nums is : 2
2019-04-05 19:45:12 stephen [Info] Host [192.168.1.109] CPU usage is : 10.12 %
2019-04-05 19:45:12 stephen [Info] Begin to get mem usage of Host [192.168.1.109]
2019-04-05 19:45:12 stephen [Info] Host [192.168.1.109] total mem is : 3.741 GB
2019-04-05 19:45:12 stephen [Info] Host [192.168.1.109] mem usage is : 95.83 %
2019-04-05 19:45:12 stephen [Info] End to get mem usage of Host [192.168.1.109]
2019-04-05 19:45:42 stephen [Info] Host [192.168.1.109] in IO is : 0 MB / s
2019-04-05 19:46:42 stephen [Info] Host [192.168.1.109] out IO is : 0 MB / s
2019-04-05 19:46:42 stephen [Info] Begin to get disk usage of Host [192.168.1.109]
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           384M  2.0M  382M   1% /run
/dev/sda10       42G   20G   20G  51% /
tmpfs           1.9G   20M  1.9G   2% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/loop0      3.8M  3.8M     0 100% /snap/notepad-plus-plus/202
/dev/loop2       54M   54M     0 100% /snap/core18/782
/dev/loop4      441M  441M     0 100% /snap/wine-platform/111
/dev/loop5      441M  441M     0 100% /snap/wine-platform/105
/dev/loop7      3.8M  3.8M     0 100% /snap/notepad-plus-plus/199
/dev/loop3       90M   90M     0 100% /snap/core/6673
/dev/loop1      274M  274M     0 100% /snap/wps-office-multilang/1
/dev/loop6       91M   91M     0 100% /snap/core/6405
/dev/loop8       92M   92M     0 100% /snap/core/6531
/dev/loop9       36M   36M     0 100% /snap/gtk-common-themes/1198
/dev/loop10     3.8M  3.8M     0 100% /snap/notepad-plus-plus/195
/dev/loop11     441M  441M     0 100% /snap/wine-platform/103
tmpfs           384M   16K  384M   1% /run/user/125
tmpfs           384M   52K  384M   1% /run/user/1000
2019-04-05 19:46:42 stephen [Info] End to get disk usage of Host [192.168.1.109]
2019-04-05 19:46:42 stephen [Info] Begin to get net data of Host [192.168.1.109]
2019-04-05 19:46:52 stephen [Info] Host [192.168.1.109] in data is : 42.90 kb / s
2019-04-05 19:46:52 stephen [Info] Host [192.168.1.109] out data is : 7.00 kb / s
2019-04-05 19:46:52 stephen [Info] End to get net data of Host [192.168.1.109]
2019-04-05 19:47:04 stephen [Info] ERROR 255.255.255.254 is unreachable,try login in see more details..
5. References
5.1 ifstat network traffic monitoring: the /proc/net/dev file
https://blog.csdn.net/kongshuai19900505/article/details/80676607
5.2 The awk command
http://man.linuxde.net/awk
5.3 Collecting CPU, memory, disk, and network information with shell scripts
https://www.jb51.net/article/50436.htm