CentOS Performance Monitoring Tools

Posted by audered on 2012-03-24

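When a Linux system runs into trouble, checking the system logs is not enough; we also need performance monitoring tools to work out which part (memory, CPU, disk, ...) is at fault. On Linux, the kernel's runtime parameters are exposed under the virtual directory /proc, so the values reported by the monitoring tools ultimately come from there. When it comes to tuning the system, we can modify the relevant parameters under /proc, although some of them must not be changed carelessly. The commonly used performance monitoring tools are described below.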

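Tool	Description
uptime	System load average
dmesg	Hardware/system information
top	Process activity
iostat	Average CPU load and disk activity
vmstat	System activity
sar	Collects and reports system activity in real time
KDE System Guard	Graphical monitoring tool
free	Memory usage
traffic-vis	Network monitoring (SUSE only)
pmap	Process memory usage
strace	Traces program execution
ulimit	System resource limits
mpstat	Multiprocessor usage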

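1. uptime

The uptime command shows how long the server has been running and how many users are logged in, and gives a quick view of the server's load.

The output of uptime includes the load average for the last 1, 5, and 15 minutes. The value represents the number of processes waiting for the CPU: if the CPU has no time to process them, the load average rises; otherwise it falls.
For a single CPU the ideal load average is 1: every process is served immediately and no CPU cycles are wasted. On a single-CPU machine, 1 or 2 is acceptable; on a multiprocessor machine, values of 8 to 10 may be seen.
uptime can also help narrow down network performance problems. For example, if a network application performs poorly, run uptime to check whether the server load is high; if it is not, the problem is more likely on the network side.
Here is an example of uptime output: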
9:24am up 19:06, 1 user, load average: 0.00, 0.00, 0.00
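The same information is available in /proc/loadavg and /proc/uptime. These files are not meant to be opened in an editor; read them with a command such as cat, for example: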
liyawei:~ # cat /proc/loadavg
0.00 0.00 0.00 1/55 5505
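
Because the acceptable load depends on the number of CPUs, it can help to divide the load average by the CPU count before judging it. The following is a minimal sketch, assuming a POSIX shell with awk; the function name load_per_cpu is made up for illustration and is not part of any tool.

```shell
# load_per_cpu LOAD NCPU
# Prints LOAD divided by NCPU, rounded to two decimals.
# Both values are supplied by the caller (hypothetical helper).
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}
```

On a live system the arguments could be taken from /proc/loadavg and getconf, for example: load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(getconf _NPROCESSORS_ONLN)". A result well above 1 suggests that processes are queuing for the CPU.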

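2. dmesg

dmesg displays kernel messages. It is useful for diagnosing hardware faults or problems with newly added hardware.
dmesg also tells you what hardware is installed in the server. At every boot the kernel probes the hardware and logs what it finds; run /bin/dmesg to view this log.
Example dmesg output: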
ReiserFS: hda6: checking transaction log (hda6)
ReiserFS: hda6: Using r5 hash to sort names
Adding 1044184k swap on /dev/hda5. Priority:-1 extents:1 across:1044184k
parport_pc: VIA 686A/8231 detected
parport_pc: probing current configuration
parport_pc: Current parallel port base: 0x378
parport0: PC-style at 0x378 (0x778), irq 7, using FIFO [PCSPP,TRISTATE,COMPAT,ECP]
parport_pc: VIA parallel port: io=0x378, irq=7
lp0: using parport0 (interrupt-driven).
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
ACPI: PCI Interrupt 0000:00:0d.0[A] -> GSI 17 (level, low) -> IRQ 169
e100: eth0: e100_probe: addr 0xd8042000, irq 169, MAC addr 00:02:55:1E:35:91
usbcore: registered new driver usbfs
usbcore: registered new driver hub
hdc: ATAPI 48X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
USB Universal Host Controller Interface driver v2.3

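3. top

top shows processor activity. By default it lists the most CPU-intensive tasks and refreshes the display every few seconds (5 seconds in the version described here; adjustable with the -d option).
The process priority determines the order in which the CPU handles processes, and the Linux kernel adjusts it as needed. The nice value bounds the priority: the priority cannot go below the nice value (the lower the nice value, the higher the priority). You cannot set a process's priority directly, but you can influence it by adjusting the nice level, although this is not always possible. If a process is running abnormally slowly, you can give it more CPU time by lowering its nice level.
Linux supports nice levels from 19 (lowest priority) to -20 (highest priority); the default is 0.
Run /bin/ps to see the current processes.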

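4. iostat

iostat ships with Red Hat Enterprise Linux AS. It is also part of the Sysstat package, which can be downloaded from http://perso.wanadoo.fr/sebastien.godard/
Run without arguments, iostat reports average CPU time since boot, similar to uptime. In addition, it produces an activity report for the server's disk subsystem. The report has two parts: CPU utilization and disk utilization.
Example iostat output: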
avg-cpu: %user %nice %system %iowait %steal %idle
0.16 0.01 0.03 0.10 0.00 99.71

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 0.31 4.65 4.12 327796 290832

avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 0.00 0.00 0.00 100.00

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 0.00 0.00 0.00 0 0

avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 0.00 0.00 99.01

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 0.00 0.00 0.00 0 0
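The CPU report contains the following fields:
%user: percentage of CPU time spent at user level (applications).
%nice: percentage of CPU time spent at user level with nice priority.
%system: percentage of CPU time spent at system level (kernel).
%iowait: percentage of time the CPU was idle while the system had an outstanding disk I/O request.
%steal: percentage of time a virtual CPU waited involuntarily while the hypervisor serviced another virtual processor.
%idle: percentage of time the CPU was idle.

The disk report contains the following fields:
Device: name of the block device.
tps: number of I/O transfers per second issued to the device. Several logical requests can be merged into a single I/O request, so the size of a transfer varies.
Blk_read/s, Blk_wrtn/s: number of blocks read from and written to the device per second. iostat counts 512-byte sectors here (kernel 2.4 and later), which is not the same as the filesystem block size; that can be 1024, 2048, or 4096 bytes depending on how the filesystem was created.

For example, the following command reports the filesystem block size of /dev/sda1: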
dumpe2fs -h /dev/sda1 |grep -F "Block size"

The output looks like this:
dumpe2fs 1.34 (25-Jul-2003)
Block size: 1024

Blk_read, Blk_wrtn: total number of blocks read and written since boot.
The same data can be found in /proc/stat, /proc/partitions, and /proc/diskstats.
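
The cumulative counters behind Blk_read and Blk_wrtn can also be read directly from /proc/diskstats. The sketch below assumes the whole-disk line layout of 2.6 and later kernels (device name in field 3, sectors read in field 6, sectors written in field 10); disk_sectors is a hypothetical helper name.

```shell
# disk_sectors DEVICE
# Reads /proc/diskstats-formatted text on stdin and prints
# "sectors_read sectors_written" for the line whose name is DEVICE.
disk_sectors() {
    awk -v dev="$1" '$3 == dev { print $6, $10 }'
}
```

A typical call would be disk_sectors hda < /proc/diskstats. Running it twice and dividing the difference by the elapsed seconds gives roughly the per-second rates that iostat reports.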

5. vmstat

vmstat reports on processes, memory, paging, block I/O, traps, and CPU activity:
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b   swpd   free   buff cache   si   so    bi    bo   in   cs us sy id wa st
1 0      0 513072 52324 162404    0    0     2     2 261   32 0 0 100 0 0
0 0      0 513072 52324 162404    0    0     0     0 271   43 0 0 100 0 0
0 0      0 513072 52324 162404    0    0     0     0 255   27 0 0 100 0 0
0 0      0 513072 52324 162404    0    0     0    28 275   51 0 0 97 3 0
0 0      0 513072 52324 162404    0    0     0     0 255   21 0 0 100 0 0
The columns have the following meanings:
Process
– r: The number of processes waiting for run time.
– b: The number of processes in uninterruptible sleep.
Memory
– swpd: The amount of virtual memory used (KB).
– free: The amount of idle memory (KB).
– buff: The amount of memory used as buffers (KB).
– cache: The amount of memory used as cache (KB).
Swap
– si: Amount of memory swapped in from disk (KB/s).
– so: Amount of memory swapped out to disk (KB/s).
IO
– bi: Blocks received from a block device (blocks/s).
– bo: Blocks sent to a block device (blocks/s).
System
– in: The number of interrupts per second, including the clock.
– cs: The number of context switches per second.
CPU (these are percentages of total CPU time)
– us: Time spent running non-kernel code (user time, including nice time).
– sy: Time spent running kernel code (system time).
– id: Time spent idle. Prior to Linux 2.5.41, this included IO-wait time.
– wa: Time spent waiting for IO. Prior to Linux 2.5.41, this appeared as zero.
– st: Time stolen from a virtual machine.
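
When scanning a long vmstat run, the samples of interest are usually those with swap traffic or a high I/O wait. The following sketch filters them out, assuming the 17-column layout shown above (si in column 7, so in column 8, wa in column 16); vmstat_flags and its threshold argument are illustrative names, not part of vmstat.

```shell
# vmstat_flags WA_LIMIT
# Reads vmstat output on stdin. For every sample row (first field numeric)
# it reports swap activity when si or so is non-zero, and reports
# high I/O wait when wa is greater than WA_LIMIT. Header rows are skipped.
vmstat_flags() {
    awk -v limit="$1" '
        $1 ~ /^[0-9]+$/ {
            if ($7 > 0 || $8 > 0)
                print NR ": swap activity (si=" $7 ", so=" $8 ")"
            if ($16 > limit)
                print NR ": high I/O wait (wa=" $16 ")"
        }'
}
```

For example, vmstat 2 10 | vmstat_flags 20 would print only the input lines (by line number) that show swapping or more than 20% I/O wait.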

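6. sar

sar ships with Red Hat Enterprise Linux AS and is also part of the Sysstat package, available from http://perso.wanadoo.fr/sebastien.godard/
sar collects, reports, and saves system activity information. It consists of three programs: sar, which displays the data, and sa1 and sa2, which collect and store it.
With sa1 and sa2 the system can be configured to gather data and write logs automatically for later analysis, for example by adding sa1 and sa2 entries to /etc/crontab.
The collected data gives detailed CPU utilization (%user, %nice, %system, %idle), memory paging, network I/O, process activity, block device activity, and interrupts per second.
You can also run sar from the command line to produce a real-time report, for example: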
liyawei:~ # sar -u 3 10
Linux 2.6.16.21-0.8-default (liyawei) 05/31/07

10:17:16          CPU     %user     %nice   %system   %iowait     %idle
10:17:19          all      0.00      0.00      0.00      0.00    100.00
10:17:22          all      0.00      0.00      0.00      0.33     99.67
10:17:25          all      0.00      0.00      0.00      0.00    100.00
10:17:28          all      0.00      0.00      0.00      0.00    100.00
10:17:31          all      0.00      0.00      0.00      0.00    100.00
10:17:34          all      0.00      0.00      0.00      0.00    100.00

7. KDE System Guard

KDE System Guard (KSysguard) is KDE's graphical task manager and performance monitor. Built on a client/server architecture, it can monitor both local and remote hosts.

8. free

/bin/free shows the amount of free and used memory, including swap, as well as the buffers and cache used by the kernel.
             total       used       free     shared    buffers     cached
Mem:        776492     263480     513012          0      52332     162504
-/+ buffers/cache:      48644     727848
Swap:      1044184          0    1044184
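
Since buffers and cache can be reclaimed when applications need memory, the figure that matters is usually free + buffers + cached, which is what the "-/+ buffers/cache" row reports. A minimal sketch of that calculation, assuming the older free layout shown above with separate buffers and cached columns (newer procps versions print different columns); mem_available_kb is a hypothetical helper name.

```shell
# mem_available_kb
# Reads classic `free` output on stdin and prints free + buffers + cached
# (fields 4, 6 and 7 of the "Mem:" row), in KB.
mem_available_kb() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}
```

With the sample above, 513012 + 52332 + 162504 gives 727848 KB, matching the free column of the "-/+ buffers/cache" row. On a live system the call would be free | mem_available_kb.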

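9. Traffic-vis

Traffic-vis is a suite of tools that determines which hosts are communicating on an IP network, which hosts they talk to, and how much data is transferred. It produces reports in plain text, HTML, or GIF format.

Note: Traffic-vis is available only on SUSE LINUX Enterprise Server.

The following command collects data on interface eth0: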
traffic-collector -i eth0 -s /root/output_traffic-collector
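You can control the program by sending it signals with killall. To write the report to disk, run:
killall -SIGUSR1 traffic-collector
To stop collecting data, run:
killall -SIGTERM traffic-collector

Do not forget to run this last command; otherwise the collector's memory usage will eventually affect performance.

The output can be sorted by packets, bytes, or TCP connections, using either the totals or the sent/received counts for each.
For example, to sort by the number of packets sent and received by each host, run: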
traffic-sort -i output_traffic-collector -o output_traffic-sort -Hp

To generate an HTML report showing bytes transferred, packet counts, total TCP connection requests, and information about each server on the network, run:
traffic-tohtml -i output_traffic-sort -o output_traffic-tohtml.html
To generate a GIF report (600x600), run:
traffic-togif -i output_traffic-sort -o output_traffic-togif.gif -x 600 -y 600

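The GIF report makes it easy to spot network broadcasts and to see which hosts are using IPX/SPX on a TCP network so that they can be isolated; remember that IPX is a broadcast-based protocol. To track down problems such as a faulty network card or duplicate IP addresses, a dedicated tool is needed, such as Ethereal, which ships with SUSE LINUX Enterprise Server.
Tip: with pipes, a report can be produced by a single command. To generate the HTML report, run: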
cat output_traffic-collector | traffic-sort -Hp | traffic-tohtml -o output_traffic-tohtml.html
To generate the GIF file, run:
cat output_traffic-collector | traffic-sort -Hp | traffic-togif -o output_traffic-togif.gif -x 600 -y 600

10. pmap

pmap reports the memory usage of one or more processes. Use it to find which process on the host is causing a memory bottleneck by using too much memory.
pmap <pid>

liyawei:~ # pmap 1
1: init
START       SIZE     RSS   DIRTY PERM MAPPING
08048000    484K    244K      0K r-xp /sbin/init
080c1000      4K      4K      4K rw-p /sbin/init
080c2000    144K     24K     24K rw-p [heap]
bfb5b000     84K     12K     12K rw-p [stack]
ffffe000      4K      0K      0K ---p [vdso]
Total:      720K    284K     40K

232K writable-private, 488K readonly-private, and 0K shared

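11. strace

strace intercepts and records the system calls made by a process and the signals it receives. It is a very effective diagnostic, instructional, and debugging tool, and helps system administrators resolve program problems.
To attach to a running process, specify its process ID (PID), for example:
strace -p <pid>
# strace -p 2582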
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(7, "\"\\\"\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\\\\\\\\\"..., 16384) = 321
write(3, "}H\331q\37\275$\271\t\311M\304$\317~)R9\330Oj\304\257\327"..., 360) = 360
select(8, [3 4 7], [3], NULL, NULL)     = 2 (in [7], out [3])
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(7, "\"\\\"\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\\\\\\\\\"..., 16384) = 323
write(3, "\204\303\27$\35\206\\\306VL\370\5R\200\226\2\320^\253\253"..., 360) = 360
select(8, [3 4 7], [3], NULL, NULL)     = 2 (in [7], out [3])
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(7, "\"\\\"\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\\\\\\\\\"..., 16384) = 323
write(3, "\243\207\204\277Cw\0162\2ju=\205\'L\352?0J\256I\376\32"..., 360) = 360
select(8, [3 4 7], [3], NULL, NULL)     = 2 (in [7], out [3])
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(7, "\"\\\"\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\\\\\\\\\"..., 16384) = 320
write(3, "6\270S\3i\310\334\301\253!ys\324\'\234%\356\305\26\233"..., 360) = 360
select(8, [3 4 7], [3], NULL, NULL)     = 2 (in [7], out [3])
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0

12. ulimit

ulimit is built into the bash shell and controls the resources available to the shell and to the processes it starts.
liyawei:~ # ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 6143
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 6143
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
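The -H and -S options select the hard or the soft limit for the given resource. The soft limit is the value that is actually enforced; a process may raise its own soft limit up to the hard limit, which acts as the ceiling and can only be raised by root.
For example, to set the hard limit on file handles: ulimit -Hn 4096
To set the soft limit on file handles: ulimit -Sn 1024
To view the hard and soft values, run:
ulimit -Hn
ulimit -Sn
For example, to limit the oracle user, add the following lines to /etc/security/limits.conf:
oracle soft nofile 4096
oracle hard nofile 10240
On Red Hat Enterprise Linux AS, make sure /etc/pam.d/system-auth contains the line:
session required /lib/security/$ISA/pam_limits.so
On SUSE LINUX Enterprise Server, make sure /etc/pam.d/login and /etc/pam.d/sshd contain the line:
session required pam_limits.so
This line makes the limits take effect.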
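
Because ulimit changes apply to the current shell and everything it starts, it is often safer to set a limit for a single command only. The sketch below does this with a subshell; with_nofile_limit is a hypothetical wrapper name, and the limit passed in must not exceed the current hard limit.

```shell
# with_nofile_limit LIMIT COMMAND [ARGS...]
# Runs COMMAND in a subshell whose soft open-files limit is LIMIT,
# leaving the limits of the calling shell untouched.
with_nofile_limit() {
    limit="$1"
    shift
    ( ulimit -Sn "$limit" && "$@" )
}
```

For instance, with_nofile_limit 256 sh -c 'ulimit -Sn' prints the limit seen by the child, while ulimit -Sn in the calling shell still shows the original value.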

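13. mpstat

mpstat is part of the Sysstat package, available from http://perso.wanadoo.fr/sebastien.godard/
mpstat reports the activity of each CPU on a multiprocessor host, as well as for the host as a whole.
For example, the following command reports processor activity every 2 seconds, 3 times: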
mpstat 2 3
liyawei:~ # mpstat 2 3
Linux 2.6.16.21-0.8-default (liyawei) 05/31/07

10:23:03     CPU   %user   %nice    %sys %iowait    %irq   %soft %steal   %idle    intr/s
10:23:05     all    0.50    0.00    0.00    1.99    0.00    0.00    0.00   97.51    271.64
10:23:07     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00 100.00    261.00
10:23:09     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00 100.00    261.50
Average:     all    0.17    0.00    0.00    0.67    0.00    0.00    0.00   99.17    264.73
The following command reports per-processor activity on a multiprocessor host every second, 3 times:
mpstat -P ALL 1 3
liyawei:~ # mpstat -P ALL 1 10
Linux 2.6.16.21-0.8-default (liyawei)   05/31/07

10:23:31     CPU   %user   %nice    %sys %iowait    %irq   %soft %steal   %idle    intr/s
10:23:32     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00 100.00    273.00
10:23:32       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00 100.00    272.00
10:23:33     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00 100.00    254.00
10:23:33       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00 100.00    254.00
10:23:34     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00 100.00    271.00
10:23:34       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00 100.00    271.00
10:23:35     all    0.00    0.00    0.00    1.98    0.00    0.00    0.00   98.02    254.46
10:23:35       0    0.00    0.00    0.00    1.98    0.00    0.00    0.00   98.02    254.46
