Tools for Analyzing and Troubleshooting Linux System and Application Problems

Posted by 瓦力瓦力 on 2016-02-19

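System and application problems come up regularly on Linux servers, and analyzing them takes good tools. Below is a summary of commonly used utilities and trace tools; the last section lists trace tools for distributed systems that the Hadoop community has been developing recently.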

Overview:

The image below, taken from linux-performance-analysis-and-tools, shows the layer of the system that each of these tools applies to.

Image:etao-linux-analysis-tools.jpg

OS system commands

System information (RHEL/Fedora)

  • uname -a or cat /proc/version #print system information
    • Linux hadoopst2.cm6 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
  • uptime
    • 15:42:46 up 674 days, 6 min, 35 users, load average: 1.30, 5.97, 11.53
  • cat /etc/redhat-release
    • Red Hat Enterprise Linux Server release 5.4 (Tikanga)
  • lsb_release
    • LSB Version:  :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch
  • cat /proc/cpuinfo
  • cat /proc/meminfo
  • lspci – list all PCI devices
  • lsusb – list USB devices
  • last, lastb – show listing of last logged in users
  • lsmod – show the status of modules in the Linux Kernel
  • modprobe – add and remove modules from the Linux Kernel
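For scripted health checks, the load averages in the uptime output shown above can be extracted programmatically. The following is a minimal Python sketch, assuming the output format of the sample line in this list; `parse_load_average` is a hypothetical helper name, not part of any standard tool.

```python
import re

def parse_load_average(uptime_line):
    """Return the 1-, 5- and 15-minute load averages from an `uptime` line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())

# Sample line copied from the uptime output listed above.
sample = "15:42:46 up 674 days, 6 min, 35 users, load average: 1.30, 5.97, 11.53"
one_min, five_min, fifteen_min = parse_load_average(sample)
print(one_min, five_min, fifteen_min)
```

In the sample line the 1-minute value is well below the 15-minute value, which suggests the load on that machine was dropping at the time.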

Common commands/tools

  • ps
    • To print a process tree: ps -ejH / ps axjf
    • To get info about threads: ps -eLf / ps axms
  • ulimit -a
  • lsof – list open files (in UNIX, everything is a file)
    • lsof -p PID
  • rpm/yum
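    • rpm -qf FILE #the rpm package that owns FILE
    • rpm -ql RPM #files contained in the rpm package
    • /var/log/yum.log #log of packages updated by yum
  • /etc/XXX #system-wide program configuration directories, e.g.
    • /etc/yum.repos.d/ yum repository configuration
  • /var/log/XXX #log directories, e.g.
    • /var/log/cron #crontab log; shows whether scheduled jobs ran
  • ntpd – Network Time Protocol (NTP) daemon; keeps the clocks of the machines in a cluster in sync
  • squid – proxy caching server; used as the proxy for the cluster WebUI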

System monitoring

  • mpstat – Report processors related statistics. Watch the %sys and %iowait values.
  • vmstat – Report virtual memory statistics
  • iostat – Report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions.
  • netstat – Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
    • netstat -atpn | grep PID
  • ganglia – a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.
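  • sar/tsar – Collect, report, or save system activity information; tsar is Taobao's own improved version
    • Samples on a schedule (every minute); history can be queried (default: 5 minutes); complements ganglia by showing more detailed information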
  • iftop – the “top” bandwidth consumers shown. iftop wiki
  • iotop
  • vmtouch, Portable file system cache diagnostics and control

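Network related

  • telnet/nc IP PORT – check whether the target port is reachable; a successful ping does not guarantee the port is reachable, since a firewall or similar may block it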
  • ifconfig/ifup/ifdown – configure a network interface
  • traceroute – print the route packets trace to network host
  • nslookup – query Internet name servers interactively
  • tcpdump – dump traffic on a network; similar open-source tools: wireshark, netsniff-ng (more tool comparisons)
  • lynx – a general purpose distributed information browser for the World Wide Web
  • tcpcp – allows cooperating applications to pass ownership of TCP connection endpoints from one Linux host to another one.
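The telnet/nc port check in the list above can also be scripted when many hosts need to be probed. This is a minimal sketch using Python's standard `socket` module; `port_is_reachable` is a hypothetical helper name, and the host and port in the usage lines are placeholders rather than values from this article.

```python
import socket

def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

if __name__ == "__main__":
    # Placeholder target; substitute the real IP and PORT.
    print(port_is_reachable("127.0.0.1", 22))
```

As with telnet/nc, a False result only says that the TCP connection could not be established from this machine; it does not distinguish a firewall drop from a service that is simply not listening.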

Program/process related

Static information

  • ldconfig – configure dynamic linker run time bindings
    • ldconfig -p | grep SO #check whether the .so is in the linker cache
  • ldd – print shared library dependencies; shows the .so files that an executable or .so depends on
  • nm – list symbols from object files; grep the output to check whether a symbol exists or is undefined.
  • readelf – Displays information about ELF files, such as 32/64-bit class, target OS, and processor

Dynamic information

  • gdb
  • cat /proc/$PID/[cmdline|environ|limits|status|…] – process-related information
  • pstack – print a stack trace of a running process
  • pmap – report memory map of a process
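The /proc/$PID files listed above are plain text, so they are easy to consume from a script. Below is a minimal Python sketch that parses the `Key: value` lines of /proc/$PID/status into a dictionary. The helper names are hypothetical, and the sample text holds illustrative values only, not output captured from a real system.

```python
def parse_proc_status(text):
    """Turn the 'Key: value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields

def read_proc_status(pid):
    """Read and parse /proc/<pid>/status (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_proc_status(handle.read())

# Illustrative sample; real files contain many more fields.
SAMPLE_STATUS = "Name:\tjava\nState:\tS (sleeping)\nThreads:\t42\n"

status = parse_proc_status(SAMPLE_STATUS)
for key in ("Name", "State", "Threads"):
    print(key, status.get(key))
```

On a Linux host, `read_proc_status(os.getpid())` (after `import os`) would return the same kind of dictionary for the running interpreter.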

Java related

  • JDK Tools and Utilities
  • Java Troubleshooting Tools
  • jinfo – print java process information, e.g. classpath, java.library.path (the JNI .so directory)
  • jstack – print a stack trace of a running java process; can be used to inspect deadlocks
  • jmap – report memory map of a java process
    • jmap -histo:live can trigger a full GC
    • jmap -dump:live,file=$FILE dumps the heap to a file so that tools such as jhat can analyze how much of the heap each kind of object occupies
  • jhat – Heap Dump Browser – Starts a web server on a heap dump file (eg, produced by jmap -dump), allowing the heap to be browsed.
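    • Starts an HTTP service; view it in a browser
    • -J-mxXXXm: increase the heap size when analyzing large files
    • If some objects are extremely numerous or occupy too much memory, a memory leak is very likely
  • Memory Analyzer (MAT) – eclipse plugin, Java heap analyzer
    • Visualization tool, but limited by the machine's memory, so it cannot analyze heap dump files that are too large
  • jdb – command-line Java debugger; a JVM started as a debug server can also be debugged remotely from eclipse and similar tools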
  • jstat – Java Virtual Machine Statistics Monitoring Tool
  • jstatd – Virtual Machine jstat Daemon; can be used together with jvisualvm
  • jvisualvm – Java Virtual Machine Monitoring, Troubleshooting, and Profiling Tool; can connect remotely to jstatd/JMX; a visualization tool (demo)
  • jvmtop – In a top-like manner, displays JVM internal metrics (e.g. memory information) of running java processes.
  • JVM performance optimization – a series of optimization articles written by a JVM developer
    1. Overview
    2. Compilers
    3. Garbage collection
    4. Concurrently compacting GC
    5. Scalability
  • HPROF – Heap Profiler: java -agentlib:hprof

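Trace/Debug/Profiling tools

General-purpose tools

  • Add logging; when the system is already live or the source code is not available, use the tools below instead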
  • strace – trace system calls and signals
    • strace -c prints a per-syscall summary, e.g.:
      % time     seconds  usecs/call     calls    errors syscall
      ------ ----------- ----------- --------- --------- ----------------
       67.90 3966.320849         496   7992161   3050250 futex
       25.80 1507.326693      127093     11860           epoll_wait
      ....................
  • blktrace, generate traces of the i/o traffic on block devices
  • ltrace – A library call tracer
  • xtrace
  • gprof – a performance analysis tool, sampling and call-graph profiling
  • valgrind – an instrumentation framework for building dynamic analysis tools. automatically detect many memory management and threading bugs, and profile your programs in detail
  • systemtap – a simple command line interface and scripting language for writing instrumentation for a live running kernel plus user-space applications for complex tasks that may require live analysis, programmable on-line response, and whole-system symbolic access.
    • The Linux counterpart of DTrace (which Sun developed on Solaris)
    • Powerful: covers the kernel and user-space apps, works across languages (java, perl, python, ruby), and supports built-in markers (pg, mysql)
    • can write and reuse simple scripts to deeply examine the activities of a live system
    • Data can be extracted, filtered, and summarized quickly and safely, to enable diagnoses of complex performance or functional problems
    • A rich “tapset” script library
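The strace summary table shown in this list (the format printed by strace -c) is easier to reason about once it is turned into numbers. The sketch below is one way to parse it in Python and compute the share of failed calls per syscall. The function names are hypothetical, and the parser assumes the six-column layout of the sample, where the errors column may be empty.

```python
def parse_strace_summary(text):
    """Parse the rows of an `strace -c` summary into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 6 fields, or 5 when the errors column is empty.
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        try:
            row = {
                "pct_time": float(parts[0]),
                "seconds": float(parts[1]),
                "usecs_per_call": int(parts[2]),
                "calls": int(parts[3]),
                "errors": int(parts[4]) if len(parts) == 6 else 0,
                "syscall": parts[-1],
            }
        except ValueError:
            continue  # header, separator or elided lines
        rows.append(row)
    return rows

def error_ratio(row):
    """Fraction of calls to this syscall that returned an error."""
    return row["errors"] / row["calls"] if row["calls"] else 0.0

# Rows copied from the strace output listed above.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.90 3966.320849         496   7992161   3050250 futex
 25.80 1507.326693      127093     11860           epoll_wait
"""

for row in parse_strace_summary(SAMPLE):
    print(row["syscall"], row["calls"], "%.1f%%" % (100 * error_ratio(row)))
```

For the sample, roughly 38% of the futex calls returned an error, while epoll_wait accounts for far fewer calls but a much longer time per call.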

Java trace tools

  • btrace – dynamic tracing tool for the Java platform. UserGuide
  • byteman – simplifies tracing and testing of Java programs. Can modify a running application without needing to stop and restart it.
    • define rules specifying the side effects you want to inject (BTrace uses a Java-like syntax)

Distributed Tracing Tools

  • Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
  • x-trace, a network diagnostic tool designed to provide users and network operators with better visibility into increasingly complex Internet applications.
  • HTrace, a tracing framework intended for use with distributed systems written in Java

Some of this content is quoted from other people's Weibo posts; if there is any problem, please get in touch.


