使用Linux Strace跟蹤除錯Oracle程式程式

不一樣的天空w發表於2017-12-11

 

所謂作業系統,是應用程式與伺服器硬體進行溝通的中間層。應用程式的所有操作,都是和作業系統進行溝通互動。作業系統負責將所有互動轉化為裝置語言,進行硬體互動。

我們在進行Oracle故障除錯和核心原理工作的時候,經常需要了解後臺執行的動作和細節。一些故障場景,如程式程式hang住、無法登陸等問題,就需要作業系統級別監控,檢查定位問題。

Oracle自身已經提供了很多這型別的工具,如oradebug、各種等待事件和跟蹤方式。此外,各型別的作業系統提供出很多系統級別工具,幫助我們進行監控。本篇主要介紹Linux環境中的strace工具典型使用方法,留待需要朋友待查。

 

1、環境介紹

 

筆者使用紅帽Red Hat 6.5進行測試。

 

 

[root@XXX-Standby-Asm ~]# cat /etc/redhat-release

Red Hat Enterprise Linux Server release 6.5 (Santiago)

[root@XXX-Standby-Asm ~]# uname -a

Linux XXX-Standby-Asm 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

 

 

對應使用Oracle版本為11.2.0.4

 

 

[oracle@XXX-Standby-Asm ~]$ sqlplus /nolog

 

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jun 16 16:39:21 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

 

SQL> conn / as sysdba

Connected.

SQL> select * from v$version;

 

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

PL/SQL Release 11.2.0.4.0 - Production

CORE    11.2.0.4.0      Production

TNS for Linux: Version 11.2.0.4.0 - Production

NLSRTL Version 11.2.0.4.0 – Production

 

 

2Strace命令資訊

 

使用-h引數,可以呼叫出strace命令的語法格式和各個使用引數。

 

 

[oracle@XXX-Standby-Asm ~]$ strace -h

usage: strace [-dDffhiqrtttTvVxx] [-a column] [-e expr] ... [-o file]

              [-p pid] ... [-s strsize] [-u username] [-E var=val] ...

              [command [arg ...]]

   or: strace -c [-D] [-e expr] ... [-O overhead] [-S sortby] [-E var=val] ...

              [command [arg ...]]

-c -- count time, calls, and errors for each syscall and report summary

-f -- follow forks, -ff -- with output into separate files

-F -- attempt to follow vforks, -h -- print help message

-i -- print instruction pointer at time of syscall

-q -- suppress messages about attaching, detaching, etc.

-r -- print relative timestamp, -t -- absolute timestamp, -tt -- with usecs

-T -- print time spent in each syscall, -V -- print version

-v -- verbose mode: print unabbreviated argv, stat, termio[s], etc. args

-x -- print non-ascii strings in hex, -xx -- print all strings in hex

-a column -- alignment COLUMN for printing syscall results (default 40)

-e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]...

   options: trace, abbrev, verbose, raw, signal, read, or write

-o file -- send trace output to FILE instead of stderr

-O overhead -- set overhead for tracing syscalls to OVERHEAD usecs

-p pid -- trace process with process id PID, may be repeated

-D -- run tracer process as a detached grandchild, not as parent

-s strsize -- limit length of print strings to STRSIZE chars (default 32)

-S sortby -- sort syscall counts by: time, calls, name, nothing (default time)

-u username -- run command as username handling setuid and/or setgid

-E var=val -- put var=val in the environment for command

-E var -- remove var from the environment for command

 

 

在諸多引數中,比較常用的有如下幾個:

 

ü  -t引數:可以顯示呼叫作業系統過程的時間。這個在診斷中很重要,如果發現某一個執行過程時間消耗大,就可以定位到是什麼問題;

ü  -p引數:strace除了可以從程式啟動時候開始監控之外,還可以根據作業系統程式編號,進入一個執行中的程式進行debug

ü  -o引數:輸出結果如果需要以檔案方式儲存,就透過-o引數指定檔名稱;

ü  -e引數:監控表示式。提供tracesignalreadwrite等內容,反映監控方面專案。常用的引數是trace=file,集中在作業系統檔案訪問層面。

 

下面,透過兩個典型應用場景,來演示一下strace應用。

 

3、啟動程式監控

 

strace監控研究的一個重要範例是sqlplus,很多前輩使用這個工具來測試跟蹤sqlplus啟動過程訪問的檔案形式。啟動strace監控sqlplus,就是在strace命令後直接輸入sqlplus命令即可。

 

 

--啟動命令

[oracle@localhost ~]$ strace -t -e trace=file sqlplus /nolog

14:21:14 execve("/u01/app/oracle/bin/sqlplus", ["sqlplus", "/nolog"], [/* 31 vars */]) = 0 啟動命令

14:21:14 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)

14:21:14 open("/u01/app/oracle/lib/tls/x86_64/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)

14:21:14 stat("/u01/app/oracle/lib/tls/x86_64", 0x7fff3076da50) = -1 ENOENT (No such file or directory)

14:21:14 open("/u01/app/oracle/lib/tls/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)

14:21:14 stat("/u01/app/oracle/lib/tls", 0x7fff3076da50) = -1 ENOENT (No such file or directory)

14:21:14 open("/u01/app/oracle/lib/x86_64/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)

14:21:14 stat("/u01/app/oracle/lib/x86_64", 0x7fff3076da50) = -1 ENOENT (No such file or directory)

14:21:14 open("/u01/app/oracle/lib/libsqlplus.so", O_RDONLY) = 3

14:21:14 open("/u01/app/oracle/lib/libclntsh.so.11.1", O_RDONLY) = 3

14:21:14 open("/u01/app/oracle/lib/libnnz11.so", O_RDONLY) = 3

14:21:14 open("/u01/app/oracle/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)

(篇幅原因,有省略……

14:21:14 access("/etc/tnsnav.ora", F_OK) = -1 ENOENT (No such file or directory)

14:21:14 access("/u01/app/oracle/network/admin/tnsnav.ora", F_OK) = -1 ENOENT (No such file or directory)

14:21:14 open("/u01/app/oracle/oracore/zoneinfo", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6

14:21:14 open("/u01/app/oracle/oracore/zoneinfo/timezlrg_1.dat", O_RDONLY) = 7

14:21:14 open("/u01/app/oracle/oracore/zoneinfo/timezlrg_2.dat", O_RDONLY) = 7

(讀取各種時區資料……

14:21:14 open("/u01/app/oracle/oracore/zoneinfo/timezone_14.dat", O_RDONLY) = 7

14:21:14 open("/u01/app/oracle/oracore/zoneinfo/timezlrg_14.dat", O_RDONLY) = 6

14:21:14 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)

14:21:14 open("/usr/lib64/libnuma.so.1", O_RDONLY) = 6

14:21:14 open("/proc/self/status", O_RDONLY) = 6

14:21:14 open("/sys/devices/system/node", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6

14:21:14 open("/sys/devices/system/node/node0/meminfo", O_RDONLY) = 7

14:21:14 open("/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6

14:21:14 open("/proc/self/status", O_RDONLY) = 6

14:21:14 open("/proc/sys/kernel/shmmax", O_RDONLY) = 6

14:21:14 open("/proc/meminfo", O_RDONLY) = 6

14:21:14 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)

14:21:14 open("/etc/localtime", O_RDONLY) = 6

14:21:14 open("/u01/app/oracle/rdbms/mesg/ocius.msb", O_RDONLY) = 6

 

SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 12 14:21:14 2015

 

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

 

--啟動之後,讀取的檔案。

14:21:14 stat("login.sql", 0x7fff3076c740) = -1 ENOENT (No such file or directory)

14:21:14 stat("/u01/app/oracle/sqlplus/admin/glogin.sql", {st_mode=S_IFREG|0644, st_size=368, ...}) = 0

14:21:14 access("/u01/app/oracle/sqlplus/admin/glogin.sql", F_OK) = 0

14:21:14 statfs("/u01/app/oracle/sqlplus/admin/glogin.sql", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=174170552, f_bfree=147038320, f_bavail=138190973, f_files=44236800, f_ffree=44019067, f_fsid={1821877354, 763674030}, f_namelen=255, f_frsize=4096}) = 0

14:21:14 open("/u01/app/oracle/sqlplus/admin/glogin.sql", O_RDONLY) = 7

 

 

如果在啟動sqlplus過程中,直接進行登入。跟蹤過程中還包括了登入動作。

 

 

[oracle@localhost ~]$ strace -t -e trace=file sqlplus / as sysdba

14:29:30 execve("/u01/app/oracle/bin/sqlplus", ["sqlplus", "/", "as", "sysdba"], [/* 31 vars */]) = 0

(相同篇幅內容,略……

 

 

SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 12 14:29:30 2015

 

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

--差異部分

14:29:30 open("/u01/app/oracle/dbs/cm_xxxdb.dat", O_RDONLY|O_DSYNC) = -1 ENOENT (No such file or directory)

14:29:30 getcwd("/home/oracle", 256)    = 13

14:29:30 open("/proc/self/cmdline", O_RDONLY) = 7

14:29:30 open("/etc/nsswitch.conf", O_RDONLY) = 7

14:29:30 open("/u01/app/oracle/lib/libnss_files.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)

14:29:30 open("/etc/ld.so.cache", O_RDONLY) = 7

14:29:30 open("/lib64/libnss_files.so.2", O_RDONLY) = 7

14:29:30 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7

14:29:30 open("/u01/app/oracle/lib/libnque11.so", O_RDONLY) = 7

14:29:30 getcwd("/home/oracle", 256)    = 13

14:29:30 open("/u01/app/oracle/nls/data/lx20369.nlb", O_RDONLY) = 7

14:29:30 open("/u01/app/oracle/nls/data/lx207d0.nlb", O_RDONLY) = 7

14:29:30 readlink("/proc/self/fd/0", "/dev/pts/0", 254) = 10

14:29:30 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7

14:29:30 open("/etc/host.conf", O_RDONLY) = 7

14:29:30 open("/etc/resolv.conf", O_RDONLY) = 7

14:29:30 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 7

14:29:30 open("/etc/hostid", O_RDONLY)  = -1 ENOENT (No such file or directory)

14:29:30 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 7

14:29:30 open("/u01/app/oracle/rdbms/mesg/oraus.msb", O_RDONLY) = 7

 

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

 

14:29:30 stat("login.sql", 0x7fff0a822b10) = -1 ENOENT (No such file or directory)

14:29:30 stat("/u01/app/oracle/sqlplus/admin/glogin.sql", {st_mode=S_IFREG|0644, st_size=368, ...}) = 0

14:29:30 access("/u01/app/oracle/sqlplus/admin/glogin.sql", F_OK) = 0

14:29:30 statfs("/u01/app/oracle/sqlplus/admin/glogin.sql", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=174170552, f_bfree=147038317, f_bavail=138190970, f_files=44236800, f_ffree=44019066, f_fsid={1821877354, 763674030}, f_namelen=255, f_frsize=4096}) = 0

14:29:30 open("/u01/app/oracle/sqlplus/admin/glogin.sql", O_RDONLY) = 7

 

 

4、跟蹤執行程式方式

 

除了上面介紹的方法外,strace還可以直接跟蹤監控正在執行的系統程式。下面我們來觀察一下straceOracle後臺CKPT程式的監控。

 

 

[oracle@XXX-Standby-Asm ~]$ ps -ef | grep ckpt

grid     15950     1  0 Jun05 ?        00:01:02 asm_ckpt_+ASM

oracle   18608 17848  0 17:35 pts/0    00:00:00 grep ckpt

oracle   21395     1  0 Jun08 ?        00:04:27 ora_ckpt_xxxstb

 

 

root角度,跟蹤程式編號21395的後臺程式。

 

 

[root@XXX-Standby-Asm ~]# strace -t -p 21395

Process 21395 attached - interrupt to quit

17:36:42 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608271

17:36:42 semtimedop(6062085, {{16, -1, 0}}, 1, {1, 840000000}) = -1 EAGAIN (Resource temporarily unavailable)

17:36:44 getrusage(RUSAGE_SELF, {ru_utime={125, 122978}, ru_stime={142, 730301}, ...}) = 0

17:36:44 getrusage(RUSAGE_SELF, {ru_utime={125, 122978}, ru_stime={142, 730301}, ...}) = 0

17:36:44 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608455

17:36:44 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608455

17:36:44 read(16, "MSA\0\2\0\10\0P\0\0\0\0\0\0\0\20?P\r\0\0\0\0\0\200\330\266\177\177\0\0"..., 80) = 80

17:36:44 read(16, "MSA\0\2\0\10\0P\0\0\0\0\0\0\0\20?P\r\0\0\0\0\0\200\330\266\177\177\0\0"..., 80) = 80

17:36:44 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608455

17:36:44 semtimedop(6062085, {{16, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable) –三秒差異

17:36:47 getrusage(RUSAGE_SELF, {ru_utime={125, 123978}, ru_stime={142, 731301}, ...}) = 0

17:36:47 getrusage(RUSAGE_SELF, {ru_utime={125, 123978}, ru_stime={142, 731301}, ...}) = 0

17:36:47 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608755

17:36:47 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608755

17:36:47 read(16, "MSA\0\2\0\10\0P\0\0\0\0\0\0\0\20?P\r\0\0\0\0\0\200\330\266\177\177\0\0"..., 80) = 80

17:36:47 read(16, "MSA\0\2\0\10\0P\0\0\0\0\0\0\0\20?P\r\0\0\0\0\0\200\330\266\177\177\0\0"..., 80) = 80

17:36:47 read(16, "MSA\0\2\0\10\0P\0\0\0\0\0\0\0\20?P\r\0\0\0\0\0\200\330\266\177\177\0\0"..., 80) = 80

17:36:47 read(16, "MSA\0\2\0\10\0P\0\0\0\0\0\0\0\20?P\r\0\0\0\0\0\200\330\266\177\177\0\0"..., 80) = 80

17:36:47 read(16, "MSA\0\2\0\10\0P\0\0\0\0\0\0\0\20?P\r\0\0\0\0\0\200\330\266\177\177\0\0"..., 80) = 80

17:36:47 read(16, "MSA\0\2\0\10\0P\0\0\0\0\0\0\0\20?P\r\0\0\0\0\0\200\330\266\177\177\0\0"..., 80) = 80

17:36:47 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608756

17:36:47 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608756

17:36:47 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608756

17:36:47 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608756

17:36:47 times({tms_utime=12512, tms_stime=14273, tms_cutime=0, tms_cstime=0}) = 645608756

17:36:47 semtimedop(6062085, {{16, -1, 0}}, 1, {3, 0}^C <unfinished ...>

Process 21395 detached

 

 

從裡面,我們可以看到ckpt的動作方式。

 

5、結論

 

StraceLinux環境下的一個非常有用的工具。用好這個工具,可以讓我們更加接近核心細節,豐富知識能力。


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/31397003/viewspace-2148563/,如需轉載,請註明出處,否則將追究法律責任。

相關文章