A nice tool: prw.sh

Posted by zhulch on 2007-10-18
It really is quite good.

Applies to:

Oracle Server - Enterprise Edition - Version: 10.2 to 10.2
HP-UX Itanium
AIX5L Based Systems (64-bit)
Linux Itanium
Linux x86-64
Solaris Operating System (SPARC 64-bit)
Linux x86
HP-UX PA-RISC (64-bit)
HP Tru64 UNIX

Purpose

Procwatcher is a tool to examine and monitor Oracle database and CRS processes at an interval. The tool will collect stack traces of these processes using Oracle tools like oradebug short_stack and OS debuggers like pstack, gdb, dbx, or ladebug.

Scope and Application

This tool is for Oracle representatives and DBAs looking to troubleshoot a problem further by monitoring processes. This tool should be used in conjunction with other tools or troubleshooting methods depending on the situation.

Procwatcher: Script to Monitor and Examine Oracle and CRS Processes

This script will find CRS and/or Oracle background processes and collect stack traces for debugging. For each process it writes a file named prw_procname_pid_date.out in the corresponding directory (for example, PRW_CRS/prw_crsd.bin_870_09-21-07.out). If you are examining CRS, run this script as root. If you are only examining Oracle background processes, you can run it as root or as the oracle user.

To install the script, download it, put it in its own directory, unzip it, and give it execute permissions.

Requirements:

- Must have /bin and /usr/bin in your $PATH
- Must have the relevant OS debugger installed on your platform. PRW looks for:

Linux - /usr/bin/gdb
HP-UX and HP Itanium - /opt/langtools/bin/gdb64
Sun - /usr/bin/pstack
IBM AIX - /bin/dbx
HP Tru64 - /bin/ladebug

- Recommended to set $ORACLE_HOME (PRW searches the oratab for the SID it finds and if it can't find the SID in the oratab it will default to $ORACLE_HOME).
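
The requirements above can be checked before starting PRW. The following is a minimal sketch, not code from prw.sh: the function names `path_has`, `debugger_for_os`, and `oracle_home_for_sid` are hypothetical, the OS names are the usual `uname -s` values for each platform, and the oratab lookup assumes the common `SID:ORACLE_HOME:Y|N` line format.

```shell
#!/bin/sh
# Hypothetical preflight helpers (illustrative only, not part of prw.sh).

# Succeed if directory $1 is an entry in $PATH.
path_has() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print the debugger path the note says PRW looks for.
# $1 = output of `uname -s`
debugger_for_os() {
    case "$1" in
        Linux) echo /usr/bin/gdb ;;
        HP-UX) echo /opt/langtools/bin/gdb64 ;;
        SunOS) echo /usr/bin/pstack ;;
        AIX)   echo /bin/dbx ;;
        OSF1)  echo /bin/ladebug ;;   # HP Tru64 reports itself as OSF1
        *)     return 1 ;;
    esac
}

# Print the ORACLE_HOME recorded for a SID in an oratab file,
# falling back to $ORACLE_HOME when the SID is not listed.
# $1 = SID, $2 = path to oratab (its location varies by platform)
oracle_home_for_sid() {
    if [ -r "$2" ]; then
        while IFS=: read -r sid home rest; do
            if [ "$sid" = "$1" ]; then
                echo "$home"
                return 0
            fi
        done < "$2"
    fi
    echo "$ORACLE_HOME"
}
```

For example, `dbg=$(debugger_for_os "$(uname -s)") && [ -x "$dbg" ]` would confirm that the expected debugger is installed on the current machine.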

Procwatcher Features:

- PRW collects stack traces for all processes defined using either oradebug short_stack or an OS debugger at a predefined interval.

- If USE_SQL is set to true, PRW will generate session wait, lock, and latch reports.

- If USE_SQL is set to true, PRW will look for wait events, lock, and latch contention and also dump stack traces of processes that are either waiting for non-idle wait events or waiting for or holding a lock or latch.

- If USE_SQL is set to true, PRW will dump session wait, lock, latch, and session history information into specific process files.

- You can define how aggressive PRW is about getting information by setting parameters like THROTTLE, IDLECPU, PROCINTERVAL, and INTERVAL. You can tune these parameters to either get the most information possible or to reduce PRW's cpu impact. See below for more information about what each of these parameters does.

- If CPU usage gets too high on the machine (as defined by IDLECPU), PRW will sleep and wait for CPU utilization to go down.

- PRW gets stack traces of ALL threads of a process.

- The housekeeper process runs on a 5 minute loop and cleans up files older than the specified number of days (default is 2).

- If USE_SQL is set to true and any SQL times out after 60 seconds it will be disabled. The housekeeper process (which is on a 5 minute loop), will re-enable SQL if the test passes.

- If oradebug short_stack is enabled and it times out, the housekeeper process will re-enable short_stack if the test passes.
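
The housekeeper's cleanup step can be approximated with `find`. This is a sketch under stated assumptions, not the logic in prw.sh: the function name `prw_cleanup` is hypothetical, it assumes the files to purge end in `.out` or `.log`, and the exact age cutoff PRW applies is not documented in this note.

```shell
#!/bin/sh
# Hypothetical cleanup helper (illustrative only, not part of prw.sh).
# $1 = directory to clean, $2 = days to keep (defaults to 2, as in the note)
prw_cleanup() {
    dir=$1
    days=${2:-2}
    # -mtime +N selects files whose age in whole 24-hour periods exceeds N.
    find "$dir" -type f \( -name '*.out' -o -name '*.log' \) \
        -mtime +"$days" -exec rm -f {} \;
}
```

A housekeeper along these lines would call `prw_cleanup` for each output directory and then sleep for five minutes before the next pass.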

Disclaimer: Most OS debuggers temporarily suspend a process while attaching and dumping a stack trace. Procwatcher keeps that suspension as short as possible. Some debuggers are also CPU intensive. Load test this script before using it in any production environment; the THROTTLE, IDLECPU, PROCINTERVAL, and INTERVAL parameters (see below) may need adjusting depending on how loaded and how fast the machine is. Do not run this tool on an overutilized machine with little or no idle CPU. Also note that some debuggers get in and out of a process faster than others: for example, pstack and oradebug short_stack are fast, while ladebug is slower.

To start Procwatcher (the 2nd argument is the # of days to keep files, default is 2):

./prw.sh start 2

To stop Procwatcher:

./prw.sh stop

To check the status of Procwatcher:

./prw.sh stat

Sample directory structure:

[root@racnode2 procwatcher]# ls
prw_09-21-07.log PRW_CRS PRW_DB_rac2 prw.sh PRW_SYS

Notice that it writes a log file for every day that it runs and creates a directory for CRS (PRW_CRS) and each DB that it finds (PRW_DB_$SID). The PRW_SYS directory contains files that prw uses at runtime (don't touch).

Sample log output:

################################################################################
Fri Sep 21 22:15:00 MDT 2007: Procwatcher starting on SunOS
################################################################################
Fri Sep 21 22:15:00 MDT 2007: Procwatcher running as user root
Fri Sep 21 22:15:00 MDT 2007: Created PRW_DB_ASM1 directory
Fri Sep 21 22:15:00 MDT 2007: Debugging for SIDs: ASM1
Fri Sep 21 22:15:00 MDT 2007: Going to use pstack for debugging
Fri Sep 21 22:15:00 MDT 2007: Created PRW_CRS directory
Fri Sep 21 22:15:01 MDT 2007: Getting crs_stat output (in PRW_CRS/prw_crs_stat_09-21-07_22:00.out)
Fri Sep 21 22:15:02 MDT 2007: Getting stack for crsd.bin 870 (in PRW_CRS/prw_crsd.bin_870_09-21-07.out)
Fri Sep 21 22:15:03 MDT 2007: Getting stack for evmd.bin 840 (in PRW_CRS/prw_evmd.bin_840_09-21-07.out)
Fri Sep 21 22:15:04 MDT 2007: Getting stack for ocssd.bin 1103 (in PRW_CRS/prw_ocssd.bin_1103_09-21-07.out)
Fri Sep 21 22:15:06 MDT 2007: Getting stack for asm_lmd0_+ASM1 1754 (in PRW_DB_ASM1/prw_asm_lmd0_+ASM1_1754_09-21-07.out)
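
The file names in the log above appear to follow the pattern prw_<process>_<pid>_<MM-DD-YY>.out. A minimal sketch of building such a name, inferred only from the sample output; `prw_outfile` is a hypothetical helper, not a function from prw.sh.

```shell
#!/bin/sh
# Hypothetical helper (illustrative only): build a per-process output path.
# $1 = directory, $2 = process name, $3 = pid, $4 = date stamp (optional)
prw_outfile() {
    dir=$1
    proc=$2
    pid=$3
    # Default to today's date in the MM-DD-YY form seen in the sample log.
    stamp=${4:-$(date +%m-%d-%y)}
    echo "${dir}/prw_${proc}_${pid}_${stamp}.out"
}
```

Because the date is part of the name, a long-running process gets one file per day, which matches the daily log files described above.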

Sample debug output:

################################################################################
Fri Sep 21 22:15:06 MDT 2007
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 1754 1 0 40 20 ? 33778 ? Jul 18 ? 164:24 asm_lmd0_+ASM1


1754: asm_lmd0_+ASM1
ffffffff7a8ce49c pollsys (ffffffff7fffb9c0, 2, ffffffff7fffb900, 0)
ffffffff7a867f24 poll (ffffffff7fffb9c0, 2, 50, 4c4b400, 50, 1312d0) + 88
ffffffff7da19594 sskgxp_select (ffffffff7fffc9a0, 106744e10, ffffffff7fffc310, 2, 0, 50) + f4
ffffffff7da08fcc skgxpiwait (ffffffff7da1af78, 106744e10, 106745c10, 4f08ca92, ffffffff7fffc310, fffe) + 82c
ffffffff7da086e4 skgxpwait (0, 106744e10, 4f08ca42, 400000, 50, 400000) + 364
0000000101185f6c ksxpwait (0, 101000, 0, 10652a698, 1000, 106530ac8) + 70c
0000000100ed50c8 ksliwat (0, 2, 8, 38793ad00, 38793ac88, 0) + b88
0000000100ed5690 kslwaitns_timed (8, 1, 33, 0, ffffffff7fffcec8, 0) + 30
0000000101172628 kskthbwt (8, 33, 0, 40, 0, 0) + e8
0000000100ed55d4 kslwait (1f5d7b0b, 0, a, a, 0, 0) + 74
0000000100ed5690 kslwaitns_timed (8, 1, 33, 0, ffffffff7fffcec8, 0) + 30
0000000101172628 kskthbwt (8, 33, 0, 40, 0, 0) + e8
0000000100ed55d4 kslwait (1f5d7b0b, 0, a, a, 0, 0) + 74
0000000101184894 ksxprcv (1056db, 106527c18, 8, 1056db618, 106527, 1056db000) +394
0000000101645894 kjctr_rksxp (40, 385fe5af8, 0, ffffffff7fffda18, 14, ffffffff7fffda14) + 1f4
0000000101647464 kjctrcv (ffffffff79c2c2c8, 385fe5af8, 10675bca0, ffffffff7fffe25c, 40, 33) + 164
0000000101633c80 kjcsrmg (ffffffff79c2c2b0, 0, 40, 33, 0, 106531) + 60
0000000101690634 kjmdm (8, 44, a, 8, 106531, 0) + 3274
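
An entry like the one above (separator, timestamp, `ps` listing, stack) could be produced roughly as follows. This is only a sketch: `dump_stack` is a hypothetical name, and it assumes a debugger that is invoked as `command pid`, as pstack is. Debuggers such as gdb, dbx, or ladebug need their own invocation, which this note does not show.

```shell
#!/bin/sh
# Hypothetical helper (illustrative only, not part of prw.sh).
# $1 = pid, $2 = output file, $3 = debugger command (defaults to pstack)
dump_stack() {
    pid=$1
    outfile=$2
    debugger=${3:-pstack}
    {
        echo "################################################################################"
        date
        # Long listing for the process, similar to the sample header line.
        ps -fl -p "$pid"
        # Stack trace of the process (all threads, for pstack).
        "$debugger" "$pid"
    } >> "$outfile" 2>&1
}
```

Appending with `>>` keeps successive snapshots of the same process in one file, so stacks taken at different intervals can be compared.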

Sample SQL Report (if USE_SQL=true):

################################################################################
Procwatcher sessionwait report
################################################################################

Snapshot Taken At: Thu Sep 27 13:36:03 GMT 2007
SID PROC STATE EVENT P1 P2 P3 WAIT_CLASS SEC
--------------- ----------------- ---------- ------------------------------ ---------- ---------- ---------- ------------ ----------
SID H1021 PROC 233474 WAITING enq: TX - row lock contention 1415053318 524330 611 Application 117
SID H1021 PROC 913492 WAITED SHO SQL*Net message to client 1650815232 1 0 Network 0
Elapsed: 00:00:00.02
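
A report of this shape could be built from v$session joined to v$process. The query below is a guess based on the column headers in the sample, not the SQL that prw.sh actually runs; the function name `sessionwait_sql` is hypothetical, and connecting as SYSDBA in the usage line is likewise an assumption.

```shell
#!/bin/sh
# Hypothetical helper (illustrative only): print a SQL*Plus script that
# lists sessions in non-idle waits together with their OS process id.
sessionwait_sql() {
    # Quoted delimiter keeps the shell from expanding v$session, v$process.
    cat <<'EOF'
set lines 200 pages 100
select s.sid, p.spid, s.state, s.event, s.p1, s.p2, s.p3,
       s.wait_class, s.seconds_in_wait
from   v$session s, v$process p
where  s.paddr = p.addr
and    s.wait_class <> 'Idle'
order  by s.seconds_in_wait desc;
exit
EOF
}
```

It could then be run with something like `sessionwait_sql | sqlplus -s "/ as sysdba"`. The spid column is the OS process id, which is what ties a waiting session back to a stack trace file.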

Sample SQL Data Dumped to Process Specific Files (if USE_SQL=true):
################################################################################
SQL: Session Wait Report for Process 192546 ora_fg_H1021

Snapshot Taken At: Thu Sep 27 13:37:49 GMT 2007
SID PROC STATE EVENT P1 P2 P3 WAIT_CLASS SEC
--------------- ----------------- ---------- ------------------------------ ---------- ---------- ---------- ------------ -----
SID H1021 PROC 192546 WAITING SQL*Net message from client 1650815232 1 0 Idle 228

################################################################################
SQL: Lock Report for Process 192546 ora_fg_H1021

Snapshot Taken At: Thu Sep 27 13:37:58 GMT 2007
SID PROC TY ID1 ID2 LMODE REQUEST BLOCK
-------------------- ----------------- -- ---------- ---------- ---------- ---------- ----------
SID H1021 PROC 192546 TX 524330 611 6 0 1

Procwatcher Parameters:

Procwatcher also has some configurable parameters that can be set within the script itself. The script also provides more information on how to set each one. Here is the section of the script where parameters can be set:

CONFIG SETTINGS:

Set EXAMINE_CRS variable if you want to examine CRS processes (default is true - or set to false):
EXAMINE_CRS=true

Set EXAMINE_BG variable if you want to examine all BG processes (default is true - or set to false):
EXAMINE_BG=true

Set USE_SQL variable if you want to use SQL to troubleshoot (default is true - or set to false):
USE_SQL=true

PERFORMANCE SETTINGS:

Set INTERVAL to the number of seconds between stack trace runs (default 30):
Probably should not be set much lower than 30, to make sure all SQL is finished (if USE_SQL=true)
INTERVAL=30

Set THROTTLE to the maximum number of stack trace sessions to run at once (default 5):
THROTTLE=5

Set IDLECPU to the percentage of idle CPU remaining before PRW sleeps (default 5, which means PRW will sleep if the machine is more than 95% busy):
IDLECPU=5

Set PROCINTERVAL to an additional sleep time in between process dumps (default 0-2 depending on the OS)
FYI - will sleep for $PROCINTERVAL twice when dumping a CRS stack or using an expensive OS tool
PROCINTERVAL=
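
The IDLECPU rule above amounts to a simple comparison. A minimal sketch, assuming the idle percentage is available as an integer; how PRW measures idle CPU and how long it sleeps are not stated in this note, so the names `cpu_too_busy` and `wait_for_idle_cpu` and the `SLEEP_SECS` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative only, not part of prw.sh).

# Succeed when idle CPU has dropped below the threshold.
# $1 = current idle CPU percentage (integer), $2 = threshold (default 5)
cpu_too_busy() {
    idle=$1
    threshold=${2:-5}
    [ "$idle" -lt "$threshold" ]
}

# Block until the machine has enough idle CPU again.
# $1 = name of a command that prints the current idle percentage
#      (platform-specific, for example parsed from vmstat output)
wait_for_idle_cpu() {
    while cpu_too_busy "$($1)" "$IDLECPU"; do
        sleep "${SLEEP_SECS:-5}"
    done
}
```

With the default of 5, an idle reading of 4 triggers a sleep while a reading of 5 does not, which matches "more than 95% busy".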

PROCESS LIST SETTINGS:


Set SIDLIST to the list of SIDs you want to examine (default is derived - format "SID1|SID2|SID3"):
Default: If root is starting prw, get all sids found running at the time prw was started.
If another user is starting prw, get all sids found running owned by that user.
SIDLIST=
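
One common way to derive such a list is from the names of running pmon processes. The sketch below is an assumption about how a default could be derived, not the method prw.sh uses; `sids_from_ps` is a hypothetical name. It strips a leading `+` from ASM SIDs because the sample log reports `+ASM1` as `ASM1`.

```shell
#!/bin/sh
# Hypothetical helper (illustrative only, not part of prw.sh).
# Reads process command lines on stdin, prints "SID1|SID2|SID3".
sids_from_ps() {
    sed -n -e 's/^ora_pmon_//p' -e 's/^asm_pmon_//p' |
        sed 's/^+//' |
        LC_ALL=C sort -u |
        paste -s -d '|' -
}
```

As root, `ps -e -o args= | sids_from_ps` would cover every running instance; as another user, `ps -u "$(id -un)" -o args= | sids_from_ps` would limit the list to that user's instances.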

CRS Process list for examination (max of first 8 characters - separated by "|"):
Default: "crsd.bin|ocssd.b|evmd.bin|evmlogge|racgimon|racge|racgmain|racgons.bin|ocls"
- Do not list oprocd unless you want your node to reboot.
- Do not list ocssd.bin if you have manually set the css misscount to < 10
CRSPROCS="crsd.bin|ocssd.b|evmd.bin|evmlogge|racgimon|racge|racgmain|racgons.bin|ocls"

DB Process list for examination (max of first 8 characters - separated by "|"):
Default: "_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|asm_"
- To examine ALL oracle DB and ASM processes on the machine, set BGPROCS="ora|asm_" (not typically recommended)
BGPROCS="_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|asm_"
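
The "|"-separated lists above have the form of an extended regular expression, so they can be applied directly to process names. A minimal sketch, assuming that is how the lists are meant to be used; `match_procs` is a hypothetical name and prw.sh may select processes differently.

```shell
#!/bin/sh
# Hypothetical helper (illustrative only, not part of prw.sh).
# Reads "PID NAME" lines on stdin and prints those whose NAME matches
# the "|"-separated list given as $1 (for example $CRSPROCS or $BGPROCS).
match_procs() {
    awk -v pat="$1" '$2 ~ pat { print $1, $2 }'
}
```

For example, `ps -e -o pid= -o comm= | match_procs "$BGPROCS"` lists the background processes that would be examined. Note that each `.` in an entry such as `crsd.bin` matches any single character when the list is treated as a regular expression.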

For additional details, see the prw.sh script itself.

From the ITPUB blog, link: http://blog.itpub.net/7318139/viewspace-977458/. If you repost this, please credit the source; otherwise legal liability will be pursued.
