Procwatcher: Script to Monitor and Examine DB and Clusterware Pro-459694.1
Procwatcher: Script. to Monitor and Examine Oracle DB and Clusterware Processes (文件 ID 459694.1)
In this Document
Purpose |
Scope |
Details |
DOWNLOAD PROCWATCHER |
Requirements |
Procwatcher Features |
Procwatcher is Ideal for: |
Procwatcher is Not Ideal for... |
Procwatcher User Commands |
Procwatcher Parameters |
Advanced Parameters |
References |
Applies to:
Oracle Database - Enterprise Edition - Version 10.2.0.2 to 12.1.0.1 [Release 10.2 to 12.1]Linux x86
HP-UX PA-RISC (64-bit)
IBM AIX on POWER Systems (64-bit)
Oracle Solaris on SPARC (64-bit)
HP-UX Itanium
Linux x86-64
Oracle Server Enterprise Edition - Version: 10.1 to 11.2
Purpose
Procwatcher is a tool to examine and monitor Oracle database and/or clusterware processes at an interval. The tool will collect stack traces of these processes using Oracle tools like oradebug short_stack and/or OS debuggers like pstack, gdb, dbx, or ladebug and collect SQL data if specified.
Scope
This tool is for Oracle representatives and DBAs looking to troubleshoot a problem further by monitoring processes. This tool can be used in conjunction with other tools or troubleshooting methods depending on the situation.
Details
# This script. will find clusterware and/or Oracle Background processes and collect
# stack traces for debugging. It will write a file called procname_pid_date_hour.out
# for each process. If you are debugging clusterware then run this script. as root.
# If you are only debugging Oracle background processes then you can run as
# root or oracle.
To install the script, simply download it put it in its own directory, unzip it, and give it execute permissions. Use the following link to download it:
DOWNLOAD PROCWATCHER
If you are in a clustered environment, you can "deploy" Procwatcher with "prw.sh deploy" to register with the clusterware, propagate to all nodes, and start on all nodes. There is also a deinstall option to deregister from the clusterware and remove the procwatcher directory. In a clustered environment, Procwatcher files will be written to GRID_HOME/log/procwatcher unless the PRWDIR parameter is set.
Requirements
- Must have /bin and /usr/bin in your $PATH
- Have your instance_name or db_name set in the oratab and/or set the $ORACLE_HOME env variable.(PRW searches the oratab for the SID it finds and if it can't find the SID in the oratab it will default to $ORACLE_HOME). Procwatcher cannot function properly if it cannot find an $ORACLE_HOME to use.
- Run Procwatcher as the oracle software owner if you are only troubleshooting homes/instances for that user. If you are troubleshooting clusterware processes (EXAMINE_CLUSTER=true or are troubleshooting for multiple oracle users) run as root.
- If you are monitoring the clusterware you must have the relevant OS debugger installed on your platform; PRW looks for:
Linux - /usr/bin/gdb
HP-UX and HP Itanium - /opt/langtools/bin/gdb64 or /usr/ccs/bin/gdb64
Sun - /usr/bin/pstack
IBM AIX - /bin/procstack or /bin/dbx
HP Tru64 - /bin/ladebug
Procwatcher Features
- Procwatcher collects stack traces for all processes defined using either oradebug short_stack or an OS debugger at a predefined interval if contentioin is found.
- PRW will generate wait chain, session wait, lock, and latch reports if problems are detected (look for pw_* reports in the PRW_DB_subdirectory).
- PRW will look for wait chains, wait events, lock, and latch contention and also dump stack traces of processes that are either waiting for non-idle wait events or waiting for or holding a lock or latch.
- PRW will dump wait chain, session wait, lock, latch, current SQL, process memory, and session history information into specific process files (look for prw_* files in the PRW_DB_subdirectory) for any processes or background processes when problems are detected.
- You can define how aggressive PRW is about getting information by setting parameters like THROTTLE, IDLECPU, and INTERVAL. You can tune these parameters to either get the most information possible or to reduce PRW's cpu impact. See below for more information about what each of these parameters does.
- If CPU usage gets too high on the machine (as defined by IDLECPU), PRW will sleep and wait for CPU utilization to go down.
- Procwatcher gets stack traces of ALL threads of a process (this is important for clusterware processes).
- The housekeeper process runs on a 5 minute loop and cleans up files older than the specified number of days (default is 7).
- If any SQL times out 90 seconds (by default) it will be disabled. At a later time the SQL can be re-tested. If the SQL times out 3 times it will be disabled for the life of Procwatcher. Any GV$ view that times out will automatically revert to the corresponding V$ view. Note that the GV$ view timeout is much lower. The logic is: it's not worth using GV$ views if they aren't fast...If oradebug shortstack is enabled and it times out or fails, the housekeeper process will re-enable shortstack if the test passes.
If you are on HP Itanium or HP-UX: Apply the fix for bug: 10158006 (or bug: 10287978 on 11.2.0.2) before monitoring the database with Procwatcher to fix a known short stack issue on HP. See Note: 1271173.1 for more information.
If you are on Solaris 10: Apply the fix for Solaris bt 6994922 ( see bug: 15677306 ) before monitoring the database with Procwatcher.
Procwatcher is Ideal for:
- Session level hangs or severe contention in the database/instance. See Note: 1352623.1
- Severe performance issues. See Note: 1352623.1
- Instance evictions and/or DRM timeouts.
- Clusterware or DB processes stuck or consuming high CPU (must set EXAMINE_CLUSTER=true and run as root for clusterware processes)
- ORA-4031 and SGA memory management issues. (Set sgamemwatch=diag or sgamemwatch=avoid4031 (not the default). See Note: 1355030.1
- ORA-4030 and DB process memory issues. (Set USE_SQL=true and process_memory=y).
- RMAN slowness/contention during a backup. (Set USE_SQL=true and rmanclient=y).
Procwatcher is Not Ideal for...
- Node evictions/reboots. In order to troubleshoot these you would have to enable Procwatcher for a process(es) that are capable of rebooting the machine. If the OS debugger suspends the processs for too long *that* could cause a reboot of the machine. I would only use Procwatcher for a node eviction/reboot if the problem was reproducing on a test system and I didn't care of the node got rebooted. Even in that case the INTERVAL would need to be set low (30) and many options would have to be turned off to get the cycle time low enough (EXAMINE_BG=false, USE_SQL=false, probably removing additional processes from the CLUSTERPROCS list).
- Non-severe database performance issues. AWR/ADDM/statspack are better options for this...
- Most installation or upgrade issues. We aren't getting data for this unless we are at a stage of the installation/upgrade where key processes are already started.
Procwatcher User Commands
To start Procwatcher:
Or if you want to start on all nodes in a clustered environment:
To stop Procwatcher: :
Or if you want to stop on all nodes in a clustered environment:
To check the status of Procwatcher:
To package up Procwatcher files to upload to support:
All user syntax available:
Usage: prw.sh
Verbs are:
deploy - Register Procwatcher in Clusterware and propagate to all nodes
start [all] - Start Procwatcher on local node, if 'all' is specified, start on all nodes
stop [all] - Stop Procwatcher on local node, if 'all' is specified, stop on all nodes
stat - Check the current status of Procwatcher
pack - Package up Procwatcher files (on all nodes) to upload to support
param - Check current Procwatcher parameters
deinstall - Deregister Procwatcher from Clusterware and remove
log [number] - See the last [number] lines of the procwatcher log file
log [runtime] - See contiuous procwatcher log file info - use Cntrl-C to break
help - What you are looking at...
Procwatcher Parameters
# Set EXAMINE_CLUSTER variable if you want to examine clusterware processes (default is false - or set to true):
# Note that if this is set to true you must deploy/run procwatcher as root unless using oracle restart
EXAMINE_CLUSTER=false
# Set EXAMINE_BG variable if you want to examine all BG processes (default is true - or set to false):
EXAMINE_BG=true
# Set permissions on Procwatcher files and directories (default: 777):
PRWPERM=777
# Set RETENTION variable to the number of days you want to keep historical procwatcher data (default: 7)
RETENTION=7
# Warning e-mails are sent to which e-mail addresses?
# "mail" must work on the unix server
# Example: WARNINGEMAIL=john.doe@oracle.com,jane.doe@oracle.com
WARNINGEMAIL=
######################## PERFORMANCE SETTINGS #########################
# Set INVERVAL to the number of seconds between runs (default 60):
# Probably should not set below 60 if EXAMINE_CLUSTER=true
INTERVAL=60
# Set THROTTLE to the max # of stack trace sessions or SQLs to run at once (default 5 - minimum 2):
THROTTLE=5
# Set IDLECPU to the percentage of idle cpu remaining before PRW sleeps (default 3 - which means PRW will sleep if the machine is more than 97% busy - check vmstat every 5 seconds)
IDLECPU=3
# Set SIDLIST to the list of SIDs you want to examine (default is derived - format example: "RAC1|ASM1|SID3")
# If setting for multiple instances for the same DB, specify each SID - example: "ASM1|ASM2|ASM3"
# Default: If root is starting prw, get all sids found running at the time prw was started.
# If another user is starting prw, get all sids found running owned by that user.
SIDLIST=
#######################################################################
Advanced Parameters
# Default is $GRID_HOME/log/procwatcher if clusterware is running and this is not set
# Default is the directory where prw.sh is run if no clusterware and this is not set
# Example: PRWDIR=/home/oracle/procwatcher
PRWDIR=
# SQL Control
# Set USE_SQL variable if you want to use SQL to troubleshoot (default is true - or set to false):
USE_SQL=true
# Set to 'y' to enable SQL, 'n' to disable
sessionwait=y
lock=y
latchholder=y
gesenqueue=y
waitchains=y
rmanclient=n
process_memory=n
sqltext=y
ash=y
# SGA Memory watch (default: off). Valid values are:
# ff = no SGA memory diagnostics
# diag = collect SGA memory diagnostics
# avoid4031 = collect SGA memory diagnostics and flush the shared pool to avoid ORA-4031
# if memory fragmentation occurs
# Note that setting sgamemwatch to 'diag' or 'avoid4031' will query x$ksmsp
# which may increase shared pool latch contention in some environments.
# Please keep this in mind and test in a test environment
# with load before using this setting in production.
sgamemwatch=off
# Levels for debugging before a flush if sgamemwatch=avoid4031 (default: 0 for both)
heapdump_level=0
lib_cache_dump_level=0
# Suspect Process Threshold (if # of suspect procs >
# 1 = Get query and stack output if there is at least 1 suspect proc (default)
# 0 = Get all diags each cycle
suspectprocthreshold=1
# Warning Process Threshold (if # of suspect procs >
warningprocthreshold=10
# Levels for debugging if warningprocthreshold is reached (default: 0 for both)
# If using this feature recommended values are (hanganalyze_level=3, systemstate_level=258)
# Flood control limits the dumps to a maximum of 3 per hour
hanganalyze_level=0
systemstate_level=0
# Cluster Process list for examination (seperated by "|"):
# Default: "crsd.bin|evmd.bin|evmlogge|racgimon|racge|racgmain|racgons.b|ohasd.b|oraagent|oraroota|gipcd.b|mdnsd.b|gpnpd.b|gnsd.bi|diskmon|
octssd.b|ons -d|tnslsnr"
# - The processes oprocd, cssdagent, and cssdmonitor are intentionally left off the list because of high reboot danger.
# - The ocssd.bin process is off the list due to moderate reboot danger. Only add this if your css misscount is the
# - default or higher, your machine is not highly loaded, and you are aware of the tradeoff.
CLUSTERPROCS="crsd.bin|evmd.bin|evmlogge|racgimon|racge|racgmain|racgons.b|ohasd.b|oraagent|oraroota|gipcd.b|mdnsd.b|gpnpd.b|
gnsd.bi|diskmon|octssd.b|ons -d|tnslsnr"
# DB Process list for examination (seperated by "|"):
# Default: "_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|_gmon|_lmhb|_rms0"
# - To examine ALL oracle DB and ASM processes on the machine, set BGPROCS="ora|asm" (not typically recommended)
BGPROCS="_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|_gmon|_lmhb|_rms0"
# Set to 'y' to enable gv$views, set to 'n' to disable gv$ views
# (makes queries a little faster in RAC but can't see other instances in reports)
# Default is derived based on if waitchains is used
use_gv=
# Set to 'y' to get pmap data for clusterware processes.
# Only available on Linux and Solaris
use_pmap=n
# DB Versions enabled, set to 'y' or 'n' (this will override the SIDLIST setting)
VERSION_10_1=y
VERSION_10_2=y
VERSION_11_1=y
VERSION_11_2=y
# Should we fall back to an OS debugger if oradebug short_stack fails?
# OS debuggers are less safe per bug 6859515 so default is false (or set to true)
FALL_BACK_TO_OSDEBUGGER=false
# Number of oradebug shortstacks to get on each pass
# Will automatically lower if stacks are taking too long
STACKCOUNT=3
# Point this to a custom .sql file for Procwatcher to capture every cycle.
# Don't use big or long running SQL. The .sql file must be executable.
# Only 1 SQL per file.
# Example: CUSTOMSQL1=/home/oracle/test.sql
CUSTOMSQL1=
CUSTOMSQL2=
CUSTOMSQL3=
References
NOTE:783456.1 - CRS Diagnostic Data Gathering: A Summary of Common tools and their UsageNOTE:1352623.1 - How To Troubleshoot Database Contention With Procwatcher
NOTE:1355030.1 - How To Troubleshoot ORA-4031's and Shared Pool Issues With Procwatcher
NOTE:1271173.1 - Process Hangs After Issuing Oradebug Short_Stack on HP Platforms
NOTE:1353073.1 - Exadata Diagnostic Collection Guide
NOTE:559339.1 - Diagnostic Tools Catalog
NOTE:1389167.1 - Get Proactive with Oracle Database
NOTE:1428210.1 - Troubleshooting Database Contention With V$Wait_Chains
NOTE:396940.1 - Troubleshooting and Diagnosing ORA-4031 Error [Video]
NOTE:1477599.1 - Best Practices: Proactive Data Collection for Performance Issues
NOTE:430473.1 - ORA-4031 Common Analysis/Diagnostic Scripts [Video]
NOTE:1096952.1 - Master Note for Real Application Clusters (RAC) Oracle Clusterware and Oracle Grid Infrastructure
NOTE:452358.1 - How to Collect Diagnostics for Database Hanging Issues
|
- Procwatcher Version 12.1.13.10.1(35.32 KB)
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/17252115/viewspace-775308/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 10.monitor web data scriptWeb
- [Oracle Script] select db parameterOracle
- [Shell] Linux monitor OS process and DB sessionLinuxSession
- DB2 Event Monitor使用與查詢DB2
- 6.Monitor Linux OS process and DB sessionLinuxSession
- linux_db_script ---014Linux
- DB中備份每一張table script
- 關於Oracle DB中SQL Script的註釋(REM,--,/* */)OracleSQLREM
- oracle clusterwareOracle
- ORACLE RAC clusterwareOracle
- Table Monitor
- R12 help--diagnostics--examine gives function not available errorFunctionAIError
- Script
- 請說說`<script>`、`<script async>`和`<script defer>`的區別
- Oracle Clusterware的心跳Oracle
- Oracle Clusterware工具集Oracle
- Clusterware 後臺程式
- 【OCP最新題庫解析(052)--題7】 Examine theses SQL statements and outputSQL
- 使用Procwatcher監控Oracle資料庫鎖定Contention(上)Oracle資料庫
- [Shell] monitor filesystem
- 2788647047_monitor
- Script Browser & Script Analyzer 1.3更新發布
- Oracle clusterware組成概述Oracle
- 安裝clusterware問題
- HACMP & Oracle Clusterware 對比ACMOracle
- Shell Script
- shell script
- oracle scriptOracle
- script win
- bat scriptBAT
- Postman的Monitor功能Postman
- 【SQL】Oracle SQL monitorSQLOracle
- Health Monitor簡介
- Monitor ASM DG IOASM
- aix_system_monitorAI
- clone grid INfrastructure Home and clusterwareASTStruct
- Oracle Clusterware and Oracle Grid InfrastructureOracleASTStruct
- script標籤