How to diagnose an Oracle database that is running slowly or hanging

Posted by eric0435 on 2014-03-06

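To diagnose a slow Oracle database, first decide which diagnostic information to collect. The following approach can be used:
1. Is the slowness a general problem, or does it appear only at specific times?
If the database is generally slow, collect an AWR or Statspack report covering a period when the problem is present (the usual reporting interval is one hour). An AWR report is generated as follows:
AWR collects persistent system performance statistics under the SYS user and stores them in the SYSAUX tablespace. By default a snapshot is taken every hour and kept for 7 days. An AWR report presents the statistics between two specified snapshots, for performance analysis and for investigating other problems.
Running the basic report
Run the following script to generate an AWR report: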
$ORACLE_HOME/rdbms/admin/awrrpt.sql

Depending on why you are collecting the report, you choose the snapshot interval it covers, and you can also choose the report format (text or HTML).
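
When awrrpt.sql prompts for the begin and end snapshot IDs, it can help to list the available snapshots first. The query below is a minimal sketch, assuming the session can read DBA_HIST_SNAPSHOT; the one-day window is an arbitrary choice for illustration, not something the script requires.

```sql
-- List the AWR snapshots of the last day so that begin/end snapshot IDs
-- can be picked for awrrpt.sql. The one-day window is an assumption.
select snap_id,
       instance_number,
       to_char(begin_interval_time, 'YYYY-MM-DD HH24:MI') as begin_time,
       to_char(end_interval_time,   'YYYY-MM-DD HH24:MI') as end_time
from   dba_hist_snapshot
where  end_interval_time > systimestamp - interval '1' day
order  by snap_id;
```

Pick two snap_id values that bracket the slow period and supply them at the awrrpt.sql prompts.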

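Generating the different types of AWR report
Several SQL scripts are available, each producing a different type of AWR report. Every report can be produced in two formats (text or HTML):
awrrpt.sql
Displays statistics for a specified range of snapshots.

awrrpti.sql
Displays statistics for a specified range of snapshots in a specific database and instance.

awrsqrpt.sql
Displays statistics for one specific SQL statement over a specified range of snapshots. Run this report to examine or investigate the performance of a particular SQL statement.

awrsqrpi.sql
Displays statistics for one specific SQL statement over a specified range of snapshots in a specific database and instance.

awrddrpt.sql
Compares detailed performance data and configuration settings between two selected time periods.

awrddrpi.sql
Compares detailed performance data and configuration settings between two selected time periods in a specific database and instance.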

Common AWR operations
Changing the AWR snapshot settings:

BEGIN
  DBMS_WORKLOAD_REPOSITORY.modify_snapshot_settings(
    retention => 43200,        -- Minutes (43200 = 30 Days).
                               -- Current value retained if NULL.
    interval  => 30);          -- Minutes. Current value retained if NULL.
END;
/
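
After changing the settings it may be worth confirming what is now in effect. A minimal sketch, assuming read access to DBA_HIST_WR_CONTROL (the view listed further down as holding the AWR settings):

```sql
-- Show the snapshot interval and retention currently in effect.
-- Both columns are INTERVAL DAY TO SECOND values.
select dbid,
       snap_interval,
       retention
from   dba_hist_wr_control;
```

With the call above, snap_interval should show 30 minutes and retention 30 days.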

Creating an AWR baseline:

BEGIN
  DBMS_WORKLOAD_REPOSITORY.create_baseline (
    start_snap_id => 10,
    end_snap_id   => 100,
    baseline_name => 'AWR First baseline');
END;
/
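
To check that the baseline exists and covers the intended snapshots, a query along the following lines can be used. This is a sketch that assumes read access to DBA_HIST_BASELINE and reuses the baseline name from the example above.

```sql
-- Confirm the baseline and the snapshot range it preserves.
select baseline_name,
       start_snap_id,
       end_snap_id
from   dba_hist_baseline
where  baseline_name = 'AWR First baseline';
```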

Oracle 11g introduced a new procedure, dbms_workload_repository.create_baseline_template, for creating an AWR baseline template:

BEGIN
DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE_TEMPLATE (
start_time => to_date('&start_date_time','&start_date_time_format'),
end_time => to_date('&end_date_time','&end_date_time_format'),
baseline_name => 'MORNING',
template_name => 'MORNING',
expiration => NULL ) ;
END;
/

"expiration => NULL" means that the baseline never expires.

Dropping an AWR baseline:

BEGIN
    DBMS_WORKLOAD_REPOSITORY.DROP_BASELINE (
    baseline_name => 'AWR First baseline');
END;
/

A baseline that belongs to another (older) database ID can also be dropped by passing its dbid:

BEGIN
DBMS_WORKLOAD_REPOSITORY.DROP_BASELINE (baseline_name => 'peak baseline',
cascade => FALSE, dbid => 3310949047);
END;
/

Dropping AWR snapshots:

BEGIN
  DBMS_WORKLOAD_REPOSITORY.drop_snapshot_range(
    low_snap_id  => 40,
    high_snap_id => 80);
END;
/

A template can also be defined that creates and drops AWR baselines on a repeating schedule:

BEGIN
DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE_TEMPLATE (
day_of_week => 'MONDAY',
hour_in_day => 9,
duration => 3,
start_time => to_date('&start_date_time','&start_date_time_format'),
end_time => to_date('&end_date_time','&end_date_time_format'),
baseline_name_prefix => 'MONDAY_MORNING',
template_name => 'MONDAY_MORNING',
expiration => 30 );
END;
/

This generates a baseline for every Monday between '&start_date_time' and '&end_date_time'.

Taking an AWR snapshot manually:

BEGIN
  DBMS_WORKLOAD_REPOSITORY.create_snapshot();
END;
/

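Workload repository views:
V$ACTIVE_SESSION_HISTORY - active session history, sampled once per second
V$METRIC - metric values
V$METRICNAME - the metrics associated with each metric group
V$METRIC_HISTORY - historical metric values
V$METRICGROUP - all metric groups
DBA_HIST_ACTIVE_SESS_HISTORY - the history of active session samples
DBA_HIST_BASELINE - baseline information
DBA_HIST_DATABASE_INSTANCE - database environment information
DBA_HIST_SNAPSHOT - snapshot information
DBA_HIST_SQL_PLAN - SQL execution plans
DBA_HIST_WR_CONTROL - AWR settings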
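
As an illustration of how these views can be used while a slowdown is in progress, the sketch below counts recent active-session samples per wait event. The ten-minute window and the 'ON CPU' label for samples with no event are choices made for this example only.

```sql
-- Rough picture of where active sessions spent the last ten minutes.
-- Each row in V$ACTIVE_SESSION_HISTORY is one sample of one active session;
-- EVENT is NULL when the sampled session was on CPU.
select nvl(event, 'ON CPU') as event,
       count(*)             as samples
from   v$active_session_history
where  sample_time > systimestamp - interval '10' minute
group  by nvl(event, 'ON CPU')
order  by samples desc;
```

Events near the top of the list are reasonable starting points when reading the corresponding AWR report.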

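If the slowness appears only at specific times, generate an AWR or Statspack report while the problem is present, with a reporting interval that covers the time of the problem. For comparison, also collect a report of the same length from a period when the database was running normally, so that the two can be compared.

2. Does the slowness affect one session, several sessions, or all sessions?
If it affects one session or a few sessions, run a 10046 trace on those sessions.
If it affects all sessions, collect an AWR or Statspack report.

10046 tracing is done as follows:
Collecting a 10046 trace file
Event 10046 is the standard method for gathering extended SQL_TRACE information for Oracle sessions.
For query performance problems you usually need the waits and bind variable values recorded. A 10046 trace at level 12 does this. The examples below show how to set event 10046 in various situations.

Trace file location
Oracle 11g and later use a new diagnostic architecture in which the location of trace and core files is controlled by the diagnostic_dest initialization parameter. It can be displayed with: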

sys@JINGYONG> show parameter diagnostic_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
diagnostic_dest                      string      /u01/app/oracle

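Note: some of the examples set 'tracefile_identifier' to make the resulting trace file easier to find.

Session tracing
Tracing can be enabled in a user session before it runs the SQL statement, which collects a 10046 trace at session level: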

sys@JINGYONG> alter session set timed_statistics=true;

Session altered.

sys@JINGYONG> alter session set statistics_level=all;

Session altered.

sys@JINGYONG> alter session set max_dump_file_size=unlimited;

Session altered.

sys@JINGYONG> alter session set events '10046 trace name context forever,level 12';

Session altered.

sys@JINGYONG> select * from dual;

D
-
X

sys@JINGYONG>exit

If the session has not exited, 10046 tracing can be disabled with:

sys@JINGYONG> alter session set events '10046 trace name context off';

Session altered.


sys@JINGYONG> select * from v$diag_info ;

   INST_ID NAME
---------- ----------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------------------
         1 Diag Enabled
TRUE

         1 ADR Base
/u01/app/oracle

         1 ADR Home
/u01/app/oracle/diag/rdbms/jingyong/jingyong

         1 Diag Trace
/u01/app/oracle/diag/rdbms/jingyong/jingyong/trace

         1 Diag Alert
/u01/app/oracle/diag/rdbms/jingyong/jingyong/alert

         1 Diag Incident
/u01/app/oracle/diag/rdbms/jingyong/jingyong/incident

         1 Diag Cdump
/u01/app/oracle/diag/rdbms/jingyong/jingyong/cdump

         1 Health Monitor
/u01/app/oracle/diag/rdbms/jingyong/jingyong/hm

         1 Default Trace File
/u01/app/oracle/diag/rdbms/jingyong/jingyong/trace/jingyong_ora_2572_10046.trc

         1 Active Problem Count
0

         1 Active Incident Count
0


11 rows selected.

sys@JINGYONG> select * from v$diag_info where name='Default Trace File';

   INST_ID NAME
---------- ----------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------------------
         1 Default Trace File
/u01/app/oracle/diag/rdbms/jingyong/jingyong/trace/jingyong_ora_2572_10046.trc

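Note: if the session is not closed cleanly or tracing is not disabled, important trace information may be missing from the trace file.

Note: statistics_level=all is used here, so a certain level of statistics is gathered. The parameter has three values: all, typical and basic. Diagnosing a performance problem requires some level of statistics. Setting it to all may be unnecessary; typical is usually enough to obtain comprehensive diagnostic information.

Tracing a process that is already running
To trace an existing session, use oradebug to attach to the session and start the 10046 trace.
1. Identify the session to be traced by some means.
For example, start a session in SQL*Plus and then find its operating system process ID (SPID):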
select p.PID,p.SPID,s.SID
from v$process p,v$session s
where s.paddr = p.addr
and s.sid = &SESSION_ID
/

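SPID is the operating system process identifier.
PID is the Oracle process identifier.
If you do not know the SID of the session to be traced, a query like the following can help you identify it: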

column line format a79
set heading off
select 'ospid: ' || p.spid ||'  pid: '||p.pid || ' # ''' ||s.sid||','||s.serial#||''' '||
  s.osuser || ' ' ||s.machine ||' '||s.username ||' '||s.program line
from v$session s , v$process p
where p.addr = s.paddr
and s.username <> ' ';
The output looks like this:
sys@JINGYONG> column line format a79
sys@JINGYONG> set heading off
sys@JINGYONG> select 'ospid: ' || p.spid || ' # ''' ||s.sid||','||s.serial#||''' '||
  2    s.osuser || ' ' ||s.machine ||' '||s.username ||' '||s.program line
  3  from v$session s , v$process p
  4  where p.addr = s.paddr
  5  and s.username <> ' ';

ospid: 2529 # '30,32' Administrator WORKGROUP\JINGYONG SYS sqlplus.exe

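Note: Oracle 12c can run multithreaded, combining several processes into a single ospid, so a new column, stid, was added to v$process to identify a specific thread. To attach to that thread use the following syntax:
oradebug setospid

2. Once the operating system process ID is known, start the trace with the following statements.
Assume the operating system process ID of the process to be traced is 2529:

SQL>connect / as sysdba
sys@JINGYONG> oradebug setospid 2529
Oracle pid: 21, Unix process pid: 2529, image: oracle@jingyong
sys@JINGYONG> oradebug unlimit
Statement processed.
sys@JINGYONG> oradebug event 10046 trace name context forever,level 12
Statement processed.
sys@JINGYONG> select * from dual;

X

sys@JINGYONG> oradebug event 10046 trace name context off
Statement processed.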
sys@JINGYONG> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/jingyong/jingyong/trace/jingyong_ora_2529.trc

Note: oradebug setorapid can also be used to attach to a session.
In that case the PID (the Oracle process identifier) is used instead of the SPID:
sys@JINGYONG> oradebug setorapid 21
Oracle pid: 21, Unix process pid: 2529, image: oracle@jingyong
The output shows that oradebug setorapid 21 is equivalent to oradebug setospid 2529.
sys@JINGYONG> oradebug unlimit
Statement processed.
sys@JINGYONG> oradebug event 10046 trace name context forever,level 12
Statement processed.
sys@JINGYONG> select sysdate from dual;

11-NOV-13

sys@JINGYONG> oradebug event 10046 trace name context off
Statement processed.
sys@JINGYONG> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/jingyong/jingyong/trace/jingyong_ora_2529.trc


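The resulting trace file has a name of the form _.trc

Instance-level tracing
Note: enabling tracing at instance level affects performance, because every session is traced.
Once the trace parameter is set, every session created from then on is traced; sessions that already exist are not traced.
System-level 10046 tracing is useful when a problem session occurs but cannot be identified in advance. Tracing can then be enabled for a short time so that the problem is captured in a trace file, after which tracing is disabled and the cause is looked for in the generated trace files.

Enable system-level 10046 tracing:
alter system set events '10046 trace name context forever,level 12';
Disable system-level 10046 tracing for all sessions:
alter system set events '10046 trace name context off';

Setting it as an initialization parameter:
This enables 10046 tracing for every session after the instance is restarted.
event="10046 trace name context forever,level 12"
To disable instance-level 10046 tracing, remove the initialization parameter and restart the instance, or use the alter system statement:
alter system set events '10046 trace name context off';

Writing a logon trigger
In some cases you may want to trace the activity of a particular user's sessions. A logon trigger can do this, for example: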

CREATE OR REPLACE TRIGGER SYS.set_trace
AFTER LOGON ON DATABASE
WHEN (USER like '&USERNAME')
DECLARE
lcommand varchar(200);
BEGIN
EXECUTE IMMEDIATE 'alter session set tracefile_identifier=''From_Trigger''';
EXECUTE IMMEDIATE 'alter session set statistics_level=ALL';
EXECUTE IMMEDIATE 'alter session set max_dump_file_size=UNLIMITED';
EXECUTE IMMEDIATE 'alter session set events ''10046 trace name context forever, level 12''';
END set_trace;
/

Note: for the sessions to be traced, the user that fires the trigger must be granted the 'alter session' privilege explicitly:
grant alter session to username;

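Collecting trace information with SQLT
What is SQLTXPLAIN (SQLT)?
SQLTXPLAIN, also known as SQLT, is a tool provided by the Oracle Server Technologies Center of Expertise. Given a SQL statement as input, SQLT outputs a set of diagnostic files that are used to diagnose poorly performing SQL. SQLT connects to the database and collects execution plans, cost-based optimizer statistics, schema object metadata, performance statistics, configuration parameters and similar elements that influence SQL performance.

The XECUTE method of SQLTXPLAIN can generate a 10046 trace as part of the SQLT output.

Tracing with the dbms_monitor package
dbms_monitor is a newer tracing package. It enables tracing for diagnostics and workload management based on a specific client identifier or on a combination of service name, module name and action name. In some cases several trace files are generated (for example, when service-level tracing is enabled for a module); the trcsess utility scans all of the trace files and combines them into one. Once the set of trace files has been merged, it can be analysed with the standard trace file analysis methods.

Viewing the enabled traces
Query dba_enabled_traces to find out which traces are enabled.
For example: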

sys@JINGYONG>select trace_type, primary_id, QUALIFIER_ID1, waits, binds
             from DBA_ENABLED_TRACES;

TRACE_TYPE                   PRIMARY_ID  QUALIFIER_ID1           WAITS        BINDS
---------------------- ---------------   ------------------      --------    -------
SERVICE_MODULE         SYS$USERS        SQL*Plus                 TRUE        FALSE
CLIENT_ID              HUGO                                      TRUE        FALSE
SERVICE                v101_DGB                                  TRUE        FALSE

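Three different traces are enabled in this database:
1. The first row shows that all SQL statements issued from SQL*Plus will be traced.
2. The second row shows that all sessions with the client identifier 'HUGO' will be traced.
3. The third row shows that all processes connected to the database through the service 'v101_DGB' will be traced.

The session_trace_enable procedure
The session_trace_enable procedure enables SQL tracing for a given database session on the local instance.
The syntax is as follows:
Enable SQL tracing
dbms_monitor.session_trace_enable(session_id => x, serial_num => y,
waits=>(TRUE|FALSE),binds=>(TRUE|FALSE) );

Disable SQL tracing
dbms_monitor.session_trace_disable(session_id => x, serial_num => y);
waits defaults to TRUE and binds defaults to FALSE.

The session ID and serial number can be queried from v$session: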

SQL> select serial#, sid , username from v$session;

SERIAL#             SID  USERNAME
-------           -----  --------------
  1                 131
 18                 139
  3                 140
 11                 143     SCOTT

The following command then enables tracing for the chosen session:
SQL> execute dbms_monitor.session_trace_enable(143,11);
This trace setting is discarded when the database is restarted, and querying dba_enabled_traces shows no row for it:
sys@JINGYONG> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/jingyong/jingyong/trace/jingyong_ora_2529.trc
sys@JINGYONG> select trace_type,primary_id,qualifier_id1,waits,binds
2 from dba_enabled_traces;

no rows selected

Tracing stops when the session disconnects, or it can be disabled with:
SQL> execute dbms_monitor.session_trace_disable(143,11);
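
Because session-level tracing does not appear in dba_enabled_traces, another way to see whether a session is being traced is to look at the tracing columns of v$session. This is a sketch that assumes a release in which v$session exposes SQL_TRACE, SQL_TRACE_WAITS and SQL_TRACE_BINDS; the SID and serial number are the ones from the example above.

```sql
-- Check the tracing state of one session.
-- SQL_TRACE is ENABLED or DISABLED; the other two columns are TRUE or FALSE.
select sid,
       serial#,
       sql_trace,
       sql_trace_waits,
       sql_trace_binds
from   v$session
where  sid = 143
and    serial# = 11;
```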

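The client_id_trace_enable procedure
In a multitier environment, a request from an end client is routed by the middle tier to different database sessions, so the association between an end client and a database session is not static. Before Oracle 10g there was no way to trace one client across different database sessions. End-to-end tracing is made possible by a new attribute, client_identifier, which uniquely identifies a particular end client. The client identifier corresponds to the client_identifier column of v$session and can also be read through the system context.
The syntax is as follows:
Enable tracing
execute dbms_monitor.client_id_trace_enable ( client_id =>'client x',
waits => (TRUE|FALSE), binds => (TRUE|FALSE) );

Disable tracing
execute dbms_monitor.client_id_trace_disable ( client_id =>'client x');
waits defaults to TRUE and binds defaults to FALSE.

For example:
The client_identifier can be set with the dbms_session.set_identifier procedure: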

sys@JINGYONG> exec dbms_session.set_identifier('JY');

PL/SQL procedure successfully completed.

sys@JINGYONG> select sys_context('USERENV','CLIENT_IDENTIFIER') client_id from dual;

JY


sys@JINGYONG> select client_identifier client_id from v$session where sid=30;

JY

sys@JINGYONG> exec dbms_monitor.client_id_trace_enable('JY');

PL/SQL procedure successfully completed.

Check with a query that tracing is enabled:

sys@JINGYONG> select primary_id,qualifier_id1,waits,binds
  2  from dba_enabled_traces where trace_type='CLIENT_ID';
PRIMARY_ID         QUALIFIER_ID1         WAITS    BINDS
----------------   --------------        -------- --------
JY                                       TRUE     FALSE

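This trace remains in effect after a database restart; you have to call the procedure to disable it.
sys@JINGYONG> exec dbms_monitor.client_id_trace_disable('JY');

PL/SQL procedure successfully completed.
Check the generated trace file: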

Trace file /u01/app/oracle/diag/rdbms/jingyong/jingyong/trace/jingyong_ora_2529.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db
System name:	Linux
Node name:	jingyong
Release:	2.6.18-164.el5
Version:	#1 SMP Tue Aug 18 15:51:54 EDT 2009
Machine:	i686
Instance name: jingyong
Redo thread mounted by this instance: 1
Oracle process number: 21
Unix process pid: 2529, image: oracle@jingyong


*** 2013-11-11 11:31:56.737
*** SESSION ID:(30.32) 2013-11-11 11:31:56.737
*** CLIENT ID:() 2013-11-11 11:31:56.737
*** SERVICE NAME:(jingyong) 2013-11-11 11:31:56.737
*** MODULE NAME:(sqlplus.exe) 2013-11-11 11:31:56.737
*** ACTION NAME:() 2013-11-11 11:31:56.737

PARSING IN CURSOR #8 len=96 dep=0 uid=0 oct=3 lid=0 tim=1384150635839986 hv=3018843459 ad='275fa5ec' sqlid='3gg23wktyzta3'
select primary_id,qualifier_id1,waits,binds
from dba_enabled_traces where trace_type='CLIENT_ID'
END OF STMT

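The statement executed after tracing was enabled has been recorded in the trace file.

sys@JINGYONG> select primary_id,qualifier_id1,waits,binds
  2  from dba_enabled_traces where trace_type='CLIENT_ID';

no rows selected

With MTS (shared server), several trace files are sometimes generated, because different shared server processes can execute the SQL statements. The same applies in a RAC environment.

The serv_mod_act_trace_enable procedure
End-to-end tracing lets the workload of applications identified by MODULE, ACTION and SERVICE be managed and measured effectively. The service, module and action names provide a way to identify the important transactions in an application.
The serv_mod_act_trace_enable procedure enables SQL tracing globally for the sessions matching a given combination of service, module and action names, unless a specific instance name is given. A session's service name and module name correspond to the service_name and module columns of v$session.
The syntax is as follows:
Enable tracing
execute dbms_monitor.serv_mod_act_trace_enable('Service S', 'Module M', 'Action A',
waits => (TRUE|FALSE), binds => (TRUE|FALSE), instance_name => 'ORCL' );

Disable tracing
execute dbms_monitor.serv_mod_act_trace_disable('Service S', 'Module M', 'Action A');
waits defaults to TRUE, binds defaults to FALSE and instance_name defaults to NULL.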

For example, to trace all SQL statements executed from SQL*Plus on the database server, run the following:

sys@JINGYONG> select module,service_name from v$session where sid=25;
MODULE                                      SERVICE_NAME
-----------------------------               ---------------------
sqlplus@jingyong (TNS V1-V3)                SYS$USERS

sys@JINGYONG> exec dbms_monitor.serv_mod_act_trace_enable('SYS$USERS','sqlplus@jingyong (TNS V1-V3)');

PL/SQL procedure successfully completed.

sys@JINGYONG> select primary_id,qualifier_id1,waits,binds
  2  from dba_enabled_traces
  3  where trace_type='SERVICE_MODULE';
PRIMARY_ID       QUALIFIER_ID1                WAITS    BINDS
---------------  -------------------          -------- --------
SYS$USERS        sqlplus@jingyong (TNS V1-V3) TRUE     FALSE


With tracing enabled, run a test statement:
SQL> select 'x' from dual;

'
-
x
Check the name of the generated trace file:
SQL> select * from v$diag_info where name='Default Trace File';

   INST_ID NAME
---------- ----------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
         1 Default Trace File
/u01/app/oracle/diag/rdbms/jingyong/jingyong/trace/jingyong_ora_4411.trc

The trace file contents are as follows:

trace file /u01/app/oracle/diag/rdbms/jingyong/jingyong/trace/jingyong_ora_4411.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db
System name:	Linux
Node name:	jingyong
Release:	2.6.18-164.el5
Version:	#1 SMP Tue Aug 18 15:51:54 EDT 2009
Machine:	i686
Instance name: jingyong
Redo thread mounted by this instance: 1
Oracle process number: 24
Unix process pid: 4411, image: oracle@jingyong (TNS V1-V3)


*** 2013-11-11 14:34:00.971
*** SESSION ID:(25.412) 2013-11-11 14:34:00.972
*** CLIENT ID:() 2013-11-11 14:34:00.972
*** SERVICE NAME:(SYS$USERS) 2013-11-11 14:34:00.972
*** MODULE NAME:(sqlplus@jingyong (TNS V1-V3)) 2013-11-11 14:34:00.972
*** ACTION NAME:() 2013-11-11 14:34:00.972

WAIT #1: nam='SQL*Net message from client' ela= 152965072 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1384151640937525
CLOSE #1:c=1000,e=521,dep=0,type=0,tim=1384151640973430
=====================
PARSING IN CURSOR #1 len=20 dep=0 uid=0 oct=3 lid=0 tim=1384151640977682 hv=2740543121 ad='275fa9e4' sqlid='04vfkrajpkrnj'
select 'x' from dual

The test statement has been recorded in the trace file.

sys@JINGYONG> exec dbms_monitor.serv_mod_act_trace_disable('SYS$USERS','sqlplus@jingyong (TNS V1-V3)');

PL/SQL procedure successfully completed.

sys@JINGYONG> select primary_id,qualifier_id1,waits,binds
  2  from dba_enabled_traces
  3  where trace_type='SERVICE_MODULE';

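no rows selected

Merging trace files with trcsess
Some tracing operations produce several trace files. Before Oracle 10g these had to be merged by hand; now the trcsess utility can merge them for you.
The syntax is as follows: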
trcsess [output=] [session=] [clientid=] [service=] [action=] [module=]

output= output destination default being standard output.
session= session to be traced.
Session id is a combination of session Index & session serial number e.g. 8.13.


clientid= clientid to be traced.
service= service to be traced.
action= action to be traced.
module= module to be traced.

Space separated list of trace files with wild card '*' supported.

[oracle@jingyong trace]$ trcsess output=jingyong_ora_88888888.trc service=jingyong jingyong_ora_2529.trc jingyong_ora_4411.trc
[oracle@jingyong trace]$ ls -lrt jingyong_ora_88888888.trc
-rw-r--r-- 1 oracle oinstall 16219 Nov 11 14:59 jingyong_ora_88888888.trc

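dbms_application_info
Before a procedure starts a transaction, it can call the dbms_application_info.set* procedures to register a transaction name, client information or module name that can be used later when examining performance. You should do this for the transactions that are likely to consume the most system resources.
The dbms_application_info package includes the following procedures: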
SET_CLIENT_INFO ( client_info IN VARCHAR2 );
SET_ACTION ( action_name IN VARCHAR2 );
SET_MODULE ( module_name IN VARCHAR2, action_name IN VARCHAR2 );

For example:
sys@JINGYONG> create table emp as select * from scott.emp where 1=0;

Table created.

sys@JINGYONG> exec dbms_application_info.set_module(module_name=>'add_emp',action_name=>'insert into emp');

PL/SQL procedure successfully completed.

sys@JINGYONG> insert into emp select * from scott.emp;

14 rows created.

sys@JINGYONG> commit;

Commit complete.

sys@JINGYONG> exec dbms_application_info.set_module(null,null);

PL/SQL procedure successfully completed.
Next, query v$sqlarea using the module and action columns:
sys@JINGYONG> select sql_text from v$sqlarea where module='add_emp';

insert into emp select * from scott.emp

sys@JINGYONG> select sql_text from v$sqlarea where action='insert into emp';

insert into emp select * from scott.emp

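The values registered for the current session can be read back as follows:
set serveroutput on
declare
  l_client   varchar2(100);
  l_mod_name varchar2(100);
  l_act_name varchar2(100);
begin
  -- Read back the client info, module and action of the current session.
  dbms_application_info.read_client_info(l_client);
  dbms_application_info.read_module(l_mod_name, l_act_name);
  dbms_output.put_line(l_client);
  dbms_output.put_line(l_mod_name);
  dbms_output.put_line(l_act_name);
end;
/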
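
The registered values are also visible from other sessions while they are set, which is what makes them useful for spotting a running transaction. A minimal sketch, assuming it is run from a second session before the set_module(null,null) call resets the module name used in the example above:

```sql
-- Find live sessions that have registered the example module name.
select sid,
       serial#,
       module,
       action,
       client_info
from   v$session
where  module = 'add_emp';
```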

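The dbms_session package: traces the current session only; another session cannot be specified.
Tracing the current session:
SQL> exec dbms_session.set_sql_trace(true);
SQL> -- run the SQL to be traced
SQL> exec dbms_session.set_sql_trace(false);
dbms_session.set_sql_trace is equivalent to alter session set sql_trace; the alter session set sql_trace statement is clearly visible in the generated trace file.
With the dbms_session.session_trace_enable procedure, bind variable values can be captured as well as wait events, which is equivalent to alter session set events '10046 trace name context forever, level 12'; this can be confirmed from the generated trace file.
SQL> exec dbms_session.session_trace_enable(waits=>true,binds=>true);
SQL> -- run the SQL to be traced
SQL> exec dbms_session.session_trace_disable(); --This procedure resets the session-level SQL trace for the session from which it was called.

The dbms_support package: this method should not be used, as it is not officially supported.
The package is not installed by default; it can be created manually by running the $ORACLE_HOME/rdbms/admin/dbmssupp.sql script.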
SQL> desc dbms_support
FUNCTION MYSID RETURNS NUMBER
FUNCTION PACKAGE_VERSION RETURNS VARCHAR2
PROCEDURE START_TRACE
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
WAITS BOOLEAN IN DEFAULT
BINDS BOOLEAN IN DEFAULT
PROCEDURE START_TRACE_IN_SESSION
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
SID NUMBER IN
SERIAL NUMBER IN
WAITS BOOLEAN IN DEFAULT
BINDS BOOLEAN IN DEFAULT
PROCEDURE STOP_TRACE
PROCEDURE STOP_TRACE_IN_SESSION
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
SID NUMBER IN
SERIAL NUMBER IN
SQL> select dbms_support.package_version from dual;
PACKAGE_VERSION
--------------------------------------------------------------------------------
DBMS_SUPPORT Version 1.0 (17-Aug-1998) - Requires Oracle 7.2 - 8.0.5
SQL> select dbms_support.mysid from dual;
MYSID
----------
292
SQL> select * from v$mystat where rownum=1;
SID STATISTIC# VALUE
---------- ---------- ----------
292 0 1
Tracing the current session:
SQL> exec dbms_support.start_trace
SQL> -- run the SQL to be traced
SQL> exec dbms_support.stop_trace
Tracing another session, with wait events and bind variables (equivalent to event 10046 at level 12):
SQL> select sid,serial#,username from v$session where ...;
SQL> exec dbms_support.start_trace_in_session(sid=>sid,serial=>serial#,waits=>true,binds=>true);
SQL> exec dbms_support.stop_trace_in_session(sid=>sid,serial=>serial#);

The dbms_system package: used in 9i.
Tracing another session:
SQL> select sid,serial#,username from v$session where ...;
SQL> exec dbms_system.set_sql_trace_in_session(sid,serial#,true);
Wait a while so that the traced session does its work and its SQL operations are captured.
SQL> exec dbms_system.set_sql_trace_in_session(sid,serial#,false);
P.S. No description of the dbms_system package could be found in the official 10gR2 documentation, although the package exists in the database.
SQL> exec sys.dbms_system.SET_BOOL_PARAM_IN_SESSION(sid, serial#, 'sql_trace', TRUE);
SQL> exec sys.dbms_system.SET_BOOL_PARAM_IN_SESSION(sid, serial#, 'sql_trace', FALSE);
Setting event 10046 with dbms_system.set_ev:
SQL> select sid,serial#,username from v$session where ...;
SQL> exec dbms_system.set_ev(sid,serial#,10046,12,'');
SQL> exec dbms_system.set_ev(sid,serial#,10046,0,'');
A trace file is generated only when the last argument is ''; with any other value no error is raised, but no trace file is produced.

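3. Is a single session hanging, several sessions, or all sessions?
If one session or a few sessions are hanging, run a 10046 trace on the session, collect some errorstack dumps for it, and also generate an AWR or Statspack report while the problem is occurring.

Dumps and errorstack information are generated as follows:
To dump trace and errorstack information, use either the operating system process ID or the Oracle process ID. For example, the operating system process ID can be found from the Oracle SID: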
SELECT p.pid, p.SPID,s.SID
FROM v$process p, v$session s
WHERE s.paddr = p.addr
AND s.SID = &SID;

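SPID is the operating system process identifier.
SID is the Oracle session identifier.
PID is the Oracle process identifier.

For example, with an SPID of 1254 and a PID of 56, run the following to generate the dumps and errorstack information using the SPID: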
connect / as sysdba
ALTER SESSION SET tracefile_identifier = 'STACK_10046';
oradebug setospid 1254
oradebug unlimit

oradebug event 10046 trace name context forever,level 12

oradebug dump errorstack 3
oradebug dump errorstack 3
oradebug dump errorstack 3
oradebug tracefile_name
oradebug event 10046 trace name context off

To generate the dumps and errorstack information using the PID, run the following:
connect / as sysdba
ALTER SESSION SET tracefile_identifier = 'STACK_10046';
oradebug setorapid 56
oradebug unlimit

oradebug event 10046 trace name context forever,level 12

oradebug dump errorstack 3
oradebug dump errorstack 3
oradebug dump errorstack 3
oradebug tracefile_name
oradebug event 10046 trace name context off

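The oradebug tracefile_name command shows the name and location of the trace file; the name of the generated trace file contains the string STACK_10046.

To collect errorstack information for the current session, first find its SPID or PID with a query such as: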
SELECT p.pid, p.SPID,s.SID
FROM v$process p, v$session s
WHERE s.paddr = p.addr
AND s.audsid = userenv('SESSIONID') ;

or

SELECT p.pid, p.SPID,s.SID
FROM v$process p,v$session s
WHERE s.paddr = p.addr
AND s.SID =
(SELECT DISTINCT SID
FROM V$MYSTAT);

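If all sessions are hanging, that is, the whole database is hung, diagnose the hang as follows:
When a database hang occurs, it is very useful to collect information from the database to diagnose the root cause. The cause of a hang is often isolated and can be resolved with the diagnostic information collected. If it cannot be resolved, the information obtained can be used to keep the problem from recurring.
Solution
What information is needed to diagnose a database hang
A database hang is characterised by some processes waiting for others to complete. Usually one or more blocking processes are stuck, or are working hard but not releasing resources quickly. The following information is needed for diagnosis:
1. Hanganalyze and systemstate dumps
2. AWR/Statspack snapshots of database performance
3. An up-to-date RDA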
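
If a normal connection is still possible, a quick look at which sessions are blocked, and by whom, can complement the dumps listed above. This is only a sketch: it assumes v$session can be queried (which may not be the case in a true hang) and a release in which the view has the BLOCKING_SESSION column.

```sql
-- Sessions that are currently waiting on another session, longest wait first.
select sid,
       serial#,
       blocking_session,
       event,
       seconds_in_wait
from   v$session
where  blocking_session is not null
order  by seconds_in_wait desc;
```

A SID that appears repeatedly in blocking_session is a candidate for the errorstack and 10046 collection described earlier.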
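Hanganalyze and Systemstate Dumps
Hanganalyze and systemstate dumps provide information about the processes in the database at a specific point in time. Hanganalyze reports on all processes in the hang chain; a systemstate dump reports on all processes in the database. When looking at a potential hang you need to determine whether a process is stuck or merely running slowly, which is done by collecting these dumps at two consecutive points in time. If a process is stuck, the traces can be used for further diagnosis and may point to a solution. Hanganalyze summarises the situation, confirms whether the database is really hung or just slow, and provides a consistent snapshot; the systemstate dump shows what each process in the database is doing.
Collecting hanganalyze and systemstate dumps
Logging in
Log in with SQL*Plus as sysdba:
sqlplus '/ as sysdba'
If there is a problem connecting, a SQL*Plus 'preliminary connection' can be used from Oracle 10gR2 onward:
sqlplus -prelim '/ as sysdba'
Note: from Oracle 11.2.0.2 onward, hanganalyze produces no output under a SQL*Plus 'preliminary connection', because it requires a process state object and a session state object. If you try it, the trace file will contain: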
HANG ANALYSIS:
ERROR: Can not perform hang analysis dump without a process state object and a session state object.
( process=(nil), sess=(nil) )
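Commands for collecting hanganalyze and systemstate dumps in a non-RAC environment
Sometimes the database is just very slow rather than truly hung. It is therefore advisable to collect two hanganalyze dumps and two systemstate dumps, to determine whether the processes are still making progress or have stopped.
Hanganalyze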

sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait one minute before getting the second hanganalyze
oradebug hanganalyze 3
oradebug tracefile_name
exit

Systemstate dump

sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
oradebug dump systemstate 266
oradebug tracefile_name
exit

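Commands for collecting hanganalyze and systemstate dumps in a RAC environment
If the relevant patches have not been applied to your system, two bugs affect systemstate dumps at level 266 or 267, so collecting dumps at these levels without the patches is unwise.
The bug documents are: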
Document 11800959.8 Bug 11800959 - A SYSTEMSTATE dump with level >= 10 in RAC dumps huge BUSY GLOBAL CACHE ELEMENTS - can hang/crash instances
Document 11827088.8 Bug 11827088 - Latch 'gc element' contention, LMHB terminates the instance
With the fixes for bug 11800959 and bug 11827088 applied, the commands for collecting hanganalyze and systemstate dumps in a RAC environment are:

sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug  unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266
oradebug -g all dump systemstate 266
exit

Without the fixes for bug 11800959 and bug 11827088, the commands for collecting hanganalyze and systemstate dumps in a RAC environment are:

sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit

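In a RAC environment, a dump is created for all instances, in the trace file of each instance.
Explanation of the hanganalyze and systemstate dump levels
Hanganalyze levels
Level 3: from Oracle 11g onward, level 3 also collects a short stack for the relevant processes in the hang chain.
Systemstate dump levels
Level 258 is a fast alternative, but some lock metadata is lost.
Level 267 includes the additional buffer cache / lock metadata, and should be used with an understanding of its cost.
Other methods
How to collect a systemstate dump when you cannot connect to the system
There are generally two ways to generate a systemstate dump when the system is hung and cannot be connected to normally: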
1.alter session set events 'immediate trace name SYSTEMSTATE level 10';
2.$ sqlplus
connect sys/passwd as sysdba
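oradebug setospid
oradebug unlimit
oradebug dump systemstate 10
(Note: semicolons cannot be used in oradebug commands. If your database is older than Oracle9i you will need to use svrmgrl and connect internal.)
When using either method, make sure you disconnect between the two dumps. That way the dumps are written to separate ora_.trc files in your user_dump_dest directory.
In very severe cases it is not possible to connect with svrmgrl or sqlplus to run the necessary commands. There is still a back-door method: a debugger, such as dbx if it is available on your system, can generate the systemstate dump. The process that is attached to and dumped may be killed, so do not attach to an Oracle background process. The dbx syntax is: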
dbx -a PID (where PID = any oracle shadow process)
dbx() print ksudss(10)
...return value printed here
dbx() detach
First you need to find a shadow process:

(jy) % ps -ef |grep sqlplus
osupport  78526 154096   0 12:11:05  pts/1  0:00 sqlplus scott/tiger
osupport  94130  84332   1 12:11:20  pts/3  0:00 grep sqlplus
(jy) % ps -ef |grep 78526
osupport  28348  78526   0 12:11:05      -  0:00 oracles734 (DESCRIPTION=(LOCAL
osupport  78526 154096   0 12:11:05  pts/1  0:00 sqlplus scott/tiger
osupport  94132  84332   1 12:11:38  pts/3  0:00 grep 78526

This attaches to the shadow process with PID 28348. At the prompt, enter the ksudss(10) command and then detach:

(jy) % dbx -a 28348
Waiting to attach to process 28348 ...
Successfully attached to oracle.
warning: Directory containing oracle could not be determined.
Apply 'use' command to initialize source path.

Type 'help' for help.
reading symbolic information ...
stopped in read at 0xd016fdf0
0xd016fdf0 (read+0x114) 80410014        lwz   r2,0x14(r1)
(dbx) print ksudss(10)
2
(dbx) detach

In the user_dump_dest directory you will find a systemstate dump file named after the PID of the traced process:

(jy) % ls -lrt *28348*
-rw-r-----   1 osupport dba        46922 Oct 10 12:12 ora_28348.trc

core_28348:
total 72
-rw-r--r--   1 osupport dba        16567 Oct 10 12:12 core
drwxr-xr-x   7 osupport dba        12288 Oct 10 12:12 ../
drwxr-x---   2 osupport dba          512 Oct 10 12:12 ./

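The trace file starts with the usual header information. On Oracle 7.3.4 Parallel Server this is followed by lock information and then the systemstate dump.
On Oracle8 (with or without Parallel Server) and on Oracle 7.3.4 without Parallel Server, the systemstate information follows the header directly.
The header in the dump file looks like this: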

Dump file /oracle/mpp/734/rdbms/log/ora_28348.trc
Oracle7 Server Release 7.3.4.4.1 - Production
With the distributed, replication, parallel query, Parallel Server
and Spatial Data options
PL/SQL Release 2.3.4.4.1 - Production
ORACLE_HOME = /oracle/mpp/734
System name:    AIX
Node name:      saki
Release:        3
Version:        4
Machine:        000089914C00
Instance name: s734
Redo thread mounted by this instance: 2
Oracle process number: 0
Unix process pid: 28348, image:

ksinfy: nfytype = 0x5
ksinfy: calling scggra(&se)
scggra: SCG_PROCESS_LOCKING not defined
scggra: calling lk_group_attach()
ksinfy: returning
*** SESSION ID:(12.15) 2000.10.10.12.11.06.000
ksqcmi: get or convert
ksqcmi: get or convert
*** 2000.10.10.12.12.08.000
===================================================
SYSTEM STATE

.....

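Make sure the file contains an "end of system state" line; you can grep for it or search in vi. If it is missing, the trace file is incomplete,
probably because max_dump_file_size in the init.ora file is too small.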
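
Before taking a large dump it may be worth checking the current limit and the dump destination. A minimal sketch, assuming v$parameter is readable; on releases that use the diagnostic_dest architecture described earlier, user_dump_dest is derived from it.

```sql
-- Show the trace file size limit and where user trace files are written.
select name,
       value
from   v$parameter
where  name in ('max_dump_file_size', 'user_dump_dest');
```

For a single oradebug session, the oradebug unlimit command used throughout this article removes the size limit for the dumps that follow it.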
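For Oracle 10g and later:
In some situations the instance can be accessed without a full connection (for example in some ORA-20 cases). From Oracle 10.1.x, SQL*Plus has a new option that allows access to the instance in order to generate trace files: sqlplus -prelim / as sysdba
For example: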

export ORACLE_SID=PROD                                 ## Replace PROD with the SID you want to trace
sqlplus -prelim '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 10

On a RAC system, hanganalyze, systemstate dumps and some other RAC information can be collected with the racdiag.sql script:

-- NAME: RACDIAG.SQL
-- SYS OR INTERNAL USER, CATPARR.SQL ALREADY RUN, PARALLEL QUERY OPTION ON
-- ------------------------------------------------------------------------
-- AUTHOR:
-- Michael Polaski - Oracle Support Services
-- Copyright 2002, Oracle Corporation
-- ------------------------------------------------------------------------
-- PURPOSE:
-- This script is intended to provide a user friendly guide to troubleshoot
-- RAC hung sessions or slow performance scenarios. The script includes
-- information to gather a variety of important debug information to determine
-- the cause of a RAC session level hang. The script will create a file
-- called racdiag_.out in your local directory while dumping hang analyze
-- dumps in the user_dump_dest(s) and background_dump_dest(s) on all nodes.
--
-- ------------------------------------------------------------------------
-- DISCLAIMER:
-- This script is provided for educational purposes only. It is NOT
-- supported by Oracle World Wide Technical Support.
-- The script has been tested and appears to work as intended.
-- You should always run new scripts on a test instance initially.
-- ------------------------------------------------------------------------
-- Script output is as follows:

set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,'Mondd_hhmi') timecol,
'.out' spool_extension from sys.dual;
column output new_value dbname
select value || '_' output
from v$parameter where name = 'db_name';
spool racdiag_&&dbname&&timestamp&&suffix
set lines 200
set pagesize 35
set trim on
set trims on
alter session set nls_date_format = 'MON-DD-YYYY HH24:MI:SS';
alter session set timed_statistics = true;
set feedback on
select to_char(sysdate) time from dual;

set numwidth 5
column host_name format a20 tru
select inst_id, instance_name, host_name, version, status, startup_time
from gv$instance
order by inst_id;

set echo on

-- WAIT CHAINS
-- 11.x+ Only (This will not work in < v11)
-- See Note 1428210.1 for instructions on interpreting.
set pages 1000
set lines 120
set heading off
column w_proc format a50 tru
column instance format a20 tru
column inst format a28 tru
column wait_event format a50 tru
column p1 format a16 tru
column p2 format a16 tru
column p3 format a15 tru
column Seconds format a50 tru
column sincelw format a50 tru
column blocker_proc format a50 tru
column waiters format a50 tru
column chain_signature format a100 wra
column blocker_chain format a100 wra
SELECT *
FROM (SELECT 'Current Process: '||osid W_PROC, 'SID '||i.instance_name INSTANCE,
'INST #: '||instance INST,'Blocking Process: '||decode(blocker_osid,null,'',blocker_osid)||
' from Instance '||blocker_instance BLOCKER_PROC,'Number of waiters: '||num_waiters waiters,
'Wait Event: ' ||wait_event_text wait_event, 'P1: '||p1 p1, 'P2: '||p2 p2, 'P3: '||p3 p3,
'Seconds in Wait: '||in_wait_secs Seconds, 'Seconds Since Last Wait: '||time_since_last_wait_secs sincelw,
'Wait Chain: '||chain_id ||': '||chain_signature chain_signature,'Blocking Wait Chain: '||decode(blocker_chain_id,null,
'',blocker_chain_id) blocker_chain
FROM v$wait_chains wc,
v$instance i
WHERE wc.instance = i.instance_number (+)
AND ( num_waiters > 0
OR ( blocker_osid IS NOT NULL
AND in_wait_secs > 10 ) )
ORDER BY chain_id,
num_waiters DESC)
WHERE ROWNUM < 101;

-- Taking Hang Analyze dumps
-- This may take a little while...
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
-- This part may take the longest, you can monitor bdump or udump to see if
-- the file is being generated.
oradebug -g all dump systemstate 258

-- WAITING SESSIONS:
-- The entries that are shown at the top are the sessions that have
-- waited the longest amount of time that are waiting for non-idle wait
-- events (event column). You can research and find out what the wait
-- event indicates (along with its parameters) by checking the Oracle
-- Server Reference Manual or look for any known issues or documentation
-- by searching Metalink for the event name in the search bar. Example
-- (include single quotes): [ 'buffer busy due to global cache' ].
-- Metalink and/or the Server Reference Manual should return some useful
-- information on each type of wait event. The inst_id column shows the
-- instance where the session resides and the SID is the unique identifier
-- for the session (gv$session). The p1, p2, and p3 columns will show
-- event specific information that may be important to debug the problem.
-- To find out what the p1, p2, and p3 indicates see the next section.
-- Items with wait_time of anything other than 0 indicate we do not know
-- how long these sessions have been waiting.
--
set numwidth 15
set heading on
column state format a7 tru
column event format a25 tru
column last_sql format a40 tru
select sw.inst_id, sw.sid, sw.state, sw.event, sw.seconds_in_wait seconds,
sw.p1, sw.p2, sw.p3, sa.sql_text last_sql
from gv$session_wait sw, gv$session s, gv$sqlarea sa
where sw.event not in
('rdbms ipc message','smon timer','pmon timer',
'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
and sw.seconds_in_wait > 0
and (sw.inst_id = s.inst_id and sw.sid = s.sid)
and (s.inst_id = sa.inst_id and s.sql_address = sa.address)
order by seconds desc;

-- EVENT PARAMETER LOOKUP:
-- This section will give a description of the parameter names of the
-- events seen in the last section. p1text is the parameter value for
-- p1 in the WAITING SESSIONS section while p2text is the parameter
-- value for p2 and p3text is the parameter value for p3. The
-- parameter values in the first section can be helpful for debugging
-- the wait event.
--
column event format a30 tru
column p1text format a25 tru
column p2text format a25 tru
column p3text format a25 tru
select distinct event, p1text, p2text, p3text
from gv$session_wait sw
where sw.event not in ('rdbms ipc message','smon timer','pmon timer',
'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
and seconds_in_wait > 0
order by event;

-- GES LOCK BLOCKERS:
-- This section will show us any sessions that are holding locks that
-- are blocking other users. The inst_id will show us the instance that
-- the session resides on while the sid will be a unique identifier for
-- the session. The grant_level will show us how the GES lock is granted to
-- the user. The request_level will show us what status we are trying to
-- obtain.  The lockstate column will show us what status the lock is in.
-- The last column shows how long this session has been waiting.
--
set numwidth 5
column state format a16 tru;
column event format a30 tru;
select dl.inst_id, s.sid, p.spid, dl.resource_name1,
decode(substr(dl.grant_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',dl.grant_level) as grant_level,
decode(substr(dl.request_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',request_level) as request_level,
decode(substr(dl.state,1,8),'KJUSERGR','Granted','KJUSEROP','Opening',
'KJUSERCA','Canceling','KJUSERCV','Converting') as state,
s.sid, sw.event, sw.seconds_in_wait sec
from gv$ges_enqueue dl, gv$process p, gv$session s, gv$session_wait sw
where blocker = 1
and (dl.inst_id = p.inst_id and dl.pid = p.spid)
and (p.inst_id = s.inst_id and p.addr = s.paddr)
and (s.inst_id = sw.inst_id and s.sid = sw.sid)
order by sw.seconds_in_wait desc;

-- GES LOCK WAITERS:
-- This section will show us any sessions that are waiting for locks that
-- are blocked by other users. The inst_id will show us the instance that
-- the session resides on while the sid will be a unique identifier for
-- the session. The grant_level will show us how the GES lock is granted to
-- the user. The request_level will show us what status we are trying to
-- obtain.  The lockstate column will show us what status the lock is in.
-- The last column shows how long this session has been waiting.
--
set numwidth 5
column state format a16 tru;
column event format a30 tru;
select dl.inst_id, s.sid, p.spid, dl.resource_name1,
decode(substr(dl.grant_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',dl.grant_level) as grant_level,
decode(substr(dl.request_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',request_level) as request_level,
decode(substr(dl.state,1,8),'KJUSERGR','Granted','KJUSEROP','Opening',
'KJUSERCA','Cancelling','KJUSERCV','Converting') as state,
s.sid, sw.event, sw.seconds_in_wait sec
from gv$ges_enqueue dl, gv$process p, gv$session s, gv$session_wait sw
where blocked = 1
and (dl.inst_id = p.inst_id and dl.pid = p.spid)
and (p.inst_id = s.inst_id and p.addr = s.paddr)
and (s.inst_id = sw.inst_id and s.sid = sw.sid)
order by sw.seconds_in_wait desc;

-- LOCAL ENQUEUES:
-- This section will show us if there are any local enqueues. The inst_id will
-- show us the instance that the session resides on while the sid will be a
-- unique identifier for the session. The addr column will show the lock address. The type
-- will show the lock type. The id1 and id2 columns will show specific
-- parameters for the lock type.
--
set numwidth 12
column event format a12 tru
select l.inst_id, l.sid, l.addr, l.type, l.id1, l.id2,
decode(l.block,0,'blocked',1,'blocking',2,'global') block,
sw.event, sw.seconds_in_wait sec
from gv$lock l, gv$session_wait sw
where (l.sid = sw.sid and l.inst_id = sw.inst_id)
and l.block in (0,1)
order by l.type, l.inst_id, l.sid;

-- LATCH HOLDERS:
-- If there is latch contention or 'latch free' wait events in the WAITING
-- SESSIONS section we will need to find out which processes are holding
-- latches. The inst_id will show us the instance that the session resides
-- on while the sid will be a unique identifier for the session. The username column
-- will show the session's username. The os_user column will show the os
-- user that the user logged in as. The name column will show us the type
-- of latch being waited on. You can search Metalink for the latch name in
-- the search bar. Example (include single quotes):
-- [ 'library cache' latch ]. Metalink should return some useful information
-- on the type of latch.
--
set numwidth 5
select distinct lh.inst_id, s.sid, s.username, p.username os_user, lh.name
from gv$latchholder lh, gv$session s, gv$process p
where (lh.sid = s.sid and lh.inst_id = s.inst_id)
and (s.inst_id = p.inst_id and s.paddr = p.addr)
order by lh.inst_id, s.sid;

-- LATCH STATS:
-- This view will show us latches with less than optimal hit ratios
-- The inst_id will show us the instance for the particular latch. The
-- latch_name column will show us the type of latch. You can search Metalink
-- for the latch name in the search bar. Example (include single quotes):
-- [ 'library cache' latch ]. Metalink should return some useful information
-- on the type of latch. The hit_ratio shows the percentage of time we
-- successfully acquired the latch.
--
column latch_name format a30 tru
select inst_id, name latch_name,
round((gets-misses)/decode(gets,0,1,gets),3) hit_ratio,
round(sleeps/decode(misses,0,1,misses),3) "SLEEPS/MISS"
from gv$latch
where round((gets-misses)/decode(gets,0,1,gets),3) < .99
and gets != 0
order by round((gets-misses)/decode(gets,0,1,gets),3);

-- No Wait Latches:
--
select inst_id, name latch_name,
round((immediate_gets/(immediate_gets+immediate_misses)), 3) hit_ratio,
round(sleeps/decode(immediate_misses,0,1,immediate_misses),3) "SLEEPS/MISS"
from gv$latch
where round((immediate_gets/(immediate_gets+immediate_misses)), 3) < .99  and immediate_gets + immediate_misses > 0
order by round((immediate_gets/(immediate_gets+immediate_misses)), 3);

-- GLOBAL CACHE CR PERFORMANCE
-- This shows the average latency of a consistent block request.
-- AVG CR BLOCK RECEIVE TIME, which should typically be about 15
-- milliseconds depending on your system configuration and volume, is the
-- average latency of a consistent-read request round-trip from the
-- requesting instance to the holding instance and back to the requesting
-- instance. If your CPU has limited idle time and your system typically
-- processes long-running queries, then the latency may be higher. However,
-- it is possible to have an average latency of less than one millisecond
-- with User-mode IPC. Latency can be influenced by a high value for the
-- DB_FILE_MULTIBLOCK_READ_COUNT parameter. This is because a requesting
-- process can issue more than one request for a block depending on the
-- setting of this parameter. Correspondingly, the requesting process may
-- wait longer. Also check interconnect bandwidth, OS tcp settings, and OS
-- udp settings if AVG CR BLOCK RECEIVE TIME is high.
--
set numwidth 20
column "AVG CR BLOCK RECEIVE TIME (ms)" format 9999999.9
select b1.inst_id, b2.value "GCS CR BLOCKS RECEIVED",
b1.value "GCS CR BLOCK RECEIVE TIME",
((b1.value / b2.value) * 10) "AVG CR BLOCK RECEIVE TIME (ms)"
from gv$sysstat b1, gv$sysstat b2
where b1.name = 'global cache cr block receive time' and
b2.name = 'global cache cr blocks received' and b1.inst_id = b2.inst_id
or b1.name = 'gc cr block receive time' and
b2.name = 'gc cr blocks received' and b1.inst_id = b2.inst_id ;

-- GLOBAL CACHE LOCK PERFORMANCE
-- This shows the average global enqueue get time.
-- Typically AVG GLOBAL LOCK GET TIME should be 20-30 milliseconds. The
-- elapsed time for a get includes the allocation and initialization of a
-- new global enqueue. If the average global enqueue get (global cache
-- get time) or average global enqueue conversion times are excessive,
-- then your system may be experiencing timeouts. See the 'WAITING SESSIONS',
-- 'GES LOCK BLOCKERS', 'GES LOCK WAITERS', and 'TOP 10 WAIT EVENTS ON SYSTEM'
-- sections if the AVG GLOBAL LOCK GET TIME is high.
--
set numwidth 20
column "AVG GLOBAL LOCK GET TIME (ms)" format 9999999.9
select b1.inst_id, (b1.value + b2.value) "GLOBAL LOCK GETS",
b3.value "GLOBAL LOCK GET TIME",
(b3.value / (b1.value + b2.value) * 10) "AVG GLOBAL LOCK GET TIME (ms)"
from gv$sysstat b1, gv$sysstat b2, gv$sysstat b3
where b1.name = 'global lock sync gets' and
b2.name = 'global lock async gets' and b3.name = 'global lock get time'
and b1.inst_id = b2.inst_id and b2.inst_id = b3.inst_id
or b1.name = 'global enqueue gets sync' and
b2.name = 'global enqueue gets async' and b3.name = 'global enqueue get time'
and b1.inst_id = b2.inst_id and b2.inst_id = b3.inst_id;

-- RESOURCE USAGE
-- This section will show how much of our resources we have used.
--
set numwidth 8
select inst_id, resource_name, current_utilization, max_utilization,
initial_allocation
from gv$resource_limit
where max_utilization > 0
order by inst_id, resource_name;

-- DLM TRAFFIC INFORMATION
-- This section shows how many tickets are available in the DLM. If the
-- TCKT_WAIT columns says "YES" then we have run out of DLM tickets which
-- could cause a DLM hang. Make sure that you also have enough TCKT_AVAIL.
--
set numwidth 10
select * from gv$dlm_traffic_controller
order by TCKT_AVAIL;

-- DLM MISC
--
set numwidth 10
select * from gv$dlm_misc;

-- LOCK CONVERSION DETAIL:
-- This view shows the types of lock conversion being done on each instance.
--
select * from gv$lock_activity;

-- INITIALIZATION PARAMETERS:
-- Non-default init parameters for each node.
--
set numwidth 5
column name format a30 tru
column value format a50 wra
column description format a60 tru
select inst_id, name, value, description
from gv$parameter
where isdefault = 'FALSE'
order by inst_id, name;

-- TOP 10 WAIT EVENTS ON SYSTEM
-- This view will provide a summary of the top wait events in the db.
--
set numwidth 10
column event format a25 tru
select inst_id, event, time_waited, total_waits, total_timeouts
from (select inst_id, event, time_waited, total_waits, total_timeouts
from gv$system_event where event not in ('rdbms ipc message','smon timer',
'pmon timer', 'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
order by time_waited desc)
where rownum < 11
order by time_waited desc;

-- SESSION/PROCESS REFERENCE:
-- This section is very important for most of the above sections to find out
-- which user/os_user/process is identified to which session/process.
--
set numwidth 7
column event format a30 tru
column program format a25 tru
column username format a15 tru
select p.inst_id, s.sid, s.serial#, p.pid, p.spid, p.program, s.username,
p.username os_user, sw.event, sw.seconds_in_wait sec
from gv$process p, gv$session s, gv$session_wait sw
where (p.inst_id = s.inst_id and p.addr = s.paddr)
and (s.inst_id = sw.inst_id and s.sid = sw.sid)
order by p.inst_id, s.sid;

-- SYSTEM STATISTICS:
-- All System Stats with values of > 0. These can be referenced in the
-- Server Reference Manual
--
set numwidth 5
column name format a60 tru
column value format 9999999999999999999999999
select inst_id, name, value
from gv$sysstat
where value > 0
order by inst_id, name;

-- CURRENT SQL FOR WAITING SESSIONS:
-- Current SQL for any session in the WAITING SESSIONS list
--
set numwidth 5
column sql format a80 wra
select sw.inst_id, sw.sid, sw.seconds_in_wait sec, sa.sql_text sql
from gv$session_wait sw, gv$session s, gv$sqlarea sa
where sw.sid = s.sid (+)
and sw.inst_id = s.inst_id (+)
and s.sql_address = sa.address
and sw.event not in ('rdbms ipc message','smon timer','pmon timer',
'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
and sw.seconds_in_wait > 0
order by sw.seconds_in_wait desc;

-- WAIT CHAINS
-- 11.x+ Only (This will not work in < v11)
-- See Note 1428210.1 for instructions on interpreting.
set pages 1000
set lines 120
set heading off
column w_proc format a50 tru
column instance format a20 tru
column inst format a28 tru
column wait_event format a50 tru
column p1 format a16 tru
column p2 format a16 tru
column p3 format a15 tru
column seconds format a50 tru
column sincelw format a50 tru
column blocker_proc format a50 tru
column waiters format a50 tru
column chain_signature format a100 wra
column blocker_chain format a100 wra
SELECT *
FROM (SELECT 'Current Process: '||osid W_PROC, 'SID '||i.instance_name INSTANCE,
'INST #: '||instance INST,'Blocking Process: '||decode(blocker_osid,null,'',blocker_osid)||
' from Instance '||blocker_instance BLOCKER_PROC,'Number of waiters: '||num_waiters waiters,
'Wait Event: ' ||wait_event_text wait_event, 'P1: '||p1 p1, 'P2: '||p2 p2, 'P3: '||p3 p3,
'Seconds in Wait: '||in_wait_secs Seconds, 'Seconds Since Last Wait: '||time_since_last_wait_secs sincelw,
'Wait Chain: '||chain_id ||': '||chain_signature chain_signature,'Blocking Wait Chain: '||decode(blocker_chain_id,null,
'',blocker_chain_id) blocker_chain
FROM v$wait_chains wc,
v$instance i
WHERE wc.instance = i.instance_number (+)
AND ( num_waiters > 0
OR ( blocker_osid IS NOT NULL
AND in_wait_secs > 10 ) )
ORDER BY chain_id,
num_waiters DESC)
WHERE ROWNUM < 101;

-- Taking Hang Analyze dumps
-- This may take a little while...
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
-- This part may take the longest, you can monitor bdump or udump to see
-- if the file is being generated.
oradebug -g all dump systemstate 258

set echo off

select to_char(sysdate) time from dual;

spool off

-- ---------------------------------------------------------------------------
Prompt;
Prompt racdiag output files have been written to:;
Prompt;
host pwd
Prompt alert log and trace files are located in:;
column host_name format a12 tru
column name format a20 tru
column value format a60 tru
select distinct i.host_name, p.name, p.value
from gv$instance i, gv$parameter p
where p.inst_id = i.inst_id (+)
and p.name like '%_dump_dest'
and p.name != 'core_dump_dest';
v$wait_chains

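Starting with Oracle 11gR1, the DIA0 background process collects hang analysis information and stores it in memory in the "hang analysis cache". It collects local hang analysis every 3 seconds and global (RAC) hang analysis every 10 seconds. When a hang occurs, this information provides a quick way to view the hang chains.
The data stored in the "hang analysis cache" is very useful for diagnosing database contention and hangs.
Many database features make use of the hang analysis cache: Hang Management, Resource Manager Idle Blocker Kill,
SQL Tune Hang Avoidance, and PMON cleanup, as well as external tools such as Procwatcher.
The v$wait_chains view in Oracle 11gR2 is described below: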

SQL> desc v$wait_chains
  Name                                      Null     Type
  ----------------------------------------- -------- ----------------------
  CHAIN_ID                                           NUMBER
  CHAIN_IS_CYCLE                                     VARCHAR2(5)
  CHAIN_SIGNATURE                                    VARCHAR2(801)
  CHAIN_SIGNATURE_HASH                               NUMBER
  INSTANCE                                           NUMBER
  OSID                                               VARCHAR2(25)
  PID                                                NUMBER
  SID                                                NUMBER
  SESS_SERIAL#                                       NUMBER
  BLOCKER_IS_VALID                                   VARCHAR2(5)
  BLOCKER_INSTANCE                                   NUMBER
  BLOCKER_OSID                                       VARCHAR2(25)
  BLOCKER_PID                                        NUMBER
  BLOCKER_SID                                        NUMBER
  BLOCKER_SESS_SERIAL#                               NUMBER
  BLOCKER_CHAIN_ID                                   NUMBER
  IN_WAIT                                            VARCHAR2(5)
  TIME_SINCE_LAST_WAIT_SECS                          NUMBER
  WAIT_ID                                            NUMBER
  WAIT_EVENT                                         NUMBER
  WAIT_EVENT_TEXT                                    VARCHAR2(64)
  P1                                                 NUMBER
  P1_TEXT                                            VARCHAR2(64)
  P2                                                 NUMBER
  P2_TEXT                                            VARCHAR2(64)
  P3                                                 NUMBER
  P3_TEXT                                            VARCHAR2(64)
  IN_WAIT_SECS                                       NUMBER
  TIME_REMAINING_SECS                                NUMBER
  NUM_WAITERS                                        NUMBER
  ROW_WAIT_OBJ#                                      NUMBER
  ROW_WAIT_FILE#                                     NUMBER
  ROW_WAIT_BLOCK#                                    NUMBER
  ROW_WAIT_ROW#                                      NUMBER

Note: v$wait_chains behaves like a gv$ view and may report on multiple instances in a RAC environment.
Querying basic information with SQL

SQL> SELECT chain_id, num_waiters, in_wait_secs, osid, blocker_osid, substr(wait_event_text,1,30)
  2  FROM v$wait_chains;

  CHAIN_ID NUM_WAITERS IN_WAIT_SECS OSID                      BLOCKER_OSID              SUBSTR(WAIT_EVENT_TEXT,1,30)
---------- ----------- ------------ ------------------------- ------------------------- ------------------------------
         1           0        10198 21045                     21044                     enq: TX - row lock contention
         1           1        10214 21044                                               SQL*Net message from client

Querying the top 100 wait chain processes

 set pages 1000
 set lines 120
 set heading off
 column w_proc format a50 tru
 column instance format a20 tru
 column inst format a28 tru
 column wait_event format a50 tru
 column p1 format a16 tru
 column p2 format a16 tru
 column p3 format a15 tru
 column Seconds format a50 tru
 column sincelw format a50 tru
 column blocker_proc format a50 tru
 column waiters format a50 tru
 column chain_signature format a100 wra
 column blocker_chain format a100 wra

 SELECT *
 FROM (SELECT 'Current Process: '||osid W_PROC, 'SID '||i.instance_name INSTANCE,
 'INST #: '||instance INST,'Blocking Process: '||decode(blocker_osid,null,'',blocker_osid)||
 ' from Instance '||blocker_instance BLOCKER_PROC,'Number of waiters: '||num_waiters waiters,
 'Wait Event: ' ||wait_event_text wait_event, 'P1: '||p1 p1, 'P2: '||p2 p2, 'P3: '||p3 p3,
 'Seconds in Wait: '||in_wait_secs Seconds, 'Seconds Since Last Wait: '||time_since_last_wait_secs sincelw,
 'Wait Chain: '||chain_id ||': '||chain_signature chain_signature,'Blocking Wait Chain: '||decode(blocker_chain_id,null,
 '',blocker_chain_id) blocker_chain
 FROM v$wait_chains wc,
 v$instance i
 WHERE wc.instance = i.instance_number (+)
 AND ( num_waiters > 0
 OR ( blocker_osid IS NOT NULL
 AND in_wait_secs > 10 ) )
 ORDER BY chain_id,
 num_waiters DESC)
 WHERE ROWNUM < 101;


Current Process:21549                                   SID RAC1                 INST #: 1
Blocking Process: from Instance                   Number of waiters:1
Wait Event:SQL*Net message from client                  P1: 1650815232  P2: 1     P3:0
Seconds in Wait:36                                      Seconds Since Last Wait:
Wait Chain:1 : 'SQL*Net message from client '< ='enq: TX - row lock contention'
Blocking Wait Chain:

Current Process:25627                                   SID RAC1                 INST #: 1
Blocking Process:21549 from Instance 1                  Number of waiters:0
Wait Event:enq: TX - row lock contention                P1:1415053318 P2: 524316 P3:50784
Seconds in Wait:22                                      Seconds Since Last Wait:
Wait Chain:1 : 'SQL*Net message from client '< ='enq: TX - row lock contention'
Blocking Wait Chain:

ospid 25627 is waiting on a TX lock and is blocked by ospid 21549.
ospid 21549 is idle, waiting on 'SQL*Net message from client'.
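The reading of the output above (follow each blocker until a process with no blocker is reached) can be sketched in Python. This is only an illustration: the row layout and the `root_blocker` helper are hypothetical, the data is copied from the sample output rather than queried, and it assumes a single instance so that the OS process id alone identifies a process. On RAC the key would need to include the instance number.

```python
# Hypothetical rows shaped like (osid, blocker_osid, wait_event_text),
# taken from the sample output above; None means "no blocker".
rows = [
    ("25627", "21549", "enq: TX - row lock contention"),
    ("21549", None, "SQL*Net message from client"),
]


def root_blocker(osid, blocker_of):
    """Follow blocker links from osid until a process with no blocker is reached."""
    seen = set()
    current = osid
    while blocker_of.get(current) is not None and current not in seen:
        seen.add(current)  # stop if the chain loops back on itself
        current = blocker_of[current]
    return current


blocker_of = {osid: blocker for osid, blocker, _ in rows}
event_of = {osid: event for osid, _, event in rows}

for osid in blocker_of:
    root = root_blocker(osid, blocker_of)
    if root != osid:
        print(f"ospid {osid} ({event_of[osid]}) is ultimately blocked by "
              f"ospid {root} ({event_of[root]})")
```

A root blocker that is itself idle on 'SQL*Net message from client', as here, usually points at an application session holding a lock without committing.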
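Final blocking session in Oracle 11gR2
In Oracle 11gR2, v$session.final_blocking_session can be used to identify the final blocker. The final blocking session/process sits at the top of the wait chain.
These sessions/processes are likely to be the cause of the problem.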

set pages 1000
set lines 120
set heading off
column w_proc format a50 tru
column instance format a20 tru
column inst format a28 tru
column wait_event format a50 tru
column p1 format a16 tru
column p2 format a16 tru
column p3 format a15 tru
column Seconds format a50 tru
column sincelw format a50 tru
column blocker_proc format a50 tru
column fblocker_proc format a50 tru
column waiters format a50 tru
column chain_signature format a100 wra
column blocker_chain format a100 wra

SELECT *
FROM (SELECT 'Current Process: '||osid W_PROC, 'SID '||i.instance_name INSTANCE,
 'INST #: '||instance INST,'Blocking Process: '||decode(blocker_osid,null,'',blocker_osid)||
 ' from Instance '||blocker_instance BLOCKER_PROC,
 'Number of waiters: '||num_waiters waiters,
 'Final Blocking Process: '||decode(p.spid,null,'',
 p.spid)||' from Instance '||s.final_blocking_instance FBLOCKER_PROC,
 'Program: '||p.program image,
 'Wait Event: ' ||wait_event_text wait_event, 'P1: '||wc.p1 p1, 'P2: '||wc.p2 p2, 'P3: '||wc.p3 p3,
 'Seconds in Wait: '||in_wait_secs Seconds, 'Seconds Since Last Wait: '||time_since_last_wait_secs sincelw,
 'Wait Chain: '||chain_id ||': '||chain_signature chain_signature,'Blocking Wait Chain: '||decode(blocker_chain_id,null,
 '',blocker_chain_id) blocker_chain
FROM v$wait_chains wc,
 gv$session s,
 gv$session bs,
 gv$instance i,
 gv$process p
WHERE wc.instance = i.instance_number (+)
 AND (wc.instance = s.inst_id (+) and wc.sid = s.sid (+)
 and wc.sess_serial# = s.serial# (+))
 AND (s.final_blocking_instance = bs.inst_id (+) and s.final_blocking_session = bs.sid (+))
 AND (bs.inst_id = p.inst_id (+) and bs.paddr = p.addr (+))
 AND ( num_waiters > 0
 OR ( blocker_osid IS NOT NULL
 AND in_wait_secs > 10 ) )
ORDER BY chain_id,
 num_waiters DESC)
WHERE ROWNUM < 101;



Current Process:2309                                    SID RAC1                 INST #: 1
Blocking Process: from Instance                   Number of waiters:2
Wait Event:SQL*Net message from client                  P1: 1650815232  P2: 1     P3:0
Seconds in Wait:157                                     Seconds Since Last Wait:
Wait Chain:1 : 'SQL*Net message from client '< ='enq: TM - contention'<='enq: TM - contention'
Blocking Wait Chain:

Current Process:2395                                    SID RAC1                 INST #: 1
Blocking Process:2309 from Instance 1                   Number of waiters:0
Final Blocking Process:2309 from Instance 1             Program: oracle@racdbe1.us.oracle.com (TNS V1-V3)
Wait Event:enq: TX - contention                         P1:1415053318 P2: 524316 P3:50784
Seconds in Wait:139                                      Seconds Since Last Wait:
Wait Chain:1 : 'SQL*Net message from client '< ='enq: TM - contention'<='enq: TM - contention'
Blocking Wait Chain:

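B. Generate an AWR/Statspack snapshot of database performance
C. Collect the latest RDA
A current RDA provides a large amount of additional information about database configuration and performance metrics, which can be used to detect background process issues and hot spots that may affect performance.
Sometimes the database is not actually hung but is merely 'spinning' on CPU. The following approach can be used to check whether the server is hanging or spinning. If an operation runs longer than expected, or degrades the performance of other operations, the best starting point is the v$session_wait view. This view shows what each session in the system is currently waiting on. The following script can be used: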

column sid format 990
column seq# format 99990
column wait_time heading 'WTime' format 99990
column event format a30
column p1 format 9999999990
column p2 format 9999999990
column p3 format 9990
select sid,event,seq#,p1,p2,p3,wait_time
from V$session_wait
order by sid
/

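The query above should be run at least three times and the results compared with one another.
Column meanings:
sid -- the system identifier of the session
seq# -- the sequence number. It increases each time a particular session waits on a new event, so it tells you whether the session is making progress
event -- the operation the session is waiting on or last waited on
p1, p2, p3 -- event-specific wait parameters
wait_time -- 0 indicates that the session is currently waiting on the event. A non-zero value indicates that this is the last event the session waited on and that the session is now using CPU
For example: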

 SID EVENT                            SEQ#          P1          P2    P3  WTime
---- ------------------------------ ------ ----------- ----------- ----- ------
   1 pmon timer                        335         300           0     0      0
   2 rdbms ipc message                 779         300           0     0      0
   6 smon timer                         74         300           0     0      0
   9 Null event                        347           0         300     0      0
  16 SQL*Net message from client      1064  1650815315           1     0     -1
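
The comparison of repeated runs described above can be sketched as follows. This is a minimal illustration, assuming each run of the query has already been loaded into a dictionary keyed by sid; the `static_sessions` helper and the sample values are hypothetical and not part of the original script.

```python
# Each sample maps sid -> (event, seq#), as one run of the v$session_wait
# query would return. The values below are made up for illustration.
samples = [
    {16: ("SQL*Net message from client", 1064), 21: ("enq: TX - row lock contention", 57)},
    {16: ("SQL*Net message from client", 1070), 21: ("enq: TX - row lock contention", 57)},
    {16: ("SQL*Net message from client", 1081), 21: ("enq: TX - row lock contention", 57)},
]


def static_sessions(samples):
    """Return the sids whose (event, seq#) pair is identical in every sample."""
    common = set(samples[0])
    for sample in samples[1:]:
        common &= set(sample)  # only sessions present in all samples
    return sorted(
        sid for sid in common
        if all(sample[sid] == samples[0][sid] for sample in samples)
    )


print(static_sessions(samples))
```

A session reported here is only a candidate: whether it is hung or spinning still depends on its CPU usage, as discussed in the sections that follow.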

If the query results show a session waiting on an enqueue wait event, you will need to check the lock information associated with the hung session:
column sid format 990
column type format a2
column id1 format 9999999990
column id2 format 9999999990
column Lmode format 990
column request format 990
select * from v$lock
/
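When reading the v$lock output, a waiter is a row with REQUEST > 0, and its blockers are the rows holding the same resource (same TYPE, ID1 and ID2) with LMODE > 0. A rough Python sketch of that matching, using made-up rows and a hypothetical `blockers_by_waiter` helper:

```python
from collections import defaultdict

# Hypothetical v$lock rows: (sid, type, id1, id2, lmode, request)
locks = [
    (12, "TX", 524316, 50784, 6, 0),
    (21, "TX", 524316, 50784, 0, 6),
    (21, "TM", 73201, 0, 3, 0),
]


def blockers_by_waiter(locks):
    """Pair each waiting sid (request > 0) with the sids holding the same resource."""
    holders = defaultdict(list)
    for sid, ltype, id1, id2, lmode, request in locks:
        if lmode > 0:
            holders[(ltype, id1, id2)].append(sid)
    result = {}
    for sid, ltype, id1, id2, lmode, request in locks:
        if request > 0:
            result[sid] = [h for h in holders[(ltype, id1, id2)] if h != sid]
    return result


print(blockers_by_waiter(locks))
```

The sketch ignores lock mode compatibility and RAC (no inst_id), so it only narrows down which sessions to look at.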
Spinning
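In a spin, the event is generally static and the session is not waiting on an event; it is waiting for CPU instead. (Note that in rare cases the event may not be static, depending on the code that is spinning.) A spinning session makes heavy use of CPU and memory resources.
For a spin it is important to determine which code the session is spinning in. The event gives some indication, but it is usually necessary to dump the error stack of the process several times for analysis: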
connect sys/sys as sysdba
oradebug setospid <spid>
oradebug unlimit
oradebug dump errorstack 3
oradebug dump errorstack 3
oradebug dump errorstack 3
Here spid is the operating system process identifier, which can be obtained from the v$process view.
Hanging
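Under normal conditions the values in v$session_wait keep changing as each session performs different operations.
In a hang, all events for a particular session or group of sessions remain static and the processes consume no CPU or memory resources. Given that the session is not currently acquiring any resources, this is called a hang.
In this case, a system state dump of the instance can be taken to obtain more detailed and useful information.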
ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL XX';
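For Oracle 9.2.0.6, Oracle 10.1.0.4, or later 10g releases, XX is 266. Running the command above generates a system state trace file in your user_dump_dest directory.
The process ID of the problem process can be obtained with the following query: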
SELECT pid FROM v$process
WHERE addr =
(SELECT paddr FROM v$session
WHERE sid = sid_of_problem_session);
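The system state dump file contains information about every process. Search for 'PROCESS ' to find the details of each process, and search for 'waiting for' to find the event each one is currently waiting on.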
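
The two searches can be combined in a small script. The following is a sketch only: the sample text is a hypothetical excerpt, real systemstate traces are much larger and their layout varies by release, and the `waits_by_process` helper is not an Oracle tool.

```python
import re

# Hypothetical excerpt standing in for a real systemstate trace file.
sample_trace = """\
PROCESS 17:
  some state object detail
    waiting for 'enq: TX - row lock contention' ...
PROCESS 18:
  some state object detail
"""


def waits_by_process(text):
    """Map each 'PROCESS n' header to the 'waiting for' lines that follow it."""
    waits = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"PROCESS (\d+)", line)
        if header:
            current = int(header.group(1))
            waits[current] = []
        elif current is not None and "waiting for" in line:
            waits[current].append(line.strip())
    return waits


for pid, lines in waits_by_process(sample_trace).items():
    print(pid, lines or "no current wait found")
```

To use it on a real dump, read the trace file from user_dump_dest into a string and pass it to `waits_by_process`; the process numbers correspond to the pid returned by the v$process query above.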

Source: "ITPUB Blog", link: http://blog.itpub.net/26015009/viewspace-1101956/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
