使用oradebug dump hanganalyze分析oracle hang系列一

wisdomone1發表於2015-11-05

背景

  oracle db有可能因為諸多因系而hang,可以藉助某些工具進行診斷分析,本文主要學習如何用不用oradebug dump hanganalyze 3進行分析產生的TRACE FILE.
 以及所包含的內容結構。 




結論

1,oradebug hanganalyze 3產生的TRACE FILE包括3部分內容
2,各部分內容如下:


第一部分
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: <not in a wait><='latch: cache buffers chains'
     Chain 1 Signature Hash: 0xccebf225
 [b] Chain 2 Signature: <not in a wait><='buffer busy waits'
     Chain 2 Signature Hash: 0x9a13abed
 [c] Chain 3 Signature: <not in a wait><='latch: cache buffers chains'
     Chain 3 Signature Hash: 0xccebf225


第二部分(當然這裡面又分為intersecting chains或者non-intersecting chains)
===============================================================================
Non-intersecting chains:
 
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 10574
              process id: 10, oracle@seconary (DBW0)
              session id: 10
        session serial #: 1
    }
    is waiting for 'latch: cache buffers chains' with wait info:
    {
                      p1: 'address'=0xda4f8850
                      p2: 'number'=0x96
                      p3: 'tries'=0x0
            time in wait: 0.012628 sec
           timeout after: never
                 wait id: 50243
                blocking: 0 sessions
             current sql: <none>
             short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semop()+7<-skgpwwait()+156<-kslgess()+1799<-ksl_get_shared_latch()+620<-kcbbic1()+247<-kcbbiop()+868<-kcbbdrv()+566<-ksbabs()+465<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
            wait history:
              * time between current wait and wait #1: 0.000058 sec
              1.       event: 'db file async I/O submit'
                 time waited: 0.328988 sec
                     wait id: 50242           p1: 'requests'=0x1
                                              p2: 'interrupt'=0x0
                                              p3: 'timeout'=0x0
              * time between wait #1 and #2: 0.000076 sec
              2.       event: 'db file async I/O submit'
                 time waited: 0.000748 sec
                     wait id: 50241           p1: 'requests'=0x1
                                              p2: 'interrupt'=0x0
                                              p3: 'timeout'=0x0
              * time between wait #2 and #3: 0.000086 sec
              3.       event: 'db file async I/O submit'
                 time waited: 0.000177 sec
                     wait id: 50240           p1: 'requests'=0x1
                                              p2: 'interrupt'=0x0
                                              p3: 'timeout'=0x0
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 6664
              process id: 91, oracle@seconary (J063)
              session id: 32
        session serial #: 9465
    }
    which is not in a wait:
    {
               last wait: 0.000000 sec ago
                blocking: 6 sessions
             current sql: <none>
             short stack: <none: error encountered - ORA-00072: process "Unix process pid: 6664, image: oracle@seconary (J063)" is not active>
    }
 
Chain 1 Signature: <not in a wait><='latch: cache buffers chains'
Chain 1 Signature Hash: 0xccebf225




第三部分
State of LOCAL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[9]/1/10/1/0xdcba9740/10574/NLEAF/[31]
[11]/1/12/1/0xdcba3a60/10578/SINGLE_NODE/
中間類似內容略
[346]/1/347/9522/0xdc7d75d0/6413/NLEAF/[118]
[348]/1/349/7696/0xdc7d18f0/6433/NLEAF/[204]


關於各列含義見下:
根據其每列含義
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):






           資料庫例項編號         會話sid    會話serial#   會話的地址saddr   會話對應的作業系統ID          表明會話是否等待           如有值,表明是持鎖會話的CHAIN編號,否則為空    
[163]   /    1              /   164    /   10419    /   0xdc9ea7e0   /    4046              /           NLEAF         /          [301]


3,non-intersecting chains的特色就是所包含的會話的阻塞會話,不隸屬於任何chain
   而intersecting chains包含的會話的阻塞會話隸屬於另一個chains中的持鎖會話,也就是說chains包含在另一個chains中


4,chains中包括每個會話以及阻塞會話的詳細資訊,包括SID,PID,SPID以及當前執行的SQL
  所呼叫堆疊;以及近期的歷史等待會話列表
5,關於 ([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]) 中的state列的各值含義如下,暫不全面(仍需要進一步測試)


NLEAF表明是等待會話
     SINGLE_NODE表明它是等待任何會話或資源
     LEAF表明它是持鎖會話,即它不等待任何會話或資源
     LEAF_NEW它也是持鎖會話


測試

1,資料庫版本
SQL> select * from v$version where rownum=1;


BANNER
----------------------------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production


2,資料庫未有任何HANG時,產生hanganalyze
SQL> oradebug setmypid
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /oracle/diag/rdbms/guowang/guowang/trace/guowang_ora_13018.trc


*** 2015-11-02 07:02:02.135
===============================================================================
HANG ANALYSIS:
  instances (db_name.oracle_sid): guowang.guowang
  oradebug_node_dump_level: 3 --trace level級別
  analysis initiated by oradebug --標明hanganalyze由oradebug生成
===============================================================================
 
Chains most likely to have caused the hang:--導致HANG的原因
  <no chains found>
 
===============================================================================
No chains found.
===============================================================================
Extra information that will be dumped at higher levels:
 
State of ALL nodes  --全部節點的狀態(我估計適用於RAC)
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]): --每個列的含義


*** 2015-11-02 07:02:02.163
===============================================================================
END OF HANG ANALYSIS
===============================================================================


*** 2015-11-02 07:02:02.164
===============================================================================
HANG ANALYSIS DUMPS:
  oradebug_node_dump_level: 3
===============================================================================
 
State of LOCAL nodes  --本地節點的狀態
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):--同state of all nodes
 
 
No processes qualify for dumping.
 
===============================================================================
HANG ANALYSIS DUMPS: END
===============================================================================






4,手工模擬產生鎖等待,產生hanganalyze


測試會話1
SQL> select sid,serial# from v$session where sid=(select sid from v$mystat where rownum=1);


       SID    SERIAL#
---------- ----------
       302       9226


SQL> select pid,spid from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));


       PID SPID
---------- ------------------------------------------------
       165 3812


SQL> create table t_hang(a int,b int);


Table created.


SQL> insert into t_hang values(1,1);


1 row created.


SQL> commit;


Commit complete.




測試會話2
SQL> select sid,serial# from v$session where sid=(select sid from v$mystat where rownum=1);


       SID    SERIAL#
---------- ----------
       164      10419


SQL> select pid,spid from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));


       PID SPID
---------- ------------------------------------------------
       184 4046


hang住
SQL> update t_hang set a=2 where a=1;     


SQL> oradebug setmypid
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /oracle/diag/rdbms/guowang/guowang/trace/guowang_ora_6813.trc


*** 2015-11-02 07:16:19.252
===============================================================================
HANG ANALYSIS:
  instances (db_name.oracle_sid): guowang.guowang
  oradebug_node_dump_level: 3
  analysis initiated by oradebug
===============================================================================
 
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: <not in a wait><='latch: cache buffers chains'
     Chain 1 Signature Hash: 0xccebf225
 [b] Chain 2 Signature: <not in a wait><='buffer busy waits'
     Chain 2 Signature Hash: 0x9a13abed
 [c] Chain 3 Signature: <not in a wait><='latch: cache buffers chains'
     Chain 3 Signature Hash: 0xccebf225
 


--經在TRACE FILE過濾等待會話的sid,即164,以後可以直接查session id:164或者對應的程式號process id:184,或者對應的作業系統程式os id:4096
發現它是在一個chain中,這個chain中清晰標明瞭,等待會話到底在等待什麼資源,以及是哪個會話阻塞它,並且會列出阻塞會話的詳細 資訊
 Chain 10:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 4046
              process id: 184, oracle@seconary (TNS V1-V3)
              session id: 164
        session serial #: 10419
    }
    is waiting for 'enq: TX - row lock contention' with wait info:
    {
                      p1: 'name|mode'=0x54580006
                      p2: 'usn<<16 | slot'=0xca001d
                      p3: 'sequence'=0x2842
            time in wait: 1 min 23 sec
           timeout after: never
                 wait id: 33
                blocking: 0 sessions
             current sql: update t_hang set a=2 where a=1  -會列出當前等待會話正在執行的SQL以及呼叫堆疊
             short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semtimedop()+10<-skgpwwait()+156<-ksliwat()+1821<-kslwaitctx()+162<-ksqcmi()+5641<-ksqgtlctx()+3401<-ksqgelctx()+552<-ktcwit1()+330<-kdddgb()+7099<-kdusru()+490<-updrowFastPath()+1075<-qerupFetch()+2171<-updaul()+1179<-updThreePhaseExe()+328<-updexe()+663<-opiexe()+14377<-kpoal8()+2288<-opiodr()+1149<-ttcpip()+1251<-opitsk()+1633<-opiino()+958<-opiodr()+1149<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+214<-main()+201<-__lib
            wait history:  --等待會話近期的等待事件歷史列表,且基於時間反向排序
              * time between current wait and wait #1: 0.043281 sec
              1.       event: 'Disk file operations I/O'
                 time waited: 0.000090 sec
                     wait id: 32              p1: 'FileOperation'=0x2
                                              p2: 'fileno'=0x4
                                              p3: 'filetype'=0x2
              * time between wait #1 and #2: 0.000764 sec
              2.       event: 'SQL*Net message from client'
                 time waited: 1 min 0 sec
                     wait id: 31              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000007 sec
              3.       event: 'SQL*Net message to client'
                 time waited: 0.000003 sec
                     wait id: 30              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
    }
    and is blocked by  --標明被哪個會話所阻塞
 => Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 3812
              process id: 165, oracle@seconary (TNS V1-V3)
              session id: 302
        session serial #: 9226
    }
    which is waiting for 'SQL*Net message from client' with wait info:  -同上理,也會列出持鎖會話近期的歷史等待事件列
    {
                      p1: 'driver id'=0x62657100
                      p2: '#bytes'=0x1
            time in wait: 2 min 49 sec
           timeout after: never
                 wait id: 42
                blocking: 1 session
             current sql: <none>
             short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-read()+14<-ntpfprd()+115<-nsbasic_brc()+338<-nsbrecv()+69<-nioqrc()+485<-__PGOSF24_opikndf2()+978<-opitsk()+824<-opiino()+958<-opiodr()+1149<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
            wait history:
              * time between current wait and wait #1: 0.000015 sec
              1.       event: 'SQL*Net message to client'
                 time waited: 0.000005 sec
                     wait id: 41              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.338132 sec
              2.       event: 'SQL*Net message from client'
                 time waited: 24.359100 sec
                     wait id: 40              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000013 sec
              3.       event: 'SQL*Net message to client'
                 time waited: 0.000005 sec
                     wait id: 39              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
    }




經後查,hanganalyze會列出所有ORACLE程式的資訊




並且chain又分為intersecting chains和non-intersecting chains


我們來分析下intersecting chains和non-intersecting chains的區別是什麼,先看
non-intersecting chains


經過對比分析,non-intersecting chains的特色就是所包含的會話的阻塞會話,不隸屬於任何chain
Chain 2:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 6728
              process id: 146, oracle@seconary (J117)
              session id: 29
        session serial #: 10069
    }
    is waiting for 'buffer busy waits' with wait info:
    {
                      p1: 'file#'=0x1
                      p2: 'block#'=0x15ab9
                      p3: 'class#'=0x1
            time in wait: 0.012849 sec
           timeout after: never
                 wait id: 31
                blocking: 0 sessions
             current sql: <none>
             short stack: <none: error encountered - ORA-00072: process "Unix process pid: 6728, image: oracle@seconary (J117)" is not active>
            wait history:
              * time between current wait and wait #1: 0.000028 sec
              1.       event: 'enq: TX - index contention'
                 time waited: 0.016712 sec
                     wait id: 30              p1: 'name|mode'=0x54580004
                                              p2: 'usn<<16 | slot'=0x12a0009
                                              p3: 'sequence'=0x1953
              * time between wait #1 and #2: 0.000066 sec
              2.       event: 'buffer busy waits'
                 time waited: 0.552796 sec
                     wait id: 29              p1: 'file#'=0x1
                                              p2: 'block#'=0x15ab9
                                              p3: 'class#'=0x1
              * time between wait #2 and #3: 0.000340 sec
              3.       event: 'buffer busy waits'
                 time waited: 0.497199 sec
                     wait id: 28              p1: 'file#'=0x1
                                              p2: 'block#'=0x225eb
                                              p3: 'class#'=0x1
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 6439  --也就是說持鎖會話不隸屬於chain
              process id: 111, oracle@seconary (J083)
              session id: 205
        session serial #: 9696




再來看看intersecting chains


發現intersecting chains包含的會話的阻塞會話隸屬於另一個chains中的持鎖會話,也就是說chains包含在另一個chains中
-------------------------------------------------------------------------------
Chain 12:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 6612
              process id: 59, oracle@seconary (J031)
              session id: 33
        session serial #: 11792
    }
    is waiting for 'buffer busy waits' with wait info:
    {
                      p1: 'file#'=0x1
                      p2: 'block#'=0x15ab9
                      p3: 'class#'=0x1
            time in wait: 0.007726 sec
           timeout after: never
                 wait id: 0
                blocking: 0 sessions
             current sql: <none>
             short stack: <none: error encountered - ORA-00072: process "Unix process pid: 6612, image: oracle@seconary (J031)" is not active>
    }
    and is blocked by 'instance: 1, os id: 6439, session id: 205',
    which is a member of 'Chain 2'.
 
Chain 12 Signature: <not in a wait><='buffer busy waits'
Chain 12 Signature Hash: 0x9a13abed




最後學習下state of all nodes(注:如果是單例項資料庫,其state of all nodes和state of local nodes內容相同)


先摘錄一節內容
State of ALL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[9]/1/10/1/0xdcba9740/10574/NLEAF/[31]
[11]/1/12/1/0xdcba3a60/10578/SINGLE_NODE/
[28]/1/29/10069/0xdcb724f0/6728/NLEAF/[204]
[31]/1/32/9465/0xdcb699a0/6664/LEAF_NW/
[32]/1/33/11792/0xdcb66b30/6612/NLEAF/[204]
[33]/1/34/3/0xdcb63cc0/10707/SINGLE_NODE/
[38]/1/39/9453/0xdcb55490/6754/NLEAF/[204]


其實每個列的含義都有註明,我們依次來分析,我採用的方法仍是基於我上述構建的測試2個會話即持鎖會話以及等待鎖會話,在TRACE FILE查等待會話164,匹配結果如下:


[163]/1/164/10419/0xdc9ea7e0/4046/NLEAF/[301]


根據其每列含義
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):


我們來分解下:


SQL> select saddr from v$session where sid=164;


SADDR
----------------
00000000DC9EA7E0


           資料庫例項編號         會話sid    會話serial#   會話的地址saddr   會話對應的作業系統ID          表明會話是否等待           如有值,表明是持鎖會話的CHAIN編號,否則為空    
[163]   /    1              /   164    /   10419    /   0xdc9ea7e0   /    4046              /           NLEAF         /          [301]




但是163是什麼含義呢,以及301是何義呢,會不會是chains的編號呢,還有NLEAF是何義,另急,我們試著分析下,我試著查301,定位到如下內容


[301]/1/302/9226/0xdc859f80/3812/LEAF/


經過對比分析,301就是持鎖會話所在的chains的編號,所以可知NLEAF表明是等待鎖的會話,而LEAF就是不等待鎖的會話


還不不夠,因為state列發現有幾個不同的值:NLEAF,SINGLE_NODE,LEAF_NEW,繼續分析


以chain 9為例
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[9]/1/10/1/0xdcba9740/10574/NLEAF/[31]


從adjlist 31定位到chain 31
[31]/1/32/9465/0xdcb699a0/6664/LEAF_NW/






先看chain 9
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 10574
              process id: 10, oracle@seconary (DBW0)
              session id: 10
        session serial #: 1
    }
    is waiting for 'latch: cache buffers chains' with wait info:
    {
                      p1: 'address'=0xda4f8850
                      p2: 'number'=0x96
                      p3: 'tries'=0x0
            time in wait: 0.012628 sec
           timeout after: never
                 wait id: 50243
                blocking: 0 sessions
             current sql: <none>
             short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semop()+7<-skgpwwait()+156<-kslgess()+1799<-ksl_get_shared_latch()+620<-kcbbic1()+247<-kcbbiop()+868<-kcbbdrv()+566<-ksbabs()+465<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
            wait history:
              * time between current wait and wait #1: 0.000058 sec
              1.       event: 'db file async I/O submit'
                 time waited: 0.328988 sec
                     wait id: 50242           p1: 'requests'=0x1
                                              p2: 'interrupt'=0x0
                                              p3: 'timeout'=0x0
              * time between wait #1 and #2: 0.000076 sec
              2.       event: 'db file async I/O submit'
                 time waited: 0.000748 sec
                     wait id: 50241           p1: 'requests'=0x1
                                              p2: 'interrupt'=0x0
                                              p3: 'timeout'=0x0
              * time between wait #2 and #3: 0.000086 sec
              3.       event: 'db file async I/O submit'
                 time waited: 0.000177 sec
                     wait id: 50240           p1: 'requests'=0x1
                                              p2: 'interrupt'=0x0
                                              p3: 'timeout'=0x0
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 6664
              process id: 91, oracle@seconary (J063)
              session id: 32
        session serial #: 9465




再看下SINGLE_NODE的含義是什麼
 [33]/1/34/3/0xdcb63cc0/10707/SINGLE_NODE/       


可見SINGLE_NODE表明會話不被任何會話或程式所阻塞
 Chain 146:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 10707
              process id: 29, oracle@seconary (CJQ0)
              session id: 34
        session serial #: 3
    }
    is waiting for 'job scheduler coordinator slave wait' with wait info:
    {
            time in wait: 0.449655 sec
           timeout after: 15.550345 sec
                 wait id: 81976
                blocking: 0 sessions
             current sql: <none>
             short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semtimedop()+10<-skgpwwait()+156<-ksliwat()+1821<-kslwaitctx()+162<-kslwait()+141<-jscrs_select0()+6801<-jscrs_select()+595<-rpiswu2()+1541<-kkjcjexe()+713<-kkjssrh()+559<-ksbcti()+212<-ksbabs()+1732<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
            wait history:
              * time between current wait and wait #1: 0.000108 sec
              1.       event: 'rdbms ipc message'
                 time waited: 2.064533 sec
                     wait id: 81975           p1: 'timeout'=0xc8
              * time between wait #1 and #2: 0.006081 sec
              2.       event: 'rdbms ipc message'
                 time waited: 0.501429 sec
                     wait id: 81974           p1: 'timeout'=0x32
              * time between wait #2 and #3: 0.000706 sec
              3.       event: 'job scheduler coordinator slave wait'
                 time waited: 16.003771 sec
                     wait id: 81973           
    }
 
Chain 146 Signature: 'job scheduler coordinator slave wait'
Chain 146 Signature Hash: 0x8c8a9c97


總結:NLEAF表明是等待會話
     SINGLE_NODE表明它是等待任何會話或資源
     LEAF表明它是持鎖會話,即它不等待任何會話或資源
     LEAF_NEW它也是持鎖會話


 
 LEAF和LEAF_NEW的區別是什麼?我們將在下文進行分析與測試。


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/9240380/viewspace-1823479/,如需轉載,請註明出處,否則將追究法律責任。

相關文章