使用oradebug dump hanganalyze 分析oracle hang系列三

wisdomone1發表於2015-11-06

背景

   db hang時,分析下oradebug dump hanganalyze 產生的trace file第一部分內容與未HANG的區別,進一步理解其內容構成。


結論

1,DB 未HANG時,第一部分顯示如下:


Chains most likely to have caused the hang:
 [a] Chain 1 Signature: <not in a wait><='latch: cache buffers chains'
     Chain 1 Signature Hash: 0xccebf225
 [b] Chain 2 Signature: <not in a wait><='buffer busy waits'
     Chain 2 Signature Hash: 0x9a13abed
 [c] Chain 3 Signature: <not in a wait><='latch: cache buffers chains'
     Chain 3 Signature Hash: 0xccebf225




2,DB HAGN時,第一部分顯示如下
Chains most likely to have caused the hang:


*** 2015-11-06 03:42:00.529
 [a] Chain 1 Signature: 'rdbms ipc message'<='log file sync'
     Chain 1 Signature Hash: 0xcfe404d3
 [b] Chain 2 Signature: 'rdbms ipc message'<='log file sync'
     Chain 2 Signature Hash: 0xcfe404d3
 [c] Chain 3 Signature: 'rdbms ipc message'<='log file sync'
     Chain 3 Signature Hash: 0xcfe404d3


3,可見二者區別在於未HANG有not in a wait的字樣,而HANG卻是具體的等待事件
4, 可以基於第一部分的   Chain 1 Signature Hash: 0xcfe404d3直接定位到發生HANG的chain
   然後再查詢       is blocked by,即可找到持鎖會話,進行針對性分析即可




測試



1,資料庫未HANG時產生的oracle dump hanganalyze 3的trace file第一部分


 
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: <not in a wait><='latch: cache buffers chains'
     Chain 1 Signature Hash: 0xccebf225
 [b] Chain 2 Signature: <not in a wait><='buffer busy waits'
     Chain 2 Signature Hash: 0x9a13abed
 [c] Chain 3 Signature: <not in a wait><='latch: cache buffers chains'
     Chain 3 Signature Hash: 0xccebf225




2,模擬DB HANG,分析TRACE FILE第一部分




--oradebug session1
SQL> select pname,pid,spid,addr from v$process where pname in ('LGWR','PMON');


PNAME             PID SPID                                             ADDR
---------- ---------- ------------------------------------------------ ----------------
PMON                2 3856                                             00000000DD59FC40
LGWR               11 3876                                             00000000DD5A8E80


SQL> select pid,spid,addr from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));


       PID SPID                                             ADDR
---------- ------------------------------------------------ ----------------
        19 3952                                             00000000DD5B1080




SQL> select sid from v$session where sid=(select sid from v$mystat where rownum=1);


       SID
----------
        17




--session 2
SQL> select sid from v$session where sid=(select sid from v$mystat where rownum=1);


       SID
----------
       183


SQL> select pid,spid,addr from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));


       PID SPID                                             ADDR
---------- ------------------------------------------------ ----------------
        35 6845                                             00000000DD5C1480




SQL> create table t_db_hang(a int);


Table created.




--session1
SQL> oradebug setospid 3876
Oracle pid: 11, Unix process pid: 3876, image: oracle@seconary (LGWR)
SQL> oradebug suspend
Statement processed.


--session2 


---insert hang
SQL> insert into t_db_hang values(1);


-- session 3


--可見session 3不能產生新的會話,資料庫已經HANG住
[oracle@seconary ~]$ sqlplus scott/system


SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 6 03:37:28 2015


Copyright (c) 1982, 2009, Oracle.  All rights reserved.


--session1 
--生成一個oradebug dump hanganalyze 3




SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump hanganalyze 3
Statement processed.
SQL> oradebug tracefile_name
/oracle/diag/rdbms/guowang/guowang/trace/guowang_ora_3952.trc


--分析trace file


第一部分
Chains most likely to have caused the hang:


*** 2015-11-06 03:42:00.529
 [a] Chain 1 Signature: 'rdbms ipc message'<='log file sync'
     Chain 1 Signature Hash: 0xcfe404d3
 [b] Chain 2 Signature: 'rdbms ipc message'<='log file sync'
     Chain 2 Signature Hash: 0xcfe404d3
 [c] Chain 3 Signature: 'rdbms ipc message'<='log file sync'
     Chain 3 Signature Hash: 0xcfe404d3


基於等待會話183查詢,發現如下資訊,可見等待會話11,如果發現DB HANG了,也可以查詢is blocked by,即可找到持鎖會話,進行針對性分析即可
-------------------------------------------------------------------------------
Chain 80:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 6845
              process id: 35, oracle@seconary (TNS V1-V3)
              session id: 183
        session serial #: 183
    }
    is waiting for 'log file sync' with wait info:  --發現其在等待LGWR,其實這個和上面ORADEBUG 模擬LGWR被阻塞也有關
    {
                      p1: 'buffer#'=0x22f7
                      p2: 'sync scn'=0x4705d0e
            time in wait: 1 min 43 sec (last interval)
            time in wait: 5 min 14 sec (total)
           timeout after: never
                 wait id: 241
                blocking: 0 sessions
            wait history:
              * time between current wait and wait #1: 0.000000 sec
              1.       event: 'latch free'
                 time waited: 0.004440 sec
                     wait id: 242             p1: 'address'=0xdf0e9298
                                              p2: 'number'=0x2
                                              p3: 'tries'=0x0
              * time between wait #1 and #2: 0.000000 sec
              2.       event: 'log file sync'
                 time waited: 3 min 30 sec
                     wait id: 241             p1: 'buffer#'=0x22f7
                                              p2: 'sync scn'=0x4705d0e
              * time between wait #2 and #3: 0.120403 sec
              3.       event: 'db file sequential read'
                 time waited: 0.014195 sec
                     wait id: 240             p1: 'file#'=0x1
                                              p2: 'block#'=0xaad
                                              p3: 'blocks'=0x1
    }
    and is blocked by 'instance: 1, os id: 3876, session id: 11',
    which is a member of 'Chain 1'.




    看下等待會話11是什麼,可見會話11就是後臺程式LGWR


        and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 3876
              process id: 11, oracle@seconary (LGWR)
              session id: 11
        session serial #: 1


--既然lgwr是阻塞會話,恢復它的功能即可
--session1
[oracle@seconary ~]$ sqlplus -prelim '/as sysdba'


SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 6 03:51:57 2015


Copyright (c) 1982, 2009, Oracle.  All rights reserved.


SQL> 


SQL> oradebug setospid 3876
Oracle pid: 11, Unix process pid: 3876, image: oracle@seconary (LGWR)
SQL> oradebug resume
Statement processed.


---session 2
--hang dml執行成功
SQL> insert into t_db_hang values(1);


1 row created.


--session 3也可以新生成會話
[oracle@seconary ~]$ sqlplus scott/system


SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 6 03:58:36 2015


Copyright (c) 1982, 2009, Oracle.  All rights reserved.




Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


SQL> 




再學習下另一個知識


Chains most likely to have caused the hang:


*** 2015-11-06 03:42:00.529
 [a] Chain 1 Signature: 'rdbms ipc message'<='log file sync'
     Chain 1 Signature Hash: 0xcfe404d3
 [b] Chain 2 Signature: 'rdbms ipc message'<='log file sync'
     Chain 2 Signature Hash: 0xcfe404d3
 [c] Chain 3 Signature: 'rdbms ipc message'<='log file sync'
     Chain 3 Signature Hash: 0xcfe404d3




 經過分析,這就是發生HANG的相關chain資訊,可以由具體的chain signature hash對應值,直接定們到具體的chain即可    

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/9240380/viewspace-1824819/,如需轉載,請註明出處,否則將追究法律責任。

相關文章