由oradebug poke process allocation latch引發dfs lock handle等待事件進一步分析

wisdomone1發表於2015-11-14

結論

1,阻塞process allocation latch,資料庫會出現latch free及dfs lock handle
2,dfs lock handle透過systemstate dump找不到有價值的資訊
  卻可透過v$session的p1,p2,p3找到有價值的資訊
3,因為dfs lock handle的p1,p2,p3含義不知,只能藉助於mos,找到文章:Understanding 'DFS Lock Handle' and Cross-Instance (CI) Call Waits [ID 458292.1]
  定位到鎖CI
4,關於鎖CI的含義請見v$lock_type
5, 關於ID1及ID2請見   Understanding 'DFS Lock Handle' and Cross-Instance (CI) Call Waits [ID 458292.1]
6,根據ID1及ID2不同的值,即會在不同的解決思路
7,ORACLE獲取鎖TYEP的指令碼
select chr(bitand(p1,-16777216)/16777215) || chr(bitand(p1, 16711680)/65535) "Lock_type" from dual; --lock_type對應v$lock.type



測試



--oradebug suspend process allocation latch前
SQL> select sid,serial#,paddr from v$session where sid=(select sid from v$mystat where rownum=1);


       SID    SERIAL# PADDR
---------- ---------- ----------------
       153        742 0000000083A60520


SQL> select pid,spid from v$process where addr='0000000083A60520';


       PID SPID
---------- ------------
        22 28711


SQL> create table t_process(a int);


Table created.


SQL> select sid,serial#,program,event,blocking_session,p1,p1text,p2,p2text,p3,p3text from v$session where  sid=153;


       SID    SERIAL# PROGRAM                        EVENT           BLOCKING_SESSION         P1 P1TEXT                  P2 P2TEXT                  P3 P3TEXT
---------- ---------- ------------------------------ --------------- ---------------- ---------- --------------- ---------- --------------- ---------- ---------------
       153        742 sqlplus@jingfa1 (TNS V1-V3)    SQL*Net message                  1650815232 driver id                1 #bytes                   0
                                                      from client




---oradebug suspend process allocation latch後


SQL> select name,addr,latch#,level# from v$latch where name='process allocation';


NAME                 ADDR                 LATCH#     LEVEL#
-------------------- ---------------- ---------- ----------
process allocation   0000000060007498          3          1


SQL> select 'oradebug poke 0x'||addr||' 4 0x00000001;' from v$latch where latch#=3;


'ORADEBUGPOKE0X'||ADDR||'40X00000001;'
----------------------------------------------
oradebug poke 0x0000000060007498 4 0x00000001;        


SQL> oradebug setmypid
Statement processed.
SQL> oradebug poke 0x0000000060007498 4 0x00000001;
BEFORE: [060007498, 06000749C) = 00000000
AFTER:  [060007498, 06000749C) = 00000001


可見dml不受影響
SQL> insert into t_process values(1);


1 row created.


SQL> commit;


Commit complete.


而發生檢查點,不過過了一會兒就正常了,這裡就不好說process allocation latch對於發生檢查點的影響程度了
SQL> alter system checkpoint;


等待dfs lock handle
SQL> select sid,serial#,program,event,blocking_session,p1,p1text,p2,p2text,p3,p3text from v$session where  sid=153;


       SID    SERIAL# PROGRAM                        EVENT           BLOCKING_SESSION         P1 P1TEXT                  P2 P2TEXT                  P3 P3TEXT
---------- ---------- ------------------------------ --------------- ---------------- ---------- --------------- ---------- --------------- ---------- ---------------
       153        742 sqlplus@jingfa1 (TNS V1-V3)    DFS lock handle                  1128857605 type|mode                2 id1                      2 id2






SQL> select sid,serial#,program,event from v$session where type='USER' and wait_class<>'Idle';


       SID    SERIAL# PROGRAM                        EVENT
---------- ---------- ------------------------------ --------------------------------------------------
       132        866 sqlplus@jingfa1 (TNS V1-V3)    SQL*Net message to client




新增2個無法登陸的會話,可見後臺臺程式mmon及lmon分別在等待dfs lock handle及latch free
SQL> select sid,serial#,program,event from v$session where   wait_class<>'Idle';


       SID    SERIAL# PROGRAM                        EVENT
---------- ---------- ------------------------------ --------------------------------------------------
       132        866 sqlplus@jingfa1 (TNS V1-V3)    SQL*Net message to client
       156          1 oracle@jingfa1 (MMON)          DFS lock handle
       167          1 oracle@jingfa1 (LMON)          latch free    


新增一個無法登陸會話,MMON又變成等待os thread startup
SQL> select sid,serial#,program,event from v$session where   wait_class<>'Idle';


       SID    SERIAL# PROGRAM                        EVENT
---------- ---------- ------------------------------ --------------------------------------------------
       132        866 sqlplus@jingfa1 (TNS V1-V3)    SQL*Net message to client
       156          1 oracle@jingfa1 (MMON)          os thread startup
       167          1 oracle@jingfa1 (LMON)          latch free          




繼續新增一個無法登陸會話,等待會話由3個變成了4個
SQL> select sid,serial#,program,event from v$session where   wait_class<>'Idle';


       SID    SERIAL# PROGRAM                        EVENT
---------- ---------- ------------------------------ --------------------------------------------------
       132        866 sqlplus@jingfa1 (TNS V1-V3)    SQL*Net message to client
       156          1 oracle@jingfa1 (MMON)          DFS lock handle
       160          1 oracle@jingfa1 (CKPT)          os thread startup
       167          1 oracle@jingfa1 (LMON)          latch free


 新增會話,同上
 
 新增會話,由4個會話變成了3個會話,少了CKPT程式
 SQL> select sid,serial#,program,event from v$session where   wait_class<>'Idle';


       SID    SERIAL# PROGRAM                        EVENT
---------- ---------- ------------------------------ --------------------------------------------------
       132        866 sqlplus@jingfa1 (TNS V1-V3)    SQL*Net message to client
       156          1 oracle@jingfa1 (MMON)          DFS lock handle
       167          1 oracle@jingfa1 (LMON)          latch free      




新增會話,同上


分析dump,暫未分析出有價值的內容
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump systemstate 3
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/admin/jingfa/udump/jingfa1_ora_27463.trc 


轉換思路,dfs lock handle幾個引數含義到底是什麼
SQL> select sid,serial#,program,event,p1,p1text,p2,p2text,p3,p3text from v$session where   wait_class<>'Idle';


       SID    SERIAL# PROGRAM                        EVENT                   P1 P1TEXT                  P2 P2TEXT                  P3 P3TEXT
---------- ---------- ------------------------------ --------------- ---------- --------------- ---------- --------------- ---------- ---------------
       132        866 sqlplus@jingfa1 (TNS V1-V3)    SQL*Net message 1650815232 driver id                1 #bytes                   0
                                                      to client


       156          1 oracle@jingfa1 (MMON)          DFS lock handle 1128857605 type|mode               10 id1                      2 id2
       167          1 oracle@jingfa1 (LMON)          latch free      1610642584 address                  3 number              177447 tries


也上DUMP檔案匹配,但還是不知是什麼含義
SQL> select to_char('1128857605','xxxxxxxxx') from dual;


TO_CHAR('1
----------
  43490005


經過查閱MOS發現如下文章:
Understanding 'DFS Lock Handle' and Cross-Instance (CI) Call Waits [ID 458292.1]
獲知:id1及id2的含義:可見id1=10,即 Create Remote parallel query Server ,說明與程式建立有關係,也就是說無法建立程式
      id2=2,即呼叫後臺程式中一個函式


1 reuse (checkpoint and invalidate block range) 
2 LGWR Checkpointing and Hot Backup 
3 DBWR syncronization of SGA with control file 
4 log file add/drop/rename notification 
5 miscellaneous CTWR operations 
7 Manged standby recovery related 
8 alter rollback segment optimal 
9 Signal Query Servers/coordinator 
10 Create Remote parallel query Server 
11 Set Global Partitions 
12 Stop Disk Writes 
13 Drop Sort Segments 
14 Release unused space from Sort Segments 
15 instance Recovery for Parallel operation Group 
16 Validate parallel slave Lock Value 
17 check transaction state objects 
18 object reuse request 
19 rolling release checks 
20 propagate begin backup scn for a file 
21 refresh top plan (for db scheduler 
22 clear checkpoint progress record 
23 drop temp file 
24 QUiesce database Restricted 
25 Update Dscn Tracking (ktcndt 
26 Purge dictionary Object number Cache 
27 set Database Force Logging mode 
28 invalidate cached file address translations 
29 Cursor Unauthorize Mode 
30 process waiters after row cache requeue 
31 Active Change Directory extent relocation 
32 block change tracking state change 
33 kgl mulitversion obsolete 
34 set previous resetlogs data 
35 set recovery destination pointer 
36 fast object reuse request 
38 ASM diskgroup discovery wait 
39 ASM diskgroup release 
40 ASM push DB updates 
41 ASM add ACD chunk 
42 ASM map resize message 
43 ASM map lock message 
44 ASM map unlock message (phase 1 
45 ASM map unlock message (phase 2 
46 ASM generate add disk redo marker 
d47 ASM check of PST validity 
48 ASM offline disk CIC 
49 Logical Standby Sync Point SCN 
50 update SQL Tuning Base existence bitvector 
51 PQ induced Checkpointing 
52 ASM F1X0 relocation 
53 Scheduler autostart 
54 KZS increment grant/revoke counter 
55 ASM disk operation message 
56 ASM I/O error emulation 
57 DB Supp log cursor invalidation 
58 Cache global range invalidation 
59 Cache global object invalidation 
60 ASM Pre-Existing Extent Lock wait 
61 Perform a ksk action through DBWR 
62 ASM diskgroup refresh wait 
63 KCBO object checkpoint 
64 KCBO object pq checkpoint 
65 global health check event 
66 Oracle Label Security refresh 
67 thread internal enable 
68 cross-instance registration 
69 KGL purge unused subheaps 
70 clear pin instance flag 
71 Rolling operations CIC
The following can be used to understand id2:


1 used to pass in parameters
2 used to invoke the function in backgroud process 
3 used to indicate the foreground has not returned
4 mounted excl, use to allocate mechanism 
5 used to queue up interested clients  


基於上述分析,獲取fs lock handle到底是什麼鎖
SQL> select chr(bitand(1128857605,-16777216)/16777215) || chr(bitand(1128857605, 16711680)/65535) "Lock_name" from dual;


Lock_name
--------------------
CI


獲取CI鎖的含義,用於協調跨例項函式的呼叫,可見用於RAC
SQL> select type,name,id1_tag,id2_tag,is_user,description from v$lock_type where type='CI';


TYPE            NAME                 ID1_TAG    ID2_TAG    IS_USER    DESCRIPTION
--------------- -------------------- ---------- ---------- ---------- --------------------------------------------------
CI              Cross-Instance Call  opcode     type       NO         Coordinates cross-instance function invocations
                Invocation






發現沒有任何會話持CI鎖
SQL> select sid,type,id1,id2,lmode,request from v$lock where type='CI';


no rows selected




一執行全域性GV$LOCK馬上就HANG住了


SQL> select sid,type,id1,id2,lmode,request from gv$lock where type='CI';


在另一個節點查詢看看,也沒有內容
SQL> select sid,type,id1,id2,lmode,request from v$lock where type='CI';


no rows selected




---換個思路,檢視對應節點的告警日誌


上述查詢一直HNAG時,告警日誌是沒有資訊的,但如果你強制取消,出現如下資訊,可見是啟動並行程式時報錯
引申,DB HANG或出故障時,查詢GV檢視會去建立並行查詢程式(到於為何要建立這個並行查詢程式,將在下文分析)
Fri Nov 13 12:06:58 2015
Timed out trying to start process PZ99.




這個就和前面的聯絡起來吧,哈哈,因為oradebug suspend process allocation latch




我們resume process allocation latch
[oracle@jingfa1 bdump]$ ps -ef|grep ora_|wc -l
25
[oracle@jingfa1 bdump]$ 


我們再次執行gv$lock查詢
可見沒有程式沒有變化
[oracle@jingfa1 bdump]$ ps -ef|grep ora_|wc -l
25
[oracle@jingfa1 bdump]$ 






來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/9240380/viewspace-1837910/,如需轉載,請註明出處,否則將追究法律責任。

相關文章