結論

1,阻塞process allocation latch,資料庫會出現latch free及dfs lock handle
2,dfs lock handle透過systemstate dump找不到有價值的資訊
卻可透過v$session的p1,p2,p3找到有價值的資訊
3,因為dfs lock handle的p1,p2,p3含義不知，只能藉助於mos,找到文章：Understanding 'DFS Lock Handle' and Cross-Instance (CI) Call Waits [ID 458292.1]
定位到鎖CI
4，關於鎖CI的含義請見v$lock_type
5, 關於ID1及ID2請見 Understanding 'DFS Lock Handle' and Cross-Instance (CI) Call Waits [ID 458292.1]
6，根據ID1及ID2不同的值，即會在不同的解決思路
7，ORACLE獲取鎖TYEP的指令碼
select chr(bitand(p1,-16777216)/16777215) || chr(bitand(p1, 16711680)/65535) "Lock_type" from dual; --lock_type對應v$lock.type

測試

--oradebug suspend process allocation latch前
SQL> select sid,serial#,paddr from v$session where sid=(select sid from v$mystat where rownum=1);

SID SERIAL# PADDR
---------- ---------- ----------------
153 742 0000000083A60520

SQL> select pid,spid from v$process where addr='0000000083A60520';

PID SPID
---------- ------------
22 28711

SQL> create table t_process(a int);

Table created.

SQL> select sid,serial#,program,event,blocking_session,p1,p1text,p2,p2text,p3,p3text from v$session where sid=153;

SID SERIAL# PROGRAM EVENT BLOCKING_SESSION P1 P1TEXT P2 P2TEXT P3 P3TEXT
---------- ---------- ------------------------------ --------------- ---------------- ---------- --------------- ---------- --------------- ---------- ---------------
153 742 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message 1650815232 driver id 1 #bytes 0
from client

---oradebug suspend process allocation latch後

SQL> select name,addr,latch#,level# from v$latch where name='process allocation';

NAME ADDR LATCH# LEVEL#
-------------------- ---------------- ---------- ----------
process allocation 0000000060007498 3 1

SQL> select 'oradebug poke 0x'||addr||' 4 0x00000001;' from v$latch where latch#=3;

'ORADEBUGPOKE0X'||ADDR||'40X00000001;'
----------------------------------------------
oradebug poke 0x0000000060007498 4 0x00000001;

SQL> oradebug setmypid
Statement processed.
SQL> oradebug poke 0x0000000060007498 4 0x00000001;
BEFORE: [060007498, 06000749C) = 00000000
AFTER: [060007498, 06000749C) = 00000001

可見dml不受影響
SQL> insert into t_process values(1);

1 row created.

SQL> commit;

Commit complete.

而發生檢查點,不過過了一會兒就正常了,這裡就不好說process allocation latch對於發生檢查點的影響程度了
SQL> alter system checkpoint;

等待dfs lock handle
SQL> select sid,serial#,program,event,blocking_session,p1,p1text,p2,p2text,p3,p3text from v$session where sid=153;

SID SERIAL# PROGRAM EVENT BLOCKING_SESSION P1 P1TEXT P2 P2TEXT P3 P3TEXT
---------- ---------- ------------------------------ --------------- ---------------- ---------- --------------- ---------- --------------- ---------- ---------------
153 742 sqlplus@jingfa1 (TNS V1-V3) DFS lock handle 1128857605 type|mode 2 id1 2 id2

SQL> select sid,serial#,program,event from v$session where type='USER' and wait_class<>'Idle';

SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------ --------------------------------------------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client

新增2個無法登陸的會話，可見後臺臺程式mmon及lmon分別在等待dfs lock handle及latch free
SQL> select sid,serial#,program,event from v$session where wait_class<>'Idle';

SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------ --------------------------------------------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client
156 1 oracle@jingfa1 (MMON) DFS lock handle
167 1 oracle@jingfa1 (LMON) latch free

新增一個無法登陸會話，MMON又變成等待os thread startup
SQL> select sid,serial#,program,event from v$session where wait_class<>'Idle';

SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------ --------------------------------------------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client
156 1 oracle@jingfa1 (MMON) os thread startup
167 1 oracle@jingfa1 (LMON) latch free

繼續新增一個無法登陸會話，等待會話由3個變成了4個
SQL> select sid,serial#,program,event from v$session where wait_class<>'Idle';

SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------ --------------------------------------------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client
156 1 oracle@jingfa1 (MMON) DFS lock handle
160 1 oracle@jingfa1 (CKPT) os thread startup
167 1 oracle@jingfa1 (LMON) latch free

新增會話，同上

新增會話，由4個會話變成了3個會話，少了CKPT程式
SQL> select sid,serial#,program,event from v$session where wait_class<>'Idle';

SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------ --------------------------------------------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client
156 1 oracle@jingfa1 (MMON) DFS lock handle
167 1 oracle@jingfa1 (LMON) latch free

新增會話,同上

分析dump,暫未分析出有價值的內容
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump systemstate 3
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/admin/jingfa/udump/jingfa1_ora_27463.trc

轉換思路，dfs lock handle幾個引數含義到底是什麼
SQL> select sid,serial#,program,event,p1,p1text,p2,p2text,p3,p3text from v$session where wait_class<>'Idle';

SID SERIAL# PROGRAM EVENT P1 P1TEXT P2 P2TEXT P3 P3TEXT
---------- ---------- ------------------------------ --------------- ---------- --------------- ---------- --------------- ---------- ---------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message 1650815232 driver id 1 #bytes 0
to client

156 1 oracle@jingfa1 (MMON) DFS lock handle 1128857605 type|mode 10 id1 2 id2
167 1 oracle@jingfa1 (LMON) latch free 1610642584 address 3 number 177447 tries

也上DUMP檔案匹配，但還是不知是什麼含義
SQL> select to_char('1128857605','xxxxxxxxx') from dual;

TO_CHAR('1
----------
43490005

經過查閱MOS發現如下文章：
Understanding 'DFS Lock Handle' and Cross-Instance (CI) Call Waits [ID 458292.1]
獲知：id1及id2的含義：可見id1=10，即 Create Remote parallel query Server ，說明與程式建立有關係，也就是說無法建立程式
id2=2,即呼叫後臺程式中一個函式

1 reuse (checkpoint and invalidate block range)
2 LGWR Checkpointing and Hot Backup
3 DBWR syncronization of SGA with control file
4 log file add/drop/rename notification
5 miscellaneous CTWR operations
7 Manged standby recovery related
8 alter rollback segment optimal
9 Signal Query Servers/coordinator
10 Create Remote parallel query Server
11 Set Global Partitions
12 Stop Disk Writes
13 Drop Sort Segments
14 Release unused space from Sort Segments
15 instance Recovery for Parallel operation Group
16 Validate parallel slave Lock Value
17 check transaction state objects
18 object reuse request
19 rolling release checks
20 propagate begin backup scn for a file
21 refresh top plan (for db scheduler
22 clear checkpoint progress record
23 drop temp file
24 QUiesce database Restricted
25 Update Dscn Tracking (ktcndt
26 Purge dictionary Object number Cache
27 set Database Force Logging mode
28 invalidate cached file address translations
29 Cursor Unauthorize Mode
30 process waiters after row cache requeue
31 Active Change Directory extent relocation
32 block change tracking state change
33 kgl mulitversion obsolete
34 set previous resetlogs data
35 set recovery destination pointer
36 fast object reuse request
38 ASM diskgroup discovery wait
39 ASM diskgroup release
40 ASM push DB updates
41 ASM add ACD chunk
42 ASM map resize message
43 ASM map lock message
44 ASM map unlock message (phase 1
45 ASM map unlock message (phase 2
46 ASM generate add disk redo marker
d47 ASM check of PST validity
48 ASM offline disk CIC
49 Logical Standby Sync Point SCN
50 update SQL Tuning Base existence bitvector
51 PQ induced Checkpointing
52 ASM F1X0 relocation
53 Scheduler autostart
54 KZS increment grant/revoke counter
55 ASM disk operation message
56 ASM I/O error emulation
57 DB Supp log cursor invalidation
58 Cache global range invalidation
59 Cache global object invalidation
60 ASM Pre-Existing Extent Lock wait
61 Perform a ksk action through DBWR
62 ASM diskgroup refresh wait
63 KCBO object checkpoint
64 KCBO object pq checkpoint
65 global health check event
66 Oracle Label Security refresh
67 thread internal enable
68 cross-instance registration
69 KGL purge unused subheaps
70 clear pin instance flag
71 Rolling operations CIC
The following can be used to understand id2:

1 used to pass in parameters
2 used to invoke the function in backgroud process
3 used to indicate the foreground has not returned
4 mounted excl, use to allocate mechanism
5 used to queue up interested clients

基於上述分析，獲取fs lock handle到底是什麼鎖
SQL> select chr(bitand(1128857605,-16777216)/16777215) || chr(bitand(1128857605, 16711680)/65535) "Lock_name" from dual;

Lock_name
--------------------
CI

獲取CI鎖的含義，用於協調跨例項函式的呼叫，可見用於RAC
SQL> select type,name,id1_tag,id2_tag,is_user,description from v$lock_type where type='CI';

TYPE NAME ID1_TAG ID2_TAG IS_USER DESCRIPTION
--------------- -------------------- ---------- ---------- ---------- --------------------------------------------------
CI Cross-Instance Call opcode type NO Coordinates cross-instance function invocations
Invocation

發現沒有任何會話持CI鎖
SQL> select sid,type,id1,id2,lmode,request from v$lock where type='CI';

no rows selected

一執行全域性GV$LOCK馬上就HANG住了

SQL> select sid,type,id1,id2,lmode,request from gv$lock where type='CI';

在另一個節點查詢看看,也沒有內容
SQL> select sid,type,id1,id2,lmode,request from v$lock where type='CI';

no rows selected

---換個思路，檢視對應節點的告警日誌

上述查詢一直HNAG時，告警日誌是沒有資訊的，但如果你強制取消，出現如下資訊，可見是啟動並行程式時報錯
引申，DB HANG或出故障時，查詢GV檢視會去建立並行查詢程式（到於為何要建立這個並行查詢程式，將在下文分析）
Fri Nov 13 12:06:58 2015
Timed out trying to start process PZ99.

這個就和前面的聯絡起來吧，哈哈，因為oradebug suspend process allocation latch

我們resume process allocation latch
[oracle@jingfa1 bdump]$ ps -ef|grep ora_|wc -l
25
[oracle@jingfa1 bdump]$

我們再次執行gv$lock查詢
可見沒有程式沒有變化
[oracle@jingfa1 bdump]$ ps -ef|grep ora_|wc -l
25
[oracle@jingfa1 bdump]$

由oradebug poke process allocation latch引發dfs lock handle等待事件進一步分析

結論

測試

相關文章