RAC生產庫出現嚴重row cache lock和log file switch(archiving need)
Oracle: 10.2.0.5 3節點RAC
OS: AIX 6
症狀:
早上發現節點1正在經歷嚴重的row cache lock
Alertlog記錄如下,
Thu Mar 29 19:54:34 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=109
Thu Mar 29 19:54:34 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=110
Thu Mar 29 19:54:34 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=112
Thu Mar 29 19:54:34 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=114
Thu Mar 29 19:54:34 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=115
Thu Mar 29 19:54:35 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=116
Thu Mar 29 19:54:35 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=117
Thu Mar 29 20:22:19 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=118
System State dumped to trace file /orasw/app/oracle/admin/BBB/udump/bbb1_ora_12451870.trc
先用ass109.awk分析該檔案,結果顯示所有的row cache lock會話都在等待7000000c9acb348
但是並沒有holder資訊
aaa098[]:/orasw/dba>awk -f ass109.awk /orasw/app/oracle/admin/BBB/udump/bbb1_ora_18284680.trc
OS: AIX 6
症狀:
早上發現節點1正在經歷嚴重的row cache lock
Alertlog記錄如下,
Thu Mar 29 19:54:34 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=109
Thu Mar 29 19:54:34 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=110
Thu Mar 29 19:54:34 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=112
Thu Mar 29 19:54:34 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=114
Thu Mar 29 19:54:34 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=115
Thu Mar 29 19:54:35 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=116
Thu Mar 29 19:54:35 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=117
Thu Mar 29 20:22:19 EDT 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=118
System State dumped to trace file /orasw/app/oracle/admin/BBB/udump/bbb1_ora_12451870.trc
先用ass109.awk分析該檔案,結果顯示所有的row cache lock會話都在等待7000000c9acb348
但是並沒有holder資訊
aaa098[]:/orasw/dba>awk -f ass109.awk /orasw/app/oracle/admin/BBB/udump/bbb1_ora_18284680.trc
Starting Systemstate 1
....................................................
Ass.Awk Version 1.0.9 - Processing /orasw/app/oracle/admin/BBB/udump/bbb1_ora_18284680.trc
....................................................
Ass.Awk Version 1.0.9 - Processing /orasw/app/oracle/admin/BBB/udump/bbb1_ora_18284680.trc
System State 1
~~~~~~~~~~~~~~~~
1:
2: waiting for 'pmon timer'
3: waiting for 'DIAG idle wait'
4: waiting for 'rdbms ipc message'
5: waiting for 'rdbms ipc message' [Latch received-location:]
6: waiting for 'ges remote message'
7: waiting for 'gcs remote message'
8: waiting for 'gcs remote message'
9: waiting for 'gcs remote message'
10: waiting for 'rdbms ipc message'
11: waiting for 'rdbms ipc message'
12: waiting for 'rdbms ipc message'
13: waiting for 'rdbms ipc message'
14: waiting for 'rdbms ipc message'
15: last wait for 'smon timer'
16: waiting for 'rdbms ipc message'
17: waiting for 'rdbms ipc message'
18: waiting for 'reliable message'
19: waiting for 'rdbms ipc message'
20: waiting for 'rdbms ipc message'
21: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
22: waiting for 'ASM background timer'
23: waiting for 'rdbms ipc message'
24: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
25: waiting for 'rdbms ipc message'
26: waiting for 'rdbms ipc message'
27: waiting for 'Streams AQ: qmn coordinator idle wait'
28: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
29: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
30: waiting for 'SQL*Net message from client'
31: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
32: waiting for 'SQL*Net message from client'
33: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
34: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
35: for 'Streams AQ: waiting for messages in the queue'
36: waiting for 'SQL*Net message from client'
37: for 'Streams AQ: waiting for time management or cleanup tasks'
38: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
39: waiting for 'SQL*Net message from client'
40: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
41: waiting for 'SQL*Net message from client'
42: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
43: waiting for 'SQL*Net message from client'
44: waiting for 'SQL*Net message from client'
45: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
46: waiting for 'SQL*Net message from client'
47: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
48: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
49: waiting for 'SQL*Net message from client'
50: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
51: waiting for 'SQL*Net message from client'
52: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
Blockers
~~~~~~~~
~~~~~~~~~~~~~~~~
1:
2: waiting for 'pmon timer'
3: waiting for 'DIAG idle wait'
4: waiting for 'rdbms ipc message'
5: waiting for 'rdbms ipc message' [Latch received-location:]
6: waiting for 'ges remote message'
7: waiting for 'gcs remote message'
8: waiting for 'gcs remote message'
9: waiting for 'gcs remote message'
10: waiting for 'rdbms ipc message'
11: waiting for 'rdbms ipc message'
12: waiting for 'rdbms ipc message'
13: waiting for 'rdbms ipc message'
14: waiting for 'rdbms ipc message'
15: last wait for 'smon timer'
16: waiting for 'rdbms ipc message'
17: waiting for 'rdbms ipc message'
18: waiting for 'reliable message'
19: waiting for 'rdbms ipc message'
20: waiting for 'rdbms ipc message'
21: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
22: waiting for 'ASM background timer'
23: waiting for 'rdbms ipc message'
24: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
25: waiting for 'rdbms ipc message'
26: waiting for 'rdbms ipc message'
27: waiting for 'Streams AQ: qmn coordinator idle wait'
28: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
29: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
30: waiting for 'SQL*Net message from client'
31: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
32: waiting for 'SQL*Net message from client'
33: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
34: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
35: for 'Streams AQ: waiting for messages in the queue'
36: waiting for 'SQL*Net message from client'
37: for 'Streams AQ: waiting for time management or cleanup tasks'
38: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
39: waiting for 'SQL*Net message from client'
40: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
41: waiting for 'SQL*Net message from client'
42: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
43: waiting for 'SQL*Net message from client'
44: waiting for 'SQL*Net message from client'
45: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
46: waiting for 'SQL*Net message from client'
47: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
48: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
49: waiting for 'SQL*Net message from client'
50: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
51: waiting for 'SQL*Net message from client'
52: waiting for 'row cache lock' [Rcache bject=7000000c9acb348,]
Cmd: Select
Blockers
~~~~~~~~
Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
systemstate.
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
systemstate.
Resource Holder State
Latch received-location: ??? Blocker
Rcache bject=7000000c9acb348, ??? Blocker
Latch received-location: ??? Blocker
Rcache bject=7000000c9acb348, ??? Blocker
Object Names
~~~~~~~~~~~~
Latch received-location: last post received-location: kjmdrms
Rcache bject=7000000c9acb348,
~~~~~~~~~~~~
Latch received-location: last post received-location: kjmdrms
Rcache bject=7000000c9acb348,
檢視原檔案,搜尋7000000c9acb348,發現指向dc_sequences
----------------------------------------
SO: 700000051376650, type: 50, owner: 7000000c0371d98, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=7000000c8a00a50 bject=7000000c9acb348, request=X—請求x mode
savepoint=0x6a7
row cache parent object: address=7000000c9acb348 cid=13(dc_sequences)
hash=f9b631e5 typ=11 transaction=7000000c03c3d68 flags=0000012a
wn=7000000c9acb418[7000000bd2b07e0,7000000bd2b07e0] wat=7000000c9acb428[7000000bd2b05a0,700000051376f80] mode=X
status=VALID/UPDATE/-/-/IO/-/-/-/-
request=N release=TRUE flags=0
instance lock id=QN f9b631e5 53ff802a
data=
0000b7d6 0e000002 00020008 00010005 c1020000 00000000 00000000 00000000
0000c102 00000000 00000000 00000000 00000000 c7646464 64646464 00000000
00000000 00008000 00000000 00000000 00000000 00000000 c4193e62 14000000
00000000 00000000 00002d2d 2d2d2d2d 2d2d2d2d 2d2d2d2d 2d2d2d2d 2d2d2d2d
2d2d2d2d 2d2d2d2d 2d2d0000 00000000
………………
SO: 7000000c8a00a50, type: 4, owner: 7000000cb6624b8, flag: INIT/-/-/0x00—查詢對應的session,正在等待row cache lock,且已經等待了很長時間
(session) sid: 1030 trans: 7000000c0371d98, creator: 7000000cb6624b8, flag: (8000041) USR/- BSY/-/-/-/-/-
DID: 0001-0015-00000039, short-term DID: 0001-0015-0000003A
txn branch: 0
oct: 3, prv: 0, sql: 7000000c93c3318, psql: 7000000c9304dc8, user: 55/SCRFVAL
service name: bbb.world
O/S info: user: ksdl839, term: unknown, ospid: 1234, machine: USSBSOAPINT03
program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
waiting for 'row cache lock' wait_time=0, seconds since wait started=1662
cache id=d, mode=0, request=5
blocking sess=0x0 seq=1118
Dumping Session Wait History
for 'row cache lock' count=1 wait_time=2.942778 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929735 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929736 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929740 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929741 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929767 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929737 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929744 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929743 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929737 sec
cache id=d, mode=0, request=5
對系統做hanganalyze 3, 系統出現了cycle
*** ACTION NAME:() 2012-03-29 22:14:36.569
*** MODULE NAME:( (TNS V1-V3)) 2012-03-29 22:14:36.569
*** SERVICE NAME:(SYS$USERS) 2012-03-29 22:14:36.569
*** SESSION ID:(1051.2302) 2012-03-29 22:14:36.569
*** 2012-03-29 22:14:36.569
==============
HANG ANALYSIS:
==============
Found 20 objects waiting for
<0/1030/79/0xcb6624b8/11206794/row cache lock>
Found 19 objects waiting for
<0/1034/111/0xc86608e0/47710314/row cache lock>
Cycle 1 : :
<0/1036/3732/0xcb67d158/8847510/row cache lock>
Cycle 2 : :
<0/1036/3732/0xcb67d158/8847510/row cache lock>
Cycle 3 : :
<0/993/2335/0xc866df30/26411190/row cache lock>
Cycle 4 : :
<0/939/2776/0xc86726a0/27656270/row cache lock>
Open chains found:
Chain 1 : :
<0/926/17172/0xc8672e90/18284680/row cache lock>
-- <0/1021/1641/0xc86687e0/33423456/row cache lock>
-- <0/991/1696/0xc8670ed0/12451870/row cache lock>
Chain 2 : :
<0/933/4103/0xcb673298/34865224/gc current request>
-- <0/993/2335/0xc866df30/26411190/row cache lock>
-- <0/956/46180/0xcb677a08/41746508/row cache lock>
-- <0/992/16725/0xcb6781f8/17170470/row cache lock>
Chain 3 : :
<0/942/21768/0xcb66bb88/32374784/gc current request>
-- <0/944/3668/0xcb67a9a8/36831462/enq: SQ - contention>
-- <0/1078/3594/0xc8673680/21561442/enq: SQ - contention>
對程式11206794做errorstack跟蹤
SQL> oradebug setospid 11206794
Oracle pid: 21, Unix process pid: 11206794, image:
SQL> oradebug unlimit
Statement processed.
SQL> oradebug dump errorstack 3
Statement processed.
SQL> oradebug tracefile_name
/orasw/app/oracle/admin/BBB/udump/bbb1_ora_11206794.trc
*** ACTION NAME:() 2012-03-29 22:14:36.569
*** MODULE NAME:( (TNS V1-V3)) 2012-03-29 22:14:36.569
*** SERVICE NAME:(SYS$USERS) 2012-03-29 22:14:36.569
*** SESSION ID:(1051.2302) 2012-03-29 22:14:36.569
*** 2012-03-29 22:14:36.569
==============
HANG ANALYSIS:
==============
Found 20 objects waiting for
<0/1030/79/0xcb6624b8/11206794/row cache lock>
Found 19 objects waiting for
<0/1034/111/0xc86608e0/47710314/row cache lock>
Cycle 1 :
<0/1036/3732/0xcb67d158/8847510/row cache lock>
Cycle 2 :
<0/1036/3732/0xcb67d158/8847510/row cache lock>
Cycle 3 :
<0/993/2335/0xc866df30/26411190/row cache lock>
Cycle 4 :
<0/939/2776/0xc86726a0/27656270/row cache lock>
Open chains found:
Chain 1 :
<0/926/17172/0xc8672e90/18284680/row cache lock>
-- <0/1021/1641/0xc86687e0/33423456/row cache lock>
-- <0/991/1696/0xc8670ed0/12451870/row cache lock>
Chain 2 :
<0/933/4103/0xcb673298/34865224/gc current request>
-- <0/993/2335/0xc866df30/26411190/row cache lock>
-- <0/956/46180/0xcb677a08/41746508/row cache lock>
-- <0/992/16725/0xcb6781f8/17170470/row cache lock>
Chain 3 :
<0/942/21768/0xcb66bb88/32374784/gc current request>
-- <0/944/3668/0xcb67a9a8/36831462/enq: SQ - contention>
-- <0/1078/3594/0xc8673680/21561442/enq: SQ - contention>
對程式11206794做errorstack跟蹤
SQL> oradebug setospid 11206794
Oracle pid: 21, Unix process pid: 11206794, image:
SQL> oradebug unlimit
Statement processed.
SQL> oradebug dump errorstack 3
Statement processed.
SQL> oradebug tracefile_name
/orasw/app/oracle/admin/BBB/udump/bbb1_ora_11206794.trc
檢視生成的跟蹤檔案
*** 2012-03-29 21:56:04.878
*** 2012-03-29 21:56:04.878
================================
PROCESS DUMP FROM HANG ANALYZER:
================================
Current SQL statement for this session:
SELECT SHIP _SEQ.NEXTVAL, SYSDATE FROM DUAL
----- PL/SQL Call Stack -----
object line object
handle number name
7000000b7b46590 48 procedure SUT_ENT
70000009c1eeea8 1 anonymous block
*** 2012-03-29 21:56:04.878
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 000000000 ? 000000000 ?
ksedms+00b4 bl ksedst 104C24078 ?
ksdxfdmp+0358 bl _ptrgl
ksdxcb+0500 bl _ptrgl
sspuser+0074 bl _ptrgl
000047DC ? 00000000
sskgpwwait+0034 bl 000D6B30
skgpwwait+00bc bl sskgpwwait 7000000CB662768 ?
7000000CB6625A8 ?
7000000CB6624B8 ? 000000000 ?
000000000 ?
ksliwat+06c0 bl skgpwwait 101248444 ? 0000493E0 ?
1037DB578 ? 0CA746D72 ?
1000D63B8 ?
kslwaitns_timed+002 bl ksliwat 000000000 ? 000000000 ?
4 FFFFFFFFFFF1DB0 ? 000000000 ?
000000000 ? 000000000 ?
000000000 ? 000000000 ?
kskthbwt+0280 bl kslwaitns_timed 000000000 ? 000000C1E ?
104F15D54 ? 104F15D5C ?
104F15D44 ? 7000000CB6625A8 ?
D124E3FB1E21C9 ? 000000000 ?
kslwait+00f4 bl kskthbwt 12C0000012C ? D2000000D2 ?
000000000 ? 00000000D ?
000000000 ? 000000005 ?
A00000000000A ?
38FEC78838FEC788 ?
kqrigt+05e8 bl kslwait 204F1655C ? 7000000C8A00A50 ?
00000004E ? 000000058 ?
3FFC000000000000 ?
0000165C0 ?
kqrpre1+0968 bl kqrigt 700000010017FA0 ? 00000FFFF ?
11045A818 ?
kqrpre+001c bl kqrpre1 DFFFF2AB8 ? 110C6B9F8 ?
FFFFFFFFFFF24A8 ? 1104A9A40 ?
1100099E0 ? 000000000 ?
1104763F8 ? 11046A2A8 ?
kdnuca+012c bl kqrpre 000000002 ? 110476400 ?
7000000AED0C9B0 ?
700000010020900 ?
7000000C4E0B278 ?
7000000AED0C9B0 ?
FFFFFFFFFFF2520 ?
kdnnxt+0098 bl kdnuca 110C6B290 ? 000000002 ?
FFFFFFFFFFF26B0 ?
qersqPopulate+00a0 bl 03F385F4
qersqRowProcedure+0 bl qersqPopulate 110C6B210 ?
020
qerfiFetch+00a8 bl 03F36AC8
qersqFetch+00ac bl 03F36AC8
opifch2+13e4 bl 01FC3F54
opiefn0+03e0 bl opifch2 FFFFFFFFFFF3560 ? 11053F9C8 ?
FFFFFFFFFFF3478 ?
opipls+211c bl opiefn0 300000000 ? 000000000 ?
FFFFFFFFFFF3738 ?
2262000000000000 ?
000000000 ? 000000000 ?
000000000 ? 000000000 ?
opiodr+0b2c bl _ptrgl
rpidrus+01dc bl opiodr 6600000000 ? 610C14188 ?
FFFFFFFFFFF66D0 ? 60000000C ?
skgmstack+00c8 bl _ptrgl
rpidru+0088 bl skgmstack FFFFFFFFFFF56F0 ?
10000011019C568 ? 000000002 ?
000000000 ? FFFFFFFFFFF5E80 ?
rpiswu2+0368 bl _ptrgl
rpidrv+097c bl rpiswu2 7000000C8A00A50 ? 11048E898 ?
11048E8D4 ? 11053FE80 ?
110509F70 ? 371045AA40 ?
000000000 ? 000000000 ?
psddr0+02dc bl 01FC6C64
psdnal+01d0 bl psddr0 600000000 ? 66FFFF6850 ?
FFFFFFFFFFF66D0 ?
30FFFF7D68 ?
pevm_EXECC+01f8 bl _ptrgl
pfrinstr_EXECC+0070 bl pevm_EXECC 10187D084 ? 110232BB8 ?
10187B754 ?
pfrrun_no_tool+005c bl _ptrgl
pfrrun+1064 bl pfrrun_no_tool FFFFFFFFFFF6FD8 ? 110506850 ?
FFFFFFFFFFF6A30 ?
plsql_run+06e8 bl pfrrun 110507A08 ?
peicnt+0244 bl plsql_run 110507A08 ? 100007E588440 ?
000000000 ?
kkxexe+0250 bl peicnt FFFFFFFFFFF7D68 ? 110507A08 ?
opiexe+3024 bl kkxexe 110506850 ?
kpoal8+0ef0 bl opiexe FFFFFFFFFFFB3E4 ?
FFFFFFFFFFFB138 ?
FFFFFFFFFFF9580 ?
opiodr+0b2c bl _ptrgl
ttcpip+1020 bl _ptrgl
opitsk+117c bl 01FC66D4
opiino+09d0 bl opitsk 0FFFFD3B0 ? 000000000 ?
opiodr+0b2c bl _ptrgl
opidrv+04a4 bl opiodr 3C10290FD8 ? 404C76C50 ?
FFFFFFFFFFFF370 ? 010290FD0 ?
sou2o+0090 bl opidrv 3C02A2B3BC ? 4A0071E60 ?
FFFFFFFFFFFF370 ?
opimai_real+01bc bl 01FC2B94
main+0098 bl opimai_real 000000000 ? 000000000 ?
__start+0070 bl main 000000000 ? 000000000 ?
……
PROCESS STATE
-------------
Process global information:
process: 7000000cb6624b8, call: 7000000bd547820, xact: 7000000c0371d98, curses: 7000000c8a00a50, usrses: 7000000c8a00a50
----------------------------------------
SO: 7000000cb6624b8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=21, calls cur/top: 7000000bd547820/7000000c8ab64e0, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 7000000cb757778
O/S info: user: oracle, term: UNKNOWN, ospid: 11206794
OSD pid info: Unix process pid: 11206794, image:
(FOB) flags=2 fib=7000000c1db78b0 incno=0 pending i/o cnt=0
fname=/dev/db030
fno=32 lblksz=512 fsiz=1
----------------------------------------
SO: 7000000b8318168, type: 83, owner: 7000000cb6624b8, flag: INIT/-/-/0x00
freelist:[7000000b7b35980,7000000b7b35980]
KFFMOP: hash link:[7000000b7b35970,7000000b7b35970] sobj link:[7000000b8318188,7000000b8318188]
map kggrp:[0x7000000cab8fc58, 0, valid] map id:15
group:[2,2008082090] file:[328,745576843] extent:11683
flags:0x0000 disk:29 au:34101 lock:0 proc:0x7000000cb6624b8
busylist:[7000000b8318198,7000000b8318198]
transistion:0x0
(FOB) flags=512 fib=7000000c1dbc698 incno=0 pending i/o cnt=0
fname=+DATA_DG/bbb/datafile/scrfval_data.328.745576843
fno=11 lblksz=8192 fsiz=4096000
----------------------------------------
SO: 7000000c8a00a50, type: 4, owner: 7000000cb6624b8, flag: INIT/-/-/0x00
(session) sid: 1030 trans: 7000000c0371d98, creator: 7000000cb6624b8, flag: (8000041) USR/- BSY/-/-/-/-/-
DID: 0001-0015-00000039, short-term DID: 0001-0015-0000003A
txn branch: 0
oct: 3, prv: 0, sql: 7000000c93c3318, psql: 7000000c9304dc8, user: 55/SCRFVAL
service name: bbb.world
O/S info: user: ksdl839, term: unknown, ospid: 1234, machine: USSBSOAPINT03
program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
waiting for 'row cache lock' wait_time=0, seconds since wait started=1899
cache id=d, mode=0, request=5
blocking sess=0x0 seq=1118
Dumping Session Wait History
for 'row cache lock' count=1 wait_time=2.929746 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929788 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929741 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929748 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929770 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929739 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929733 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929748 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929741 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929776 sec
cache id=d, mode=0, request=5
Sampled Session History of session 1030 serial 79
根據跟蹤檔案可以看出,該session經歷了嚴重的row cache lock且正在等待
SELECT SHIP_EVENT_LOG_SEQ.NEXTVAL, SYSDATE FROM DUAL
而根據前面systemstate dump可以看出row cache lock正在等待dc_sequence,基本可以確定是該sequence惹的禍,其cache為0
*** 2012-03-29 21:56:04.878
*** 2012-03-29 21:56:04.878
================================
PROCESS DUMP FROM HANG ANALYZER:
================================
Current SQL statement for this session:
SELECT SHIP _SEQ.NEXTVAL, SYSDATE FROM DUAL
----- PL/SQL Call Stack -----
object line object
handle number name
7000000b7b46590 48 procedure SUT_ENT
70000009c1eeea8 1 anonymous block
*** 2012-03-29 21:56:04.878
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 000000000 ? 000000000 ?
ksedms+00b4 bl ksedst 104C24078 ?
ksdxfdmp+0358 bl _ptrgl
ksdxcb+0500 bl _ptrgl
sspuser+0074 bl _ptrgl
000047DC ? 00000000
sskgpwwait+0034 bl 000D6B30
skgpwwait+00bc bl sskgpwwait 7000000CB662768 ?
7000000CB6625A8 ?
7000000CB6624B8 ? 000000000 ?
000000000 ?
ksliwat+06c0 bl skgpwwait 101248444 ? 0000493E0 ?
1037DB578 ? 0CA746D72 ?
1000D63B8 ?
kslwaitns_timed+002 bl ksliwat 000000000 ? 000000000 ?
4 FFFFFFFFFFF1DB0 ? 000000000 ?
000000000 ? 000000000 ?
000000000 ? 000000000 ?
kskthbwt+0280 bl kslwaitns_timed 000000000 ? 000000C1E ?
104F15D54 ? 104F15D5C ?
104F15D44 ? 7000000CB6625A8 ?
D124E3FB1E21C9 ? 000000000 ?
kslwait+00f4 bl kskthbwt 12C0000012C ? D2000000D2 ?
000000000 ? 00000000D ?
000000000 ? 000000005 ?
A00000000000A ?
38FEC78838FEC788 ?
kqrigt+05e8 bl kslwait 204F1655C ? 7000000C8A00A50 ?
00000004E ? 000000058 ?
3FFC000000000000 ?
0000165C0 ?
kqrpre1+0968 bl kqrigt 700000010017FA0 ? 00000FFFF ?
11045A818 ?
kqrpre+001c bl kqrpre1 DFFFF2AB8 ? 110C6B9F8 ?
FFFFFFFFFFF24A8 ? 1104A9A40 ?
1100099E0 ? 000000000 ?
1104763F8 ? 11046A2A8 ?
kdnuca+012c bl kqrpre 000000002 ? 110476400 ?
7000000AED0C9B0 ?
700000010020900 ?
7000000C4E0B278 ?
7000000AED0C9B0 ?
FFFFFFFFFFF2520 ?
kdnnxt+0098 bl kdnuca 110C6B290 ? 000000002 ?
FFFFFFFFFFF26B0 ?
qersqPopulate+00a0 bl 03F385F4
qersqRowProcedure+0 bl qersqPopulate 110C6B210 ?
020
qerfiFetch+00a8 bl 03F36AC8
qersqFetch+00ac bl 03F36AC8
opifch2+13e4 bl 01FC3F54
opiefn0+03e0 bl opifch2 FFFFFFFFFFF3560 ? 11053F9C8 ?
FFFFFFFFFFF3478 ?
opipls+211c bl opiefn0 300000000 ? 000000000 ?
FFFFFFFFFFF3738 ?
2262000000000000 ?
000000000 ? 000000000 ?
000000000 ? 000000000 ?
opiodr+0b2c bl _ptrgl
rpidrus+01dc bl opiodr 6600000000 ? 610C14188 ?
FFFFFFFFFFF66D0 ? 60000000C ?
skgmstack+00c8 bl _ptrgl
rpidru+0088 bl skgmstack FFFFFFFFFFF56F0 ?
10000011019C568 ? 000000002 ?
000000000 ? FFFFFFFFFFF5E80 ?
rpiswu2+0368 bl _ptrgl
rpidrv+097c bl rpiswu2 7000000C8A00A50 ? 11048E898 ?
11048E8D4 ? 11053FE80 ?
110509F70 ? 371045AA40 ?
000000000 ? 000000000 ?
psddr0+02dc bl 01FC6C64
psdnal+01d0 bl psddr0 600000000 ? 66FFFF6850 ?
FFFFFFFFFFF66D0 ?
30FFFF7D68 ?
pevm_EXECC+01f8 bl _ptrgl
pfrinstr_EXECC+0070 bl pevm_EXECC 10187D084 ? 110232BB8 ?
10187B754 ?
pfrrun_no_tool+005c bl _ptrgl
pfrrun+1064 bl pfrrun_no_tool FFFFFFFFFFF6FD8 ? 110506850 ?
FFFFFFFFFFF6A30 ?
plsql_run+06e8 bl pfrrun 110507A08 ?
peicnt+0244 bl plsql_run 110507A08 ? 100007E588440 ?
000000000 ?
kkxexe+0250 bl peicnt FFFFFFFFFFF7D68 ? 110507A08 ?
opiexe+3024 bl kkxexe 110506850 ?
kpoal8+0ef0 bl opiexe FFFFFFFFFFFB3E4 ?
FFFFFFFFFFFB138 ?
FFFFFFFFFFF9580 ?
opiodr+0b2c bl _ptrgl
ttcpip+1020 bl _ptrgl
opitsk+117c bl 01FC66D4
opiino+09d0 bl opitsk 0FFFFD3B0 ? 000000000 ?
opiodr+0b2c bl _ptrgl
opidrv+04a4 bl opiodr 3C10290FD8 ? 404C76C50 ?
FFFFFFFFFFFF370 ? 010290FD0 ?
sou2o+0090 bl opidrv 3C02A2B3BC ? 4A0071E60 ?
FFFFFFFFFFFF370 ?
opimai_real+01bc bl 01FC2B94
main+0098 bl opimai_real 000000000 ? 000000000 ?
__start+0070 bl main 000000000 ? 000000000 ?
……
PROCESS STATE
-------------
Process global information:
process: 7000000cb6624b8, call: 7000000bd547820, xact: 7000000c0371d98, curses: 7000000c8a00a50, usrses: 7000000c8a00a50
----------------------------------------
SO: 7000000cb6624b8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=21, calls cur/top: 7000000bd547820/7000000c8ab64e0, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 7000000cb757778
O/S info: user: oracle, term: UNKNOWN, ospid: 11206794
OSD pid info: Unix process pid: 11206794, image:
(FOB) flags=2 fib=7000000c1db78b0 incno=0 pending i/o cnt=0
fname=/dev/db030
fno=32 lblksz=512 fsiz=1
----------------------------------------
SO: 7000000b8318168, type: 83, owner: 7000000cb6624b8, flag: INIT/-/-/0x00
freelist:[7000000b7b35980,7000000b7b35980]
KFFMOP: hash link:[7000000b7b35970,7000000b7b35970] sobj link:[7000000b8318188,7000000b8318188]
map kggrp:[0x7000000cab8fc58, 0, valid] map id:15
group:[2,2008082090] file:[328,745576843] extent:11683
flags:0x0000 disk:29 au:34101 lock:0 proc:0x7000000cb6624b8
busylist:[7000000b8318198,7000000b8318198]
transistion:0x0
(FOB) flags=512 fib=7000000c1dbc698 incno=0 pending i/o cnt=0
fname=+DATA_DG/bbb/datafile/scrfval_data.328.745576843
fno=11 lblksz=8192 fsiz=4096000
----------------------------------------
SO: 7000000c8a00a50, type: 4, owner: 7000000cb6624b8, flag: INIT/-/-/0x00
(session) sid: 1030 trans: 7000000c0371d98, creator: 7000000cb6624b8, flag: (8000041) USR/- BSY/-/-/-/-/-
DID: 0001-0015-00000039, short-term DID: 0001-0015-0000003A
txn branch: 0
oct: 3, prv: 0, sql: 7000000c93c3318, psql: 7000000c9304dc8, user: 55/SCRFVAL
service name: bbb.world
O/S info: user: ksdl839, term: unknown, ospid: 1234, machine: USSBSOAPINT03
program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
waiting for 'row cache lock' wait_time=0, seconds since wait started=1899
cache id=d, mode=0, request=5
blocking sess=0x0 seq=1118
Dumping Session Wait History
for 'row cache lock' count=1 wait_time=2.929746 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929788 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929741 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929748 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929770 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929739 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929733 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929748 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929741 sec
cache id=d, mode=0, request=5
for 'row cache lock' count=1 wait_time=2.929776 sec
cache id=d, mode=0, request=5
Sampled Session History of session 1030 serial 79
根據跟蹤檔案可以看出,該session經歷了嚴重的row cache lock且正在等待
SELECT SHIP_EVENT_LOG_SEQ.NEXTVAL, SYSDATE FROM DUAL
而根據前面systemstate dump可以看出row cache lock正在等待dc_sequence,基本可以確定是該sequence惹的禍,其cache為0
於此同時第2個節點也出現了問題
檢視其Alert log,發現從3月27號就一直出現問題
Tue Mar 27 13:25:28 EDT 2012
Process startup failed, error stack:
Tue Mar 27 13:25:28 EDT 2012
Errors in file /orasw/app/oracle/admin/BBB/bdump/bbb2_psp0_1986702.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Tue Mar 27 13:25:28 EDT 2012
ERROR: Failed to connect with connect string:(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/orasw/app/oracle/product/11.1.0/asm/bin/oracle)(ARGV0=oracle+ASM2)(ENVS='ORACLE_HOME=/
orasw/app/oracle/product/11.1.0/asm,ORACLE_SID=+ASM2')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser))
Tue Mar 27 13:25:29 EDT 2012
Process J000 died, see its trace file
Tue Mar 27 13:25:29 EDT 2012
kkjcre1p: unable to spawn jobq slave process
Tue Mar 27 13:25:29 EDT 2012
Errors in file /orasw/app/oracle/admin/BBB/bdump/bbb2_cjq0_2220068.trc:
Tue Mar 27 13:25:29 EDT 2012—不能連線ASM
ERROR: Failed to connect with connect string:(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/orasw/app/oracle/product/11.1.0/asm/bin/oracle)(ARGV0=oracle+ASM2)(ENVS='ORACLE_HOME=/
orasw/app/oracle/product/11.1.0/asm,ORACLE_SID=+ASM2')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser))
ERROR: Failed to connect with connect string:(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/orasw/app/oracle/product/11.1.0/asm/bin/oracle)(ARGV0=oracle+ASM2)(ENVS='ORACLE_HOME=/
orasw/app/oracle/product/11.1.0/asm,ORACLE_SID=+ASM2')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser))
ERROR: Failed to connect with connect string:(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/orasw/app/oracle/product/11.1.0/asm/bin/oracle)(ARGV0=oracle+ASM2)(ENVS='ORACLE_HOME=/
orasw/app/oracle/product/11.1.0/asm,ORACLE_SID=+ASM2')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser))
Tue Mar 27 13:25:30 EDT 2012
Errors in file /orasw/app/oracle/admin/BBB/bdump/bbb2_arc0_19370208.trc:
ORA-00313: open failed for members of log group 22 of thread 2
ORA-00312: online log 22 thread 2: '+REDO_DG2/bbb/onlinelog/group_22.281.741544481'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG2/bbb/onlinelog/group_22.281.741544481
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-00312: online log 22 thread 2: '+REDO_DG1/bbb/onlinelog/group_22.282.741544479'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG1/bbb/onlinelog/group_22.282.741544479
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
……………………………………..
Unexpected communication failure with ASM instance:
error 1041 (ORA-01041: internal error. hostdef extension doesn't exist
)
NOTE: ASMB process state dumped to trace file /orasw/app/oracle/admin/BBB/bdump/bbb2_arc0_19370208.trc
Thu Mar 29 22:44:24 EDT 2012
Unexpected communication failure with ASM instance:
error 1041 (ORA-01041: internal error. hostdef extension doesn't exist
)
NOTE: ASMB process state dumped to trace file /orasw/app/oracle/admin/BBB/bdump/bbb2_arc0_19370208.trc
Thu Mar 29 22:44:25 EDT 2012
Errors in file /orasw/app/oracle/admin/BBB/bdump/bbb2_arc0_19370208.trc:
ORA-00313: open failed for members of log group 22 of thread 2
ORA-00312: online log 22 thread 2: '+REDO_DG2/bbb/onlinelog/group_22.281.741544481'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG2/bbb/onlinelog/group_22.281.741544481
ORA-01041: internal error. hostdef extension doesn't exist
ORA-00312: online log 22 thread 2: '+REDO_DG1/bbb/onlinelog/group_22.282.741544479'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG1/bbb/onlinelog/group_22.282.741544479
ORA-01041: internal error. hostdef extension doesn't exist
檢視跟蹤檔案
aaa099[]:/orasw/dba>more /orasw/app/oracle/admin/BBB/bdump/bbb2_arc1_18018544.trc
Dump file /orasw/app/oracle/admin/BBB/bdump/bbb2_arc1_18018544.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /orasw/app/oracle/product/10.2.0.5
System name: AIX
Node name: aaa099
Release: 1
Version: 6
Machine: 00CA36B54C00
Instance name: bbb2
Redo thread mounted by this instance: 2
Oracle process number: 26
Unix process pid: 18018544, image: (ARC1)
ERROR: Failed to connect with connect string:(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/orasw/app/oracle/product/11.1.0/asm/bin/oracle)(ARGV0=oracle+ASM2)(ENVS='ORACLE_HOME=/
orasw/app/oracle/product/11.1.0/asm,ORACLE_SID=+ASM2')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser))
ERROR: Failed to connect with connect string:(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/orasw/app/oracle/product/11.1.0/asm/bin/oracle)(ARGV0=oracle+ASM2)(ENVS='ORACLE_HOME=/
orasw/app/oracle/product/11.1.0/asm,ORACLE_SID=+ASM2')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser))
ERROR: Failed to connect with connect string:(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/orasw/app/oracle/product/11.1.0/asm/bin/oracle)(ARGV0=oracle+ASM2)(ENVS='ORACLE_HOME=/
orasw/app/oracle/product/11.1.0/asm,ORACLE_SID=+ASM2')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(enable=setuser))
Tue Mar 27 13:25:30 EDT 2012
Errors in file /orasw/app/oracle/admin/BBB/bdump/bbb2_arc0_19370208.trc:
ORA-00313: open failed for members of log group 22 of thread 2
ORA-00312: online log 22 thread 2: '+REDO_DG2/bbb/onlinelog/group_22.281.741544481'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG2/bbb/onlinelog/group_22.281.741544481
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-00312: online log 22 thread 2: '+REDO_DG1/bbb/onlinelog/group_22.282.741544479'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG1/bbb/onlinelog/group_22.282.741544479
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
……………………………………..
Unexpected communication failure with ASM instance:
error 1041 (ORA-01041: internal error. hostdef extension doesn't exist
)
NOTE: ASMB process state dumped to trace file /orasw/app/oracle/admin/BBB/bdump/bbb2_arc0_19370208.trc
Thu Mar 29 22:44:24 EDT 2012
Unexpected communication failure with ASM instance:
error 1041 (ORA-01041: internal error. hostdef extension doesn't exist
)
NOTE: ASMB process state dumped to trace file /orasw/app/oracle/admin/BBB/bdump/bbb2_arc0_19370208.trc
Thu Mar 29 22:44:25 EDT 2012
Errors in file /orasw/app/oracle/admin/BBB/bdump/bbb2_arc0_19370208.trc:
ORA-00313: open failed for members of log group 22 of thread 2
ORA-00312: online log 22 thread 2: '+REDO_DG2/bbb/onlinelog/group_22.281.741544481'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG2/bbb/onlinelog/group_22.281.741544481
ORA-01041: internal error. hostdef extension doesn't exist
ORA-00312: online log 22 thread 2: '+REDO_DG1/bbb/onlinelog/group_22.282.741544479'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG1/bbb/onlinelog/group_22.282.741544479
ORA-01041: internal error. hostdef extension doesn't exist
檢視跟蹤檔案
aaa099[]:/orasw/dba>more /orasw/app/oracle/admin/BBB/bdump/bbb2_arc1_18018544.trc
Dump file /orasw/app/oracle/admin/BBB/bdump/bbb2_arc1_18018544.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /orasw/app/oracle/product/10.2.0.5
System name: AIX
Node name: aaa099
Release: 1
Version: 6
Machine: 00CA36B54C00
Instance name: bbb2
Redo thread mounted by this instance: 2
Oracle process number: 26
Unix process pid: 18018544, image: (ARC1)
*** 2012-03-27 11:20:22.367
*** SERVICE NAME:(SYS$BACKGROUND) 2012-03-27 11:20:22.366
*** SESSION ID:(1076.1) 2012-03-27 11:20:22.366
WARNING: kfnUseConn - failure to make a connection—MOS/google找不到一條記錄,baidu居然能搜到兩條記錄,不過沒有相應解釋
WARNING: kfnUseConn - failure to make a connection
WARNING: kfnUseConn - failure to make a connection
WARNING: kfnUseConn - failure to make a connection
ORA-00313: open failed for members of log group 22 of thread 2
ORA-00312: online log 22 thread 2: '+REDO_DG2/bbb/onlinelog/group_22.281.741544481'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG2/bbb/onlinelog/group_22.281.741544481
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-00312: online log 22 thread 2: '+REDO_DG1/bbb/onlinelog/group_22.282.741544479'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG1/bbb/onlinelog/group_22.282.741544479
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-00313: open failed for members of log group 22 of thread 2
ORA-00312: online log 22 thread 2: '+REDO_DG2/bbb/onlinelog/group_22.281.741544481'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG2/bbb/onlinelog/group_22.281.741544481
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-00312: online log 22 thread 2: '+REDO_DG1/bbb/onlinelog/group_22.282.741544479'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG1/bbb/onlinelog/group_22.282.741544479
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
logfile 22 open failed:313
*** 2012-03-27 11:20:24.804 22384 kcrr.c—從3月27號就出現以下錯誤
ORA-16038: log 22 sequence# 1370 cannot be archived
ORA-00312: online log 22 thread 2: '+REDO_DG1/bbb/onlinelog/group_22.282.741544479'
ORA-00312: online log 22 thread 2: '+REDO_DG2/bbb/onlinelog/group_22.281.741544481'
*** 2012-03-27 11:20:40.329 logfile 22 open failed:313
*** SERVICE NAME:(SYS$BACKGROUND) 2012-03-27 11:20:22.366
*** SESSION ID:(1076.1) 2012-03-27 11:20:22.366
WARNING: kfnUseConn - failure to make a connection—MOS/google找不到一條記錄,baidu居然能搜到兩條記錄,不過沒有相應解釋
WARNING: kfnUseConn - failure to make a connection
WARNING: kfnUseConn - failure to make a connection
WARNING: kfnUseConn - failure to make a connection
ORA-00313: open failed for members of log group 22 of thread 2
ORA-00312: online log 22 thread 2: '+REDO_DG2/bbb/onlinelog/group_22.281.741544481'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG2/bbb/onlinelog/group_22.281.741544481
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-00312: online log 22 thread 2: '+REDO_DG1/bbb/onlinelog/group_22.282.741544479'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG1/bbb/onlinelog/group_22.282.741544479
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-00313: open failed for members of log group 22 of thread 2
ORA-00312: online log 22 thread 2: '+REDO_DG2/bbb/onlinelog/group_22.281.741544481'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG2/bbb/onlinelog/group_22.281.741544481
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-00312: online log 22 thread 2: '+REDO_DG1/bbb/onlinelog/group_22.282.741544479'
ORA-17503: ksfdopn:2 Failed to open file +REDO_DG1/bbb/onlinelog/group_22.282.741544479
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
ORA-15055: unable to connect to ASM instance
ORA-12549: TNS:operating system resource quota exceeded
logfile 22 open failed:313
*** 2012-03-27 11:20:24.804 22384 kcrr.c—從3月27號就出現以下錯誤
ORA-16038: log 22 sequence# 1370 cannot be archived
ORA-00312: online log 22 thread 2: '+REDO_DG1/bbb/onlinelog/group_22.282.741544479'
ORA-00312: online log 22 thread 2: '+REDO_DG2/bbb/onlinelog/group_22.281.741544481'
*** 2012-03-27 11:20:40.329 logfile 22 open failed:313
做hanganalyze 3檢視輸出檔案
==============
HANG ANALYSIS:
==============
Open chains found:
Chain 1 : :
<1/1091/1/0xcb6604f8/7889148/rdbms ipc message>
-- <1/950/34226/0xcb665c48/2396294/log file switch (archiving neede>
Other chains found:
Chain 2 : :
<1/951/16693/0xc865d940/2535526/log file switch (archiving neede>
Chain 3 : :
<1/954/5565/0xcb6624b8/213064/log file switch (archiving neede>
-- <1/967/13888/0xc865b190/27140236/row cache lock>
Chain 4 : :
<1/958/23461/0xc865f110/2371616/log file switch (archiving neede>
Chain 5 : :
<1/1012/27628/0xc86599c0/794856/Streams AQ: qmn slave idle wait>
Chain 6 : :
<1/1017/26902/0xcb664c68/6947064/No Wait>
Chain 7 : :
<1/1062/1/0xc865d150/17821858/Streams AQ: waiting for time man>
Chain 8 : :
<1/1074/1/0xcb663c88/1941564/Streams AQ: qmn coordinator idle>
Chain 9 : :
<1/1082/1255/0xcb6683f8/954558/log file switch (archiving neede>
-- <1/1086/1/0xc86581f0/1982594/buffer busy waits>
Chain 10 : :
<1/1089/1/0xcb660ce8/1564852/buffer busy waits>
Chain 11 : :
<1/1104/1/0xcb65dd48/20168766/DIAG idle wait>
Extra information that will be dumped at higher levels:
[level 4] : 1 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[level 5] : 5 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level 6] : 8 node dumps -- [NLEAF]
[level 10] : 30 node dumps -- [IGN]
發現很多log file switch(archiving needed)等待,
登陸ASM例項檢視,diskgroup還有很多空餘
SQL> select GROUP_NUMBER,NAME,FREE_MB from v$asm_diskgroup;
==============
HANG ANALYSIS:
==============
Open chains found:
Chain 1 :
<1/1091/1/0xcb6604f8/7889148/rdbms ipc message>
-- <1/950/34226/0xcb665c48/2396294/log file switch (archiving neede>
Other chains found:
Chain 2 :
<1/951/16693/0xc865d940/2535526/log file switch (archiving neede>
Chain 3 :
<1/954/5565/0xcb6624b8/213064/log file switch (archiving neede>
-- <1/967/13888/0xc865b190/27140236/row cache lock>
Chain 4 :
<1/958/23461/0xc865f110/2371616/log file switch (archiving neede>
Chain 5 :
<1/1012/27628/0xc86599c0/794856/Streams AQ: qmn slave idle wait>
Chain 6 :
<1/1017/26902/0xcb664c68/6947064/No Wait>
Chain 7 :
<1/1062/1/0xc865d150/17821858/Streams AQ: waiting for time man>
Chain 8 :
<1/1074/1/0xcb663c88/1941564/Streams AQ: qmn coordinator idle>
Chain 9 :
<1/1082/1255/0xcb6683f8/954558/log file switch (archiving neede>
-- <1/1086/1/0xc86581f0/1982594/buffer busy waits>
Chain 10 :
<1/1089/1/0xcb660ce8/1564852/buffer busy waits>
Chain 11 :
<1/1104/1/0xcb65dd48/20168766/DIAG idle wait>
Extra information that will be dumped at higher levels:
[level 4] : 1 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[level 5] : 5 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level 6] : 8 node dumps -- [NLEAF]
[level 10] : 30 node dumps -- [IGN]
發現很多log file switch(archiving needed)等待,
登陸ASM例項檢視,diskgroup還有很多空餘
SQL> select GROUP_NUMBER,NAME,FREE_MB from v$asm_diskgroup;
GROUP_NUMBER NAME FREE_MB
------------ ------------------------------ ----------
1 ARCH_DG 1048611
2 DATA_DG 1567056
3 FRA_DG 810885
4 REDO_DG1 31852
5 REDO_DG2 31876
6 TEMP_DG 1821433
------------ ------------------------------ ----------
1 ARCH_DG 1048611
2 DATA_DG 1567056
3 FRA_DG 810885
4 REDO_DG1 31852
5 REDO_DG2 31876
6 TEMP_DG 1821433
該資料庫把歸檔日誌都放在了db_recovery_file_dest 路徑下,當該路徑用完時,也會導致無法歸檔
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 1379
Next log sequence to archive 1381
Current log sequence 1381
SQL> show parameter db_re
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 1379
Next log sequence to archive 1381
Current log sequence 1381
SQL> show parameter db_re
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest string +FRA_DG
db_recovery_file_dest_size big integer 1050G
db_recycle_cache_size big integer 0
但是alertlog中並沒有db_recovery_file_dest爆滿的記錄,相應檢視顯示結果也很正常
SQL> select * from V$FLASH_RECOVERY_AREA_USAGE;
------------------------------------ ----------- ------------------------------
db_recovery_file_dest string +FRA_DG
db_recovery_file_dest_size big integer 1050G
db_recycle_cache_size big integer 0
但是alertlog中並沒有db_recovery_file_dest爆滿的記錄,相應檢視顯示結果也很正常
SQL> select * from V$FLASH_RECOVERY_AREA_USAGE;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
------------ ------------------ ------------------------- ---------------
CONTROLFILE 0 0 0
ONLINELOG 0 0 0
ARCHIVELOG 3.46 0 67
BACKUPPIECE 41.6 0 80
IMAGECOPY 0 0 0
FLASHBACKLOG 0 0 0
------------ ------------------ ------------------------- ---------------
CONTROLFILE 0 0 0
ONLINELOG 0 0 0
ARCHIVELOG 3.46 0 67
BACKUPPIECE 41.6 0 80
IMAGECOPY 0 0 0
FLASHBACKLOG 0 0 0
此時大致推測應該是arch程式出現了問題
SQL> show parameter log_archive_max_processes
SQL> show parameter log_archive_max_processes
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_max_processes integer 2
arch程式只有兩個,將其調大至4,然後等待幾分鐘,資料庫自動回覆正常
節點2的alert再沒有出現error記錄,也可以正常執行alter system switch logfile;
------------------------------------ ----------- ------------------------------
log_archive_max_processes integer 2
arch程式只有兩個,將其調大至4,然後等待幾分鐘,資料庫自動回覆正常
節點2的alert再沒有出現error記錄,也可以正常執行alter system switch logfile;
此時再去看節點1,row cache lock消失了,而且檢視v$檢視也可以立即返回結果
小結:
3節點RAC有兩個節點出現問題,節點1是由於row cache lock導致的hang,節點2則是因為不能歸檔而hang;
節點1是由於sequence的cache設定為0造成的,以往也有類似情況發生,但是查詢v$檢視則可很快返回結果,這次顯得很詭異;
節點2出現問題好幾天了,先是alert不斷記錄ora-27302 & skgpspawn3,應該是swap分割槽不足造成的(560309.1),不過系統沒有安裝osw,找不到當時的記憶體使用情況,後來出現無法連線ASM例項以及一系列錯誤,
Redo log不能歸檔導致系統hang,檢視歸檔路徑使用情況後,最後嘗試增加arch程式數目,系統恢復正常;
更詭異的是,節點2恢復正常後節點1也正常了,不知道兩者是否有什麼內在聯絡
3節點RAC有兩個節點出現問題,節點1是由於row cache lock導致的hang,節點2則是因為不能歸檔而hang;
節點1是由於sequence的cache設定為0造成的,以往也有類似情況發生,但是查詢v$檢視則可很快返回結果,這次顯得很詭異;
節點2出現問題好幾天了,先是alert不斷記錄ora-27302 & skgpspawn3,應該是swap分割槽不足造成的(560309.1),不過系統沒有安裝osw,找不到當時的記憶體使用情況,後來出現無法連線ASM例項以及一系列錯誤,
Redo log不能歸檔導致系統hang,檢視歸檔路徑使用情況後,最後嘗試增加arch程式數目,系統恢復正常;
更詭異的是,節點2恢復正常後節點1也正常了,不知道兩者是否有什麼內在聯絡
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/15480802/viewspace-719940/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- hang了,嚴重的row cache lock 等待事件--就因大sql文字事件SQL
- 等待事件之Row Cache Lock事件
- 當刪除oracle資料庫user時發生row cache lock 等待事件Oracle資料庫事件
- Rac 環境中分割槽表建立index hang(row cache lock)Index
- log file switch
- hanganalyze解決row cache lock(ZT)
- WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!AIENQ
- [ORACLE 11G]ROW CACHE LOCK 等待Oracle
- MySQL:Innodb 讓MDL LOCK和ROW LOCK 記錄到errlogMySql
- 轉)用hanganalyze解決row cache lock
- (轉)用hanganalyze解決row cache lock
- LOG FILE SWITCH等待事件事件
- 故障排除:"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! "AIENQ
- WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!的分析AIENQ
- 轉貼_用hanganalyze解決row cache lock
- 用hanganalyze解決row cache lock(轉貼)
- Oracle 11g 遇到log file sync嚴重等待事件Oracle事件
- oracle11g之expdp產生library cache lock及tablespace autoextend off_alert logOracle
- WAITEVENT: "row cache lock" Reference Note (文件 ID 34609.1)AI
- 一次WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCKAIENQ
- 由row cache lock等待事件引起的效能問題事件
- log file switch相關等待事件事件
- Resolving Issues Where 'Row Cache Lock' Waits are OccurringAI
- Metlink:Troubleshooting:WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!AIENQ
- 一次Row Cache Lock問題處理過程
- 11.2資料庫登入出現library cache lock等待(二)資料庫
- 11.2資料庫登入出現library cache lock等待(一)資料庫
- SWITCHOVER主庫出現LOG SWITCH GAP和RESOLVABLE GAP解決一例
- sql server資料庫select產生嚴重阻塞引起效能問題SQLServer資料庫
- 關於log file switch and checkpoint機制
- log file switch (checkpoint incomplete)等待事件事件
- 【等待事件】log file switch (checkpoint incomplete)事件
- Oracle RAC Cache Fusion 系列十:Oracle RAC Enqueues And Lock Part 1OracleENQ
- WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! 與 dc_tablespcesAIENQ
- WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK-[ID 278316.1]AIENQ
- How to Match a Row Cache Object Child Latch to its Row CacheObject
- RAC 資料庫中的'log file sync' 等待事件資料庫事件
- 【RAC】RAC 效能分析 - 'log file sync' 等待事件事件