WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!的分析

studywell發表於2017-02-04

資料庫HANG住,最後被迫重啟。是一個RAC環境,10.2.0.4

1.山西電信告警alert_mega.log

Mon Jun 01 02:50:03 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=31
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_27190.trc
Tue Jun 02 02:50:03 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=32
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_2063.trc
Tue Jun 02 17:53:40 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=37
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_7006.trc
Tue Jun 02 18:00:41 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=38
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_7120.trc
Wed Jun 03 02:50:02 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=39
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_9632.trc
Wed Jun 03 20:54:39 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=41
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_15023.trc
Wed Jun 03 20:55:07 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=42
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_15029.trc
Wed Jun 03 20:55:26 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=43
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_15033.trc
Thu Jun 04 02:50:02 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=44
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_16725.trc
Fri Jun 05 02:50:02 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=45
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_23781.trc
Sat Jun 06 02:50:03 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=46
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_30833.trc
Sat Jun 06 23:50:03 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=47
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_4951.trc
Sun Jun 07 02:50:02 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=48
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_5891.trc
Mon Jun 08 02:50:02 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=49
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_7520.trc
Tue Jun 09 02:50:02 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=50
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_14768.trc
Wed Jun 10 02:50:02 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=51
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_23975.trc
Wed Jun 10 22:24:05 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=52
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_29755.trc
Wed Jun 10 22:25:03 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=53
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_29760.trc
Thu Jun 11 02:50:02 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=54
System State dumped to trace file /opt/oracle/admin/dba/udump/mega_ora_31040.trc

2.<P>在TRACE檔案中包含一個SYSTEMSTATE DUMP ,首先透過ASS.awk來分析一下systemstate dump:</P>
<P><BR><FONT style="BACKGROUND-COLOR: #2bb3d5">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Resource Holder State<BR>Enqueue PR-00000000-00000000&nbsp;&nbsp;&nbsp; 41: last wait for 'os thread startup'<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Latch c0000000c2df3b70&nbsp;&nbsp;&nbsp; ??? Blocker<BR>Rcache object=c000000f9fdf8160,&nbsp;&nbsp;&nbsp; 61: waiting for 'log file switch (checkpoint incomplete)'<BR>Enqueue US-0000004C-00000000&nbsp;&nbsp; 185: waiting for 'log file switch (checkpoint incomplete)'<BR>Enqueue FB-00000006-0186BDC8&nbsp;&nbsp; 187: waiting for 'log file switch (checkpoint incomplete)'<BR>LOCK: handle=c000000f388db3d0&nbsp;&nbsp; 204: waiting for 'log file switch (checkpoint incomplete)'</FONT></P>
<P><FONT style="BACKGROUND-COLOR: #2bb3d5">Object Names<BR>~~~~~~~~~~~~<BR>Enqueue PR-00000000-00000000&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <BR>Latch c0000000c2df3b70&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; holding&nbsp;&nbsp;&nbsp; (efd=5) c0000000c2df3b70 slave cl<BR>Rcache object=c000000f9fdf8160,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <BR>Enqueue US-0000004C-00000000&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <BR>Enqueue FB-00000006-0186BDC8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <BR>LOCK: handle=c000000f388db3d0&nbsp;TABL:REPORT.STATQ_AGENT_SUBS_NEW <BR></FONT></P>
<P>看樣子ORAPID=41的會話可能是BLOCKER,從ass生成的報告來看,系統中存在大量的log file switch(checkpoint incomplete)等待。我們首先檢視一下latch c0000000c2df3b70的持有者:</P>
<P>PROCESS 41:<BR>&nbsp; ----------------------------------------<BR>&nbsp; SO: c000000fbe3c1910, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00<BR>&nbsp; (process) Oracle pid=41, calls cur/top: c000000f9d63cbd0/c000000fa2c63088, flag: (2) SYSTEM<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int error: 0, call error: 0, sess error: 0, txn error 0<BR>&nbsp; (post info) last post received: 0 0 33<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; last post received-location: ksrpublish<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; last process to post me: c000000fba3f0d80 1 6<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; last post sent: 0 0 24<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; last post sent-location: ksasnd<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; last process posted by me: c000000fb93cbc00 1 6<BR>&nbsp;&nbsp;&nbsp; (latch info) wait_event=0 bits=1<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; holding&nbsp;&nbsp;&nbsp; (efd=5) c0000000c2df3b70 slave class create level=0 <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Location from where latch is held: ksvcreate: <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Context saved from call: 0<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; state=busy, wlstate=free<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; waiters [orapid (seconds since: put on list, posted, alive check)]:<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 48 (61788, 1251079676, 3)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 118 (61728, 1251079676, 0)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; waiter count=2<BR>&nbsp;&nbsp;&nbsp; Process Group: DEFAULT, pseudo proc: c000000fbc44e418<BR>&nbsp;&nbsp;&nbsp; O/S info: user: oracle, term: UNKNOWN, ospid: 16929 <BR>&nbsp;&nbsp;&nbsp; OSD pid info: Unix process pid: 16929, image: <A href="mailto:oracle@qdbb1">oracle@qdbb1</A> <FONT color=#f70909>(MMON)</FONT><BR></P>
<P>&nbsp; mmon在等待:</P>
<P>&nbsp;&nbsp;&nbsp; service name: SYS$BACKGROUND<BR>&nbsp;&nbsp;&nbsp; last wait for 'os thread startup' blocking sess=0x0000000000000000 seq=59956 wait_time=1006468 seconds since wait started=62472<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp; Dumping Session Wait History<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'os thread startup' count=1 wait_time=1006468<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'os thread startup' count=1 wait_time=958707<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'os thread startup' count=1 wait_time=1017506<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'os thread startup' count=1 wait_time=975671<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'os thread startup' count=1 wait_time=976846<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'os thread startup' count=1 wait_time=984219<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'os thread startup' count=1 wait_time=585799<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'os thread startup' count=1 wait_time=1355858<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'os thread startup' count=1 wait_time=979583<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'os thread startup' count=1 wait_time=990744<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>看樣子mmon在啟動其他程式的時候在等待啟動結束:</P>
<P>&nbsp;&nbsp;&nbsp; SO: c000000f9e6e3a20, type: 16, owner: c000000fbe3c1910, flag: INIT/-/-/0x00<BR>&nbsp;&nbsp;&nbsp; (osp req holder)<BR>&nbsp;&nbsp;&nbsp; CHILD REQUESTS:<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (osp req) type=2(BACKGROUND) flags=0x20001(STATIC/-) state=8(DEAD) err=0<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; pg=129(default) arg1=0 arg2=(null) reply=(null) <FONT color=#f70909>pname=m000<BR></FONT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; pid=0 parent=c000000f9e6e3a20 fulfill=0000000000000000<BR>原來mmon在等待m000啟動.那麼mmon等待m000會導致整個系統HANG住嗎?我們需要進一步分析</P>

3.<P>我們回過頭來在來看這個&gt;&gt;&gt; WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=XXX</P>
<P>這個等待從臨晨4點多到當機前一直存在。透過ass的結果,我們可以看到:</P>
<P>254:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>255:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>256:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>257:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>258:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>259:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>260:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>261:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>262:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>263:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>264:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>265:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>266:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>267:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>268:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>269:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>270:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>271:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>272:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>273:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>274:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>275:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>276:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>277:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>278:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>279:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>280:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR>281:waiting for 'row cache lock'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [Rcache object=c000000f9fdf8160,] wait<BR></P>
<P>檢查一下process 254</P>
<P>&nbsp;&nbsp;&nbsp; service name: SYS$USERS<BR>&nbsp;&nbsp;&nbsp; O/S info: user: report, term: , ospid: 17444, machine: qdbb2<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; program: <A href="mailto:task@qdbb2">task@qdbb2</A> (TNS V1-V3)<BR>&nbsp;&nbsp;&nbsp; application name: <A href="mailto:task@qdbb2">task@qdbb2</A> (TNS V1-V3), hash value=516955253<BR>&nbsp;&nbsp;&nbsp; waiting for 'row cache lock' blocking sess=0x0000000000000000 seq=1 wait_time=0 seconds since wait started=3425<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cache id=7, mode=0, request=3<BR>&nbsp;&nbsp;&nbsp; Dumping Session Wait History<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'row cache lock' count=1 wait_time=2615950<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cache id=7, mode=0, request=3<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'row cache lock' count=1 wait_time=2929664<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cache id=7, mode=0, request=3<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'row cache lock' count=1 wait_time=2929674<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cache id=7, mode=0, request=3<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'row cache lock' count=1 wait_time=2929666<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cache id=7, mode=0, request=3<BR></P>
<P>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SO: c000000f9f9198b8, type: 50, owner: c000000ee30a3060, flag: INIT/-/-/0x00<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; row cache enqueue: count=1 session=c000000fba59b660 object=c000000f9fdf8160, request=S<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; savepoint=0x4<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; row cache parent object: address=c000000f9fdf8160 cid=7(dc_users)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; hash=cdb2e447 typ=11 transaction=c000000faae8c710 flags=0000012a<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; own=c000000f9fdf8230[c000000f9f919a08,c000000f9f919a08] wat=c000000f9fdf8240[c000000f9f9198e8,c000000f9f918b08] mode=X</P>
<P>這是一個dc_users的物件,請求s方式的訪問,目前在等待一個mode=x的等待釋放,因此我們搜尋c000000f9fdf8160, mode=X,找到:</P>
<P>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SO: c000000f9f9199d8, type: 50, owner: c000000faae8c710, flag: INIT/-/-/0x00<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; row cache enqueue: count=1 session=c000000fba5f3b30 object=c000000f9fdf8160, mode=X<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; savepoint=0x6<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; row cache parent object: address=c000000f9fdf8160 cid=7(dc_users)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; hash=cdb2e447 typ=11 transaction=c000000faae8c710 flags=0000012a<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; own=c000000f9fdf8230[c000000f9f919a08,c000000f9f919a08] wat=c000000f9fdf8240[c000000f9f9198e8,c000000f9f918b08] mode=X<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; status=VALID/UPDATE/-/-/IO/-/-/-/-<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; request=N release=TRUE flags=0<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; instance lock id=QH b6ae8f92 4cc6da8c<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; set=0, complete=FALSE<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; set=1, complete=FALSE<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; set=2, complete=FALSE<BR>就是這個會話持有了這個dc_users的row cache object,而且持有方式是不相容的,往前看,發現這個會話是:</P>
<P>PROCESS 61:<BR>&nbsp; ----------------------------------------<BR>&nbsp; SO: c000000fbf3bfe48, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00<BR>&nbsp; (process) Oracle pid=61, calls cur/top: c000000fa0c0eb30/c000000f8b0fdef8, flag: (0) -<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int error: 0, call error: 0, sess error: 0, txn error 0<BR>&nbsp; (post info) last post received: 109 0 4<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; last post received-location: kslpsr<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; last process to post me: c000000fbc3c2d98 1 6<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; last post sent: 0 0 24<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; last post sent-location: ksasnd<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; last process posted by me: c000000fbc3c2d98 1 6<BR>&nbsp;&nbsp;&nbsp; (latch info) wait_event=109 bits=0<BR>&nbsp;&nbsp;&nbsp; Process Group: DEFAULT, pseudo proc: c000000fbc44e418<BR>&nbsp;&nbsp;&nbsp; O/S info: user: oracle, term: UNKNOWN, ospid: 2006 <BR>&nbsp;&nbsp;&nbsp; OSD pid info: Unix process pid: 2006, image: <A href="mailto:oracle@qdbb1">oracle@qdbb1</A><BR></P>
<P>&nbsp;&nbsp;&nbsp; SO: c000000fba5f3b30, type: 4, owner: c000000fbf3bfe48, flag: INIT/-/-/0x00<BR>&nbsp;&nbsp;&nbsp; (session) sid: 1973 trans: c000000faae8c710, creator: c000000fbf3bfe48, flag: (41) USR/- BSY/-/-/-/-/-<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DID: 0001-003D-041A6D36, short-term DID: 0000-0000-00000000<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; txn branch: 0000000000000000<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS<BR>&nbsp;&nbsp;&nbsp; service name: SYS$USERS<BR>&nbsp;&nbsp;&nbsp; O/S info: user: liping, term: L902054, ospid: 163000:163128, machine: <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; program: plsqldev.exe<BR>&nbsp;&nbsp;&nbsp; application name: plsqldev.exe, hash value=2126984564<BR>&nbsp;&nbsp;&nbsp; waiting for 'log file switch (checkpoint incomplete)' blocking sess=0x0000000000000000 seq=9 wait_time=0 seconds since wait started=2290<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp; Dumping Session Wait History<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'log file switch (checkpoint incomplete)' count=1 wait_time=87117<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'log file switch (checkpoint incomplete)' count=1 wait_time=976593<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'log file switch (checkpoint incomplete)' count=1 wait_time=979179<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'log file switch (checkpoint incomplete)' count=1 wait_time=7099<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'log file switch (checkpoint incomplete)' count=1 wait_time=976471<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'log file switch (checkpoint incomplete)' count=1 wait_time=976611<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'log file switch (checkpoint incomplete)' count=1 wait_time=973437<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'log file switch (checkpoint incomplete)' count=1 wait_time=969860<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'log file switch (checkpoint incomplete)' count=1 wait_time=976535<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'log file switch (checkpoint incomplete)' count=1 wait_time=978815<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>這是一個PL/SQL DEVELOPER的使用者,不過從上面的情況看,這個會話是產生了阻塞大量會話的源頭,不過這個會話本身也是受害者,在等待log file switch(checkpoint incomplete),為什麼會這樣呢?<BR></P>

4.<P>於是我們來看看lgwr的狀態:</P>
<P>&nbsp;&nbsp;&nbsp; SO: c000000fba6167c0, type: 4, owner: c000000fbc3c2d98, flag: INIT/-/-/0x00<BR>&nbsp;&nbsp;&nbsp; (session) sid: 2155 trans: 0000000000000000, creator: c000000fbc3c2d98, flag: (51) USR/- BSY/-/-/-/-/-<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DID: 0001-0024-00000005, short-term DID: 0000-0000-00000000<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; txn branch: 0000000000000000<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS<BR>&nbsp;&nbsp;&nbsp; service name: SYS$BACKGROUND<BR>&nbsp;&nbsp;&nbsp; waiting for 'rdbms ipc message' blocking sess=0x0000000000000000 seq=13185 wait_time=0 seconds since wait started=0<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; timeout=d3, =0, =0<BR>&nbsp;&nbsp;&nbsp; Dumping Session Wait History<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'control file sequential read' count=1 wait_time=233<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file#=0, block#=2e, blocks=1<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'control file sequential read' count=1 wait_time=210<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file#=0, block#=2c, blocks=1<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'control file sequential read' count=1 wait_time=255<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file#=0, block#=29, blocks=1<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'control file sequential read' count=1 wait_time=213<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file#=0, block#=27, blocks=1<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'control file sequential read' count=1 wait_time=222<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file#=2, block#=1, blocks=1<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'control file sequential read' count=1 wait_time=221<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file#=1, block#=1, blocks=1<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'control file sequential read' count=1 wait_time=282<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file#=0, block#=1, blocks=1<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'rdbms ipc message' count=1 wait_time=4564<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; timeout=d4, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'rdbms ipc message' count=1 wait_time=3188<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; timeout=d4, =0, =0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'rdbms ipc message' count=1 wait_time=1<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; timeout=d4, =0, =0</P>
<P>LGWR居然在等待ipc message,說明lgwr並沒有處於工作狀態,或者無事可做,或者等待其他會話處理結束後喚醒它。而目前大家都在等待log file switch,lgwr沒有閒著沒事做的理由,因此我們有理由相信checkpoint出問題了,lgwr是在等待ckpt完成工作,於是我們繼續檢查ckpt的狀態:</P>
<P>&nbsp;</P>

5.<P>搜尋(ckpt)</P>
<P>&nbsp;&nbsp;&nbsp; SO: c000000fbb632130, type: 4, owner: c000000fbb40bc08, flag: INIT/-/-/0x00<BR>&nbsp;&nbsp;&nbsp; (session) sid: 2154 trans: 0000000000000000, creator: c000000fbb40bc08, flag: (51) USR/- BSY/-/-/-/-/-<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DID: 0001-0025-00000004, short-term DID: 0001-0025-00000005<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; txn branch: 0000000000000000<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS<BR>&nbsp;&nbsp;&nbsp; service name: SYS$BACKGROUND<BR>&nbsp;&nbsp;&nbsp; waiting for 'enq: PR - contention' blocking sess=0xc000000fbf5e4278 seq=61352 wait_time=0 seconds since wait started=62446<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0<BR>&nbsp;&nbsp;&nbsp; Dumping Session Wait History<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'enq: PR - contention' count=1 wait_time=429069<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0</P>
<P>&nbsp;&nbsp;&nbsp;&nbsp; for 'enq: PR - contention' count=1 wait_time=2929688<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'enq: PR - contention' count=1 wait_time=2929668<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'enq: PR - contention' count=1 wait_time=2929682<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'enq: PR - contention' count=1 wait_time=2929671<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'enq: PR - contention' count=1 wait_time=2929679<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'enq: PR - contention' count=1 wait_time=2929666<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'enq: PR - contention' count=1 wait_time=2929683<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'enq: PR - contention' count=1 wait_time=2929624<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0<BR>&nbsp;&nbsp;&nbsp;&nbsp; for 'enq: PR - contention' count=1 wait_time=2929676<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; name|mode=50520006, 0=0, 0=0<BR><BR>ckpt一直在等待PR鎖,這回我們終於看到了blocking sess=0xc000000fbf5e4278 ,這個會話就是阻塞ckpt的元兇,我們來看看是誰:</P>
<P>&nbsp;&nbsp;&nbsp; SO: c000000fbf5e4278, type: 4, owner: c000000fbe3c1910, flag: INIT/-/-/0x00<BR>&nbsp;&nbsp;&nbsp; (session) sid: 2150 trans: 0000000000000000, creator: c000000fbe3c1910, flag: (100051) USR/- BSY/-/-/-/-/-<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DID: 0001-0029-00000002, short-term DID: 0001-0029-00000003<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; txn branch: 0000000000000000<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; oct: 0, prv: 0, sql: c000000fa3ed29b8, psql: 0000000000000000, user: 0/SYS<BR>&nbsp;&nbsp;&nbsp; service name: SYS$BACKGROUND<BR>&nbsp;&nbsp;&nbsp; last wait for 'os thread startup' blocking sess=0x0000000000000000 seq=59956 wait_time=1006468 seconds since wait started=62472<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0, =0, =0<BR>我們又找到了mmon,原來mmon才是導致系統HANG住的元兇。</P>

6.<P>案例總結:</P>
<P>1、到目前為止,我們弄清楚了系統HANG住的主要原因是MMON啟動m000的時候,m000沒能夠正常啟動,導致mmon長期持有pr鎖,阻塞了ckpt,導致無法正常做日誌切換,但是mmon未能正常啟動m000的原因還沒有找到,這需要更多的資料進行分析,不在本文討論之內</P>
<P>2、碰到這種問題如何處理?實際上如果我們分析清楚了,只要殺掉mmon程式,或者調整一下一下statistics_level就可能解決問題,而不需要重啟例項。但是這個分析需要時間,最起碼也需要30-60分鐘的分析時間,如果在生產環境下發生,很難給我們這麼長的分析時間。但是大家要注意這個問題的出現時在4點多鐘,系統HANG住是在10點多鐘,中間有很長時間供我們分析。如果我們能及時發現這個異常,是可能在系統HANG住錢解決問題的</P>

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29519108/viewspace-2133059/,如需轉載,請註明出處,否則將追究法律責任。

相關文章