SystemState分析案例(一)
這是1個09年的故障,資料庫無法登陸,殺了發現有問題的程式,以及所有LOCAL=NO的還是不行,最後直接重啟了庫
今天想總結下systemstate,把以前的systemstate拿出來重新分析整理了下,發現了root cause
通過分析trc,發現大量的程式正在等待latch:library cache,一般這是嚴重的解析問題造成。但是從資料庫以前的awr看,沒有嚴重的解析問題
獲得systemstate,隨意找了個正在等待latch:library cache的程式看,其在請求7000001302e0170 這個Child library cache,可能的阻塞者為Process 51
waiting for 7000001302e0170 Child library cache level=5 child#=9
Location from where latch is held: kghfrunp: clatch: wait:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
93 (942, 1258509046, 2)
......
8 (735, 1258509046, 2)
waiter count=28
gotten 444254233 times wait, failed first 66211 sleeps 14231
gotten 6228862 times nowait, failed: 40880
possible holder pid = 51 spid=1331410
on wait list for 7000001302e0170
那麼是誰持有 7000001302e0170 這個 latch?搜尋 [ hold.*7000001302e0170 ],發現Process 51持有7000001302e0170,並且,其在請求7000000100e5020 這個 1# child shared pool latch,可能的阻塞者為Process 92
PROCESS 51:
----------------------------------------
SO: 70000013550d080, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=51, calls cur/top: 0/7000001000e1260, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=20
Location from where call was made: kghfrunp: alloc: session dur:
waiting for 7000000100e5020 Child shared pool level=7 child#=1
Location from where latch is held: kgh: add extent to reserved list:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
27 (915, 1258509034, 2)
......
629 (468, 1258509034, 2)
waiter count=259
gotten 1006956901 times wait, failed first 239663 sleeps 97559
gotten 0 times nowait, failed: 0
possible holder pid = 92 spid=929882
on wait list for 7000000100e5020
holding (efd=6) 7000001302e0170 Child library cache level=5 child#=9
Location from where latch is held: kghfrunp: clatch: wait:
搜尋 [ hold.*7000000100e5020 ],沒有發現誰持有該latch,那麼看看可能的阻塞者Process 92
PROCESS 92:
----------------------------------------
SO: 70000013850e348, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=92, calls cur/top: 700000118a884b8/700000118a884b8, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 187
last post received-location: kglpndl: post after freeing latch
last process to post me: 7000001384f80e8 1 14
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=80
holding (efd=9) 700000010021a20 Parent+children shared pool level=7
Location from where latch is held: kghfrunp: alloc: clatch nowait:
Context saved from call: 0
state=busy, wlstate=free
Process Group: DEFAULT, pseudo proc: 7000001385b4978
O/S info: user: oracle, term: UNKNOWN, ospid: 929882
OSD pid info: Unix process pid: 929882, image: oracleXXXXX@oracle
Short stack dump: unable to dump stack due to error 72
----------------------------------------
SO: 7000001386a6978, type: 4, owner: 70000013850e348, flag: INIT/-/-/0x00
(session) sid: 371 trans: 0, creator: 70000013850e348, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-005C-00006C20, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 70000007be42f48, psql: 700000136139f00, user: 196/XXXXX5
O/S info: user: , term: , ospid: 1234, machine: was0
program:
last wait for 'latch: shared pool' blocking sess=0x0 seq=284 wait_time=283476 seconds since wait started=922
address=7000000100e5020, number=d5, tries=1d
Dumping Session Wait History
for 'latch: shared pool' count=1 wait_time=283476
address=7000000100e5020, number=d5, tries=1d
for 'latch: shared pool' count=1 wait_time=292978
address=7000000100e5020, number=d5, tries=1c
for 'latch: shared pool' count=1 wait_time=292977
address=7000000100e5020, number=d5, tries=1b
for 'latch: shared pool' count=1 wait_time=292978
address=7000000100e5020, number=d5, tries=1a
for 'latch: shared pool' count=1 wait_time=292978
address=7000000100e5020, number=d5, tries=19
for 'latch: shared pool' count=1 wait_time=292977
address=7000000100e5020, number=d5, tries=18
for 'latch: shared pool' count=1 wait_time=292978
address=7000000100e5020, number=d5, tries=17
for 'latch: shared pool' count=1 wait_time=292979
address=7000000100e5020, number=d5, tries=16
for 'latch: shared pool' count=1 wait_time=293120
address=7000000100e5020, number=d5, tries=15
for 'latch: shared pool' count=1 wait_time=294266
address=7000000100e5020, number=d5, tries=14
temporary object counter: 0
其持有Parent shared pool latch ,並且其請求過7000000100e5020這個1# child shared pool latch
當時我的判斷為該程式自死鎖,除非BUG,latch上應該是不會死鎖的。不過殺了所有連線程式資料庫也沒有恢復,最後強制重啟
今天問了下老熊,他說程式可以在請求父latch的同時,再去請求2個子latch
並且這裡是Dumping Session Wait History,並且也沒有waiting for,正在等待的資訊,也就是說程式現在可能已經獲得了該child latch。
接著我重新看了下trc,發現2# child shared pool latch 也有個大量程式在等待獲得
並且,在這2個資源的wait list上,也沒有發現有Process 92
那麼就符合老熊說的可能性,程式同時請求了父latch和1#,2#子latch,並長時間佔用不釋放,從而阻塞了其他程式
holding (efd=9) 700000010021a20 Parent+children shared pool level=7
這個也應該是表明程式已經獲得了父+子 shared pool latch
同時,我注意到,該程式在dump short stack報錯
Short stack dump: unable to dump stack due to error 72
ORA-72的意思為程式已經不存在,無法dump。就是說這個時候程式已經異常。
既然程式已經出現異常,甚至死亡,那麼這個時候PMON該出來幹事情了啊,那麼PMON這個時候在幹什麼?
PROCESS 2:
----------------------------------------
SO: 7000001384f80e8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=2, calls cur/top: 70000013583ce90/70000013583ce90, flag: (e) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 15
last post received-location: ksupsc
last process to post me: 7000001384f90a8 1 6
last post sent: 0 0 187
last post sent-location: kglpndl: post after freeing latch
last process posted by me: 70000013850e348 205 0
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 7000001385b4978
O/S info: user: oracle, term: UNKNOWN, ospid: 397448
OSD pid info: Unix process pid: 397448, image: oracle@oracle (PMON)
Short stack dump:
ksdxfstk+002c
----------------------------------------
SO: 7000001388e24b0, type: 11, owner: 7000001384f80e8, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 7000001384f80e8,
event: 1, last message event: 13,
last message waited event: 13, messages read: 1
channel: (700000138919d38) scumnt mount lock
scope: 1, event: 22, last mesage event: 13,
publishers/subscribers: 0/13,
messages published: 1
----------------------------------------
SO: 7000001387d49c0, type: 4, owner: 7000001384f80e8, flag: INIT/-/-/0x00
(session) sid: 849 trans: 0, creator: 7000001384f80e8, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0002-00000004, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
last wait for 'cursor: mutex X' blocking sess=0x0 seq=30347 wait_time=1 seconds since wait started=0
idn=ed27757c, value=22300000000, where|sleeps=5bccb92
Dumping Session Wait History
for 'cursor: mutex X' count=1 wait_time=1
idn=ed27757c, value=22300000000, where|sleeps=5bccb95
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccb94
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccb9d
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccb9c
for 'cursor: mutex X' count=1 wait_time=1
idn=ed27757c, value=22300000000, where|sleeps=5bccb9b
for 'cursor: mutex X' count=1 wait_time=1
idn=ed27757c, value=22300000000, where|sleeps=5bccba4
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccba3
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccbac
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccbab
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccbaa
temporary object counter: 0
PMON曾經在cursor: mutex X上等待。但是這裡沒有waiting for的資訊
但是從Short stack上看,PMON當前也應該正在請求mutex。
為了確認這一點,我搜尋 [ mutex.*ed27757c ],確定PMON的確正在以EXCL模式請求該mutex,阻塞者為SID 547
KGX Atomic Operation Log 700000136bed8d8
Mutex 700000049974b98(547, 0) idn ed27757c oper GET_EXCL
Cursor Parent uid 849 efd 8 whr 12 slp 23941
per=kksAllocCursorStat [KKSSTALOC1] pt1=70000011cae22d0 pt2=70000011cae2338 pt3=0
pt4=0 u41=0 stt=0
並且Process 111以EXCL模式持有該mutex
KGX Atomic Operation Log 70000005eeb5f30
Mutex 700000049974b98(547, 0) idn ed27757c oper EXCL
Cursor Parent uid 547 efd 6 whr 2 slp 0
per=PARENT_ALLOC_CHILD pt1=0 pt2=70000011c24bd90 pt3=0
pt4=0 u41=0 stt=0
那麼程式111在幹啥子了?其在請求7000000100e5020,這個1# shared pool child latch
PROCESS 111:
----------------------------------------
SO: 70000013551bcc0, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=111, calls cur/top: 700000118c44568/700000118c44568, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 187
last post received-location: kglpndl: post after freeing latch
last process to post me: 7000001384f80e8 1 14
last post sent: 0 0 9
last post sent-location: ksqrcl
last process posted by me: 700000138510aa8 13 0
(latch info) wait_event=0 bits=0
Location from where call was made: kghalo:
waiting for 7000000100e5020 Child shared pool level=7 child#=1
Location from where latch is held: kgh: add extent to reserved list:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
27 (923, 1258509042, 1)
......
629 (476, 1258509042, 1)
waiter count=259
gotten 1006956901 times wait, failed first 239663 sleeps 97559
gotten 0 times nowait, failed: 0
possible holder pid = 92 spid=929882
on wait list for 7000000100e5020
Process Group: DEFAULT, pseudo proc: 7000001385b4978
O/S info: user: oracle, term: UNKNOWN, ospid: 680104
OSD pid info: Unix process pid: 680104, image: oracleXXXXX@oracle
到這裡現象很明顯了:
Process 92持有來父SP和1,2號子SP,但是持有後其出現了異常,所以其一直持有這些latch不釋放。
這時需要PMON來對這個程式進行清理,但是這個時候,PMON卻在請求一個mutex,而該mutex又被Process 111持有
Process 111又在請求1# child SP latch。該latch又被Process 92持有了
這時就出現了一個latch + mutex的死鎖
沒有找到準確的BUG,但是有一些類似情況指出,可能有這樣的BUG:
PMON以X的模式請求一個mutex,而該mutex被一個死亡的程式持有,並且該程式正在等待PMON去清理
這樣的情況下,用kill -9 殺掉111程式,那麼將導致程式死亡,等待PMON去釋放其持有的資源,而PMON卻在等待111持有的mutex,還是不能解決問題
可能可以通過直接kill,讓程式自然終止,自己釋放持有的資源,但是這要有情況測試才知道
今天想總結下systemstate,把以前的systemstate拿出來重新分析整理了下,發現了root cause
通過分析trc,發現大量的程式正在等待latch:library cache,一般這是嚴重的解析問題造成。但是從資料庫以前的awr看,沒有嚴重的解析問題
獲得systemstate,隨意找了個正在等待latch:library cache的程式看,其在請求7000001302e0170 這個Child library cache,可能的阻塞者為Process 51
waiting for 7000001302e0170 Child library cache level=5 child#=9
Location from where latch is held: kghfrunp: clatch: wait:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
93 (942, 1258509046, 2)
......
8 (735, 1258509046, 2)
waiter count=28
gotten 444254233 times wait, failed first 66211 sleeps 14231
gotten 6228862 times nowait, failed: 40880
possible holder pid = 51 spid=1331410
on wait list for 7000001302e0170
那麼是誰持有 7000001302e0170 這個 latch?搜尋 [ hold.*7000001302e0170 ],發現Process 51持有7000001302e0170,並且,其在請求7000000100e5020 這個 1# child shared pool latch,可能的阻塞者為Process 92
PROCESS 51:
----------------------------------------
SO: 70000013550d080, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=51, calls cur/top: 0/7000001000e1260, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=20
Location from where call was made: kghfrunp: alloc: session dur:
waiting for 7000000100e5020 Child shared pool level=7 child#=1
Location from where latch is held: kgh: add extent to reserved list:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
27 (915, 1258509034, 2)
......
629 (468, 1258509034, 2)
waiter count=259
gotten 1006956901 times wait, failed first 239663 sleeps 97559
gotten 0 times nowait, failed: 0
possible holder pid = 92 spid=929882
on wait list for 7000000100e5020
holding (efd=6) 7000001302e0170 Child library cache level=5 child#=9
Location from where latch is held: kghfrunp: clatch: wait:
搜尋 [ hold.*7000000100e5020 ],沒有發現誰持有該latch,那麼看看可能的阻塞者Process 92
PROCESS 92:
----------------------------------------
SO: 70000013850e348, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=92, calls cur/top: 700000118a884b8/700000118a884b8, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 187
last post received-location: kglpndl: post after freeing latch
last process to post me: 7000001384f80e8 1 14
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=80
holding (efd=9) 700000010021a20 Parent+children shared pool level=7
Location from where latch is held: kghfrunp: alloc: clatch nowait:
Context saved from call: 0
state=busy, wlstate=free
Process Group: DEFAULT, pseudo proc: 7000001385b4978
O/S info: user: oracle, term: UNKNOWN, ospid: 929882
OSD pid info: Unix process pid: 929882, image: oracleXXXXX@oracle
Short stack dump: unable to dump stack due to error 72
----------------------------------------
SO: 7000001386a6978, type: 4, owner: 70000013850e348, flag: INIT/-/-/0x00
(session) sid: 371 trans: 0, creator: 70000013850e348, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-005C-00006C20, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 70000007be42f48, psql: 700000136139f00, user: 196/XXXXX5
O/S info: user: , term: , ospid: 1234, machine: was0
program:
last wait for 'latch: shared pool' blocking sess=0x0 seq=284 wait_time=283476 seconds since wait started=922
address=7000000100e5020, number=d5, tries=1d
Dumping Session Wait History
for 'latch: shared pool' count=1 wait_time=283476
address=7000000100e5020, number=d5, tries=1d
for 'latch: shared pool' count=1 wait_time=292978
address=7000000100e5020, number=d5, tries=1c
for 'latch: shared pool' count=1 wait_time=292977
address=7000000100e5020, number=d5, tries=1b
for 'latch: shared pool' count=1 wait_time=292978
address=7000000100e5020, number=d5, tries=1a
for 'latch: shared pool' count=1 wait_time=292978
address=7000000100e5020, number=d5, tries=19
for 'latch: shared pool' count=1 wait_time=292977
address=7000000100e5020, number=d5, tries=18
for 'latch: shared pool' count=1 wait_time=292978
address=7000000100e5020, number=d5, tries=17
for 'latch: shared pool' count=1 wait_time=292979
address=7000000100e5020, number=d5, tries=16
for 'latch: shared pool' count=1 wait_time=293120
address=7000000100e5020, number=d5, tries=15
for 'latch: shared pool' count=1 wait_time=294266
address=7000000100e5020, number=d5, tries=14
temporary object counter: 0
其持有Parent shared pool latch ,並且其請求過7000000100e5020這個1# child shared pool latch
當時我的判斷為該程式自死鎖,除非BUG,latch上應該是不會死鎖的。不過殺了所有連線程式資料庫也沒有恢復,最後強制重啟
今天問了下老熊,他說程式可以在請求父latch的同時,再去請求2個子latch
並且這裡是Dumping Session Wait History,並且也沒有waiting for,正在等待的資訊,也就是說程式現在可能已經獲得了該child latch。
接著我重新看了下trc,發現2# child shared pool latch 也有個大量程式在等待獲得
並且,在這2個資源的wait list上,也沒有發現有Process 92
那麼就符合老熊說的可能性,程式同時請求了父latch和1#,2#子latch,並長時間佔用不釋放,從而阻塞了其他程式
holding (efd=9) 700000010021a20 Parent+children shared pool level=7
這個也應該是表明程式已經獲得了父+子 shared pool latch
同時,我注意到,該程式在dump short stack報錯
Short stack dump: unable to dump stack due to error 72
ORA-72的意思為程式已經不存在,無法dump。就是說這個時候程式已經異常。
既然程式已經出現異常,甚至死亡,那麼這個時候PMON該出來幹事情了啊,那麼PMON這個時候在幹什麼?
PROCESS 2:
----------------------------------------
SO: 7000001384f80e8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=2, calls cur/top: 70000013583ce90/70000013583ce90, flag: (e) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 15
last post received-location: ksupsc
last process to post me: 7000001384f90a8 1 6
last post sent: 0 0 187
last post sent-location: kglpndl: post after freeing latch
last process posted by me: 70000013850e348 205 0
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 7000001385b4978
O/S info: user: oracle, term: UNKNOWN, ospid: 397448
OSD pid info: Unix process pid: 397448, image: oracle@oracle (PMON)
Short stack dump:
ksdxfstk+002c
----------------------------------------
SO: 7000001388e24b0, type: 11, owner: 7000001384f80e8, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 7000001384f80e8,
event: 1, last message event: 13,
last message waited event: 13, messages read: 1
channel: (700000138919d38) scumnt mount lock
scope: 1, event: 22, last mesage event: 13,
publishers/subscribers: 0/13,
messages published: 1
----------------------------------------
SO: 7000001387d49c0, type: 4, owner: 7000001384f80e8, flag: INIT/-/-/0x00
(session) sid: 849 trans: 0, creator: 7000001384f80e8, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0002-00000004, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
last wait for 'cursor: mutex X' blocking sess=0x0 seq=30347 wait_time=1 seconds since wait started=0
idn=ed27757c, value=22300000000, where|sleeps=5bccb92
Dumping Session Wait History
for 'cursor: mutex X' count=1 wait_time=1
idn=ed27757c, value=22300000000, where|sleeps=5bccb95
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccb94
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccb9d
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccb9c
for 'cursor: mutex X' count=1 wait_time=1
idn=ed27757c, value=22300000000, where|sleeps=5bccb9b
for 'cursor: mutex X' count=1 wait_time=1
idn=ed27757c, value=22300000000, where|sleeps=5bccba4
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccba3
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccbac
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccbab
for 'cursor: mutex X' count=1 wait_time=0
idn=ed27757c, value=22300000000, where|sleeps=5bccbaa
temporary object counter: 0
PMON曾經在cursor: mutex X上等待。但是這裡沒有waiting for的資訊
但是從Short stack上看,PMON當前也應該正在請求mutex。
為了確認這一點,我搜尋 [ mutex.*ed27757c ],確定PMON的確正在以EXCL模式請求該mutex,阻塞者為SID 547
KGX Atomic Operation Log 700000136bed8d8
Mutex 700000049974b98(547, 0) idn ed27757c oper GET_EXCL
Cursor Parent uid 849 efd 8 whr 12 slp 23941
per=kksAllocCursorStat [KKSSTALOC1] pt1=70000011cae22d0 pt2=70000011cae2338 pt3=0
pt4=0 u41=0 stt=0
並且Process 111以EXCL模式持有該mutex
KGX Atomic Operation Log 70000005eeb5f30
Mutex 700000049974b98(547, 0) idn ed27757c oper EXCL
Cursor Parent uid 547 efd 6 whr 2 slp 0
per=PARENT_ALLOC_CHILD pt1=0 pt2=70000011c24bd90 pt3=0
pt4=0 u41=0 stt=0
那麼程式111在幹啥子了?其在請求7000000100e5020,這個1# shared pool child latch
PROCESS 111:
----------------------------------------
SO: 70000013551bcc0, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=111, calls cur/top: 700000118c44568/700000118c44568, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 187
last post received-location: kglpndl: post after freeing latch
last process to post me: 7000001384f80e8 1 14
last post sent: 0 0 9
last post sent-location: ksqrcl
last process posted by me: 700000138510aa8 13 0
(latch info) wait_event=0 bits=0
Location from where call was made: kghalo:
waiting for 7000000100e5020 Child shared pool level=7 child#=1
Location from where latch is held: kgh: add extent to reserved list:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
27 (923, 1258509042, 1)
......
629 (476, 1258509042, 1)
waiter count=259
gotten 1006956901 times wait, failed first 239663 sleeps 97559
gotten 0 times nowait, failed: 0
possible holder pid = 92 spid=929882
on wait list for 7000000100e5020
Process Group: DEFAULT, pseudo proc: 7000001385b4978
O/S info: user: oracle, term: UNKNOWN, ospid: 680104
OSD pid info: Unix process pid: 680104, image: oracleXXXXX@oracle
到這裡現象很明顯了:
Process 92持有來父SP和1,2號子SP,但是持有後其出現了異常,所以其一直持有這些latch不釋放。
這時需要PMON來對這個程式進行清理,但是這個時候,PMON卻在請求一個mutex,而該mutex又被Process 111持有
Process 111又在請求1# child SP latch。該latch又被Process 92持有了
這時就出現了一個latch + mutex的死鎖
沒有找到準確的BUG,但是有一些類似情況指出,可能有這樣的BUG:
PMON以X的模式請求一個mutex,而該mutex被一個死亡的程式持有,並且該程式正在等待PMON去清理
這樣的情況下,用kill -9 殺掉111程式,那麼將導致程式死亡,等待PMON去釋放其持有的資源,而PMON卻在等待111持有的mutex,還是不能解決問題
可能可以通過直接kill,讓程式自然終止,自己釋放持有的資源,但是這要有情況測試才知道
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/8242091/viewspace-736142/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- SystemState分析案例(三)
- zt_systemstate案例分析
- 【中亦安圖】Systemstate Dump分析經典案例(7)
- 【中亦安圖】Systemstate Dump分析經典案例(8)
- Mysql之案例分析(一)MySql
- 如何理解systemstate dump
- itpub awr案例分析之一
- Oracle資料庫效能障礙分析利器:SYSTEMSTATE DUMP介紹Oracle資料庫
- 正規表示式案例分析 (一)
- 案例分析
- System State 轉儲分析案例一則
- 一個JDO的成功案例分析 (轉)
- jemeter分析(二) — jmeter案例分析JMeter
- 給大家分享一個案例分析-比較偏僻
- 資料庫Server效能問題分析案例一資料庫Server
- 備庫報警郵件的分析案例(一)
- 一個需求分析做的不夠的案例
- 超市管理案例分析
- 死鎖案例分析
- 故障分析 | MySQL死鎖案例分析MySql
- EOS原始碼分析(3)案例分析原始碼
- 用python客戶價值分析案例一則Python
- 層次分析小案例
- MySQL經典案例分析MySql
- oracle SPA 效能分析案例Oracle
- 《畫家鄉》案例分析
- UI設計案例分析UI
- 案例分析習題2
- GreatSQL 死鎖案例分析SQL
- FineBI入門案例分析
- 分析一個企業CRM系統應用的案例
- 一次elasticsearch 查詢瞬間超時案例分析Elasticsearch
- materialized view的fast和日誌分析和一則案例ZedViewAST
- 一則資料庫無法重啟的案例分析資料庫
- 故障分析 | MySQL 相同 SQL 不同環境執行時間不一樣案例分析MySql
- Oracle分析函式七——分析函式案例Oracle函式
- 不能連線資料庫收集Systemstate Dumps資訊資料庫
- 利用hanganalyz/systemstate dump診斷資料庫hang資料庫