一個ORACLE死鎖問題的追蹤

rainnyzhong發表於2010-01-11

Author: rainnyzhong

Date: 2010-01-08

不得不記錄一下。這個問題花了我一天的時間來追蹤。

最先是懷疑外來鍵欄位是否沒有建索引。然後我給相關的外來鍵欄位都建上索引,死鎖依舊。

後來懷疑是ORACLEBUG,因為ORACLE確實有一個BUG2530125某些版本的oracle檢查到deadlock後會觸發一個關於deadlockbug

我將整個SCHEMA匯出到另外一個DB(64位平臺),發現死鎖依舊。所以暫時去除了是ORACLE BUG的嫌疑.

再後來懷疑儲存過程或JAVA中是否有生成動態SQL(ad-hoc SQL),檢查程式碼,發現確實有動態SQL。將JAVAPROCEDURE中的動態SQL去掉,死鎖依舊。

靠!到這一步,心裡鬱悶的不得了。

再次仔細檢查死鎖的trace檔案,發現有一個LATCH

Parent+children enqueue hash chains level=4

SELECT * FROM V$LATCH L WHERE L.ADDR=upper('20004fc4')

--

1 20004FC4 19 4 enqueue hash chains 1456202064 584423 76 0 14395 1 0 0 76 0 0 0 0 0 0 0 0 0 0 0 0

發現確實存在這麼一個latchenqueue hash chain.我重啟DB,發現這個latch依然存在,並且發現v$latch這個檢視的記錄在DB重啟之前和重啟之後總數量是一樣的(382)。於是,我就再次寫SQL故意重現一把死鎖,再檢視模擬出來的死鎖的trace檔案,發現裡面也有這個latch,可見,只要是死鎖,這個latch就會出現在trace檔案中,沒有什麼特別的地方。

思索再思索,最後發現唯一特別的地方是死鎖圖(deadlock graph)

Deadlock graph:

---------Blocker(s)-------- ---------Waiter(s)---------

Resource Name process session holds waits process session holds waits

TX-0009000d-0000207b 38 721 X 38 721 X

session 721: DID 0001-0026-00001DE8 session 721: DID 0001-0026-00001DE8

Rows waited on:

Session 721: obj - rowid = 0000EAF8 - AAAPzHAAGAAAVGzAAA

(dictionary objn - 60152, file - 6, block - 86451, slot - 0)

正常的死鎖圖,一般是有兩行,而我的只有一行,且是同一SESSION(如上圖),而正常的死鎖圖應該是像下面這樣:

DEADLOCK DETECTED

[Transaction Deadlock]

Current SQL statement for this session:

DELETE FROM pm_user_temp WHERE rownumber=4

The following deadlock is not an ORACLE error. It is a

deadlock due to user error in the design of an application

or from issuing incorrect ad-hoc SQL. The following

information may aid in determining the deadlock:

Deadlock graph:

---------Blocker(s)-------- ---------Waiter(s)---------

Resource Name process session holds waits process session holds waits

TX-00070015-00001881 41 772 X 94 670 X

TX-0001000b-00001822 94 670 X 41 772 X

session 772: DID 0001-0029-00000008 session 670: DID 0001-005E-00000004

session 670: DID 0001-005E-00000004 session 772: DID 0001-0029-00000008

Rows waited on:

Session 670: obj - rowid = 0000EAF8 - AAAPzHAAGAAAVGwAAA

(dictionary objn - 60152, file - 6, block - 86448, slot - 0)

Session 772: obj - rowid = 0000EAF8 - AAAPzHAAGAAAVGwAAC

(dictionary objn - 60152, file - 6, block - 86448, slot - 2)

Information on the OTHER waiting sessions:

Session 670:

pid=94 serial=464 audsid=35675 user: 69/XPC92_IMPORT_JEFF

O/S info: user: rainnyzhong, term: RAINNYZHONG-ECV, ospid: 3008:1404, machine: MSHOMERAINNYZHONG-ECV

program: plsqldev.exe

application name: PL/SQL Developer, hash value=1190136663

action name: SQL Window - New, hash value=3399691616

Current SQL Statement:

DELETE FROM pm_user_temp WHERE rownumber=2

End of information on OTHER waiting sessions.

上面是我模擬出來的死鎖產生的trace檔案中包含的死鎖圖。在一個正常的死鎖圖中,一般是至少是有2個資源被鎖住,而本例的死鎖圖中,只有一個資源被鎖住,且是被同一會話鎖住。這有點特別。

無奈之下,再次回退最近所有新修改的PROCEDUREJAVA程式碼,死鎖依舊。

這個PROCDUER已經執行一段時間了,QA也測了一段時間,一直沒有報這種死鎖的問題,那麼,為什麼現在會報出來呢?苦思,再苦思,最後,開發人員突然想起來說,他用的是MOCK,而不是JBOSS。我們馬上將程式碼放到JBOSS執行,死鎖不再丟擲,到此,困了我一天的死鎖問題終於找到原因。經過諮詢開發人員,原來MOCK是我們的AD TEADM研發出來的一個東東,有點類似模擬JBOSS,主要是用於開發人員的內部測試,在一般的測試中,開發人員只要用MOCK,而不需用JBOSS就可以除錯程式碼。但我們做的MOCK沒辦法真正模擬出JBOSS的事務管理機制,沒辦法實現巢狀事務,自治事務等。檢視ORACLE PROCEDURE,確實存在自治事務,所以,用MOCK一執行,就丟擲來莫名其妙的死鎖,產生的死鎖圖也是非常特別。靠!要不是開發人員自己說出來,我調2天估計也查不出死鎖的原因。

下面為完整的死鎖TRACE檔案

PARSING IN CURSOR #27 len=153 dep=1 uid=69 oct=6 lid=69 tim=1233331432659515 hv=3592496474 ad='46e29a38'

UPDATE PM_USER_TEMP T SET T.DEFAULTROLENAME = 'core.pm/label.companyAdministrator' WHERE T.DEFAULTROLENAME = :B3 AND T.USERID = :B2 AND T.FILENAME = :B1

END OF STMT

PARSE #27:c=0,e=282,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,tim=1233331432659507

DEADLOCK DETECTED

[Transaction Deadlock]

Current SQL statement for this session:

UPDATE PM_USER_TEMP T SET T.DEFAULTROLENAME = 'core.pm/label.companyAdministrator' WHERE T.DEFAULTROLENAME = :B3 AND T.USERID = :B2 AND T.FILENAME = :B1

----- PL/SQL Call Stack -----

object line object

handle number name

0x3d26c364 470 procedure XPC92_IMPORT_JEFF.AIT_LOADTEMPDATA

0x3d26c51c 1 anonymous block

The following deadlock is not an ORACLE error. It is a

deadlock due to user error in the design of an application

or from issuing incorrect ad-hoc SQL. The following

information may aid in determining the deadlock:

Deadlock graph:

---------Blocker(s)-------- ---------Waiter(s)---------

Resource Name process session holds waits process session holds waits

TX-0009000d-0000207b 38 721 X 38 721 X

session 721: DID 0001-0026-00001DE8 session 721: DID 0001-0026-00001DE8

Rows waited on:

Session 721: obj - rowid = 0000EAF8 - AAAPzHAAGAAAVGzAAA

(dictionary objn - 60152, file - 6, block - 86451, slot - 0)

Information on the OTHER waiting sessions:

End of information on OTHER waiting sessions.

===================================================

PROCESS STATE

-------------

Process global information:

process: 0x4cf5d02c, call: 0x496db50c, xact: 0x4b1bbd4c, curses: 0x4cbcf25c, usrses: 0x4cbcf25c

----------------------------------------

SO: 0x4cf5d02c, type: 2, owner: (nil), flag: INIT/-/-/0x00

(process) Oracle pid=38, calls cur/top: 0x496db50c/0x4970dcdc, flag: (0) -

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 108 0 4

last post received-location: kslpsr

last process to post me: 4cf519ac 1 6

last post sent: 0 0 24

last post sent-location: ksasnd

last process posted by me: 4cf519ac 1 6

(latch info) wait_event=0 bits=10

holding (efd=19) 20004fc4 Parent+children enqueue hash chains level=4

Location from where latch is held: ksqcmi: kslgpl:

Context saved from call: 0

state=busy, wlstate=free

recovery area:

…..此外省略n

[@more@]

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/684234/viewspace-1030464/,如需轉載,請註明出處,否則將追究法律責任。

相關文章