oracle bug 6825287導致DX鎖等待

sky_dizzy001發表於2014-02-09

1 概述

問題簡述：scdb資料庫鎖阻塞

2 問題描述

scdb資料庫某日上午發生DX鎖阻塞現象，影響部分業務操作。當時監控到的鎖阻塞情況如下：

HOLD_SID H_USER H_LOCKWAIT H_STATUS H_MODULE H_OBJ H_ROW WAIT_SID W_USER W_LOCKWAIT W_STATUS W_MODULE W_OBJ W_ROW H_TYPE H_ID1 H_ID2 H_LMODE H_REQUEST H_CTIME H_BLOCK R_TYPE R_ID1 R_ID2 R_LMODE R_REQUEST R_CTIME R_BLOCK KILLHOLD HSQL WSQL

1 2321 DB_SC ACTIVE 81912 0 91 DB_YW 0700000866676D18 ACTIVE -1 0 DX 38 0 6 0 1975 1 DX 38 0 0 4 1210 0 alter system kill session '2321,64240'; -- kill -9, 7131170 INSERT INTO DB_YW.T_TEMP_SC..... SELECT formatid, globalid, branchid FROM SYS.DBA_PENDING_TRANSACTIONS

3 問題分析

從監控結果看，阻塞鎖型別是DX鎖（Distributed transaction entry），等待鎖的SQL語句是SELECT formatid, globalid, branchid FROM SYS.DBA_PENDING_TRANSACTIONS ，持有鎖的SQL是一條INSERT語句。鎖持有方式是6 - exclusive (X) ，鎖申請方式4 - share (S) 。

DBA_PENDING_TRANSACTIONS是系統資料字典，DBA_PENDING_TRANSACTIONS describes unresolved transactions (either due to failure or if the coordinator has not sent a commit/rollback). DBA_PENDING_TRANSACTIONS is a view which is basically a table @imposed on the structure representing in-memory state of distributed txns.

The DX-enqueue serves as a serialization primitive to prevent concurrency among various tightly-coupled branches of a distributed transaction on any database. It also provides for serialization of 2PC operations like abort and forward going changes.

在一個高負載、高併發、並使用了分散式事務的資料庫環境中,可能會產生DX鎖等待（may present a problem in a high activity system where the application logic performs work in short bursts i.e. work is performed in several client -> server roundtrips. Each trip into the database server, would request the DX-enqueue if the session has a distributed txn attached to it. Additionally, 2PC operations on detached branches also require the DX-enqueue.）

具體描述請查閱Oracle metalink 文件[ID 762412.1]

這是Oracle一個Unpublished Bug（querying DBA_PENDING_TRANSACTIONS in a high concurrency environment and to resolve the potential issue of acquisition of the DX-enqueue with infinite timeout. The fix for this issue is to request the enqueue/lock with a timeout and handle the enqueue/lock acquisition timeouts appropriately. Thus relieving the contention issue.）

4 結論

從鎖阻塞現場情況看，很可能是遇到ORACLE Bug 6825287，這個bug在9i 10g 11.1版本都存在，需要安裝補丁patch 8354686才可能解決，但這個補丁是Oracle停止oracle9i標準服務之後才釋出，所以在Oracle9i版本下屬於“擴充套件支援”範圍，需要與Oracle簽訂Oracle9i“擴充套件支援”服務合同才能獲取該補丁。

5 附Oracle metalink DOC [ID 762412.1]

Select From Dba_pending_transaction or GV$GLOBAL_TRANSACTION Takes A Long Time (very slow) [ID 762412.1]

修改時間 08-FEB-2010 型別 PROBLEM 狀態 PUBLISHED

In this Document
Symptoms
Cause
Solution
References

5.1 Applies to:

Oracle Server - Enterprise Edition - Version: 9.2.0.8 to 11.1.0.6 - Release: 9.2
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 9.2.0.8 to 11.1.0.6

5.2 Symptoms

SELECT FROM DBA_PENDING_TRANSACTION TAKES A LONG TIME,

The investigation shows the session waiting for 'DX' lock .

This issue may also be observed with XA transactions running :
MSDTC / OraMTS / ODBC
- Microsoft Distributed Transaction Coordinator service
- Oracle Services for Microsoft Transaction Server (MTS)
In this case, the issue will fire in the Client Applications which are periodically encountering :
ORA-02049: timeout: distributed transaction waiting for lock

Contention is observed :
gv$session_wait shows session waiting "enq: DX - contention"

The Key points which relate to the DX enqueue under the circumstances here are as follows.

- The DX-enqueue serves as a serialization primitive to prevent concurrency among various tightly-coupled branches of a distributed transaction on any database. It also provides for serialization of 2PC operations like abort and forward going changes (e.g. out-of-band abort requests sent in by Transaction
Monitors like MSDTC on expiration of application transaction time-to-live).

- OraMTS runs a periodic recovery job as a scheduled job via the jobq infrastructure. This purpose of this job is to query
DBA_PENDING_TRANSACTIONS looking for any MSDTC-initiated and coordinated transactions that go in-doubt on the database.

- DBA_PENDING_TRANSACTIONS is a view which is a union of a fixed table and a persistent table.

which is basically a table @imposed on the structure representing in-memory state of distributed txns.
Iterating over this table requires acquisition of the DX-enqueue for each entry in the table. In 9.2 and 10gR1 and 10gR2,this lock is acquired with an infinite wait time in S-mode.

- The above may present a problem in a high activity system where the application logic performs work in short bursts i.e. work is performed in several client -> server roundtrips. Each trip into the database server, would request the DX-enqueue if the session has a distributed txn attached to it.

Additionally, 2PC operations on detached branches also require the DX-enqueue.

The above contributes to DX contention and the ORA-2049 errors reported.

5.3 Cause

The symptoms seem to match Bug 6825287, this bug was closed a duplicate of unpublished bug 5843814, following is the problem description from the developers:

Problem description:
XA_RECOVER is blocked by a Fore Ground process holding DX in 'X' mode

REDISCOVERY INFORMATION:
XA_RECOVER is very slow or query on GV$GLOBAL_TRANSACTION is very slow.

Unpublished Bug 5843814 was filed to address performance issues that were encountered when
querying DBA_PENDING_TRANSACTIONS in a high concurrency environment and to resolve the potential issue of acquisition of the DX-enqueue with infinite timeout.
The fix for this issue is to request the enqueue/lock with a timeout and handle the enqueue/lock acquisition timeouts appropriately. Thus relieving the contention issue.

5.4 Solution

Bug 6825287 is reported on version 9.2 . The base unpublished bug 5843814 was reported and reproduced on both 10.2 and 11.1 and it leads us to believe that the same problem can occur on 9i, 10g or 11g.

To fix the problem, please follow the steps outlined below:

The Original patch created for this issue was patch 5843814 , but then the developers has created new Patch 8354686 to solve some regression and conflict issues reported from the original patch, also the new patch contains also the fix of bug 8354686. the plan to solve this issue then is :

1. Connect to metalink and search for the patch 8354686 and make sure to select the correct platform and RDBMS version.

2. If the patch is not available for your system, please open new service request, and ask the assigned analyst to file a new one off request for your platform.

3. please make sure that you are already on the latest patchset before requesting the patch.

4. Apply the patch on testing environment and after verifying the fix ,apply it to the production system.

In addition as per Note 738108.1, and the document: Oracle Whitepaper ??????Oracle SOA Suite XA and RAC Database Configuration Guide?????? [XA], on OTN: http://www.oracle.com/technology/tech/soa/ha/index.html

There also the same recommendation to apply Patch number : 5843814 which has now been replaced by patch 8354686 so I thoght it will be good to mention this as well. .

5.5 References

BUG:6825287 - QUERY ON DBA_PENDING_TRANSACTIONS HANGS

來自 “ ITPUB部落格 ” ，連結：http://blog.itpub.net/29468144/viewspace-1078421/，如需轉載，請註明出處，否則將追究法律責任。

sysbench花式踩坑之三：自增值導致的鎖等待
2019-02-21
淺複製導致的bug
2023-02-26
當心ORACLE 12.2 RAC新特性引入的BUG導致ORA-4031
2018-06-04
Oracle
Oracle死鎖一例(ORA-00060)，鎖表導致的業務死鎖問題
2018-07-05
Oracle
Oracle 12.2 BUG :分割槽維護導致的 ORA-600 KKPOFPCD3
2020-04-03
Oracle
Oracle 12c因bug導致ORA-04031問題處理過程
2019-08-23
Oracle
萬萬沒想到，一個 MongoDB.Driver 的 bug 導致 .NET5 程式死鎖！
2021-11-22
MongoDB
【ASK_ORACLE】Linux從6升級到7導致Oracle產生大量Log file sync等待事件處理辦法
2022-07-21
OracleLinux事件
Containerd 的 Bug 導致容器被重建！如何避免？
2023-02-07
AI
@Transactional 中使用執行緒鎖導致了鎖失效
2024-07-18
執行緒
MySQL 死鎖和鎖等待
2020-04-13
MySql
【ASK_ORACLE】Oracle 12.2 Bug導致網路卡出現故障後RAC庫未向TCP註冊本地VIP監聽
2021-08-27
OracleTCP
RAC節點hang住, oracle bug導致了cpu過高，無法啟動叢集隔離
2020-07-20
Oracle
分散式鎖導致的超賣問題
2024-08-20
分散式
記php-fpm重啟導致的一個bug
2019-04-26
PHP
Oracle目錄由於TFA觸發bug導致jdb檔案未自動清理引起空間不足
2021-01-21
Oracle
MySQL 鎖bug?
2022-09-19
MySql
MySQL Online DDL導致全域性鎖表案例分析
2019-05-11
MySql
12.2.0.1bug導致的Failed to register in OCRLOCAL group.錯誤
2018-07-24
AI
MySQL8.0的一個bug導致複製延時
2022-12-16
MySql
MySQL鎖等待與死鎖問題分析
2021-03-30
MySql
AdornerDecorator的CacheMode繫結和windows鎖屏導致TableControl鎖死問題
2024-08-29
Windows
ORACLE 常見等待事件
2018-06-27
Oracle事件
【TUNE_ORACLE】等待事件之等待事件類別
2021-04-16
Oracle事件
sock鎖檔案導致的MySQL啟動失敗
2022-02-24
MySql
Oracle RAC啟動因CTSS導致的異常
2020-05-25
Oracle
Oracle RAC日常運維-NetworkManager導致叢集故障
2019-06-08
Oracle運維
UE floating pawn movement Bug Set location 導致速度異常發射
2024-05-07
一個導致JVM實體記憶體消耗大的Bug
2020-07-01
JVM記憶體
Laravel 多條件查詢時粗心導致的一個 BUG
2019-05-30
Laravel
BUG—Nuget包版本不一致導致程式行為與預期不符
2022-01-06
【Not BUG】微軟Winform窗體中設計上的Bug，會導致程式編譯失敗？不，這不是BUG！
2021-01-20
微軟ORM編譯
【鎖】Oracle鎖系列
2018-03-12
Oracle
【TUNE_ORACLE】等待事件之日誌等待“log file sync”
2021-07-16
Oracle事件
【TUNE_ORACLE】等待事件之IO等待“read by other session”
2021-06-07
Oracle事件Session
【TUNE_ORACLE】等待事件之IO等待“direct path read”
2021-04-21
Oracle事件
【TUNE_ORACLE】等待事件之IO等待“direct path write”
2021-05-18
Oracle事件
記一次Oracle RAC for aix 儲存雙控鎖盤導致ASM控制檔案損壞恢復
2019-04-30
OracleAIASM
ZooKeeper 避坑指南： ZooKeeper 3.6.4 版本 BUG 導致的資料不一致問題
2023-05-03