sql server透明閘道器dblink在alert裡報recover登入失敗

gengyonghui發表於2007-07-18

以前也碰到過這種情況,現在警告日誌裡報了多個,看來把這些2pc懸掛事務強制提交就可以搞定了。

ORA-28500: 連線 ORACLE 到非 Oracle 系統時返回此資訊:
[Transparent gateway for MSSQL][Microsoft][ODBC SQL Server Driver][SQL Server]使用者 'RECOVER' 登入失敗。 (SQL State: 00000; SQL Code: 18456)
ORA-02063: ??? 2 lines (?? STAG@ODS_TAR_PTBI_ODS_RES_CPTS_GL)

以下是網上的文件,留作參考

早晨到辦公室聽同事說表被鎖了,一試,發現表中某欄位為1111111的行都被鎖了,SELECT都不行。報錯誤ORA-01591,開啟TOAD的Knowledge eXpert,描述很少,只是說由於分散式事務錯誤而造成鎖定。 詢問同事,昨天透過一個儲存過程呼叫另一個儲存過程出了錯誤,而後者透過透明閘道器insert一些資料到SQl Server資料庫。
立即想到開啟OEM,誰知道大失所望,進入鎖,根本沒發現相關的物件被鎖定,開始有點鬱悶。轉而檢查會話,該使用者有5個會話,都是INACTIVE,不管三七二十一,全部殺掉。結果依舊,並且鎖也沒有出現。遠端登陸上主機,發現CPU和程式都正常,也沒有發現透明閘道器程式掛死(之前曾發現TG4SQL在無業務量時也會出現25%左右的CPU,掛死)。
突然想到看看alert.log,經過仔細搜尋,終於發現:

Wed Nov 17 00:00:04 2004
Errors in file d:oracleadminxdcjudumpxdcj_j006_3020.trc:
ORA-12012: 自動執行作業 82 出錯
ORA-01591: 鎖定已被有問題的分配事務處理6.5.887985掛起
ORA-06512: 在line 6

這正是出錯的地方,往前追溯:

Tue Nov 16 17:35:04 2004
Error 28500 trapped in 2PC on transaction 6.5.887985. Cleaning up.
Error stack returned to user:
ORA-02054: 事務處理6.5.887985有問題
ORA-28500: 連線 ORACLE 到非 Oracle 系統時返回此資訊:
[Transparent gateway for MSSQL]
ORA-02063: 緊接著2 lines(源於ZSMOS_CRM)
Tue Nov 16 17:35:04 2004
DISTRIB TRAN QDCJ.US.ORACLE.COM.5ae32328.6.5.887985
is local tran 6.5.887985 (hex=06.05.d8cb1)
insert pending prepared tran, scn=6606197672830 (hex=602.2010cb7e)
Tue Nov 16 17:35:07 2004
Errors in file d:oracleadminxdcjdumpxdcj_reco_3024.trc:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Transparent gateway for MSSQL][Microsoft][ODBC SQL Server Driver][SQL Server]使用者 'RECOVER' 登入失敗。 (SQL State: 28000; SQL Code: 18456)
ORA-02063: preceding 2 lines from ZSMOS_CRM

Tue Nov 16 17:35:12 2004
Errors in file d:oracleadminxdcjdumpxdcj_reco_3024.trc:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Transparent gateway for MSSQL][Microsoft][ODBC SQL Server Driver][SQL Server]使用者 'RECOVER' 登入失敗。 (SQL State: 28000; SQL Code: 18456)
ORA-02063: preceding 2 lines from ZSMOS_CRM

這就是事發地點了。看來是昨天下午遠端事務失敗,但是又沒有返回造成分散式事務掛死,從而鎖定了行。終於找到了詳細的錯誤ORA-02054,進入TOAD一查,說是要等待或者提交該事務,可是怎麼操作呢。還是開啟官方文件搜尋相關內容,在Adminstrator Guide中發現如下內容:
Discovering Problems with a Two-Phase Commit
The user application that commits a distributed transaction is informed of a problem by one of the following error messages:

ORA-02050: transaction ID rolled back,
some remote dbs may be in-doubt
ORA-02051: transaction ID committed,
some remote dbs may be in-doubt
ORA-02054: transaction ID in-doubt


A robust application should save information about a transaction if it receives any of the above errors. This information can be used later if manual distributed transaction recovery is desired.

No action is required by the administrator of any node that has one or more in-doubt distributed transactions due to a network or system failure. The automatic recovery features of Oracle transparently complete any in-doubt transaction so that the same outcome occurs on all nodes of a session tree (that is, all commit or all roll back) after the network or system failure is resolved.

In extended outages, however, you can force the commit or rollback of a transaction to release any locked data. Applications must account for such possibilities.

Determining Whether to Perform a Manual Override
Override a specific in-doubt transaction manually only when one of the following situations exists:

The in-doubt transaction locks data that is required by other transactions. This situation occurs when the ORA-01591 error message interferes with user transactions.
An in-doubt transaction prevents the extents of a rollback segment from being used by other transactions. The first portion of an in-doubt distributed transaction's local transaction ID corresponds to the ID of the rollback segment, as listed by the data dictionary views DBA_2PC_PENDING and DBA_ROLLBACK_SEGS.
The failure preventing the two-phase commit phases to complete cannot be corrected in an acceptable time period. Examples of such cases include a telecommunication network that has been damaged or a damaged database that requires a long recovery time.
Normally, you should make a decision to locally force an in-doubt distributed transaction in consultation with administrators at other locations. A wrong decision can lead to database inconsistencies that can be difficult to trace and that you must manually correct.

If the conditions above do not apply, always allow the automatic recovery features of Oracle to complete the transaction. If any of the above criteria are met, however, consider a local override of the in-doubt transaction.

看來是建議差不多,後面Oracle總是試圖登入SQl Server就是要自動恢復,可是總不成功。察看檢視DBA_2PC_PENDING確實發現了該事務的痕跡。要怎樣操作呢?

Manually Committing an In-Doubt Transaction
Before attempting to commit the transaction, ensure that you have the proper privileges. Note the following requirements:

If the transaction was committed by... Then you must have this privilege...
You
FORCE TRANSACTION

Another user
FORCE ANY TRANSACTION


Committing Using Only the Transaction ID
The following SQL statement commits an in-doubt transaction:

COMMIT FORCE 'transaction_id';


The variable transaction_id is the identifier of the transaction as specified in either the LOCAL_TRAN_ID or GLOBAL_TRAN_ID columns of the DBA_2PC_PENDING data dictionary view.

For example, assume that you query DBA_2PC_PENDING and determine that LOCAL_TRAN_ID for a distributed transaction is 1:45.13.

You then issue the following SQL statement to force the commit of this in-doubt transaction:

COMMIT FORCE '1.45.13';

Committing Using an SCN
Optionally, you can specify the SCN for the transaction when forcing a transaction to commit. This feature allows you to commit an in-doubt transaction with the SCN assigned when it was committed at other nodes.

Consequently, you maintain the synchronized commit time of the distributed transaction even if there is a failure. Specify an SCN only when you can determine the SCN of the same transaction already committed at another node.

For example, assume you want to manually commit a transaction with the following global transaction ID:

SALES.ACME.COM.55d1c563.1.93.29

First, query the DBA_2PC_PENDING view of a remote database also involved with the transaction in question. Note the SCN used for the commit of the transaction at that node. Specify the SCN when committing the transaction at the local node. For example, if the SCN is 829381993, issue:

COMMIT FORCE 'SALES.ACME.COM.55d1c563.1.93.29', 829381993;

See Also:
Oracle9i SQL Reference for more information about using the COMMIT statement


Manually Rolling Back an In-Doubt Transaction
Before attempting to roll back the in-doubt distributed transaction, ensure that you have the proper privileges. Note the following requirements:

If the transaction was committed by... Then you must have this privilege...
You
FORCE TRANSACTION

Another user
FORCE ANY TRANSACTION


The following SQL statement rolls back an in-doubt transaction:

ROLLBACK FORCE 'transaction_id';


The variable transaction_id is the identifier of the transaction as specified in either the LOCAL_TRAN_ID or GLOBAL_TRAN_ID columns of the DBA_2PC_PENDING data dictionary view.

For example, to roll back the in-doubt transaction with the local transaction ID of 2.9.4, use the following statement:

ROLLBACK FORCE '2.9.4';

於是登陸資料庫
COMMIT FORCE '6.5.887985';
然後檢視DBA_2PC_PENDING發現狀態已經改為'COMMIT FORCE',SELECT該表相關行,一切正常。至此,故障解決。
總體來看,直接INSERT ... TABLENAME@SQLDBLK還是很危險的,遇上不能正常返回就出問題了。Oracle的文件是推薦使用包或者儲存過程來解決,此後建議同事改用此方法,目前已經測試透過。

[@more@]

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/112417/viewspace-926977/,如需轉載,請註明出處,否則將追究法律責任。

相關文章