Analysis of ORA-00600 [2103]

Posted by Karsus on 2009-06-04

Yesterday at 5 PM one of our production DBs crashed. Checking the alert.log showed ORA-00600 [2103].

This is the first time I have seen this error.


----------------------------------------------------

Wed Jun 3 01:17:45 2009

ARC1: Unable to archive log 3 thread 1 sequence 1935

Log actively being archived by another process

Wed Jun 3 01:17:46 2009

Creating archive destination LOG_ARCHIVE_DEST_1: '/u01/archive/ems/arc1_1935.dbf'

ARCH: Completed archiving log 3 thread 1 sequence 1935

Wed Jun 3 02:52:20 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc0_1250.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:57:32 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc1_26800.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:57:33 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_rsm0_1222.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:58:19 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc1_26800.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:58:19 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc0_1250.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:58:19 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc1_26800.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:58:19 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc0_1250.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

RSM0 started with pid=22, OS id=3191

------------------------------------------------------------------------

A quick search on Metalink turned up a note that fits exactly:

Diagnosing ORA-600 [2103] issues in a NON RAC environment

According to that note, the main causes are these:

An ORA-600 [2103] is signaled when an Oracle process cannot get a CF(Control File) enqueue for 900 seconds.

- Very slow I/O subsystem where the Control files are stored.

- Frequent log switching, redo logs too small or too few in number.

- Async IO issue or multiple db_writers, you can't use both of them, back one out

- OS / Hardware issues
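The first two causes are the easiest ones to rule in or out. A minimal sketch against the standard dynamic performance views (run as a DBA user; nothing here is specific to this system) for eyeballing log switch frequency and cumulative control file I/O waits:

-- Log switches per hour over the last day; a high rate suggests undersized or too few redo logs
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour, COUNT(*) AS switches
  FROM v$log_history
 WHERE first_time > SYSDATE - 1
 GROUP BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
 ORDER BY 1;

-- Cumulative control file waits since instance startup; high average waits suggest slow I/O
SELECT event, total_waits, time_waited, average_wait
  FROM v$system_event
 WHERE event LIKE 'control file%';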

Checking the related trace files, only

/u01/app/oracle/admin/ems/bdump/ems_rsm0_1222.trc

contains a system state dump taken at the time.
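For reference, this kind of dump can also be requested manually while a hang is still in progress; a sketch assuming a SYSDBA connection in SQL*Plus (the trace file is written to user_dump_dest):

-- Dump the state of all processes/sessions into a trace file for later analysis
ALTER SESSION SET EVENTS 'immediate trace name systemstate level 10';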

The CF enqueue situation in the dump is as follows:

CKPT:

O/S info: user: oracle, term: UNKNOWN, ospid: 24365

OSD pid info: Unix process pid: 24365, image: oracle@sh-iecdb-01 (CKPT)

……

SO: 0x822cfe60, type: 4, owner: 0x82296840, flag: INIT/-/-/0x00

(session) trans: (nil), creator: 0x82296840, flag: (51) USR/- BSY/-/-/-/-/-

DID: 0001-0005-00000005, short-term DID: 0000-0000-00000000

txn branch: (nil)

oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS

waiting for 'control file parallel write' blocking sess=0x0 seq=12984 wait_time=0

files=3, blocks=3, requests=3

temporary object counter: 0

SO: 0x823be098, type: 6, owner: 0x8233f800, flag: INIT/-/-/0x00

(enqueue) CF-00000000-00000003 DID: 0001-0005-00000005

lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

res: 82416e20, mode: X, prv: 82416e30, sess: 822cfe60, proc: 82296840

RMAN Process:

O/S info: user: oracle, term: , ospid: 2194, machine: sh-iecdb-01

program: rman@sh-iecdb-01 (TNS V1-V3)

application name: rman@sh-iecdb-01 (TNS V1-V3), hash value=0

action name: 0000002 STARTED, hash value=145214725

waiting for 'control file sequential read' blocking sess=0x0 seq=321 wait_time=0

SO: 0x823bd798, type: 6, owner: 0x822e9a50, flag: INIT/-/-/0x00

(enqueue) CF-00000000-00000002 DID: 0001-0016-0001F0FB

lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

res: 8241a948, mode: X, prv: 8241a958, sess: 822e9a50, proc: 8229c3a0

----------------------------------------

SO: 0x823bd960, type: 6, owner: 0x82341d08, flag: INIT/-/-/0x00

(enqueue) CF-00000000-00000004 DID: 0001-0016-0001F0FB

lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

res: 82421080, mode: S, prv: 82421090, sess: 822e9a50, proc: 8229c3a0

----------------------------------------

SO: 0x823bd830, type: 6, owner: 0x82341d08, flag: INIT/-/-/0x00

(enqueue) CF-00000000-00000000 DID: 0001-0016-0001F0FB

lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

res: 82411bb0, mode: X, prv: 82411bc0, sess: 822e9a50, proc: 8229c3a0

----------------------------------------

ARC0:

last wait for 'enqueue' blocking sess=0x0 seq=64935 wait_time=2930614

name|mode=43460004, id1=0, id2=0

ARC1:

last wait for 'enqueue' blocking sess=0x0 seq=19514 wait_time=2929638

name|mode=43460004, id1=0, id2=0

This shows that CKPT and the RMAN process were each holding a CF enqueue while waiting, respectively, on

'control file parallel write' and 'control file sequential read',

and the ARCn processes timed out waiting for the CF enqueue.
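Had the instance still been up, the same picture could have been confirmed straight from the lock views. A minimal sketch, assuming the usual v$ views (rows of type 'CF' in v$lock are the control file enqueue):

-- Holders (lmode > 0) and waiters (request > 0) of the CF enqueue
SELECT s.sid, s.program, l.lmode, l.request, l.ctime
  FROM v$lock l, v$session s
 WHERE l.sid = s.sid
   AND l.type = 'CF';

-- What those sessions are waiting on right now
SELECT sid, event, seconds_in_wait
  FROM v$session_wait
 WHERE sid IN (SELECT sid FROM v$lock WHERE type = 'CF');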

At this point the cause is clear: the crontab shows one RMAN backup job at 2:00 and another at 2:30.

This server and the production DB on another server act as each other's Data Guard standby. At 2:00 the standby DB is backed up to an NFS mount point, and at 2:30 the local production DB backs up and deletes its archived logs.

The server itself is not powerful, with only two disks in RAID 1, because its load is light; it is the DB used for customs declarations.

When the data volume was smaller, the first job would finish before the second one started. Now that the data volume has grown, the two jobs can end up running at the same time, and 9i's RMAN is notoriously I/O-hungry, which created serious contention on I/O against the control files.

According to the backup job logs, the first job finished at 3:04 and the second at 2:58,

while the ORA-00600 appeared at 2:52.

Then, at 17:19 that evening, when a redo log needed to be archived:

Wed Jun 3 17:19:54 2009

LGWR: Detected ARCH process failure

LGWR: Detected ARCH process failure

LGWR: STARTING ARCH PROCESSES

Wed Jun 3 17:19:54 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_lgwr_24363.trc:

ORA-00445: background process "ARC0" did not start after 120 seconds

ARC0 started with pid=24, OS id=5991

Wed Jun 3 17:19:54 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_lgwr_24363.trc:

ORA-00449: background process 'ARC0' unexpectedly terminated with error 445

ORA-00445: background process "" did not start after seconds

Wed Jun 3 17:19:54 2009

LGWR: terminating instance due to error 449

Instance terminated by LGWR, pid = 24363

Metalink also has a similar case in which this error appears during an RMAN backup to NFS.

It attributes the problem to the control file being updated concurrently while it is backed up in a non-catalog backup, which produces a deadlock on the control file.

That problem only occurs when the backup destination is NFS, and it affects versions 9.0.1 through 10.2.0.3.

The workaround is to keep the control file backup on local storage.
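As a rough sketch of that workaround (the local directory below is hypothetical, not from the note), RMAN can be told to keep the snapshot control file and the control file autobackups on local disk even while the backup sets themselves still go to NFS:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/local_backup/snapcf_ems.f';
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/u01/app/oracle/local_backup/%F';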

