Resolving ORA-00600 [kcrfr_update_nab_2]

Posted by panpong on 2014-09-10

The database testdb1 runs Oracle 10.2.0.1 on AIX with ASM. Dropping a partition hung, for example:

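ALTER TABLE DXUSER.HISTWEBCDMA1X DROP PARTITION P20130728;

The session was waiting on the event "DFS lock handle". This is a wait on a CI (cross-instance call) lock, which is managed by the DLM. Because the database is single-instance, the only other instance that can be involved is the ASM instance.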

The ASM alert log showed that the ASM instance had reported errors, recorded in the trace file +asm_ora_762040.trc:

*** 2014-03-03 12:15:18.846

*** SERVICE NAME:() 2014-03-03 12:15:18.825

*** SESSION ID:(36.7347) 2014-03-03 12:15:18.825

Waited for detached process: RBAL for 300 seconds:

At the same time, errpt showed these errors:

testdb1#errpt |tail

825849BF   0303104614 T H fcs0           ADAPTER ERROR

C62E1EB7   0303104614 P H hdisk63        DISK OPERATION ERROR

C62E1EB7   0303104614 P H hdisk12        DISK OPERATION ERROR

C62E1EB7   0303104614 P H hdisk124       DISK OPERATION ERROR

C62E1EB7   0303104614 P H hdisk74        DISK OPERATION ERROR

B8FBD189   0303104614 T S fscsi0         SOFTWARE PROGRAM ERROR

B8FBD189   0303104614 T S fscsi0         SOFTWARE PROGRAM ERROR

825849BF   0303104614 T H fcs0           ADAPTER ERROR

825849BF   0303104614 T H fcs0           ADAPTER ERROR
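When errpt returns many entries, tallying them by resource makes it easier to see whether the errors cluster on one adapter or are spread across disks. The following is a minimal sketch, assuming the default errpt summary layout (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION); the function name `summarize_errpt` is hypothetical and not part of AIX.

```shell
# Hypothetical helper: tally errpt summary lines by resource and description.
# Assumes the default summary layout:
#   IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
summarize_errpt() {
  awk '
    # Skip blank or short lines and the header line that errpt prints first.
    NF < 6 || $1 == "IDENTIFIER" { next }
    {
      # The description may contain spaces, so rejoin fields 6..NF.
      desc = $6
      for (i = 7; i <= NF; i++) desc = desc " " $i
      count[$5 "\t" desc]++
    }
    END {
      # Output: count <TAB> resource <TAB> description
      for (k in count) printf "%d\t%s\n", count[k], k
    }
  ' | sort -t "$(printf '\t')" -k2,2 -k3,3
}

# Usage on the AIX host (not run here):
#   errpt | summarize_errpt
```

In the output above, this kind of tally would group the repeated fcs0 and fscsi0 entries together, which is consistent with a fault on the Fibre Channel path rather than on a single disk.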

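The system errors pointed to a fault in a disk or storage controller, so I reported the fault. After it was confirmed, the storage component was replaced and the database server was hard-rebooted. When I then checked the server, the database would not open: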

SQL> startup open

ORACLE instance started.

 

Total System Global Area 1.6744E+10 bytes

Fixed Size                  2050200 bytes

Variable Size            1694500712 bytes

Database Buffers         1.5032E+10 bytes

Redo Buffers               14725120 bytes

Database mounted.

ORA-00600: internal error code, arguments: [kcrfr_update_nab_2],[0x7000003EF9D93F0], [2], [], [], [], [], []

 

The alert log shows:

Beginning crash recovery of 1 threads

  parallel recovery started with 15 processes

 Tue Mar  4 07:47:39 2014

 Started redo scan

 Tue Mar  4 07:47:40 2014

 Errors in file /u01/app/oracle/admin/testdb/udump/testdb_ora_135988.trc:

 ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x7000003EF9D

993F0], [2], [], [], [], [], []

 Tue Mar  4 07:47:42 2014

 Aborting crash recovery due to error 600

Next, the trace file:

testdb1$more /u01/app/oracle/admin/testdb/udump/testdb_ora_135988.trc

 /u01/app/oracle/admin/testdb/udump/testdb_ora_135988.trc

 Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production

 With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engi

ne options

 ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1

 System name:    AIX

 Node name:      testdb1

 Release:        3

 Version:        5

 Machine:        00C051B24C00

 Instance name: testdb

 Redo thread mounted by this instance: 1

 Oracle process number: 16

 Unix process pid: 135988, image: oracle@testdb1 (TNS V1-V3)

 

 *** 2014-03-04 07:47:34.099

 *** SERVICE NAME:() 2014-03-04 07:47:34.088

 *** SESSION ID:(1643.3) 2014-03-04 07:47:34.088

 Successfully allocated 15 recovery slaves

 Using 20 overflow buffers per recovery slave

 Thread 1 checkpoint: logseq 21269, block 2, scn 109974607248

   cache-low rba: logseq 21269, block 569541

     on-disk rba: logseq 21269, block 584155, scn 109974738191
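The two RBA lines give a rough idea of how much redo lies between the oldest dirty buffer (cache-low RBA) and the last redo written to disk (on-disk RBA). The sketch below extracts the block numbers and computes the gap. It assumes both RBAs are in the same log sequence and that the redo block size is 512 bytes; the block size is an assumption, not something the trace states, and the function name `rba_gap` is hypothetical.

```shell
# Hypothetical helper: read trace text on stdin and report the distance
# between the cache-low RBA and the on-disk RBA.
# Assumptions: both RBAs fall in the same log sequence, and the redo block
# size is 512 bytes unless another value is passed as the first argument.
rba_gap() {
  awk -v bs="${1:-512}" '
    # Return the number that follows the word "block" on the current line.
    function block_of(    i, v) {
      for (i = 1; i < NF; i++)
        if ($i == "block") { v = $(i + 1); sub(/,/, "", v); return v + 0 }
      return -1
    }
    BEGIN { low = -1; high = -1 }
    /cache-low rba:/ { low = block_of() }
    /on-disk rba:/   { high = block_of() }
    END {
      # Fail if either RBA line was missing from the input.
      if (low < 0 || high < 0) exit 1
      printf "%d blocks, %d bytes\n", high - low, (high - low) * bs
    }
  '
}

# Usage (not run here):
#   rba_gap < /u01/app/oracle/admin/testdb/udump/testdb_ora_135988.trc
```

For the trace above, 584155 - 569541 gives 14614 blocks, or about 7.1 MB under the 512-byte assumption.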

The log above shows that the error occurs right after "Started redo scan", and that the log sequence involved is 21269. Check which log group holds sequence 21269:

SQL> select * from v$log;

 

     GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS

 ---------- ---------- ---------- ---------- ---------- --- ----------------

          1          1      21268   52428800          2 NO  INACTIVE

          2          1      21266   52428800          2 NO  INACTIVE

          6          1      21265  524288000          2 NO  INACTIVE

          4          1      21269  524288000          2 NO  CURRENT

          5          1      21264  524288000          2 NO  INACTIVE

          3          1      21267   52428800          2 NO  INACTIVE

It is log group 4, whose members are:

SQL> select member from v$logfile where group# = 4;

+SYSDG/testdb/onlinelog/group_4.267.676633559

+DATADG1/testdb/onlinelog/group_4.363.676633561
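The two lookups above (sequence to group in v$log, then group to members in v$logfile) can be combined into one join. The following is a minimal sketch: the function name `log_members_sql` is hypothetical, thread 1 is assumed because the database is single-instance, and the function only prints the query so that it can be reviewed before being piped to sqlplus.

```shell
# Hypothetical helper: print a query that lists the members of the log
# group holding a given sequence number in thread 1.
log_members_sql() {
  # Accept only a plain non-negative integer as the sequence number.
  case "$1" in
    ''|*[!0-9]*) echo "usage: log_members_sql SEQUENCE" >&2; return 1 ;;
  esac
  printf '%s\n' \
    'select l.group#, l.status, f.member' \
    '  from v$log l join v$logfile f on f.group# = l.group#' \
    " where l.thread# = 1 and l.sequence# = $1" \
    ' order by f.member;'
}

# Usage on the database host (not run here):
#   log_members_sql 21269 | sqlplus -s "/ as sysdba"
```

Both views are available in MOUNT state, so the query can be run even though the database fails to open.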

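A web search showed that ORA-00600 [kcrfr_update_nab_2] is a rare error, with little information on MOS or elsewhere. MOS mostly treats it as a bug and offers no workaround or fix. Google turned up one article on "kcrfr_update_nab_2" that records how its author resolved the problem (kcrfr_update_nab_2/). The rough procedure is to delete the second member file of the failing log group, run recover database, open the database, and then rebuild the failing log group.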

The steps in detail:

SQL> startup open

ORACLE instance started.

 

Total System Global Area 1.6744E+10 bytes

Fixed Size                  2050200 bytes

Variable Size            1694500712 bytes

Database Buffers         1.5032E+10 bytes

Redo Buffers               14725120 bytes

Database mounted.

ORA-00600: internal error code, arguments: [kcrfr_update_nab_2],

[0x7000003EF9D93F0], [2], [], [], [], [], []

 

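Locate the redo files of the failing log group and delete the member 1 file, that is, the second member of the group (here, the copy in +DATADG1):

$ asmcmd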

ASMCMD> cd +datadg1/testdb/ONLINELOG/

ASMCMD> ls

 group_1.360.676633379

 group_2.361.676633469

 group_3.362.676633477

 group_4.363.676633561

 group_5.364.676633571

 group_6.365.676633579

ASMCMD> rm group_4.363.676633561

 

SQL> recover database;

Media recovery complete.

 

SQL> shutdown immediate

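SQL> startup open;

Once the database is open, rebuild the failing redo group, group 4. Note that Oracle refuses to drop a group that is still CURRENT (ORA-01623) or still needed for crash recovery (ORA-01624), so a log switch and a checkpoint may be needed first.

SQL> alter database drop logfile group 4;

SQL> alter database add logfile thread 1 group 4 ('+SYSDG','+DATADG1') size 512M;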

 

 

Source: ITPUB Blog, http://blog.itpub.net/16976507/viewspace-1266952/. If you repost this article, please credit the source; otherwise legal action may be taken.