WCDMA測試庫故障處理過程

1.問題定位
初步觀察，資料庫程式全部關閉，ipc資源也被釋放，但alert裡面沒有關閉和報錯資訊。

2.嘗試啟動資料庫
SQL>sqlplus / as sysdba

SQL>startup

Errors in file /opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_13740.trc:
ORA-00320: cannot read file header from log 1 of thread 1
ORA-00312: online log 1 thread 1: '/data/GDWDB1/GDWDB1/redo01.log'
ORA-27091: unable to queue I/O
ORA-27069: attempt to do I/O beyond the range of the file
Additional information: 1
Additional information: 1
Thu Oct 8 16:53:48 2009
Aborting crash recovery due to error 320
Thu Oct 8 16:53:48 2009
Errors in file /opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_13740.trc:
ORA-00320: cannot read file header from log 1 of thread 1
ORA-00312: online log 1 thread 1: '/data/GDWDB1/GDWDB1/redo01.log'
ORA-27091: unable to queue I/O
ORA-27069: attempt to do I/O beyond the range of the file
Additional information: 1
Additional information: 1
ORA-320 signalled during: ALTER DATABASE OPEN...

報的是redo檔案錯誤，無法讀取檔案頭。
檢視了一下redo檔案，發現三個redo檔案的大小都是0。

再檢視跟蹤檔案/opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_13740.trc，也沒有看到什麼線索。
Ioctl ASYNC_CONFIG error, errno = 1
/opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_17405.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/oracle/OraHome_1
System name:    HP-UX
Node name:      GDWDB
Release:        B.11.31
Version:        U
Machine:        ia64
Instance name: GDWDB1
Redo thread mounted by this instance: 1
Oracle process number: 0
Unix process pid: 17405, image: oracle@GDWDB

資料庫沒有歸檔也沒有備份。沒辦法，XX的東西，見怪不怪了。

3.既然redo檔案都沒有了，就無法保證資料的一致性了，先清理一下redo，故障依然，但具體錯誤也忘記了。
SQL>alter database clear unarchived logfile group 1;
SQL>alter database clear unarchived logfile group 2;
SQL>alter database clear unarchived logfile group 3;

4.使用resetlogs開啟資料庫
SQL> shutdown immediate
ORA-01507: database not mounted

ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 2.1475E+10 bytes
Fixed Size                  2241992 bytes
Variable Size            5192837688 bytes
Database Buffers         1.6274E+10 bytes
Redo Buffers                5857280 bytes
Database mounted.
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 62463922 generated at 10/08/2009 18:52:37 needed for thread 1
ORA-00289: suggestion : /opt/oracle/OraHome_1/dbs/arch1_1_699734980.dbf
ORA-00280: change 62463922 for thread 1 is in sequence #1

Specify log: {=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/data/GDWDB1/GDWDB1/system01.dbf'

ORA-01112: media recovery not started

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

5.使用隱含引數_allow_resetlogs_corruption 以resetlogs方式開啟資料庫
SQL> shutdown immediate
ORA-01507: database not mounted

ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 2.1475E+10 bytes
Fixed Size                  2241992 bytes
Variable Size            5192837688 bytes
Database Buffers         1.6274E+10 bytes
Redo Buffers                5857280 bytes
Database mounted.
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 62463922 generated at 10/08/2009 18:52:37 needed for thread 1
ORA-00289: suggestion : /opt/oracle/OraHome_1/dbs/arch1_1_699734980.dbf
ORA-00280: change 62463922 for thread 1 is in sequence #1

Specify log: {=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/data/GDWDB1/GDWDB1/system01.dbf'

ORA-01112: media recovery not started

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

檢視alert日誌，發現如下錯誤
Errors in file /opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_1370.trc:
ORA-00600: internal error code, arguments: [2662], [0], [62464000], [0], [62512366], [8388617], [], []
Thu Oct 8 20:12:00 2009
Errors in file /opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_1370.trc:
ORA-00600: internal error code, arguments: [2662], [0], [62464000], [0], [62512366], [8388617], [], []
Thu Oct 8 20:12:00 2009
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 1370
ORA-1092 signalled during: ALTER DATABASE OPEN...

看看關於這個錯誤的資料，原因是scn不一致引起的
ORA-00600: internal error code, arguments: [2662], [0], [62464000], [0], [62512366], [8388617], [], []
就是說當前scn：62464000 < 檔案scn：62512366

6.強行推進scn
SQL>alter system set "_ALLOW_RESETLOGS_CORRUPTION"=false scope=spfile;
SQL>shutdown immediate
SQL>startup mount
SQL>alter session set events 'IMMEDIATE trace name ADJUST_SCN level 2';--將scn推進2百萬。
SQL>alter database open;

新的錯誤又出來了，
ksedmp: internal or fatal errorORA-00600: internal error code, arguments: [4193], [1171], [1187], [], [], [], [], []
Current SQL statement for this session:
UPDATE SMON_SCN_TIME SET SCN_WRP=:1, SCN_BAS=:2, TIME_MP=:3, TIME_DP=:4
WHERE TIME_MP = :5 AND THREAD = :6 AND ROWNUM <= 1
還有部分錯誤報undo的錯誤。

再看看資料，說這錯誤是上一錯誤ORA-00600: internal error code, arguments: [2662]引起的副作用，需要重建一下undo表空間。

7.重建undo表空間
SQL>create undo tablespace undotbs02 as datafile '/data/GDWDB1/undotbs02.dbf' size 500M;
SQL>alter system set undo_tablespace = undotbs02 scope=both;
SQL>shutdown immediate
SQL>startup
資料庫成功開啟。

8.使用expdp將資料備份。

來自 “ ITPUB部落格 ” ，連結：http://blog.itpub.net/13605188/viewspace-616196/，如需轉載，請註明出處，否則將追究法律責任。

WCDMA測試庫故障處理過程

相關文章