測試環境控制檔案被誤刪的故障處理

wei-xh發表於2014-07-09
今天在測試庫執行select * from v$log的時候報錯
select * from v$log
              *
ERROR at line 1:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/opt/oracle/oradata/control01.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

報錯資訊很明確,控制檔案找不到了。根據控制檔案的路徑,透過作業系統命令來檢視,檔案確實已經不在了。
ls -l /opt/oracle/oradata/control01.ctl
ls: cannot access /opt/oracle/oradata/control01.ctl: No such file or directory


select * from v$controlfile;
STATUS         NAME                                     IS_REC BLOCK_SIZE FILE_SIZE_BLKS
-------------- ---------------------------------------- ------ ---------- --------------
               /opt/oracle/oradata/control01.ctl        NO          16384           1972


糟糕的是,測試庫的控制檔案只有一份,如果有多份的話,透過複製+重新命名可以解決一個控制檔案丟失的問題(關閉資料庫後,進行複製+重新命名)。

在這種情況下,要不停掉資料庫,手工建立控制檔案,或者第二種辦法:由於資料庫還是開啟的狀態,雖然檔案被刪除,但是例項(程式)依然會對訪問過的檔案保持開啟狀態。

ps -ef | grep lgwr
oracle   11780     1  0 Jun27 ?        00:05:02 ora_lgwr_fio

cd /proc/11780/fd

ls -ltr
total 0
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 9 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 8 -> /dev/null
lrwx------. 1 oracle oinstall 64 Jul  9 09:25 7 -> /opt/oracle/products/11.2.0/dbs/hc_fio.dat
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 6 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 5 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 4 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 3 -> /dev/null
l-wx------. 1 oracle oinstall 64 Jul  9 09:25 27 -> /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_lgwr_11780.trm
l-wx------. 1 oracle oinstall 64 Jul  9 09:25 26 -> /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_lgwr_11780.trc
lrwx------. 1 oracle oinstall 64 Jul  9 09:25 257 -> /dev/dfa
lrwx------. 1 oracle oinstall 64 Jul  9 09:25 256 -> /opt/oracle/oradata/control01.ctl (deleted)

可以看到LGWR程式保持了對檔案 256 -> /opt/oracle/oradata/control01.ctl (deleted) 的訪問,可以透過複製命令把檔案還原。

cp 256 /opt/oracle/oradata/control01.ctl
chown oracle.oinstall /opt/oracle/oradata/control01.ctl

再檢視v$log,已經可已正常檢視了。

select * from v$log;


    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIV STATUS                           FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- -------------------
NEXT_CHANGE# NEXT_TIME
------------ -------------------
         1          1       3436 1073741824        512          1 YES    INACTIVE                               2011282 2014-07-09 09:18:19
     3684369 2014-07-09 09:20:28


         2          1       3437 1073741824        512          1 YES    ACTIVE                                 3684369 2014-07-09 09:20:28
     4951745 2014-07-09 09:25:08


         3          1       3438 1073741824        512          1 NO     CURRENT                                4951745 2014-07-09 09:25:08
  2.8147E+14



但是緊接著,例項突然就DOWN掉了,檢視ALERT檔案。
********************* ATTENTION: ********************
 The controlfile header block returned by the OS
 has a sequence number that is too old.
 The controlfile might be corrupted.
 PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
 without following the steps below.
 RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
 TO THE DATABASE, if the controlfile is truly corrupted.
 In order to re-start the instance safely,
 please do the following:
 (1) Save all copies of the controlfile for later
     analysis and contact your OS vendor and Oracle support.
 (2) Mount the instance and issue:
     ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
 (3) Unmount the instance.
 (4) Use the script in the trace file to
     RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
CKPT (ospid: 11782): terminating the instance
Wed Jul 09 09:30:08 2014
System state dump requested by (instance=1, osid=11782 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_diag_11761.trc
Wed Jul 09 09:30:08 2014
ORA-1092 : opitsk aborting process
Non critical error ORA-48913 caught while writing to trace file "/opt/oracle/diag/rdbms/dlsp/fio/trace/fio_diag_11761.trc"
Error message: ORA-48913: Writing into trace file failed, file size limit [2000000] reached
Writing to the above trace file is disabled for now on...
Dumping diagnostic data in directory=[cdmp_20140709093008], requested by (instance=1, osid=11782 (CKPT)), summary=[abnormal instance termination].
Instance terminated by CKPT, pid = 11782

基本的意思是,控制檔案太舊了,然後檢查點程式終止了例項。

嘗試開啟資料庫

@>startup
ORACLE instance started.


Total System Global Area 1.9108E+10 bytes
Fixed Size                  2236328 bytes
Variable Size            2952790104 bytes
Database Buffers         1.6106E+10 bytes
Redo Buffers               46886912 bytes
Database mounted.
ORA-00338: log 1 of thread 1 is more recent than control file
ORA-00312: online log 1 thread 1: '+DG_DATA/dlsp/redo01.log'
提示日誌裡記錄的資訊(檢查點資訊)比控制檔案新。

看來只能重新建立控制檔案了
alter database backup controlfile to trace as '/tmp/a.txt';

@>startup nomount
ORACLE instance started.


Total System Global Area 1.9108E+10 bytes
Fixed Size                  2236328 bytes
Variable Size            2952790104 bytes
Database Buffers         1.6106E+10 bytes
Redo Buffers               46886912 bytes
@>
@>CREATE CONTROLFILE REUSE DATABASE "DLSP" NORESETLOGS FORCE LOGGING ARCHIVELOG
  2      MAXLOGFILES 60
  3      MAXLOGMEMBERS 5
  4      MAXDATAFILES 2000
  5      MAXINSTANCES 1
  6      MAXLOGHISTORY 8631
  7  LOGFILE
  8    GROUP 1 '+DG_DATA/dlsp/redo01.log'  SIZE 1024M BLOCKSIZE 512,
  9    GROUP 2 '+DG_DATA/dlsp/redo02.log'  SIZE 1024M BLOCKSIZE 512,
 10    GROUP 3 '+DG_DATA/dlsp/redo03.log'  SIZE 1024M BLOCKSIZE 512
 11  -- STANDBY LOGFILE
 12  DATAFILE
 13    '+DG_DATA/dlsp/system01.dbf',
 14    '+DG_DATA/dlsp/sysaux01.dbf',
 15    '+DG_DATA/dlsp/undotbs01.dbf',
 16    '+DG_DATA/dlsp/user01.dbf'
 17  CHARACTER SET UTF8
 18  ;


Control file created.


@>alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '+DG_DATA/dlsp/system01.dbf'
提示資料庫需要恢復。

@>recover database; 
Media recovery complete.
@>alter database open;


Database altered.


@>col name for a40
@>select * from v$controlfile;


STATUS         NAME                                     IS_REC BLOCK_SIZE FILE_SIZE_BLKS
-------------- ---------------------------------------- ------ ---------- --------------
               /opt/oracle/oradata/control01.ctl        NO          16384           1972

至此整個恢復過程結束。

雖然理論上可以把誤刪除的控制檔案、資料檔案、日誌檔案找回來(資料庫必須沒關閉!!),但是就像本案例看到的,可能會遭遇這樣那樣的錯誤,DBA對資料庫的操作一定要謹慎。


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/22034023/viewspace-1213644/,如需轉載,請註明出處,否則將追究法律責任。

相關文章