今天在測試庫執行select * from v$log的時候報錯
select * from v$log
*
ERROR at line 1:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/opt/oracle/oradata/control01.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

報錯資訊很明確，控制檔案找不到了。根據控制檔案的路徑，透過作業系統命令來檢視，檔案確實已經不在了。
ls -l /opt/oracle/oradata/control01.ctl
ls: cannot access /opt/oracle/oradata/control01.ctl: No such file or directory

select * from v$controlfile;
STATUS NAME IS_REC BLOCK_SIZE FILE_SIZE_BLKS
-------------- ---------------------------------------- ------ ---------- --------------
/opt/oracle/oradata/control01.ctl NO 16384 1972

糟糕的是，測試庫的控制檔案只有一份，如果有多份的話，透過複製+重新命名可以解決一個控制檔案丟失的問題（關閉資料庫後，進行複製+重新命名）。

在這種情況下，要不停掉資料庫，手工建立控制檔案，或者第二種辦法：由於資料庫還是開啟的狀態，雖然檔案被刪除，但是例項（程式）依然會對訪問過的檔案保持開啟狀態。

ps -ef | grep lgwr
oracle 11780 1 0 Jun27 ? 00:05:02 ora_lgwr_fio

cd /proc/11780/fd

ls -ltr
total 0
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 9 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 8 -> /dev/null
lrwx------. 1 oracle oinstall 64 Jul 9 09:25 7 -> /opt/oracle/products/11.2.0/dbs/hc_fio.dat
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 6 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 5 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 4 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 3 -> /dev/null
l-wx------. 1 oracle oinstall 64 Jul 9 09:25 27 -> /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_lgwr_11780.trm
l-wx------. 1 oracle oinstall 64 Jul 9 09:25 26 -> /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_lgwr_11780.trc
lrwx------. 1 oracle oinstall 64 Jul 9 09:25 257 -> /dev/dfa
lrwx------. 1 oracle oinstall 64 Jul 9 09:25 256 -> /opt/oracle/oradata/control01.ctl (deleted)

可以看到LGWR程式保持了對檔案 256 -> /opt/oracle/oradata/control01.ctl (deleted) 的訪問，可以透過複製命令把檔案還原。

cp 256 /opt/oracle/oradata/control01.ctl
chown oracle.oinstall /opt/oracle/oradata/control01.ctl

再檢視v$log，已經可已正常檢視了。

select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARCHIV STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- -------------------
NEXT_CHANGE# NEXT_TIME
------------ -------------------
1 1 3436 1073741824 512 1 YES INACTIVE 2011282 2014-07-09 09:18:19
3684369 2014-07-09 09:20:28

2 1 3437 1073741824 512 1 YES ACTIVE 3684369 2014-07-09 09:20:28
4951745 2014-07-09 09:25:08

3 1 3438 1073741824 512 1 NO CURRENT 4951745 2014-07-09 09:25:08
2.8147E+14

但是緊接著，例項突然就DOWN掉了，檢視ALERT檔案。
********************* ATTENTION: ********************
The controlfile header block returned by the OS
has a sequence number that is too old.
The controlfile might be corrupted.
PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
without following the steps below.
RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
TO THE DATABASE, if the controlfile is truly corrupted.
In order to re-start the instance safely,
please do the following:
(1) Save all copies of the controlfile for later
analysis and contact your OS vendor and Oracle support.
(2) Mount the instance and issue:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
(3) Unmount the instance.
(4) Use the script in the trace file to
RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
CKPT (ospid: 11782): terminating the instance
Wed Jul 09 09:30:08 2014
System state dump requested by (instance=1, osid=11782 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_diag_11761.trc
Wed Jul 09 09:30:08 2014
ORA-1092 : opitsk aborting process
Non critical error ORA-48913 caught while writing to trace file "/opt/oracle/diag/rdbms/dlsp/fio/trace/fio_diag_11761.trc"
Error message: ORA-48913: Writing into trace file failed, file size limit [2000000] reached
Writing to the above trace file is disabled for now on...
Dumping diagnostic data in directory=[cdmp_20140709093008], requested by (instance=1, osid=11782 (CKPT)), summary=[abnormal instance termination].
Instance terminated by CKPT, pid = 11782

基本的意思是，控制檔案太舊了，然後檢查點程式終止了例項。

嘗試開啟資料庫

@>startup
ORACLE instance started.

Total System Global Area 1.9108E+10 bytes
Fixed Size 2236328 bytes
Variable Size 2952790104 bytes
Database Buffers 1.6106E+10 bytes
Redo Buffers 46886912 bytes
Database mounted.
ORA-00338: log 1 of thread 1 is more recent than control file
ORA-00312: online log 1 thread 1: '+DG_DATA/dlsp/redo01.log'

提示日誌裡記錄的資訊（檢查點資訊）比控制檔案新。

看來只能重新建立控制檔案了
alter database backup controlfile to trace as '/tmp/a.txt';

@>startup nomount
ORACLE instance started.

Total System Global Area 1.9108E+10 bytes
Fixed Size 2236328 bytes
Variable Size 2952790104 bytes
Database Buffers 1.6106E+10 bytes
Redo Buffers 46886912 bytes
@>
@>CREATE CONTROLFILE REUSE DATABASE "DLSP" NORESETLOGS FORCE LOGGING ARCHIVELOG
2 MAXLOGFILES 60
3 MAXLOGMEMBERS 5
4 MAXDATAFILES 2000
5 MAXINSTANCES 1
6 MAXLOGHISTORY 8631
7 LOGFILE
8 GROUP 1 '+DG_DATA/dlsp/redo01.log' SIZE 1024M BLOCKSIZE 512,
9 GROUP 2 '+DG_DATA/dlsp/redo02.log' SIZE 1024M BLOCKSIZE 512,
10 GROUP 3 '+DG_DATA/dlsp/redo03.log' SIZE 1024M BLOCKSIZE 512
11 -- STANDBY LOGFILE
12 DATAFILE
13 '+DG_DATA/dlsp/system01.dbf',
14 '+DG_DATA/dlsp/sysaux01.dbf',
15 '+DG_DATA/dlsp/undotbs01.dbf',
16 '+DG_DATA/dlsp/user01.dbf'
17 CHARACTER SET UTF8
18 ;

Control file created.

@>alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '+DG_DATA/dlsp/system01.dbf'
提示資料庫需要恢復。

@>recover database;
Media recovery complete.
@>alter database open;

Database altered.

@>col name for a40
@>select * from v$controlfile;

STATUS NAME IS_REC BLOCK_SIZE FILE_SIZE_BLKS
-------------- ---------------------------------------- ------ ---------- --------------
/opt/oracle/oradata/control01.ctl NO 16384 1972

至此整個恢復過程結束。

雖然理論上可以把誤刪除的控制檔案、資料檔案、日誌檔案找回來（資料庫必須沒關閉！！），但是就像本案例看到的，可能會遭遇這樣那樣的錯誤，DBA對資料庫的操作一定要謹慎。

測試環境控制檔案被誤刪的故障處理

相關文章