測試環境控制檔案被誤刪的故障處理
今天在測試庫執行select * from v$log的時候報錯
select * from v$log
*
ERROR at line 1:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/opt/oracle/oradata/control01.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
報錯資訊很明確,控制檔案找不到了。根據控制檔案的路徑,透過作業系統命令來檢視,檔案確實已經不在了。
ls -l /opt/oracle/oradata/control01.ctl
ls: cannot access /opt/oracle/oradata/control01.ctl: No such file or directory
select * from v$controlfile;
STATUS NAME IS_REC BLOCK_SIZE FILE_SIZE_BLKS
-------------- ---------------------------------------- ------ ---------- --------------
/opt/oracle/oradata/control01.ctl NO 16384 1972
糟糕的是,測試庫的控制檔案只有一份,如果有多份的話,透過複製+重新命名可以解決一個控制檔案丟失的問題(關閉資料庫後,進行複製+重新命名)。
在這種情況下,要不停掉資料庫,手工建立控制檔案,或者第二種辦法:由於資料庫還是開啟的狀態,雖然檔案被刪除,但是例項(程式)依然會對訪問過的檔案保持開啟狀態。
ps -ef | grep lgwr
oracle 11780 1 0 Jun27 ? 00:05:02 ora_lgwr_fio
cd /proc/11780/fd
ls -ltr
total 0
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 9 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 8 -> /dev/null
lrwx------. 1 oracle oinstall 64 Jul 9 09:25 7 -> /opt/oracle/products/11.2.0/dbs/hc_fio.dat
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 6 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 5 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 4 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 3 -> /dev/null
l-wx------. 1 oracle oinstall 64 Jul 9 09:25 27 -> /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_lgwr_11780.trm
l-wx------. 1 oracle oinstall 64 Jul 9 09:25 26 -> /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_lgwr_11780.trc
lrwx------. 1 oracle oinstall 64 Jul 9 09:25 257 -> /dev/dfa
lrwx------. 1 oracle oinstall 64 Jul 9 09:25 256 -> /opt/oracle/oradata/control01.ctl (deleted)
可以看到LGWR程式保持了對檔案 256 -> /opt/oracle/oradata/control01.ctl (deleted) 的訪問,可以透過複製命令把檔案還原。
cp 256 /opt/oracle/oradata/control01.ctl
chown oracle.oinstall /opt/oracle/oradata/control01.ctl
再檢視v$log,已經可已正常檢視了。
select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARCHIV STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- -------------------
NEXT_CHANGE# NEXT_TIME
------------ -------------------
1 1 3436 1073741824 512 1 YES INACTIVE 2011282 2014-07-09 09:18:19
3684369 2014-07-09 09:20:28
2 1 3437 1073741824 512 1 YES ACTIVE 3684369 2014-07-09 09:20:28
4951745 2014-07-09 09:25:08
3 1 3438 1073741824 512 1 NO CURRENT 4951745 2014-07-09 09:25:08
2.8147E+14
但是緊接著,例項突然就DOWN掉了,檢視ALERT檔案。
********************* ATTENTION: ********************
The controlfile header block returned by the OS
has a sequence number that is too old.
The controlfile might be corrupted.
PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
without following the steps below.
RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
TO THE DATABASE, if the controlfile is truly corrupted.
In order to re-start the instance safely,
please do the following:
(1) Save all copies of the controlfile for later
analysis and contact your OS vendor and Oracle support.
(2) Mount the instance and issue:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
(3) Unmount the instance.
(4) Use the script in the trace file to
RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
CKPT (ospid: 11782): terminating the instance
Wed Jul 09 09:30:08 2014
System state dump requested by (instance=1, osid=11782 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_diag_11761.trc
Wed Jul 09 09:30:08 2014
ORA-1092 : opitsk aborting process
Non critical error ORA-48913 caught while writing to trace file "/opt/oracle/diag/rdbms/dlsp/fio/trace/fio_diag_11761.trc"
Error message: ORA-48913: Writing into trace file failed, file size limit [2000000] reached
Writing to the above trace file is disabled for now on...
Dumping diagnostic data in directory=[cdmp_20140709093008], requested by (instance=1, osid=11782 (CKPT)), summary=[abnormal instance termination].
Instance terminated by CKPT, pid = 11782
基本的意思是,控制檔案太舊了,然後檢查點程式終止了例項。
嘗試開啟資料庫
@>startup
ORACLE instance started.
Total System Global Area 1.9108E+10 bytes
Fixed Size 2236328 bytes
Variable Size 2952790104 bytes
Database Buffers 1.6106E+10 bytes
Redo Buffers 46886912 bytes
Database mounted.
ORA-00338: log 1 of thread 1 is more recent than control file
ORA-00312: online log 1 thread 1: '+DG_DATA/dlsp/redo01.log'
看來只能重新建立控制檔案了
alter database backup controlfile to trace as '/tmp/a.txt';
@>startup nomount
ORACLE instance started.
Total System Global Area 1.9108E+10 bytes
Fixed Size 2236328 bytes
Variable Size 2952790104 bytes
Database Buffers 1.6106E+10 bytes
Redo Buffers 46886912 bytes
@>
@>CREATE CONTROLFILE REUSE DATABASE "DLSP" NORESETLOGS FORCE LOGGING ARCHIVELOG
2 MAXLOGFILES 60
3 MAXLOGMEMBERS 5
4 MAXDATAFILES 2000
5 MAXINSTANCES 1
6 MAXLOGHISTORY 8631
7 LOGFILE
8 GROUP 1 '+DG_DATA/dlsp/redo01.log' SIZE 1024M BLOCKSIZE 512,
9 GROUP 2 '+DG_DATA/dlsp/redo02.log' SIZE 1024M BLOCKSIZE 512,
10 GROUP 3 '+DG_DATA/dlsp/redo03.log' SIZE 1024M BLOCKSIZE 512
11 -- STANDBY LOGFILE
12 DATAFILE
13 '+DG_DATA/dlsp/system01.dbf',
14 '+DG_DATA/dlsp/sysaux01.dbf',
15 '+DG_DATA/dlsp/undotbs01.dbf',
16 '+DG_DATA/dlsp/user01.dbf'
17 CHARACTER SET UTF8
18 ;
Control file created.
@>alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '+DG_DATA/dlsp/system01.dbf'
提示資料庫需要恢復。
@>recover database;
Media recovery complete.
@>alter database open;
Database altered.
@>col name for a40
@>select * from v$controlfile;
STATUS NAME IS_REC BLOCK_SIZE FILE_SIZE_BLKS
-------------- ---------------------------------------- ------ ---------- --------------
/opt/oracle/oradata/control01.ctl NO 16384 1972
至此整個恢復過程結束。
雖然理論上可以把誤刪除的控制檔案、資料檔案、日誌檔案找回來(資料庫必須沒關閉!!),但是就像本案例看到的,可能會遭遇這樣那樣的錯誤,DBA對資料庫的操作一定要謹慎。
select * from v$log
*
ERROR at line 1:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/opt/oracle/oradata/control01.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
報錯資訊很明確,控制檔案找不到了。根據控制檔案的路徑,透過作業系統命令來檢視,檔案確實已經不在了。
ls -l /opt/oracle/oradata/control01.ctl
ls: cannot access /opt/oracle/oradata/control01.ctl: No such file or directory
select * from v$controlfile;
STATUS NAME IS_REC BLOCK_SIZE FILE_SIZE_BLKS
-------------- ---------------------------------------- ------ ---------- --------------
/opt/oracle/oradata/control01.ctl NO 16384 1972
糟糕的是,測試庫的控制檔案只有一份,如果有多份的話,透過複製+重新命名可以解決一個控制檔案丟失的問題(關閉資料庫後,進行複製+重新命名)。
在這種情況下,要不停掉資料庫,手工建立控制檔案,或者第二種辦法:由於資料庫還是開啟的狀態,雖然檔案被刪除,但是例項(程式)依然會對訪問過的檔案保持開啟狀態。
ps -ef | grep lgwr
oracle 11780 1 0 Jun27 ? 00:05:02 ora_lgwr_fio
cd /proc/11780/fd
ls -ltr
total 0
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 9 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 8 -> /dev/null
lrwx------. 1 oracle oinstall 64 Jul 9 09:25 7 -> /opt/oracle/products/11.2.0/dbs/hc_fio.dat
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 6 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 5 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 4 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul 9 09:25 3 -> /dev/null
l-wx------. 1 oracle oinstall 64 Jul 9 09:25 27 -> /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_lgwr_11780.trm
l-wx------. 1 oracle oinstall 64 Jul 9 09:25 26 -> /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_lgwr_11780.trc
lrwx------. 1 oracle oinstall 64 Jul 9 09:25 257 -> /dev/dfa
lrwx------. 1 oracle oinstall 64 Jul 9 09:25 256 -> /opt/oracle/oradata/control01.ctl (deleted)
可以看到LGWR程式保持了對檔案 256 -> /opt/oracle/oradata/control01.ctl (deleted) 的訪問,可以透過複製命令把檔案還原。
cp 256 /opt/oracle/oradata/control01.ctl
chown oracle.oinstall /opt/oracle/oradata/control01.ctl
再檢視v$log,已經可已正常檢視了。
select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARCHIV STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- -------------------
NEXT_CHANGE# NEXT_TIME
------------ -------------------
1 1 3436 1073741824 512 1 YES INACTIVE 2011282 2014-07-09 09:18:19
3684369 2014-07-09 09:20:28
2 1 3437 1073741824 512 1 YES ACTIVE 3684369 2014-07-09 09:20:28
4951745 2014-07-09 09:25:08
3 1 3438 1073741824 512 1 NO CURRENT 4951745 2014-07-09 09:25:08
2.8147E+14
但是緊接著,例項突然就DOWN掉了,檢視ALERT檔案。
********************* ATTENTION: ********************
The controlfile header block returned by the OS
has a sequence number that is too old.
The controlfile might be corrupted.
PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
without following the steps below.
RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
TO THE DATABASE, if the controlfile is truly corrupted.
In order to re-start the instance safely,
please do the following:
(1) Save all copies of the controlfile for later
analysis and contact your OS vendor and Oracle support.
(2) Mount the instance and issue:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
(3) Unmount the instance.
(4) Use the script in the trace file to
RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
CKPT (ospid: 11782): terminating the instance
Wed Jul 09 09:30:08 2014
System state dump requested by (instance=1, osid=11782 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_diag_11761.trc
Wed Jul 09 09:30:08 2014
ORA-1092 : opitsk aborting process
Non critical error ORA-48913 caught while writing to trace file "/opt/oracle/diag/rdbms/dlsp/fio/trace/fio_diag_11761.trc"
Error message: ORA-48913: Writing into trace file failed, file size limit [2000000] reached
Writing to the above trace file is disabled for now on...
Dumping diagnostic data in directory=[cdmp_20140709093008], requested by (instance=1, osid=11782 (CKPT)), summary=[abnormal instance termination].
Instance terminated by CKPT, pid = 11782
基本的意思是,控制檔案太舊了,然後檢查點程式終止了例項。
嘗試開啟資料庫
@>startup
ORACLE instance started.
Total System Global Area 1.9108E+10 bytes
Fixed Size 2236328 bytes
Variable Size 2952790104 bytes
Database Buffers 1.6106E+10 bytes
Redo Buffers 46886912 bytes
Database mounted.
ORA-00338: log 1 of thread 1 is more recent than control file
ORA-00312: online log 1 thread 1: '+DG_DATA/dlsp/redo01.log'
提示日誌裡記錄的資訊(檢查點資訊)比控制檔案新。
看來只能重新建立控制檔案了
alter database backup controlfile to trace as '/tmp/a.txt';
@>startup nomount
ORACLE instance started.
Total System Global Area 1.9108E+10 bytes
Fixed Size 2236328 bytes
Variable Size 2952790104 bytes
Database Buffers 1.6106E+10 bytes
Redo Buffers 46886912 bytes
@>
@>CREATE CONTROLFILE REUSE DATABASE "DLSP" NORESETLOGS FORCE LOGGING ARCHIVELOG
2 MAXLOGFILES 60
3 MAXLOGMEMBERS 5
4 MAXDATAFILES 2000
5 MAXINSTANCES 1
6 MAXLOGHISTORY 8631
7 LOGFILE
8 GROUP 1 '+DG_DATA/dlsp/redo01.log' SIZE 1024M BLOCKSIZE 512,
9 GROUP 2 '+DG_DATA/dlsp/redo02.log' SIZE 1024M BLOCKSIZE 512,
10 GROUP 3 '+DG_DATA/dlsp/redo03.log' SIZE 1024M BLOCKSIZE 512
11 -- STANDBY LOGFILE
12 DATAFILE
13 '+DG_DATA/dlsp/system01.dbf',
14 '+DG_DATA/dlsp/sysaux01.dbf',
15 '+DG_DATA/dlsp/undotbs01.dbf',
16 '+DG_DATA/dlsp/user01.dbf'
17 CHARACTER SET UTF8
18 ;
Control file created.
@>alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '+DG_DATA/dlsp/system01.dbf'
提示資料庫需要恢復。
@>recover database;
Media recovery complete.
@>alter database open;
Database altered.
@>col name for a40
@>select * from v$controlfile;
STATUS NAME IS_REC BLOCK_SIZE FILE_SIZE_BLKS
-------------- ---------------------------------------- ------ ---------- --------------
/opt/oracle/oradata/control01.ctl NO 16384 1972
至此整個恢復過程結束。
雖然理論上可以把誤刪除的控制檔案、資料檔案、日誌檔案找回來(資料庫必須沒關閉!!),但是就像本案例看到的,可能會遭遇這樣那樣的錯誤,DBA對資料庫的操作一定要謹慎。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/22034023/viewspace-1213644/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 被誤刪的檔案正確處理方法,快速找回誤刪的檔案
- oracle資料檔案被誤刪除後的災難處理方法Oracle
- RAC環境網路故障測試
- oracle 案例-控制檔案丟失故障處理過程Oracle
- UNDO表空間下的資料檔案被誤刪除後的處理方法
- 【故障處理】DG環境主庫丟失歸檔情況下資料檔案的恢復
- 誤刪資料庫資料檔案的處理方法資料庫
- 【故障處理】RAC環境第二節點無法歸檔的詭異問題處理
- 批處理-刪除環境變數變數
- RAC環境重建控制檔案
- 被誤刪的檔案快速恢復方法
- lustre檔案系統環境搭建及測試
- WCDMA測試庫故障處理過程
- otedisk OCR 檔案管理故障處理
- RAC環境中的快照控制檔案
- 處理MySQL複製環境Slave故障的一個案例MySql
- ORACLE伺服器異常斷電,控制檔案故障的處理步驟Oracle伺服器
- 恢復ext4檔案系統被誤刪的檔案
- Linux環境利用恢復被rm意外刪除資料檔案Linux
- 【恢復】非歸檔模式下因誤刪除資料檔案導致資料庫無法OPEN的故障處理模式資料庫
- 控制檔案恢復測試
- Pytest 如何使用切換被測試環境
- windows刪除檔案的批處理操作Windows
- win10中mdf檔案被誤刪如何恢復_win10系統mdf檔案被誤刪的恢復方法Win10
- 控制檔案丟失處理方法
- 表空間資料檔案故障處理
- 【故障處理】ORA-12162 錯誤的處理
- 誤刪除資料檔案、控制檔案的非RMAN恢復方法
- 關於測試環境版本管理(對於衝突檔案)
- 【故障處理】DBCA建庫詭異問題處理--rac環境不能建立rac庫
- 【故障處理】CRS-1153錯誤處理
- 【故障處理】ORA-19809錯誤處理
- oracle 誤刪除的處理方法Oracle
- Go多協程併發環境下的錯誤處理Go
- 配置開發環境、生成環境、測試環境開發環境
- alert_SID.log 告警檔案過大或誤刪除後的處理
- 非歸檔模式下資料檔案被誤刪的解決方法模式
- 控制檔案的誤解