缺少歸檔導致不完整恢復

yangtingkun發表於2012-03-04

簡單介紹一次客戶由於缺少歸檔造成資料丟失的案例。

 

 

分享一個案例:客戶的一個測試資料庫要遷移到另一臺伺服器上,客戶DBA在將資料庫備份到帶庫後,對原資料庫伺服器進行了格式化。

DBA在目標伺服器上進行資料庫的恢復時發現了問題。資料庫的備份雖然成功完成,但是隨後的歸檔日誌的備份由於報錯失敗了。現在只有一個全庫的熱備份,而沒有任何歸檔日誌的備份。目前已經不是丟失資料的問題了,而是缺少一致性的備份,資料庫根本無法開啟。

當然這個案例的恢復對於我們來說不算什麼困難的事情,全庫恢復後,直接RECOVER UNTIL CANCEL,嘗試直接OPEN RESETLOGS,會出現錯誤:

SQL> conn / as sysdba
Connected.
SQL> alter database open read only;
alter database open read only
*
ERROR at line 1:
ORA-16004: backup database requires recovery
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '/oradata/orcl/system01.dbf'

透過新增“_allow_resetlogs_corruption=true,然後RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;最後透過ALTER DATABASE OPEN RESETLOGS開啟。

資料庫開啟後,很快就會由於嘗試恢復UNDO中的資訊導致資料庫的再次DOWN掉:

Mon Jan 16 14:27:05 2012
Errors in file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 48
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=40, OS id=40239468
Mon Jan 16 14:27:06 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (9, 7).
Mon Jan 16 14:27:06 2012
Errors in file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:27:07 2012
ORA-01555 caused by SQL statement below (SQL ID: 5wc2915k44m38, Query Duration=0 sec, SCN: 0x0000.003bd268):
Mon Jan 16 14:27:07 2012
select user#,type# from user$ where name=:1
Mon Jan 16 14:27:07 2012
LOGSTDBY: Validating controlfile with logical metadata
Mon Jan 16 14:27:07 2012
LOGSTDBY: Validation complete
Completed: alter database open resetlogs
Mon Jan 16 14:32:12 2012
Errors in file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:32:13 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (9, 7).
Mon Jan 16 14:32:13 2012
Errors in file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:37:15 2012
Errors in file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:37:16 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (9, 7).
Mon Jan 16 14:37:16 2012
Errors in file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:42:17 2012
Errors in file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:42:18 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (9, 7).
Mon Jan 16 14:42:18 2012
Errors in file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:47:19 2012
Errors in file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:47:20 2012
Errors in file /oracle/admin/orcl/bdump/orcl_pmon_58655160.trc:
ORA-00474: SMON process terminated with error
Mon Jan 16 14:47:20 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 58655160

對於這個錯誤,可以透過設定EVENTS 10513來避免後臺PMON程式進行資料庫的事務回滾,也可以透過設定_CORRUPTED_ROLLBACK_SEGMENTS來避免這個錯誤的產生。

再次重啟後,透過EXP匯出全庫,重建資料庫後匯入即可。

其實這裡想要探討的並非是恢復技術本身,而是備份策略以及備份有效性檢查的重要性。DBA在備份完成後,只需要簡單的看一下備份輸出的LOG檔案,就可以馬上判斷備份是否完整,可惜的是,就是這個簡單的確認操作沒有進行,導致了最終重啟資料庫且丟失資料的損失。

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/4227/viewspace-717874/,如需轉載,請註明出處,否則將追究法律責任。

相關文章