聊聊Oracle Data Recovery Advisor(DRA)(下)

不一樣的天空w發表於2017-11-24

上篇我們介紹了Oracle 11gData Recover AdvisorDRA)特性,並且給出了一個在啟動過程出錯故障的演示。本篇我們繼續看資料庫執行過程中,問題過程修復。

 

4、執行過程中故障

 

在執行過程中的oracle故障,壞塊和檔案異常刪除出現的比較多,特別是初級DBA剛剛上手的時候。我們先來模擬一下這個場景。

 

Undo表空間是Oracle核心表空間之一,刪除之後會引起比較嚴重的問題故障。

 

 

SQL> select file_name from dba_data_files where tablespace_name='UNDOTBS1';

 

FILE_NAME

--------------------------------------------------------------------------------

/u01/oradata/WILSON/datafile/o1_mf_undotbs1_7xt3yzl5_.dbf

 

 

當前資料庫處在Open執行狀態,突然Undo檔案被後OS層面刪除。

 

 

[oracle@bspdev datafile]$ ls -l | grep undo

-rw-r----- 1 oracle oinstall 346038272 Sep  6 07:21 o1_mf_undotbs1_7xt3yzl5_.dbf

[oracle@bspdev datafile]$ mv o1_mf_undotbs1_7xt3yzl5_.dbf o1_mf_undotbs1_7xt3yzl5_.dbf.bak

[oracle@bspdev datafile]$ ls -l | grep undo

-rw-r----- 1 oracle oinstall 346038272 Sep  6 07:21 o1_mf_undotbs1_7xt3yzl5_.dbf.bak

 

 

此時,alert log中可以出現上篇中那個“checker”的工作過程。

 

 

Fri Sep 06 07:25:47 2013

Checker run found 1 new persistent data failures

Fri Sep 06 07:26:34 2013

Starting background process SMCO

Fri Sep 06 07:26:34 2013

SMCO started with pid=19, OS id=4819

Fri Sep 06 07:26:46 2013

Errors in file /u01/diag/rdbms/wilson/wilson/trace/wilson_mmnl_4418.trc:

ORA-01116: error in opening database file 3

ORA-01110: data file 3: '/u01/oradata/WILSON/datafile/o1_mf_undotbs1_7xt3yzl5_.dbf'

ORA-27041: unable to open file

Linux Error: 2: No such file or directory

Additional information: 3

Fri Sep 06 07:26:48 2013

Errors in file /u01/diag/rdbms/wilson/wilson/trace/wilson_m000_4835.trc:

ORA-01116: error in opening database file 3

ORA-01110: data file 3: '/u01/oradata/WILSON/datafile/o1_mf_undotbs1_7xt3yzl5_.dbf'

ORA-27041: unable to open file

Linux Error: 2: No such file or directory

Additional information: 3

 

 

差不多兩秒鐘報一個錯誤,發現檔案被刪除無法開啟。

 

此時,我們在rman上使用list failure命令,檢視生成的錯誤資訊。

 

 

RMAN> list failure all;

 

List of Database Failures

=========================

 

Failure ID Priority Status    Time Detected Summary

---------- -------- --------- ------------- -------

242        HIGH     OPEN      06-SEP-13     One or more non-system datafiles are missing

 

 

我們使用advisor failure,檢視一個Oracle的建議。

 

 

RMAN> advise failure ;

 

List of Database Failures

=========================

 

Failure ID Priority Status    Time Detected Summary

---------- -------- --------- ------------- -------

242        HIGH     OPEN      06-SEP-13     One or more non-system datafiles are missing

 

analyzing automatic repair options; this may take some time

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=30 device type=DISK

analyzing automatic repair options complete

 

Mandatory Manual Actions

========================

1. If file /u01/oradata/WILSON/datafile/o1_mf_undotbs1_7xt3yzl5_.dbf was unintentionally renamed or moved, restore it

2. Automatic repairs may be available if you shutdown the database and restart it in mount mode

3. Contact Oracle Support Services if the preceding recommendations cannot be used, or if they do not fix the failures selected for repair

 

Optional Manual Actions

=======================

no manual actions available

 

Automated Repair Options

========================

no automatic repair options available

 

 

注意,在automated repair options中,我們沒有看到指令碼資訊。說明Oracle好像在目前也沒有太好的方法。在Manual Actions中,Oracle DRA要求將資料庫重啟到mount狀態,才能有自動指令碼的出現。Manual Actions是那些Oracle覺得需要使用者手工執行才能繼續下去的步驟。

 

重新啟動一下庫,載入到mount狀態。

 

--強制關閉

RMAN> shutdown abort;

Oracle instance shut down

 

RMAN> startup mount;

 

connected to target database (not started)

Oracle instance started

database mounted

 

Total System Global Area     849530880 bytes

 

Fixed Size                     1339824 bytes

Variable Size                616566352 bytes

Database Buffers             226492416 bytes

Redo Buffers                   5132288 bytes

 

 

此時再次使用DRA工具,看問題和提示內容。

 

 

RMAN> advise failure;

 

List of Database Failures

=========================

 

Failure ID Priority Status    Time Detected Summary

---------- -------- --------- ------------- -------

242        HIGH     OPEN      06-SEP-13     One or more non-system datafiles are missing

 

analyzing automatic repair options; this may take some time

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=18 device type=DISK

analyzing automatic repair options complete

 

Mandatory Manual Actions

========================

no manual actions available

 

Optional Manual Actions

=======================

1. If file /u01/oradata/WILSON/datafile/o1_mf_undotbs1_7xt3yzl5_.dbf was unintentionally renamed or moved, restore it

 

Automated Repair Options

========================

Option Repair Description

------ ------------------

1      Restore and recover datafile 3 

  Strategy: The repair includes complete media recovery with no data loss

  Repair script. /u01/diag/rdbms/wilson/wilson/hm/reco_1850469943.hm

 

 

在上篇中,我們是手工開啟hm檔案,看裡面的指令碼。其實還可以使用repair failure review命令來檢視執行語句。

 

 

RMAN> repair failure preview;

 

Strategy: The repair includes complete media recovery with no data loss

Repair script. /u01/diag/rdbms/wilson/wilson/hm/reco_1850469943.hm

 

contents of repair script.:

   # restore and recover datafile

   restore datafile 3;

   recover datafile 3;

 

 

注意:此時Oracle DRA發現了當前我們有Undo的備份和歸檔日誌。所以使用restore之後伴隨recover,可以快速實現恢復。

 

如果在preview中沒有發現什麼問題,可以repair failure命令執行進行恢復。

 

 

RMAN> repair failure;

 

Strategy: The repair includes complete media recovery with no data loss

Repair script. /u01/diag/rdbms/wilson/wilson/hm/reco_1850469943.hm

 

contents of repair script.:

   # restore and recover datafile

   restore datafile 3;

   recover datafile 3;

 

Do you really want to execute the above repair (enter YES or NO)? yes

executing repair script

 

Starting restore at 06-SEP-13

using channel ORA_DISK_1

 

channel ORA_DISK_1: starting datafile backup set restore

channel ORA_DISK_1: specifying datafile(s) to restore from backup set

channel ORA_DISK_1: restoring datafile 00003 to /u01/oradata/WILSON/datafile/o1_mf_undotbs1_7xt3yzl5_.dbf

channel ORA_DISK_1: reading from backup piece /u01/flash_recovery_area/WILSON/backupset/2013_09_06/o1_mf_nnndf_TAG20130906T061608_92l0od6w_.bkp

channel ORA_DISK_1: piece handle=/u01/flash_recovery_area/WILSON/backupset/2013_09_06/o1_mf_nnndf_TAG20130906T061608_92l0od6w_.bkp tag=TAG20130906T061608

channel ORA_DISK_1: restored backup piece 1

channel ORA_DISK_1: restore complete, elapsed time: 00:00:25

Finished restore at 06-SEP-13

 

Starting recover at 06-SEP-13

using channel ORA_DISK_1

 

starting media recovery

media recovery complete, elapsed time: 00:00:02

 

Finished recover at 06-SEP-13

repair failure complete

 

--可以選擇開啟資料庫

Do you want to open the database (enter YES or NO)? yes

database opened

 

 

我們在alert log中,可以監控到恢復的步驟。

 

--Restore過程

Fri Sep 06 07:35:49 2013

Full restore complete of datafile 3 /u01/oradata/WILSON/datafile/o1_mf_undotbs1_92l5b0v4_.dbf.  Elapsed time: 0:00:15

  checkpoint is 3838694

  last deallocation scn is 3817636

  Undo Optimization current scn is 3815429

Fri Sep 06 07:35:54 2013

alter database recover datafile list clear

Completed: alter database recover datafile list clear

--recovery過程

alter database recover if needed

 datafile 3

Media Recovery Start

Serial Media Recovery started

Recovery of Online Redo Log: Thread 1 Group 2 Seq 176 Reading mem 0

  Mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_2_870n48hc_.log

  Mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_2_870n4dtl_.log

Recovery of Online Redo Log: Thread 1 Group 3 Seq 177 Reading mem 0

  Mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_3_870n4lsg_.log

  Mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_3_870n4o31_.log

Recovery of Online Redo Log: Thread 1 Group 1 Seq 178 Reading mem 0

  Mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_1_870n42n1_.log

  Mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_1_870n44z3_.log

Media Recovery Complete (wilson)

Completed: alter database recover if needed

 datafile 3

Fri Sep 06 07:36:04 2013

alter database open

 

 

此時,資料庫錯誤消除。

 

 

RMAN> list failure;

no failures found that match specification

 

 

最後,我們還有一個命令可以使用,就是change failureChange Failure命令的作用就是顯示的將錯誤的狀態修改掉。最常用的做法是:當一個錯誤發生的時候,如果我們沒有在RMAN層面上去解決,比如使用冷備份方法還原。Failure資訊是不會變化狀態的。此時,可以使用change failure命令將狀態設定為Closed,命令如:change failure all closed

 

5、結論

 

注意,目前的11g版本中,Data Recovery Advisor還不支援RAC環境。

 

隨著版本的推進,越來越多的Advisor出現在我們周圍。從目前看,Advisor只是一個資訊諮詢專家庫,我們可以聽也可以不聽。很多老資格的DBA對這些“花哨”產品也是比較不屑。筆者認為大可不必。

 

工具的出現,自動化、智慧化是任何一個事物的必然過程。可能在早期的版本中,一些Advisor存在這樣或者那樣的問題。但是隨著不斷的改進升級,這些Advisor變的越來越智慧,也是不可辯駁的事實。最終智慧化也只是時間的問題了。

 

那麼,作為傳統業務的DBA我們自己,應該怎麼做呢?首先,原理一定要作為基礎。任何技術,特別是Oracle近幾個版本,都遵循9i時期奠定的基礎框架和機制。很多花哨產品都是以此為基礎進行研發,所以理解基礎很重要。其次,業務價值。開發DBA是一個體現業務價值的重要方面,將資料庫的理念帶入到架構設計、開發過程,可以讓我們的系統銜接的更平順。最後就是行業優勢,Oracle是死的,應用行業是多樣的。每一個行業都有自己的特點和取向。作為DBA,特別是資深DBA,對業務資料的敏感度要遠大於開發團隊的很多人,把價值發揮出來,空間自然不會小。

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/31397003/viewspace-2147717/,如需轉載,請註明出處,否則將追究法律責任。

相關文章