備庫跳歸檔恢復的有趣案例

jeanron100發表於2016-06-03
    在Data Guard環境中,主備庫基本都是使用歸檔來傳遞資料的變化。如果主備的歸檔傳輸中斷,同時主庫的歸檔被刪除或者損壞,這種情況下備庫是沒法開始繼續接收歸檔,應用新的資料變更了。
    看到網友的文章中提到了SCN恢復的想法,感覺非常有意思,明白了思路,自己在本地也測試了一把,發現真是有趣。
    一般來說,主庫的歸檔丟失,常規的思路只能是重建備庫了。其實我們可以換一個角度來看這個問題,資料的變化在歸檔中是一個連續的過程,而在日誌檔案,資料檔案中則是一個狀態。我們可以直接透過物理增量備份的方式來恢復得到一個增量的資料變更結果集,在備庫直接應用即可,這個增量資料集中是包含了歸檔日誌中的資料變更,只是表現形式會有所不同。
    所以明白了這一點,我們就來實踐看看主庫中歸檔缺失的情況下,還是可以無需重建備庫而同步增量的資料變更。
主庫的狀態如下:
SQL> select open_mode from v$database;
OPEN_MODE
--------------------
READ WRITE
首先來得到一個基本的SCN值作為標記。
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
    3213758
 
然後檢視備庫的SCN
SQL>  SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
    3213745
 
這個時候直接在備庫做類似斷電的操作。
SQL> shutdown abort
ORACLE instance shut down.     
這個時候主備之間的歸檔傳輸就會停止,我們開始在主庫中做一個資料變更,切換日誌,保證主備端存在較多的資料變更
主庫中切換日誌:
SQL> alter system switch logfile;
System altered.
SQL> alter system switch logfile;
System altered.
然後新建一個表,可以作為標記。
SQL> create table test as select *from all_objects;
Table created.
然後再次切換日誌
SQL> SQL> alter system switch logfile;
System altered.
好了主庫的變更就做好了,我們看看主庫的歸檔情況:
-rw-r----- 1 oracle oinstall 22546944 Jun  3 22:05 o1_mf_1_40_co33ob7r_.arc
-rw-r----- 1 oracle oinstall    41472 Jun  3 22:05 o1_mf_1_41_co33of81_.arc
-rw-r----- 1 oracle oinstall  3074560 Jun  3 22:11 o1_mf_1_42_co3402tz_.arc
-rw-r----- 1 oracle oinstall  2099200 Jun  3 22:15 o1_mf_1_43_co348mjq_.arc
-rw-r----- 1 oracle oinstall     3072 Jun  3 22:15 o1_mf_1_44_co348qxg_.arc
-rw-r----- 1 oracle oinstall  8830464 Jun  3 22:16 o1_mf_1_45_co3495t0_.arc
根據備庫的日誌,日誌序列號44的歸檔肯定是沒有應用到備庫的,我們來手工修改一下歸檔名稱,讓它無法在備庫應用。
修改歸檔名稱後,序列號44的歸檔就很醒目了。看名字肯定是應用不到備庫的。
[oracle@BX_133_45 2016_06_03]$ ll
total 35748
-rw-r----- 1 oracle oinstall 22546944 Jun  3 22:05 o1_mf_1_40_co33ob7r_.arc
-rw-r----- 1 oracle oinstall    41472 Jun  3 22:05 o1_mf_1_41_co33of81_.arc
-rw-r----- 1 oracle oinstall  3074560 Jun  3 22:11 o1_mf_1_42_co3402tz_.arc
-rw-r----- 1 oracle oinstall  2099200 Jun  3 22:15 o1_mf_1_43_co348mjq_.arc
-rw-r----- 1 oracle oinstall     3072 Jun  3 22:15 o1_mf_1_44_co348qxg_.arc.bak
-rw-r----- 1 oracle oinstall  8830464 Jun  3 22:16 o1_mf_1_45_co3495t0_.arc

然後在備庫操作
啟動備庫 STARTUP
備庫的日誌如下,可以看到備庫在接收應用歸檔44的時候,發現了GAP,但是卻無法修復。
All non-current ORLs have been archived.
Media Recovery Log /U01/app/oracle/fast_recovery_area/DGTEST2/archivelog/2016_06_03/o1_mf_1_43_co34kd5o_.arc
Media Recovery Waiting for thread 1 sequence 44
Fetching gap sequence in thread 1, gap sequence 44-44
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
Fri Jun 03 22:22:47 2016
FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 44-44
 DBID 3866107499 branch 909583854
FAL[client]: All defined FAL servers have been attempted.
------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that's sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
好了,這個時候我們來開始修復這個問題,檢視備庫的SCN情況。
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
    3214028

我們就以備庫的這個SCN為基礎,從主庫匯出一個增量備份。  
主庫:

[oracle@BX_133_45 2016_06_03]$ rman target /
connected to target database: DGTEST (DBID=3866107499)
RMAN> BACKUP INCREMENTAL FROM SCN 3214028 DATABASE FORMAT '/home/oracle/ForStandby_%U' tag 'FORSTANDBY';   
Starting backup at 03-JUN-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=225 device type=DISK
backup will be obsolete on date 10-JUN-16
archived logs will not be kept or backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/U01/app/oracle/oradata/dgtest/system01.dbf
input datafile file number=00002 name=/U01/app/oracle/oradata/dgtest/sysaux01.dbf
input datafile file number=00003 name=/U01/app/oracle/oradata/dgtest/undotbs01.dbf
input datafile file number=00004 name=/U01/app/oracle/oradata/dgtest/users01.dbf
channel ORA_DISK_1: starting piece 1 at 03-JUN-16
channel ORA_DISK_1: finished piece 1 at 03-JUN-16
piece handle=/home/oracle/ForStandby_07r78fk0_1_1 tag=FORSTANDBY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
。。。
piece handle=/home/oracle/ForStandby_08r78fk2_1_1 tag=FORSTANDBY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 03-JUN-16
傳輸備份到備庫,恢復
中間插播一個小插曲。
這是一個11g的備庫,所以取消日誌應用後,備庫就成為了只讀狀態。
備庫:
SQL> recover managed standby database cancel;
Media recovery complete.
SQL> select open_mode from v$database;
OPEN_MODE
----------------------------------------
READ ONLY
我們開啟rman日誌恢復會提示下面的奇怪問題。
[oracle@WEB_YQ_64.48 ~]$rman target /
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04005: error from target database:
ORA-06553: PLS-801: internal error [56327]
RMAN-04015: error setting target database character set to ZHS16GBK
其實解決方法就是重啟至mount狀態,然後恢復就沒有問題了。
[oracle@WEB_YQ_64.48 ~]$rman target /
connected to target database: DGTEST (DBID=3866107499, not open)
RMAN> CATALOG START WITH '/home/oracle/tmp';
using target database control file instead of recovery catalog
searching for all files that match the pattern /home/oracle/tmp
List of Files Unknown to the Database
=====================================
File Name: /home/oracle/tmp/ForStandby_07r78fk0_1_1
File Name: /home/oracle/tmp/ForStandby_08r78fk2_1_1

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /home/oracle/tmp/ForStandby_07r78fk0_1_1
File Name: /home/oracle/tmp/ForStandby_08r78fk2_1_1
RMAN> recover database noredo ;
Starting recover at 2016-06-03 22:30:33
allocated channel: ORA_DISK_1
。。。
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished recover at 2016-06-03 22:30:38

恢復的過程中,可以看到alert.log的輸出如下:
Fri Jun 03 22:30:35 2016
Incremental restore complete of datafile 4 /U01/app/oracle/oradata/dgtest/users01.dbf
  checkpoint is 3215512
  last deallocation scn is 3
Incremental restore complete of datafile 3 /U01/app/oracle/oradata/dgtest/undotbs01.dbf
  checkpoint is 3215512
  last deallocation scn is 3
Incremental restore complete of datafile 2 /U01/app/oracle/oradata/dgtest/sysaux01.dbf
  checkpoint is 3215512
  last deallocation scn is 995211
Incremental restore complete of datafile 1 /U01/app/oracle/oradata/dgtest/system01.dbf
  checkpoint is 3215512
  last deallocation scn is 993074
SCN也在遞增,而恢復成功之後的,控制檔案還是以前的。  所以檢視SCN的還是恢復前的狀態。   
SQL>SELECT CURRENT_SCN FROM V$DATABASE
CURRENT_SCN
-----------
    3214028
解決方法也很簡單,直接從主庫生成控制檔案,複製到備庫即可。
主庫:
SQL> alter database create standby controlfile as '/home/oracle/std_con01.ctl';
Database altered.
然後在備庫應用即可,當然備庫需要在nomount狀態
startup nomount

RMAN> RESTORE STANDBY CONTROLFILE FROM '/home/oracle/tmp/std_con01.ctl';
Starting restore at 2016-06-03 22:39:20
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=189 device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=/U01/app/oracle/oradata/dgtest/control01.ctl
output file name=/U01/app/oracle/fast_recovery_area/dgtest/control02.ctl
Finished restore at 2016-06-03 22:39:21

再次檢視SCN就是一個相對較高的值了。
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
    3223400
而這個時候檢視alert.log的輸出就會發現,日誌會從序列號47開始應用,直接跳過了44,45,46三個歸檔。   
Media Recovery Waiting for thread 1 sequence 47
Fetching gap sequence in thread 1, gap sequence 47-48
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
Fri Jun 03 22:40:12 2016
RFS[3]: Assigned to RFS process 29087
RFS[3]: Opened log for thread 1 sequence 47 dbid -428859797 branch 909583854
Fri Jun 03 22:40:12 2016
RFS[4]: Assigned to RFS process 29089
RFS[4]: Opened log for thread 1 sequence 48 dbid -428859797 branch 909583854
Archived Log entry 2 added for thread 1 sequence 47 rlc 909583854 ID 0xe6703c6b dest 2:
Media Recovery Log /U01/app/oracle/fast_recovery_area/DGTEST2/archivelog/2016_06_03/o1_mf_1_47_co35pdcj_.arc
Media Recovery Waiting for thread 1 sequence 48 (in transit)
Archived Log entry 3 added for thread 1 sequence 48 rlc 909583854 ID 0xe6703c6b dest 2:
Media Recovery Log /U01/app/oracle/fast_recovery_area/DGTEST2/archivelog/2016_06_03/o1_mf_1_48_co35pdgw_.arc
Fri Jun 03 22:40:20 2016
Media Recovery Log /U01/app/oracle/fast_recovery_area/DGTEST2/archivelog/2016_06_03/o1_mf_1_49_co35p231_.arc
Media Recovery Waiting for thread 1 sequence 50 (in transit) 
至此,整個恢復的過程就順利完成了,如果是一個TB級的資料庫出現此類問題,我們就可以避免重建備庫,而使用這些小技巧即可從繁瑣中解放出來。



來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/23718752/viewspace-2113740/,如需轉載,請註明出處,否則將追究法律責任。

相關文章