dataguard中MRP無法啟動的問題分析和解決

kunlunzhiying發表於2017-10-22
自己手頭有一套dataguard環境,因為也有些日子沒有用了,結果突然心血來潮準備啟動起來學習一下,突然發現在敲了命令 recover managed standby database disconnect from session之後,命令執行正常,但是後臺卻報了ora錯誤。
Sat Jun 27 23:16:39 2015
Recovery Slave PR00 previously exited with exception 1157
Errors in file /u02/dg11g/diag/rdbms/dg11g/DG11G/trace/DG11G_mrp0_6514.trc:
ORA-01157: cannot identify/lock data file 7 - see DBWR trace file
ORA-01110: data file 7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
MRP0: Background Media Recovery process shutdown (DG11G)
Sat Jun 27 23:16:39 2015
Completed: ALTER DATABASE RECOVER  managed standby database disconnect from session
RFS[162]: Opened log for thread 1 sequence 171 dbid 1028247664 branch 880742847
RFS[161]: Opened log for thread 1 sequence 173 dbid 1028247664 branch 880742847
RFS[160]: Opened log for thread 1 sequence 172 dbid 1028247664 branch 880742847
透過上面的日誌我們可以看到,MRP程式是在做資料恢復的時候報了ora錯誤ora-01157
但是RFS還是沒有問題,RFS主要是從主庫來傳輸歸檔檔案的,可以看到能夠正常從主庫中傳輸歸檔日誌,sequence#號為171,173,172的歸檔日誌都傳輸到了備庫。

本來這個問題沒有引起多大的關注,想可能是哪些歸檔檔案沒有用到導致的,但是發現MRP壓根用不了。所以儘管歸檔傳輸完成了,但是資料變更還是應用不到備庫。
檢視v$archive_gap沒有任何記錄,說明沒有歸檔日誌apply的時候出現問題。

我們來看看這個ora問題的一些明細資訊,提示是在7號資料檔案的地方報了ora-01157錯誤。
Errors in file /u02/dg11g/diag/rdbms/dg11g/DG11G/trace/DG11G_mrp0_6514.trc:
ORA-01157: cannot identify/lock data file 7 - see DBWR trace file
ORA-01110: data file 7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
從官方對於這個問題的描述來看,似乎是資料檔案出了問題。
$ oerr ora 01157
01157, 00000, "cannot identify/lock data file %s - see DBWR trace file"
// *Cause:  The background process was either unable to find one of the data 
//         files or failed to lock it because the file was already in use.
//         The database will prohibit access to this file but other files will
//         be unaffected. However the first instance to open the database will
//         need to access all online data files. Accompanying error from the
//         operating system describes why the file could not be identified.
// *Action: Have operating system make file available to database. Then either
//         open the database or do ALTER SYSTEM CHECK DATAFILES.
因為這個環境被折騰了不知道多少遍,反覆切換,反覆測試,我都不記得是哪些特殊的操作導致了這個問題了。所以這個問題還得從頭來分析。
首先檢視了一下/u02/dg11g/oradata/DG11G/test_new01.dbf
這個檔案,發現在檔案系統中竟然不存在。
但是在資料字典資訊中卻存在,使用的sql語句為,可以返回對應的記錄來。
select name,file# from v$datafile where file#=7;

從這個情況來看,可能是在備庫端誤刪除了這個資料檔案造成的。對於刪除的資料檔案我們怎麼來評估呢,首先得檢視主庫,檢視主庫中的檔案情況,但是在主庫中這個資料檔案和表空間壓根不存在。
這樣一來這個問題就有些棘手了。
如果能夠修復MRP的問題,看似這個問題就引刃而解,如果修復不了,可能這個dataguard就不可用了,可能得考慮重建一個物理備庫了。
對此我們採取保守態度,帶著一絲嘗試看看備庫能不能啟動到open read only狀態。
但是這三個操作的結果讓我有些迷茫了。
open不了,說可能需要恢復,恢復的檔案竟然是system01.dbf,嘗試recover until cancel也未果。
idle> alter database open read only;
alter database open read only
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01196: file 1 is inconsistent due to a failed media recovery session
ORA-01110: data file 1: '/u02/dg11g/oradata/DG11G/system01.dbf'

idle> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done

idle> alter database open read only;
alter database open read only
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01196: file 1 is inconsistent due to a failed media recovery session
ORA-01110: data file 1: '/u02/dg11g/oradata/DG11G/system01.dbf'

對於這個問題,如果有一個sql語句能夠一針見血的解決問題就好了,自己在反覆嘗試之後發現還是有的,問題的解決思路就是先解決ORA-01157問題,然後dataguard中的MRP問題就能引刃而解。
對於ora-01157這個問題中的資料檔案在主庫中不存在,但是在備庫的資料字典中存在,我們可以直接在備庫中把資料字典中的問題先解決了。
idle> alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop;
Database altered.
然後dataguard的日誌中就出現而來轉機,在後臺會去校驗這個檔案的問題,只是丟擲了一個警告。Warning: Datafile 7 (/u02/ora11g/oradata/TEST11G/test_new01.dbf) is offline during full database recovery and will not be recovered
然後MRP就正常啟動了。後臺開始使用歸檔檔案做資料恢復了。

alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop
Completed: alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop
Sat Jun 27 23:24:08 2015
ALTER DATABASE RECOVER  managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (DG11G)
Sat Jun 27 23:24:08 2015
MRP0 started with pid=25, OS id=8431
MRP0: Background Managed Standby Recovery process started (DG11G)
 started logmerger process
Sat Jun 27 23:24:13 2015
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 2 slaves
Warning: Datafile 7 (/u02/ora11g/oradata/TEST11G/test_new01.dbf) is offline during full database recovery and will not be recovered
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_121_880742847.dbf
Completed: ALTER DATABASE RECOVER  managed standby database disconnect from session
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_122_880742847.dbf
Sat Jun 27 23:24:31 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_123_880742847.dbf
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Recovery created file /u02/dg11g/oradata/DG11G/test_new01.dbf
Successfully added datafile 7 to media recovery
Datafile #7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Media Recovery Log /u02/dg11g/switchover/DG11G/archivelog/1_124_880742847.dbf
Media Recovery Log /u02/dg11g/switchover/DG11G/archivelog/1_125_880742847.dbf
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_126_880742847.dbf
Sat Jun 27 23:24:49 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_127_880742847.dbf
Sat Jun 27 23:25:01 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_128_880742847.dbf
Sat Jun 27 23:25:17 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_129_880742847.dbf
Sat Jun 27 23:25:29 2015

比較有意思的是檢視日誌可以看到,資料檔案被反覆建立刪除了很多次。最後還是以drop終止。
然後就開始使用一大堆的歸檔檔案做資料恢復了。

Sat Jun 27 23:28:30 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_172_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_173_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_174_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_175_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_176_880742847.dbf
Sat Jun 27 23:28:40 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_177_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_178_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_179_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_180_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_181_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_182_880742847.dbf
Sat Jun 27 23:28:52 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_183_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_184_880742847.dbf

在主庫中檢視,redo的序列號185,備庫中的序列號是184。
sys@TEST11G> select sequence#,status from v$log;
 SEQUENCE# STATUS
---------- ----------------
       184 INACTIVE
       185 CURRENT
       183 INACTIVE

在備庫中檢視後臺程式的情況,可以看到MRP已經記錄在冊了。
idle> select process,status,sequence# from v$managed_standby;
PROCESS   STATUS        SEQUENCE#
--------- ------------ ----------
ARCH      CONNECTED             0
ARCH      CONNECTED             0
ARCH      CONNECTED             0
ARCH      CONNECTED             0
MRP0      WAIT_FOR_LOG        186

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/28211342/viewspace-2146222/,如需轉載,請註明出處,否則將追究法律責任。

相關文章