自己手頭有一套dataguard環境，因為也有些日子沒有用了，結果突然心血來潮準備啟動起來學習一下，突然發現在敲了命令 recover managed standby database disconnect from session之後,命令執行正常,但是後臺卻報了ora錯誤。
Sat Jun 27 23:16:39 2015
Recovery Slave PR00 previously exited with exception 1157
Errors in file /u02/dg11g/diag/rdbms/dg11g/DG11G/trace/DG11G_mrp0_6514.trc:
ORA-01157: cannot identify/lock data file 7 - see DBWR trace file
ORA-01110: data file 7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
MRP0: Background Media Recovery process shutdown (DG11G)
Sat Jun 27 23:16:39 2015
Completed: ALTER DATABASE RECOVER managed standby database disconnect from session
RFS[162]: Opened log for thread 1 sequence 171 dbid 1028247664 branch 880742847
RFS[161]: Opened log for thread 1 sequence 173 dbid 1028247664 branch 880742847
RFS[160]: Opened log for thread 1 sequence 172 dbid 1028247664 branch 880742847
透過上面的日誌我們可以看到，MRP程式是在做資料恢復的時候報了ora錯誤ora-01157
但是RFS還是沒有問題，RFS主要是從主庫來傳輸歸檔檔案的，可以看到能夠正常從主庫中傳輸歸檔日誌，sequence#號為171,173,172的歸檔日誌都傳輸到了備庫。

本來這個問題沒有引起多大的關注，想可能是哪些歸檔檔案沒有用到導致的，但是發現MRP壓根用不了。所以儘管歸檔傳輸完成了，但是資料變更還是應用不到備庫。
檢視v$archive_gap沒有任何記錄，說明沒有歸檔日誌apply的時候出現問題。

我們來看看這個ora問題的一些明細資訊,提示是在7號資料檔案的地方報了ora-01157錯誤。
Errors in file /u02/dg11g/diag/rdbms/dg11g/DG11G/trace/DG11G_mrp0_6514.trc:
ORA-01157: cannot identify/lock data file 7 - see DBWR trace file
ORA-01110: data file 7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
從官方對於這個問題的描述來看，似乎是資料檔案出了問題。
$ oerr ora 01157
01157, 00000, "cannot identify/lock data file %s - see DBWR trace file"
// *Cause: The background process was either unable to find one of the data
// files or failed to lock it because the file was already in use.
// The database will prohibit access to this file but other files will
// be unaffected. However the first instance to open the database will
// need to access all online data files. Accompanying error from the
// operating system describes why the file could not be identified.
// *Action: Have operating system make file available to database. Then either
// open the database or do ALTER SYSTEM CHECK DATAFILES.
因為這個環境被折騰了不知道多少遍，反覆切換，反覆測試，我都不記得是哪些特殊的操作導致了這個問題了。所以這個問題還得從頭來分析。
首先檢視了一下/u02/dg11g/oradata/DG11G/test_new01.dbf 這個檔案，發現在檔案系統中竟然不存在。
但是在資料字典資訊中卻存在，使用的sql語句為，可以返回對應的記錄來。
select name,file# from v$datafile where file#=7;

從這個情況來看，可能是在備庫端誤刪除了這個資料檔案造成的。對於刪除的資料檔案我們怎麼來評估呢，首先得檢視主庫，檢視主庫中的檔案情況,但是在主庫中這個資料檔案和表空間壓根不存在。
這樣一來這個問題就有些棘手了。
如果能夠修復MRP的問題，看似這個問題就引刃而解，如果修復不了，可能這個dataguard就不可用了，可能得考慮重建一個物理備庫了。
對此我們採取保守態度，帶著一絲嘗試看看備庫能不能啟動到open read only狀態。
但是這三個操作的結果讓我有些迷茫了。
open不了，說可能需要恢復，恢復的檔案竟然是system01.dbf，嘗試recover until cancel也未果。
idle> alter database open read only;
alter database open read only
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01196: file 1 is inconsistent due to a failed media recovery session
ORA-01110: data file 1: '/u02/dg11g/oradata/DG11G/system01.dbf'

idle> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done

idle> alter database open read only;
alter database open read only
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01196: file 1 is inconsistent due to a failed media recovery session
ORA-01110: data file 1: '/u02/dg11g/oradata/DG11G/system01.dbf'

對於這個問題，如果有一個sql語句能夠一針見血的解決問題就好了，自己在反覆嘗試之後發現還是有的，問題的解決思路就是先解決ORA-01157問題，然後dataguard中的MRP問題就能引刃而解。
對於ora-01157這個問題中的資料檔案在主庫中不存在，但是在備庫的資料字典中存在，我們可以直接在備庫中把資料字典中的問題先解決了。
idle> alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop;
Database altered.
然後dataguard的日誌中就出現而來轉機，在後臺會去校驗這個檔案的問題，只是丟擲了一個警告。Warning: Datafile 7 (/u02/ora11g/oradata/TEST11G/test_new01.dbf) is offline during full database recovery and will not be recovered
然後MRP就正常啟動了。後臺開始使用歸檔檔案做資料恢復了。

alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop
Completed: alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop
Sat Jun 27 23:24:08 2015
ALTER DATABASE RECOVER managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (DG11G)
Sat Jun 27 23:24:08 2015
MRP0 started with pid=25, OS id=8431
MRP0: Background Managed Standby Recovery process started (DG11G)
started logmerger process
Sat Jun 27 23:24:13 2015
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 2 slaves
Warning: Datafile 7 (/u02/ora11g/oradata/TEST11G/test_new01.dbf) is offline during full database recovery and will not be recovered
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_121_880742847.dbf
Completed: ALTER DATABASE RECOVER managed standby database disconnect from session
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_122_880742847.dbf
Sat Jun 27 23:24:31 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_123_880742847.dbf
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Recovery created file /u02/dg11g/oradata/DG11G/test_new01.dbf
Successfully added datafile 7 to media recovery
Datafile #7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Media Recovery Log /u02/dg11g/switchover/DG11G/archivelog/1_124_880742847.dbf
Media Recovery Log /u02/dg11g/switchover/DG11G/archivelog/1_125_880742847.dbf
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_126_880742847.dbf
Sat Jun 27 23:24:49 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_127_880742847.dbf
Sat Jun 27 23:25:01 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_128_880742847.dbf
Sat Jun 27 23:25:17 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_129_880742847.dbf
Sat Jun 27 23:25:29 2015

比較有意思的是檢視日誌可以看到，資料檔案被反覆建立刪除了很多次。最後還是以drop終止。
然後就開始使用一大堆的歸檔檔案做資料恢復了。

Sat Jun 27 23:28:30 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_172_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_173_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_174_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_175_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_176_880742847.dbf
Sat Jun 27 23:28:40 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_177_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_178_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_179_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_180_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_181_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_182_880742847.dbf
Sat Jun 27 23:28:52 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_183_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_184_880742847.dbf

在主庫中檢視，redo的序列號185,備庫中的序列號是184。
sys@TEST11G> select sequence#,status from v$log;
SEQUENCE# STATUS
---------- ----------------
184 INACTIVE
185 CURRENT
183 INACTIVE

在備庫中檢視後臺程式的情況，可以看到MRP已經記錄在冊了。
idle> select process,status,sequence# from v$managed_standby;
PROCESS STATUS SEQUENCE#
--------- ------------ ----------
ARCH CONNECTED 0
ARCH CONNECTED 0
ARCH CONNECTED 0
ARCH CONNECTED 0
MRP0 WAIT_FOR_LOG 186

dataguard中MRP無法啟動的問題分析和解決

相關文章