Handling a Data Guard case where archived logs could not be applied because of a password file problem and a gap

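Posted by 還不算暈 on 2015-12-30
Problem description:
A RAC-->RAC Data Guard configuration on Oracle 11.2.0.4.0, running on Linux.
On one node of the primary, the SYS password was changed with the alter user sys command; after that, redo could no longer be shipped to the standby because of a password mismatch.
Once the problem was found, the password file was re-synchronized. However, the standby still could not fetch node 2's archived logs through its FAL configuration; the gap mechanism did not take effect.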


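Solution:

The password file error:
I had run into this before on 11.2.0.3: the password files could only be brought in sync by copying the file between nodes; creating a password file on each node with the same command did not work.
This time the problem was again fixed by copying the password file.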
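
As a quick sanity check after copying the file, each instance can be asked whether it is actually using a password file. This is a minimal sketch to run on every primary and standby instance. It only shows that SYS is listed and which remote_login_passwordfile mode is in effect; it cannot prove that the stored passwords match, which is why copying the file remains the fix used here.

```sql
-- Run as SYSDBA on each primary and standby instance.
-- Users present in the password file; SYS should be listed with SYSDBA = TRUE.
-- No rows returned means the instance is not using a password file.
SELECT username, sysdba, sysoper
FROM   v$pwfile_users;

-- Must be SHARED or EXCLUSIVE for redo transport authentication to work.
SELECT name, value
FROM   v$parameter
WHERE  name = 'remote_login_passwordfile';
```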

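After reconfiguring FAL, the archived logs still could not be fetched:

At this point there were three options: copy the archived logs from the primary node by hand with scp, then register and recover them on the standby; disable and re-enable the corresponding archive log destination on the primary; or restart the standby so that the relevant processes are started afresh.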
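
The first two options can be sketched roughly as follows. The staging path /u01/stage and destination number 2 are hypothetical values chosen for illustration, and the file names simply follow the pattern seen in the alert log; the files are assumed to have been copied from the primary beforehand, and every missing sequence needs its own REGISTER statement.

```sql
-- Option 1, on the standby as SYSDBA.
-- Stopping managed recovery first is a cautious choice, not a strict requirement.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- Register each copied archived log ('/u01/stage' is a hypothetical staging path).
ALTER DATABASE REGISTER LOGFILE '/u01/stage/2_12298_879093457.dbf';
ALTER DATABASE REGISTER LOGFILE '/u01/stage/2_12299_879093457.dbf';

-- Restart real-time apply.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;

-- Option 2, on the primary as SYSDBA.
-- Bounce the destination that ships redo to the standby
-- (destination number 2 is an assumption; check log_archive_dest_n first).
ALTER SYSTEM SET log_archive_dest_state_2 = DEFER  SCOPE = MEMORY SID = '*';
ALTER SYSTEM SET log_archive_dest_state_2 = ENABLE SCOPE = MEMORY SID = '*';
```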

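Because a large number of archived logs were involved, I tried restarting the standby first. Fortunately, restarting the Data Guard standby resolved the problem.

This database also switches logs very frequently (about 30 times per hour during business hours, reaching 50 at peak), but that is not discussed further here.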


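About the gap mechanism:
Data Guard gap handling through the fal_server and fal_client parameters dates back to 9i.
Oracle provides two mechanisms for detecting and resolving log gaps. The fal_* parameters are not always required for gap resolution.
   1. Automatic Gap Resolution
   2. FAL Gap Resolution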
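1. Automatic Gap Resolution
    Data Guard introduced automatic detection of missing logs in 9i. With no fal_* parameters set, Data Guard runs under this mechanism.
When the LGWR and ARCH processes send redo or archived logs to the standby, the current log sequence is compared with the last log sequence received by the RFS process on the standby. If there is a break between the two, RFS sends a request to the primary asking it to send the missing logs. Automatic Gap Resolution was enhanced in 9iR2: the ARCH process on the primary checks the standby for log gaps every minute and handles any it finds.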
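2. FAL Gap Resolution
    FAL stands for Fetch Archive Log. It is a gap detection mechanism driven by the FAL server and FAL client settings. When the RFS process on the standby receives an archived log, it updates the standby control file to record it. Once MRP sees that the control file has been updated, it recovers and applies the log. If MRP finds that a required log is missing or unusable (corrupted, physically removed, and so on), it issues a request through FAL. MRP is the recovery process on the standby; unlike RFS it has no direct link to the primary, so it relies on the FAL parameters to actively ask the primary to resolve the gap.
 fal_server and fal_client are standby-side parameters. For a smooth switchover, consider setting them on the primary in advance as well.
FAL_SERVER: the Oracle Net service name of the primary
FAL_CLIENT: the Oracle Net service name of the standby
In 9iR2 and later, Oracle first tries FAL Gap Resolution; when it finds that FAL is not configured or not working, it falls back to Automatic Gap Resolution.
   For cascaded Data Guard architectures, FAL Gap Resolution is the better way to handle gaps. In addition, Automatic Gap Resolution has bugs in some Data Guard versions (for example bug 5929647), in which case the FAL parameters have to be configured.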
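
To see what the standby currently knows about its FAL settings and any gap, the following queries are a reasonable starting point. This is a sketch that assumes a physical standby in mounted or read-only state; v$archive_gap reports only the gap that is currently blocking recovery, so it may need to be re-queried as gaps are resolved.

```sql
-- On the standby: current FAL settings.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('fal_server', 'fal_client');

-- On the standby: the gap currently blocking recovery, if any.
-- No rows means no gap is detected.
SELECT thread#, low_sequence#, high_sequence#
FROM   v$archive_gap;
```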

-----------------------------------------------------------------------------------

Detailed problem information:
1. Errors reported during the password problem:
Primary:
Wed Dec 23 20:06:43 2015
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
      returning error ORA-16191
----------------------------------------------------------

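The password file was synchronized by copying it from one node of the primary RAC to the other nodes.
-- On 11.2.0.3 I had already seen that the files could only be synchronized by copying; creating them on each node with the same command did not work.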

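2. The FAL mechanism was not working
After the password file was synchronized, a new problem appeared, with the following errors:
-- The correct fal_server parameter had already been set before this: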
Wed Dec 23 19:56:05 2015
ALTER SYSTEM SET fal_server='primary','primary2' SCOPE=BOTH;
Standby alert log after log apply was started:
Wed Dec 23 19:11:19 2015
Media Recovery Log +DATA/hnplusdb/arch/1_15423_879093457.dbf
Media Recovery Log +DATA/hnplusdb/arch/2_12297_879093457.dbf
Media Recovery Log +DATA/hnplusdb/arch/1_15424_879093457.dbf
Media Recovery Log +DATA/hnplusdb/arch/1_15425_879093457.dbf
Media Recovery Waiting for thread 2 sequence 12298
Fetching gap sequence in thread 2, gap sequence 12298-12397
Wed Dec 23 19:13:20 2015
FAL[client]: Failed to request gap sequence
 GAP - thread 2 sequence 12298-12397
 DBID 1714301265 branch 879093457
FAL[client]: All defined FAL servers have been attempted.
------------------------------------------------------------
Wed Dec 23 20:09:19 2015
alter database recover managed standby database using current logfile disconnect from session
Attempt to start background Managed Standby Recovery process (hnplusdb1)
Wed Dec 23 20:09:19 2015
MRP0 started with pid=43, OS id=28250
MRP0: Background Managed Standby Recovery process started (hnplusdb1)
 started logmerger process
Wed Dec 23 20:09:24 2015
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 64 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Wed Dec 23 20:09:25 2015
Media Recovery Waiting for thread 2 sequence 12298
Fetching gap sequence in thread 2, gap sequence 12298-12397
Completed: alter database recover managed standby database using current logfile disconnect from session
Wed Dec 23 20:11:28 2015
FAL[client]: Failed to request gap sequence
 GAP - thread 2 sequence 12298-12397
 DBID 1714301265 branch 879093457
FAL[client]: All defined FAL servers have been attempted.
----------------------------------------------------------

Looking back at the alert log from when the error first appeared:
Tue Dec 22 19:47:00 2015
RFS[3]: No standby redo logfiles available for thread 2
RFS[3]: Opened log for thread 2 sequence 12296 dbid 1714301265 branch 879093457
Archived Log entry 13831 added for thread 2 sequence 12296 rlc 879093457 ID 0x662d8951 dest 2:
Tue Dec 22 19:48:00 2015
RFS[3]: No standby redo logfiles available for thread 2
RFS[3]: Opened log for thread 2 sequence 12297 dbid 1714301265 branch 879093457
Archived Log entry 13832 added for thread 2 sequence 12297 rlc 879093457 ID 0x662d8951 dest 2:
Tue Dec 22 19:49:00 2015
RFS[3]: No standby redo logfiles available for thread 2
Creating archive destination file : +DATA/hnplusdb/arch/2_12298_879093457.dbf (63502 blocks)
Tue Dec 22 19:49:51 2015
Unable to create archive log file '+DATA/hnplusdb/arch/2_12290_879093457.dbf'
ARC1: Error 19504 Creating archive log file to '+DATA/hnplusdb/arch/2_12290_879093457.dbf'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance hnplusdb1 - Archival Error
ORA-16038: log 12 sequence# 12290 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 12 thread 2: '+DATA/hnplusdb/onlinelog/slog6.log'
Tue Dec 22 19:49:51 2015
ARC3: Archiving not possible: error count exceeded
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance hnplusdb1 - Archival Error


A check on the primary showed that none of the relevant logs had been deleted, which was lucky. The next step was to resolve the gap.
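
A check of this kind on the primary might look like the sketch below. The thread and sequence range come from the gap reported in the standby alert log; dest_id = 1 is an assumption for the local archive destination and should be adjusted to the actual configuration.

```sql
-- On the primary: are the gap sequences still available?
-- DELETED = 'NO' and STATUS = 'A' indicate the file is still on disk.
SELECT thread#, sequence#, name, deleted, status
FROM   v$archived_log
WHERE  thread# = 2
AND    sequence# BETWEEN 12298 AND 12397
AND    dest_id = 1          -- assumed local archive destination
ORDER  BY sequence#;

-- On the primary: state of the destinations that ship redo to the standby.
SELECT dest_id, status, error
FROM   v$archive_dest
WHERE  target = 'STANDBY';
```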
------
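Repeatedly stopping and starting log apply as shown below did not fix it (the crux was that the primary had not shipped the logs, or put another way, the standby's gap mechanism could not connect to the primary to fetch the files):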
alter database recover managed standby database using current logfile disconnect from session
alter database recover managed standby database cancel
Resolved after restarting the standby:
Physical Standby Database mounted.
Lost write protection disabled
ARC2: Becoming the active heartbeat ARCH
ARC2: Becoming the active heartbeat ARCH
Completed: ALTER DATABASE   MOUNT
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Wed Dec 23 20:21:32 2015
Using STANDBY_ARCHIVE_DEST parameter default value as +DATA/hnplusdb/arch
Wed Dec 23 20:21:33 2015
RFS[1]: Assigned to RFS process 34331
RFS[1]: Opened log for thread 1 sequence 15578 dbid 1714301265 branch 879093457
Wed Dec 23 20:21:33 2015
RFS[2]: Assigned to RFS process 34329
RFS[2]: Opened log for thread 1 sequence 15577 dbid 1714301265 branch 879093457
RFS[3]: Assigned to RFS process 34318
RFS[3]: Opened log for thread 1 sequence 15579 dbid 1714301265 branch 879093457
Wed Dec 23 20:21:33 2015
RFS[4]: Assigned to RFS process 34320
RFS[4]: Opened log for thread 2 sequence 12300 dbid 1714301265 branch 879093457
Wed Dec 23 20:21:33 2015
RFS[5]: Assigned to RFS process 34337
RFS[5]: Opened log for thread 2 sequence 12298 dbid 1714301265 branch 879093457
Wed Dec 23 20:21:33 2015

At this point node 2 of the primary had started shipping its logs again.
Start log apply:
Wed Dec 23 20:25:32 2015
alter database recover managed standby database using current logfile disconnect from session

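After that, it was just a matter of waiting for the standby to catch up until the primary and standby logs were in sync.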
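
Progress can be followed with queries along these lines. This is a sketch that assumes a single database incarnation and, on the primary, that dest_id = 1 is the local archive destination; the standby is caught up when the applied sequence of each thread is at or just behind the primary's latest archived sequence.

```sql
-- On the standby: what the apply and receive processes are doing.
SELECT process, status, thread#, sequence#, block#
FROM   v$managed_standby
WHERE  process LIKE 'MRP%' OR process LIKE 'RFS%';

-- On the standby: highest applied sequence per thread.
SELECT thread#, MAX(sequence#) AS last_applied
FROM   v$archived_log
WHERE  applied = 'YES'
GROUP  BY thread#;

-- On the primary: highest archived sequence per thread, for comparison.
SELECT thread#, MAX(sequence#) AS last_archived
FROM   v$archived_log
WHERE  dest_id = 1          -- assumed local archive destination
GROUP  BY thread#;
```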



