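I once shared a design in a post on "Half Tech, Half Life". To meet business requirements and improve processing efficiency, we split the databases and tables by line of business, roughly as shown in the diagram below.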

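The development team was very supportive. They arranged good machines for us, added memory, synchronized the table structures to the new Business 2 environment and extracted part of the data, and then Business 2 began intensive testing.
Over the past few days of testing, the system's performance gradually stabilized. With that out of the way, I hurried to set up a standby database.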
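I have built quite a few Data Guard environments. A common way to verify that one has been set up correctly is through the DG Broker: if a simple show configuration command reports SUCCESS, the standby is basically up and running. So I made few changes to the newly requested machines; everything seemed ready to go.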
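That SUCCESS check is easy to script. Below is a minimal sketch, assuming the output of show configuration has already been captured as text (for example by running dgmgrl with the -silent option and the command on its command line). The function name config_ok is hypothetical. As the rest of this post shows, SUCCESS only means the broker sees no error at that moment; it does not prove that redo is actually being applied.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) only if the first non-blank line after
# "Configuration Status:" in DGMGRL "show configuration" output is SUCCESS.
config_ok() {
  awk '
    found && NF { status = $1; exit }        # first non-blank line after the header
    /^[[:space:]]*Configuration Status:/ { found = 1 }
    END { exit (status != "SUCCESS") }       # 0 when SUCCESS, 1 otherwise
  '
}

# Sample output copied from the broker session in this post.
sample='Configuration - dg_testbi
Protection Mode: MaxPerformance
Databases:
    testbi - Primary database
    stestbi - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS'

if printf '%s\n' "$sample" | config_ok; then
  echo "broker reports SUCCESS"
else
  echo "broker reports a problem"
fi
```

The parser reads from standard input, so the same function works on a saved file or on output piped straight from dgmgrl.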
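This environment had one peculiarity: the primary uses ASM storage while the standby uses an ordinary file system, so the main work was setting the two convert parameters (DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT). The DG Broker offers a great deal of convenience, which is why I rely on it more and more. To build the standby I used the classic active duplicate approach.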
$ rman target sys@testbi auxiliary sys@stestbi nocatalog
RMAN> duplicate target database for standby from active database nofilenamecheck;
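The convert parameters take pairs of strings: when a primary file name matches the first string, the standby uses the name with that part replaced by the second. The sketch below models only the simple prefix case, to show how an ASM name maps onto the file-system path that appears later in the standby alert log. The disk group name +DATA and the primary file name are assumptions; only the standby path comes from this post, and convert_name is a hypothetical helper, not an Oracle tool.

```shell
#!/usr/bin/env bash
# Simplified model of DB_FILE_NAME_CONVERT / LOG_FILE_NAME_CONVERT:
# if a file name starts with the "from" string, swap that prefix for "to".
# Oracle's real rule matches the pattern within the name; only prefixes here.
convert_name() {
  local name="$1" from="$2" to="$3"
  if [[ "$name" == "$from"* ]]; then
    printf '%s\n' "${to}${name#"$from"}"
  else
    printf '%s\n' "$name"   # no match: the name is left unchanged
  fi
}

# Hypothetical ASM disk group "+DATA"; the target directory is taken from
# the standby datafile path shown in the ORA-01110 message in this post.
convert_name "+DATA/testbi/datafile/system.407.899224793" \
             "+DATA" "/home/U01/app/oracle/oradata"
```

Under the same assumption, the standby would carry something like db_file_name_convert='+DATA','/home/U01/app/oracle/oradata', with a matching pair for log_file_name_convert.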
The sync finished quickly, and then I started configuring the DG Broker.
create configuration dg_testbi as
primary database is testbi
connect identifier is testbi;

add database stestbi as
connect identifier is stestbi
maintained as physical;
Once it was set up, I enabled the configuration and checked show configuration manually; it reported SUCCESS:
DGMGRL> enable configuration;
Enabled.
DGMGRL> show configuration;
Configuration - dg_testbi
Protection Mode: MaxPerformance
Databases:
    testbi - Primary database
    stestbi - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
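Everything seemed planned and under control. I was about to open the standby manually when I hit a rather odd problem: although the standby is 11gR2, it could not be brought to the OPEN stage.
A manual attempt failed immediately: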
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1:
'/home/U01/app/oracle/oradata/testbi/datafile/system.407.899224793'
At first glance it looked as if something had gone wrong with the backup set.
Checking the DG Broker status at this point now showed an error, ORA-16724: cannot resolve gap for one or more standby databases:
DGMGRL> show configuration;
Configuration - dg_testbi
Protection Mode: MaxPerformance
Databases:
    testbi - Primary database
      Error: ORA-16724: cannot resolve gap for one or more standby databases
    stestbi - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
ERROR
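So the standby clearly still had problems. I checked the fal_client and fal_server settings and found nothing wrong. Trying my luck, I recreated the configuration, and to my surprise it reported success again: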
DGMGRL> remove configuration;
Removed configuration
DGMGRL> create configuration dg_testbi as
primary database is testbi
connect identifier is testbi;
Configuration "dg_testbi" created with primary database "testbi"
DGMGRL> add database stestbi as
connect identifier is stestbi
maintained as physical;
Database "stestbi" added
DGMGRL> enable configuration;
Enabled.
DGMGRL> show configuration;
Configuration - dg_testbi
Protection Mode: MaxPerformance
Databases:
    testbi - Primary database
    stestbi - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
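Then I tried to open the standby again, with the same result. This is an 11gR2 database, and opening it for Active Data Guard is hardly a demanding request, yet the error was still ORA-16724: cannot resolve gap for one or more standby databases.
The configuration itself still showed SUCCESS, so I examined the standby in more detail and found it was already almost four and a half hours behind: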
DGMGRL> show database stestbi;
Database - stestbi
Role:            PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag:   (unknown)
Apply Lag:       4 hours 29 minutes 48 seconds
Real Time Query: OFF
Instance(s):
    testbi
Database Status:
SUCCESS
DGMGRL> exit
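Because show configuration said SUCCESS while the standby was hours behind, it is worth checking the Apply Lag line of show database as well. A minimal sketch that turns that line into seconds so it can be compared with a threshold; the function name lag_seconds and the one-hour threshold are hypothetical, and only day/hour/minute/second units are handled.

```shell
#!/usr/bin/env bash
# Convert the "Apply Lag:" line of DGMGRL "show database" output (stdin)
# into seconds. Prints "unknown" if the broker cannot report a lag.
lag_seconds() {
  awk -F': *' '/Apply Lag:/ {
    if ($2 ~ /unknown/) { print "unknown"; exit }
    n = split($2, w, " "); s = 0
    for (i = 1; i < n; i += 2) {              # pairs of "<number> <unit>"
      if (w[i+1] ~ /^day/)    s += w[i] * 86400
      if (w[i+1] ~ /^hour/)   s += w[i] * 3600
      if (w[i+1] ~ /^minute/) s += w[i] * 60
      if (w[i+1] ~ /^second/) s += w[i]
    }
    print s; exit
  }'
}

lag=$(printf 'Apply Lag:       4 hours 29 minutes 48 seconds\n' | lag_seconds)
echo "$lag"                       # 16188
threshold=3600                    # hypothetical alert threshold: 1 hour
if [[ $lag != unknown && $lag -gt $threshold ]]; then
  echo "standby apply lag ${lag}s exceeds ${threshold}s"
fi
```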
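This redo was simply not being applied. The background logs confirmed that only RFS was doing any work, and MRP was not reporting any errors.
The problem looked strange, so I verified it repeatedly: I stopped log apply and then opened the standby read only. In 11gR2 apply is then switched back to real-time apply by default, as the log also shows.
The standby alert log contains the following: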
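For reference, the manual sequence just described corresponds to standard 11gR2 statements. The sketch wraps them in a hypothetical bash function that only prints the SQL, so it can be reviewed before being piped into SQL*Plus on the standby. When the broker manages the standby it restarts apply by itself, as observed here, so the last statement would normally be redundant; it is included only for setups without the broker.

```shell
#!/usr/bin/env bash
# Print the SQL to stop managed recovery, open the standby read only, and
# restart real-time apply. Example (not run here):
#   open_standby_ro_sql | sqlplus -s "/ as sysdba"
open_standby_ro_sql() {
  cat <<'SQL'
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
-- Only needed when the broker is not restarting apply on its own:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
EXIT
SQL
}

open_standby_ro_sql
```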
Managed Standby Recovery starting Real Time Apply
Media Recovery Waiting for thread 1 sequence 101
Wed Dec 30 23:00:34 2015
Standby crash recovery need archive log for thread 1 sequence 101 to continue.
Please verify that primary database is transporting redo logs to the standby database.
Wait timeout: thread 1 sequence 101
Standby crash recovery aborted due to error 16016.
Errors in file /home/U01/app/oracle/diag/rdbms/stestbi/testbi/trace/testbi_ora_3241.trc:
ORA-16016: archived log for thread 1 sequence# 101 unavailable
Recovery interrupted!
Completed standby crash recovery.
Signalling error 1152 for datafile 1!
Errors in file /home/U01/app/oracle/diag/rdbms/stestbi/testbi/trace/testbi_ora_3241.trc:
ORA-10458: standby database requires recovery
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '/home/U01/app/oracle/oradata/testbi/datafile/system.407.899224793'
ORA-10458 signalled during: alter database open...
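So the standby could no longer obtain the archived log with sequence 101.
Checking on the standby confirmed that only archived logs from sequence 102 onward were present. So where had 101 gone?
Looking back, I found that the primary was quietly running a crontab job, and a frequent one at that, firing every 15 minutes: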
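The check just done by hand, comparing the sequence that recovery is waiting for with the archived logs actually present, can be sketched as a small function. find_gap is hypothetical: on a real standby the list of sequences would come from the archive destination or from V$ARCHIVED_LOG, and the sample list 102 103 104 is made up apart from the fact that nothing below 102 was present.

```shell
#!/usr/bin/env bash
# Print every sequence from $1 (the one recovery waits for) up to the highest
# sequence present that is missing from the remaining arguments.
find_gap() {
  local need=$1; shift
  local -A have=()
  local s max=$need
  for s in "$@"; do
    have[$s]=1
    if (( s > max )); then max=$s; fi
  done
  for (( s = need; s <= max; s++ )); do
    [[ -n ${have[$s]:-} ]] || echo "$s"
  done
}

find_gap 101 102 103 104   # prints 101
```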
0,15,30,45 * * * * $HOME/dbadmin/scripts/rm_archive.sh
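Reading the script left me thoroughly dejected. The script itself is flawed: it resets the deletion policy to NONE and then simply deletes archived logs outright, without checking whether they have been applied on the standby.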
#!/bin/bash
. ~oracle/.bash_profile
rman target / <<EOF
CONFIGURE ARCHIVELOG DELETION POLICY TO none;
crosscheck archivelog all;
delete noprompt expired archivelog all;
delete noprompt archivelog until time "sysdate-1/12";
exit
EOF
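The retention in that script is shorter than it looks. RMAN evaluates sysdate-1/12 as date arithmetic in days, so the cutoff works out as:

```latex
\text{cutoff} = \text{sysdate} - \tfrac{1}{12}\,\text{day}
             = \text{sysdate} - \tfrac{24\,\text{h}}{12}
             = \text{sysdate} - 2\,\text{h},
\qquad 2\,\text{h} < 4\,\text{h}\;29\,\text{min}\;48\,\text{s}
```

An archived log therefore survived on the primary for only about two hours (plus up to 15 minutes until the next cron run). The standby was waiting for redo from roughly four and a half hours earlier, the apply lag reported above, well outside that window, which is consistent with sequence 101 having been deleted before it was shipped.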
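Of course, the script needs fixing. At the very least, an archived log must be applied on the standby before it can be deleted, and the retention window is widened from two hours to one day:
. ~oracle/.bash_profile
rman target / <<EOF
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
crosscheck archivelog all;
delete noprompt expired archivelog all;
delete noprompt archivelog until time "sysdate-1";
exit
EOF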
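To see what the policy change buys, here is a toy model in plain bash. It is not how RMAN works internally: with APPLIED ON ALL STANDBY, RMAN should skip logs that do not satisfy the policy instead of deleting them, and the model just encodes that rule. The input format and the functions old_rule and new_rule are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of the two cleanup scripts; NOT RMAN's actual implementation.
# Input lines (hypothetical format): "<sequence> <applied on standby Y|N> <age in hours>"

# Old script: deletion policy NONE, delete anything older than the cutoff.
old_rule() { awk -v cutoff="$1" '$3 > cutoff { print $1 }'; }

# Revised script: APPLIED ON ALL STANDBY keeps logs the standby still needs.
new_rule() { awk -v cutoff="$1" '$3 > cutoff && $2 == "Y" { print $1 }'; }

logs=$'100 Y 30\n101 N 30\n102 N 1'

printf '%s\n' "$logs" | old_rule 2    # prints 100 and 101
printf '%s\n' "$logs" | new_rule 24   # prints 100 only
```

In the model, the unapplied sequence 101 is deleted by the old rule but kept by the new one, which is exactly the difference that mattered here.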
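I had really dug a pit for myself and jumped straight into it without hesitation. Looking back, it was all wasted effort. Luckily this database is not very large; had it been a reporting database of several TB or tens of TB, rebuilding it would have drained all my willpower.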

Falling into a pit I dug for myself on an 11g standby
