現象:
ADG無法切換:驗證時就報錯 ORA-16467
記錄問題,順便展現一次troubleshooting的心路歷程。
具體查詢:
在主庫操作,
@primary
切換驗證:
alter database switchover to demorac verify;
報錯ORA-16467:
SQL> alter database switchover to demorac verify;
alter database switchover to demorac verify
*
ERROR at line 1:
ORA-16467: switchover target is not synchronized
主庫alert告警日誌:
ORA-16467 signalled during: alter database switchover to demorac verify...
主庫傳輸鏈路並沒有報錯:
SQL> select error from v$archive_dest where dest_id= 2;
ERROR
-----------------------------------------------------------------
但是,如果去查v$archive_dest_status,就會發現問題,說有可解決的GAP:
SQL> select dest_id, status, gap_status from v$archive_dest_status where dest_id = 2;
DEST_ID STATUS GAP_STATUS
---------- --------- ------------------------
2 VALID RESOLVABLE GAP
但是去備庫查詢,
@standby,
ADG並沒有任何延遲:
SQL> select * from v$dataguard_stats;
SOURCE_DBID SOURCE_DB_ NAME VALUE UNIT TIME_COMPUTED DATUM_TIME CON_ID
----------- ---------- ---------------------- ---------------------- ------------------------------ ------------------------------ ------------------------------ ----------
0 transport lag +00 00:00:00 day(2) to second(0) interval 05/11/2023 18:16:43 05/11/2023 18:16:41 0
0 apply lag +00 00:00:00 day(2) to second(0) interval 05/11/2023 18:16:43 05/11/2023 18:16:41 0
0 apply finish time day(2) to second(3) interval 05/11/2023 18:16:43 0
0 estimated startup time 18 second 05/11/2023 18:16:43 0
查MOS文件資料,有一個bug:
- Bug 33663444 - DataGuard: "alter database switchover verify" fails with ORA-16467 (Doc ID 33663444.8)
可是很快也排除掉:現象細節不完全匹配,另外這個bug正好在19.16已經修復:
[oracle@bogon ~]$ $ORACLE_HOME/OPatch/opatch lsinventory|grep 33663444
29353271, 29706141, 26352569, 33663444, 30476768, 30092280, 30843271
但這個文件給的一些解釋中也得到了一些啟發:
REDISCOVERY INFORMATION:
SELECT GAP_STATUS FROM V$ARCHIVE_DEST_STATUS with show unresolvable gap.
Workaround
temporarily re-enable the disabled redo thread
首先,我們查V$ARCHIVE_DEST_STATUS已經確認是resolvable gap,不匹配但是也有問題。
然後redo thread的re-enable,這個workaround讓我去想到查詢redo的thread,結果發現果然有些異常:
SQL> select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME CON_ID
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ----------
1 1 1132 209715200 512 1 NO CURRENT 44911942 11-MAY-23 9.2954E+18 0
2 1 1130 209715200 512 1 YES INACTIVE 44901958 11-MAY-23 44911931 11-MAY-23 0
3 1 1131 209715200 512 1 YES INACTIVE 44911931 11-MAY-23 44911942 11-MAY-23 0
4 2 0 104857600 512 1 YES UNUSED 0 0 0
5 2 0 104857600 512 1 YES UNUSED 0 0 0
我這裡目前主庫是單例項,而備庫才是RAC,可是,為何主庫的redo居然會有 thread#=2 的redo?
雖然都是unused,但出現在這裡就很奇怪!
不知道誰動了這個環境,想不起來做過這樣的測試。但可以肯定的是,完全可以把這個不該存在的thread刪除掉!
直接刪除會報錯:
alter database drop logfile group 4;
alter database drop logfile group 5;
SQL> alter database drop logfile group 4;
alter database drop logfile group 4
*
ERROR at line 1:
ORA-01567: dropping log 4 would leave less than 2 log files for instance UNNAMED_INSTANCE_2 (thread 2)
ORA-00312: online log 4 thread 2: '/flash/fast_recovery_area/DEMO/onlinelog/o1_mf_4_kxn73zny_.log'
SQL> alter database drop logfile group 5;
alter database drop logfile group 5
*
ERROR at line 1:
ORA-01567: dropping log 5 would leave less than 2 log files for instance UNNAMED_INSTANCE_2 (thread 2)
ORA-00312: online log 5 thread 2: '/flash/fast_recovery_area/DEMO/onlinelog/o1_mf_5_kxn73zqd_.log'
刪除不掉就嘗試去先禁用掉這個沒有用的thread 2,然後再次嘗試刪除:
alter database disable thread 2;
SQL> alter database disable thread 2;
Database altered.
SQL> select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME CON_ID
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ----------
1 1 1132 209715200 512 1 NO CURRENT 44911942 11-MAY-23 9.2954E+18 0
2 1 1130 209715200 512 1 YES INACTIVE 44901958 11-MAY-23 44911931 11-MAY-23 0
3 1 1131 209715200 512 1 YES INACTIVE 44911931 11-MAY-23 44911942 11-MAY-23 0
4 2 0 104857600 512 1 YES UNUSED 0 0 0
5 2 0 104857600 512 1 YES UNUSED 0 0 0
SQL> alter database drop logfile group 4;
Database altered.
SQL> alter database drop logfile group 5;
Database altered.
SQL>
SQL>
SQL> select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME CON_ID
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ----------
1 1 1132 209715200 512 1 NO CURRENT 44911942 11-MAY-23 9.2954E+18 0
2 1 1130 209715200 512 1 YES INACTIVE 44901958 11-MAY-23 44911931 11-MAY-23 0
3 1 1131 209715200 512 1 YES INACTIVE 44911931 11-MAY-23 44911942 11-MAY-23 0
刪除成功!這次想來總該可以了吧?
SQL> alter database switchover to demorac verify;
alter database switchover to demorac verify
*
ERROR at line 1:
ORA-16467: switchover target is not synchronized
額,還是不行,但感覺再刺激刺激就OK了,繼續嘗試之前的十八般武藝(主要還是多切幾次日誌):
SQL> select dest_id, status, gap_status from v$archive_dest_status where dest_id = 2;
DEST_ID STATUS GAP_STATUS
---------- --------- ------------------------
2 VALID RESOLVABLE GAP
SQL>
SQL> alter system switch logfile;
System altered.
SQL> select dest_id, status, gap_status from v$archive_dest_status where dest_id = 2;
DEST_ID STATUS GAP_STATUS
---------- --------- ------------------------
2 VALID RESOLVABLE GAP
SQL> alter system archive log current;
System altered.
SQL> select dest_id, status, gap_status from v$archive_dest_status where dest_id = 2;
DEST_ID STATUS GAP_STATUS
---------- --------- ------------------------
2 VALID RESOLVABLE GAP
SQL> alter system switch logfile;
alter system switch logfile;
System altered.
SQL>
System altered.
SQL>
SQL>
SQL>
SQL> alter system switch logfile;
System altered.
SQL> select dest_id, status, gap_status from v$archive_dest_status where dest_id = 2;
DEST_ID STATUS GAP_STATUS
---------- --------- ------------------------
2 VALID NO GAP
SQL>
哎呀,直接提示沒有GAP了,趕緊嘗試繼續切換。。
SQL> alter database switchover to demorac verify;
alter database switchover to demorac verify
*
ERROR at line 1:
ORA-16475: succeeded with warnings, check alert log for more details
SQL> !date
Thu May 11 18:36:31 CST 2023
額?成功了,但是有警告,看告警日誌又是一個新的ORA-16475 錯誤。
2023-05-11T18:36:14.906996+08:00
alter database switchover to demorac verify
2023-05-11T18:36:15.094516+08:00
SWITCHOVER VERIFY: Send VERIFY request to switchover target DEMORAC
SWITCHOVER VERIFY WARNING: switchover target temporary files are not the same with the primary. See switchover target's alert log for details.
ORA-16475 signalled during: alter database switchover to demorac verify...
哎呀,去看看備庫的alert日誌:
2023-05-11T18:41:29.407685+08:00
SWITCHOVER VERIFY BEGIN
SWITCHOVER VERIFY WARNING: primary database has 5 temporary files, this database has 4 temporary files. More temp files should be added to this database.
SWITCHOVER VERIFY COMPLETE
注意這個時間不太一致是因為兩個機器時間不一樣(差了5分鐘),可以先不用管。
看問題本身是說臨時檔案,這無所謂,切換後可以自動建立。
快快來一把久違的切換吧!
--主庫執行成功
alter database switchover to demorac;
--新主庫demorac
alter database open;
--新備庫demo
startup
recover managed standby database disconnect;
具體ADG切換參考:
嗯,終於OK了,也感覺肚子餓了,去點餐了。