Analysis of an Instance Startup Failure under Oracle ADG in Maximum Availability Mode

Posted by yingyifeng306 on 2022-04-15

1. System environment

Servers: two x86 nodes with HP storage

Operating system: Red Hat 6.9

Database version: 11.2.0.4


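2. Fault description

After the physical server hosting node 1 of the production RAC database went down, its instance could not be started. Node 2 kept serving normally, and Data Guard synchronization stayed healthy without interruption.

After a hard reboot of the failed node, the GI services started normally, but the database instance failed to start.

The alert log showed: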

ARC3 started with pid=38, OS id=9473

ARC1: Archival started

ARC2: Archival started

ARC1: Becoming the 'no FAL' ARCH

ARC1: Becoming the 'no SRL' ARCH

ARC2: Becoming the heartbeat ARCH

LGWR: Primary database is in MAXIMUM AVAILABILITY mode

LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR

ARC3: Archival started

ARC0: STARTING ARCH PROCESSES COMPLETE

Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED

LGWR: Minimum of 1 applicable standby database required

Errors in file /oracle/app/diag/rdbms/orcl/orcl2/trace/orcl2_lgwr_9361.trc:

ORA-16072: a minimum of one standby database destination is required

LGWR (ospid: 9361): terminating the instance due to error 16072

Thu Apr 14 20:58:24 2022

System state dump requested by (instance=2, osid=9361 (LGWR)), summary=[abnormal instance termination].

System State dumped to trace file /oracle/app/diag/rdbms/orcl/orcl2/trace/orcl2_diag_9335_20220414205824.trc

Dumping diagnostic data in directory=[cdmp_20220414205824], requested by (instance=2, osid=9361 (LGWR)), summary=[abnormal instance termination].

Instance terminated by LGWR, pid = 9361


3. Resolution

The relevant errors in the alert log are:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

LGWR: Primary database is in MAXIMUM AVAILABILITY mode

LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR

Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED

LGWR: Minimum of 1 applicable standby database required

Errors in file /oracle/app/diag/rdbms/orcl/orcl2/trace/orcl2_lgwr_9361.trc:

ORA-16072: a minimum of one standby database destination is required

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

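These messages show that Data Guard is running in MAXIMUM AVAILABILITY mode and that LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED, while the mode requires at least one healthy standby database destination. Because that condition was not met, LGWR terminated the instance during startup.

This is puzzling, though: instance 2 was running normally and Data Guard synchronization was healthy and uninterrupted, so instance 1 should have been able to start and join the cluster.

To restore service as quickly as possible, the following temporary workaround was applied:

1) Downgrade the Data Guard protection mode to MAXIMIZE PERFORMANCE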
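The state described above could be confirmed from the surviving instance with queries along these lines. This is a minimal sketch, not part of the original troubleshooting session; it assumes a SYSDBA connection to the running instance and that the standby destination is LOG_ARCHIVE_DEST_2, as in the alert log.

```sql
-- Configured protection mode versus the level currently being achieved.
SELECT protection_mode, protection_level
FROM   v$database;

-- State of the standby destination as seen by each running instance.
-- dest_id = 2 is taken from the alert log (LOG_ARCHIVE_DEST_2).
SELECT inst_id, dest_id, status, synchronization_status, error
FROM   gv$archive_dest_status
WHERE  dest_id = 2;
```

If the running instance reports the destination as valid and synchronized while the starting instance still fails with ORA-16072, that points toward a configuration issue on the starting instance rather than a broken transport.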

ALTER DATABASE SET STANDBY TO MAXIMIZE PERFORMANCE;

2) Start the failed instance

startup


After these changes, instance 1 started normally and joined the cluster.
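A quick way to verify the result of the workaround is to check the instance status and the protection mode cluster-wide. The sketch below is illustrative only and assumes a SYSDBA session on either node.

```sql
-- Both RAC instances would be expected to appear here once the
-- failed one has started and joined the cluster.
SELECT inst_id, instance_name, status
FROM   gv$instance
ORDER  BY inst_id;

-- After the downgrade, the configured mode should read MAXIMUM PERFORMANCE.
SELECT protection_mode, protection_level
FROM   v$database;
```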


4. Analysis of the instance startup failure under MAXIMUM AVAILABILITY

The Data Guard related parameters on the production database were:

NAME                    TYPE    VALUE
log_archive_config      string
log_archive_dest_2      string  SERVICE=orclstd LGWR SYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
dg_broker_config_file1  string  /oracle/app/product/11.2.0/db_1/dbs/dr1orcl.dat
dg_broker_config_file2  string  /oracle/app/product/11.2.0/db_1/dbs/dr2orcl.dat


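5. Question

Is the fact that log_archive_config was not configured according to standard practice the reason why, under MAXIMUM AVAILABILITY mode, a RAC node cannot start again after being restarted?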


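6. Tests

Test 1:

1) In a test environment, remove the log_archive_config parameter and set the Data Guard protection mode to MAXIMUM AVAILABILITY.

2) Restart one of the instances: the same failure occurs.

Test 2:

1) Set log_archive_config to DG_CONFIG=(orcl,orclstd) and set the protection mode to MAXIMUM AVAILABILITY.

2) Restart one of the instances: it starts normally.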
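Test 2 could be reproduced roughly as follows. This is a sketch under assumptions: the article does not show the exact commands used. The names orcl and orclstd are the DB_UNIQUE_NAME values from the article, SCOPE=BOTH SID='*' assumes an spfile shared by all RAC instances, and SHUTDOWN/STARTUP are SQL*Plus commands run as SYSDBA on the instance being restarted.

```sql
-- Step 1a: declare the Data Guard configuration on the primary.
ALTER SYSTEM SET log_archive_config = 'DG_CONFIG=(orcl,orclstd)'
  SCOPE=BOTH SID='*';

-- Step 1b: raise the protection mode.
-- Depending on the database state, Oracle may require this to be
-- issued while the database is mounted rather than open.
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;

-- Step 2: restart one instance (SQL*Plus, connected to that instance).
SHUTDOWN IMMEDIATE
STARTUP
```

The standby would normally carry the same DG_CONFIG list; that side of the setup is not covered in the article and is omitted here.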


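7. Summary

The initialization parameter LOG_ARCHIVE_CONFIG controls the sending of archived logs to remote destinations and the receipt of remote archived logs. When an ADG environment is built, ADG runs normally whether or not the parameter is set.

However, when the protection mode is MAXIMUM AVAILABILITY and the parameter is not set:

1) In a RAC environment, an instance may fail to start after being restarted.

2) The mode cannot be switched online from MAXIMIZE PERFORMANCE to MAXIMIZE AVAILABILITY; Oracle reports that the database must be in the mount state for the switch.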
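As a preventive check, the risky combination described in this summary could be detected with a query like the one below. It is a hedged sketch rather than an official health check: it simply flags instances where the protection mode is above MAXIMUM PERFORMANCE while log_archive_config contains no DG_CONFIG list.

```sql
-- Returns one row per instance whose log_archive_config lacks a
-- DG_CONFIG=(...) list while the database is not in MAXIMUM PERFORMANCE.
-- No rows returned means the combination was not found.
SELECT d.protection_mode,
       p.inst_id,
       p.value AS log_archive_config
FROM   v$database d
CROSS  JOIN gv$parameter p
WHERE  p.name = 'log_archive_config'
AND    d.protection_mode <> 'MAXIMUM PERFORMANCE'
AND    (p.value IS NULL
        OR INSTR(REPLACE(UPPER(p.value), ' ', ''), 'DG_CONFIG=(') = 0);
```

Note that gv$parameter only covers running instances, so an instance that is currently down is not checked.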



Source: ITPUB blog, link: http://blog.itpub.net/23732248/viewspace-2887275/. Please credit the source when reposting; otherwise legal liability will be pursued.
