【DataGuard】Can't Live Without You, My Standby: An Oracle DataGuard Maximum Protection Mode Failure Experiment

Posted by dbhelper on 2014-11-27


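Abstract: This experiment simulates a network failure that makes every physical standby unavailable while DataGuard runs in Maximum Protection mode, and shows what happens to the primary database as a result.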

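Data Guard offers three data protection modes: Maximum Protection, Maximum Availability, and Maximum Performance.
Maximum Protection guarantees that no data is lost, but this comes at a cost. Before a transaction commits, its redo must be written not only to the local online redo logs but also to the standby redo logs of the standby database, and the redo must be confirmed available on at least one standby database (if there is more than one). Only then does the transaction commit on the primary. To guarantee zero data loss, if a failure prevents the primary from writing redo to at least one physical standby, the primary is shut down.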
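The commit rule described above can be sketched as a toy Python model. This is an illustration under simplifying assumptions, not how Oracle implements redo transport: the classes Standby and Primary, the reachable flag, and the InstanceShutdown exception are all hypothetical, and real LGWR/NSS transport adds timeouts, reconnect attempts, and acknowledgments that are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Hypothetical stand-in for a synchronized standby destination."""
    name: str
    reachable: bool = True
    standby_redo: list = field(default_factory=list)  # models the standby redo log

    def receive(self, redo: str) -> bool:
        """Write redo to the standby redo log; return False if unreachable."""
        if not self.reachable:
            return False
        self.standby_redo.append(redo)
        return True


class InstanceShutdown(Exception):
    """Raised when the primary stops rather than risk losing data."""


@dataclass
class Primary:
    standbys: list
    online_redo: list = field(default_factory=list)  # models the local online redo log
    is_open: bool = True

    def commit(self, redo: str) -> str:
        if not self.is_open:
            raise InstanceShutdown("primary instance is down")
        self.online_redo.append(redo)  # 1. write redo to the local online redo log
        acks = [s.receive(redo) for s in self.standbys]  # 2. ship redo to every standby
        if any(acks):  # 3. at least one standby holds the redo: commit is allowed
            return "COMMITTED"
        # 4. no standby holds the redo: shut down instead of committing
        self.is_open = False
        raise InstanceShutdown("all standby destinations have failed")


if __name__ == "__main__":
    shanghai = Standby("shanghai")
    beijing = Primary([shanghai])
    print(beijing.commit("txn-1"))  # COMMITTED
    shanghai.reachable = False  # e.g. "ifconfig eth0 down" on the standby host
    try:
        beijing.commit("txn-2")
    except InstanceShutdown as exc:
        print("shutdown:", exc)  # shutdown: all standby destinations have failed
```

The demo reproduces this experiment's single-standby setup. Adding a second reachable Standby to the list lets the toy primary keep committing after the first one disappears.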

【Oracle 11g documentation on Data Guard Maximum Protection mode】

Maximum Protection
This protection mode ensures that no data loss will occur if the primary database fails. To provide this level of protection, the redo data needed to recover a transaction must be written to both the online redo log and to the standby redo log on at least one synchronized standby database before the transaction commits. To ensure that data loss cannot occur, the primary database will shut down, rather than continue processing transactions, if it cannot write its redo stream to at least one synchronized standby database.

【Test Environment】
Red Hat Enterprise Linux Server release 5.4
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0

【Primary and Physical Standby Configuration】

【Procedure】

1. Check the current protection mode of the primary and the physical standby

Primary:

select database_role,open_mode,protection_mode,protection_level from v$database;

Physical standby:

select database_role,open_mode,protection_mode,protection_level from v$database;

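2. Simulate the failure (a network fault breaks communication between the primary and the standby, or the physical standby crashes)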

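First, while in Maximum Protection mode, try to stop redo transport to the remote destination on the primary. The change is not allowed:
alter system set log_archive_dest_state_2=defer;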

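Because this environment has only one physical standby, bringing down the standby's network interface is enough to simulate a network failure between the primary and the standby, or a standby crash.
Bring down the network interface on the physical standby (as root): ifconfig eth0 down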

3. Monitor the primary

While the standby's network interface is down, follow the primary's alert log:
tail -f /u01/app/oracle/diag/rdbms/beijing/BJ/trace/alert_BJ.log
Errors start appearing in the alert log:

Thu Jul 17 12:11:04 2014
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
ORA-16198: LGWR received timedout error from KSR
Thu Jul 17 12:11:14 2014
NSS2 started with pid=20, OS id=4477
Thu Jul 17 12:11:29 2014
***********************************************************************
Fatal NI connect error 12543, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=shanghai.lxh.net)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=shanghai)(CID=(PROGRAM=oracle)(HOST=beijing.lxh.net)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 17-JUL-2014 12:11:29
Tracing not turned on.
Tns error struct:
ns main err code: 12543
TNS-12543: TNS:destination host unreachable
ns secondary err code: 12560
nt main err code: 513
TNS-00513: Destination host unreachable
nt secondary err code: 113
nt OS err code: 0
***********************************************************************
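Picking these error codes out of a busy alert log by eye is tedious. The Python sketch below is a hypothetical helper, not an Oracle tool: the functions classify() and watch() and the list of fatal markers are assumptions, with the marker strings copied from the alert log lines in this experiment. It reads either a file path or, when given "-", standard input, so it can sit at the end of the tail -f pipeline on the primary's alert log.

```python
import re
import sys

# Numbered error codes as they appear in the alert log, e.g. ORA-16198, TNS-12543.
CODE_RE = re.compile(r"\b(?:ORA|TNS)-\d{5}\b")

# Messages from this experiment that mean the primary is about to stop.
FATAL_MARKERS = (
    "All standby destinations have failed",
    "Instance shutdown required to protect primary",
    "terminating the instance",
)


def classify(line: str):
    """Return (error codes found in the line, whether it is a fatal marker)."""
    codes = CODE_RE.findall(line)
    fatal = any(marker in line for marker in FATAL_MARKERS)
    return codes, fatal


def watch(stream) -> None:
    """Print only the interesting lines, flushing so piped output stays live."""
    for line in stream:
        codes, fatal = classify(line)
        if fatal:
            print("FATAL:", line.rstrip(), flush=True)
        elif codes:
            print("error:", ", ".join(codes), flush=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "-":  # read stdin, e.g. piped from tail -f
        watch(sys.stdin)
    else:  # otherwise treat the argument as the path of an alert log
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            watch(f)
```

For example: tail -f /u01/app/oracle/diag/rdbms/beijing/BJ/trace/alert_BJ.log | python3 watch_alert.py - (the script name watch_alert.py is hypothetical). Plain substring matching is enough for the fixed LGWR messages; a regular expression is only needed for the numbered ORA-/TNS- codes.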

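At this point, a simulated business operation on the primary hangs and does not complete:
create table scott.test2 as select * from scott.emp;
The alert log keeps reporting that the primary cannot communicate with the physical standby, and the primary is eventually shut down.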

The primary's alert log shows that all standby destinations are unreachable and that the instance must be shut down to protect the primary:


Fatal NI connect error 12543, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=shanghai.lxh.net)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=shanghai)(CID=(PROGRAM=oracle)(HOST=beijing.lxh.net)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 17-JUL-2014 12:16:19
Tracing not turned on.
Tns error struct:
ns main err code: 12543
TNS-12543: TNS:destination host unreachable
ns secondary err code: 12560
nt main err code: 513
TNS-00513: Destination host unreachable
nt secondary err code: 113
nt OS err code: 0
Error 12543 received logging on to the standby
Thu Jul 17 12:16:20 2014
LGWR: Error 12543 attaching to RFS for reconnect
Error 16198 for archive log file 1 to 'sh'
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: All standby destinations have failed
******************************************************
WARNING: All standby database destinations have failed
WARNING: Instance shutdown required to protect primary
******************************************************
LGWR (ospid: 4337): terminating the instance due to error 16098
Thu Jul 17 12:16:20 2014
System state dump requested by (instance=1, osid=4337 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/beijing/BJ/trace/BJ_diag_4327.trc
Dumping diagnostic data in directory=[cdmp_20140717121620], requested by (instance=1, osid=4337 (LGWR)), summary=[abnormal instance termination].
Instance terminated by LGWR, pid = 4337
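The two alert log excerpts also show how long the primary kept retrying before it gave up: the first ORA-16198 timeout is stamped 12:11:04 and LGWR terminates the instance at 12:16:20. The short Python sketch below only does that arithmetic, using the alert log's timestamp format; it measures this single run and says nothing about which timeout settings produced the interval. The function name seconds_between() is hypothetical.

```python
from datetime import datetime

# Alert log timestamp format, e.g. "Thu Jul 17 12:11:04 2014" (English day/month names).
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two alert log timestamp lines."""
    t0 = datetime.strptime(start.strip(), ALERT_TS_FORMAT)
    t1 = datetime.strptime(end.strip(), ALERT_TS_FORMAT)
    return (t1 - t0).total_seconds()


first_error = "Thu Jul 17 12:11:04 2014"  # first ORA-16198 timeout
terminated = "Thu Jul 17 12:16:20 2014"   # LGWR terminates the instance
elapsed = seconds_between(first_error, terminated)
print(f"{elapsed:.0f} s ({int(elapsed // 60)} min {int(elapsed % 60)} s)")
# prints "316 s (5 min 16 s)"
```

In this run, the create table statement on the primary stayed blocked for roughly that window before the instance went down.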

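【Summary】
This experiment simulated a physical standby crash, or a network failure, that leaves every physical standby unavailable while Oracle DataGuard runs in Maximum Protection mode. Once the standby became unreachable, the primary was eventually shut down.
Maximum Protection therefore guarantees zero data loss, but it places very high demands on the network between the primary and its standbys and on the stability of the standbys themselves.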

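In its description of Maximum Protection mode, the Oracle 11g documentation also recommends using at least two standby databases, so that the failure of a single standby does not shut down the primary.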
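The reasoning behind that recommendation can be put into numbers with a deliberately simple model. Assume, purely for illustration, that each standby is unreachable with the same probability p and that standbys fail independently of each other (a shared network path or a common site would break this assumption). The primary is then forced down only when all n standbys are unreachable at once, which happens with probability p^n. The value p = 0.01 below is hypothetical, not a measurement from this experiment, and the function name all_standbys_down() is likewise illustrative.

```python
def all_standbys_down(p: float, n: int) -> float:
    """Probability that all n standbys are unreachable at the same moment.

    Assumes each standby is unreachable with probability p, independently.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1:
        raise ValueError("Maximum Protection needs at least one standby")
    return p ** n


# Hypothetical p = 0.01: a given standby is unreachable 1% of the time.
for n in (1, 2, 3):
    print(n, all_standbys_down(0.01, n))
```

Under these assumptions each extra independent standby multiplies the shutdown probability by p, which is why a second standby on a separate network path reduces exposure to the single-standby failure reproduced here.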

From the ITPUB blog, link: http://blog.itpub.net/29475508/viewspace-1244474/. Please credit the source when reposting; otherwise legal action may be taken.
