Reboot-less node fencing in Oracle Clusterware 11g Release 2

Posted by 花菜土豆粉 on 2016-07-31

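During a RAC high-availability test, I unplugged the network cable from the private NIC. I expected the widely described behavior in which CRS forcibly reboots one of the nodes, but that did not happen. Instead, the ASM instance and the RDBMS instance on node2 were shut down. When the cable was plugged back in, the ASM and RDBMS instances on node2 restarted automatically.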

Searching Google for this behavior, I found that starting with version 11.2.0.2, Oracle introduced a feature called reboot-less node fencing, that is, fencing a node without rebooting it.

The following is Oracle's official description of reboot-less node fencing:

As mentioned, Oracle Clusterware uses a fencing algorithm comparable to STONITH (Shoot The Other Node In The Head) to ensure data integrity in cases in which cluster integrity is endangered and split-brain scenarios need to be prevented. In the case of Oracle Clusterware, this means that a local process enforces the removal of one or more nodes from the cluster (fencing).
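To make the STONITH idea concrete, here is a minimal Python sketch of how a cluster might decide which nodes to fence once a broken interconnect splits it into sub-clusters. The survivor rule it encodes (the largest sub-cluster survives; on a tie, the sub-cluster containing the lowest node number survives) is the commonly cited 11g Release 2 behavior and is an assumption here, not something stated in the quoted text. `choose_survivors` and `nodes_to_fence` are hypothetical names.

```python
from typing import List, Set


def choose_survivors(subclusters: List[Set[int]]) -> Set[int]:
    """Return the sub-cluster that keeps running after a split-brain.

    Assumed rule: the sub-cluster with the most nodes wins; on a tie,
    the sub-cluster that contains the lowest node number wins.
    """
    # Larger size ranks first; for equal sizes, a smaller minimum node
    # number gives a larger (less negative) second key.
    return max(subclusters, key=lambda nodes: (len(nodes), -min(nodes)))


def nodes_to_fence(subclusters: List[Set[int]]) -> Set[int]:
    """Every node outside the surviving sub-cluster must be fenced."""
    survivors = choose_survivors(subclusters)
    all_nodes: Set[int] = set().union(*subclusters)
    return all_nodes - survivors
```

For the two-node test described at the top of this post, `nodes_to_fence([{1}, {2}])` returns `{2}`, which is consistent with node2 being the node whose instances were shut down.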

Until Oracle Clusterware 11g Release 2, Patch Set One (11.2.0.2), the fencing of a node was performed by a “fast reboot” of the respective server. A “fast reboot” in this context refers to a shutdown and restart procedure that does not wait for any IO to finish or for file systems to synchronize on shutdown. With Oracle Clusterware 11g Release 2, Patch Set One (11.2.0.2), this mechanism has been changed in order to avoid such a reboot as much as possible.

Already with Oracle Clusterware 11g Release 2, this algorithm was improved so that failures of certain Oracle RAC-required subcomponents in the cluster do not necessarily cause an immediate fencing (reboot) of a node. Instead, an attempt is made to clean up the failure within the cluster and to restart the failed subcomponent. Only if the cleanup of the failed component appears to be unsuccessful is a node reboot performed in order to force a cleanup.
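The escalation described in this paragraph can be sketched as follows. `recover_subcomponent`, `cleanup`, and `restart` are hypothetical names; the two callbacks stand in for Clusterware's internal recovery steps, and the sketch only shows the order of the decisions, not how Clusterware actually carries them out.

```python
from typing import Callable


def recover_subcomponent(cleanup: Callable[[], bool],
                         restart: Callable[[], None]) -> str:
    """Recover a failed RAC-required subcomponent, rebooting only as a last resort."""
    # Step 1: try to clean up the failure within the cluster.
    if cleanup():
        # Step 2: cleanup worked, so only the failed subcomponent is
        # restarted; the node itself is not fenced.
        restart()
        return "restarted"
    # Cleanup appears unsuccessful: reboot the node to force a cleanup.
    return "reboot"
```

The key design point is that the reboot sits on the failure path of the cleanup, not on the failure of the subcomponent itself.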

With Oracle Clusterware 11g Release 2, Patch Set One (11.2.0.2), further improvements were made so that Oracle Clusterware tries to prevent a split-brain without rebooting the node. This implements a long-standing request from customers who wanted to preserve the node and avoid a reboot, because the node runs applications not managed by Oracle Clusterware, which would otherwise be forcibly shut down by the reboot.

With the new algorithm, when a decision is made to evict a node from the cluster, Oracle Clusterware first attempts to shut down all resources on the machine chosen for eviction. In particular, IO-generating processes are killed, and Clusterware ensures that those processes have completely stopped before continuing. If, for some reason, not all resources can be stopped or IO-generating processes cannot be stopped completely, Oracle Clusterware still performs a reboot or uses IPMI to forcibly evict the node from the cluster.
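A minimal sketch of the eviction decision in this paragraph, assuming the outcome of the shutdown attempt is already known. `NodeState` and `fencing_action` are hypothetical names, and choosing IPMI only when it is configured (and a reboot otherwise) is an assumption; the quoted text says only that Clusterware will reboot the node or use IPMI.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    # Hypothetical summary of the attempt to shut down the node that
    # was chosen for eviction.
    all_resources_stopped: bool
    io_processes_fully_stopped: bool
    ipmi_configured: bool  # assumption: IPMI is used only when configured


def fencing_action(state: NodeState) -> str:
    """Pick how to fence a node that has been chosen for eviction."""
    if state.all_resources_stopped and state.io_processes_fully_stopped:
        # Nothing on this node can still issue IO, so it can be fenced
        # without a reboot: Clusterware stops its own stack.
        return "stop-stack"
    # Some resource or IO-generating process survived: fall back to
    # hard fencing.
    return "ipmi" if state.ipmi_configured else "reboot"
```

Only the first branch is reboot-less; both fallbacks end with the node forcibly removed.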

If all resources can be stopped and all IO-generating processes can be killed, Oracle Clusterware shuts itself down on the respective node, but attempts to restart once the stack has been stopped. The restart is initiated by the Oracle High Availability Services Daemon (OHASD), which was introduced with Oracle Clusterware 11g Release 2.
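The restart side can be sketched as a retry loop run by OHASD on the fenced node. In this hypothetical model, `stack_can_start` returns True only once the cause of the eviction (for example, the private interconnect in the test at the top of this post) is healthy again. `ohasd_restart`, the retry count, and the delay are illustrative assumptions, not OHASD's actual interface or settings.

```python
import time
from typing import Callable


def ohasd_restart(stack_can_start: Callable[[], bool],
                  max_attempts: int = 10,
                  retry_delay_s: float = 0.0) -> int:
    """Retry starting the stopped Clusterware stack.

    Returns the attempt number on which the stack came back, or 0 if
    it never did within max_attempts (hypothetical retry policy).
    """
    for attempt in range(1, max_attempts + 1):
        if stack_can_start():
            return attempt
        # Wait before the next attempt; the delay is illustrative.
        time.sleep(retry_delay_s)
    return 0
```

With `states = iter([False, False, True])`, `ohasd_restart(lambda: next(states))` returns 3, mirroring the observation that the instances on node2 came back only after the cable was plugged back in.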
