Oracle cluster heartbeats and their parameters misscount/disktimeout/reboottime


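In Oracle RAC, cluster health is checked at several levels and by several different mechanisms: heartbeats combined with a voting algorithm are used to isolate faults. If a node is detected as failed, it is evicted from the cluster so that it cannot corrupt data. This article describes the heartbeat mechanisms in Oracle RAC and how to adjust the heartbeat parameters.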


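    OCSSD is the Linux or Unix process that manages and provides Cluster Synchronization Services (CSS). It runs as the oracle user and provides node membership management; if the process fails, the node reboots. CSS provides two heartbeat mechanisms: a network heartbeat and a disk heartbeat. Each has a maximum allowed delay. For the network heartbeat the delay is called MC (misscount); for the disk heartbeat it is called IOT (I/O timeout).

Both parameters are expressed in seconds. By default, misscount < disktimeout.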



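    The network heartbeat, as the name implies, checks the state of the nodes over the private network. If a hardware or software problem on the private network prevents the cluster nodes from communicating over it for a certain period, the result is split brain. Because the storage in a cluster is shared, the failed node must then be isolated from the cluster to avoid data corruption. The network heartbeat works in detail as follows: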
    Every second, a sending thread in ocssd.bin sends a network TCP heartbeat to itself and to all other nodes. The receiving thread of ocssd.bin receives the heartbeat.
    If a network packet is dropped or arrives with an error, the TCP error correction mechanism retransmits the packet.
    Oracle itself does not retransmit. In ocssd.log you will see a WARNING message about a missing heartbeat if a node does not receive a heartbeat from another node for 15 seconds (50% of misscount). Another warning is reported in ocssd.log if the same node is still missing after 22 seconds (75% of misscount), and another after 27 seconds (90% of misscount). When the heartbeat has been missing for 100% of misscount, that is 30 seconds, the node is evicted.
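
    The warning times quoted above are simply percentages of misscount. A minimal sketch of that arithmetic in POSIX shell follows; the function name heartbeat_thresholds is hypothetical (it is not part of any Oracle tool), and integer division is assumed, which matches the 22-second figure given for 75% of 30.

```shell
#!/bin/sh
# Hypothetical helper: print the seconds at which the 50%, 75% and 90%
# warnings and the 100% eviction would occur for a given misscount.
# Integer division is used, so 75% of 30 is reported as 22.
heartbeat_thresholds() {
    misscount=$1
    echo "$(( misscount * 50 / 100 )) $(( misscount * 75 / 100 )) $(( misscount * 90 / 100 )) $misscount"
}

heartbeat_thresholds 30   # prints: 15 22 27 30
```

    With a different misscount the warnings shift proportionally; for example, a misscount of 60 gives 30, 45, 54 and 60 seconds.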
   The maximum delay of this network heartbeat is called misscount. It can be queried and changed with the crsctl tool.
   [grid@Linux-01 ~]$ crsctl get css misscount
   CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
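
   When the value is needed in a script, it can be extracted from the success line. The sketch below is only an illustration: the function name css_value is hypothetical, and it assumes that the output always has the form of the single CRS-4678 line shown above.

```shell
#!/bin/sh
# Hypothetical helper: extract the numeric value from a
# "crsctl get css <name>" success line read on standard input.
# Prints nothing if no matching CRS-4678 line is found.
css_value() {
    sed -n 's/^ *CRS-4678: Successful get [a-z]* \([0-9][0-9]*\) for .*$/\1/p'
}

# Example using the output line shown above (no cluster required):
echo "CRS-4678: Successful get misscount 30 for Cluster Synchronization Services." | css_value
```

   On a cluster node the same function could be fed directly from the tool, for example crsctl get css misscount | css_value.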

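   How is the failed node identified? Oracle decides this with a voting algorithm. The following example description of the algorithm is based on the book 大話Oracle RAC.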


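In this situation one partition must be evicted to keep the cluster running healthily. Take a cluster of three nodes A, B, and C: after A's heartbeat fails, B and C form one partition with 2 votes, while A has only 1 vote.

By the voting algorithm, the partition formed by B and C takes control of the cluster and A is evicted. With only 2 nodes, the voting algorithm fails, because each node holds just 1 vote.

A third device then has to be introduced: the quorum device. The quorum device is usually a shared disk, also called the quorum disk, and it counts as one vote. When the heartbeat between the 2 nodes fails, both nodes race for the quorum disk's vote at the same time, and the request that arrives first is satisfied first.

The node that obtains the quorum disk first therefore holds 2 votes, and the other node is evicted.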
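
The majority rule behind these examples can be sketched in a few lines of POSIX shell. This is a simplified illustration of the description above, not Oracle's actual implementation; the function name partition_survives is hypothetical, and a partition is assumed to survive only when it holds strictly more than half of all votes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a partition holds a strict majority.
# $1 = votes held by the partition, $2 = total votes in the cluster.
partition_survives() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Three nodes A, B, C: {B, C} holds 2 of 3 votes, {A} holds 1 of 3.
partition_survives 2 3 && echo "B+C: survives"
partition_survives 1 3 || echo "A: evicted"

# Two nodes plus a quorum disk (3 votes in total): the node that wins
# the quorum disk holds 2 votes, the other node holds 1.
partition_survives 2 3 && echo "winner of quorum disk: survives"
partition_survives 1 3 || echo "other node: evicted"

# Two nodes without a quorum disk: 1 of 2 votes each, so neither side wins.
partition_survives 1 2 || echo "no quorum disk: no majority on either side"
```

The last case shows why a two-node cluster needs the extra vote: without it, neither node can reach a majority.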




   A thread in ocssd.bin updates the voting disks every second.
   If a node does not update the voting disks for 200 seconds, it is evicted.
   However, ocssd.bin on the local node has logic to bring the node down if it gets I/O errors on the majority of the voting disks. Also, a CRS reconfiguration takes place when the missed network heartbeat reaches 27 seconds, and the local node is rebooted. As a result, you rarely see an eviction due to voting disk failure (this varies by release), because ocssd.bin aborts the local node before it gets evicted by another node if writing to the voting disk is the problem.
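
   The order of these two checks can be sketched as follows. This is only an illustration under stated assumptions: the function name disk_heartbeat_verdict is hypothetical, and the local abort is modelled as "the node can no longer access more than half of the voting disks", which is one reading of the paragraph above rather than a confirmed description of ocssd.bin internals.

```shell
#!/bin/sh
# Hypothetical helper that classifies a node's disk heartbeat state.
# $1 = seconds since this node last updated the voting disks
# $2 = disktimeout in seconds
# $3 = voting disks this node can still access, $4 = total voting disks
disk_heartbeat_verdict() {
    if [ $(( $3 * 2 )) -le "$4" ]; then
        echo "local-abort"      # assumed: the node aborts itself first
    elif [ "$1" -ge "$2" ]; then
        echo "evicted"          # silent for disktimeout seconds or longer
    else
        echo "ok"
    fi
}

disk_heartbeat_verdict 1 200 3 3     # prints: ok
disk_heartbeat_verdict 5 200 1 3     # prints: local-abort
disk_heartbeat_verdict 200 200 3 3   # prints: evicted
```

   Because the local check comes first, a node with failing voting disk I/O reports local-abort long before the 200-second limit is reached.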




    reboottime: default 3 seconds. This is the amount of time allowed for a node to complete a reboot
    after the CSS daemon has been evicted.
    crsctl get css reboottime


  1) Procedure for versions before 11gR2
    a) Shut down CRS on all but one node. For exact steps use note 309542.1
    b) Execute crsctl as root to modify the misscount:
       $CRS_HOME/bin/crsctl set css misscount <n>    #### where <n> is the maximum private network latency in seconds
       $CRS_HOME/bin/crsctl set css reboottime <r> [-force]  #### (<r> is seconds)
       $CRS_HOME/bin/crsctl set css disktimeout <d> [-force] #### (<d> is seconds)
    c) Reboot the node where adjustment was made
    d) Start all other nodes that were shut down in step a
    e) Execute crsctl as root to confirm the change:
       $CRS_HOME/bin/crsctl get css misscount
       $CRS_HOME/bin/crsctl get css reboottime 
       $CRS_HOME/bin/crsctl get css disktimeout


  2) Procedure for 11gR2
     With 11gR2, these settings can be changed online without taking any node down:

    a) Execute crsctl as root to modify the misscount:
       $CRS_HOME/bin/crsctl set css misscount <n>    #### where <n> is the maximum private network latency in seconds
       $CRS_HOME/bin/crsctl set css reboottime <r> [-force]  #### (<r> is seconds)
       $CRS_HOME/bin/crsctl set css disktimeout <d> [-force] #### (<d> is seconds)
    b) Execute crsctl as root to confirm the change:
       $CRS_HOME/bin/crsctl get css misscount
       $CRS_HOME/bin/crsctl get css reboottime 
       $CRS_HOME/bin/crsctl get css disktimeout
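
     After confirming the values, it may be worth checking that they still keep the default ordering misscount < disktimeout. The sketch below is a hedged example: the function name check_css_order is hypothetical, and the disktimeout output line is assumed to follow the same CRS-4678 format as the misscount line shown earlier in this article.

```shell
#!/bin/sh
# Hypothetical helper: read "crsctl get css ..." success lines on standard
# input and check that misscount is smaller than disktimeout.
# Exit status: 0 = ordering holds, 1 = ordering violated, 2 = value missing.
check_css_order() {
    awk '
        $1 == "CRS-4678:" && $3 == "get" { value[$4] = $5 }
        END {
            if (!("misscount" in value) || !("disktimeout" in value)) exit 2
            if (value["misscount"] + 0 < value["disktimeout"] + 0) exit 0
            exit 1
        }'
}

# Example with sample lines (the disktimeout line format is an assumption):
printf '%s\n' \
    "CRS-4678: Successful get misscount 30 for Cluster Synchronization Services." \
    "CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services." |
    check_css_order && echo "misscount < disktimeout"
```

     On a cluster node, the output of the two crsctl get css commands from step b could be piped into check_css_order in place of the sample lines.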

