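A previous article introduced the Redis replication master-slave high-availability architecture.
Building on that, this article discusses Sentinel, the technology for automatic failover in a Redis replication setup.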

Main reference: the official documentation:

Note: master-slave replication must already be set up before you configure Sentinel.
You can follow this article to set up replication: http://blog.itpub.net/25583515/viewspace-2644438/

I. Installation and configuration
Add one Sentinel to an environment with 1 master and 1 slave.
Go to the Redis source directory and copy the sentinel.conf file:
# cd /u01/packages/redis-3.0.6
# cp sentinel.conf /usr/local/redis/etc/
# vi /usr/local/redis/etc/sentinel.conf
daemonize yes
sentinel monitor mymaster 127.0.0.1 6379 1
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000

Parameter explanations:
sentinel monitor mymaster 127.0.0.1 6379 1
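Tells Sentinel to monitor a master named mymaster. The name can be anything you like.
The master's IP is 127.0.0.1 and its port is 6379.
The final 1 is the quorum: at least 1 Sentinel must agree before this master is judged to have failed. If fewer Sentinels agree, automatic failover is not performed.
Note that no matter how many Sentinels you require to agree that a server has failed, a Sentinel still needs the support of a majority of the Sentinels in the system before it can start an automatic failover.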
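The two thresholds above are easier to see in code, so here is a small Python sketch. It assumes the behavior described in the official Sentinel documentation. The quorum only decides when the master counts as objectively down (ODOWN). The Sentinel that wants to run the failover must then collect votes from a majority of all Sentinels, or from `quorum` Sentinels if that number is larger. The function names are hypothetical.

```python
# Sketch only: models the two thresholds described above, assuming the rule
# from the official Sentinel docs that a failover leader needs
# max(quorum, majority) votes. Function names are hypothetical.


def is_objectively_down(sdown_reports: int, quorum: int) -> bool:
    """The master counts as objectively down (ODOWN) once at least `quorum`
    Sentinels (this one included) report it as down."""
    return sdown_reports >= quorum


def votes_needed_for_failover(total_sentinels: int, quorum: int) -> int:
    """Votes a Sentinel must collect to be authorized as failover leader:
    a strict majority of all Sentinels, or the quorum if that is larger."""
    majority = total_sentinels // 2 + 1
    return max(majority, quorum)


if __name__ == "__main__":
    # Single-Sentinel setup from this article: quorum 1, one Sentinel.
    print(is_objectively_down(1, quorum=1), votes_needed_for_failover(1, quorum=1))  # True 1
    # Five Sentinels with quorum 2: two reports are enough for ODOWN,
    # but three votes are still needed to start the failover.
    print(is_objectively_down(2, quorum=2), votes_needed_for_failover(5, quorum=2))  # True 3
```

With the single Sentinel and quorum 1 used in this article, one vote is both the quorum and the majority. That is why this setup can fail over, but it also gives no protection against a single Sentinel's mistaken view.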

sentinel down-after-milliseconds mymaster 30000
down-after-milliseconds sets how many milliseconds the master may go without a valid reply to PING before Sentinel considers it down (subjectively down).
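Here is a minimal sketch of the subjective-down check that down-after-milliseconds drives. It assumes the master is flagged once it has gone longer than this period without a valid PING reply. Timestamps are plain millisecond integers, and `is_subjectively_down` is a hypothetical name.

```python
# Sketch: subjective-down (SDOWN) check driven by down-after-milliseconds.
# Assumption: an instance is flagged SDOWN when it has not given a valid
# PING reply for longer than down-after-milliseconds. Times are millisecond
# integers; the function name is hypothetical.

DOWN_AFTER_MS = 30000  # matches `sentinel down-after-milliseconds mymaster 30000`


def is_subjectively_down(last_valid_reply_ms: int, now_ms: int,
                         down_after_ms: int = DOWN_AFTER_MS) -> bool:
    """Return True if no valid PING reply arrived within down_after_ms."""
    return now_ms - last_valid_reply_ms > down_after_ms


if __name__ == "__main__":
    print(is_subjectively_down(last_valid_reply_ms=0, now_ms=29_000))  # False
    print(is_subjectively_down(last_valid_reply_ms=0, now_ms=31_000))  # True
```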

sentinel parallel-syncs mymaster 1
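parallel-syncs sets the maximum number of slaves that can resynchronize with the new master at the same time during a failover. The smaller this number, the longer the failover takes to complete.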
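The trade-off can be pictured with a simplified model that assumes every resync takes roughly the same time. Slaves are repointed in rounds of at most parallel-syncs, so a smaller value means more rounds. This is an approximation: the real Sentinel starts another slave as soon as one resync finishes instead of waiting for a whole round. The function and slave names are hypothetical.

```python
# Simplified model of parallel-syncs: slaves are pointed at the new master
# in rounds of at most `parallel_syncs`. Assumption: every resync takes
# about the same time, so total time grows with the number of rounds.
# Function and slave names are hypothetical.

from math import ceil
from typing import List


def resync_rounds(slaves: List[str], parallel_syncs: int) -> List[List[str]]:
    """Split the slaves into consecutive groups of size parallel_syncs."""
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be >= 1")
    return [slaves[i:i + parallel_syncs]
            for i in range(0, len(slaves), parallel_syncs)]


if __name__ == "__main__":
    slaves = ["slave-a", "slave-b", "slave-c", "slave-d", "slave-e"]
    for k in (1, 2, 5):
        rounds = resync_rounds(slaves, k)
        assert len(rounds) == ceil(len(slaves) / k)
        print(k, len(rounds))  # 1 5, then 2 3, then 5 1
```

A larger value finishes sooner. The cost is that more slaves are resyncing at the same moment, and they may briefly be unable to serve reads while they load data from the new master.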

sentinel failover-timeout mymaster 180000
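failover-timeout sets the time allowed for a failover, in milliseconds. If the failover takes longer than this, it is considered to have failed. The default is 3 minutes (180000 ms).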

Start Sentinel
# redis-sentinel /usr/local/redis/etc/sentinel.conf

Check the status
# redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=mymaster,status=ok,address=127.0.0.1:6379,slaves=1,sentinels=1
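In a monitoring script you might want this output as structured data. The Python sketch below parses the text shown above. It assumes the output keeps its `key:value` line layout and that each `masterN` line holds comma-separated `field=value` pairs. `parse_info_sentinel` is a hypothetical helper, not part of Redis.

```python
# Sketch: turn the `info sentinel` text shown above into Python dicts.
# Assumption: the output keeps the "key:value" line layout, and masterN
# lines carry comma-separated "field=value" pairs. Hypothetical helper.

from typing import Any, Dict

SAMPLE = """# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=mymaster,status=ok,address=127.0.0.1:6379,slaves=1,sentinels=1
"""


def parse_info_sentinel(text: str) -> Dict[str, Any]:
    info: Dict[str, Any] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")  # split on the first colon only
        if key.startswith("master") and "=" in value:
            # "name=mymaster,status=ok,..." -> {"name": "mymaster", ...}
            info[key] = dict(pair.split("=", 1) for pair in value.split(","))
        else:
            info[key] = value
    return info


if __name__ == "__main__":
    m = parse_info_sentinel(SAMPLE)["master0"]
    print(m["name"], m["status"], m["address"])  # mymaster ok 127.0.0.1:6379
```

To use it on a live instance, you could feed it the text printed by `redis-cli -p 26379 info sentinel`.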

At this point, the Sentinel configuration is complete.

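Sentinel-related commands
• INFO: returns basic status information about the Sentinel
• PING: returns PONG
• SENTINEL masters: lists all monitored masters and their current state
• SENTINEL slaves <master name>: lists all slaves of the given master and their current state
• SENTINEL get-master-addr-by-name <master name>: returns the IP address and port of the master with the given name. If a failover is in progress for this master, or has already completed, it returns the IP address and port of the new master.
• SENTINEL reset <pattern>: resets every master whose name matches the Glob-style pattern. A reset clears all current state for the master, including any failover in progress, and removes every slave and Sentinel that was discovered and associated with it.
• SENTINEL failover <master name>: forces a failover as if the master were unreachable, without asking the other Sentinels for agreement. The Sentinel that starts the failover still sends the new configuration to the other Sentinels, and they update their own configurations to match.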
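Before running SENTINEL reset, you might want to check which masters a pattern would hit. This sketch uses Python's `fnmatch.fnmatchcase`, which only approximates Redis' own Glob matching. It supports `*`, `?` and `[...]`, but negated classes are written `[!...]` rather than `[^...]`. The master names are hypothetical.

```python
# Sketch: preview which monitored master names a SENTINEL reset <pattern>
# would touch. Assumption: fnmatch.fnmatchcase approximates Redis'
# Glob-style matching; it is not the code Redis runs. Names are hypothetical.

from fnmatch import fnmatchcase
from typing import List


def masters_matching(pattern: str, master_names: List[str]) -> List[str]:
    """Return the master names that the Glob pattern matches (case-sensitive)."""
    return [name for name in master_names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    names = ["mymaster", "mymaster-2", "cache-master"]
    print(masters_matching("mymaster*", names))  # ['mymaster', 'mymaster-2']
    print(masters_matching("*", names))          # all three names
```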

II. How it works
A failover has to solve two problems: first, electing a leader Sentinel; second, selecting the new master.

1. Rules for electing the leader Sentinel

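For automatic failover, Sentinel elects a leader Sentinel with an algorithm based on Raft's leader election. This ensures that only one leader is produced in a given epoch.
In other words, two Sentinels can never both be elected leader in the same epoch, and each Sentinel votes for only one leader per epoch.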

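Note: the core idea of Raft is that, within a single term (epoch), votes reach agreement by majority rule to elect a leader.
The algorithm is not covered in detail here. For more, see: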
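The "one vote per epoch" rule is what prevents two leaders, and a small simulation shows why. In this Python sketch, each voter grants at most one vote per epoch to whoever asks first. A candidate wins only if it gathers `votes_needed` votes. The class and function names are hypothetical. Networking, timeouts and Sentinel's actual wire protocol are left out, so this is a model of the idea rather than Sentinel's code.

```python
# Sketch of epoch-based voting in the spirit of Raft's leader election:
# each voter grants at most one vote per epoch (first request wins), and a
# candidate wins only with enough votes. Names are hypothetical; networking,
# timeouts and the Sentinel protocol are omitted.

from typing import Dict, Optional


class Voter:
    def __init__(self, name: str) -> None:
        self.name = name
        self.voted_epoch = 0                # last epoch in which we voted
        self.voted_for: Optional[str] = None

    def request_vote(self, candidate: str, epoch: int) -> bool:
        """Grant the vote only if we have not voted in this epoch yet."""
        if epoch > self.voted_epoch:
            self.voted_epoch = epoch
            self.voted_for = candidate
        return self.voted_epoch == epoch and self.voted_for == candidate


def run_election(voters: Dict[str, Voter], candidate: str, epoch: int,
                 votes_needed: int) -> bool:
    """Ask every voter; the candidate wins if enough votes were granted."""
    granted = sum(v.request_vote(candidate, epoch) for v in voters.values())
    return granted >= votes_needed


if __name__ == "__main__":
    sentinels = {n: Voter(n) for n in ("s1", "s2", "s3")}
    # s1 and s2 both notice the failure and each votes for itself in epoch 1.
    sentinels["s1"].request_vote("s1", 1)
    sentinels["s2"].request_vote("s2", 1)
    # s1's request reaches s3 first, so s3's single epoch-1 vote goes to s1.
    print(run_election(sentinels, "s1", epoch=1, votes_needed=2))  # True
    print(run_election(sentinels, "s2", epoch=1, votes_needed=2))  # False
```

Because s3 has already spent its epoch-1 vote on s1, s2 can get at most one vote in that epoch, so at most one leader emerges. s2 could only try again in a higher epoch.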

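2. Rules for selecting the new master
1> Among the slaves of the failed master, any slave that is marked subjectively down, is disconnected, or last replied to PING more than five seconds ago is eliminated.
2> Among the slaves of the failed master, any slave that has been disconnected from the failed master for longer than ten times the down-after-milliseconds value is eliminated.
3> From the slaves that survive these two rounds of elimination, the one with the largest replication offset becomes the new master. If the replication offset is unavailable, or the slaves' offsets are equal, the slave with the smallest run ID becomes the new master.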
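Here are the three rules written as a filter followed by a sort, as a Python sketch. The fields of `SlaveInfo` are hypothetical stand-ins for what Sentinel tracks, and all times are in milliseconds. Two further assumptions apply. Unknown offsets are ranked below any known offset. Redis' slave-priority setting, which real Sentinel also checks, is left out because the rules above do not mention it.

```python
# Sketch of the three selection rules listed above. Assumptions: SlaveInfo
# fields are hypothetical stand-ins for what Sentinel tracks, times are in
# milliseconds, unknown offsets rank lowest, and slave-priority is omitted.

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SlaveInfo:
    run_id: str
    sdown: bool                     # marked subjectively down
    disconnected: bool              # Sentinel's link to this slave is down
    ms_since_ping_reply: int        # time since the last PING reply
    ms_since_master_link_down: int  # how long it has been cut off from the master
    repl_offset: Optional[int]      # None if the offset is unknown


def _rank(s: SlaveInfo) -> Tuple[int, str]:
    # Largest offset first (hence the minus sign), then smallest run ID.
    offset = s.repl_offset if s.repl_offset is not None else -1
    return (-offset, s.run_id)


def select_new_master(slaves: List[SlaveInfo],
                      down_after_ms: int) -> Optional[SlaveInfo]:
    candidates = [
        s for s in slaves
        # Rule 1: drop SDOWN, disconnected, or slow-to-answer (> 5 s) slaves.
        if not s.sdown and not s.disconnected and s.ms_since_ping_reply <= 5000
        # Rule 2: drop slaves cut off from the old master for > 10x down-after.
        and s.ms_since_master_link_down <= 10 * down_after_ms
    ]
    # Rule 3: best replication offset, ties broken by the smallest run ID.
    return min(candidates, key=_rank) if candidates else None


if __name__ == "__main__":
    slaves = [
        SlaveInfo("bbb", False, False, 100, 0, 5000),
        SlaveInfo("aaa", False, False, 100, 0, 5000),
        SlaveInfo("ccc", False, False, 9000, 0, 9999),  # too slow to answer PING
    ]
    print(select_new_master(slaves, down_after_ms=30000).run_id)  # aaa
```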

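Steps of a single failover:
1> Detect that the master has entered the objectively down state.
2> Increment the current epoch and try to be elected leader in that epoch.
3> If the election fails, retry after twice the configured failover timeout. If it succeeds, carry out the following steps.
4> Select a slave to promote to master.
5> Send the SLAVEOF NO ONE command to the selected slave so that it becomes a master.
6> Use publish/subscribe to propagate the updated configuration to all other Sentinels, which then update their own configurations.
7> Send the SLAVEOF host port command to the remaining slaves of the failed master so that they replicate the new master.
8> Once all slaves have started replicating the new master, the leader Sentinel ends the failover.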
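The steps above can be strung together in a plain simulation. In this Python sketch, `try_elect` stands in for the epoch vote. Each returned string describes an action the leader Sentinel would take; none of them is a literal command sent on the wire. The names `run_failover` and `max_attempts`, and the slave addresses, are hypothetical. In particular, `max_attempts` is an assumption added so the sketch terminates, since a real Sentinel keeps retrying while the master stays objectively down.

```python
# Sketch of the failover steps listed above, as a plain simulation.
# Assumptions: try_elect() stands in for the epoch vote, returned strings
# describe actions rather than wire commands, and run_failover,
# max_attempts and the addresses are hypothetical.

from typing import Callable, List, Tuple

Addr = Tuple[str, int]


def run_failover(epoch: int, failover_timeout_ms: int,
                 try_elect: Callable[[int], bool],
                 chosen: Addr, other_slaves: List[Addr],
                 max_attempts: int = 3) -> Tuple[int, int, List[str]]:
    """Return (final_epoch, ms_waited_before_winning, actions)."""
    waited = 0
    for _ in range(max_attempts):
        epoch += 1                          # step 2: bump the epoch, stand for election
        if try_elect(epoch):
            break
        waited += 2 * failover_timeout_ms   # step 3: retry after 2x failover-timeout
    else:
        return epoch, waited, []            # never elected: no failover
    host, port = chosen
    # Promote the chosen slave, then tell the other Sentinels.
    actions = [f"send SLAVEOF NO ONE to {host}:{port}",
               f"broadcast new config (epoch {epoch}, master {host}:{port})"]
    # Point every remaining slave at the new master.
    actions += [f"send SLAVEOF {host} {port} to {h}:{p}" for h, p in other_slaves]
    actions.append("all slaves repointed: end failover")
    return epoch, waited, actions


if __name__ == "__main__":
    outcomes = iter([False, True])  # lose the first election, win the second
    epoch, waited, actions = run_failover(
        epoch=0, failover_timeout_ms=180000,
        try_elect=lambda e: next(outcomes),
        chosen=("127.0.0.1", 6380), other_slaves=[("127.0.0.1", 6381)])
    print(epoch, waited)  # 2 360000
    for a in actions:
        print(a)
```

With the failover-timeout of 180000 from the configuration above, one lost election delays the next attempt by 360000 ms (6 minutes).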

That covers the configuration and working principles of Redis Sentinel. Hopefully it is easy to follow.
