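Redis Sentinel High Availability: Implementation Notes

Posted by jyzhou on 2016-06-20

Background:

     An earlier article, "Redis Replication and Sentinel: Setup and Principles", covered how Sentinel works, how it is implemented, and the basics of setting it up. This article walks through building a Redis Sentinel deployment in detail.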

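Installation:

     This section covers building from source. The example system is Ubuntu 16.04 and the Redis version is 3.2.0. The steps are:

  • Download the source tarball: wget http://download.redis.io/releases/redis-3.2.0.tar.gz
  • Install the dependencies: sudo apt-get install gcc tcl
  • Extract and build: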
    #tar zxvf redis-3.2.0.tar.gz
    ...
    ...
    #make
    ...
    Hint: It's a good idea to run 'make test' ;)
    #make test
    ...
    \o/ All tests passed without errors!
    ...
    #make install

    Note: make test may well fail at this step with the following error:

    [err]: Test replication partial resync: ok psync (diskless: yes, reconnect: 1) in tests/integration/replication-psync.tcl

    Expected condition '[s -1 sync_partial_ok] > 0' to be true ([s -1 sync_partial_ok] > 0)

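    The likely cause is that "on low-spec machines this test case times out and fails"; the environment used here is an LXC virtual machine. There are two ways to avoid it:

    1: In the extracted source directory, edit
    # vi tests/integration/replication-psync.tcl
    and change after 100 to after 500
    
    2: Run make test under taskset
    # taskset -c 1 make test

    Redis is now built and installed.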

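  • The build directory contains two configuration files:
    redis.conf and sentinel.conf. The configuration options are described in a separate article.
  • Test environment used in this article:
    Three Redis instances (1 master, 2 slaves) and 3 sentinels. Master: 10.0.3.110; slaves: 10.0.3.92 and 10.0.3.66. Each Redis host also runs one sentinel instance. Edit the configuration files as follows:
    redis.conf
    # Redis configuration file example.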
    # ./redis-server /path/to/redis.conf
    
    ################################## INCLUDES ###################################
    
    # include /path/to/local.conf
    # include /path/to/other.conf
    
    ################################## NETWORK #####################################
    
    bind 10.0.3.110
    
    protected-mode yes
    
    port 6379
    
    tcp-backlog 511
    
    unixsocket "/tmp/redis.sock"
    unixsocketperm 700
    
    timeout 0
    
    tcp-keepalive 0
    
    ################################# GENERAL #####################################
    
    daemonize yes
    
    pidfile "/var/run/redis6379.pid"
    
    loglevel notice
    
    logfile "/var/log/redis/redis_6379.log"
    
    # syslog-enabled no
    # syslog-ident redis
    # syslog-facility local0
    
    databases 16
    supervised no
    
    ################################ SNAPSHOTTING  ################################
    
    save 900 1
    save 300 10
    save 60 10000
    
    stop-writes-on-bgsave-error yes
    
    rdbcompression yes
    
    rdbchecksum yes
    
    dbfilename "dump_6379.rdb"
    
    dir "/var/lib/redis_6379"
    
    ################################# REPLICATION #################################
    
    # slaveof <masterip> <masterport>
    masterauth "dxydxy"
    
    slave-serve-stale-data yes
    slave-read-only yes
    
    repl-diskless-sync no
    repl-diskless-sync-delay 5
    
    # repl-ping-slave-period 10
    # repl-timeout 60
    
    repl-disable-tcp-nodelay no
    repl-backlog-size 5mb
    repl-backlog-ttl 3600
    
    slave-priority 100
    
    #min-slaves-to-write 3
    #min-slaves-max-lag 10
    
    ################################## SECURITY ###################################
    
    requirepass "dxydxy"
    # rename-command CONFIG b840fc02d524045429941cc15f59e41cb7be6c52
    # rename-command CONFIG ""
    
    ################################### LIMITS ####################################
    
    maxclients 1000
    #maxmemory <bytes>
    maxmemory-policy noeviction
    # maxmemory-samples 5
    
    ############################## APPEND ONLY MODE ###############################
    
    appendonly yes
    
    appendfilename "appendonly_6379.aof"
    
    # appendfsync always
    appendfsync everysec
    # appendfsync no
    
    no-appendfsync-on-rewrite no
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb
    aof-load-truncated yes
    
    ################################ LUA SCRIPTING  ###############################
    
    lua-time-limit 5000
    
    ################################ REDIS CLUSTER  ###############################
    
    # cluster-enabled yes
    # cluster-config-file nodes-6379.conf
    # cluster-node-timeout 15000
    # cluster-slave-validity-factor 10
    # cluster-migration-barrier 1
    # cluster-require-full-coverage yes
    
    ################################## SLOW LOG ###################################
    
    slowlog-log-slower-than 10000
    slowlog-max-len 128
    
    ################################ LATENCY MONITOR ##############################
    
    latency-monitor-threshold 0
    
    ############################# EVENT NOTIFICATION ##############################
    
    notify-keyspace-events ""
    
    ############################### ADVANCED CONFIG ###############################
    
    hash-max-ziplist-entries 512
    hash-max-ziplist-value 64
    
    list-max-ziplist-entries 512
    list-max-ziplist-value 64
    
    list-compress-depth 0
    set-max-intset-entries 512
    
    zset-max-ziplist-entries 128
    zset-max-ziplist-value 64
    
    hll-sparse-max-bytes 3000
    
    activerehashing yes
    
    client-output-buffer-limit normal 0 0 0
    client-output-buffer-limit slave 256mb 64mb 60
    client-output-buffer-limit pubsub 32mb 8mb 60
    
    hz 10
    aof-rewrite-incremental-fsync yes
    
    list-max-ziplist-size -2

    sentinel.conf

    port 16379
    
    dir "/var/lib/sentinel_16379"
    
    logfile "/var/log/redis/sentinel_16379.log"
    
    daemonize yes
    
    protected-mode no
    
    sentinel monitor dxy 10.0.3.110 6379 2
    
    sentinel auth-pass dxy dxydxy
    
    sentinel down-after-milliseconds dxy 15000
    
    sentinel failover-timeout dxy 120000
    
    # Custom scripts to run after a failover, e.g. to send mail or move the VIP
    #sentinel notification-script <master-name> <script-path>
    #sentinel client-reconfig-script <master-name> <script-path>

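    Save the configuration files under /etc/redis/ and create the directories they refer to. Unlike the setup in "Redis Replication and Sentinel: Setup and Principles", every Redis instance here requires password authentication (requirepass).
    Note: when a master requires a password, both clients and slaves must supply it when connecting. The master sets its own password with requirepass, and it cannot be used without that password. A slave uses masterauth to set the password it presents to the master. Clients authenticate with AUTH. With Sentinel in place, a master may become a slave and a slave may become a master, so every instance needs both options set. Sentinel also connects to the master and the slaves, so it needs this parameter: sentinel auth-pass <master_name> xxxxx.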
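The password rule above can be checked mechanically. The following is a minimal Python sketch, not part of Redis: the helper names parse_conf and auth_problems are invented for this illustration, and the parser only understands simple "directive value" lines. It reports when requirepass, masterauth and sentinel auth-pass disagree on one host.

```python
def parse_conf(text):
    """Return {directive: value} for simple one-value lines, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0].lower()] = parts[1].strip().strip('"')
    return settings


def auth_problems(redis_conf_text, sentinel_conf_text, master_name):
    """List the ways the three password settings disagree (hypothetical helper)."""
    problems = []
    redis_settings = parse_conf(redis_conf_text)
    required = redis_settings.get("requirepass")
    if required is None:
        problems.append("requirepass is not set")
    if redis_settings.get("masterauth") != required:
        problems.append("masterauth does not match requirepass")

    # Look for: sentinel auth-pass <master_name> <password>
    sentinel_pass = None
    for raw in sentinel_conf_text.splitlines():
        fields = raw.split()
        if len(fields) == 4 and fields[:3] == ["sentinel", "auth-pass", master_name]:
            sentinel_pass = fields[3].strip('"')
    if sentinel_pass != required:
        problems.append("sentinel auth-pass does not match requirepass")
    return problems
```

With the values used in this article (requirepass "dxydxy", masterauth "dxydxy", sentinel auth-pass dxy dxydxy) the function returns an empty list.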

  • Create the redis user and group, and give them ownership of every directory named in the configuration files.
    # useradd redis
    # groupadd redis
    # chown -R redis.redis redis/
    # chown -R redis.redis /etc/redis/
  • Start each Redis instance
    redis-server /etc/redis/redis.conf

        Note: on startup the Redis log reports several WARNINGs:

    29407:M 14 Jun 14:36:42.186 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
    Fix: add the line net.core.somaxconn = 1024 to /etc/sysctl.conf, then run: sysctl -p
    
    29407:M 14 Jun 14:36:42.186 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
    Fix: set vm.overcommit_memory to 1, for example with the command the warning itself suggests: sysctl vm.overcommit_memory=1
    
    29407:M 14 Jun 14:36:42.187 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
    Fix: echo never > /sys/kernel/mm/transparent_hugepage/enabled

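    About these warnings:

    net.core.somaxconn is a Linux kernel parameter that caps the backlog of a listening socket.
    The backlog is the socket's listen queue: a connection request that has not yet been handled or established waits there.
    The server takes requests off the queue as it accepts them, and an accepted request no longer occupies the queue.
    When the server handles requests too slowly and the queue fills up, new requests are refused.
    In other words, net.core.somaxconn limits the size of the queue of new TCP connections waiting to be accepted.
    For a heavily loaded service that constantly handles new connections, the default of 128 is too small. Raising it to 1024 or more is recommended for most environments.
    
    
    overcommit_memory:
    Sets the kernel's memory allocation policy (optional; choose according to the server's actual situation).
    /proc/sys/vm/overcommit_memory
    Possible values: 0, 1, 2.
    0: the kernel checks heuristically whether enough memory is available for the application. If there is, the allocation is allowed; otherwise it fails and an error is returned to the application.
    1: the kernel always allows the allocation, regardless of the current memory state.
    2: the kernel never overcommits. The total it will commit is limited to swap plus a configurable percentage (vm.overcommit_ratio) of physical memory.
    Note: when Redis dumps its data it forks a child process. In theory the child occupies as much memory as the parent: if the parent uses 8 GB, another 8 GB has to be accounted for the child (in practice the pages are shared copy-on-write until one side modifies them). If the system cannot provide that memory, the Redis server may go down, or I/O load may rise and performance drop. The better policy here is therefore 1 (always allow the allocation, regardless of the current memory state).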
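The three kernel settings discussed above can be checked before starting Redis. This is a small Python sketch under stated assumptions: the function name kernel_warnings and its messages are invented here, the "overcommit_memory should be 1" rule follows this article's recommendation, and the read parameter exists only so the check can be exercised without touching /proc.

```python
def read_file(path):
    """Read a one-line kernel setting from /proc or /sys."""
    with open(path) as handle:
        return handle.read().strip()


def kernel_warnings(tcp_backlog=511, read=read_file):
    """Return a list of settings that would trigger the WARNINGs shown above."""
    warnings = []
    # tcp-backlog in redis.conf cannot exceed net.core.somaxconn
    if int(read("/proc/sys/net/core/somaxconn")) < tcp_backlog:
        warnings.append("net.core.somaxconn is lower than tcp-backlog")
    # this article recommends vm.overcommit_memory = 1
    if read("/proc/sys/vm/overcommit_memory") != "1":
        warnings.append("vm.overcommit_memory is not 1")
    # the active THP mode is the one shown in square brackets
    if "[never]" not in read("/sys/kernel/mm/transparent_hugepage/enabled"):
        warnings.append("transparent huge pages are enabled")
    return warnings
```

On a Linux host, calling kernel_warnings() with no arguments reads the live values; an empty list means none of the three warnings should appear.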
  • Once replication is set up (slaveof), start each sentinel instance:
    redis-sentinel /etc/redis/sentinel.conf

    Note: a problem showed up at this point, and the culprit was the protected-mode parameter. Here is the log:

    2208:X 14 Jun 23:13:09.185 * +sentinel sentinel ebf9b1b4a5cc98bffead5d0996b8f43deb806641 10.0.3.92 16379 @ dxy 10.0.3.110 6379
    2208:X 14 Jun 23:13:24.234 # +sdown sentinel ebf9b1b4a5cc98bffead5d0996b8f43deb806641 10.0.3.92 16379 @ dxy 10.0.3.110 6379
    2208:X 14 Jun 23:14:18.888 * +sentinel sentinel 07e189ae6c30d4951d3eb48e9effd948de026c3b 10.0.3.66 16379 @ dxy 10.0.3.110 6379
    2208:X 14 Jun 23:14:33.962 # +sdown sentinel 07e189ae6c30d4951d3eb48e9effd948de026c3b 10.0.3.66 16379 @ dxy 10.0.3.110 6379

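    The log shows that, apart from the local sentinel, the other two sentinels were marked subjectively down (SDOWN) exactly 15 seconds after being discovered (down-after-milliseconds 15000). A sentinel sends PING heartbeats to the master to check whether it is alive. If the master does not answer with PONG within a given time window, or answers with an error, the sentinel subjectively (unilaterally) considers it unavailable (subjectively down, SDOWN for short). down-after-milliseconds sets that window, in milliseconds.
    The timestamps show that the sentinels could not reach each other, which led to SDOWN (see the discovery mechanism described in "Redis Replication and Sentinel: Setup and Principles"). Because nothing was logged as an error, the cause took a long time to find. Finally, connecting to a sentinel directly showed: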

    # redis-cli -h 10.0.3.110 -p 16379
    10.0.3.110:16379> info
    DENIED Redis is running in protected mode because protected mode is enabled, no bind address was specified, no authentication password is requested to clients. In this mode connections are only accepted from the loopback interface. If you want to connect from external computers to Redis you may adopt one of the following solutions: 1) Just disable protected mode sending the command 'CONFIG SET protected-mode no' from the loopback interface by connecting to Redis from the same host the server is running, however MAKE SURE Redis is not publicly accessible from internet if you do so. Use CONFIG REWRITE to make this change permanent. 2) Alternatively you can just disable the protected mode by editing the Redis configuration file, and setting the protected mode option to 'no', and then restarting the server. 3) If you started the server manually just for testing, restart it with the '--protected-mode no' option. 4) Setup a bind address or an authentication password. NOTE: You only need to do one of the above things in order for the server to start accepting connections from the outside.

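    The long message boils down to this: when neither bind nor a password is configured, protected mode takes effect and Redis accepts connections only from the IPv4 and IPv6 loopback addresses. External connections are answered with this error so that the user learns what went wrong, instead of simply being refused without a clue. The Redis author's discussion of the feature has the details; the suggestion given there is to turn protected mode off: --protected-mode no. That explains the problem here: the sentinels had neither bind nor a password configured, so protected mode was active and each sentinel rejected connections from the others, which is why they were marked SDOWN. Adding this to sentinel.conf: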

    protected-mode no

    solved the problem. protected-mode was introduced in 3.2 and is enabled by default. The configuration file documents it as follows:

    # Protected mode is a layer of security protection, in order to avoid that
    # Redis instances left open on the internet are accessed and exploited.
    #
    # When protected mode is on and if:
    #
    # 1) The server is not binding explicitly to a set of addresses using the
    #    "bind" directive.
    # 2) No password is configured.
    #
    # The server only accepts connections from clients connecting from the
    # IPv4 and IPv6 loopback addresses 127.0.0.1 and ::1, and from Unix domain
    # sockets.
    #
    # By default protected mode is enabled. You should disable it only if
    # you are sure you want clients from other hosts to connect to Redis
    # even if no authentication is configured, nor a specific set of interfaces
    # are explicitly listed using the "bind" directive.
    protected-mode yes
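The rule in that comment can be written down as a small decision function. The Python sketch below only models the documented behaviour and is not Redis code: the function name and parameters are assumptions made for illustration, Unix domain sockets are left out, and ipaddress treats all of 127.0.0.0/8 as loopback, which is slightly broader than the two addresses the comment names.

```python
import ipaddress


def connection_allowed(client_ip, protected_mode=True,
                       bind_configured=False, password_configured=False):
    """Approximate the protected-mode rule for a TCP client address."""
    # Protected mode only restricts instances with no bind and no password.
    if not protected_mode or bind_configured or password_configured:
        return True
    # Otherwise only loopback clients are accepted.
    return ipaddress.ip_address(client_ip).is_loopback
```

For the sentinels in this article, which had no bind and no password, connection_allowed("10.0.3.92") is False until protected_mode is set to False, which matches what the log showed.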
  • Start the sentinels again and check the log (started successfully):
    2253:X 14 Jun 23:48:05.477 # Sentinel ID is 68fdb1e07c0998b119e4678f7aead7742a7b1f64
    2253:X 14 Jun 23:48:05.477 # +monitor master dxy 10.0.3.110 6379 quorum 2
    2253:X 14 Jun 23:48:05.478 * +slave slave 10.0.3.92:6379 10.0.3.92 6379 @ dxy 10.0.3.110 6379
    2253:X 14 Jun 23:48:05.512 * +slave slave 10.0.3.66:6379 10.0.3.66 6379 @ dxy 10.0.3.110 6379
    2253:X 14 Jun 23:48:14.894 * +sentinel sentinel b2fb07a1cce853ddec86a993428fb09edf15b6c1 10.0.3.92 16379 @ dxy 10.0.3.110 6379
    2253:X 14 Jun 23:48:23.346 * +sentinel sentinel d9b198d75ede190fc63d95af8a7ca58e1a395c9b 10.0.3.66 16379 @ dxy 10.0.3.110 6379
  • Check the status to verify that the sentinels are set up correctly (log in to any sentinel):
    10.0.3.92:16379> info sentinel
    # Sentinel
    sentinel_masters:1
    sentinel_tilt:0
    sentinel_running_scripts:0
    sentinel_scripts_queue_length:0
    sentinel_simulate_failure_flags:0
    master0:name=dxy,status=ok,address=10.0.3.110:6379,slaves=2,sentinels=3

    The master0 line (status=ok, slaves=2, sentinels=3) shows that the sentinels started successfully.
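This check is easy to automate. The Python sketch below parses a master0 line as printed by info sentinel and compares it with the expected topology; the function names are invented for this example, and the line format is assumed to be exactly the one shown above.

```python
def parse_master_line(line):
    """Turn 'master0:name=dxy,status=ok,...' into a dict."""
    _, _, fields = line.partition(":")  # drop the 'master0' prefix
    info = dict(item.split("=", 1) for item in fields.strip().split(","))
    info["slaves"] = int(info["slaves"])
    info["sentinels"] = int(info["sentinels"])
    return info


def sentinel_group_healthy(line, expected_slaves, expected_sentinels):
    """True when the master is ok and all slaves and sentinels are visible."""
    info = parse_master_line(line)
    return (info["status"] == "ok"
            and info["slaves"] == expected_slaves
            and info["sentinels"] == expected_sentinels)
```

For the output above, sentinel_group_healthy(line, expected_slaves=2, expected_sentinels=3) is true. During the protected-mode problem the sentinels could not talk to each other, so a check like this is a quick way to catch that state.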

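Testing:

Note: the virtual machines above cannot reach a mail server, so the tests use a different environment: Redis 2.8.4, three Redis instances (1 master, 2 slaves) and 3 sentinels. Master: 192.168.200.208 (port 6379); slaves: 192.168.200.199 and 192.168.200.73. Each Redis host also runs one sentinel instance on port 7379.

① Inspect the state with info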

192.168.200.208:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.200.199,port=6379,state=online,offset=354835,lag=0
slave1:ip=192.168.200.73,port=6379,state=online,offset=354835,lag=0
master_repl_offset:354974 
repl_backlog_active:1
repl_backlog_size:5242880 
repl_backlog_first_byte_offset:2
repl_backlog_histlen:354973
192.168.200.208:6379>

192.168.200.208:7379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
192.168.200.208:7379> sentinel master dxy
 1) "name"
 2) "dxy"
 3) "ip"
 4) "192.168.200.208"
 5) "port"
 6) "6379"
 7) "runid"
 8) "50ad7cfe6676fc1a1e671ead4a780958942879fc"
 9) "flags"
10) "master"
11) "pending-commands"
12) "0"
13) "last-ok-ping-reply"
14) "682"
15) "last-ping-reply"
16) "682"
17) "info-refresh"
18) "3301"
19) "role-reported"
20) "master"
21) "role-reported-time"
22) "1930980"
23) "config-epoch"
24) "4"
25) "num-slaves"
26) "2"
27) "num-other-sentinels"
28) "2"
29) "quorum"
30) "2"
31) "down-after-milliseconds"
32) "30000"
33) "failover-timeout"
34) "180000"
35) "parallel-syncs"
36) "1"
37) "client-reconfig-script"
38) "/opt/bin/notify.py"

192.168.200.208:7379> sentinel slaves dxy
1)  1) "name"
    2) "192.168.200.199:6379"
    3) "ip"
    4) "192.168.200.199"  
    5) "port"
    6) "6379"
    7) "runid"
    8) "c4e7bf53f7cee3c28bc369e1db656f879bf41947"
    9) "flags"
   10) "slave"
   11) "pending-commands" 
   12) "0"
   13) "last-ok-ping-reply"
   14) "591"
   15) "last-ping-reply"  
   16) "591"
   17) "info-refresh"
   18) "3606"
   19) "role-reported"
   20) "slave"
   21) "role-reported-time"
   22) "1971346"
   23) "master-link-down-time"
   24) "0"
   25) "master-link-status"
   26) "ok"
   27) "master-host"
   28) "192.168.200.208"
   29) "master-port"
   30) "6379"
   31) "slave-priority"
   32) "100"
   33) "slave-repl-offset"
   34) "400362"
2)  1) "name"
    2) "192.168.200.73:6379"
    3) "ip"
    4) "192.168.200.73"
    5) "port"
    6) "6379"
    7) "runid"
    8) "64ad290c43bba2b062220029c4c91274bb4465b9"
    9) "flags"
   10) "slave"
   11) "pending-commands"
   12) "0"
   13) "last-ok-ping-reply"
   14) "591"
   15) "last-ping-reply"
   16) "591"
   17) "info-refresh"
   18) "4817"
   19) "role-reported"
   20) "slave"
   21) "role-reported-time"
   22) "326006"
   23) "master-link-down-time"
   24) "0"
   25) "master-link-status"
   26) "ok"
   27) "master-host"
   28) "192.168.200.208"
   29) "master-port"
   30) "6379"
   31) "slave-priority"
   32) "100"
   33) "slave-repl-offset"
   34) "400085"

② Verify failover

Kill the master and follow the switchover in the sentinel log:

[7637] 17 Jun 12:11:08.728 # +sdown master dxy 192.168.200.208 6379   #master is subjectively down (SDOWN)
[7637] 17 Jun 12:11:08.819 # +odown master dxy 192.168.200.208 6379   #quorum 2/2 #enough sentinels agree, master is objectively down (ODOWN)
[7637] 17 Jun 12:11:08.819 # +new-epoch 5                             #new epoch (configuration version)
[7637] 17 Jun 12:11:08.819 # +try-failover master dxy 192.168.200.208 6379  #failover conditions met, waiting for the election among the sentinels
[7637] 17 Jun 12:11:08.819 # +vote-for-leader 38da843c4ad8baf95dcfdcd968ae6c2f05ab995c 5  #this sentinel votes for a leader
[7637] 17 Jun 12:11:08.820 # 192.168.200.199:7379 voted for 38da843c4ad8baf95dcfdcd968ae6c2f05ab995c 5
[7637] 17 Jun 12:11:08.820 # 192.168.200.73:7379 voted for 38da843c4ad8baf95dcfdcd968ae6c2f05ab995c 5
[7637] 17 Jun 12:11:08.909 # +elected-leader master dxy 192.168.200.208 6379 #this sentinel is elected leader of the failover
[7637] 17 Jun 12:11:08.909 # +failover-state-select-slave master dxy 192.168.200.208 6379 #choosing a slave to become the new master
[7637] 17 Jun 12:11:08.965 # +selected-slave slave 192.168.200.73:6379 192.168.200.73 6379 @ dxy 192.168.200.208 6379 #slave .73 is chosen as the new master
[7637] 17 Jun 12:11:08.965 * +failover-state-send-slaveof-noone slave 192.168.200.73:6379 192.168.200.73 6379 @ dxy 192.168.200.208 6379 #switching the role of the chosen slave (SLAVEOF NO ONE)
[7637] 17 Jun 12:11:09.017 * +failover-state-wait-promotion slave 192.168.200.73:6379 192.168.200.73 6379 @ dxy 192.168.200.208 6379 #waiting for the promotion to be confirmed
[7637] 17 Jun 12:11:09.867 # +promoted-slave slave 192.168.200.73:6379 192.168.200.73 6379 @ dxy 192.168.200.208 6379 #promotion confirmed
[7637] 17 Jun 12:11:09.867 # +failover-state-reconf-slaves master dxy 192.168.200.208 6379 #failover state changes to reconf-slaves
[7637] 17 Jun 12:11:09.957 * +slave-reconf-sent slave 192.168.200.199:6379 192.168.200.199 6379 @ dxy 192.168.200.208 6379 #sentinel sends SLAVEOF to point this slave at the new master
[7637] 17 Jun 12:11:10.887 * +slave-reconf-inprog slave 192.168.200.199:6379 192.168.200.199 6379 @ dxy 192.168.200.208 6379 #slave is now configured as a slave of the new master, but replication has not happened yet
[7637] 17 Jun 12:11:10.887 * +slave-reconf-done slave 192.168.200.199:6379 192.168.200.199 6379 @ dxy 192.168.200.208 6379 #slave is reconfigured and in sync with the new master
[7637] 17 Jun 12:11:10.946 # -odown master dxy 192.168.200.208 6379 #old master leaves the objectively down state
[7637] 17 Jun 12:11:10.946 # +failover-end master dxy 192.168.200.208 6379 #failover completed successfully
[7637] 17 Jun 12:11:10.946 # +switch-master dxy 192.168.200.208 6379 192.168.200.73 6379 #now monitoring the new master
[7637] 17 Jun 12:11:10.946 * +slave slave 192.168.200.199:6379 192.168.200.199 6379 @ dxy 192.168.200.73 6379 #slave discovered
[7637] 17 Jun 12:11:10.947 * +slave slave 192.168.200.208:6379 192.168.200.208 6379 @ dxy 192.168.200.73 6379
[7637] 17 Jun 12:11:40.960 # +sdown slave 192.168.200.208:6379 192.168.200.208 6379 @ dxy 192.168.200.73 6379
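The first lines of that log follow three simple rules: SDOWN is a local timeout, ODOWN needs a quorum of sentinels that agree, and a sentinel may lead the failover only with enough votes. The Python sketch below restates those rules as plain functions; it is a simplified model written for this article, not Sentinel's implementation, and the function and parameter names are assumptions.

```python
def is_sdown(ms_since_last_valid_reply, down_after_ms):
    """Subjectively down: no valid reply for longer than down-after-milliseconds."""
    return ms_since_last_valid_reply > down_after_ms


def is_odown(local_sdown, other_sentinels_agreeing, quorum):
    """Objectively down: this sentinel plus enough others see the master as down."""
    if not local_sdown:
        return False
    return 1 + other_sentinels_agreeing >= quorum


def can_lead_failover(votes, total_sentinels, quorum):
    """A leader needs a majority of all sentinels and at least quorum votes."""
    majority = total_sentinels // 2 + 1
    return votes >= majority and votes >= quorum
```

In the log above, down-after-milliseconds is 30000 and quorum is 2: one other sentinel agreeing was enough for +odown, and three votes out of three sentinels elected the leader.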

Start the old master again and check the log:

[98910] 17 Jun 12:29:01.856 # -sdown slave 192.168.200.208:6379 192.168.200.208 6379 @ dxy 192.168.200.73 6379
[98910] 17 Jun 12:29:11.793 * +convert-to-slave slave 192.168.200.208:6379 192.168.200.208 6379 @ dxy 192.168.200.73 6379  #the old master rejoins as a slave: failover succeeded!
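When a tool needs the outcome of a failover from the sentinel log, the +switch-master event is the line to look for, since it carries both the old and the new master address. A minimal Python sketch, assuming the log line layout shown above (the function name and the dictionary keys are invented for this example):

```python
import re

# +switch-master <name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH_MASTER = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def last_master_switch(log_lines):
    """Return the most recent +switch-master event as a dict, or None."""
    result = None
    for line in log_lines:
        match = SWITCH_MASTER.search(line)
        if match:
            result = match.groupdict()
    return result
```

Feeding it the failover log above yields old_ip 192.168.200.208 and new_ip 192.168.200.73; the port values are returned as strings.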

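The previous article explains more of these log events. Sentinel also has an option called client-reconfig-script, described next.

High availability through a failover script: the client-reconfig-script parameter names a script that Sentinel runs when a failover happens.

The option is documented as follows: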

# When the master changed because of a failover a script can be called in
# order to perform application-specific tasks to notify the clients that the
# configuration has changed and the master is at a different address.
# 
# The following arguments are passed to the script:
#
# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
#
# <state> is currently always "failover"
# <role> is either "leader" or "observer"
# 
# The arguments from-ip, from-port, to-ip, to-port are used to communicate
# the old address of the master and the new address of the elected slave
# (now a master).
#
# This script should be resistant to multiple invocations.

Arguments passed to the script:

<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
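A script that receives these seven positional arguments can give them names before acting on them. The following Python sketch shows one way to do that; FailoverEvent and parse_failover_args are names made up for this illustration, not part of Sentinel.

```python
from collections import namedtuple

FailoverEvent = namedtuple(
    "FailoverEvent",
    "master_name role state from_ip from_port to_ip to_port",
)


def parse_failover_args(argv):
    """Build a FailoverEvent from the 7 arguments Sentinel passes to the script."""
    if len(argv) != 7:
        raise ValueError("expected 7 arguments, got %d" % len(argv))
    event = FailoverEvent(*argv)
    # Ports arrive as strings; convert them once here.
    return event._replace(from_port=int(event.from_port),
                          to_port=int(event.to_port))
```

In a real script the input would be sys.argv[1:]. Checking event.role == "leader" before acting means the one-off actions run only on the sentinel that led the failover.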

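The script's job is to send an alert e-mail after a failover and to move the VIP to the new master, somewhat like MHA for MySQL. The script is deliberately simple and makes no extra checks; it can be hardened for more complex situations. How it works:

①: First set up SSH trust between the three Redis hosts so that they can log in to each other without a password.

Create a key pair with ssh-keygen, accepting the default at every prompt; this generates id_rsa.pub under .ssh/
ssh-keygen -t rsa  

Copy id_rsa.pub to the other two machines and import the public key: 
cat id_rsa.pub >> /root/.ssh/authorized_keys 

Note: in this test the sentinel instances and the Redis instances share hosts. If a local sentinel has to act on the local Redis host (bring the VIP down or up), the host must also be able to SSH into itself, so the public key it generated must go into its own authorized_keys as well. In the end every server's authorized_keys contains all three keys (three lines).

②: Before the first run, the VIP must already be set on the master; that is, once Redis Sentinel is set up, configure the VIP on the master.

③: Collect the log information and extract the IP addresses that are needed.

④: Send the mail, write the log, and bring the VIP down and up remotely.

Before that, install the paramiko module: easy_install paramiko, which depends on these packages: apt-get install python-setuptools python-dev build-essential libffi-dev libssl-dev; or simply run: apt-get install python-paramiko.

The script is shown below (see the logging reference for the logging setup):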

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#------------------------------------------------
# Name:        notify.py
# Purpose:     actions to run after a failover
# Author:      zhoujy
# Created:     2016-06-17
#------------------------------------------------
from __future__ import print_function

import datetime
import logging
import smtplib
import sys
from email.header import Header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import COMMASPACE, formatdate

import paramiko


def send_mail(to, subject, text, from_mail, server="localhost"):
    message = MIMEMultipart()
    message['From'] = from_mail
    message['To'] = COMMASPACE.join(to)
    message['Date'] = formatdate(localtime=True)
    # Header() encodes the non-ASCII subject, so no default-encoding hack is needed
    message['Subject'] = Header(subject, 'utf-8')
    message.attach(MIMEText(text, _charset='utf-8'))
    smtp = smtplib.SMTP(server)
    smtp.sendmail(from_mail, to, message.as_string())
    smtp.close()


def run_remote(hostname, port, command, ok_message):
    # Run one command over SSH (key-based login) and report the outcome
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname=hostname, port=port)
    try:
        stdin, stdout, stderr = ssh.exec_command(command)
        # Read stderr once: a second readlines() call would return nothing
        errors = stderr.readlines()
        if not errors:
            print(ok_message)
        else:
            print(errors)
    finally:
        ssh.close()


# Bring the VIP down
def down_vip(hostname, port):
    run_remote(hostname, port, "ifconfig eth0:0 down", "down vip ok...")


# Bring the VIP up
def up_vip(hostname, port, vip):
    run_remote(hostname, port,
               "ifconfig eth0:0 %s;arping -c 3 -A %s;hash -r" % (vip, vip),
               "up vip ok...")


if __name__ == "__main__":
    # SSH port of the servers
    ssh_port = 22
    # The VIP to move
    vip = '192.168.200.2'
    # Configure the log format and destination with logging.basicConfig
    logging.basicConfig(level=logging.INFO,
                        format=':::%(levelname)s::: \n%(message)s',
                        datefmt='%a, %d %b %Y %H:%M:%S',
                        filename='/var/log/redis/failover.txt',
                        filemode='a')
    # Add a StreamHandler that prints INFO-level records to standard error
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
    console.setFormatter(formatter)
    logging.getLogger('').addHandler(console)

    # Sentinel passes: <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
    if len(sys.argv) != 8:
        sys.exit("usage: notify.py <master-name> <role> <state> "
                 "<from-ip> <from-port> <to-ip> <to-port>")
    args = sys.argv[1:]
    master_name, role, state, from_ip, from_port, to_ip, to_port = args

    # 'now' rather than 'time', so the name does not shadow the time module
    now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    messages = "++++++++++++++++++++++++++" + now + " failover++++++++++++++++++++++++++" + '\n' + ' '.join(args)
    subject = ''' Redis 【%s】 Failover ''' % master_name
    info = ''' %s : Redis Master %s failover %s(%s:%s) to %s(%s:%s) succeeded ! ''' % (now, master_name, from_ip, from_ip, from_port, to_ip, to_ip, to_port)
    mail_list = ['zjy@dxyer.com']
    # Only the sentinel that led the failover acts; observers do nothing
    if role == 'leader':
        logging.info(messages)
        down_vip(from_ip, ssh_port)
        up_vip(to_ip, ssh_port, vip)
        send_mail(mail_list, subject, info + ' and VIP do sucessed !!', "Redis_failover_report@ls.xxx.net", server="192.168.xxx.xxx")

When a switchover happens, the alert e-mail reads:

2016-06-17 19:06:42 : Redis Master dxy failover 192.168.200.73(192.168.200.73:6379) to 192.168.200.208(192.168.200.208:6379) succeeded !  and VIP do sucessed !!

The log file records:

:::INFO:::
++++++++++++++++++++++++++2016-06-17 19:06:42 failover++++++++++++++++++++++++++
dxy leader start 192.168.200.73 6379 192.168.200.208 6379
:::INFO:::
Connected (version 2.0, client OpenSSH_6.6.1p1)
:::INFO:::
Authentication (publickey) successful!
:::INFO:::
Connected (version 2.0, client OpenSSH_6.6.1p1)
:::INFO:::
Authentication (publickey) successful!

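BTW: applications can reach Redis through the VIP, which gives a degree of high availability. While the VIP is being moved the service is interrupted; how long it stays unavailable depends mainly on the detection time (down-after-milliseconds: 30 seconds by default, and it can be set lower, e.g. 5000 for 5 seconds) and on how quickly the application reconnects. Alternatively, a Sentinel-aware client such as Jedis for Java provides high availability directly: it asks the sentinels for the current master and then connects to that master.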
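What a Sentinel-aware client does can be sketched in a few lines of Python using only the standard library. The sketch sends SENTINEL get-master-addr-by-name to each sentinel in turn and returns the first address it gets. It is an illustration rather than a replacement for a real client library: the function names are invented here, the reply is assumed to arrive in a single recv call, and reconnection and authentication to the master are left out.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_master_addr(reply):
    """Parse a two-element array reply into (ip, port); None for anything else."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2" or len(lines) < 5:
        return None
    return lines[2].decode("utf-8"), int(lines[4])


def ask_sentinel(host, port, master_name, timeout=1.0):
    """Ask one sentinel for the current master address."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        return parse_master_addr(conn.recv(4096))


def find_master(sentinels, master_name):
    """Try each (host, port) sentinel in turn; return the first answer or None."""
    for host, port in sentinels:
        try:
            addr = ask_sentinel(host, port, master_name)
        except OSError:
            continue  # sentinel unreachable, try the next one
        if addr:
            return addr
    return None
```

With the test environment of this article, the call would be find_master([("192.168.200.208", 7379), ("192.168.200.199", 7379), ("192.168.200.73", 7379)], "dxy"). The client would then connect to the returned address and repeat the lookup whenever the connection drops.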

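Summary:

Together with "Redis Replication and Sentinel: Setup and Principles", this article gives a working picture of how Redis Sentinel provides high availability. Sentinel is fairly simple, and when the load is moderate and a single machine can meet the demand, Redis Sentinel is a good choice.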

 

References:

Redis Replication and Sentinel: Setup and Principles

Cluster failover solutions

Python's logging module

Python paramiko

Redis Sentinel high-availability architecture

 
