Redis Sentinel Mode Failover Drill

Posted by codecraft on 2018-09-12

This article walks through a failover in Redis Sentinel mode.

Startup

docker-compose up

The demo uses the docker-compose file from redis-cluster.

  • master log
1:M 12 Sep 06:42:02.159 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 12 Sep 06:42:02.159 # Server started, Redis version 3.2.8
1:M 12 Sep 06:42:02.159 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add `vm.overcommit_memory = 1` to /etc/sysctl.conf and then reboot or run the command `sysctl vm.overcommit_memory=1` for this to take effect.
1:M 12 Sep 06:42:02.159 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command `echo never > /sys/kernel/mm/transparent_hugepage/enabled` as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 12 Sep 06:42:02.159 * The server is now ready to accept connections on port 6379
1:M 12 Sep 06:42:02.849 * Slave 172.17.0.3:6379 asks for synchronization
1:M 12 Sep 06:42:02.849 * Full resync requested by slave 172.17.0.3:6379
1:M 12 Sep 06:42:02.849 * Starting BGSAVE for SYNC with target: disk
1:M 12 Sep 06:42:02.851 * Background saving started by pid 16
16:C 12 Sep 06:42:02.861 * DB saved on disk
16:C 12 Sep 06:42:02.862 * RDB: 6 MB of memory used by copy-on-write
1:M 12 Sep 06:42:02.865 * Background saving terminated with success
1:M 12 Sep 06:42:02.866 * Synchronization with slave 172.17.0.3:6379 succeeded
1:M 12 Sep 06:42:13.649 # Connection with slave 172.17.0.3:6379 lost.
1:M 12 Sep 06:42:14.072 * Slave 172.17.0.3:6379 asks for synchronization
1:M 12 Sep 06:42:14.073 * Full resync requested by slave 172.17.0.3:6379
1:M 12 Sep 06:42:14.073 * Starting BGSAVE for SYNC with target: disk
1:M 12 Sep 06:42:14.075 * Background saving started by pid 17
17:C 12 Sep 06:42:14.085 * DB saved on disk
17:C 12 Sep 06:42:14.085 * RDB: 8 MB of memory used by copy-on-write
1:M 12 Sep 06:42:14.185 * Background saving terminated with success
1:M 12 Sep 06:42:14.186 * Synchronization with slave 172.17.0.3:6379 succeeded
  • slave log
1:S 12 Sep 06:42:02.847 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 12 Sep 06:42:02.847 # Server started, Redis version 3.2.8
1:S 12 Sep 06:42:02.847 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add `vm.overcommit_memory = 1` to /etc/sysctl.conf and then reboot or run the command `sysctl vm.overcommit_memory=1` for this to take effect.
1:S 12 Sep 06:42:02.847 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command `echo never > /sys/kernel/mm/transparent_hugepage/enabled` as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:S 12 Sep 06:42:02.847 * The server is now ready to accept connections on port 6379
1:S 12 Sep 06:42:02.847 * Connecting to MASTER redis-master:6379
1:S 12 Sep 06:42:02.848 * MASTER <-> SLAVE sync started
1:S 12 Sep 06:42:02.848 * Non blocking connect for SYNC fired the event.
1:S 12 Sep 06:42:02.849 * Master replied to PING, replication can continue...
1:S 12 Sep 06:42:02.849 * Partial resynchronization not possible (no cached master)
1:S 12 Sep 06:42:02.851 * Full resync from master: 32f526697a22fef7945974d2b4dfc599401e2525:1
1:S 12 Sep 06:42:02.866 * MASTER <-> SLAVE sync: receiving 76 bytes from master
1:S 12 Sep 06:42:02.866 * MASTER <-> SLAVE sync: Flushing old data
1:S 12 Sep 06:42:02.866 * MASTER <-> SLAVE sync: Loading DB in memory
1:S 12 Sep 06:42:02.867 * MASTER <-> SLAVE sync: Finished with success
1:S 12 Sep 06:42:02.869 * Background append only file rewriting started by pid 15
1:S 12 Sep 06:42:02.903 * AOF rewrite child asks to stop sending diffs.
15:C 12 Sep 06:42:02.904 * Parent agreed to stop sending diffs. Finalizing AOF...
15:C 12 Sep 06:42:02.904 * Concatenating 0.00 MB of AOF diff received from parent.
15:C 12 Sep 06:42:02.906 * SYNC append only file rewrite performed
15:C 12 Sep 06:42:02.907 * AOF rewrite: 6 MB of memory used by copy-on-write
1:S 12 Sep 06:42:02.948 * Background AOF rewrite terminated with success
1:S 12 Sep 06:42:02.948 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:S 12 Sep 06:42:02.948 * Background AOF rewrite finished successfully
1:S 12 Sep 06:42:13.649 # Connection with master lost.
1:S 12 Sep 06:42:13.649 * Caching the disconnected master state.
1:S 12 Sep 06:42:13.650 * Discarding previously cached master state.
1:S 12 Sep 06:42:13.650 * SLAVE OF 172.17.0.2:6379 enabled (user request from `id=3 addr=172.17.0.4:57270 fd=6 name=sentinel-927320a2-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec`)
1:S 12 Sep 06:42:13.650 # CONFIG REWRITE executed with success.
1:S 12 Sep 06:42:14.071 * Connecting to MASTER 172.17.0.2:6379
1:S 12 Sep 06:42:14.072 * MASTER <-> SLAVE sync started
1:S 12 Sep 06:42:14.072 * Non blocking connect for SYNC fired the event.
1:S 12 Sep 06:42:14.072 * Master replied to PING, replication can continue...
1:S 12 Sep 06:42:14.072 * Partial resynchronization not possible (no cached master)
1:S 12 Sep 06:42:14.076 * Full resync from master: 32f526697a22fef7945974d2b4dfc599401e2525:733
1:S 12 Sep 06:42:14.185 * MASTER <-> SLAVE sync: receiving 76 bytes from master
1:S 12 Sep 06:42:14.186 * MASTER <-> SLAVE sync: Flushing old data
1:S 12 Sep 06:42:14.186 * MASTER <-> SLAVE sync: Loading DB in memory
1:S 12 Sep 06:42:14.186 * MASTER <-> SLAVE sync: Finished with success
1:S 12 Sep 06:42:14.189 * Background append only file rewriting started by pid 16
1:S 12 Sep 06:42:14.221 * AOF rewrite child asks to stop sending diffs.
16:C 12 Sep 06:42:14.221 * Parent agreed to stop sending diffs. Finalizing AOF...
16:C 12 Sep 06:42:14.221 * Concatenating 0.00 MB of AOF diff received from parent.
16:C 12 Sep 06:42:14.223 * SYNC append only file rewrite performed
16:C 12 Sep 06:42:14.224 * AOF rewrite: 6 MB of memory used by copy-on-write
1:S 12 Sep 06:42:14.274 * Background AOF rewrite terminated with success
1:S 12 Sep 06:42:14.274 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:S 12 Sep 06:42:14.274 * Background AOF rewrite finished successfully

Master-slave switchover

  • docker-compose ps
       Name                      Command               State           Ports
-------------------------------------------------------------------------------------
sentinel_master_1     docker-entrypoint.sh redis ...   Up      0.0.0.0:6379->6379/tcp
sentinel_sentinel_1   sh /data/sentinel-entrypoi ...   Up      26379/tcp, 6379/tcp
sentinel_sentinel_2   sh /data/sentinel-entrypoi ...   Up      26379/tcp, 6379/tcp
sentinel_sentinel_3   sh /data/sentinel-entrypoi ...   Up      26379/tcp, 6379/tcp
sentinel_slave_1      docker-entrypoint.sh redis ...   Up      6379/tcp
sentinel_slave_2      docker-entrypoint.sh redis ...   Up      6379/tcp
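  • Stop the master node (docker pause freezes the container's processes, so the master stops responding without exiting):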
docker pause sentinel_master_1
  • Check the sentinel log:
1:X 12 Sep 06:46:42.611 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 12 Sep 06:46:42.615 # Sentinel ID is 9e1da269ca7f134ed7bae15ad8efa3f5dd22f72d
1:X 12 Sep 06:46:42.615 # +monitor master redis-master 172.17.0.2 6379 quorum 2
1:X 12 Sep 06:46:42.617 * +slave slave 172.17.0.3:6379 172.17.0.3 6379 @ redis-master 172.17.0.2 6379
1:X 12 Sep 06:46:43.467 * +sentinel sentinel 927320a2afbfd70eae1716e8a024c726e71f2b51 172.17.0.4 26379 @ redis-master 172.17.0.2 6379
1:X 12 Sep 06:46:44.554 * +sentinel sentinel 8fc2f95bc671dc8a3df30046a29fdc41743a774d 172.17.0.5 26379 @ redis-master 172.17.0.2 6379
1:X 12 Sep 06:47:02.679 * +slave slave 172.17.0.7:6379 172.17.0.7 6379 @ redis-master 172.17.0.2 6379
1:X 12 Sep 06:48:32.777 # +new-epoch 1
1:X 12 Sep 06:48:32.784 # +vote-for-leader 927320a2afbfd70eae1716e8a024c726e71f2b51 1
1:X 12 Sep 06:48:32.843 # +sdown master redis-master 172.17.0.2 6379
1:X 12 Sep 06:48:32.944 # +odown master redis-master 172.17.0.2 6379 #quorum 3/2
1:X 12 Sep 06:48:32.944 # Next failover delay: I will not start a failover before Wed Sep 12 06:48:43 2018
1:X 12 Sep 06:48:33.857 # +config-update-from sentinel 927320a2afbfd70eae1716e8a024c726e71f2b51 172.17.0.4 26379 @ redis-master 172.17.0.2 6379
1:X 12 Sep 06:48:33.861 # +switch-master redis-master 172.17.0.2 6379 172.17.0.3 6379
1:X 12 Sep 06:48:33.863 * +slave slave 172.17.0.7:6379 172.17.0.7 6379 @ redis-master 172.17.0.3 6379
1:X 12 Sep 06:48:33.864 * +slave slave 172.17.0.2:6379 172.17.0.2 6379 @ redis-master 172.17.0.3 6379
1:X 12 Sep 06:48:38.902 # +sdown slave 172.17.0.2:6379 172.17.0.2 6379 @ redis-master 172.17.0.3 6379
  • Check the log of the new master (172.17.0.3, the former slave):
1:M 12 Sep 06:48:32.996 # Connection with master lost.
1:M 12 Sep 06:48:32.997 * Caching the disconnected master state.
1:M 12 Sep 06:48:32.997 * Discarding previously cached master state.
1:M 12 Sep 06:48:32.997 * MASTER MODE enabled (user request from `id=3 addr=172.17.0.4:57270 fd=6 name=sentinel-927320a2-cmd age=389 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec`)
1:M 12 Sep 06:48:32.998 # CONFIG REWRITE executed with success.
1:M 12 Sep 06:48:33.983 * Slave 172.17.0.7:6379 asks for synchronization
1:M 12 Sep 06:48:33.983 * Full resync requested by slave 172.17.0.7:6379
1:M 12 Sep 06:48:33.983 * Starting BGSAVE for SYNC with target: disk
1:M 12 Sep 06:48:33.984 * Background saving started by pid 28
28:C 12 Sep 06:48:33.992 * DB saved on disk
28:C 12 Sep 06:48:33.992 * RDB: 6 MB of memory used by copy-on-write
1:M 12 Sep 06:48:34.076 * Background saving terminated with success
1:M 12 Sep 06:48:34.076 * Synchronization with slave 172.17.0.7:6379 succeeded
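  • The log shows MASTER MODE enabled, requested by the sentinel client sentinel-927320a2-cmd: the former slave has been promoted to master.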
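
The sentinel log above can be turned into a small timeline. The following is a minimal sketch, not part of the original demo: it assumes the log format shown above, assumes the year 2018 (taken from the "Next failover delay" line), and uses hypothetical helper names (parse_events, failover_summary). It reports the old master, the new master, and how long this sentinel took to go from +sdown to +switch-master.

```python
import re
from datetime import datetime

# Matches Sentinel event lines such as:
#   1:X 12 Sep 06:48:32.843 # +sdown master redis-master 172.17.0.2 6379
EVENT_RE = re.compile(
    r"^\d+:X (?P<ts>\d{2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) [#*] "
    r"(?P<event>[+-][\w-]+) ?(?P<detail>.*)$"
)


def parse_events(lines, year=2018):
    """Yield (timestamp, event, detail) for every Sentinel event line."""
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            # The log lines carry no year, so one is supplied explicitly
            # (assumption: 2018, as in the "Next failover delay" line).
            ts = datetime.strptime(
                f"{year} {match.group('ts')}", "%Y %d %b %H:%M:%S.%f"
            )
            yield ts, match.group("event"), match.group("detail")


def failover_summary(lines):
    """Return old/new master and seconds from +sdown to +switch-master."""
    sdown_at = None
    for ts, event, detail in parse_events(lines):
        if event == "+sdown" and detail.startswith("master ") and sdown_at is None:
            sdown_at = ts
        elif event == "+switch-master":
            # detail: <name> <old-ip> <old-port> <new-ip> <new-port>
            name, old_ip, old_port, new_ip, new_port = detail.split()
            return {
                "master_name": name,
                "old_master": f"{old_ip}:{old_port}",
                "new_master": f"{new_ip}:{new_port}",
                "seconds": (ts - sdown_at).total_seconds() if sdown_at else None,
            }
    return None


# Lines copied from the sentinel log above.
SAMPLE_LOG = """\
1:X 12 Sep 06:48:32.777 # +new-epoch 1
1:X 12 Sep 06:48:32.843 # +sdown master redis-master 172.17.0.2 6379
1:X 12 Sep 06:48:32.944 # +odown master redis-master 172.17.0.2 6379 #quorum 3/2
1:X 12 Sep 06:48:33.861 # +switch-master redis-master 172.17.0.2 6379 172.17.0.3 6379
1:X 12 Sep 06:48:38.902 # +sdown slave 172.17.0.2:6379 172.17.0.2 6379 @ redis-master 172.17.0.3 6379
""".splitlines()

summary = failover_summary(SAMPLE_LOG)
print(summary)
```

The measured interval only covers what this one sentinel logged between +sdown and +switch-master. It does not include the time the sentinel waited before declaring +sdown, which depends on the down-after-milliseconds setting and on when docker pause was actually issued, neither of which appears in the log.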

Restoring the node

docker unpause sentinel_master_1

Check that node's log

1:M 12 Sep 06:56:05.592 # Connection with slave client id #12 lost.
1:M 12 Sep 06:56:05.592 # Connection with slave client id #5 lost.
1:S 12 Sep 06:56:17.140 * SLAVE OF 172.17.0.3:6379 enabled (user request from `id=144 addr=172.17.0.5:41876 fd=7 name=sentinel-8fc2f95b-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec`)
1:S 12 Sep 06:56:17.141 # CONFIG REWRITE executed with success.
1:S 12 Sep 06:56:17.206 * Connecting to MASTER 172.17.0.3:6379
1:S 12 Sep 06:56:17.206 * MASTER <-> SLAVE sync started
1:S 12 Sep 06:56:17.206 * Non blocking connect for SYNC fired the event.
1:S 12 Sep 06:56:17.207 * Master replied to PING, replication can continue...
1:S 12 Sep 06:56:17.208 * Partial resynchronization not possible (no cached master)
1:S 12 Sep 06:56:17.211 * Full resync from master: b2e78c2c21c3a4caa7a37fe86da9b3a2cda0dce4:134615
1:S 12 Sep 06:56:17.288 * MASTER <-> SLAVE sync: receiving 94 bytes from master
1:S 12 Sep 06:56:17.289 * MASTER <-> SLAVE sync: Flushing old data
1:S 12 Sep 06:56:17.289 * MASTER <-> SLAVE sync: Loading DB in memory
1:S 12 Sep 06:56:17.289 * MASTER <-> SLAVE sync: Finished with success
1:S 12 Sep 06:56:17.292 * Background append only file rewriting started by pid 32
1:S 12 Sep 06:56:17.339 * AOF rewrite child asks to stop sending diffs.
32:C 12 Sep 06:56:17.339 * Parent agreed to stop sending diffs. Finalizing AOF...
32:C 12 Sep 06:56:17.339 * Concatenating 0.00 MB of AOF diff received from parent.
32:C 12 Sep 06:56:17.342 * SYNC append only file rewrite performed
32:C 12 Sep 06:56:17.342 * AOF rewrite: 4 MB of memory used by copy-on-write
1:S 12 Sep 06:56:17.407 * Background AOF rewrite terminated with success
1:S 12 Sep 06:56:17.407 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:S 12 Sep 06:56:17.407 * Background AOF rewrite finished successfully
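  • The old master is switched to a slave (SLAVE OF 172.17.0.3:6379, requested by a sentinel) and performs a full resync with the new master.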
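
Since the master address changes during a drill like this, a client can ask the sentinels which node is currently the master instead of hard-coding 172.17.0.2. The sketch below is one way to do that and rests on several assumptions: the redis-py package is installed, the code runs somewhere that can reach the container IPs, and the sentinel addresses are the two that appear in the sentinel log (172.17.0.4 and 172.17.0.5, port 26379). The helper names are hypothetical.

```python
def parse_sentinel_addrs(spec):
    """Turn 'host1:26379,host2:26379' into [('host1', 26379), ('host2', 26379)]."""
    addrs = []
    for item in spec.split(","):
        host, _, port = item.strip().rpartition(":")
        addrs.append((host, int(port)))
    return addrs


def current_master(spec, master_name="redis-master"):
    """Ask the sentinels for the current master; returns a (host, port) tuple."""
    # Imported lazily so parse_sentinel_addrs works without redis-py installed.
    from redis.sentinel import Sentinel

    sentinel = Sentinel(parse_sentinel_addrs(spec), socket_timeout=0.5)
    return sentinel.discover_master(master_name)


# Assumption: sentinel addresses taken from the +sentinel lines in the log.
SENTINELS = "172.17.0.4:26379,172.17.0.5:26379"
```

Calling current_master(SENTINELS) before and after docker pause should show the reported address moving from 172.17.0.2 to 172.17.0.3, matching the +switch-master line. For normal reads and writes, redis-py's Sentinel.master_for("redis-master") returns a client that looks the master up through the sentinels in the same way.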

Summary

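Compared with Redis Cluster, Sentinel mode is relatively simple. Its drawback is that some resources have to be spent on running the sentinel nodes. For workloads with small to medium data volumes, it is a fairly cost-effective choice.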
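
The cost of the extra sentinel nodes follows from how Sentinel reaches decisions. As a rough sketch of the rules described in the Redis Sentinel documentation (function names here are illustrative, not Redis APIs): a master is marked +odown once at least quorum sentinels agree it is down, and a failover is only authorized when the would-be leader gets votes from a majority of all sentinels and from at least quorum of them.

```python
def majority(num_sentinels: int) -> int:
    """Smallest number of sentinels that forms a majority."""
    return num_sentinels // 2 + 1


def is_odown(agreeing: int, quorum: int) -> bool:
    """Objectively down: at least `quorum` sentinels see the master as down."""
    return agreeing >= quorum


def can_authorize_failover(votes: int, num_sentinels: int, quorum: int) -> bool:
    """A leader needs both a majority of all sentinels and at least `quorum` votes."""
    return votes >= max(majority(num_sentinels), quorum)


# The drill above: 3 sentinels, quorum 2, and "#quorum 3/2" in the log.
print(is_odown(agreeing=3, quorum=2))
# With 3 sentinels, one sentinel can be lost and failover is still possible.
print(can_authorize_failover(votes=2, num_sentinels=3, quorum=2))
# With only 2 sentinels, losing one leaves no majority.
print(can_authorize_failover(votes=1, num_sentinels=2, quorum=2))
```

This is why the demo runs three sentinels rather than one or two: three is the smallest count that still reaches a majority after one sentinel fails.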
