Redis Cluster: Principles and Usage

Posted by liupzmin on 2015-10-20

Redis released 3.0.0-RC1 on November 3, 2014, and it contains the long-awaited cluster feature.

(Update: RC2 was released on January 13, 2015.)

Many people on GitHub have started testing the cluster builds; they are not yet stable and are still changing quite a bit.


Redis Cluster has no central node: every node stores data plus the full cluster state, and every node is connected to every other node.

Nodes use a gossip protocol to propagate information and to discover new nodes.

The structure is similar to Cassandra, except that Cassandra nodes can forward requests on behalf of a client.

Redis Cluster nodes do not act as proxies for client requests; the client uses the error information returned by a node to redirect the request to the correct node.
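For example (a sketch against the six-node test cluster set up later in this post; the ports, the slot number, and the target address all depend on your own layout), redis-cli shows the redirection directly:

# A plain client connected to a node that does not own the key's slot
# just gets the redirection error back:
$ redis-cli -p 4379 set foo bar
(error) MOVED 12182 127.0.0.1:6379

# redis-cli -c enables cluster mode, so it follows MOVED/ASK redirections itself:
$ redis-cli -c -p 4379 set foo bar
-> Redirected to slot [12182] located at 127.0.0.1:6379
OK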


Redis Cluster pre-allocates 16384 slots; a key is placed into slot CRC16(key) mod 16384.

Each Redis node is responsible for a subset of the slots; when nodes are added or removed, only the slot distribution needs to be adjusted.

For example, in a three-node cluster with nodes A/B/C, node A might hold slots 0 to 5500, node B slots 5501 to 11000, and node C slots 11001 to 16383.

This pre-allocated-slot scheme sits between plain hashing and consistent hashing: it gives up some flexibility, but compared with consistent hashing the cost of managing the data is much lower.
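To see which slot a particular key hashes to, you can ask any node with CLUSTER KEYSLOT (a small sketch; 4379 is simply one node of the test cluster described below):

# CLUSTER KEYSLOT returns CRC16(key) mod 16384 for the given key
$ redis-cli -p 4379 cluster keyslot foo
(integer) 12182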


To keep the service available, Redis Cluster uses a master-slave scheme: each Redis node can have one or more slaves.

When a master dies, one of its slaves is elected as the new master. Since every node is responsible for a range of slots, if both the master and all of its slaves for a range fail, the data in those slots becomes unavailable.
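In a test cluster the failover path can be exercised by hand, either by killing a master process or, in 3.0, by asking a slave to take over with CLUSTER FAILOVER (a sketch; 7379 is one of the slaves in the test cluster below):

# Ask the slave listening on 7379 to perform a manual failover of its master
$ redis-cli -p 7379 cluster failover
OK

# Once the failover completes, the former slave reports itself as a master
$ redis-cli -p 7379 info replication | grep role
role:master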


Redis Cluster uses asynchronous replication. A complete write goes through the following steps:

1. The client writes the data to the master.
2. The master replies "OK" to the client.
3. The master propagates the update to its slaves.

This leaves a risk of losing writes:

1. Steps 1 and 2 succeed, then the master crashes before the data has been propagated to any slave.
2. A partition leaves two masters alive at the same time and a client writes to the old master.
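Writes cannot be made fully synchronous, but the WAIT command (available since 3.0) narrows the window: it blocks the client until the previous writes on that connection have been acknowledged by a given number of slaves. A small sketch (foo hashes to a slot owned by the master on 6379 in the test cluster below; the reply is the number of slaves that acknowledged within the timeout):

$ redis-cli -p 6379
127.0.0.1:6379> set foo bar
OK
127.0.0.1:6379> wait 1 100
(integer) 1

Even with WAIT this is not a strong guarantee: a failover can still promote a slave that missed the write.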


Redis Cluster supports adding and removing nodes online.

The slot-based data distribution keeps migration cheap: moving data only means moving slots from one Redis node to another.

While a slot is being migrated from node A to node B, both nodes hold it: on A the slot is marked MIGRATING, on B it is marked IMPORTING. During that window, requests for keys that are still on A are handled by A, while keys that are no longer on A are handled by B (the client is redirected there with an ASK error); new keys for that slot are created on B, not on A.
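redis-trib.rb reshard (introduced below) automates this, but the per-slot protocol can also be driven by hand with CLUSTER commands. A rough sketch for moving one slot, say slot 100, from node A (4379) to node B (5379); the node IDs in angle brackets are placeholders:

# On the target node B: mark the slot as being imported from A
$ redis-cli -p 5379 cluster setslot 100 importing <node-id-of-A>
# On the source node A: mark the slot as migrating to B
$ redis-cli -p 4379 cluster setslot 100 migrating <node-id-of-B>

# Move the keys of that slot over, batch by batch
$ redis-cli -p 4379 cluster getkeysinslot 100 10
$ redis-cli -p 4379 migrate 127.0.0.1 5379 <some-key-from-the-list> 0 5000

# Finally assign the slot to B so the new owner is propagated to the cluster
$ redis-cli -p 4379 cluster setslot 100 node <node-id-of-B>
$ redis-cli -p 5379 cluster setslot 100 node <node-id-of-B>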


I set up a Redis cluster on my local machine for testing and recorded the learning process here.

Configuration file differences:

Compared with the standalone version, the cluster-capable build of Redis has an extra section in its configuration file:


################################ REDIS CLUSTER  ###############################
#
# A standalone Redis instance cannot be part of a cluster;
# only nodes started as cluster nodes can join one.
# To start a Redis instance as a cluster node, enable the following:
#
cluster-enabled yes

# Every cluster node has its own cluster configuration file.
# This file is not meant to be edited by hand;
# it is created and updated by Redis itself.
# Every cluster node needs a distinct cluster config file, so make sure
# the name does not collide with other instances running on the same system.
#
cluster-config-file nodes-6379.conf

# Cluster node timeout is the amount of milliseconds a node must be
# unreachable for it to be considered in failure state.
# Most other internal time limits are a multiple of the node timeout.
#
cluster-node-timeout 15000

# A slave of a failing master will avoid starting a failover if its data
# looks too old.
#
# There is a simple way for a slave to measure the age of its data;
# the following two checks are performed:
#
# 1) If there are multiple slaves able to failover, they exchange messages
#    to try to give an advantage to the slave with the best replication
#    offset (the one that processed the most data from the master).
#    Slaves rank themselves by offset, and apply to the start
#    of the failover a delay proportional to their rank.
#
# 2) Every single slave computes the time of the last interaction with
#    its master. This can be the last ping or command received (if the master
#    is still in the "connected" state), or the time that elapsed since the
#    disconnection with the master (if the replication link is currently down).
#    If the last interaction is too old, the slave will not try to failover
#    at all.
#
# Point "2" can be tuned by the user. Specifically, a slave will not perform
# the failover if, since the last interaction with the master, the time
# elapsed is greater than:
#
#   (node-timeout * slave-validity-factor) + repl-ping-slave-period
#
# For example, if the node timeout is 30 seconds, slave-validity-factor is 10,
# and repl-ping-slave-period is the default of 10 seconds, the slave will not
# try to failover if it was not able to talk with the master for longer
# than 310 seconds.
#
# A large slave-validity-factor may allow slaves with too old data to failover
# a master, while a too small value may prevent the cluster from being able to
# elect a slave at all.
#
# For maximum availability, it is possible to set slave-validity-factor
# to 0, which means that slaves will always try to failover the
# master regardless of the last time they interacted with it.
# (However they'll always try to apply a delay proportional to their
# offset rank.)
#
# Zero is the only value that guarantees the cluster can always continue
# once all partitions heal.
#
cluster-slave-validity-factor 10

# Cluster slaves are able to migrate to orphaned masters, that is, masters
# that are left without working slaves. This improves the cluster's ability
# to resist failures, as otherwise an orphaned master can't be failed over
# if it has no working slaves.
#
# Slaves migrate to orphaned masters only if there are still at least a
# given number of other working slaves for their old master. This number
# is the "migration barrier". A migration barrier of 1 means that a slave
# will migrate only if there is at least 1 other working slave for its master,
# and so forth. It usually reflects the number of slaves you want for every
# master in your cluster.
#
# The default is 1 (slaves migrate only if their master keeps at least one
# working slave). To disable migration, set it to a very large value.
# A value of 0 can be set for debugging, but it is dangerous in production.
#
cluster-migration-barrier 1

# By default Redis Cluster nodes stop accepting queries if they detect that
# at least one hash slot is uncovered (no reachable node is serving it).
# This way, if part of the cluster is down (for example a range of hash
# slots is no longer assigned), the whole cluster eventually becomes
# unavailable. It automatically returns to being available as soon as all
# the slots are covered again.
#
# However, sometimes you want the subset of the cluster that is still
# working to keep accepting queries for the part of the key space that is
# still covered. To do so, set cluster-require-full-coverage to no.
#
cluster-require-full-coverage yes
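Only a handful of these options need to be set up front. A minimal per-node configuration along the lines of the official cluster tutorial (one directory and one config file per instance, with only the port changing) would be:

port 4379
cluster-enabled yes
cluster-config-file nodes-4379.conf
cluster-node-timeout 15000
appendonly yes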


In the Redis src directory there is a Ruby script, redis-trib.rb, used to build a new Redis cluster.

First, install the Ruby Redis client with gem:


gem install redis


The redis-trib.rb script provides the basic cluster management operations:


Usage: redis-trib <command> <options> <arguments ...>

  create          host1:port1 ... hostN:portN
                  --replicas <arg>
  check           host:port
  fix             host:port
  reshard         host:port
                  --from <arg>
                  --to <arg>
                  --slots <arg>
                  --yes
  add-node        new_host:new_port existing_host:existing_port
                  --slave
                  --master-id <arg>
  del-node        host:port node_id
  set-timeout     host:port milliseconds
  call            host:port command arg arg .. arg
  import          host:port
                  --from <arg>
  help            (show this help)
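The six-instance test cluster used in this post (masters on 4379/5379/6379 and one replica each on 7379/8379/9379) is built with the create subcommand; with --replicas 1, redis-trib picks three masters and attaches one slave to each. An invocation like the following produces the run shown below:

$ ./redis-trib.rb create --replicas 1 \
    127.0.0.1:4379 127.0.0.1:5379 127.0.0.1:6379 \
    127.0.0.1:7379 127.0.0.1:8379 127.0.0.1:9379

redis-trib connects to every instance, proposes a slot layout, asks for confirmation, and then joins the nodes: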


   

>>> Creating cluster
Connecting to node 127.0.0.1:4379: OK
Connecting to node 127.0.0.1:5379: OK
Connecting to node 127.0.0.1:6379: OK
Connecting to node 127.0.0.1:7379: OK
Connecting to node 127.0.0.1:8379: OK
Connecting to node 127.0.0.1:9379: OK
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
127.0.0.1:4379
127.0.0.1:5379
127.0.0.1:6379
Adding replica 127.0.0.1:7379 to 127.0.0.1:4379
Adding replica 127.0.0.1:8379 to 127.0.0.1:5379
Adding replica 127.0.0.1:9379 to 127.0.0.1:6379
M: 317fd61eea7cecbc1d919a028657af955e654c0d 127.0.0.1:4379
   slots:0-5460 (5461 slots) master
M: 3f3e200b6fa6c73e72ad4caa73d53a611b126981 127.0.0.1:5379
   slots:5461-10922 (5462 slots) master
M: 06bcc9edcb10b2fcc6d19f1e4a19ad9c3cd6082e 127.0.0.1:6379
   slots:10923-16383 (5461 slots) master
S: 699c827a885bcfe4833693f4ce37b22102bd9ee9 127.0.0.1:7379
   replicates 317fd61eea7cecbc1d919a028657af955e654c0d
S: d5dc58114b0840a81e5f2ff9402a92b0e12f7544 127.0.0.1:8379
   replicates 3f3e200b6fa6c73e72ad4caa73d53a611b126981
S: 22770332720330ec3b27b806edf8c8aaf1eccadc 127.0.0.1:9379
   replicates 06bcc9edcb10b2fcc6d19f1e4a19ad9c3cd6082e
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join..
>>> Performing Cluster Check (using node 127.0.0.1:4379)
M: 317fd61eea7cecbc1d919a028657af955e654c0d 127.0.0.1:4379
   slots:0-5460 (5461 slots) master
M: 3f3e200b6fa6c73e72ad4caa73d53a611b126981 127.0.0.1:5379
   slots:5461-10922 (5462 slots) master
M: 06bcc9edcb10b2fcc6d19f1e4a19ad9c3cd6082e 127.0.0.1:6379
   slots:10923-16383 (5461 slots) master
M: 699c827a885bcfe4833693f4ce37b22102bd9ee9 127.0.0.1:7379
   slots: (0 slots) master
   replicates 317fd61eea7cecbc1d919a028657af955e654c0d
M: d5dc58114b0840a81e5f2ff9402a92b0e12f7544 127.0.0.1:8379
   slots: (0 slots) master
   replicates 3f3e200b6fa6c73e72ad4caa73d53a611b126981
M: 22770332720330ec3b27b806edf8c8aaf1eccadc 127.0.0.1:9379
   slots: (0 slots) master
   replicates 06bcc9edcb10b2fcc6d19f1e4a19ad9c3cd6082e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
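After creation, the state of the cluster can be verified from any single node (output abbreviated):

# redis-trib re-checks slot coverage and that all nodes agree on the layout
$ ./redis-trib.rb check 127.0.0.1:4379

# Or ask a node directly
$ redis-cli -p 4379 cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_known_nodes:6
cluster_size:3
...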



Setting up a cluster involves a few different moving parts. If the creation step fails or redis-trib times out, here are a few things to check:

  • make sure each redis-server is running in its own directory

  • make sure you can reach both the Redis port and the Redis port + 10000 (for example port 6379 and port 16379, or port 7000 and port 17000); see the commands after this list

  • review the command line and the cluster-related configuration used to start Redis

  • inspect the nodes.conf (or whatever you named cluster-config-file) of each server after redis-trib fails or times out

  • look at the output of your redis-server instances

    • if the output doesn't look useful, switch to a more verbose log level and try again
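For the port and log-level points above, something along these lines is enough (the ports are just examples):

# The client port and the cluster bus port (client port + 10000) must both be reachable
$ redis-cli -h 127.0.0.1 -p 6379 ping
PONG
$ nc -zv 127.0.0.1 16379

# Raise the log verbosity while debugging (in redis.conf, or at runtime):
$ redis-cli -p 6379 config set loglevel debug
OK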


CLUSTER SLOTS shows the mapping between hash slots and Redis instances. Each entry of the reply has the following structure:

    start slot

    end slot

    master host/port

    host/port of each slave
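For example, against one node of the test cluster:

$ redis-cli -p 4379 cluster slots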

1) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 8379
   4) 1) "127.0.0.1"
      2) (integer) 5379
       
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 7379
   4) 1) "127.0.0.1"
      2) (integer) 4379
       
3) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 6379
   4) 1) "127.0.0.1"
      2) (integer) 9379

References:

[1] Redis Cluster: a pragmatic approach to distribution

http://redis.io/presentation/Redis_Cluster.pdf

[2] How Twitter Uses Redis To Scale - 105TB RAM, 39MM QPS, 10,000+ Instances

http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html 

[3] CLUSTER SLOTS

http://redis.io/commands/cluster-slots

[4] redis-benchmark cannot work on Redis Cluster

https://github.com/antirez/redis/issues/2191

[5] Redis Cluster (Redis 3.X) design notes

http://blog.csdn.net/yfkiss/article/details/39996129

