Redis Cluster: Principles and Usage

Posted by liupzmin on 2015-10-20

Redis released 3.0.0-RC1 on November 3, 2014, and it contains the long-awaited cluster feature.

(Update: RC2 was released on January 13, 2015.)

Many people on GitHub have started testing the cluster builds; they are not yet stable and are still changing quite a bit.


Redis Cluster has no central node: every node stores data plus the full cluster state, and every node is connected to every other node.

Nodes use a gossip protocol to propagate information and to discover new nodes.

The structure is similar to Cassandra, except that Cassandra nodes can forward requests on behalf of a client.

Redis Cluster nodes do not act as proxies for client requests; the client uses the error information returned by a node to redirect the request to the correct node.
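For example (a sketch against the six-node test cluster set up later in this post; the ports, the slot number, and the target address all depend on your own layout), redis-cli shows the redirection directly:

# A plain client connected to a node that does not own the key's slot
# just gets the redirection error back:
$ redis-cli -p 4379 set foo bar
(error) MOVED 12182 127.0.0.1:6379

# redis-cli -c enables cluster mode, so it follows MOVED/ASK redirections itself:
$ redis-cli -c -p 4379 set foo bar
-> Redirected to slot [12182] located at 127.0.0.1:6379
OK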


Redis Cluster pre-allocates 16384 slots; a key is placed into slot CRC16(key) mod 16384.

Each Redis node is responsible for a subset of the slots; when nodes are added or removed, only the slot distribution needs to be adjusted.

For example, in a three-node cluster with nodes A/B/C, node A might hold slots 0 to 5500, node B slots 5501 to 11000, and node C slots 11001 to 16383.

This pre-allocated-slot scheme sits between plain hashing and consistent hashing: it gives up some flexibility, but compared with consistent hashing the cost of managing the data is much lower.
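To see which slot a particular key hashes to, you can ask any node with CLUSTER KEYSLOT (a small sketch; 4379 is simply one node of the test cluster described below):

# CLUSTER KEYSLOT returns CRC16(key) mod 16384 for the given key
$ redis-cli -p 4379 cluster keyslot foo
(integer) 12182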


To keep the service available, Redis Cluster uses a master-slave scheme: each Redis node can have one or more slaves.

When a master dies, one of its slaves is elected as the new master. Since every node is responsible for a range of slots, if both the master and all of its slaves for a range fail, the data in those slots becomes unavailable.
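In a test cluster the failover path can be exercised by hand, either by killing a master process or, in 3.0, by asking a slave to take over with CLUSTER FAILOVER (a sketch; 7379 is one of the slaves in the test cluster below):

# Ask the slave listening on 7379 to perform a manual failover of its master
$ redis-cli -p 7379 cluster failover
OK

# Once the failover completes, the former slave reports itself as a master
$ redis-cli -p 7379 info replication | grep role
role:master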


Redis Cluster uses asynchronous replication. A complete write goes through the following steps:

1. The client writes the data to the master.
2. The master replies "OK" to the client.
3. The master propagates the update to its slaves.

This leaves a risk of losing writes:

1. Steps 1 and 2 succeed, then the master crashes before the data has been propagated to any slave.
2. A partition leaves two masters alive at the same time and a client writes to the old master.
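Writes cannot be made fully synchronous, but the WAIT command (available since 3.0) narrows the window: it blocks the client until the previous writes on that connection have been acknowledged by a given number of slaves. A small sketch (foo hashes to a slot owned by the master on 6379 in the test cluster below; the reply is the number of slaves that acknowledged within the timeout):

$ redis-cli -p 6379
127.0.0.1:6379> set foo bar
OK
127.0.0.1:6379> wait 1 100
(integer) 1

Even with WAIT this is not a strong guarantee: a failover can still promote a slave that missed the write.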


Redis Cluster supports adding and removing nodes online.

The slot-based data distribution keeps migration cheap: moving data only means moving slots from one Redis node to another.

While a slot is being migrated from node A to node B, both nodes hold it: on A the slot is marked MIGRATING, on B it is marked IMPORTING. During that window, requests for keys that are still on A are handled by A, while keys that are no longer on A are handled by B (the client is redirected there with an ASK error); new keys for that slot are created on B, not on A.
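redis-trib.rb reshard (introduced below) automates this, but the per-slot protocol can also be driven by hand with CLUSTER commands. A rough sketch for moving one slot, say slot 100, from node A (4379) to node B (5379); the node IDs in angle brackets are placeholders:

# On the target node B: mark the slot as being imported from A
$ redis-cli -p 5379 cluster setslot 100 importing <node-id-of-A>
# On the source node A: mark the slot as migrating to B
$ redis-cli -p 4379 cluster setslot 100 migrating <node-id-of-B>

# Move the keys of that slot over, batch by batch
$ redis-cli -p 4379 cluster getkeysinslot 100 10
$ redis-cli -p 4379 migrate 127.0.0.1 5379 <some-key-from-the-list> 0 5000

# Finally assign the slot to B so the new owner is propagated to the cluster
$ redis-cli -p 4379 cluster setslot 100 node <node-id-of-B>
$ redis-cli -p 5379 cluster setslot 100 node <node-id-of-B>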


I set up a Redis cluster on my local machine for testing and recorded the learning process here.

Configuration file differences:

Compared with the standalone version, the cluster-capable build of Redis has an extra section in its configuration file:


################################ REDIS CLUSTER  ###############################
#
# A standalone Redis instance cannot be part of a cluster;
# only nodes started as cluster nodes can join one.
# To start a Redis instance as a cluster node, enable the following:
#
cluster-enabled yes

# Every cluster node has its own cluster configuration file.
# This file is not meant to be edited by hand;
# it is created and updated by Redis itself.
# Every cluster node needs a distinct cluster config file, so make sure
# the name does not collide with other instances running on the same system.
#
cluster-config-file nodes-6379.conf

# Cluster node timeout is the amount of milliseconds a node must be
# unreachable for it to be considered in failure state.
# Most other internal time limits are a multiple of the node timeout.
#
cluster-node-timeout 15000

# A slave of a failing master will avoid starting a failover if its data
# looks too old.
#
# There is a simple way for a slave to measure the age of its data;
# the following two checks are performed:
#
# 1) If there are multiple slaves able to failover, they exchange messages
#    to try to give an advantage to the slave with the best replication
#    offset (the one that processed the most data from the master).
#    Slaves rank themselves by offset, and apply to the start
#    of the failover a delay proportional to their rank.
#
# 2) Every single slave computes the time of the last interaction with
#    its master. This can be the last ping or command received (if the master
#    is still in the "connected" state), or the time that elapsed since the
#    disconnection with the master (if the replication link is currently down).
#    If the last interaction is too old, the slave will not try to failover
#    at all.
#
# Point "2" can be tuned by the user. Specifically, a slave will not perform
# the failover if, since the last interaction with the master, the time
# elapsed is greater than:
#
#   (node-timeout * slave-validity-factor) + repl-ping-slave-period
#
# For example, if the node timeout is 30 seconds, slave-validity-factor is 10,
# and repl-ping-slave-period is the default of 10 seconds, the slave will not
# try to failover if it was not able to talk with the master for longer
# than 310 seconds.
#
# A large slave-validity-factor may allow slaves with too old data to failover
# a master, while a too small value may prevent the cluster from being able to
# elect a slave at all.
#
# For maximum availability, it is possible to set slave-validity-factor
# to 0, which means that slaves will always try to failover the
# master regardless of the last time they interacted with it.
# (However they'll always try to apply a delay proportional to their
# offset rank.)
#
# Zero is the only value that guarantees the cluster can always continue
# once all partitions heal.
#
cluster-slave-validity-factor 10

# Cluster slaves are able to migrate to orphaned masters, that is, masters
# that are left without working slaves. This improves the cluster's ability
# to resist failures, as otherwise an orphaned master can't be failed over
# if it has no working slaves.
#
# Slaves migrate to orphaned masters only if there are still at least a
# given number of other working slaves for their old master. This number
# is the "migration barrier". A migration barrier of 1 means that a slave
# will migrate only if there is at least 1 other working slave for its master,
# and so forth. It usually reflects the number of slaves you want for every
# master in your cluster.
#
# The default is 1 (slaves migrate only if their master keeps at least one
# working slave). To disable migration, set it to a very large value.
# A value of 0 can be set for debugging, but it is dangerous in production.
#
cluster-migration-barrier 1

# By default Redis Cluster nodes stop accepting queries if they detect that
# at least one hash slot is uncovered (no reachable node is serving it).
# This way, if part of the cluster is down (for example a range of hash
# slots is no longer assigned), the whole cluster eventually becomes
# unavailable. It automatically returns to being available as soon as all
# the slots are covered again.
#
# However, sometimes you want the subset of the cluster that is still
# working to keep accepting queries for the part of the key space that is
# still covered. To do so, set cluster-require-full-coverage to no.
#
cluster-require-full-coverage yes
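Only a handful of these options need to be set up front. A minimal per-node configuration along the lines of the official cluster tutorial (one directory and one config file per instance, with only the port changing) would be:

port 4379
cluster-enabled yes
cluster-config-file nodes-4379.conf
cluster-node-timeout 15000
appendonly yes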


In the Redis src directory there is a Ruby script, redis-trib.rb, used to build a new Redis cluster.

First, install the Ruby Redis client with gem:


gem install redis


The redis-trib.rb script provides the basic cluster management operations:


Usage: redis-trib <command> <options> <arguments ...>

  create          host1:port1 ... hostN:portN
                  --replicas <arg>
  check           host:port
  fix             host:port
  reshard         host:port
                  --from <arg>
                  --to <arg>
                  --slots <arg>
                  --yes
  add-node        new_host:new_port existing_host:existing_port
                  --slave
                  --master-id <arg>
  del-node        host:port node_id
  set-timeout     host:port milliseconds
  call            host:port command arg arg .. arg
  import          host:port
                  --from <arg>
  help            (show this help)
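The six-instance test cluster used in this post (masters on 4379/5379/6379 and one replica each on 7379/8379/9379) is built with the create subcommand; with --replicas 1, redis-trib picks three masters and attaches one slave to each. An invocation like the following produces the run shown below:

$ ./redis-trib.rb create --replicas 1 \
    127.0.0.1:4379 127.0.0.1:5379 127.0.0.1:6379 \
    127.0.0.1:7379 127.0.0.1:8379 127.0.0.1:9379

redis-trib connects to every instance, proposes a slot layout, asks for confirmation, and then joins the nodes: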


   

>>> Creating cluster
Connecting to node 127.0.0.1:4379: OK
Connecting to node 127.0.0.1:5379: OK
Connecting to node 127.0.0.1:6379: OK
Connecting to node 127.0.0.1:7379: OK
Connecting to node 127.0.0.1:8379: OK
Connecting to node 127.0.0.1:9379: OK
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
127.0.0.1:4379
127.0.0.1:5379
127.0.0.1:6379
Adding replica 127.0.0.1:7379 to 127.0.0.1:4379
Adding replica 127.0.0.1:8379 to 127.0.0.1:5379
Adding replica 127.0.0.1:9379 to 127.0.0.1:6379
M: 317fd61eea7cecbc1d919a028657af955e654c0d 127.0.0.1:4379
   slots:0-5460 (5461 slots) master
M: 3f3e200b6fa6c73e72ad4caa73d53a611b126981 127.0.0.1:5379
   slots:5461-10922 (5462 slots) master
M: 06bcc9edcb10b2fcc6d19f1e4a19ad9c3cd6082e 127.0.0.1:6379
   slots:10923-16383 (5461 slots) master
S: 699c827a885bcfe4833693f4ce37b22102bd9ee9 127.0.0.1:7379
   replicates 317fd61eea7cecbc1d919a028657af955e654c0d
S: d5dc58114b0840a81e5f2ff9402a92b0e12f7544 127.0.0.1:8379
   replicates 3f3e200b6fa6c73e72ad4caa73d53a611b126981
S: 22770332720330ec3b27b806edf8c8aaf1eccadc 127.0.0.1:9379
   replicates 06bcc9edcb10b2fcc6d19f1e4a19ad9c3cd6082e
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join..
>>> Performing Cluster Check (using node 127.0.0.1:4379)
M: 317fd61eea7cecbc1d919a028657af955e654c0d 127.0.0.1:4379
   slots:0-5460 (5461 slots) master
M: 3f3e200b6fa6c73e72ad4caa73d53a611b126981 127.0.0.1:5379
   slots:5461-10922 (5462 slots) master
M: 06bcc9edcb10b2fcc6d19f1e4a19ad9c3cd6082e 127.0.0.1:6379
   slots:10923-16383 (5461 slots) master
M: 699c827a885bcfe4833693f4ce37b22102bd9ee9 127.0.0.1:7379
   slots: (0 slots) master
   replicates 317fd61eea7cecbc1d919a028657af955e654c0d
M: d5dc58114b0840a81e5f2ff9402a92b0e12f7544 127.0.0.1:8379
   slots: (0 slots) master
   replicates 3f3e200b6fa6c73e72ad4caa73d53a611b126981
M: 22770332720330ec3b27b806edf8c8aaf1eccadc 127.0.0.1:9379
   slots: (0 slots) master
   replicates 06bcc9edcb10b2fcc6d19f1e4a19ad9c3cd6082e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
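After creation, the state of the cluster can be verified from any single node (output abbreviated):

# redis-trib re-checks slot coverage and that all nodes agree on the layout
$ ./redis-trib.rb check 127.0.0.1:4379

# Or ask a node directly
$ redis-cli -p 4379 cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_known_nodes:6
cluster_size:3
...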



Setting up a cluster involves a few different moving parts. If the creation step fails or redis-trib times out, here are a few things to check:

  • make sure each redis-server is running in its own directory

  • make sure you can reach both the Redis port and the Redis port + 10000 (for example port 6379 and port 16379, or port 7000 and port 17000); see the commands after this list

  • review the command line and the cluster-related configuration used to start Redis

  • inspect the nodes.conf (or whatever you named cluster-config-file) of each server after redis-trib fails or times out

  • look at the output of your redis-server instances

    • if the output doesn't look useful, switch to a more verbose log level and try again
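For the port and log-level points above, something along these lines is enough (the ports are just examples):

# The client port and the cluster bus port (client port + 10000) must both be reachable
$ redis-cli -h 127.0.0.1 -p 6379 ping
PONG
$ nc -zv 127.0.0.1 16379

# Raise the log verbosity while debugging (in redis.conf, or at runtime):
$ redis-cli -p 6379 config set loglevel debug
OK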


CLUSTER SLOTS shows the mapping between hash slots and Redis instances. Each entry of the reply has the following structure:

    start slot

    end slot

    master host/port

    host/port of each slave
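For example, against one node of the test cluster:

$ redis-cli -p 4379 cluster slots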

1) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 8379
   4) 1) "127.0.0.1"
      2) (integer) 5379
       
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 7379
   4) 1) "127.0.0.1"
      2) (integer) 4379
       
3) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 6379
   4) 1) "127.0.0.1"
      2) (integer) 9379

References:

[1] Redis Cluster: a pragmatic approach to distribution

http://redis.io/presentation/Redis_Cluster.pdf

[2] How Twitter Uses Redis To Scale - 105TB RAM, 39MM QPS, 10,000+ Instances

http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html 

[3] CLUSTER SLOTS

http://redis.io/commands/cluster-slots

[4] redis-benchmark cannot work on Redis Cluster

https://github.com/antirez/redis/issues/2191

[5] Redis Cluster (Redis 3.X) design notes

http://blog.csdn.net/yfkiss/article/details/39996129

