Redis Cluster High-Availability Cluster: Online Migration Log

Posted by 散盡浮華 on 2018-10-24

 

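A previous post covered the structure of Redis Cluster and how to deploy a highly available cluster. This one briefly covers migrating a Redis cluster. The servers hosting the existing Redis Cluster had limited capacity, so the cluster had to move to higher-spec servers. Since this is a live production environment, I decided to migrate online, without interrupting service. The procedure is as follows: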

1. Machine environment

Machines before migration
-------------------------------------------------------------------------------
Hostname           IP address        Node ports
redis-node01       172.16.60.207     7000,7001
redis-node02       172.16.60.208     7002,7003
redis-node03       172.16.60.209     7004,7005

Machines after migration
-------------------------------------------------------------------------------
Hostname          IP address        Node ports
redis-new01       172.16.60.202     7000,7001
redis-new02       172.16.60.204     7002,7003
redis-new03       172.16.60.205     7004,7005

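2. Deploying the pre-migration Redis Cluster HA environment (using a "three masters, three slaves" layout)

Run the same installation steps on all three node machines:
[root@redis-node01 ~]# yum install -y gcc gcc-c++ make kernel-devel automake autoconf libtool wget tcl vim ruby rubygems unzip git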
[root@redis-node01 ~]# /etc/init.d/iptables stop
[root@redis-node01 ~]# setenforce 0
[root@redis-node01 ~]# vim /etc/sysconfig/selinux
SELINUX=disabled

Do the following preparation in advance; otherwise Redis writes the corresponding warnings to its log:
[root@redis-node01 ~]# echo "512" > /proc/sys/net/core/somaxconn      
[root@redis-node01 ~]# vim /etc/rc.local
echo "512" > /proc/sys/net/core/somaxconn   
[root@redis-node01 ~]# echo 1 > /proc/sys/vm/overcommit_memory
[root@redis-node01 ~]# sysctl vm.overcommit_memory=1
vm.overcommit_memory = 1
[root@redis-node01 ~]# vim /etc/sysctl.conf
vm.overcommit_memory=1
[root@redis-node01 ~]# echo never > /sys/kernel/mm/transparent_hugepage/enabled
[root@redis-node01 ~]# vim /etc/rc.local
echo never > /sys/kernel/mm/transparent_hugepage/enabled
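The settings above are applied by hand and persisted by editing /etc/rc.local and /etc/sysctl.conf in vim. A minimal sketch that does the same non-interactively, so it can be run on every machine (old and new) without duplicating lines. Assumptions: it runs as root on CentOS 6 with the default file locations (overridable via the SYSCTL_CONF and RC_LOCAL variables), and the function names persist_line and tune_kernel_for_redis are made up for illustration. Unlike the steps above, it persists somaxconn through sysctl.conf rather than rc.local.

```bash
# Hypothetical helpers: apply the kernel settings Redis warns about and
# persist them without duplicating lines when run more than once.
SYSCTL_CONF="${SYSCTL_CONF:-/etc/sysctl.conf}"
RC_LOCAL="${RC_LOCAL:-/etc/rc.local}"

# persist_line <file> <line>: append the line only if that exact line is absent.
persist_line() {
  local file="$1" line="$2"
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

tune_kernel_for_redis() {
  # Apply immediately (requires root).
  sysctl -w net.core.somaxconn=512
  sysctl -w vm.overcommit_memory=1
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  # Persist across reboots: sysctl keys in sysctl.conf, THP in rc.local.
  persist_line "$SYSCTL_CONF" "net.core.somaxconn=512"
  persist_line "$SYSCTL_CONF" "vm.overcommit_memory=1"
  persist_line "$RC_LOCAL" "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
}
```

Run tune_kernel_for_redis as root on each machine; running it twice leaves the files unchanged the second time.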

Download and build Redis
[root@redis-node01 ~]# mkdir -p /data/software/
[root@redis-node01 ~]# cd /data/software/
[root@redis-node01 software]# wget http://download.redis.io/releases/redis-4.0.6.tar.gz
[root@redis-node01 software]# tar -zvxf redis-4.0.6.tar.gz
[root@redis-node01 software]# mv redis-4.0.6 /data/
[root@redis-node01 software]# cd /data/redis-4.0.6/
[root@redis-node01 redis-4.0.6]# make
    
-------------------------------------------------------------------------------
Create and configure the nodes on each machine
Node 1 configuration
[root@redis-node01 ~]# mkdir /data/redis-4.0.6/redis-cluster
[root@redis-node01 ~]# cd /data/redis-4.0.6/redis-cluster
[root@redis-node01 redis-cluster]# mkdir 7000 7001
[root@redis-node01 redis-cluster]# mkdir /var/log/redis
[root@redis-node01 redis-cluster]# vim 7000/redis.conf
port 7000
bind 172.16.60.207
daemonize yes
pidfile /var/run/redis_7000.pid
logfile /var/log/redis/redis_7000.log
cluster-enabled yes
cluster-config-file nodes_7000.conf
cluster-node-timeout 10100
appendonly yes
    
[root@redis-node01 redis-cluster]# vim 7001/redis.conf
port 7001
bind 172.16.60.207
daemonize yes
pidfile /var/run/redis_7001.pid
logfile /var/log/redis/redis_7001.log
cluster-enabled yes
cluster-config-file nodes_7001.conf
cluster-node-timeout 10100
appendonly yes
    
Node 2 configuration
[root@redis-node02 ~]# mkdir /data/redis-4.0.6/redis-cluster
[root@redis-node02 ~]# cd /data/redis-4.0.6/redis-cluster
[root@redis-node02 redis-cluster]# mkdir 7002 7003
[root@redis-node02 redis-cluster]# mkdir /var/log/redis
[root@redis-node02 redis-cluster]# vim 7002/redis.conf
port 7002
bind 172.16.60.208
daemonize yes
pidfile /var/run/redis_7002.pid
logfile /var/log/redis/redis_7002.log
cluster-enabled yes
cluster-config-file nodes_7002.conf
cluster-node-timeout 10100
appendonly yes
    
[root@redis-node02 redis-cluster]# vim 7003/redis.conf
port 7003
bind 172.16.60.208
daemonize yes
pidfile /var/run/redis_7003.pid
logfile /var/log/redis/redis_7003.log
cluster-enabled yes
cluster-config-file nodes_7003.conf
cluster-node-timeout 10100
appendonly yes
    
Node 3 configuration
[root@redis-node03 ~]# mkdir /data/redis-4.0.6/redis-cluster
[root@redis-node03 ~]# cd /data/redis-4.0.6/redis-cluster
[root@redis-node03 redis-cluster]# mkdir 7004 7005
[root@redis-node03 redis-cluster]# mkdir /var/log/redis
[root@redis-node03 redis-cluster]# vim 7004/redis.conf
port 7004
bind 172.16.60.209
daemonize yes
pidfile /var/run/redis_7004.pid
logfile /var/log/redis/redis_7004.log
cluster-enabled yes
cluster-config-file nodes_7004.conf
cluster-node-timeout 10100
appendonly yes
    
[root@redis-node03 redis-cluster]# vim 7005/redis.conf
port 7005
bind 172.16.60.209
daemonize yes
pidfile /var/run/redis_7005.pid
logfile /var/log/redis/redis_7005.log
cluster-enabled yes
cluster-config-file nodes_7005.conf
cluster-node-timeout 10100
appendonly yes
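The six redis.conf files differ only in port, bind address and the file names derived from the port, so generating them avoids copy-paste slips. A sketch assuming the directory layout used above (overridable via CLUSTER_DIR); write_node_conf is a hypothetical helper, and it writes exactly the settings listed above.

```bash
# Hypothetical generator for the per-port redis.conf files used above.
CLUSTER_DIR="${CLUSTER_DIR:-/data/redis-4.0.6/redis-cluster}"

# write_node_conf <bind-ip> <port>: create $CLUSTER_DIR/<port>/redis.conf
write_node_conf() {
  local ip="$1" port="$2"
  mkdir -p "$CLUSTER_DIR/$port"
  cat > "$CLUSTER_DIR/$port/redis.conf" <<EOF
port $port
bind $ip
daemonize yes
pidfile /var/run/redis_$port.pid
logfile /var/log/redis/redis_$port.log
cluster-enabled yes
cluster-config-file nodes_$port.conf
cluster-node-timeout 10100
appendonly yes
EOF
}
```

For example, on redis-node02: for p in 7002 7003; do write_node_conf 172.16.60.208 "$p"; done. On the new machines only the IP argument changes.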
    
-------------------------------------------------------------------------------
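Start the Redis services on each machine (all instances are started from /data/redis-4.0.6/redis-cluster; since the configs do not set dir, the nodes_*.conf files and other data files are created in that directory)
Node 1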
[root@redis-node01 redis-cluster]# for((i=0;i<=1;i++)); do /data/redis-4.0.6/src/redis-server /data/redis-4.0.6/redis-cluster/700$i/redis.conf; done
[root@redis-node01 redis-cluster]# ps -ef|grep redis
root      1103     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.207:7000 [cluster]               
root      1105     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.207:7001 [cluster]               
root      1315 32360  0 16:16 pts/1    00:00:00 grep redis
    
Node 2
[root@redis-node02 redis-cluster]# for((i=2;i<=3;i++)); do /data/redis-4.0.6/src/redis-server /data/redis-4.0.6/redis-cluster/700$i/redis.conf; done
[root@redis-node02 redis-cluster]# ps -ef|grep redis
root      9446     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.208:7002 [cluster]               
root      9448     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.208:7003 [cluster]               
root      9644  8540  0 16:17 pts/0    00:00:00 grep redis
    
Node 3
[root@redis-node03 redis-cluster]# for((i=4;i<=5;i++)); do /data/redis-4.0.6/src/redis-server /data/redis-4.0.6/redis-cluster/700$i/redis.conf; done
[root@redis-node03 ~]# ps -ef|grep redis
root      9486     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.209:7004 [cluster]               
root      9488     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.209:7005 [cluster]               
root      9686  9555  0 16:17 pts/0    00:00:00 grep redis
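Besides ps, a quick check that every instance answers and runs in cluster mode can save time before creating the cluster. A sketch that assumes redis-cli at the path used in this post (overridable via REDIS_CLI); check_nodes is a hypothetical helper. It relies on PING returning PONG and on the cluster section of INFO reporting cluster_enabled:1.

```bash
REDIS_CLI="${REDIS_CLI:-/data/redis-4.0.6/src/redis-cli}"

# check_nodes <ip> <port>...: report each node that does not answer PING or
# is not running in cluster mode; returns non-zero if any node fails.
check_nodes() {
  local ip="$1"; shift
  local port rc=0
  for port in "$@"; do
    if [ "$("$REDIS_CLI" -h "$ip" -p "$port" ping 2>/dev/null)" != "PONG" ]; then
      echo "$ip:$port does not answer PING"
      rc=1
    elif ! "$REDIS_CLI" -h "$ip" -p "$port" info cluster | grep -q '^cluster_enabled:1'; then
      echo "$ip:$port is not in cluster mode"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, on redis-node03: check_nodes 172.16.60.209 7004 7005.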
    
-------------------------------------------------------------------------------
Next, install Ruby on node 1 (it only needs to be installed on one node)
[root@redis-node01 ~]# yum -y install ruby ruby-devel rubygems rpm-build
[root@redis-node01 ~]# gem install redis
    
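Tip:
On CentOS 6.x the "gem install redis" step above may fail; there are plenty of pitfalls.
The Ruby installed by yum is 1.8.7 by default, which is too old. Upgrade to Ruby 2.2 or later, otherwise the install above fails.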
    
First install rvm (alternatively, download the certificate directly from https://pan.baidu.com/s/1slTyJ7n, access code 7uan; after downloading and extracting it, just run "curl -L get.rvm.io | bash -s stable")
[root@redis-node01 ~]# curl -L get.rvm.io | bash -s stable          //may fail; if so, follow the prompt and run the next step
[root@redis-node01 ~]# curl -sSL https://rvm.io/mpapis.asc | gpg2 --import -      //then run again: curl -L get.rvm.io | bash -s stable
[root@redis-node01 ~]# find / -name rvm.sh
/etc/profile.d/rvm.sh
[root@redis-node01 ~]# source /etc/profile.d/rvm.sh
[root@redis-node01 ~]# rvm requirements
      
Then upgrade Ruby to 2.3
[root@redis-node01 ~]# rvm install ruby 2.3.1
[root@redis-node01 ~]# ruby -v
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
      
List all installed Ruby versions
[root@redis-node01 ~]# rvm list
      
Set the default version
[root@redis-node01 ~]# rvm --default use 2.3.1
      
Switch the gem download source
[root@redis-node01 ~]# gem sources --add https://gems.ruby-china.org/ --remove https://rubygems.org
https://gems.ruby-china.org/ added to sources
source https://rubygems.org not present in cache
      
[root@redis-node01 ~]# gem sources
*** CURRENT SOURCES ***
      
https://rubygems.org/
https://gems.ruby-china.org/
      
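Note that rubygems.org is still listed: the --remove argument lacks the trailing slash shown in the source list (https://rubygems.org/), so nothing was removed. Either way, the gem now installs successfully: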
[root@redis-node01 ~]# gem install redis
Successfully installed redis-4.0.6
Parsing documentation for redis-4.0.6
Done installing documentation for redis after 1 seconds
1 gem installed
    
-------------------------------------------------------------------------------
Next, create the Redis cluster (run this on node 1 only)
    
First, specify the three masters manually. The masters are best spread across the three machines.
[root@redis-node01 ~]# /data/redis-4.0.6/src/redis-trib.rb create  172.16.60.207:7000 172.16.60.208:7002  172.16.60.209:7004
    
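Then add a slave for each of the three masters. The slaves are also best spread across the three machines, with each slave on a different machine from its master.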
[root@redis-node01 ~]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.208:7003  172.16.60.207:7000
[root@redis-node01 ~]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.209:7005  172.16.60.208:7002
[root@redis-node01 ~]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.207:7001  172.16.60.209:7004
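One detail about add-node --slave: as far as I can tell from redis-trib.rb, without --master-id the tool attaches the new slave to whichever master currently has the fewest replicas, and the second address only tells it which cluster to join. Here that happens to give the intended pairing, as the check output that follows shows, but the pairing can be pinned explicitly. A sketch, assuming redis-cli and redis-trib.rb at the paths used in this post and using CLUSTER MYID to look up the master's node ID; add_slave_of is a hypothetical helper.

```bash
REDIS_CLI="${REDIS_CLI:-/data/redis-4.0.6/src/redis-cli}"
REDIS_TRIB="${REDIS_TRIB:-/data/redis-4.0.6/src/redis-trib.rb}"

# add_slave_of <new-slave ip:port> <master ip:port>
# Looks up the master's node ID and attaches the new node to exactly that master.
add_slave_of() {
  local slave="$1" master="$2" master_id
  master_id=$("$REDIS_CLI" -h "${master%:*}" -p "${master##*:}" cluster myid) || return 1
  if [ -z "$master_id" ]; then
    echo "could not read node ID of $master" >&2
    return 1
  fi
  # Options must come before the two addresses for redis-trib.rb.
  "$REDIS_TRIB" add-node --slave --master-id "$master_id" "$slave" "$master"
}
```

Usage: add_slave_of 172.16.60.208:7003 172.16.60.207:7000, and likewise for the other pairs, including the slaves added during the migration.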
    
Then check the cluster state
[root@redis-node01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.207:7000
>>> Performing Cluster Check (using node 172.16.60.207:7000)
M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: e7592314869c29375599d781721ad76675645c4c 172.16.60.209:7005
   slots: (0 slots) slave
   replicates 0060012d749167d3f72833d916e53b3445b66c62
S: 52b8d27838244657d9b01a233578f24d287979fe 172.16.60.208:7003
   slots: (0 slots) slave
   replicates 971d05cd7b9bb3634ad024e6aac3dff158c52eee
S: 213bde6296c36b5f31b958c7730ff1629125a204 172.16.60.207:7001
   slots: (0 slots) slave
   replicates e936d5b4c95b6cae57f994e95805aef87ea4a7a5
M: e936d5b4c95b6cae57f994e95805aef87ea4a7a5 172.16.60.209:7004
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 0060012d749167d3f72833d916e53b3445b66c62 172.16.60.208:7002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
  
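As the output shows, only masters own slots; slaves have 0 slots. Keys are assigned to masters by slot, and slaves hold replicated copies.
The three masters split the 16384 slots into 0-5460, 5461-10922 and 10923-16383.
If both the master and the slave of one pair go down, the 16384 slots are no longer fully covered and the whole cluster stops serving requests until that master-slave pair recovers.
A newly added master owns 0 slots by default; slots have to be assigned to it with reshard (which asks how many hash slots to move to the node). This is covered below.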
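Whether the cluster can serve requests, i.e. whether all 16384 slots are covered, can also be read from CLUSTER INFO on any node. A sketch, assuming redis-cli at the path used in this post; cluster_fully_covered is a hypothetical helper. The behavior described above, where one lost master-slave pair stops the whole cluster, corresponds to the default cluster-require-full-coverage yes.

```bash
REDIS_CLI="${REDIS_CLI:-/data/redis-4.0.6/src/redis-cli}"

# cluster_fully_covered <ip> <port>: succeed only if the node reports
# cluster_state:ok and all 16384 slots assigned and in ok state.
cluster_fully_covered() {
  local info
  # CLUSTER INFO lines end in \r\n; strip the \r so exact matches work.
  info=$("$REDIS_CLI" -h "$1" -p "$2" cluster info | tr -d '\r')
  printf '%s\n' "$info" | grep -qx 'cluster_state:ok' &&
    printf '%s\n' "$info" | grep -qx 'cluster_slots_assigned:16384' &&
    printf '%s\n' "$info" | grep -qx 'cluster_slots_ok:16384'
}
```

For example: cluster_fully_covered 172.16.60.207 7000 && echo "all slots covered".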
 
Write some test data
Log in to the three masters and write data (writing through a slave also gets redirected to the owning master automatically)
[root@redis-node01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.207 -c -p 7000
172.16.60.207:7000> set test1 test-207
OK
172.16.60.207:7000> set test11 test-207-207
-> Redirected to slot [13313] located at 172.16.60.209:7004
OK
 
[root@redis-node01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.208 -c -p 7002
172.16.60.208:7002> set test2 test-208
OK
172.16.60.208:7002> set test22 test-208-208
-> Redirected to slot [4401] located at 172.16.60.207:7000
OK
 
[root@redis-node01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.209 -c -p 7004
172.16.60.209:7004> set test3 test-209
OK
172.16.60.209:7004> set test33 test-209-209
OK
 
Read the data back
[root@redis-node01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.207 -c -p 7000
172.16.60.207:7000> get test1
"test-207"
172.16.60.207:7000> get test11
-> Redirected to slot [13313] located at 172.16.60.209:7004
"test-207-207"
172.16.60.209:7004> get test2
-> Redirected to slot [8899] located at 172.16.60.208:7002
"test-208"
172.16.60.208:7002> get test22
-> Redirected to slot [4401] located at 172.16.60.207:7000
"test-208-208"
172.16.60.207:7000> get test3
-> Redirected to slot [13026] located at 172.16.60.209:7004
"test-209"
172.16.60.209:7004> get test33
"test-209-209"
172.16.60.209:7004> 
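The redirects above follow from how Redis Cluster maps keys to slots: slot = CRC16(key) mod 16384, and if the key contains a non-empty {...} section, only the part inside the first such braces is hashed. A pure-bash sketch for predicting where a key lands; it assumes ASCII keys, and crc16 and key_hash_slot are illustrative names. On a running node, CLUSTER KEYSLOT <key> gives the authoritative answer.

```bash
# CRC16 (XMODEM variant: poly 0x1021, initial value 0), as used by Redis Cluster.
# Assumes ASCII keys: each character is treated as one byte.
crc16() {
  local s="$1" crc=0 i j c
  for ((i = 0; i < ${#s}; i++)); do
    printf -v c '%d' "'${s:i:1}"        # character code of the i-th character
    crc=$(( crc ^ (c << 8) ))
    for ((j = 0; j < 8; j++)); do
      if (( crc & 0x8000 )); then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
    done
  done
  echo "$crc"
}

# key_hash_slot <key>: slot number in 0..16383.
key_hash_slot() {
  local key="$1" re='\{([^}]*)\}'
  # Hash tag: if text between the first '{' and the next '}' is non-empty,
  # only that text is hashed; otherwise the whole key is hashed.
  if [[ $key =~ $re ]] && [ -n "${BASH_REMATCH[1]}" ]; then
    key="${BASH_REMATCH[1]}"
  fi
  echo $(( $(crc16 "$key") & 16383 ))
}
```

For instance, key_hash_slot test22 can be compared with the redirect line for test22 in the transcript above.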

3. Online migration

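Installing Redis on the three new machines is omitted; the steps are the same as above.
Each node's configuration on the new machines is the same as on the three old machines; only the IP address changes. Paths and ports are identical.
Start the Redis node services on the three new machines.
Install Ruby on the new node redis-new01; the process is omitted, the same as above.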

Add all the nodes on the three new machines to the existing cluster.
=====================
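First add the masters
Command format: "redis-trib.rb add-node <new node> <existing cluster node>"
The first argument is the new node's IP and master port; the second is the IP and port of any node already in the cluster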
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node 172.16.60.202:7000 172.16.60.207:7000
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node 172.16.60.204:7002 172.16.60.207:7000
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node 172.16.60.205:7004 172.16.60.207:7000 

=====================
Then add the slaves on the new machines
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.204:7003  172.16.60.202:7000
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.205:7005  172.16.60.204:7002
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.202:7001  172.16.60.205:7004

Check the cluster state at this point
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000

Check the hash slot distribution of the cluster
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb info 172.16.60.202:7000 
172.16.60.202:7000 (a0169bec...) -> 0 keys | 0 slots | 1 slaves.
172.16.60.209:7004 (47cde5c7...) -> 3 keys | 5461 slots | 1 slaves.
172.16.60.208:7002 (656fc84a...) -> 1 keys | 5462 slots | 1 slaves.
172.16.60.205:7004 (48cbab90...) -> 0 keys | 0 slots | 1 slaves.
172.16.60.207:7000 (a8fe2d6e...) -> 2 keys | 5461 slots | 1 slaves.
172.16.60.204:7002 (c6a78cfb...) -> 0 keys | 0 slots | 1 slaves.
[OK] 6 keys in 6 masters.
0.00 keys per slot on average.

Newly added masters have 0 slots by default, and a master without slots is never selected for reads or writes!
Data is stored only on masters that own slots!
So the new masters must be assigned slots, i.e. a reshard.

From the slot layout printed after the last new master was added, the existing masters own the following slots:
172.16.60.207:7000   -->  slots:0-5460 (5461 slots) master
172.16.60.208:7002   -->  slots:5461-10922 (5462 slots) master
172.16.60.209:7004   -->  slots:10923-16383 (5461 slots) master

Now assign slots to the three new masters
a) Move all slots (5461) of 172.16.60.207:7000 to 172.16.60.202:7000
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.202:7000
........
How many slots do you want to move (from 1 to 16384)? 5461          #number of slots to move (here: all slots of 172.16.60.207:7000)
What is the receiving node ID? a0169becd97ccca732d905fd762b4d615674f7bd      #node that receives these slots: enter the ID of 172.16.60.202:7000
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:971d05cd7b9bb3634ad024e6aac3dff158c52eee          #node the slots are taken from: enter the ID of 172.16.60.207:7000. Entering all takes the slots from all existing masters.
Source node #2:done                       #enter done
.......
Do you want to proceed with the proposed reshard plan (yes/no)? yes     #enter yes to confirm

==================================================================
You may hit a problem here: resharding aborts midway, leaving slots on both nodes.
Moving slot 4396 from 172.16.60.207:7000 to 172.16.60.202:7000: 
Moving slot 4397 from 172.16.60.207:7000 to 172.16.60.202:7000: 
Moving slot 4398 from 172.16.60.207:7000 to 172.16.60.202:7000: 
Moving slot 4399 from 172.16.60.207:7000 to 172.16.60.202:7000:
Moving slot 4400 from 172.16.60.207:7000 to 172.16.60.202:7000: 
Moving slot 4401 from 172.16.60.207:7000 to 172.16.60.202:7000: 
[ERR] Calling MIGRATE: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)

[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000   
>>> Performing Cluster Check (using node 172.16.60.202:7000)
M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
   slots:0-4400 (4401 slots) master
   1 additional replica(s)
.......
M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
   slots:4401-5460 (1060 slots) master
   1 additional replica(s)

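Analysis:
The reshard fails with: Syntax error, try CLIENT (LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY)
Slots that contain no keys, however, migrate fine: 0-4400 moved, and the failure hit slot 4401, which holds the test key test22 (see the redirects above). So the problem depends on whether the slot contains key-value pairs.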

Tracing the reshard code path: the actual migration happens in the move_slot function of redis-trib.rb.
Open move_slot and find the migration calls:
[root@redis-new01 redis-cluster]# cp /data/redis-4.0.6/src/redis-trib.rb /tmp/
[root@redis-new01 redis-cluster]# cat /data/redis-4.0.6/src/redis-trib.rb|grep source.r.client.call
                source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:keys,*keys])
                    source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])

The source.r.client.call lines found by grep are where redis-trib.rb has the source node migrate the keys of a slot.

Judging by the error, what reaches the server is a CLIENT command rather than MIGRATE; in effect the call amounts to
"client migrate target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys]"

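So how does the server execute this command?
It is dispatched to clientCommand(client *c) in networking.c,

which compares the subcommand against each supported one in a chain of if statements. This is where it breaks: clientCommand has no migrate branch,
so it returns Syntax error, try CLIENT (LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY).
The error message simply lists the subcommands CLIENT supports: LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY.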

So how do we get redis-trib.rb to actually migrate slots that contain keys?

Digging into the source, cluster.c has migrateCommand(client *c), which is what actually handles MIGRATE. So the migration statements in redis-trib.rb only need to be changed to:
  source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,"replace",:keys,*keys])
  source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])

That is, the command goes straight to migrateCommand instead of through clientCommand.

In other words, just change the original lines in redis-trib.rb
                source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:keys,*keys])
                    source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
to
                source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,"replace",:keys,*keys])
                    source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])

and the problem is solved!

[root@redis-new01 redis-cluster]# cat /data/redis-4.0.6/src/redis-trib.rb |grep  source.r.call
                source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,"replace",:keys,*keys])
                    source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
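The manual edit above can also be done with sed. A sketch that keeps a backup copy and only rewrites source.r.client.call(["migrate" into source.r.call(["migrate"; unlike the edit above, it leaves the first call without "replace", so normal migrations still refuse to overwrite keys that already exist on the target. The default path and the patch_redis_trib name are assumptions. An alternative sometimes used instead of patching is installing a 3.x release of the gem (e.g. gem install redis -v '~> 3.3'), which is not verified in this post.

```bash
# Rewrite the MIGRATE calls in redis-trib.rb so they go through
# source.r.call(...) instead of source.r.client.call(...).
patch_redis_trib() {
  local f="${1:-/data/redis-4.0.6/src/redis-trib.rb}"
  cp -p "$f" "$f.orig" || return 1          # keep an untouched copy
  sed -i 's/source\.r\.client\.call(\["migrate"/source.r.call(["migrate"/' "$f"
  # Print the patched lines (with line numbers) so the change can be checked.
  grep -n 'source\.r\.call(\["migrate"' "$f"
}
```

Run it once as patch_redis_trib; the .orig copy allows reverting if needed.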

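This bug is caused by the version of the redis Ruby gem: apparently newer gem versions no longer expose the connection through r.client the way redis-trib.rb expects. Redis 5.0 deprecates redis-trib.rb and does cluster management directly in redis-cli (redis-cli --cluster)!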
==================================================================

After patching redis-trib.rb, move the remaining slots of 172.16.60.207:7000 to 172.16.60.202:7000. The first attempt is refused because slot 4401 was left open:
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.202:7000
........
>>> Check for open slots...
[WARNING] Node 172.16.60.202:7000 has slots in importing state (4401).
[WARNING] Node 172.16.60.207:7000 has slots in migrating state (4401).
[WARNING] The following slots are open: 4401
>>> Check slots coverage...
[OK] All 16384 slots covered.
*** Please fix your cluster problems before resharding

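Fix: clear the open state of slot 4401 on both nodes with cluster setslot ... stable, then run redis-trib.rb fix: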
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.202 -c -p 7000
172.16.60.202:7000> cluster setslot 4401 stable
OK
172.16.60.202:7000> 
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.207 -c -p 7000
172.16.60.207:7000> cluster setslot 4401 stable
OK
172.16.60.207:7000> 

[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb fix 172.16.60.202:7000  
.......
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
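Slots left half-migrated appear in CLUSTER NODES as [slot->-target-id] on the source (migrating) and [slot-<-source-id] on the target (importing), each on the node's own myself line. A sketch that lists such slots on each given node and clears them with CLUSTER SETSLOT <slot> STABLE, as was done by hand for slot 4401. STABLE only clears the migrating/importing flag and does not move keys, so running redis-trib.rb fix afterwards is still advisable. open_slots and stabilize_open_slots are hypothetical helpers; redis-cli is assumed at the path used in this post.

```bash
REDIS_CLI="${REDIS_CLI:-/data/redis-4.0.6/src/redis-cli}"

# open_slots: read CLUSTER NODES output on stdin and print the numbers of
# slots shown as migrating ([N->-id]) or importing ([N-<-id]).
open_slots() {
  grep -o '\[[0-9]*-[<>]-[0-9a-f]*]' | sed 's/^\[\([0-9]*\)-.*/\1/' | sort -un
}

# stabilize_open_slots <ip:port>...: clear migrating/importing state on each node.
stabilize_open_slots() {
  local node slot
  for node in "$@"; do
    for slot in $("$REDIS_CLI" -h "${node%:*}" -p "${node##*:}" cluster nodes | open_slots); do
      echo "$node: clearing open slot $slot"
      "$REDIS_CLI" -h "${node%:*}" -p "${node##*:}" cluster setslot "$slot" stable
    done
  done
}
```

For this case: stabilize_open_slots 172.16.60.202:7000 172.16.60.207:7000.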


Then rerun the reshard for the remaining 1060 slots (4401-5460):
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.202:7000
......
How many slots do you want to move (from 1 to 16384)? 1060
What is the receiving node ID? a0169becd97ccca732d905fd762b4d615674f7bd      
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:971d05cd7b9bb3634ad024e6aac3dff158c52eee          
Source node #2:done                       
.......
Do you want to proceed with the proposed reshard plan (yes/no)? yes     
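The 1060 above is read off the check output (slots 4401-5460). It can also be computed from CLUSTER NODES by summing the node's slot ranges. A sketch with count_slots as a hypothetical helper; it reads CLUSTER NODES output on stdin, relies on slot entries starting at the ninth field, and ignores the bracketed migrating/importing entries.

```bash
# count_slots <node-id>: read CLUSTER NODES output on stdin and print how many
# slots the given node owns. Fields 9+ are single slots ("N") or ranges ("A-B").
count_slots() {
  awk -v id="$1" '
    $1 == id {
      n = 0
      for (i = 9; i <= NF; i++) {
        if ($i ~ /^\[/) continue              # skip [slot->-id] / [slot-<-id]
        split($i, r, "-")
        n += (r[2] == "" ? 1 : r[2] - r[1] + 1)
      }
      print n
    }'
}
```

For example: /data/redis-4.0.6/src/redis-cli -h 172.16.60.202 -p 7000 cluster nodes | count_slots 971d05cd7b9bb3634ad024e6aac3dff158c52eee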

Then run check again.
All 5461 slots of 172.16.60.207:7000 have now moved to 172.16.60.202:7000.
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000
>>> Performing Cluster Check (using node 172.16.60.202:7000)
M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
   slots:0-5460 (5461 slots) master
   2 additional replica(s)
........
M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
   slots: (0 slots) master
   0 additional replica(s)

b) Move all slots (5462) of 172.16.60.208:7002 to 172.16.60.204:7002
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.204:7002
.......
How many slots do you want to move (from 1 to 16384)? 5462
What is the receiving node ID? c6a78cfbb77804c4837963b5f589064b6111457a
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:0060012d749167d3f72833d916e53b3445b66c62
Source node #2:done
.......
Do you want to proceed with the proposed reshard plan (yes/no)? yes

c) Move all slots (5461) of 172.16.60.209:7004 to 172.16.60.205:7004
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.205:7004
.........
How many slots do you want to move (from 1 to 16384)? 5461
What is the receiving node ID? 48cbab906141dd26241ccdbc38bee406586a8d03
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:e936d5b4c95b6cae57f994e95805aef87ea4a7a5
Source node #2:done
.........
Do you want to proceed with the proposed reshard plan (yes/no)? yes

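Once all three new masters have been assigned their hash slots, check the cluster state again.
The three old masters now own 0 slots; their slots have moved to the three new masters. The output also shows that each old slave has followed its slots and now replicates the corresponding new master, which is why the new masters report 2 replicas and the old masters 0.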
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000
>>> Performing Cluster Check (using node 172.16.60.202:7000)
M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
   slots:0-5460 (5461 slots) master
   2 additional replica(s)
S: d9671ca6b4235931a2a215cc327a400ad4f9a399 172.16.60.205:7005
   slots: (0 slots) slave
   replicates c6a78cfbb77804c4837963b5f589064b6111457a
M: e936d5b4c95b6cae57f994e95805aef87ea4a7a5 172.16.60.209:7004
   slots: (0 slots) master
   0 additional replica(s)
S: 213bde6296c36b5f31b958c7730ff1629125a204 172.16.60.207:7001
   slots: (0 slots) slave
   replicates 48cbab906141dd26241ccdbc38bee406586a8d03
M: 0060012d749167d3f72833d916e53b3445b66c62 172.16.60.208:7002
   slots: (0 slots) master
   0 additional replica(s)
S: 52b8d27838244657d9b01a233578f24d287979fe 172.16.60.208:7003
   slots: (0 slots) slave
   replicates a0169becd97ccca732d905fd762b4d615674f7bd
M: 48cbab906141dd26241ccdbc38bee406586a8d03 172.16.60.205:7004
   slots:10923-16383 (5461 slots) master
   2 additional replica(s)
S: e7592314869c29375599d781721ad76675645c4c 172.16.60.209:7005
   slots: (0 slots) slave
   replicates c6a78cfbb77804c4837963b5f589064b6111457a
S: 2950f2cb6d960cd48e792f7c82d62d2cd07d20f9 172.16.60.204:7003
   slots: (0 slots) slave
   replicates a0169becd97ccca732d905fd762b4d615674f7bd
M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
   slots: (0 slots) master
   0 additional replica(s)
M: c6a78cfbb77804c4837963b5f589064b6111457a 172.16.60.204:7002
   slots:5461-10922 (5462 slots) master
   2 additional replica(s)
S: 6e663a1bcc3d241ed4d1a9667a0cc92fbe554740 172.16.60.202:7001
   slots: (0 slots) slave
   replicates 48cbab906141dd26241ccdbc38bee406586a8d03
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Check the slot distribution of the cluster
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb info 172.16.60.202:7000 
172.16.60.202:7000 (a0169bec...) -> 2 keys | 5461 slots | 2 slaves.
172.16.60.209:7004 (47cde5c7...) -> 0 keys | 0 slots | 0 slaves.
172.16.60.208:7002 (656fc84a...) -> 0 keys | 0 slots | 0 slaves.
172.16.60.205:7004 (48cbab90...) -> 3 keys | 5461 slots | 2 slaves.
172.16.60.207:7000 (a8fe2d6e...) -> 0 keys | 0 slots | 0 slaves.
172.16.60.204:7002 (c6a78cfb...) -> 1 keys | 5462 slots | 2 slaves.
[OK] 6 keys in 6 masters.
0.00 keys per slot on average.

Check the data: the test data has also moved to the new masters
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.202 -c -p 7000
172.16.60.202:7000> get test1
"test-207"
172.16.60.202:7000> get test2
-> Redirected to slot [8899] located at 172.16.60.204:7002
"test-208"
172.16.60.204:7002> get test3
-> Redirected to slot [13026] located at 172.16.60.205:7004
"test-209"
172.16.60.205:7004> get test11
"test-207-207"
172.16.60.205:7004> get test22
-> Redirected to slot [4401] located at 172.16.60.202:7000
"test-208-208"
172.16.60.202:7000> get test33
-> Redirected to slot [12833] located at 172.16.60.205:7004
"test-209-209"
172.16.60.205:7004> 

Besides the interactive flow above, reshard can also be run directly with a command like this:

# redis-trib.rb reshard --from <node-id> --to <node-id> --slots <number of slots> --yes <host>:<port>
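For this migration, the three interactive reshard runs could be replaced by a loop over the source/target pairs. A sketch using the --from, --to, --slots and --yes options shown above; the node IDs and slot counts are the ones from this cluster, SEED is any reachable node, and run_plan is a hypothetical helper. Because --yes skips the confirmation prompt, check the ID pairs before running it, and patch redis-trib.rb first if the MIGRATE error described above applies.

```bash
REDIS_TRIB="${REDIS_TRIB:-/data/redis-4.0.6/src/redis-trib.rb}"
SEED="${SEED:-172.16.60.202:7000}"

# Each entry: <source node ID> <target node ID> <number of slots>
PLAN=(
  "971d05cd7b9bb3634ad024e6aac3dff158c52eee a0169becd97ccca732d905fd762b4d615674f7bd 5461"  # 207:7000 -> 202:7000
  "0060012d749167d3f72833d916e53b3445b66c62 c6a78cfbb77804c4837963b5f589064b6111457a 5462"  # 208:7002 -> 204:7002
  "e936d5b4c95b6cae57f994e95805aef87ea4a7a5 48cbab906141dd26241ccdbc38bee406586a8d03 5461"  # 209:7004 -> 205:7004
)

# run_plan: reshard each pair in order, stopping at the first failure.
run_plan() {
  local entry from to n
  for entry in "${PLAN[@]}"; do
    read -r from to n <<< "$entry"
    "$REDIS_TRIB" reshard --from "$from" --to "$to" --slots "$n" --yes "$SEED" || return 1
  done
}
```

Stopping at the first failure leaves any open slot in place for redis-trib.rb check and fix, as handled above.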

4. After migration, remove the old nodes from the cluster

a) Remove the pre-migration slaves from the cluster
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.207:7001 213bde6296c36b5f31b958c7730ff1629125a204
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.208:7003 52b8d27838244657d9b01a233578f24d287979fe
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.209:7005 e7592314869c29375599d781721ad76675645c4c
  
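b) Remove the pre-migration masters from the cluster.
Keep the following in mind when removing a master:
-  If the master has slaves, move them to another master or remove them first.
-  If the master owns slots, move its slots away first, then remove the master.
 
A master must own 0 slots, i.e. be empty, before it is removed! Otherwise the whole Redis cluster may stop working!
If the master to be removed is not empty, first move its data to other nodes with reshard.
Another way to remove a master is to perform a manual failover first: once its slave has been promoted to master and the old master has rejoined the cluster as a slave, remove it.
Obviously, this does not help if you want to reduce the number of masters in the cluster; in that case you still need to reshard the data away before removing the node.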
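The manual-failover route described above could be scripted roughly as follows: send CLUSTER FAILOVER to the slave and wait until ROLE reports it as master. A sketch, assuming redis-cli at the path used in this post; failover_to_slave and its retry count are hypothetical. After it succeeds, the old master should be a slave and can then be removed with del-node.

```bash
REDIS_CLI="${REDIS_CLI:-/data/redis-4.0.6/src/redis-cli}"

# failover_to_slave <slave ip:port> [tries]: promote the slave with a manual
# failover, polling ROLE once per second until it reports "master".
failover_to_slave() {
  local slave="$1" tries="${2:-30}" role
  "$REDIS_CLI" -h "${slave%:*}" -p "${slave##*:}" cluster failover || return 1
  while [ "$tries" -gt 0 ]; do
    # With piped output, redis-cli prints the first ROLE element on line 1.
    role=$("$REDIS_CLI" -h "${slave%:*}" -p "${slave##*:}" role | head -n 1)
    if [ "$role" = "master" ]; then
      return 0
    fi
    sleep 1
    tries=$((tries - 1))
  done
  echo "$slave was not promoted in time" >&2
  return 1
}
```

For example: failover_to_slave 172.16.60.208:7003 would make that node the master of the slots its current master owns.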
 
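All slots of the three old masters have been moved away (they now own 0 slots), and their former slaves were removed above,
so the three old masters can now be removed from the cluster directly.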
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.207:7000 971d05cd7b9bb3634ad024e6aac3dff158c52eee
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.208:7002 0060012d749167d3f72833d916e53b3445b66c62
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.209:7004 e936d5b4c95b6cae57f994e95805aef87ea4a7a5
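Since removing a master that still owns slots can break the cluster, the del-node calls above can be guarded by a check on CLUSTER NODES, where a master's slot entries start at the ninth field. A sketch with the hypothetical helper del_node_if_empty, assuming redis-cli and redis-trib.rb at the paths used in this post. As far as I know, redis-trib.rb del-node also sends CLUSTER FORGET to the other nodes and then shuts the removed node down.

```bash
REDIS_CLI="${REDIS_CLI:-/data/redis-4.0.6/src/redis-cli}"
REDIS_TRIB="${REDIS_TRIB:-/data/redis-4.0.6/src/redis-trib.rb}"

# del_node_if_empty <ip:port> <node-id>: remove the node only if its line in
# CLUSTER NODES has no slot fields (i.e. no more than 8 fields).
del_node_if_empty() {
  local addr="$1" id="$2" fields
  fields=$("$REDIS_CLI" -h "${addr%:*}" -p "${addr##*:}" cluster nodes |
           awk -v id="$id" '$1 == id { print NF }')
  if [ -z "$fields" ]; then
    echo "$id is not known to $addr" >&2
    return 1
  fi
  if [ "$fields" -gt 8 ]; then
    echo "$id still owns slots; reshard them away first" >&2
    return 1
  fi
  "$REDIS_TRIB" del-node "$addr" "$id"
}
```

For example: del_node_if_empty 172.16.60.207:7000 971d05cd7b9bb3634ad024e6aac3dff158c52eee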
  
Finally, check the state of the new Redis cluster once more
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000                                          
>>> Performing Cluster Check (using node 172.16.60.202:7000)
M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: d9671ca6b4235931a2a215cc327a400ad4f9a399 172.16.60.205:7005
   slots: (0 slots) slave
   replicates c6a78cfbb77804c4837963b5f589064b6111457a
M: 48cbab906141dd26241ccdbc38bee406586a8d03 172.16.60.205:7004
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 2950f2cb6d960cd48e792f7c82d62d2cd07d20f9 172.16.60.204:7003
   slots: (0 slots) slave
   replicates a0169becd97ccca732d905fd762b4d615674f7bd
M: c6a78cfbb77804c4837963b5f589064b6111457a 172.16.60.204:7002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 6e663a1bcc3d241ed4d1a9667a0cc92fbe554740 172.16.60.202:7001
   slots: (0 slots) slave
   replicates 48cbab906141dd26241ccdbc38bee406586a8d03
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
  
  
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb info 172.16.60.202:7000                                         
172.16.60.202:7000 (a0169bec...) -> 2 keys | 5461 slots | 1 slaves.
172.16.60.205:7004 (48cbab90...) -> 3 keys | 5461 slots | 1 slaves.
172.16.60.204:7002 (c6a78cfb...) -> 1 keys | 5462 slots | 1 slaves.
[OK] 6 keys in 3 masters.
0.00 keys per slot on average.
  
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.202 -c -p 7000
172.16.60.202:7000> get test1
"test-207"
172.16.60.202:7000> get test11
-> Redirected to slot [13313] located at 172.16.60.205:7004
"test-207-207"
172.16.60.205:7004> get test2
-> Redirected to slot [8899] located at 172.16.60.204:7002
"test-208"
172.16.60.204:7002> get test22
-> Redirected to slot [4401] located at 172.16.60.202:7000
"test-208-208"
172.16.60.202:7000> get test3
-> Redirected to slot [13026] located at 172.16.60.205:7004
"test-209"
172.16.60.205:7004> get test33
"test-209-209"
172.16.60.205:7004>
  
=====================================================
Tip:
If the master being removed still owns slots, first move all of them away, i.e. unassign its slots!
  
Suppose master 172.16.60.207:7000 still owns 2550 slots; those 2550 slots need to be moved from 172.16.60.207:7000 to 172.16.60.202:7000:
  
[root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.207:7000
.......
How many slots do you want to move (from 1 to 16384)? 2550               //total number of slots owned by the master being removed
What is the receiving node ID? a0169becd97ccca732d905fd762b4d615674f7bd       //ID of the master receiving the 2550 slots, i.e. 172.16.60.202:7000
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:971d05cd7b9bb3634ad024e6aac3dff158c52eee        //ID of the master being removed, i.e. 172.16.60.207:7000
Source node #2:done                       //enter done
.......
Do you want to proceed with the proposed reshard plan (yes/no)? yes           //confirm
  
Once the master's slots have been moved away as above (it owns 0 slots), it can be removed!
  
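Tips:
1) After adding a master, a reshard is needed with the new node as the receiving node, i.e. "redis-trib.rb reshard <new node>". This assigns slots!
2) Before removing a master that owns slots, a reshard is needed with that node as the source, i.e. "redis-trib.rb reshard <node to remove>". This unassigns slots!
The host:port given to reshard only selects the cluster to connect to; which nodes give and receive slots is decided by the node IDs entered at the prompts.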

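In testing, the applications using the Redis cluster were not affected at all during the migration above! Note, however, that after the migration the Redis connection addresses in the applications must be updated to the new hosts and ports, since the old nodes are no longer part of the cluster.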
