Fixing Redis Cluster nodes that cannot discover each other and the hash slot assignment error CLUSTERDOWN Hash slot not served

Posted by 朱季謙 (Zhu Jiqian) on 2021-12-14

Written up by 朱季謙 (Zhu Jiqian)

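While setting up a Redis 5.x cluster, I ran into a situation where the nodes could not discover each other and hash slot assignment failed with CLUSTERDOWN Hash slot not served, so I am recording the fix here.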

The Redis cluster was set up on the following three virtual machines:

192.168.200.160

192.168.200.161

192.168.200.162

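After starting the three Redis nodes, I connected a client to one of them and ran an arbitrary set command to test whether the cluster worked. It returned the error (error) CLUSTERDOWN Hash slot not served: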

[app@hadoop-nn bin]$ ./redis-cli -c -h 192.168.200.162
192.168.200.162:6379> set zhu "test"
(error) CLUSTERDOWN Hash slot not served

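First, check whether the cluster nodes can discover each other. Run the following command to view the nodes that the current node knows about: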

192.168.200.162:6379> cluster nodes
8c5809df064ad7234c6475555411afda026c230f :6379@16379 myself,master - 0 0 0 connected

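Next, check the current cluster state. It is fail, and cluster_known_nodes is 1, which shows that this node knows only itself and the nodes have not connected to each other: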

192.168.200.162:6379> cluster info
cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
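
The two fields that matter most in this output are cluster_known_nodes and cluster_slots_assigned. As a minimal sketch, the following Python snippet parses the text printed by cluster info and reports both conditions. The helper names parse_cluster_info and diagnose are hypothetical, and the snippet works on pasted text only; it does not connect to Redis.

```python
def parse_cluster_info(text):
    """Parse the key:value lines printed by CLUSTER INFO into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        # Numeric fields become ints; cluster_state stays a string.
        info[key] = int(value) if value.isdigit() else value
    return info


def diagnose(info):
    """Return short findings for the two problems seen in this article."""
    findings = []
    if info.get("cluster_known_nodes", 0) <= 1:
        findings.append("node knows only itself; nodes have not met")
    if info.get("cluster_slots_assigned", 0) < 16384:
        findings.append("not all 16384 hash slots are assigned")
    return findings


# Sample taken from the output above (shortened).
sample = """cluster_state:fail
cluster_slots_assigned:0
cluster_known_nodes:1
cluster_size:0"""

info = parse_cluster_info(sample)
print(info["cluster_state"], diagnose(info))
```

With the output shown above, both findings are reported, which matches the two separate fixes applied in the rest of this article.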

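So the three Redis nodes have not discovered each other. To fix this, run the following commands from a client connected to one node to introduce it manually to the other two. Because cluster nodes share what they know about each other, it is enough to do this on one node only: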

192.168.200.162:6379> cluster meet 192.168.200.160 6379
OK
192.168.200.162:6379> cluster meet 192.168.200.161 6379
OK

After running these commands, check the node list again. The current node can now see the other two nodes:

192.168.200.162:6379> cluster nodes
a0cf910effc52eda7c5561746c42f8bcd710f735 192.168.200.161:6379@16379 master - 0 1639410795898 0 connected
5e5f08f9ec39910cc250239b4f44e701d4b831f5 192.168.200.160:6379@16379 master - 0 1639410794885 1 connected
8c5809df064ad7234c6475555411afda026c230f 192.168.200.162:6379@16379 myself,master - 0 1639410795000 2 connected
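
Each line of this output has the node ID, the address, the flags, the master ID, two timestamps, the config epoch and the link state, followed by any slot ranges the node serves. The sketch below (a hypothetical helper named parse_cluster_nodes, working on pasted text rather than a live connection) splits the lines into fields and makes it easy to see that all three nodes are connected but none of them owns any slots yet.

```python
def parse_cluster_nodes(text):
    """Parse CLUSTER NODES output into a list of dicts, one per node."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or malformed lines
        nodes.append({
            "id": parts[0],
            "addr": parts[1].split("@")[0],   # drop the cluster bus port
            "flags": parts[2].split(","),
            "link_state": parts[7],
            "slots": parts[8:],               # empty when no slots are owned
        })
    return nodes


# Sample taken from the output above.
sample = """a0cf910effc52eda7c5561746c42f8bcd710f735 192.168.200.161:6379@16379 master - 0 1639410795898 0 connected
5e5f08f9ec39910cc250239b4f44e701d4b831f5 192.168.200.160:6379@16379 master - 0 1639410794885 1 connected
8c5809df064ad7234c6475555411afda026c230f 192.168.200.162:6379@16379 myself,master - 0 1639410795000 2 connected"""

nodes = parse_cluster_nodes(sample)
for node in nodes:
    print(node["addr"], node["link_state"], node["slots"] or "no slots")
```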

Check the cluster state again. It is still fail, cluster_slots_assigned is still 0, and set still returns the CLUSTERDOWN Hash slot not served error:

192.168.200.162:6379> cluster info
cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:0
cluster_current_epoch:2
cluster_my_epoch:2
cluster_stats_messages_ping_sent:26
cluster_stats_messages_pong_sent:30
cluster_stats_messages_meet_sent:3
cluster_stats_messages_sent:59
cluster_stats_messages_ping_received:30
cluster_stats_messages_pong_received:29
cluster_stats_messages_received:59
192.168.200.162:6379> set zhu "test"
(error) CLUSTERDOWN Hash slot not served

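At this point the nodes can see each other, so the remaining problem is that the hash slots are not assigned. Run the following command to repair this: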

[app@hadoop-nn bin]$ ./redis-cli --cluster fix 192.168.200.162:6379

Press Enter to run it. It immediately prints many lines like the following, showing that the 16384 hash slots are being assigned to nodes:

>>> Covering slot 10620 with 192.168.200.162:6379
>>> Covering slot 3059 with 192.168.200.162:6379
>>> Covering slot 9764 with 192.168.200.162:6379
>>> Covering slot 11335 with 192.168.200.162:6379
>>> Covering slot 6368 with 192.168.200.162:6379
>>> Covering slot 4884 with 192.168.200.162:6379
>>> Covering slot 15271 with 192.168.200.162:6379
>>> Covering slot 5109 with 192.168.200.162:6379
......
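
For comparison, a three-master cluster is often laid out with contiguous, roughly equal slot ranges rather than slot-by-slot coverage. The following Python sketch computes such an even split for the three addresses used in this article. It is only an illustration of how 16384 slots divide across three nodes; it is not what --cluster fix does, the function name even_slot_ranges is hypothetical, and nothing here talks to Redis.

```python
TOTAL_SLOTS = 16384


def even_slot_ranges(nodes):
    """Split slots 0..16383 into contiguous, near-equal ranges per node."""
    per_node = TOTAL_SLOTS / len(nodes)
    ranges = {}
    first = 0
    cursor = 0.0
    for i, node in enumerate(nodes):
        last = round(cursor + per_node - 1)
        # The final node always takes whatever remains up to slot 16383.
        if i == len(nodes) - 1 or last > TOTAL_SLOTS - 1:
            last = TOTAL_SLOTS - 1
        ranges[node] = (first, last)
        first = last + 1
        cursor += per_node
    return ranges


ranges = even_slot_ranges([
    "192.168.200.160:6379",
    "192.168.200.161:6379",
    "192.168.200.162:6379",
])
for node, (first, last) in ranges.items():
    print(f"{node}: slots {first}-{last} ({last - first + 1} slots)")
```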

When it finishes, check the cluster state again. It has changed from fail to ok, which shows that the hash slots are now assigned correctly:

192.168.200.162:6379> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:19
cluster_my_epoch:18
cluster_stats_messages_ping_sent:1514
cluster_stats_messages_pong_sent:1486
cluster_stats_messages_meet_sent:3
cluster_stats_messages_sent:3003
cluster_stats_messages_ping_received:1486
cluster_stats_messages_pong_received:1517
cluster_stats_messages_received:3003

Finally, run the following command on one of the cluster nodes to test it. No error is reported this time:

192.168.200.162:6379> set test zhu
OK

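In addition, running the following command on either of the other two machines now returns the test key-value pair that was written through the Redis instance on 192.168.200.162: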

192.168.200.160:6379> get test
-> Redirected to slot [6918] located at 192.168.200.162:6379
"zhu"

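The slot number in the redirect message comes from the key itself: Redis Cluster maps a key to a slot by taking CRC16 of the key (the XMODEM variant) modulo 16384, using only the part inside {...} when the key contains a non-empty hash tag. The sketch below reimplements that mapping in plain Python to show why the key test lands in slot 6918. The function names crc16_xmodem and key_slot are my own; on a live cluster the command cluster keyslot gives the same answer.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the hash slot (0..16383) for a key, honoring hash tags."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first { and the next } counts.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(key_slot("test"))  # 6918, matching the redirect message above
```

Because only the hash tag is hashed, keys such as {test}.a and {test}.b fall in the same slot as test, which is how related keys can be kept on one node.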