Message-Oriented Middleware: RabbitMQ Clustering

1874, posted on 2020-08-27

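  In the previous post we briefly went over installing and configuring RabbitMQ and the rabbitmqctl subcommands; for a recap see https://www.cnblogs.com/qiuhom-1874/p/13561245.html. Today we turn to RabbitMQ clustering. Why a cluster? In a distributed application RabbitMQ is what connects the components, so if the RabbitMQ service goes down the whole production workload may suffer. To avoid that we have to make RabbitMQ highly available: the nodes of a cluster synchronise data with each other over the network, so that losing one node does not mean losing data. The main job of a RabbitMQ cluster, then, is to keep data redundant across its nodes. (Strictly speaking, a cluster always replicates metadata such as users, vhosts, exchanges and bindings; the messages in a queue are copied to other nodes only when the queue is mirrored, for example through an ha-mode policy.) Besides data redundancy, we also need to think about load balancing: once there are several rabbitmq-server instances on the backend, a load balancer in front of them lets each node carry part of the traffic and gives clients one unified access point. Clients connect to the load balancer's address and it spreads their requests across the backend nodes; if a node goes down, the load balancer's health checks stop routing requests to it, which keeps RabbitMQ available as a whole.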

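  Before building the RabbitMQ cluster, we need the following preparation:

  1. Set each node's hostname to the name it resolves to in the hosts file. Every node must have a unique hostname, and every hostname must be resolvable through the hosts file;

  2. Synchronise the clocks; time synchronisation is the most basic requirement of any cluster;

  3. The cookie must be identical on all nodes.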
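  The first requirement can be checked before going any further. Below is a minimal sketch, assuming a bash shell and the hostnames of this lab; `host_in_hosts` and `check_hosts` are hypothetical helper names, not part of RabbitMQ. It only inspects the hosts file, so DNS-based resolution is out of scope.

```shell
# host_in_hosts NAME [FILE]: succeed if NAME appears as a hostname
# (any field after the IP) on a non-comment line of FILE (default /etc/hosts).
host_in_hosts() {
    local name=$1 file=${2:-/etc/hosts}
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }   # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"
}

# check_hosts FILE NAME...: print each NAME missing from FILE;
# return non-zero if any is missing.
check_hosts() {
    local file=$1 missing=0 h
    shift
    for h in "$@"; do
        if ! host_in_hosts "$h" "$file"; then
            echo "missing: $h"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on each node (node names from this lab):
#   check_hosts /etc/hosts node01 node2 node3
#   hostname    # should print this node's own name, e.g. node01
```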

  Lab environment

Node           Hostname   IP address
node01         node01     192.168.0.41
node2          node2      192.168.0.42
Load balancer  node3      192.168.0.43

  1. Configure the hostname of each node

[root@node01 ~]# hostnamectl set-hostname node01
[root@node01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.41 node01
192.168.0.42 node2
192.168.0.43 node3
[root@node01 ~]# scp /etc/hosts node2:/etc/
hosts                                                                                100%  218   116.4KB/s   00:00    
[root@node01 ~]# scp /etc/hosts node3:/etc/
hosts                                                                                100%  218   119.2KB/s   00:00    
[root@node01 ~]# 

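  Tip: at this stage the RabbitMQ cluster consists only of node01 and node2, which synchronise with each other. The load balancer is there to distribute traffic and is not itself part of the RabbitMQ cluster, so its hostname can be anything.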

  Verify: log in to each node and check that its hostname and hosts file are correct

[root@node2 ~]# hostname
node2
[root@node2 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.41 node01
192.168.0.42 node2
192.168.0.43 node3
[root@node2 ~]# 

  Install rabbitmq-server on each node

yum install rabbitmq-server -y

  Start rabbitmq-server on each node

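  Tip: node01 has the rabbitmq-management plugin enabled, so port 15672 is listening there; node2 does not have the plugin enabled, so 15672 is not listening on it. In a RabbitMQ cluster, port 25672 is the port the nodes use to communicate with each other.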
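  The commands behind this step are not shown in the post. A plausible sequence on CentOS 7 is sketched in the comments below (an assumption based on the systemd unit name used later and the standard rabbitmq-plugins tool), together with a hypothetical `port_listening` helper that reads `ss -tnl` output and reports whether a port such as 25672 or 15672 is in LISTEN state.

```shell
# Commands for this step (assumed, CentOS 7 with systemd):
#   systemctl enable rabbitmq-server && systemctl start rabbitmq-server   # every node
#   rabbitmq-plugins enable rabbitmq_management                           # node01 only

# port_listening PORT: read `ss -tnl` output on stdin and succeed if some
# LISTEN socket's local address (4th column) ends in ":PORT".
port_listening() {
    awk -v p=":$1" '
        $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   ss -tnl | port_listening 25672 && echo "inter-node port is listening"
#   ss -tnl | port_listening 15672 || echo "management plugin not listening"
```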

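  With the basic environment ready, we can now configure the cluster. Configuring a RabbitMQ cluster is very simple: a freshly started RabbitMQ node already forms a cluster by default, which is why 25672 is listening; the cluster just contains only the node itself.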

  Verify: check the cluster status on each node, and whether the node name matches the host's hostname

  Tip: the status output shows that on both nodes the cluster name matches the host's hostname;
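  This check can be scripted. The sketch below assumes the output format of `rabbitmqctl cluster_status` shown later in this post (newer releases print a different layout); `node_name_from_status` is a hypothetical helper that pulls the node name from the first line so it can be compared with `rabbit@$(hostname -s)`.

```shell
# node_name_from_status: read `rabbitmqctl cluster_status` output on stdin and
# print the node name from the "Cluster status of node NAME ..." line.
node_name_from_status() {
    sed -n 's/^Cluster status of node \([^ ]*\) \.\.\..*$/\1/p' | head -n 1
}

# Usage on each node (expects e.g. rabbit@node2 on node2):
#   name=$(rabbitmqctl cluster_status | node_name_from_status)
#   [ "$name" = "rabbit@$(hostname -s)" ] || echo "node name mismatch: $name"
```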

  Stop the application on node2 and join node2 to node01's cluster

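  Tip: here we are told that rabbit@node01 cannot be reached. This error has two main causes: first, the hostname does not resolve correctly; second, the cookies do not match.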

  Copy the cookie

[root@node2 ~]# scp /var/lib/rabbitmq/.erlang.cookie node01:/var/lib/rabbitmq/
The authenticity of host 'node01 (192.168.0.41)' can't be established.
ECDSA key fingerprint is SHA256:EG9nua4JJuUeofheXlgQeL9hX5H53JynOqf2vf53mII.
ECDSA key fingerprint is MD5:57:83:e6:46:2c:4b:bb:33:13:56:17:f7:fd:76:71:cc.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node01,192.168.0.41' (ECDSA) to the list of known hosts.
.erlang.cookie                                                                       100%   20    10.6KB/s   00:00    
[root@node2 ~]# 

  Verify: use md5sum to check that the cookie is identical on every node

[root@node2 ~]# md5sum /var/lib/rabbitmq/.erlang.cookie 
1d4f9e4d6c92cf0c749cc4ace68317f6  /var/lib/rabbitmq/.erlang.cookie
[root@node2 ~]# ssh node01
Last login: Wed Aug 26 19:41:30 2020 from 192.168.0.232
[root@node01 ~]# md5sum /var/lib/rabbitmq/.erlang.cookie 
1d4f9e4d6c92cf0c749cc4ace68317f6  /var/lib/rabbitmq/.erlang.cookie
[root@node01 ~]# 
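  The same comparison, plus a permission check, as a small sketch. Erlang refuses to use a cookie file that group or other users can access, so after copying it is worth confirming that the mode is 400 (or 600) and the owner is rabbitmq. The function names are hypothetical helpers; `stat -c` and `md5sum` assume GNU coreutils, as on CentOS.

```shell
# cookie_hash FILE: print the md5 of a cookie file.
cookie_hash() {
    md5sum "$1" | awk '{ print $1 }'
}

# cookies_match A B: succeed if both cookie files have the same content.
cookies_match() {
    [ "$(cookie_hash "$1")" = "$(cookie_hash "$2")" ]
}

# cookie_perms_ok FILE: succeed if group and others have no access
# (mode ending in 00, e.g. 400 or 600), which Erlang requires.
cookie_perms_ok() {
    local mode
    mode=$(stat -c '%a' "$1") || return 1
    case $mode in
        *00) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on node2, comparing with node01 (assumes root ssh as used above):
#   f=/var/lib/rabbitmq/.erlang.cookie
#   [ "$(cookie_hash "$f")" = "$(ssh node01 md5sum "$f" | awk '{ print $1 }')" ] && echo same
#   cookie_perms_ok "$f" || { chown rabbitmq:rabbitmq "$f"; chmod 400 "$f"; }
```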

  Tip: the two nodes now have the same cookie. Let's try joining node2 to node01 again:

[root@node2 ~]# rabbitmqctl join_cluster rabbit@node01
Clustering node rabbit@node2 with rabbit@node01 ...
Error: unable to connect to nodes [rabbit@node01]: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@node01]

rabbit@node01:
  * connected to epmd (port 4369) on node01
  * epmd reports node 'rabbit' running on port 25672
  * TCP connection succeeded but Erlang distribution failed
  * suggestion: hostname mismatch?
  * suggestion: is the cookie set correctly?

current node details:
- node name: rabbitmqctl2523@node2
- home dir: /var/lib/rabbitmq
- cookie hash: HU+eTWySzwx0nMSs5oMX9g==

[root@node2 ~]#

  Tip: it still fails. The reason is that we updated node01's cookie file but did not restart rabbitmq-server, so the running node is still using its old cookie.

  Restart rabbitmq-server on node01

[root@node01 ~]# systemctl restart rabbitmq-server.service 
[root@node01 ~]# ss -tnl
State       Recv-Q Send-Q              Local Address:Port                             Peer Address:Port              
LISTEN      0      128                     127.0.0.1:631                                         *:*                  
LISTEN      0      128                             *:15672                                       *:*                  
LISTEN      0      100                     127.0.0.1:25                                          *:*                  
LISTEN      0      100                     127.0.0.1:64667                                       *:*                  
LISTEN      0      128                             *:8000                                        *:*                  
LISTEN      0      128                             *:8001                                        *:*                  
LISTEN      0      128                             *:25672                                       *:*                  
LISTEN      0      5                       127.0.0.1:8010                                        *:*                  
LISTEN      0      128                             *:111                                         *:*                  
LISTEN      0      128                             *:80                                          *:*                  
LISTEN      0      128                             *:4369                                        *:*                  
LISTEN      0      5                   192.168.122.1:53                                          *:*                  
LISTEN      0      128                             *:22                                          *:*                  
LISTEN      0      128                           ::1:631                                        :::*                  
LISTEN      0      100                           ::1:25                                         :::*                  
LISTEN      0      128                            :::5672                                       :::*                  
LISTEN      0      128                            :::111                                        :::*                  
LISTEN      0      128                            :::80                                         :::*                  
LISTEN      0      128                            :::4369                                       :::*                  
LISTEN      0      128                            :::22                                         :::*                  
[root@node01 ~]# 

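  Tip: if you copy node01's cookie to node2 instead, node2 is the one to restart. In short, every node that receives a new cookie must be restarted, so that the cookie actually in use is the same everywhere.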

  Join node2 to node01 again

[root@node2 ~]# rabbitmqctl join_cluster rabbit@node01
Clustering node rabbit@node2 with rabbit@node01 ...
...done.
[root@node2 ~]# 

  Tip: if join_cluster returns without an error, the node has joined the cluster successfully.

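  Verify: check the cluster status on each node

  Tip: both nodes now list both nodes, so node2 has joined node01's cluster. The status output still differs between the two nodes, however, because the application on node2 has not been started yet; once it is started, both will report the same status.

  Start the application on node2

  Tip: the two nodes now report identical status, and the RabbitMQ cluster is up.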
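  Whether every member is actually running can also be read from `cluster_status`. A minimal sketch, assuming the Erlang-term output format of the RabbitMQ release used here: `extract_list` and `stopped_members` are hypothetical helpers that compare the nodes listed under `disc`/`ram` with those in `running_nodes`, printing any member whose application is not running (node2, right after join_cluster).

```shell
# extract_list KEYS: read Erlang-term cluster_status output on stdin and print
# the atoms inside every {KEY,[...]} tuple whose KEY matches the ERE KEYS.
extract_list() {
    tr -d ' \n' |
        grep -oE "\\{($1),\\[[^]]*\\]\\}" |
        sed -E 's/^\{[a-z_]+,\[//; s/\]\}$//' |
        tr ',' '\n' |
        sed '/^$/d'
}

# stopped_members: print cluster members (disc or ram) that are missing
# from running_nodes, i.e. nodes whose RabbitMQ application is stopped.
stopped_members() {
    local st running n
    st=$(cat)
    running=$(printf '%s\n' "$st" | extract_list 'running_nodes')
    printf '%s\n' "$st" | extract_list 'disc|ram' |
        while IFS= read -r n; do
            printf '%s\n' "$running" | grep -qxF "$n" || printf '%s\n' "$n"
        done
}

# Usage on any node:
#   rabbitmqctl cluster_status | stopped_members   # no output: all running
```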

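  Verify: open port 15672 on node01 in a browser and check whether the web management UI shows the node information.

  Tip: node2 shows no statistics because the rabbitmq-management plugin is not enabled on it; once the plugin is enabled, its data will be collected as well.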
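  To get node2's statistics, enable the plugin on node2 with `rabbitmq-plugins enable rabbitmq_management` (depending on the RabbitMQ version, a broker restart may be needed for it to take effect). The node list can also be fetched from the management HTTP API instead of the browser. The sketch below assumes a management user `admin:secret` (hypothetical) and uses a hypothetical `api_node_names` helper that extracts node names from the JSON returned by `/api/nodes`; it is a grep-based shortcut, not a JSON parser, and `jq` would be more robust if installed.

```shell
# api_node_names: read the JSON body of GET /api/nodes on stdin and print
# each distinct node name (string values of "name" that contain '@').
api_node_names() {
    grep -oE '"name": *"[^"]*@[^"]*"' |
        sed -E 's/^"name": *"//; s/"$//' |
        sort -u
}

# Usage (admin:secret is a hypothetical management user; the default guest
# account may be limited to localhost logins):
#   curl -s -u admin:secret http://192.168.0.41:15672/api/nodes | api_node_names
```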

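  Cluster-related rabbitmqctl subcommands

  join_cluster <clusternode> [--ram]: join the cluster that the given node belongs to; with --ram this node joins as a RAM node;

  cluster_status: show the cluster status;

  change_cluster_node_type disc | ram: change the node's storage type, where disc means disk and ram means memory; a cluster must always keep at least one disc node;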

[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]},
 {running_nodes,[rabbit@node01,rabbit@node2]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node2 ~]# rabbitmqctl change_cluster_node_type ram
Turning rabbit@node2 into a ram node ...
Error: mnesia_unexpectedly_running
[root@node2 ~]#

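  Tip: the error mnesia_unexpectedly_running means the node type cannot be changed while the application is running. The fix is to stop the application on node2, change the type, and then start the application again.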

[root@node2 ~]# rabbitmqctl stop_app
Stopping node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]}]
...done.
[root@node2 ~]# rabbitmqctl change_cluster_node_type ram
Turning rabbit@node2 into a ram node ...
...done.
[root@node2 ~]# rabbitmqctl start_app
Starting node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status              
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01]},{ram,[rabbit@node2]}]},
 {running_nodes,[rabbit@node01,rabbit@node2]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node2 ~]# 

  Tip: node2 is now a ram node.
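  The stop/convert/start sequence above can be wrapped in a helper. This is a sketch with a hypothetical name, `set_node_type`; if the conversion is refused (for example when the node is the cluster's only disc node), it still starts the application again so the node is not left stopped.

```shell
# set_node_type disc|ram: run on the node whose type should change.
# Order follows the transcript above: stop_app, change type, start_app.
set_node_type() {
    case $1 in
        disc|ram) ;;
        *) echo "usage: set_node_type disc|ram" >&2; return 2 ;;
    esac
    rabbitmqctl stop_app || return 1
    if ! rabbitmqctl change_cluster_node_type "$1"; then
        # Conversion refused: bring the application back before failing.
        rabbitmqctl start_app
        return 1
    fi
    rabbitmqctl start_app
}

# Usage on node2:
#   set_node_type ram
```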

[root@node01 ~]#  rabbitmqctl change_cluster_node_type ram
Turning rabbit@node01 into a ram node ...
Error: mnesia_unexpectedly_running
[root@node01 ~]# rabbitmqctl stop_app
Stopping node rabbit@node01 ...
...done.
[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01]},{ram,[rabbit@node2]}]}]
...done.
[root@node01 ~]#  rabbitmqctl change_cluster_node_type ram
Turning rabbit@node01 into a ram node ...
Error: {resetting_only_disc_node,"You cannot reset a node when it is the only disc node in a cluster. Please convert another node of the cluster to a disc node first."}
[root@node01 ~]# 

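  Tip: note that a cluster must keep at least one disc node; since node2 has been changed to ram, node01 has to stay a disc node. (node01's application is still stopped at this point, so start it again with rabbitmqctl start_app.)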
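  A guard for that rule, as a sketch: the hypothetical helper `disc_node_count` counts the names in the `{disc,[...]}` tuple of the Erlang-term `cluster_status` output used in this post, so a script can refuse to convert a disc node when it is the last one.

```shell
# disc_node_count: read `rabbitmqctl cluster_status` output on stdin and
# print how many nodes are listed in the {disc,[...]} tuple.
disc_node_count() {
    tr -d ' \n' |
        grep -oE '\{disc,\[[^]]*\]\}' |
        sed -E 's/^\{disc,\[//; s/\]\}$//' |
        tr ',' '\n' |
        grep -c '.'
}

# Usage on a disc node: only convert it if another disc node remains.
#   if [ "$(rabbitmqctl cluster_status | disc_node_count)" -gt 1 ]; then
#       rabbitmqctl stop_app && rabbitmqctl change_cluster_node_type ram
#       rabbitmqctl start_app
#   fi
```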

  forget_cluster_node [--offline]: remove a node from the cluster;

[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01]},{ram,[rabbit@node2]}]},
 {running_nodes,[rabbit@node2,rabbit@node01]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node01 ~]# rabbitmqctl forget_cluster_node rabbit@node2
Removing node rabbit@node2 from cluster ...
Error: {failed_to_remove_node,rabbit@node2,
                              {active,"Mnesia is running",rabbit@node2}}
[root@node01 ~]# 

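  Tip: when we try to remove node2 from node01, we are told that node2 is still active and cannot be removed. In other words, this subcommand only removes nodes whose application is not running.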

  Stop the application on node2

[root@node2 ~]# rabbitmqctl stop_app
Stopping node rabbit@node2 ...
...done.
[root@node2 ~]#

  Remove node2 again

[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01]},{ram,[rabbit@node2]}]},
 {running_nodes,[rabbit@node01]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node01 ~]# rabbitmqctl forget_cluster_node rabbit@node2          
Removing node rabbit@node2 from cluster ...
...done.
[root@node01 ~]# rabbitmqctl cluster_status                  
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01]}]},
 {running_nodes,[rabbit@node01]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node01 ~]# 
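  The two steps (stop the target's application, then forget it from a surviving member) can be combined. This sketch assumes passwordless root ssh between the nodes, as the earlier scp/ssh commands suggest; `remove_node` is a hypothetical helper name.

```shell
# remove_node TARGET_HOST NODE_NAME: run on a surviving cluster member.
# Stops the RabbitMQ application on the target over ssh, then forgets it.
remove_node() {
    ssh "$1" rabbitmqctl stop_app || return 1
    rabbitmqctl forget_cluster_node "$2"
}

# Usage on node01:
#   remove_node node2 rabbit@node2
```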

  update_cluster_nodes clusternode: refresh this node's view of the cluster membership by asking the given node;

  Join node2 to node01's cluster

[root@node2 ~]# rabbitmqctl stop_app
Stopping node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl join_cluster rabbit@node01
Clustering node rabbit@node2 with rabbit@node01 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]}]
...done.
[root@node2 ~]# rabbitmqctl start_app
Starting node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]},
 {running_nodes,[rabbit@node01,rabbit@node2]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node2 ~]# 

  Stop the application on node2

[root@node2 ~]# rabbitmqctl stop_app
Stopping node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]}]
...done.
[root@node2 ~]# 

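  Tip: if, while node2 is stopped, a new node joins the cluster and the application on node01 is then stopped as well, node2 will fail when it tries to start its application again, as shown below.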

  Join node3 to node01's cluster

[root@node3 ~]# rabbitmqctl cluster_status            
Cluster status of node rabbit@node3 ...
[{nodes,[{disc,[rabbit@node3]}]},
 {running_nodes,[rabbit@node3]},
 {cluster_name,<<"rabbit@node3">>},
 {partitions,[]}]
...done.
[root@node3 ~]# rabbitmqctl stop_app
Stopping node rabbit@node3 ...
...done.
[root@node3 ~]# rabbitmqctl join_cluster rabbit@node01
Clustering node rabbit@node3 with rabbit@node01 ...
...done.
[root@node3 ~]# rabbitmqctl start_app
Starting node rabbit@node3 ...
...done.
[root@node3 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node3 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node01,rabbit@node3]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node3 ~]# 

  Stop the application on node01

[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node3,rabbit@node01]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node01 ~]# rabbitmqctl stop_app
Stopping node rabbit@node01 ...
...done.
[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]}]
...done.
[root@node01 ~]# 

  Start the application on node2

[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]}]
...done.
[root@node2 ~]# rabbitmqctl start_app     
Starting node rabbit@node2 ...



BOOT FAILED
===========

Error description:
   {could_not_start,rabbit,
       {bad_return,
           {{rabbit,start,[normal,[]]},
            {'EXIT',
                {rabbit,failure_during_boot,
                    {error,
                        {timeout_waiting_for_tables,
                            [rabbit_user,rabbit_user_permission,rabbit_vhost,
                             rabbit_durable_route,rabbit_durable_exchange,
                             rabbit_runtime_parameters,
                             rabbit_durable_queue]}}}}}}}

Log files (may contain more information):
   /var/log/rabbitmq/rabbit@node2.log
   /var/log/rabbitmq/rabbit@node2-sasl.log

Error: {rabbit,failure_during_boot,
           {could_not_start,rabbit,
               {bad_return,
                   {{rabbit,start,[normal,[]]},
                    {'EXIT',
                        {rabbit,failure_during_boot,
                            {error,
                                {timeout_waiting_for_tables,
                                    [rabbit_user,rabbit_user_permission,
                                     rabbit_vhost,rabbit_durable_route,
                                     rabbit_durable_exchange,
                                     rabbit_runtime_parameters,
                                     rabbit_durable_queue]}}}}}}}}
[root@node2 ~]# 

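  Tip: node2 now fails to start. The only peer it knows about, node01, is down, and it has never heard of node3, so it times out waiting for the cluster tables. This is where the update_cluster_nodes subcommand comes in: ask node3 for the current cluster membership, and node2 will then start its application without errors.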

  Ask node3 for the updated cluster membership, then start the application on node2

[root@node2 ~]# rabbitmqctl update_cluster_nodes rabbit@node3
Updating cluster nodes for rabbit@node2 from rabbit@node3 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status                   
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]}]
...done.
[root@node2 ~]# rabbitmqctl start_app
Starting node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node3,rabbit@node2]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node2 ~]# 

  Tip: after the membership update, cluster_status on node2 shows node3, and the application on node2 now starts without any problem.
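  The recovery above reduces to two commands on the stale node. A sketch with a hypothetical helper name, assuming the node's application is already stopped (update_cluster_nodes requires that) and that the node passed in is currently running:

```shell
# rejoin_via LIVE_NODE: run on a stopped node whose view of the cluster is
# stale (node2 above). Fetches current membership from LIVE_NODE, then starts.
rejoin_via() {
    rabbitmqctl update_cluster_nodes "$1" || return 1
    rabbitmqctl start_app
}

# Usage on node2:
#   rejoin_via rabbit@node3
```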

  sync_queue queue: synchronise the given mirrored queue;

  cancel_sync_queue queue: cancel synchronisation of the given queue;

  set_cluster_name name: set the cluster name

[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node01,rabbit@node3,rabbit@node2]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node2 ~]# rabbitmqctl set_cluster_name rabbit@rabbit_node02
Setting cluster name to rabbit@rabbit_node02 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status                       
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node01,rabbit@node3,rabbit@node2]},
 {cluster_name,<<"rabbit@rabbit_node02">>},
 {partitions,[]}]
...done.
[root@node2 ~]# 

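  Tip: a name change made on any node is propagated to the other nodes; in other words, the cluster status information is kept consistent on every node.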
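  sync_queue and cancel_sync_queue only matter for mirrored queues, and queues are mirrored only when a policy says so. A sketch, assuming classic queue mirroring (the mechanism in RabbitMQ 3.x; it was later deprecated in favour of quorum queues and removed in 4.0): the policy name `ha-all` and queue name `orders` are hypothetical, and `ha-sync-mode: manual` is what leaves new mirrors unsynchronised until sync_queue is run.

```shell
# mirror_all_queues: declare a policy (hypothetical name ha-all) that mirrors
# every queue in the default vhost to all nodes, with manual synchronisation.
mirror_all_queues() {
    rabbitmqctl set_policy ha-all '^' '{"ha-mode":"all","ha-sync-mode":"manual"}'
}

# sync_queues NAME...: trigger synchronisation for each mirrored queue.
sync_queues() {
    local q
    for q in "$@"; do
        rabbitmqctl sync_queue "$q" || return 1
    done
}

# Usage:
#   mirror_all_queues
#   sync_queues orders   # "orders" is a hypothetical queue name
```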

  Load balancing the RabbitMQ cluster with haproxy

  1. Install haproxy

[root@node3 ~]# yum install -y haproxy
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
Resolving Dependencies
--> Running transaction check
---> Package haproxy.x86_64 0:1.5.18-9.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================
 Package                Arch                  Version                     Repository           Size
====================================================================================================
Installing:
 haproxy                x86_64                1.5.18-9.el7                base                834 k

Transaction Summary
====================================================================================================
Install  1 Package

Total download size: 834 k
Installed size: 2.6 M
Downloading packages:
haproxy-1.5.18-9.el7.x86_64.rpm                                              | 834 kB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : haproxy-1.5.18-9.el7.x86_64                                                      1/1 
  Verifying  : haproxy-1.5.18-9.el7.x86_64                                                      1/1 

Installed:
  haproxy.x86_64 0:1.5.18-9.el7                                                                     

Complete!
[root@node3 ~]# 

  Tip: haproxy can run on a separate host or on one of the cluster nodes. A separate host is recommended, since it avoids port conflicts.

  Configure haproxy

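  Tip: that is an example of load-balancing the RabbitMQ cluster with haproxy: haproxy proxies RabbitMQ in tcp mode and dispatches requests to the backend servers with the round-robin algorithm.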
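  The configuration itself is not shown in the post. Below is a sketch of what such a file might look like for haproxy 1.5 (the version installed above), written out by a small shell function. Assumptions: the listen section names, frontend port 5670 (node3 already runs RabbitMQ on 5672), the stats page on port 9000 at /haproxy-stats, and the health-check intervals are illustrative choices, not the author's settings. The long client/server timeouts reflect that AMQP connections stay open for a long time.

```shell
# write_haproxy_cfg PATH: write a minimal tcp-mode, round-robin config for the
# three brokers plus a stats page. All ports and names are assumptions.
write_haproxy_cfg() {
    cat > "$1" <<'EOF'
global
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon

defaults
    mode tcp
    timeout connect 5s
    # AMQP connections are long-lived: keep idle timeouts generous
    timeout client 3h
    timeout server 3h

listen rabbitmq_amqp
    bind *:5670
    balance roundrobin
    server node01 192.168.0.41:5672 check inter 2s rise 2 fall 3
    server node2  192.168.0.42:5672 check inter 2s rise 2 fall 3
    server node3  192.168.0.43:5672 check inter 2s rise 2 fall 3

listen stats
    bind *:9000
    mode http
    stats enable
    stats uri /haproxy-stats
    stats refresh 5s
EOF
}

# Usage on node3:
#   write_haproxy_cfg /etc/haproxy/haproxy.cfg
#   haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl restart haproxy
```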

  Verify: start haproxy and check that the configured ports are listening and that the stats page correctly detects whether the backend servers are online.

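  Tip: the load balancer is now in place. From here on, clients of the cluster only need the address haproxy listens on. Keep in mind, though, that haproxy itself is now a new single point of failure.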

  Open haproxy's stats page in a browser and check whether the backend servers are online.

  Tip: all three backend rabbitmq-server instances are online.

  Stop RabbitMQ on node3 and see whether haproxy promptly notices that node3 is offline and marks it down.
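  Besides the browser, haproxy can serve the same stats page as CSV when `;csv` is appended to the stats URI, which makes the check scriptable. The sketch assumes the stats page is at http://192.168.0.43:9000/haproxy-stats and the AMQP section is named rabbitmq_amqp (both hypothetical); `server_states` is a hypothetical helper that prints each server's name and status (columns 2 and 18 of the CSV).

```shell
# server_states PROXY: read haproxy CSV stats on stdin and print
# "svname status" for every server of PROXY (FRONTEND/BACKEND rows skipped).
server_states() {
    awk -F, -v px="$1" '
        $1 == px && $2 != "FRONTEND" && $2 != "BACKEND" { print $2, $18 }
    '
}

# Usage (after "systemctl stop rabbitmq-server" on node3, node3 should turn
# DOWN once the configured number of failed checks is reached):
#   curl -s 'http://192.168.0.43:9000/haproxy-stats;csv' | server_states rabbitmq_amqp
```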

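  Tip: failover for the RabbitMQ cluster relies on haproxy's health checks against the backend servers. The RabbitMQ cluster itself only takes care of synchronising data for redundancy; real high availability comes from the scheduler in front of it. To load-balance RabbitMQ with nginx instead, write the configuration along the lines of nginx's TCP proxying; for more on load-balancing TCP applications with nginx, see my post https://www.cnblogs.com/qiuhom-1874/p/12468946.html, so I won't go into detail here.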
