Configuring a Quorum Device for Linux High Availability

Posted by 天涯客1224 on 2024-08-30

Red Hat Enterprise Linux 7.4 provides full support for a separate quorum device configured as a third-party device of the cluster. Its primary purpose is to allow a cluster to sustain more node failures than standard quorum rules permit. A quorum device is recommended for clusters with an even number of nodes. For two-node clusters, a quorum device can better determine which node survives in a split-brain situation.
When configuring a quorum device, you must take the following into account:

  • It is recommended to run the quorum device on a different physical network at the same site as the cluster that uses it. Ideally, the quorum device host should be in a separate rack from the main cluster, or at least on a separate PSU, and not on the same network segment as the corosync ring or rings.
  • You cannot use more than one quorum device in a cluster at the same time.
  • Although you cannot use more than one quorum device in a cluster at the same time, a single quorum device may be used by several clusters at the same time. Each cluster using that quorum device can use different algorithms and quorum options, since these are stored on the cluster nodes themselves. For example, a single quorum device can be used by one cluster with the ffsplit (fifty/fifty split) algorithm and by a second cluster with the lms (last man standing) algorithm.
  • A quorum device should not run on an existing cluster node.

System environment:

[root@node201 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)

System architecture:

[root@node203 ~]# cat /etc/hosts
192.168.1.201 node201
192.168.1.202 node202
192.168.1.203 node203  qdevice

I. System Environment Deployment
1. Cluster node deployment

[root@node201 ~]# dnf install corosync-qdevice
[root@node201 ~]# rpm -qa |grep qdevice
corosync-qdevice-2.4.5-7.el7_9.2.x86_64

[root@node201 ~]# rpm -qa |grep pcs
pcs-0.9.169-3.el7.centos.3.x86_64
pcsc-lite-libs-1.8.8-8.el7.x86_64

[root@node201 ~]# rpm -qa |egrep 'pacemaker|corosync'
corosynclib-2.4.5-7.el7_9.2.x86_64
pacemaker-1.1.23-1.el7_9.1.x86_64
pacemaker-libs-1.1.23-1.el7_9.1.x86_64
pacemaker-doc-1.1.23-1.el7_9.1.x86_64
corosync-qdevice-2.4.5-7.el7_9.2.x86_64
pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
pacemaker-cli-1.1.23-1.el7_9.1.x86_64
corosync-2.4.5-7.el7_9.2.x86_64

2. Quorum node deployment

[root@node203 ~]# dnf install pcs corosync-qnet
[root@node203 corosync]# yum install -y corosync-qdevice
# Start the pcsd service
[root@node203 ~]# systemctl start pcsd.service
[root@node203 ~]# systemctl status pcsd.service
[root@node203 ~]# systemctl enable pcsd.service
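
pcsd must also be started on the cluster nodes themselves, otherwise the pcs cluster auth and pcs cluster setup steps below will fail. A sketch of the same commands on the cluster nodes:

[root@node201 ~]# systemctl start pcsd.service
[root@node201 ~]# systemctl enable pcsd.service
[root@node202 ~]# systemctl start pcsd.service
[root@node202 ~]# systemctl enable pcsd.service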

II. Creating the Cluster
1. Create user authentication
As shown below, create the hacluster user on the cluster nodes and on the qdevice node and set its password (a sketch of setting the password follows the id output below):

[root@node201 ~]# id hacluster
uid=003(hacluster) gid=1004(haclient) groups=1004(haclient)
[root@node202 ~]# id hacluster
uid=5001(hacluster) gid=5010(haclient) groups=5010(haclient)
[root@node203 ~]# id hacluster
uid=5001(hacluster) gid=1004(haclient) groups=1004(haclient)
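
The hacluster user and haclient group are created automatically when the pcs packages are installed; only the password has to be set by hand. A minimal sketch, with a placeholder password, to be run on every node:

# Set the hacluster password non-interactively (the password value is a placeholder)
[root@node201 ~]# echo 'Str0ngPass!' | passwd --stdin hacluster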

As shown below, authenticate the cluster nodes to the qdevice node:

[root@node201 ~]# pcs cluster auth node203
Username: hacluster
Password:
node203: Authorized
[root@node202 ~]# pcs cluster auth node203
Username: hacluster
Password:
node203: Authorized
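
The cluster nodes must also be authenticated to one another before the cluster can be created; a sketch using the same pcs 0.9 syntax:

[root@node201 ~]# pcs cluster auth node201 node202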

2. Create the cluster
As shown below, create the cluster test_cluster. Note that this lab also adds node203, the qdevice host, as a regular cluster member, which the considerations above advise against for production use:

[root@node201 pcs]# pcs cluster setup --name test_cluster node201 node202 node203 --force
Destroying cluster on nodes: node201, node202, node203...
node202: Stopping Cluster (pacemaker)...
node203: Stopping Cluster (pacemaker)...
node201: Stopping Cluster (pacemaker)...
node203: Successfully destroyed cluster
node202: Successfully destroyed cluster
node201: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node201', 'node202', 'node203'
node201: successful distribution of the file 'pacemaker_remote authkey'
node203: successful distribution of the file 'pacemaker_remote authkey'
node202: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded

Synchronizing pcsd certificates on nodes node201, node202, node203...
node201: Success
node203: Success
node202: Success
Restarting pcsd on the nodes in order to reload the certificates...
node201: Success
node203: Success
node202: Success

[root@node201 pcs]#  pcs cluster start --all
node201: Starting Cluster (corosync)...
node202: Starting Cluster (corosync)...
node203: Starting Cluster (corosync)...
node203: Starting Cluster (pacemaker)...
node202: Starting Cluster (pacemaker)...
node201: Starting Cluster (pacemaker)...

[root@node201 pcs]#  pcs cluster status
Cluster Status:
 Stack: unknown
 Current DC: NONE
 Last updated: Thu Aug 29 19:24:45 2024
 Last change: Thu Aug 29 19:24:40 2024 by hacluster via crmd on node203
 3 nodes configured
 0 resource instances configured
PCSD Status:
  node203: Online
  node201: Online
  node202: Online

[root@node201 pcs]# pcs cluster enable --all
node201: Cluster Enabled
node202: Cluster Enabled
node203: Cluster Enabled

III. Configuring the Quorum Device
The quorum device model is net, which is currently the only supported model. The net model supports the following algorithms (the algorithm is chosen per cluster when the device is added; see the example after the list):

  • ffsplit: fifty-fifty split. This provides exactly one vote to the partition with the highest number of active nodes.
  • lms: last-man-standing. If the node is the only node in the cluster that can see the qnetd server, it returns a vote.
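
For illustration, a cluster choosing lms instead of ffsplit would run the add command (covered in section 3, "Add the quorum device on the cluster nodes", below) like this; this lab uses ffsplit:

[root@node201 ~]# pcs quorum device add model net host=node203 algorithm=lms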

1. Configure and start the quorum device (model net)

[root@node203 ~]# pcs qdevice setup model net --enable --start
Quorum device 'net' initialized
quorum device enabled
Starting quorum device...
quorum device started

2. Check the quorum device status

[root@node203 ~]# pcs qdevice status net --full
QNetd address:                  *:5403
TLS:                            Supported (client certificate required)
Connected clients:              0
Connected clusters:             0
Maximum send/receive size:      32768/32768 bytes

As shown below, allow qdevice access through the firewall:

[root@node203 ~]# firewall-cmd --permanent --add-service=high-availability
FirewallD is not running
[root@node203 ~]# firewall-cmd --add-service=high-availability
FirewallD is not running
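
The messages above show that firewalld is not running on node203, so nothing is actually blocked. If firewalld were active, the high-availability service, which includes the qnetd TCP port 5403, would need to be opened; a sketch:

[root@node203 ~]# systemctl start firewalld
[root@node203 ~]# firewall-cmd --permanent --add-service=high-availability
[root@node203 ~]# firewall-cmd --reload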

3. Add the quorum device on the cluster nodes
1) Configure corosync.conf on the cluster nodes (all cluster nodes)

[root@node202 corosync]#  cat /etc/corosync/corosync.conf |grep -v ^#|grep -v ^$|grep -v '#'
totem {
        version: 2
        crypto_cipher: none
        crypto_hash: none
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 239.255.1.1
                mcastport: 5405
                ttl: 1
        }
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}
quorum {
        provider: corosync_votequorum
        expected_votes: 7
}
nodelist {
        node { ring0_addr: node201
               nodeid: 1
        }
        node { ring0_addr: node202
               nodeid: 2
        }
}

Notes on the quorum configuration:

quorum {
        provider: corosync_votequorum      # enables votequorum
        expected_votes: 7                  # with 7 expected votes (7 nodes), quorum is 4; if a nodelist is configured, expected_votes is ignored
        wait_for_all: 1                    # when the cluster starts, quorum is withheld until all nodes are online and have joined; added in Corosync 2.0
        last_man_standing: 1               # enables the LMS feature; off (0) by default.
                                           # When enabled and the cluster sits at the quorum edge (e.g. expected_votes=7 but only 4 nodes online)
                                           # for longer than last_man_standing_window, quorum is recalculated, down to online nodes=2.
                                           # Allowing online nodes=1 additionally requires auto_tie_breaker, which is not recommended in production.
        last_man_standing_window: 10000    # in milliseconds; how long to wait after one or more hosts leave before recalculating quorum
}
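
For reference, after pcs quorum device add runs (see step 3) below), the quorum section that corosync uses ends up looking roughly like this (a sketch of the generated structure, not a verbatim copy of this lab's file):

quorum {
        provider: corosync_votequorum
        device {
                model: net
                votes: 1
                net {
                        host: node203
                        algorithm: ffsplit
                }
        }
}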

2) Restart the corosync service

[root@node201 corosync]# systemctl restart corosync
[root@node201 corosync]# systemctl status corosync

3) Check the cluster quorum configuration and add the device

[root@node201 ~]# pcs quorum config
Options:

Check the quorum status:
[root@node202 corosync]#  pcs quorum status
Quorum information
------------------
Date:             Thu Aug 29 17:56:32 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          3232235978
Ring ID:          -1062731319/85
Quorate:          No

Votequorum information
----------------------
Expected votes:   7
Highest expected: 7
Total votes:      3
Quorum:           4 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
3232235977          1         NR node201
3232235978          1         NR node202 (local)
3232235979          1         NR node203

The Qdevice column above still shows NR (not registered) because no quorum device has been added yet. As shown below, add the quorum device on a cluster node, specifying the ffsplit algorithm:

[root@node201 pcs]# pcs quorum device add model net host=node203 algorithm=ffsplit --force
Setting up qdevice certificates on nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Enabling corosync-qdevice...
node203: corosync-qdevice enabled
node201: corosync-qdevice enabled
node202: corosync-qdevice enabled
Sending updated corosync.conf to nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Corosync configuration reloaded
Starting corosync-qdevice...
node203: corosync-qdevice started
node201: corosync-qdevice started
node202: corosync-qdevice started

Running the same add command from a second node fails, because the device is already configured cluster-wide:

[root@node202 corosync]# pcs quorum device add model net host=node203 algorithm=ffsplit --force
Error: quorum device is already defined

4) View the quorum configuration after adding the device

[root@node201 pcs]# pcs quorum config
Options:
Device:
  votes: 1
  Model: net
    algorithm: ffsplit
    host: node203
	
[root@node201 pcs]# pcs quorum status
Quorum information
------------------
Date:             Thu Aug 29 19:31:45 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1
Ring ID:          1/98
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1    A,V,NMW node201 (local)
         2          1    A,V,NMW node202
         3          1    A,V,NMW node203
         0          1            Qdevice

In the Qdevice column, A means the connection to the quorum device is alive, V means the quorum device currently casts a vote for that node, and NMW means the master-wins flag is not set.

[root@node201 pcs]# pcs quorum device status
Qdevice information
-------------------
Model:                  Net
Node ID:                1
Configured node list:
    0   Node ID = 1
    1   Node ID = 2
    2   Node ID = 3
Membership node list:   1, 2, 3

Qdevice-net information
----------------------
Cluster name:           test_cluster
QNetd host:             node203:5403
Algorithm:              Fifty-Fifty split
Tie-breaker:            Node with lowest node ID
State:                  Connected

5) Check qdevice connections on the quorum node

[root@node203 corosync]# pcs qdevice status net --full
QNetd address:                  *:5403
TLS:                            Supported (client certificate required)
Connected clients:              3
Connected clusters:             1
Maximum send/receive size:      32768/32768 bytes
Cluster "test_cluster":
    Algorithm:          Fifty-Fifty split
    Tie-breaker:        Node with lowest node ID
    Node ID 3:
        Client address:         ::ffff:192.168.1.203:45974
        HB interval:            8000ms
        Configured node list:   1, 2, 3
        Ring ID:                1.62
        Membership node list:   1, 2, 3
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   ACK (ACK)
    Node ID 1:
        Client address:         ::ffff:192.168.1.201:40657
        HB interval:            8000ms
        Configured node list:   1, 2, 3
        Ring ID:                1.62
        Membership node list:   1, 2, 3
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   ACK (ACK)
    Node ID 2:
        Client address:         ::ffff:192.168.1.202:35765
        HB interval:            8000ms
        Configured node list:   1, 2, 3
        Ring ID:                1.62
        Membership node list:   1, 2, 3
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   No change (ACK)

IV. Managing the Quorum Device
pcs provides the ability to manage the quorum device service (corosync-qnetd) on the local host, as shown in the following examples. Note that these commands affect only the corosync-qnetd service.

[root@qdevice:~]# pcs qdevice start net
[root@qdevice:~]# pcs qdevice stop net
[root@qdevice:~]# pcs qdevice enable net
[root@qdevice:~]# pcs qdevice disable net
[root@qdevice:~]# pcs qdevice kill net
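
To undo the configuration later, the device can be removed from the cluster and its generated configuration destroyed on the qdevice host; a sketch:

# On a cluster node: remove the quorum device from the cluster configuration
[root@node201 ~]# pcs quorum device remove
# On the qdevice host: stop the service and delete its configuration
[root@qdevice:~]# pcs qdevice destroy net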

Appendix: Misconfiguration Cases

Case 1: Error when checking quorum status
As shown below, checking the quorum status on a cluster node fails:

[root@node201 ~]#  pcs quorum status
Error: Unable to get quorum status: Unable to start votequorum status tracking: CS_ERR_BAD_HANDLE

1) Check the corosync.conf configuration (the CS_ERR_BAD_HANDLE error indicates that corosync itself is not running properly)

2) Error starting corosync

[root@node202 ~]# systemctl restart corosync
Job for corosync.service failed because the control process exited with error code. See "systemctl status corosync.service" and "journalctl -xe" for details.

3) Check the corosync log

[root@node201 corosync]# tail -1000 /var/log/cluster/corosync.log

Aug 29 17:14:31 [324] node203 corosync notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
Aug 29 17:14:31 [324] node203 corosync notice  [QUORUM] Using quorum provider corosync_votequorum
Aug 29 17:14:31 [324] node203 corosync crit    [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Aug 29 17:14:31 [324] node203 corosync error   [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Aug 29 17:14:31 [324] node203 corosync error   [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.

As the log shows, when corosync.conf configures corosync_votequorum, a nodelist or quorum.expected_votes must be provided:

4) Modify corosync.conf:
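
A minimal fix is to give votequorum a way to compute quorum, either by setting expected_votes explicitly or by supplying a nodelist; a sketch of the quorum section used in this lab:

quorum {
        provider: corosync_votequorum
        expected_votes: 7
}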

5) Start the corosync service
[root@node201 corosync]# systemctl restart corosync

6) Check the quorum status

[root@node202 ~]# pcs quorum status
Quorum information
------------------
Date:             Thu Aug 29 17:41:02 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          3232235978
Ring ID:          -1062731319/67
Quorate:          No

Votequorum information
----------------------
Expected votes:   7
Highest expected: 7
Total votes:      2
Quorum:           1 Activity blocked
Flags:            2Node WaitForAll LastManStanding
Unable to get node 3232235979 info

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
3232235977          1         NR node201
3232235978          1         NR node202 (local)
3232235979          0         NR node203

Case 2: Failure adding the qdevice

As shown below, adding the qdevice on a cluster node raises a Python error:

[root@node201 corosync]# pcs quorum device add model net host=node203 algorithm=ffsplit
Setting up qdevice certificates on nodes...
Traceback (most recent call last):
  File "/usr/sbin/pcs", line 9, in <module>
    load_entry_point('pcs==0.9.169', 'console_scripts', 'pcs')()
......
  File "/usr/lib/python2.7/site-packages/pcs/common/node_communicator.py", line 160, in url
    host="[{0}]".format(self.host) if ":" in self.host else self.host,
TypeError: argument of type 'NoneType' is not iterable

The NoneType host in the traceback suggests that pcs could not resolve one of the nodes from its stored cluster state. Recreating the cluster works around the error:

[root@node201 pcs]# pcs cluster setup --name test_cluster node201 node202 node203 --force
Destroying cluster on nodes: node201, node202, node203...
node202: Stopping Cluster (pacemaker)...
node203: Stopping Cluster (pacemaker)...
node201: Stopping Cluster (pacemaker)...
node203: Successfully destroyed cluster
node202: Successfully destroyed cluster
node201: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node201', 'node202', 'node203'
node201: successful distribution of the file 'pacemaker_remote authkey'
node203: successful distribution of the file 'pacemaker_remote authkey'
node202: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded

Synchronizing pcsd certificates on nodes node201, node202, node203...
node201: Success
node203: Success
node202: Success
Restarting pcsd on the nodes in order to reload the certificates...
node201: Success
node203: Success
node202: Success

As shown below, after the cluster is recreated, adding the qdevice succeeds:

[root@node201 pcs]# pcs quorum device add model net host=node203 algorithm=ffsplit --force
Setting up qdevice certificates on nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Enabling corosync-qdevice...
node203: corosync-qdevice enabled
node201: corosync-qdevice enabled
node202: corosync-qdevice enabled
Sending updated corosync.conf to nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Corosync configuration reloaded
Starting corosync-qdevice...
node203: corosync-qdevice started
node201: corosync-qdevice started
node202: corosync-qdevice started
