Effective Strategies for Recovering MySQL Group Replication From Failures

Posted by abce on 2024-08-19

I don't have an MGR environment of my own; this is just me learning from someone else's experience. Original article: https://www.percona.com/blog/effective-strategies-for-recovering-mysql-group-replication-from-failures/

Group Replication is a fault-tolerant/highly available replication topology that ensures that if the primary node goes down, one of the other candidate or secondary members takes over, so that reads and writes can continue without interruption.

However, in some situations group membership can be broken by failures, network partitions, or database crashes, or we may end up with some isolated members. In such cases we have to perform a partial or full recovery to bring the whole topology back to an active state.

Assume the following scenario: all group nodes go down while the primary node [Node1] is serving read/write traffic.

1) Below is the Group Replication topology set up with the dbdeployer tool.

node1 [localhost:23637] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | 00023638-2222-2222-2222-222222222222 | 127.0.0.1   |       23638 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
| group_replication_applier | 00023639-3333-3333-3333-333333333333 | 127.0.0.1   |       23639 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.01 sec)

2) Run a workload against the primary node (127.0.0.1:23637).

sysbench \
--db-driver=mysql \
--mysql-user=sbtest_user \
--mysql-password=Sbtest@2022 \
--mysql-db=sbtest \
--mysql-host=127.0.0.1 \
--mysql-port=23637 \
--tables=10 \
--table-size=10000 \
--create_secondary=off \
--threads=50 \
--time=0 \
--events=0 \
--report-interval=1 /opt/homebrew/Cellar/sysbench/1.0.20_6/share/sysbench/oltp_read_write.lua run

Run output:

[ 1s ] thds: 50 tps: 1726.75 qps: 35178.95 (r/w/o: 24710.96/6964.73/3503.26) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 50 tps: 2528.19 qps: 50605.04 (r/w/o: 35405.76/10142.90/5056.39) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 50 tps: 2316.05 qps: 46257.26 (r/w/o: 32348.96/9290.15/4618.15) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
...

3) Here, we deliberately stop all the group nodes.

./stop_all 
# executing 'stop' on /Users/aniljoshi/sandboxes/group_msb_8_0_36
executing 'stop' on node3
stop /Users/aniljoshi/sandboxes/group_msb_8_0_36/node3
executing 'stop' on node2
stop /Users/aniljoshi/sandboxes/group_msb_8_0_36/node2
executing 'stop' on node1
stop /Users/aniljoshi/sandboxes/group_msb_8_0_36/node1

At this point the cluster is completely down and all group members have stopped. Now we will try to bring the cluster back up.

Bootstrapping/recovering the cluster nodes

4) First, let's start every node. Since I am using dbdeployer, I can manage the nodes (start/stop) and perform other tasks via its built-in commands/scripts, as shown below.

shell> ./start_all 
# executing 'start' on /Users/aniljoshi/sandboxes/group_msb_8_0_36
executing "start" on node 1
... sandbox server started
executing "start" on node 2
.. sandbox server started
executing "start" on node 3
.. sandbox server started

5) Connect to each node and verify its current state.

node1 [localhost:23637] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |                            |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)

node2 [localhost:23638] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |                            |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.01 sec)

node3 [localhost:23639] {root} ((none)) >  select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |                            |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.01 sec)

At this point, all nodes are in the OFFLINE state.

6) We can recover the cluster by bootstrapping one node of the group; once the other nodes are started, they will join via the distributed recovery process. Before doing that, however, it is important to understand the state of the certified/committed transactions on each node so that we do not bootstrap the wrong node. Bootstrapping the wrong node can lead to significant data loss!

Note: in our case, received_transaction_set is empty because all the nodes were restarted. However, if the nodes had not been restarted and Group Replication was only broken by a network outage or some other event, we would also see values in this field.

node1 [localhost:23637] {root} ((none)) > SELECT @@GLOBAL.GTID_EXECUTED;
+-----------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                        |
+-----------------------------------------------+
| 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034 |
+-----------------------------------------------+
1 row in set (0.00 sec)

node1 [localhost:23637] {root} ((none)) > SELECT received_transaction_set FROM 
    ->       performance_schema.replication_connection_status WHERE 
    ->       channel_name="group_replication_applier";
+--------------------------+
| received_transaction_set |
+--------------------------+
|                          |
+--------------------------+
1 row in set (0.01 sec)

node2 [localhost:23638] {msandbox} ((none)) > SELECT @@GLOBAL.GTID_EXECUTED;
+----------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                       |
+----------------------------------------------+
| 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-96208 |
+----------------------------------------------+
1 row in set (0.00 sec)


node2 [localhost:23638] {msandbox} ((none)) > SELECT received_transaction_set FROM 
    ->       performance_schema.replication_connection_status WHERE 
    ->       channel_name="group_replication_applier";
+--------------------------+
| received_transaction_set |
+--------------------------+
|                          |
+--------------------------+
1 row in set (0.01 sec)

node3 [localhost:23639] {msandbox} ((none)) > SELECT @@GLOBAL.GTID_EXECUTED;
+----------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                       |
+----------------------------------------------+
| 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-76208 |
+----------------------------------------------+
1 row in set (0.00 sec)


node3 [localhost:23639] {msandbox} ((none)) > SELECT received_transaction_set FROM 
    ->       performance_schema.replication_connection_status WHERE 
    ->       channel_name="group_replication_applier";
+--------------------------+
| received_transaction_set |
+--------------------------+
|                          |
+--------------------------+
1 row in set (0.00 sec)

From the gtid_executed information above we can see that Node1 ("00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034") has the most recent GTIDs, so this is the node we should bootstrap.

Please note: after collecting the transaction sets from all group members, compare them to find out which member has the largest transaction set, including both executed transactions (gtid_executed) and certified transactions (on the group_replication_applier channel). The GTIDs can be compared manually, or the GTID sets can be compared with the built-in functions GTID_SUBSET() and GTID_SUBTRACT().

mysql> SELECT GTID_SUBSET(set1, set2);
mysql> SELECT GTID_SUBTRACT(set1, set2);
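
For example, with the sets captured above (values taken from this demo), GTID_SUBSET() returning 1 confirms that Node2's set is contained in Node1's, and GTID_SUBTRACT() shows which transactions Node2 is missing:

mysql> SELECT GTID_SUBSET('00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-96208', '00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034') AS node2_within_node1;
mysql> SELECT GTID_SUBTRACT('00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034', '00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-96208') AS missing_on_node2;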

7) Let's bootstrap Node1.

mysql> SET GLOBAL group_replication_bootstrap_group=ON;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;

Now, if we check the status of Node1, we can see the member is ONLINE.

node1 [localhost:23637] {msandbox} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)

8) Now we simply need to start Group Replication on the remaining nodes. They will join through the distributed recovery mechanism, either via state transfer based on the binary logs available on the source, or via a full data clone using the clone plugin if the required binary logs no longer exist on any available group node.

node2 [localhost:23638] {msandbox} ((none)) > start group_replication;

 node2 [localhost:23638] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | 00023638-2222-2222-2222-222222222222 | 127.0.0.1   |       23638 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
2 rows in set (0.00 sec)

node3 [localhost:23639] {root} ((none)) > start group_replication;
Query OK, 0 rows affected (1.36 sec)

node3 [localhost:23639] {root} ((none)) >  select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | 00023638-2222-2222-2222-222222222222 | 127.0.0.1   |       23638 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
| group_replication_applier | 00023639-3333-3333-3333-333333333333 | 127.0.0.1   |       23639 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.00 sec)

Note: if the binary logs have somehow been rotated away or purged, we need to restore from a backup, or we can set up the clone plugin, which automates the full snapshot process.

We can also verify whether a node is still performing distributed recovery by running the command below. Once recovery finishes and the node shows the SECONDARY state, the replication threads [Slave_IO_Running/Slave_SQL_Running] for this channel stop.

node2 [localhost:23638] {root} ((none)) > SHOW SLAVE STATUS FOR CHANNEL 'group_replication_recovery'\G
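
On MySQL 8.0.22 and later, the replica-terminology form of the same check also works:

node2 [localhost:23638] {root} ((none)) > SHOW REPLICA STATUS FOR CHANNEL 'group_replication_recovery'\G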

At this stage, we again have a fully functioning cluster.

Recovering from a backup

In rare cases, when the data is completely corrupted, when we want to perform a partial recovery, or when we are adding a new node, we can perform a backup/restore directly.

Let's work through the following scenario. Here we will try to restore the cluster using a physical backup [Percona XtraBackup].

1) For demonstration purposes, we took a backup from one of the group nodes.

shell> xtrabackup --host=127.0.0.1 -usbtest_user -pSbtest@2022 -P23637 --backup --target-dir=/Users/aniljoshi/backup/

Backup output:

...
2024-08-03T19:54:50.102188+05:30 0 [Note] [MY-011825] [Xtrabackup] Transaction log of lsn (266657170) to (266657878) was copied.
2024-08-03T19:54:50.329784+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!

2) For the backup to be usable, we also need to prepare it.

shell> xtrabackup --prepare --use-memory=2G --target-dir=/Users/aniljoshi/backup/

Output:

...
2024-08-03T19:58:00.432140+05:30 0 [Note] [MY-012980] [InnoDB] Shutdown completed; log sequence number 266658326
2024-08-03T19:58:00.434701+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!

Now, suppose we want to restore the cluster using this data.

3) First, we clean out the data directory of the target member [Node1].

shell> rm -rf  /Users/aniljoshi/sandboxes/group_msb_8_0_36/node1/data/      

4) Then we copy the prepared data into the Node1 data directory.

shell> cp -aR /Users/aniljoshi/backup/  /Users/aniljoshi/sandboxes/group_msb_8_0_36/node1/data/
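
For reference, on a regular (non-sandbox) installation this copy step is usually done with xtrabackup's --copy-back mode plus an ownership fix; a minimal sketch, assuming datadir is configured in my.cnf, the server is stopped, and it runs as the mysql user (the datadir path below is illustrative):

shell> xtrabackup --copy-back --target-dir=/Users/aniljoshi/backup/
shell> chown -R mysql:mysql /var/lib/mysql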

5) Bootstrap the node [Node1].

mysql> SET GLOBAL group_replication_bootstrap_group=ON;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
 
node1 [localhost:23637] {root} (performance_schema) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 11400bec-51a6-11ef-b950-29b40b1b64a7 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)

The other nodes could join via the distributed recovery/clone process (if configured); however, here we will demonstrate how to perform the restore/recovery using a backup.

6) Let's take a fresh backup from Node1 and note the executed GTID information.

shell> xtrabackup --host=127.0.0.1 -usbtest_user -pSbtest@2022 -P23637 --backup --target-dir=/Users/aniljoshi/backup/Primary/

Backup output:

...
2024-08-03T20:16:36.182978+05:30 0 [Note] [MY-011825] [Xtrabackup] Transaction log of lsn (266723771) to (266724479) was copied.
2024-08-03T20:16:36.408616+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!

GTID details:

shell> cat /Users/aniljoshi/backup/Primary/xtrabackup_binlog_info
mysql-bin.000006    193    00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040

7) Prepare the backup data.

shell> xtrabackup --prepare --use-memory=2G --target-dir=/Users/aniljoshi/backup/Primary/

Output:

...
2024-08-03T20:17:47.817955+05:30 0 [Note] [MY-013072] [InnoDB] Starting shutdown...
2024-08-03T20:17:47.822229+05:30 0 [Note] [MY-013084] [InnoDB] Log background threads are being closed...
2024-08-03T20:17:47.865162+05:30 0 [Note] [MY-012980] [InnoDB] Shutdown completed; log sequence number 266724886
2024-08-03T20:17:47.867836+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!

8) Clean the Node2 data directory and copy the prepared backup onto Node2.
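
The cleanup mirrors the Node1 step; assuming the same sandbox layout, it would be:

shell> rm -rf /Users/aniljoshi/sandboxes/group_msb_8_0_36/node2/data/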

shell> cp -aR /Users/aniljoshi/backup/Primary/  /Users/aniljoshi/sandboxes/group_msb_8_0_36/node2/data/

9) After restarting the service, we can verify whether gtid_purged has been applied.
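
In this dbdeployer sandbox, the restart can presumably be done with the per-node script (this assumes the standard dbdeployer sandbox layout; adjust to however you manage the service):

shell> /Users/aniljoshi/sandboxes/group_msb_8_0_36/node2/restart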

node2 [localhost:23638] {root} ((none)) > show global variables like '%gtid_purged%';
+---------------+-----------------------------------------------+
| Variable_name | Value                                         |
+---------------+-----------------------------------------------+
| gtid_purged   | 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040 |
+---------------+-----------------------------------------------+
1 row in set (0.01 sec)

Otherwise, we can set gtid_purged with the command below.

mysql> set global gtid_purged="00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040";

10) Finally, we can configure replication and start the Group Replication process.

node2 [localhost:23638] {root} ((none)) > change master to master_user='rsandbox', master_password='rsandbox' FOR CHANNEL 'group_replication_recovery';

node2 [localhost:23638] {root} ((none)) > start group_replication;
Query OK, 0 rows affected (1.97 sec)

node2 [localhost:23638] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 11400bec-51a6-11ef-b950-29b40b1b64a7 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | c2b74588-51a7-11ef-81da-5f8da38bd06f | 127.0.0.1   |       23638 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
2 rows in set (0.00 sec)

Similar steps can be performed later for the node [Node3]. Also, besides Percona XtraBackup, we can use logical backup/restore tools such as [mydumper/mysqldump/MySQL Shell, etc.] to perform the restore and set up the Group Replication process. We will not cover those here; the procedure is generally the same as for any asynchronous replication setup. Once the restore is done with the expected GTID coordinates, we can run the commands below.

mysql> change master to master_user='rsandbox', master_password='rsandbox' FOR CHANNEL 'group_replication_recovery';
mysql> start group_replication;
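
For completeness, a rough, illustrative sketch of what the mysqldump-based path could look like (not used in this demo; the dump file name and credentials are placeholders, and the dump must be taken with GTIDs included):

shell> mysqldump --host=127.0.0.1 --port=23637 -uroot -p --all-databases --single-transaction --triggers --routines --events --set-gtid-purged=ON > full_dump.sql

# On the joining node: reset the local GTID state, load the dump (it sets gtid_purged itself),
# then configure the recovery channel and start Group Replication.
mysql> RESET MASTER;
mysql> SOURCE full_dump.sql;
mysql> CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery';
mysql> START GROUP_REPLICATION;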

Additionally, since MySQL 8.0.17, the clone plugin can be used for the remote cloning operation during distributed recovery in Group Replication. This requires some extra configuration and steps.

Recovering via the clone plugin

On the donor node [Node1]:

1) Install the plugin dynamically.

node1 [localhost:23637] {root} ((none)) > install plugin clone soname 'mysql_clone.so';
Query OK, 0 rows affected (0.16 sec)

2) Persist the change in the configuration file.

[mysqld]
plugin-load-add=mysql_clone.so
clone=FORCE_PLUS_PERMANENT

3) Grant the BACKUP_ADMIN privilege to the replication user "rsandbox", which handles the distributed recovery process.

node1 [localhost:23637] {root} ((none)) > show grants for rsandbox@'%';
+---------------------------------------------+
| Grants for rsandbox@%                       |
+---------------------------------------------+
| GRANT USAGE ON *.* TO `rsandbox`@`%`        |
| GRANT BACKUP_ADMIN ON *.* TO `rsandbox`@`%` |
| GRANT `R_REPLICATION`@`%` TO `rsandbox`@`%` |
+---------------------------------------------+
3 rows in set (0.00 sec)
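
If the privilege were not already present, it could be granted like this (rsandbox is the replication user created by the dbdeployer sandbox):

node1 [localhost:23637] {root} ((none)) > GRANT BACKUP_ADMIN ON *.* TO 'rsandbox'@'%';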

4) Verify the plugin status.

node1 [localhost:23637] {root} ((none)) > select plugin_name,plugin_status from information_schema.plugins where plugin_name='clone';
+-------------+---------------+
| plugin_name | plugin_status |
+-------------+---------------+
| clone       | ACTIVE        |
+-------------+---------------+
1 row in set (0.00 sec)

On the recipient node [Node3]:

1) Install the plugin dynamically.

node3 [localhost:23639] {root} ((none)) > install plugin clone soname 'mysql_clone.so';
Query OK, 0 rows affected (0.16 sec)

2) Persist the change in the configuration file.

[mysqld]
plugin-load-add=mysql_clone.so
clone=FORCE_PLUS_PERMANENT

3) Verify the plugin status.

node3 [localhost:23639] {root} ((none)) > select plugin_name,plugin_status from information_schema.plugins where plugin_name='clone';
+-------------+---------------+
| plugin_name | plugin_status |
+-------------+---------------+
| clone       | ACTIVE        |
+-------------+---------------+
1 row in set (0.00 sec)

4) Define the threshold for clone-based distributed recovery on all relevant/group nodes. A joining node will use incremental state transfer unless the transaction gap exceeds the threshold below, in which case a full data clone is performed. For this test we used a very small value, although the default is quite large.

node3 [localhost:23639] {root} ((none)) >  set global group_replication_clone_threshold = 10;
Query OK, 0 rows affected (0.00 sec)
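
To keep this setting across restarts, it could also be persisted (SET PERSIST is available in MySQL 8.0):

node3 [localhost:23639] {root} ((none)) > SET PERSIST group_replication_clone_threshold = 10;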

5) Finally, start Node3. In the MySQL log we can see that distributed recovery via the clone plugin has started. This process replaces, i.e., clones, the entire data directory from the donor/source [Node1].

2024-08-03T18:21:04.039835Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Cloning from a remote group donor.'
2024-08-03T18:21:03.033878Z 0 [Warning] [MY-013469] [Repl] Plugin group_replication reported: 'This member will start distributed recovery using clone. It is due to the number of missing transactions being higher than the configured threshold of 10.'
2024-08-03T18:21:04.150730Z 132 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Started
...
/Users/aniljoshi/opt/mysql/8.0.36/bin/mysqld: Shutdown complete (mysqld 8.0.36)  MySQL Community Server - GPL.
2024-08-03T18:21:09.6NZ mysqld_safe mysqld restarted

6) Next, we can track the executed GTIDs and coordinates using the file and commands below.

shell> cat #view_status 
2 1
1722709264045254 1722709271116851
127.0.0.1:23637
0

./mysql-bin.000006
4836
00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056

Or:

node3 [localhost:23639] {root} ((none)) > select * from performance_schema.clone_status\G
*************************** 1. row ***************************
             ID: 1
            PID: 0
          STATE: Completed
     BEGIN_TIME: 2024-08-03 23:51:04.045
       END_TIME: 2024-08-03 23:51:11.117
         SOURCE: 127.0.0.1:23637
    DESTINATION: LOCAL INSTANCE
       ERROR_NO: 0
  ERROR_MESSAGE: 
    BINLOG_FILE: mysql-bin.000006
BINLOG_POSITION: 4836
  GTID_EXECUTED: 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056
1 row in set (0.00 sec)



node3 [localhost:23639] {root} ((none)) > show variables like 'gtid_purged';
+---------------+-----------------------------------------------+
| Variable_name | Value                                         |
+---------------+-----------------------------------------------+
| gtid_purged   | 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056 |
+---------------+-----------------------------------------------+
1 row in set (0.01 sec)

Otherwise, we can set gtid_purged with the command below, using the GTID value from the clone status above.

node3 [localhost:23639] {root} ((none)) > set global gtid_purged="00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056";

7) Run the CHANGE REPLICATION SOURCE command and start Group Replication.

node3 [localhost:23639] {root} ((none)) > CHANGE REPLICATION SOURCE TO SOURCE_USER='rsandbox', SOURCE_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.01 sec)

node3 [localhost:23639] {root} ((none)) > start group_replication;
Query OK, 0 rows affected (10.69 sec)

8) Finally, we can see Node3 appearing in the Group Replication topology.

node3 [localhost:23639] {root} ((none)) >  select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023639-3333-3333-3333-333333333333 | 127.0.0.1   |       23639 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
| group_replication_applier | 11400bec-51a6-11ef-b950-29b40b1b64a7 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | c2b74588-51a7-11ef-81da-5f8da38bd06f | 127.0.0.1   |       23638 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.01 sec)
