MySQL命令rebootClusterFromCompleteOutage重啟叢集注意事項

潇湘隐者發表於2024-10-12

rebootClusterFromCompleteOutage()是MySQL Shell中的一個實用命令,用於在 InnoDB 叢集遇到完全中斷 (例如,當組複製在所有成員例項上停止時)後重新配置和恢復叢集。這個命令允許你連線到叢集中的一個 MySQL 例項,並使用該例項的後設資料來恢復整個叢集。

在MySQL Shell中使用rebootClusterFromCompleteOutage命令啟動叢集(MySQL InnoDB Cluster)

MySQL  mysqlu01:7306 ssl  JS > dba.rebootClusterFromCompleteOutage()
Restoring the Cluster 'Cluster_GSP' from complete outage...

Cluster instances: 'mysqlu01:7306' (OFFLINE), 'mysqlu02:7306' (OFFLINE), 'mysqlu03:7306' (OFFLINE)
Waiting for instances to apply pending received transactions...
Validating instance configuration at mysqlu01:7306...

This instance reports its own address as mysqlu01:7306

Instance configuration is suitable.
* Waiting for seed instance to become ONLINE...
mysqlu01:7306 was restored.
Validating instance configuration at mysqlu02:7306...

This instance reports its own address as mysqlu02:7306

Instance configuration is suitable.
Rejoining instance 'mysqlu02:7306' to cluster 'Cluster_GSP'...

Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_201'@'%' already existed at instance 'mysqlu01:7306'. It will be deleted and created again with a new password.

* Waiting for the Cluster to synchronize with the PRIMARY Cluster...
** Transactions replicated ############################################################ 100%

The instance 'mysqlu02:7306' was successfully rejoined to the cluster.

Validating instance configuration at mysqlu03:7306...

This instance reports its own address as mysqlu03:7306

Instance configuration is suitable.
Rejoining instance 'mysqlu03:7306' to cluster 'Cluster_GSP'...

Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_202'@'%' already existed at instance 'mysqlu01:7306'. It will be deleted and created again with a new password.

* Waiting for the Cluster to synchronize with the PRIMARY Cluster...
** Transactions replicated ############################################################ 100%

The instance 'mysqlu03:7306' was successfully rejoined to the cluster.

The Cluster was successfully rebooted.

<Cluster:Cluster_GSP>
MySQL mysqlu01:7306 ssl JS >

那麼,關於命令rebootClusterFromCompleteOutage的一些知識點或注意事項,必須弄清楚,避免踩坑或稀裡糊塗的。下面是一些簡單總結,當前測試環境為MySQL 8.0.35,隨著版本的變更,可能會有一些特性變化,請以實際情況為準。

1.確保叢集的所有成員/節點在執行命令之前都已啟動:在執行dba.rebootClusterFromCompleteOutage()命令之前,需要確保所有叢集成員的MySQL例項都已啟動並且可以訪問。 如果無法訪問任何叢集成員,該命令將失敗。如下所示:

情況1: 三個節點的MySQL服務都未啟動的情況

 MySQL  JS > \c icadmin@mysqlu03:7306
Creating a session to 'icadmin@mysqlu03:7306'
MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysqlu03:7306' (111)
MySQL JS >

三個節點都沒有啟動時,MySQL Shell甚至都無法連線上。

情況2: 三個節點中,有一個或兩個節點未啟動情況

 MySQL  mysqlu01:7306 ssl  JS > dba.rebootClusterFromCompleteOutage()
Restoring the Cluster 'Cluster_GSP' from complete outage...

Cluster instances: 'mysqlu01:7306' (OFFLINE), 'mysqlu02:7306' (UNREACHABLE), 'mysqlu03:7306' (UNREACHABLE)
WARNING: One or more instances of the Cluster could not be reached and cannot be rejoined nor ensured to be OFFLINE: 'mysqlu02:7306', 'mysqlu03:7306'. Cluster may diverge and become inconsistent unless all instances are either reachable or certain to be OFFLINE and not accepting new transactions. You may use the 'force' option to bypass this check and proceed anyway.
ERROR: Could not determine if Cluster is completely OFFLINE
Dba.rebootClusterFromCompleteOutage: Could not determine if Cluster is completely OFFLINE (RuntimeError)
MySQL mysqlu01:7306 ssl JS >

2.這種啟動方式,它會自動找出GTID值最大的成員/節點作為MGR的引導節點嗎?如果不能,是否需要mysql shell連線到叢集中GTID最大的成員/節點,然後執行這個命令呢?

如下所示,叢集中有三個節點mysqlu01,mysqlu02,mysqlu03

mysql> select * from performance_schema.replication_group_members order by member_host\G;
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 591280ce-bb5f-11ee-8862-00505697b437
MEMBER_HOST: mysqlu01
MEMBER_PORT: 7306
MEMBER_STATE: ONLINE
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.35
MEMBER_COMMUNICATION_STACK: MySQL
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 6ae7d68b-ba96-11ee-8092-005056971158
MEMBER_HOST: mysqlu02
MEMBER_PORT: 7306
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.35
MEMBER_COMMUNICATION_STACK: MySQL
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 4fc54bd5-bbf3-11ee-b588-0050569783ac
MEMBER_HOST: mysqlu03
MEMBER_PORT: 7306
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.35
MEMBER_COMMUNICATION_STACK: MySQL
3 rows in set (0.00 sec)

ERROR:
No query specified

mysql>

我們先用命令將節點mysqlu03的MySQL例項關閉

sudo systemctl stop mysqld.service

然後在mysqlu01中進行一些插入操作,人為模擬節點mysqlu01的GTID要大於mysqlu03

insert into test
select 1007, 'k1007' union all
select 1008, 'k1008';


mysql> insert into test
-> select 1007, 'k1007' union all
-> select 1008, 'k1008';
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0

然後先後依次將mysqlu02,mysqlu01的MySQL服務關閉,然後啟動這三個節點的MySQL服務,然後使用mysqlsh連線到mysqlu03上來啟動

mysqlsh icadmin@mysqlu03:7306 

mysqlsh icadmin@mysqlu03:7306 --log-level=DEBUG3

如下所示:

 MySQL  mysqlu03:7306 ssl  JS > dba.rebootClusterFromCompleteOutage()
Restoring the Cluster 'Cluster_GSP' from complete outage...

Cluster instances: 'mysqlu01:7306' (OFFLINE), 'mysqlu02:7306' (OFFLINE), 'mysqlu03:7306' (OFFLINE)
Waiting for instances to apply pending received transactions...
Switching over to instance 'mysqlu01:7306' (which has the highest GTID set), to be used as seed.
NOTE: The instance 'mysqlu01:7306' is running auto-rejoin process, which will be cancelled.

Validating instance configuration at mysqlu01:7306...

This instance reports its own address as mysqlu01:7306

Instance configuration is suitable.
NOTE: Cancelling active GR auto-initialization at mysqlu01:7306
* Waiting for seed instance to become ONLINE...
mysqlu01:7306 was restored.
Validating instance configuration at mysqlu02:7306...

This instance reports its own address as mysqlu02:7306

Instance configuration is suitable.
Rejoining instance 'mysqlu02:7306' to cluster 'Cluster_GSP'...

Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_201'@'%' already existed at instance 'mysqlu01:7306'. It will be deleted and created again with a new password.

* Waiting for the Cluster to synchronize with the PRIMARY Cluster...
** Transactions replicated ############################################################ 100%

The instance 'mysqlu02:7306' was successfully rejoined to the cluster.

Validating instance configuration at mysqlu03:7306...

This instance reports its own address as mysqlu03:7306

Instance configuration is suitable.
Rejoining instance 'mysqlu03:7306' to cluster 'Cluster_GSP'...

Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_202'@'%' already existed at instance 'mysqlu01:7306'. It will be deleted and created again with a new password.

* Waiting for the Cluster to synchronize with the PRIMARY Cluster...
** Transactions replicated ############################################################ 100%

The instance 'mysqlu03:7306' was successfully rejoined to the cluster.

The Cluster was successfully rebooted.

<Cluster:Cluster_GSP>
MySQL mysqlu03:7306 ssl JS >

如上所示,從"Switching over to instance 'mysqlu01:7306' (which has the highest GTID set), to be used as seed."這個輸出資訊,我們可以清楚的 看到MySQL Shell中使用命令啟動叢集的時候,它會自動判斷哪一個節點的GTID值最大,這個測試案例中,mysqlu01節點的GTID值最大,它就將mysqlu01作為 MGR的引導節點。所以,使用命令(dba.rebootClusterFromCompleteOutage())重啟叢集時,可以連線到叢集的任一節點去啟動叢集。

注意:早期的版本,如果重新啟動叢集,需要連線到GTID最新的例項,也就是說必須連線到具有GTID超集的成員,即在中斷之前應用了最多事務的例項。如果當前連線的例項的GTID不是最新(最大的值)的,則會報錯,如下所示

Dba.rebootClusterFromCompleteOutage: The active session instance isn't the most updated 
in comparison with the ONLINE instances of the Cluster'
s metadata.
Please use the most up to date instance: '***.***.***.***:3306'. (RuntimeError)

這個僅僅在早期版本中有這個問題。

另外,需要注意的是,如果叢集中三個節點的GTID一致,如果叢集關閉前主節點為mysqlu01,如果此時MySQL shell連線到mysqlu03去啟動叢集,此時mysqlu03會切換為主節點(PRIMARY),而mysqlu01會切換為從節點(SECOND)

3.使用force選項會忽略GTID集的分歧,並將所選成員作為主節點,丟棄未包含在所選成員GTID集中的任何事務。如果此過程失敗,並且叢集後設資料已嚴重損壞, 你可能需要刪除後設資料並從頭開始重新建立叢集。這是一個危險的操作,因為它將刪除所有叢集後設資料,並且不能撤銷。 在實際操作中,應儘量避免使用force選項,因為它可能會引起資料不一致。只有在你完全瞭解可能產生的後果,並且沒有其他選擇時,才應考慮使用此選項。

如果你知道哪個節點具有最大的GTID值,但無法訪問其他節點,可以使用force選項來強制重啟叢集。這將使用剩餘可聯絡的成員來重啟叢集,即使某些成員當前無法訪問

4.這種方式是叢集中所有節點都關閉後使用,如果只是一個節點伺服器重啟或MySQL服務關閉後重啟,並不能使用這種方式。

相關文章