簡介:masterha_master_switch作為一個切換工具被整合在MHA程式包中,

安裝:編譯安裝MHA manager後會在/usr/local/bin/中生成二進位制可執行程式masterha_master_switch。

使用:

$masterha_master_switch --help
Usage:
    # For master failover

    masterha_master_switch --master_state=dead
    --global_conf=/etc/masterha_default.cnf
    --conf=/usr/local/masterha/conf/app1.cnf --dead_master_host=host1

    # For online master switch

    masterha_master_switch --master_state=alive
    --global_conf=/etc/masterha_default.cnf
    --conf=/usr/local/masterha/conf/app1.cnf

    See online reference
    (http://code.google.com/p/mysql-master-ha/wiki/masterha_master_switch)
    for details.

在這裡,我習慣將一套主從的配置都放在app1.cnf中,並且更改為業務相關的名稱,如mainBusiness.cnf

分析:

目標:獲取masterha_master_switch的線上切換邏輯

環境:MHA manager 192.168.1.8

       MHA node1+MySQL5.7+GTID 192.168.1.109+PORT3109MHA node1+MySQL5.7+GTID 192.168.1.110+PORT3110

配置檔案內容:

manager_workdir=/data/mha/mainBusiness                 #設定MHA的工作目錄
manager_log=/data/mha/mainBusiness/manager.log         #MHA manager的日誌輸出
remote_workdir=/data/mha/                              #預設MHA node端的工作目錄

master_binlog_dir= /data/mysql/3109/log/,/data/mysql/3110/log/  #預設MHA node端的binlog目錄                                                                                       
#secondary_check_script= masterha_secondary_check -s 192.168.1.109 -s 192.168.1.110                                                                    
secondary_check_script= masterha_secondary_check -s 192.168.1.109 -s 192.168.1.110 --user=root --master_host=192.168.1.109 --master_port=3109          

ping_interval=1                                       #設定MHA manager的檢測間隔(1秒)

[server1]
hostname=MySQL-Cent7-IP001109
ip=192.168.1.109
port=3109
ssh_user=root
ssh_port=22
candidate_master=1                                    #設定該節點是否可以提升為主,1為是,0否
check_repl_delay=0                                    #發生故障後是否檢查本例項主從落後程度,0否,1是

[server2]
hostname=MySQL-Cent7-IP001110
ip=192.168.1.110
port=3110
ssh_user=root
ssh_port=22
candidate_master=1                                    #設定該節點是否可以提升為主,1為是,0否
check_repl_delay=0                                    #發生故障後是否檢查本例項主從落後程度,0否,1是
    在MHA manager端上執行:

$masterha_master_switch --master_state=alive --conf=/etc/mha/mainBusiness.cnf  --orig_master_is_new_slave
#--master_state             指明線上切換
#--orig_master_is_new_slave 指定原先的主作為從庫掛到新的主上

MHA manager端輸出如下

#####################輸出段1###########################
[info] MHA::MasterRotate version 0.57.
[info] Starting online master switch..        #開始線上切換
[info] 
[info] * Phase 1: Configuration Check Phase.. #階段1,檢查配置
[info] 
[warning] Global configuration file /etc/masterha_default.cnf not found. Skipping..             #我這裡沒有使用全域性引數檔案,會有報錯跳過,沒關係
[info] Reading application default configuration from /etc/mha/mainBusiness.cnf..               #程式從mainBusiness檔案中讀取配置                                         
[info] Reading server configuration from /etc/mha/mainBusiness.cnf..
[info] GTID failover mode = 1                #啟用GTID故障轉移模式
[info] Current Alive Master: MySQL-Cent7-IP001109(192.168.1.109:3109)  #列出當前存活的主例項
[info] Alive Slaves:                         #列出當前存活的從例項
[info]   MySQL-Cent7-IP001110(192.168.1.110:3110)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
[info]   GTID ON                             #從採用了GTID模式
[info]   Replicating from 192.168.1.109(192.168.1.109:3109)
[info]   Primary candidate for the new Master (candidate_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on MySQL-Cent7-IP001109(192.168.1.109:3109)? (YES/no): 
#處於事務一致性考慮,程式詢問是否臨時關閉非事務型的表

原master,3109端顯示如下

#####################輸出段1###########################
SET wait_timeout=86400;                                #設定連線超時時間,防止切換時翻車
SELECT @@global.server_id As Value;                    
SELECT VERSION() AS Value;                             #獲取用於複製的server—id
SELECT @@global.gtid_mode As Value;                    #獲取自身是否使用了GTID複製
SHOW GLOBAL VARIABLES LIKE 'log_bin';                  #檢查自身是否啟用了binlog
SHOW MASTER STATUS;                                    #獲取自身的事務執行情況
SELECT @@global.datadir AS Value;                      #獲取自身資料檔案的儲存位置
SELECT @@global.slave_parallel_workers AS Value;       #檢查是否採用了多執行緒複製
SHOW SLAVE STATUS;                                     #獲取自身作為從庫時的事務執行情況
SELECT @@global.read_only As Value;                    #獲取自身是否開啟了只讀
SELECT @@global.relay_log_purge As Value;              #檢查自身是否開啟了relay log自動清除

原slave,3110端顯示如下

SELECT @@global.server_id As Value;
SELECT VERSION() AS Value;
SELECT @@global.gtid_mode As Value;
SHOW GLOBAL VARIABLES LIKE 'log_bin';
SHOW MASTER STATUS;
SELECT @@global.datadir AS Value;
SELECT @@global.slave_parallel_workers AS Value;
SHOW SLAVE STATUS;
SELECT @@global.read_only As Value;
SELECT @@global.relay_log_purge As Value;
SELECT @@global.relay_log_info_repository AS Value;     #差異處,獲取自身relay資訊儲存形式(table)
SELECT Relay_log_name FROM mysql.slave_relay_log_info;  #差異處,獲取正在使用的relay檔名稱
SELECT @@global.datadir AS Value;                       
SHOW SLAVE STATUS;
SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl';  #差異處,檢查複製使用者是否具有複製許可權

第一部分小結:

        讀取配置檔案,確認主從關係與複製方式;

        根據主從關係複製方式,連線主庫:設定必要引數,獲取的複製詳細資訊/

                            連線從庫:獲取複製的詳細資訊,獲取relay資訊,獲取repl賬號並確認許可權

接MHA manager端輸出確認輸入yes後

MHA manager端輸出如下:

#####################輸出段2###########################
[info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
[info]  ok.               #注意,這裡雖然顯示關閉了非事務表,但是上面的抓包語句裡面並沒有出現相關語句
[info] Checking MHA is not monitoring or doing failover..  #檢查MHA是否工作,切換時要求MHA manager停止執行
[info] Checking replication health on MySQL-Cent7-IP001110.. #檢查主從健康程度
[info]  ok.
[info] Searching new master from slaves..                    #開始在從庫中選取一個新主
[info]  Candidate masters from the configuration file:       #列出候選從庫
[info]   MySQL-Cent7-IP001109(192.168.1.109:3109)  Version=5.7.19-log log-bin:enabled
[info]     GTID ON                                          #檢查GTID開啟情況
[info]   MySQL-Cent7-IP001110(192.168.1.110:3110)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
[info]     GTID ON                                          #檢查GTID開啟情況
[info]     Replicating from 192.168.1.109(192.168.1.109:3109)
[info]     Primary candidate for the new Master (candidate_master is set)
[info]  Non-candidate masters:
[info]  Searching from candidate_master slaves which have received the latest relay log events..
[info] #在所有從庫中選取relay log最新的一個作為新的主庫
From:
MySQL-Cent7-IP001109(192.168.1.109:3109) (current master)
 +--MySQL-Cent7-IP001110(192.168.1.110:3110)

To:
MySQL-Cent7-IP001110(192.168.1.110:3110) (new master)
 +--MySQL-Cent7-IP001109(192.168.1.109:3109)

Starting master switch from MySQL-Cent7-IP001109(192.168.1.109:3109) to MySQL-Cent7-IP001110(192.168.1.110:3110)? (yes/NO): 
#程式詢問是否可以進行切換了

master,3109端顯示如下

#####################輸出段2###########################
USE `unknown_database`;
FLUSH NO_WRITE_TO_BINLOG TABLES;                              -- 重新整理表快取到硬碟,同時不記錄這條到binlog
SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '0') AS Value;   -- 加一個模擬鎖,防止出現切換程式多開的問題,這樣多開的程式以為獲取不到同名鎖就會失敗退出,
#GET_LOCK(str,time) -- str為鎖的名稱,0表示持續鎖,5.7.5之後可以存在多個名稱不同的GET_LOCK
SHOW PROCESSLIST;

原slave,3110端顯示如下

#####################輸出段2###########################
USE `unknown_database`;
SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value;
SHOW SLAVE STATUS;
SHOW SLAVE STATUS;

第二部分小結:

        1.確認可以儲存關閉表後,重新整理表快取到磁碟, 避免丟資料,加模擬鎖防止切換程式多開造成切換異常

        2.從候選從庫中選出擁有最新資料的從庫,並將其設為要切換到的主庫

接MHA manager端輸出確認可以切換後

MHA manager端輸出如下:

#####################輸出段3###########################
[info] Checking whether MySQL-Cent7-IP001110(192.168.1.110:3110) is ok for the new master..
[info]  ok.   #檢查上一步中被選定的新主庫的是否真正可以成為主
[info] MySQL-Cent7-IP001109(192.168.1.109:3109): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.          #沒有檢查到原來主(現在被作為從)作為從庫的殘留資訊,不管3721先將其掛到一個空主上
[info] MySQL-Cent7-IP001109(192.168.1.109:3109): Resetting slave pointing to the dummy host.
[info] ** Phase 1: Configuration Check Phase completed.
[info] # 配置檢查階段結束
[info] * Phase 2: Rejecting updates Phase..
[info] #
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO):
#切換程式檢測到主機虛擬IP地址切換的地址沒有定義,如果只切換主從身份的話,應用還會寫到原來的主上,需要設定只讀。詢問是否真的要切換,這裡我們只是做切換實驗,先觀察下兩個例項的輸出,然後直接切換即可。

原master,3109端顯示如下

USE `unknown_database`;
FLUSH NO_WRITE_TO_BINLOG TABLES;                                -- 重新整理表快取到磁碟, 避免同步丟資料的問題

SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '0') AS Value;   
SHOW PROCESSLIST;
SHOW SLAVE STATUS;
CHANGE MASTER TO MASTER_HOST='dummy_host';                               #將原主作為從切換到一個莫須有的主機上
SHOW SLAVE STATUS;
RESET SLAVE /*!50516 ALL */;                                             #嘗試清除自己之前殘存的slave屬性的資訊,若本機不為主時,需要!
SELECT RELEASE_LOCK('MHA_Master_High_Availability_Monitor') As Value;    

原slave,3110端顯示如下

USE `unknown_database`;
SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value;
SHOW SLAVE STATUS;
SHOW SLAVE STATUS;
SHOW PROCESSLIST;

第三部分總結:

    1.檢查選定的從庫的詳細資訊,確認真的是否可以作為新的主庫

    2.處理從庫之間的資料比對,若只有一個主和從,則將原主庫先掛到一個莫須有的空主上

                            若有兩個以上的從庫,則需清理從庫原來的slave記錄

接MHA manager端輸出確認,強制切換後:

MHA manager端輸出如下:

#####################輸出段4###########################
[info] Locking all tables on the orig master to reject updates from everybody (including root):
[info] Executing FLUSH TABLES WITH READ LOCK..
[info]  ok.   #所有的表都禁止寫操作
[info] Orig master binlog:pos is 3109binlog.000070:536.
[info]  Waiting to execute all relay logs on MySQL-Cent7-IP001110(192.168.1.110:3110)..
[info]  master_pos_wait(3109binlog.000070:536) completed on MySQL-Cent7-IP001110(192.168.1.110:3110). Executed 0 events.
[info]   done.#將從庫已經獲得,但還未來得及執行的事務應用到自身,和其他沒有跟上的從庫
[info] Getting new master`s binlog name and position..
[info]  3110binlog.000049:536
[info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='MySQL-Cent7-IP001110 or 192.168.1.110', MASTER_PORT=3110, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
[info] Setting read_only=0 on MySQL-Cent7-IP001110(192.168.1.110:3110)..
[info]  ok.   #即將所有其他從庫都用change master的語句切換到新的主,即原來的從110上
[info] 
[info] * Switching slaves in parallel..
[info]        #即將並行切換所有從庫
[info] Unlocking all tables on the orig master: 
[info] Executing UNLOCK TABLES..
[info]  ok.   #釋放原主庫的表鎖
[info] Starting orig master as a new slave..
[info]  Resetting slave MySQL-Cent7-IP001109(192.168.1.109:3109) and starting replication from the new master MySQL-Cent7-IP001110(192.168.1.110:3110)..
[info]  Executed CHANGE MASTER.
[info]  Slave started.
[info] All new slave servers switched successfully.
[info]        
[info] * Phase 5: New master cleanup phase..
[info]        #即將清理新從庫的主從資訊,切換到主庫
[info]  MySQL-Cent7-IP001110: Resetting slave info succeeded.
[info] Switching master to MySQL-Cent7-IP001110(192.168.1.110:3110) completed successfully.
              #切換完成

新slave,3109端顯示如下:

#####################輸出段4###########################
USE ``;
SELECT CONNECTION_ID() AS Value;
SET wait_timeout=86400;
SET GLOBAL read_only=1;                           #將自己設為只讀
SHOW MASTER STATUS;
UNLOCK TABLES;                                    #釋放表鎖
CHANGE MASTER TO MASTER_HOST='192.168.1.110', MASTER_PORT=3110, MASTER_USER='repl', MASTER_PASSWORD='123456', MASTER_AUTO_POSITION=1; #把自己指向MHA選定的新主
START SLAVE;
SHOW SLAVE STATUS;                                #差異處,啟動slave身份
SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value;   #釋放鎖止同應用的併發鎖

新master,3110端顯示如下:

#####################輸出段4###########################
SHOW SLAVE STATUS;
SELECT MASTER_POS_WAIT('3109binlog.000070','536',0) AS Result;      #檢查自己是否執行完差異事務
STOP SLAVE SQL_THREAD;                                              #停止從庫SQL執行緒
SHOW SLAVE STATUS;                                                  
SHOW MASTER STATUS;                                                 
SELECT @@global.read_only As Value;                                 
SELECT @@global.read_only As Value;                                 
SET GLOBAL read_only=0;                                             #取消只讀狀態
USE ``;
SELECT UNIX_TIMESTAMP();                                            
SELECT @@GLOBAL.SERVER_ID;                                          
SET @master_heartbeat_period= 30000001024;                          
SET @master_binlog_checksum= @@global.binlog_checksum;              
SELECT @master_binlog_checksum;                                     
SELECT @@GLOBAL.GTID_MODE;                                          
SELECT @@GLOBAL.SERVER_UUID;                                        
SET @slave_uuid= '28ea40ab-9bbd-11e7-8cd1-000c29c31069';            #寫入從庫的UUID
STOP SLAVE;                                                         #停止自身作為從庫的身份
SHOW SLAVE STATUS;                                              
RESET SLAVE /*!50516 ALL */;                                        #清除自身作為從庫的所有記錄
SHOW SLAVE STATUS;
SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value; #釋放鎖止同應用的併發鎖

第四部分總結:

    1.檢查選定的新主例項relay日誌是否存在未執行完的原主事務,並應用到新主

    原主庫:鎖止,記錄主從資訊,釋放鎖,將自己指向新的主,啟動從庫身份

    新主庫:  自己執行完未來得及執行完的事務後停止自己從庫身份,取消只讀,清除自身作為從的記錄

    最後釋放切換程式的併發鎖

全文總結:

1.讀取配置檔案,確認主從關係與複製方式,根據主從關係複製方式,連線主庫:設定必要引數,獲取的複製詳細資訊;  連線從庫:獲取複製的詳細資訊,獲取relay資訊,獲取repl賬號並確認許可權。

2.確認可以儲存關閉非事務表後,關閉非事務表,加模擬鎖防止切換程式多開造成切換異常,從候選從庫中選出擁有最新資料的從庫,並將其設為要切換到的主庫。

3.檢查選定的從庫的詳細資訊,確認真的是否可以作為新的主庫,處理從庫之間的資料比對,若只有一個主和從,則將原主庫先掛到一個莫須有的空主上;若有兩個以上的從庫,則需清理從庫原來的slave記錄。

4.檢查選定的新主例項relay日誌是否存在未執行完的原主事務,並應用到新主,應用完後清理自己的從庫資訊,取消只讀,轉為主庫;鎖止原主庫,記錄主從資訊,切換到新主庫。