mysql高可用MHA的搭建與測試

mha官網

MHA分管理節點和資料庫節點，資料庫節點由mysql主從或者主主從構成，當主庫掛掉後，管理節點會自動將從節點提升為主節點；管理節點的角色類似於oracle資料庫中的fast start failover中的observer，但mha上層可以透過keepalive部署VIP，程式連線資料庫使用VIP，從而實現後臺資料庫的故障切換透明化

MHA節點包含三個指令碼，依賴perl模組；
save_binary_logs:儲存和複製當掉的主伺服器二進位制日誌；
apply_diff_relay_logs:識別差異的relay log事件，並應用於其他salve伺服器；
purge_relay_logs:清除relay log檔案；
需要在所有mysql伺服器上安裝MHA節點，MHA管理伺服器也需要安裝。MHA管理節點模組內部依賴MHA節點模組；
MHA管理節點透過ssh連線管理mysql伺服器和執行MHA節點指令碼。MHA節點依賴perl的DBD::mysql模組;

本文環境介紹，作業系統均為rhel5.4

在開始之前，請先配置好伺服器間的時間同步和名稱解析

規劃

192.168.186.141 MYSQL.COM	主： node
192.168.186.142 SLAVE1.COM	從1： node
192.168.186.146 SLAVE2.COM	從2： node
192.168.186.144 MANAGER.COM	管理機：manager node

一：搭建mysql 主從複製，配置主機間的ssh互信（免密碼登陸）

注意點：在主上建立複製使用者的時候在從上也建立一遍
搭建半同步複製：原因：使用半同步複製可以解決當主伺服器當掉後，無法ssh到主伺服器上儲存尚未傳送的二進位制日誌事件。

二：安裝node與manager

安裝perl環境（每臺機器都需要做的）

[root@SLAVE2 data]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles ncftp perl-Params-Validate perl-CPAN perl-Test-Mock-LWP.noarch perl-LWP-Authen-Negotiate.noarch perl-devel
[root@SLAVE2 data]#yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker 這個百度文件沒有但是顯然是要裝的

安裝node（每臺機器都要做的）

[root@MANAGER src]# tar -xf mha4mysql-node-0.56.tar.gz
[root@MANAGER src]# cd mha4mysql-node-0.53
[root@MANAGER mha4mysql-node-0.53]# perl Makefile.PL
[root@MANAGER mha4mysql-node-0.53]# make && make install

安裝manager（管理機器上要安裝的，就是管理機器上同時存在node與manager ）

點選(此處)摺疊或開啟

[root@MANAGER src]# tar -xf mha4mysql-manager-0.56.tar.gz
[root@MANAGER src]# cd mha4mysql-manager-0.53
[root@MANAGER mha4mysql-manager-0.53]# perl Makefile.PL
[root@MANAGER mha4mysql-manager-0.53]# make && make install
根據提示輸入如果中間有卡頓現象直接crtl+c 然後繼續會出現下載的進度條，說明是正常的
[root@MANAGER src]# mkdir /etc/masterha
[root@MANAGER mha]# mkdir -p /master/app1
[root@MANAGERmha]# mkdir -p /scripts
[root@MANAGER mha]# cp samples/conf/* /etc/masterha/ --複製配置檔案
[root@MANAGERmha]# cp samples/scripts/* /scripts
[root@MANAGER mha4mysql-manager-0.53]# cp samples/conf/* /etc/masterha/

三：配置

[root@MANAGER masterha]# vi app1.cnf
內容如下;
[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
user=mha_mon 監控使用者，每臺機器上都要建的
password=123
ssh_user=root ssh無密碼的使用者
repl_user=slave 做主從的使用者每臺都要授權的
repl_password=yunwei123 做主從的密碼
ping_interval=1
shutdown_script=""
report_script="" ###切換時發郵件
####master_ip_failover_script=/usr/local/bin/master_ip_failover ###自動切換

####master_ip_online_change_script=/usr/local/bin/master_ip_online_change_script ###手工切換，瞭解這兩個指令碼可參考http://blog.csdn.net/largetalk/article/details/10006899

####secondary_check_script= /usr/local/bin/masterha_secondary_check -s node1 -s node2 --user=root --master_host=node2 --master_ip=192.168.6.115 --master_port=3306
####//一旦MHA到node2(主)的監控之間出現問題，MHA Manager將會嘗試從node1（從，主機名）登入到node2（主，主機名），檢查更詳細
[server1]
hostname=192.168.186.141
master_binlog_dir=/data/mysql/data ####binlog目錄
candidate_master=1
[server2]
hostname=192.168.186.142
master_binlog_dir=/data/mysql/data ####binlog目錄
candidate_master=1 ---備主，當master當掉之後會自動切換這臺為主機
[server3]
hostname=192.168.186.146
master_binlog_dir=/data/mysql/data ####binlog目錄
no_master=1 -----表示永不做主
儲存退出！
[root@MANAGER masterha]# >masterha_default.cnf

登入每臺資料庫
mysql> grant all privileges on *.* to mha_mon@'%' identified by '123'; Query OK, 0 rows affected (1.00 sec)
 
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)

四：測試ssh，與複製

測試ssh

[root@MANAGER masterha]# masterha_check_ssh --global_conf=/etc/masterha/masterha_default.cnf --conf=/etc/masterha/app1.cnf
Wed Jul 9 02:26:57 2014 - [info] Reading default configuratoins from /etc/masterha/masterha_default.cnf..
Wed Jul 9 02:26:57 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Jul 9 02:26:57 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Jul 9 02:26:57 2014 - [info] Starting SSH connection tests..
Wed Jul 9 02:26:58 2014 - [debug]
Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.141(192.168.186.141:22) to root@192.168.186.142(192.168.186.142:22)..
Wed Jul 9 02:26:57 2014 - [debug] ok.
Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.141(192.168.186.141:22) to root@192.168.186.146(192.168.186.146:22)..
Wed Jul 9 02:26:57 2014 - [debug] ok.
Wed Jul 9 02:26:58 2014 - [debug]
Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.142(192.168.186.142:22) to root@192.168.186.141(192.168.186.141:22)..
Wed Jul 9 02:26:57 2014 - [debug] ok.
Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.142(192.168.186.142:22) to root@192.168.186.146(192.168.186.146:22)..
Wed Jul 9 02:26:58 2014 - [debug] ok.
Wed Jul 9 02:26:58 2014 - [debug]
Wed Jul 9 02:26:58 2014 - [debug] Connecting via SSH from root@192.168.186.146(192.168.186.146:22) to root@192.168.186.141(192.168.186.141:22)..
Wed Jul 9 02:26:58 2014 - [debug] ok.
Wed Jul 9 02:26:58 2014 - [debug] Connecting via SSH from root@192.168.186.146(192.168.186.146:22) to root@192.168.186.142(192.168.186.142:22)..
Wed Jul 9 02:26:58 2014 - [debug] ok.
Wed Jul 9 02:26:58 2014 - [info] All SSH connection tests passed successfully.

注意

[root@SLAVE1 ~]# ln -s /usr/local/mysql/bin/* /usr/bin    ---rpm安裝的mysql不需要，原始碼安裝的就這樣

在每臺 MYSQL 伺服器上做這件事情極度重要哦

mysql>set global read_only=1; set global relay_log_purge=0;    --read-only的意義是普通使用者沒有create，update，insert等修改的許可權，而具有super-privs超級許可權（all on *.*）的是不受這個引數約束的！

在備用節點和從節點的/etc/mysql/my.cnf中加入選項：

read_only=1 #這個設定待商榷，備選master設為read only之後，master轉移到備選master後不可寫(有super許可權的使用者還是可寫，切換到新的主庫上後，會在新的主庫上執行 SET GLOBAL read_only =0 ），，這個引數開放設定，根據環境來設定

relay_log_purge=0（這個值主從上都寫上，因為mha主從補齊資料會用到relay_log）

       slave-skip-errors=1396

為什麼要跳過這個錯誤呢因為啊在主裡面刪除使用者的時候從會報錯說沒有這個使用者所以跳過這個錯誤吧

如果資料庫存在空的使用者域名的使用者一定要刪除否則 MHA 連線 MYSQL 會報錯連不上一般只要在從上面刪除如果直接沒刪除也 OK 那就 OK 吧如果報錯登入不了就刪除掉吧或者跳過域名解析，或者你授權的時候記得也授權域名等等方法多種我的方式是跳過域名解析的 skip-name-resolve

mysql> select user,host from mysql.user; +---------+---------------+ | user    | host          |
+---------+---------------+ | root    | 127.0.0.1     |
| mha_mon | 192.168.186.% |
| repl    | 192.168.186.% |
| slave   | 192.168.186.% |
| root    | ::1           |
|         | SLAVE2.COM    |
| root    | SLAVE2.COM    |
| root    | localhost     |
+---------+---------------+ 8 rows in set (0.00 sec)
mysql> drop user 'root'@SLAVE2.COM; Query OK, 0 rows affected (0.00 sec)

測試複製程式

點選(此處)摺疊或開啟

[root@MANAGER masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Wed Jul 9 04:23:16 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jul 9 04:23:16 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Jul 9 04:23:16 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Jul 9 04:23:16 2014 - [info] MHA::MasterMonitor version 0.53.
Wed Jul 9 04:23:17 2014 - [info] Dead Servers:
Wed Jul 9 04:23:17 2014 - [info] Alive Servers:
Wed Jul 9 04:23:17 2014 - [info] 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 04:23:17 2014 - [info] 192.168.186.142(192.168.186.142:3306)
Wed Jul 9 04:23:17 2014 - [info] SLAVE2.COM(192.168.186.146:3306)
Wed Jul 9 04:23:17 2014 - [info] Alive Slaves:
Wed Jul 9 04:23:17 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 04:23:17 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 04:23:17 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jul 9 04:23:17 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 04:23:17 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 04:23:17 2014 - [info] Not candidate for the new Master (no_master is set)
Wed Jul 9 04:23:17 2014 - [info] Current Alive Master: 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 04:23:17 2014 - [info] Checking slave configurations..
Wed Jul 9 04:23:17 2014 - [info] Checking replication filtering settings..
Wed Jul 9 04:23:17 2014 - [info] binlog_do_db= , binlog_ignore_db=
Wed Jul 9 04:23:17 2014 - [info] Replication filtering check ok.
Wed Jul 9 04:23:17 2014 - [info] Starting SSH connection tests..
Wed Jul 9 04:23:18 2014 - [info] All SSH connection tests passed successfully.
Wed Jul 9 04:23:18 2014 - [info] Checking MHA Node version..
Wed Jul 9 04:23:19 2014 - [info] Version check ok.
Wed Jul 9 04:23:19 2014 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jul 9 04:23:19 2014 - [info] HealthCheck: SSH to 192.168.186.141 is reachable.
Wed Jul 9 04:23:19 2014 - [info] Master MHA Node version is 0.53.
Wed Jul 9 04:23:19 2014 - [info] Checking recovery script configurations on the current master..
Wed Jul 9 04:23:19 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000001
Wed Jul 9 04:23:19 2014 - [info] Connecting to root@192.168.186.141(192.168.186.141)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql/data, up to mysql-bin.000001
Wed Jul 9 04:23:20 2014 - [info] Master setting check done.
Wed Jul 9 04:23:20 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jul 9 04:23:20 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=mha_mon --slave_host=192.168.186.142 --slave_ip=192.168.186.142 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.10-log --manager_version=0.53 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Wed Jul 9 04:23:20 2014 - [info] Connecting to root@192.168.186.142(192.168.186.142:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to SLAVE1-relay-bin.000002
Temporary relay log file is /data/mysql/data/SLAVE1-relay-bin.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jul 9 04:23:20 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=mha_mon --slave_host=SLAVE2.COM --slave_ip=192.168.186.146 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.10-log --manager_version=0.53 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Wed Jul 9 04:23:20 2014 - [info] Connecting to root@192.168.186.146(SLAVE2.COM:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to slave2-relay-bin.000002
Temporary relay log file is /data/mysql/data/slave2-relay-bin.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jul 9 04:23:21 2014 - [info] Slaves settings check done.
Wed Jul 9 04:23:21 2014 - [info]
192.168.186.141 (current master)
+--192.168.186.142
+--SLAVE2.COM
Wed Jul 9 04:23:21 2014 - [info] Checking replication health on 192.168.186.142..
Wed Jul 9 04:23:21 2014 - [info] ok.
Wed Jul 9 04:23:21 2014 - [info] Checking replication health on SLAVE2.COM..
Wed Jul 9 04:23:21 2014 - [info] ok.
Wed Jul 9 04:23:21 2014 - [warning] master_ip_failover_script is not defined.
Wed Jul 9 04:23:21 2014 - [warning] shutdown_script is not defined.
Wed Jul 9 04:23:21 2014 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
至此說明你的MHA 已經配置好了

五：啟動/關閉mha

[root@dg55 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log < /dev/null 2>&1 &
[1] 25516
[root@dg55 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf --檢查一下
app1 (pid:25516) is running(0:PING_OK), master:192.168.123.13
[root@node4 app1]#masterha_stop --conf=/etc/app1.cnf --關閉
# 如果不能停止，加 --abort選項

六測試故障轉移

將 MYSQL.COM 機器上的 MYSQL 服務關閉，注意觀察 manager.log 日誌會發現切換到了 SLAVE1.COM 並且 SLAVE1.COM 變成了主而 SLAVE2.COM 則變成了 SLAVE1.COM 的從

點選(此處)摺疊或開啟

root@MANAGER app1]# tail -f manager.log 這是啟動後還沒關閉主資料庫的日誌內容
192.168.186.141 (current master)
+--192.168.186.142
+--SLAVE2.COM
Wed Jul 9 18:52:32 2014 - [warning] master_ip_failover_script is not defined.
Wed Jul 9 18:52:32 2014 - [warning] shutdown_script is not defined.
Wed Jul 9 18:52:32 2014 - [info] Set master ping interval 1 seconds.
Wed Jul 9 18:52:32 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Wed Jul 9 18:52:32 2014 - [info] Starting ping health check on 192.168.186.141(192.168.186.141:3306)..
Wed Jul 9 18:52:32 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
[root@MYSQL ~]# service mysqld stop
Shutting down MySQL..... SUCCESS!
[root@MANAGER app1]# tail -f manager.log 最要看最後幾行就知道有沒有切換成功
192.168.186.141 (current master)
+--192.168.186.142
+--SLAVE2.COM
Wed Jul 9 18:56:47 2014 - [info] Dead Servers:
Wed Jul 9 18:56:47 2014 - [info] 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Alive Servers:
Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306)
Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306)
Wed Jul 9 18:56:47 2014 - [info] Alive Slaves:
Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
Wed Jul 9 18:56:47 2014 - [info] Checking slave configurations..
Wed Jul 9 18:56:47 2014 - [info] Checking replication filtering settings..
Wed Jul 9 18:56:47 2014 - [info] Replication filtering check ok.
Wed Jul 9 18:56:47 2014 - [info] Master is down!
Wed Jul 9 18:56:47 2014 - [info] Terminating monitoring script.
Wed Jul 9 18:56:47 2014 - [info] Got exit code 20 (Master dead).
Wed Jul 9 18:56:47 2014 - [info] MHA::MasterFailover version 0.53.
Wed Jul 9 18:56:47 2014 - [info] Starting master failover.
Wed Jul 9 18:56:47 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
Wed Jul 9 18:56:47 2014 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:214
Wed Jul 9 18:56:47 2014 - [info] Oldest slaves:
Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
Wed Jul 9 18:56:47 2014 - [info]
Wed Jul 9 18:56:47 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Jul 9 18:56:47 2014 - [info]
Wed Jul 9 18:56:48 2014 - [info] Fetching dead master's binary logs..
Wed Jul 9 18:56:48 2014 - [info] Executing command on the dead master 192.168.186.141(192.168.186.141:3306): save_binary_logs --command=save --start_file=mysql-bin.000001 --start_pos=214 --binlog_dir=/data/mysql/data --output_file=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
Creating /var/tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin.000001 pos 214 to mysql-bin.000001 EOF into /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog ..
Dumping binlog format description event, from position 0 to 120.. ok.
Dumping effective binlog data from /data/mysql/data/mysql-bin.000001 position 214 to tail(237).. ok.
Concat succeeded.
Wed Jul 9 18:56:48 2014 - [info] scp from root@192.168.186.141:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
Wed Jul 9 18:56:49 2014 - [info] HealthCheck: SSH to 192.168.186.142 is reachable.
Wed Jul 9 18:56:49 2014 - [info] HealthCheck: SSH to SLAVE2.COM is reachable.
Wed Jul 9 18:56:49 2014 - [info]
Wed Jul 9 18:56:49 2014 - [info] * Phase 3.3: Determining New Master Phase..
Wed Jul 9 18:56:49 2014 - [info]
Wed Jul 9 18:56:49 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Jul 9 18:56:49 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Jul 9 18:56:49 2014 - [info] Searching new master from slaves..
Wed Jul 9 18:56:49 2014 - [info] Candidate masters from the configuration file:
Wed Jul 9 18:56:49 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:49 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:49 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jul 9 18:56:49 2014 - [info] Non-candidate masters:
Wed Jul 9 18:56:49 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:49 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:49 2014 - [info] Not candidate for the new Master (no_master is set)
Wed Jul 9 18:56:49 2014 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Wed Jul 9 18:56:49 2014 - [info] New master is 192.168.186.142(192.168.186.142:3306)
Wed Jul 9 18:56:49 2014 - [info] Starting master failover..
Wed Jul 9 18:56:49 2014 - [info]
From:
192.168.186.141 (current master)
+--192.168.186.142
+--SLAVE2.COM
To:
192.168.186.142 (new master)
+--SLAVE2.COM
Wed Jul 9 18:56:49 2014 - [info]
Wed Jul 9 18:56:49 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Jul 9 18:56:49 2014 - [info]
Wed Jul 9 18:56:49 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jul 9 18:56:49 2014 - [info] Sending binlog..
Wed Jul 9 18:56:50 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to root@192.168.186.142:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Jul 9 18:56:50 2014 - [info] Starting recovery on 192.168.186.142(192.168.186.142:3306)..
Wed Jul 9 18:56:50 2014 - [info] Generating diffs succeeded.
Wed Jul 9 18:56:50 2014 - [info] Waiting until all relay logs are applied.
Wed Jul 9 18:56:50 2014 - [info] done.
Wed Jul 9 18:56:50 2014 - [info] Getting slave status..
Wed Jul 9 18:56:50 2014 - [info] This slave(192.168.186.142)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:214). No need to recover from Exec_Master_Log_Pos.
Wed Jul 9 18:56:50 2014 - [info] Connecting to the target slave host 192.168.186.142, running recover script..
Wed Jul 9 18:56:50 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=mha_mon --slave_host=192.168.186.142 --slave_ip=192.168.186.142 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --workdir=/var/tmp --target_version=5.6.10-log --timestamp=20140709185647 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Wed Jul 9 18:56:50 2014 - [info]
MySQL client version is 5.6.10. Using --binary-mode.
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog on 192.168.186.142:3306. This may take long time...
Applying log files succeeded.
Wed Jul 9 18:56:50 2014 - [info] All relay logs were successfully applied.
Wed Jul 9 18:56:50 2014 - [info] Getting new master's binlog name and position..
Wed Jul 9 18:56:50 2014 - [info] mysql-bin.000007:504
Wed Jul 9 18:56:50 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.186.142', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=504, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Wed Jul 9 18:56:50 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Wed Jul 9 18:56:50 2014 - [info] Setting read_only=0 on 192.168.186.142(192.168.186.142:3306)..
Wed Jul 9 18:56:50 2014 - [info] ok.
Wed Jul 9 18:56:50 2014 - [info] ** Finished master recovery successfully.
Wed Jul 9 18:56:50 2014 - [info] * Phase 3: Master Recovery Phase completed.
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] * Phase 4: Slaves Recovery Phase..
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] -- Slave diff file generation on host SLAVE2.COM(192.168.186.146:3306) started, pid: 3135. Check tmp log /masterha/app1/SLAVE2.COM_3306_20140709185647.log if it takes time..
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] Log messages from SLAVE2.COM ...
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jul 9 18:56:50 2014 - [info] End of log messages from SLAVE2.COM.
Wed Jul 9 18:56:50 2014 - [info] -- SLAVE2.COM(192.168.186.146:3306) has the latest relay log events.
Wed Jul 9 18:56:50 2014 - [info] Generating relay diff files from the latest slave succeeded.
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] -- Slave recovery on host SLAVE2.COM(192.168.186.146:3306) started, pid: 3137. Check tmp log /masterha/app1/SLAVE2.COM_3306_20140709185647.log if it takes time..
Wed Jul 9 18:56:51 2014 - [info]
Wed Jul 9 18:56:51 2014 - [info] Log messages from SLAVE2.COM ...
Wed Jul 9 18:56:51 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] Sending binlog..
Wed Jul 9 18:56:51 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to root@SLAVE2.COM:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
Wed Jul 9 18:56:51 2014 - [info] Starting recovery on SLAVE2.COM(192.168.186.146:3306)..
Wed Jul 9 18:56:51 2014 - [info] Generating diffs succeeded.
Wed Jul 9 18:56:51 2014 - [info] Waiting until all relay logs are applied.
Wed Jul 9 18:56:51 2014 - [info] done.
Wed Jul 9 18:56:51 2014 - [info] Getting slave status..
Wed Jul 9 18:56:51 2014 - [info] This slave(SLAVE2.COM)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:214). No need to recover from Exec_Master_Log_Pos.
Wed Jul 9 18:56:51 2014 - [info] Connecting to the target slave host SLAVE2.COM, running recover script..
Wed Jul 9 18:56:51 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=mha_mon --slave_host=SLAVE2.COM --slave_ip=192.168.186.146 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --workdir=/var/tmp --target_version=5.6.10-log --timestamp=20140709185647 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Wed Jul 9 18:56:51 2014 - [info]
MySQL client version is 5.6.10. Using --binary-mode.
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog on SLAVE2.COM:3306. This may take long time...
Applying log files succeeded.
Wed Jul 9 18:56:51 2014 - [info] All relay logs were successfully applied.
Wed Jul 9 18:56:51 2014 - [info] Resetting slave SLAVE2.COM(192.168.186.146:3306) and starting replication from the new master 192.168.186.142(192.168.186.142:3306)..
Wed Jul 9 18:56:51 2014 - [info] Executed CHANGE MASTER.
Wed Jul 9 18:56:51 2014 - [info] Slave started.
Wed Jul 9 18:56:51 2014 - [info] End of log messages from SLAVE2.COM.
Wed Jul 9 18:56:51 2014 - [info] -- Slave recovery on host SLAVE2.COM(192.168.186.146:3306) succeeded.
Wed Jul 9 18:56:51 2014 - [info] All new slave servers recovered successfully.
Wed Jul 9 18:56:51 2014 - [info]
Wed Jul 9 18:56:51 2014 - [info] * Phase 5: New master cleanup phease..
Wed Jul 9 18:56:51 2014 - [info]
Wed Jul 9 18:56:51 2014 - [info] Resetting slave info on the new master..
Wed Jul 9 18:56:51 2014 - [info] 192.168.186.142: Resetting slave info succeeded.
Wed Jul 9 18:56:51 2014 - [info] Master failover to 192.168.186.142(192.168.186.142:3306) completed successfully.
Wed Jul 9 18:56:51 2014 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.186.141 to 192.168.186.142 succeeded
Master 192.168.186.141 is
Check MHA Manager logs at MANAGER.COM:/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.186.142(192.168.186.142:3306) has all relay logs for recovery.
Selected 192.168.186.142 as a new master.
192.168.186.142: OK: Applying all logs succeeded.
SLAVE2.COM: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
SLAVE2.COM: OK: Applying all logs succeeded. Slave started, replicating from 192.168.186.142.
192.168.186.142: Resetting slave info succeeded.
192.168.186.142: Resetting slave info succeeded.
看到192.168.186.142: Resetting slave info succeeded.
Master failover to 192.168.186.142(192.168.186.142:3306) completed successfully. 說明切換成功了
[root@slave2 ~]# mysql -e "show slave status\G"
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.186.142
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000007
Read_Master_Log_Pos: 504
Relay_Log_File: slave2-relay-bin.000002
Relay_Log_Pos: 283
Relay_Master_Log_File: mysql-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
看到已經切換到和142同步了本來是和141同步的此時的SALVE1.COM已經變成主了說明已然生效

切換後舊master的修復及重新上線
master已經由192.168.17.199 3306 切到了192.168.17.200 3306 實際環境中資料是在不斷的變化的，而在切換點mha沒有記錄當時新master的log-file和log-pos 所以要想直接啟動192.168.17.199 3306 然後change master to 192.168.17.200 3306的話是不行的，只能對新主或slave2做一個全備然後再恢復再change。
另外，當執行切換後管理節點上的masterha_manager程式會自動stop，所以等修復好後要再次執行啟動----不要糾結ip不對，這個地方是個思路

重構

重構我想就不要我做了吧，就是這時候等於你的主掛了切換在 SLAVE1.COM 上變成了主因此重構我提供一種方案（方案多種），拿出一臺新的伺服器重新加入以 142為主做成從再 app1.conf 配置檔案中加入新機器的資訊

[root@MANAGER app1]# rm -rf app1.failover.complete

刪除該檔案後再次啟動 manager 端即可

七．擴充套件 Keepalived

1master與slave端都安裝keepalive

[root@MYSQL src]# wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz
[root@MYSQL src]# tar -xf keepalived-1.2.12.tar.gz
[root@MYSQL src]# cd keepalived-1.2.12
[root@MYSQL src]# yum -y install gcc gcc-c++ gcc-g77 ncurses-devel bison libaio-devel cmake libnl* libpopt* popt-static openssl-devel
[root@MYSQL keepalived-1.2.12]# ./configure
[root@MYSQL keepalived-1.2.12]# make && make install
[root@MYSQL src]#mkdir /etc/keepalived/
[root@MYSQL src]# cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/
[root@MYSQL src]# cp /usr/local/etc/rc.d/init.d/keepalived /etc/init.d/
[root@MYSQL src]# cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/
[root@MYSQL src]#cp /usr/local/sbin/keepalived /usr/sbin/

2. 配置 keepalived

[root@MYSQL keepalived]# vi keepalived.conf

! Configuration File for keepalived

global_defs {

   notification_email {

     acassen@firewall.loc

     failover@firewall.loc

     sysadmin@firewall.loc

   }

   notification_email_from Alexandre.Cassen@firewall.loc

   smtp_server 192.168.200.1

   smtp_connect_timeout 30

   router_id LVS_DEVEL ##配置是為了標識當前節點，兩個節點的此項設定可相同，也可不相同

}

vrrp_instance VI_1 {

    state MASTER #指定A節點為主節點備用節點上設定為BACKUP即可

    interface eth0

    virtual_router_id 51 #VRRP組名，兩個節點的設定必須一樣，以指明各個節點屬於同一VRRP

    priority 100    #主節點的優先順序（1-254之間），備用節點必須比主節點優先順序低

         advert_int 1

    authentication {     #設定驗證資訊，兩個節點必須一致

        auth_type PASS

        auth_pass 1111

    }

virtual_ipaddress {

192.168.200.16 =====vip，指定一個vip

    }

}

3. 檢測 mysql 服務指令碼

[root@MYSQL keepalived]# vi /root/check_mysql.sh

[root@node1 ~]# vi checkmysql.sh
#!/bin/bash
MYSQL=/usr/bin/mysql
MYSQL_HOST=127.0.0.1
MYSQL_USER=root
MYSQL_PASSWORD=ESBecs00
CHECK_TIME=3
MYSQL_OK=1
function check_mysql_helth (){
$MYSQL -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD -e "show status;" >/dev/null 2>&1
if [ $? = 0 ] ;then
MYSQL_OK=1
else
MYSQL_OK=0
fi
return $MYSQL_OK
}
#echo "$MYSQL_OK $CHECK_TIME"

while [ $CHECK_TIME -ne 0 ]
do
let "CHECK_TIME -= 1"
check_mysql_helth
if [ $MYSQL_OK = 1 ] ; then
echo "#### Mysql is alive ####"
CHECK_TIME=0
exit 0
fi
if [ $MYSQL_OK -eq 0 ] && [ $CHECK_TIME -eq 0 ]
then
#pkill keepalived
echo "#### Mysql Down ####"
echo "####now kill keepalive#####"
/etc/init.d/keepalived stop
exit 1
fi
sleep 1
done

該指令碼做一個計劃任務每分鐘做一次檢查 MYSQL 服務如果掛掉了的話 VIP 就會在SALVE1.COM 上生效這樣的話 client 端連線的 VIP 就會從 MYSQL.COM 上變到了SLAVE1.COM 上此時 SLAVE1.COM 由於 MHA 的生效已經變為主對外提供服務了 VIP 也在 SALVE1.COM 上所以從頭到尾客戶端只需要連線 VIP 就可以了

 [root@MYSQL ~]# crontab  -l */1 * * * * bash /root/checkmysql.sh

以上步驟再在下一次切換的 SLAVE 上做一次也就是 SLAVE1.COM 上做一次 keepalived.conf 配置檔案不同的地方已做註釋

4. 啟動測試

兩邊啟動 keepalived

[root@MYSQL keepalived]# keepalived -f /etc/keepalived/keepalived.conf

 [root@MYSQL keepalived]# ps -ef |grep keep

root 3230 1 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf
root 3231 3230 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf
root 3232 3230 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf
root 3234 2538 0 23:27 pts/0 00:00:00 grep keep
 
[root@MYSQL keepalived]# ip a 1: lo:  mtu 16436 qdisc noqueue state UNKNOWN  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00  inet 127.0.0.1/8 scope host lo  inet6 ::1/128 scope host   valid_lft forever preferred_lft forever 2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000  link/ether 00:0c:29:c9:85:ba brd ff:ff:ff:ff:ff:ff  inet 192.168.186.141/24 brd 192.168.186.255 scope global eth0  inet 192.168.200.16/32 scope global eth0  inet6 fe80::20c:29ff:fec9:85ba/64 scope link   valid_lft forever preferred_lft forever

測試關閉該主機的 MYSQL 服務看看 VIP 是否飄到配好 keepalive 的 SLAVE1.COM上 MHA 中的 MYSQL 主是不是也變成了 SLAVE1.COM

如是至此全部完成。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。謝謝！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！

一點注意事項：MHA時基於傳統的log + postion開發的，維護GTID肯定不靈光了，但是mha0.56已經開始支援gtid了，可以去作者的部落格看到

原文地址：
參考文章：http://ylw6006.blog.51cto.com/470441/890360/

附上node與manager的下載地址

【MHA】mysql高可用之MHA

mysql高可用MHA的搭建與測試

一：搭建mysql 主從複製，配置主機間的ssh互信（免密碼登陸）

二：安裝node與manager

三：配置

四：測試ssh，與複製

五：啟動/關閉mha

六測試故障轉移

重構

七．擴充套件 Keepalived

4. 啟動測試

相關文章