【MHA】mysql高可用之MHA

小亮520cl發表於2015-10-29

mysql高可用MHA的搭建與測試

mha官網

MHA分管理節點和資料庫節點,資料庫節點由mysql主從或者主主從構成,當主庫掛掉後,管理節點會自動將從節點提升為主節點;管理節點的角色類似於oracle資料庫中的fast start failover中的observer,但mha上層可以透過keepalive部署VIP,程式連線資料庫使用VIP,從而實現後臺資料庫的故障切換透明化

MHA節點包含三個指令碼,依賴perl模組;
save_binary_logs:儲存和複製當掉的主伺服器二進位制日誌;
apply_diff_relay_logs:識別差異的relay log事件,並應用於其他salve伺服器;
purge_relay_logs:清除relay log檔案;
需要在所有mysql伺服器上安裝MHA節點,MHA管理伺服器也需要安裝。MHA管理節點模組內部依賴MHA節點模組;
MHA管理節點透過ssh連線管理mysql伺服器和執行MHA節點指令碼。MHA節點依賴perl的DBD::mysql模組;

本文環境介紹,作業系統均為rhel5.4

在開始之前,請先配置好伺服器間的時間同步和名稱解析



規劃

192.168.186.141   MYSQL.COM

主:      node
192.168.186.142   SLAVE1.COM
從1:    node
192.168.186.146   SLAVE2.COM
從2:    node
192.168.186.144   MANAGER.COM 管理機:manager node

一:搭建mysql 主從複製,配置主機間的ssh互信(免密碼登陸)

注意點:在主上建立複製使用者的時候在從上也建立一遍
搭建半同步複製:原因:使用半同步複製可以解決當主伺服器當掉後,無法ssh到主伺服器上儲存尚未傳送的二進位制日誌事件。


二:安裝node與manager

安裝perl環境(每臺機器都需要做的)
  1. [root@SLAVE2 data]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles ncftp perl-Params-Validate perl-CPAN perl-Test-Mock-LWP.noarch perl-LWP-Authen-Negotiate.noarch perl-devel
  2.  [root@SLAVE2 data]#yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker 這個百度文件沒有但是顯然是要裝的

安裝node(每臺機器都要做的)
  1. [root@MANAGER src]# tar -xf mha4mysql-node-0.56.tar.gz
  2. [root@MANAGER src]# cd mha4mysql-node-0.53
  3. [root@MANAGER mha4mysql-node-0.53]# perl Makefile.PL
  4. [root@MANAGER mha4mysql-node-0.53]# make && make install

安裝manager(管理機器上要安裝的,就是管理機器上同時存在node與manager )

點選(此處)摺疊或開啟

  1. [root@MANAGER src]# tar -xf mha4mysql-manager-0.56.tar.gz
  2. [root@MANAGER src]# cd mha4mysql-manager-0.53
  3. [root@MANAGER mha4mysql-manager-0.53]# perl Makefile.PL
  4. [root@MANAGER mha4mysql-manager-0.53]# make && make install
  5. 根據提示輸入 如果中間有卡頓現象 直接crtl+c 然後繼續會出現下載的進度條,說明是正常的
  6. [root@MANAGER src]# mkdir /etc/masterha
  7. [root@MANAGER mha]# mkdir -p /master/app1
  8. [root@MANAGERmha]# mkdir -p /scripts
  9. [root@MANAGER mha]# cp samples/conf/* /etc/masterha/    --複製配置檔案
  10. [root@MANAGERmha]# cp samples/scripts/* /scripts
  11. [root@MANAGER mha4mysql-manager-0.53]# cp samples/conf/* /etc/masterha/

三:配置

  1. [root@MANAGER masterha]# vi app1.cnf
  2. 內容如下;
  3. [server default]
  4. manager_workdir=/masterha/app1
  5. manager_log=/masterha/app1/manager.log
  6. user=mha_mon            監控使用者,每臺機器上都要建的
  7. password=123
  8. ssh_user=root           ssh無密碼的使用者
  9. repl_user=slave         做主從的使用者 每臺都要授權的
  10. repl_password=yunwei123 做主從的密碼
  11. ping_interval=1
  12. shutdown_script=""
  13. report_script=""                                                                           ###切換時發郵件
  14.  
    ####master_ip_failover_script=/usr/local/bin/master_ip_failover                              ###自動切換

    ####master_ip_online_change_script=/usr/local/bin/master_ip_online_change_script           ###手工切換,瞭解這兩個指令碼可參考http://blog.csdn.net/largetalk/article/details/10006899

    ####secondary_check_script= /usr/local/bin/masterha_secondary_check -s node1 -s node2 --user=root  --master_host=node2 --master_ip=192.168.6.115 --master_port=3306 
    ####//一旦MHA到node2(主)的監控之間出現問題,MHA Manager將會嘗試從node1(從,主機名)登入到node2(主,主機名),檢查更詳細



  15. [server1]
  16. hostname=192.168.186.141
  17. master_binlog_dir=/data/mysql/data            ####binlog目錄
  18. candidate_master=1
  19.  
  20. [server2]
  21. hostname=192.168.186.142
  22. master_binlog_dir=/data/mysql/data   ####binlog目錄
  23. candidate_master=1    ---備主,當master當掉之後會自動切換這臺為主機
  24.  
  25. [server3]
  26. hostname=192.168.186.146
  27. master_binlog_dir=/data/mysql/data         ####binlog目錄
  28. no_master=1           -----表示永不做主
  29.  
  30. 儲存退出!
  31. [root@MANAGER masterha]# >masterha_default.cnf

登入每臺資料庫
mysql> grant all privileges on *.* to mha_mon@'%' identified by '123'; Query OK, 0 rows affected (1.00 sec)
 
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)


四:測試ssh,與複製

測試ssh

  1. [root@MANAGER masterha]# masterha_check_ssh --global_conf=/etc/masterha/masterha_default.cnf --conf=/etc/masterha/app1.cnf
  2. Wed Jul 9 02:26:57 2014 - [info] Reading default configuratoins from /etc/masterha/masterha_default.cnf..
  3. Wed Jul 9 02:26:57 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
  4. Wed Jul 9 02:26:57 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
  5. Wed Jul 9 02:26:57 2014 - [info] Starting SSH connection tests..
  6. Wed Jul 9 02:26:58 2014 - [debug]
  7. Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.141(192.168.186.141:22) to root@192.168.186.142(192.168.186.142:22)..
  8. Wed Jul 9 02:26:57 2014 - [debug] ok.
  9. Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.141(192.168.186.141:22) to root@192.168.186.146(192.168.186.146:22)..
  10. Wed Jul 9 02:26:57 2014 - [debug] ok.
  11. Wed Jul 9 02:26:58 2014 - [debug]
  12. Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.142(192.168.186.142:22) to root@192.168.186.141(192.168.186.141:22)..
  13. Wed Jul 9 02:26:57 2014 - [debug] ok.
  14. Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.142(192.168.186.142:22) to root@192.168.186.146(192.168.186.146:22)..

  15. Wed Jul 9 02:26:58 2014 - [debug] ok.
  16. Wed Jul 9 02:26:58 2014 - [debug]
  17. Wed Jul 9 02:26:58 2014 - [debug] Connecting via SSH from root@192.168.186.146(192.168.186.146:22) to root@192.168.186.141(192.168.186.141:22)..
  18. Wed Jul 9 02:26:58 2014 - [debug] ok.
  19. Wed Jul 9 02:26:58 2014 - [debug] Connecting via SSH from root@192.168.186.146(192.168.186.146:22) to root@192.168.186.142(192.168.186.142:22)..
  20. Wed Jul 9 02:26:58 2014 - [debug] ok.
  21. Wed Jul 9 02:26:58 2014 - [info] All SSH connection tests passed successfully.

注意

[root@SLAVE1 ~]# ln -s /usr/local/mysql/bin/* /usr/bin    ---rpm安裝的mysql不需要,原始碼安裝的就這樣

在每臺 MYSQL  伺服器上做這件事情 極度重要哦

mysql>set global read_only=1; set global relay_log_purge=0;    --read-only的意義是普通使用者沒有create,update,insert等修改的許可權,而具有super-privs超級許可權(all on *.*)的是不受這個引數約束的!


在備用節點和從節點的/etc/mysql/my.cnf中加入選項:

read_only=1 #這個設定待商榷,備選master設為read only之後,master轉移到備選master後不可寫(有super許可權的使用者還是可寫,切換到新的主庫上後,會在新的主庫上執行 SET GLOBAL read_only =0 ),,這個引數開放設定,根據環境來設定

relay_log_purge=0(這個值主從上都寫上,因為mha主從補齊資料會用到relay_log)


       slave-skip-errors=1396

為什麼要跳過這個錯誤呢 因為啊在主裡面刪除使用者的時候 從會報錯說沒有這個使用者所以跳過這個錯誤吧

如果資料庫存在空的使用者 域名的使用者 一定要刪除否則 MHA   連線 MYSQL  會報錯連不上 一般只要在從上面刪除 如果直接沒刪除也 OK  那就 OK  吧如果報錯登入不了就刪除掉吧 或者跳過域名解析,或者你授權的時候記得也授權域名等等方法多種 我的方式是跳過域名解析的  skip-name-resolve

mysql> select user,host from mysql.user; +---------+---------------+ | user    | host          |
+---------+---------------+ | root    | 127.0.0.1     |
| mha_mon | 192.168.186.% |
| repl    | 192.168.186.% |
| slave   | 192.168.186.% |
| root    | ::1           |
|         | SLAVE2.COM    |
| root    | SLAVE2.COM    |
| root    | localhost     |
+---------+---------------+ 8 rows in set (0.00 sec)
mysql> drop user 'root'@SLAVE2.COM; Query OK, 0 rows affected (0.00 sec)

測試複製程式

點選(此處)摺疊或開啟

  1. [root@MANAGER masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
  2. Wed Jul 9 04:23:16 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
  3. Wed Jul 9 04:23:16 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
  4. Wed Jul 9 04:23:16 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
  5. Wed Jul 9 04:23:16 2014 - [info] MHA::MasterMonitor version 0.53.
  6. Wed Jul 9 04:23:17 2014 - [info] Dead Servers:
  7. Wed Jul 9 04:23:17 2014 - [info] Alive Servers:
  8. Wed Jul 9 04:23:17 2014 - [info] 192.168.186.141(192.168.186.141:3306)
  9. Wed Jul 9 04:23:17 2014 - [info] 192.168.186.142(192.168.186.142:3306)
  10. Wed Jul 9 04:23:17 2014 - [info] SLAVE2.COM(192.168.186.146:3306)
  11. Wed Jul 9 04:23:17 2014 - [info] Alive Slaves:
  12. Wed Jul 9 04:23:17 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
  13. Wed Jul 9 04:23:17 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
  14. Wed Jul 9 04:23:17 2014 - [info] Primary candidate for the new Master (candidate_master is set)
  15. Wed Jul 9 04:23:17 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
  16. Wed Jul 9 04:23:17 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
  17. Wed Jul 9 04:23:17 2014 - [info] Not candidate for the new Master (no_master is set)
  18. Wed Jul 9 04:23:17 2014 - [info] Current Alive Master: 192.168.186.141(192.168.186.141:3306)
  19. Wed Jul 9 04:23:17 2014 - [info] Checking slave configurations..
  20. Wed Jul 9 04:23:17 2014 - [info] Checking replication filtering settings..
  21. Wed Jul 9 04:23:17 2014 - [info] binlog_do_db= , binlog_ignore_db=
  22. Wed Jul 9 04:23:17 2014 - [info] Replication filtering check ok.
  23. Wed Jul 9 04:23:17 2014 - [info] Starting SSH connection tests..
  24. Wed Jul 9 04:23:18 2014 - [info] All SSH connection tests passed successfully.
  25. Wed Jul 9 04:23:18 2014 - [info] Checking MHA Node version..
  26. Wed Jul 9 04:23:19 2014 - [info] Version check ok.
  27. Wed Jul 9 04:23:19 2014 - [info] Checking SSH publickey authentication settings on the current master..
  28. Wed Jul 9 04:23:19 2014 - [info] HealthCheck: SSH to 192.168.186.141 is reachable.
  29. Wed Jul 9 04:23:19 2014 - [info] Master MHA Node version is 0.53.
  30. Wed Jul 9 04:23:19 2014 - [info] Checking recovery script configurations on the current master..
  31. Wed Jul 9 04:23:19 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000001
  32. Wed Jul 9 04:23:19 2014 - [info] Connecting to root@192.168.186.141(192.168.186.141)..
  33.   Creating /var/tmp if not exists.. ok.
  34.   Checking output directory is accessible or not..
  35.    ok.
  36.   Binlog found at /data/mysql/data, up to mysql-bin.000001
  37. Wed Jul 9 04:23:20 2014 - [info] Master setting check done.
  38. Wed Jul 9 04:23:20 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
  39. Wed Jul 9 04:23:20 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=mha_mon --slave_host=192.168.186.142 --slave_ip=192.168.186.142 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.10-log --manager_version=0.53 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
  40. Wed Jul 9 04:23:20 2014 - [info] Connecting to root@192.168.186.142(192.168.186.142:22)..
  41.   Checking slave recovery environment settings..
  42.     Opening /data/mysql/data/relay-log.info ... ok.
  43.     Relay log found at /data/mysql/data, up to SLAVE1-relay-bin.000002
  44.     Temporary relay log file is /data/mysql/data/SLAVE1-relay-bin.000002
  45.     Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
  46.  done.
  47.     Testing mysqlbinlog output.. done.
  48.     Cleaning up test file(s).. done.
  49. Wed Jul 9 04:23:20 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=mha_mon --slave_host=SLAVE2.COM --slave_ip=192.168.186.146 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.10-log --manager_version=0.53 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
  50. Wed Jul 9 04:23:20 2014 - [info] Connecting to root@192.168.186.146(SLAVE2.COM:22)..
  51.   Checking slave recovery environment settings..
  52.     Opening /data/mysql/data/relay-log.info ... ok.
  53.     Relay log found at /data/mysql/data, up to slave2-relay-bin.000002
  54.     Temporary relay log file is /data/mysql/data/slave2-relay-bin.000002
  55.     Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
  56.  done.
  57.     Testing mysqlbinlog output.. done.
  58.     Cleaning up test file(s).. done.
  59. Wed Jul 9 04:23:21 2014 - [info] Slaves settings check done.
  60. Wed Jul 9 04:23:21 2014 - [info]
  61. 192.168.186.141 (current master)
  62.  +--192.168.186.142
  63.  +--SLAVE2.COM
  64.  
  65. Wed Jul 9 04:23:21 2014 - [info] Checking replication health on 192.168.186.142..
  66. Wed Jul 9 04:23:21 2014 - [info] ok.
  67. Wed Jul 9 04:23:21 2014 - [info] Checking replication health on SLAVE2.COM..
  68. Wed Jul 9 04:23:21 2014 - [info] ok.
  69. Wed Jul 9 04:23:21 2014 - [warning] master_ip_failover_script is not defined.
  70. Wed Jul 9 04:23:21 2014 - [warning] shutdown_script is not defined.
  71. Wed Jul 9 04:23:21 2014 - [info] Got exit code 0 (Not master dead).
  72.  
  73. MySQL Replication Health is OK.
  74.  
  75. 至此說明你的MHA 已經配置好了

五:啟動/關閉mha

  1. [root@dg55 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log < /dev/null 2>&1 &
  2. [1] 25516
  3. [root@dg55 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf     --檢查一下
  4. app1 (pid:25516) is running(0:PING_OK), master:192.168.123.13

  5. [root@node4 app1]#masterha_stop --conf=/etc/app1.cnf   --關閉
    # 如果不能停止, 加 --abort選項


六測試故障轉移

 將 MYSQL.COM 機器上的 MYSQL 服務關閉 ,注意觀察  manager.log   日誌會發現 切換到了 SLAVE1.COM  並且 SLAVE1.COM 變成了主  而 SLAVE2.COM  則變成了 SLAVE1.COM  的從

點選(此處)摺疊或開啟

  1. root@MANAGER app1]# tail -f manager.log 這是啟動後還沒關閉主資料庫的日誌內容
  2. 192.168.186.141 (current master)
  3.  +--192.168.186.142
  4.  +--SLAVE2.COM
  5.  
  6. Wed Jul 9 18:52:32 2014 - [warning] master_ip_failover_script is not defined.
  7. Wed Jul 9 18:52:32 2014 - [warning] shutdown_script is not defined.
  8. Wed Jul 9 18:52:32 2014 - [info] Set master ping interval 1 seconds.
  9. Wed Jul 9 18:52:32 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
  10. Wed Jul 9 18:52:32 2014 - [info] Starting ping health check on 192.168.186.141(192.168.186.141:3306)..
  11. Wed Jul 9 18:52:32 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
  12.  
  13. [root@MYSQL ~]# service mysqld stop
  14. Shutting down MySQL..... SUCCESS!
  15.  
  16.  
  17. [root@MANAGER app1]# tail -f manager.log 最要看最後幾行 就知道有沒有切換成功
  18. 192.168.186.141 (current master)
  19.  +--192.168.186.142
  20.  +--SLAVE2.COM
  21. Wed Jul 9 18:56:47 2014 - [info] Dead Servers:
  22. Wed Jul 9 18:56:47 2014 - [info] 192.168.186.141(192.168.186.141:3306)
  23. Wed Jul 9 18:56:47 2014 - [info] Alive Servers:
  24. Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306)
  25. Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306)
  26. Wed Jul 9 18:56:47 2014 - [info] Alive Slaves:
  27. Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
  28. Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
  29. Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
  30. Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
  31. Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
  32. Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
  33. Wed Jul 9 18:56:47 2014 - [info] Checking slave configurations..
  34. Wed Jul 9 18:56:47 2014 - [info] Checking replication filtering settings..
  35. Wed Jul 9 18:56:47 2014 - [info] Replication filtering check ok.
  36. Wed Jul 9 18:56:47 2014 - [info] Master is down!
  37. Wed Jul 9 18:56:47 2014 - [info] Terminating monitoring script.
  38. Wed Jul 9 18:56:47 2014 - [info] Got exit code 20 (Master dead).
  39. Wed Jul 9 18:56:47 2014 - [info] MHA::MasterFailover version 0.53.
  40. Wed Jul 9 18:56:47 2014 - [info] Starting master failover.
  41. Wed Jul 9 18:56:47 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
  42. Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
  43. Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
  44. Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
  45. Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
  46. Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
  47. Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
  48. Wed Jul 9 18:56:47 2014 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:214
  49. Wed Jul 9 18:56:47 2014 - [info] Oldest slaves:
  50. Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
  51. Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
  52. Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
  53. Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
  54. Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
  55. Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
  56. Wed Jul 9 18:56:47 2014 - [info]
  57. Wed Jul 9 18:56:47 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
  58. Wed Jul 9 18:56:47 2014 - [info]
  59. Wed Jul 9 18:56:48 2014 - [info] Fetching dead master's binary logs..
  60. Wed Jul 9 18:56:48 2014 - [info] Executing command on the dead master 192.168.186.141(192.168.186.141:3306): save_binary_logs --command=save --start_file=mysql-bin.000001 --start_pos=214 --binlog_dir=/data/mysql/data --output_file=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
  61.   Creating /var/tmp if not exists.. ok.
  62.  Concat binary/relay logs from mysql-bin.000001 pos 214 to mysql-bin.000001 EOF into /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog ..
  63.   Dumping binlog format description event, from position 0 to 120.. ok.
  64.   Dumping effective binlog data from /data/mysql/data/mysql-bin.000001 position 214 to tail(237).. ok.
  65.  Concat succeeded.
  66. Wed Jul 9 18:56:48 2014 - [info] scp from root@192.168.186.141:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
  67. Wed Jul 9 18:56:49 2014 - [info] HealthCheck: SSH to 192.168.186.142 is reachable.
  68. Wed Jul 9 18:56:49 2014 - [info] HealthCheck: SSH to SLAVE2.COM is reachable.
  69. Wed Jul 9 18:56:49 2014 - [info]
  70. Wed Jul 9 18:56:49 2014 - [info] * Phase 3.3: Determining New Master Phase..
  71. Wed Jul 9 18:56:49 2014 - [info]
  72. Wed Jul 9 18:56:49 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
  73. Wed Jul 9 18:56:49 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
  74. Wed Jul 9 18:56:49 2014 - [info] Searching new master from slaves..
  75. Wed Jul 9 18:56:49 2014 - [info] Candidate masters from the configuration file:
  76. Wed Jul 9 18:56:49 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
  77. Wed Jul 9 18:56:49 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
  78. Wed Jul 9 18:56:49 2014 - [info] Primary candidate for the new Master (candidate_master is set)
  79. Wed Jul 9 18:56:49 2014 - [info] Non-candidate masters:
  80. Wed Jul 9 18:56:49 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
  81. Wed Jul 9 18:56:49 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
  82. Wed Jul 9 18:56:49 2014 - [info] Not candidate for the new Master (no_master is set)
  83. Wed Jul 9 18:56:49 2014 - [info] Searching from candidate_master slaves which have received the latest relay log events..
  84. Wed Jul 9 18:56:49 2014 - [info] New master is 192.168.186.142(192.168.186.142:3306)
  85. Wed Jul 9 18:56:49 2014 - [info] Starting master failover..
  86. Wed Jul 9 18:56:49 2014 - [info]
  87. From:
  88. 192.168.186.141 (current master)
  89.  +--192.168.186.142
  90.  +--SLAVE2.COM
  91.  
  92. To:
  93. 192.168.186.142 (new master)
  94.  +--SLAVE2.COM
  95. Wed Jul 9 18:56:49 2014 - [info]
  96. Wed Jul 9 18:56:49 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
  97. Wed Jul 9 18:56:49 2014 - [info]
  98. Wed Jul 9 18:56:49 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
  99. Wed Jul 9 18:56:49 2014 - [info] Sending binlog..
  100. Wed Jul 9 18:56:50 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to root@192.168.186.142:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
  101. Wed Jul 9 18:56:50 2014 - [info]
  102. Wed Jul 9 18:56:50 2014 - [info] * Phase 3.4: Master Log Apply Phase..
  103. Wed Jul 9 18:56:50 2014 - [info]
  104. Wed Jul 9 18:56:50 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
  105. Wed Jul 9 18:56:50 2014 - [info] Starting recovery on 192.168.186.142(192.168.186.142:3306)..
  106. Wed Jul 9 18:56:50 2014 - [info] Generating diffs succeeded.
  107. Wed Jul 9 18:56:50 2014 - [info] Waiting until all relay logs are applied.
  108. Wed Jul 9 18:56:50 2014 - [info] done.
  109. Wed Jul 9 18:56:50 2014 - [info] Getting slave status..
  110. Wed Jul 9 18:56:50 2014 - [info] This slave(192.168.186.142)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:214). No need to recover from Exec_Master_Log_Pos.
  111. Wed Jul 9 18:56:50 2014 - [info] Connecting to the target slave host 192.168.186.142, running recover script..
  112. Wed Jul 9 18:56:50 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=mha_mon --slave_host=192.168.186.142 --slave_ip=192.168.186.142 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --workdir=/var/tmp --target_version=5.6.10-log --timestamp=20140709185647 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
  113. Wed Jul 9 18:56:50 2014 - [info]
  114. MySQL client version is 5.6.10. Using --binary-mode.
  115. Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog on 192.168.186.142:3306. This may take long time...
  116. Applying log files succeeded.
  117. Wed Jul 9 18:56:50 2014 - [info] All relay logs were successfully applied.
  118. Wed Jul 9 18:56:50 2014 - [info] Getting new master's binlog name and position..
  119. Wed Jul 9 18:56:50 2014 - [info] mysql-bin.000007:504
  120. Wed Jul 9 18:56:50 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.186.142', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=504, MASTER_USER='repl', MASTER_PASSWORD='xxx';
  121. Wed Jul 9 18:56:50 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
  122. Wed Jul 9 18:56:50 2014 - [info] Setting read_only=0 on 192.168.186.142(192.168.186.142:3306)..
  123. Wed Jul 9 18:56:50 2014 - [info] ok.
  124. Wed Jul 9 18:56:50 2014 - [info] ** Finished master recovery successfully.
  125. Wed Jul 9 18:56:50 2014 - [info] * Phase 3: Master Recovery Phase completed.
  126. Wed Jul 9 18:56:50 2014 - [info]
  127. Wed Jul 9 18:56:50 2014 - [info] * Phase 4: Slaves Recovery Phase..
  128. Wed Jul 9 18:56:50 2014 - [info]
  129. Wed Jul 9 18:56:50 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
  130. Wed Jul 9 18:56:50 2014 - [info]
  131. Wed Jul 9 18:56:50 2014 - [info] -- Slave diff file generation on host SLAVE2.COM(192.168.186.146:3306) started, pid: 3135. Check tmp log /masterha/app1/SLAVE2.COM_3306_20140709185647.log if it takes time..
  132. Wed Jul 9 18:56:50 2014 - [info]
  133. Wed Jul 9 18:56:50 2014 - [info] Log messages from SLAVE2.COM ...
  134. Wed Jul 9 18:56:50 2014 - [info]
  135. Wed Jul 9 18:56:50 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
  136. Wed Jul 9 18:56:50 2014 - [info] End of log messages from SLAVE2.COM.
  137. Wed Jul 9 18:56:50 2014 - [info] -- SLAVE2.COM(192.168.186.146:3306) has the latest relay log events.
  138. Wed Jul 9 18:56:50 2014 - [info] Generating relay diff files from the latest slave succeeded.
  139. Wed Jul 9 18:56:50 2014 - [info]
  140. Wed Jul 9 18:56:50 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
  141. Wed Jul 9 18:56:50 2014 - [info]
  142. Wed Jul 9 18:56:50 2014 - [info] -- Slave recovery on host SLAVE2.COM(192.168.186.146:3306) started, pid: 3137. Check tmp log /masterha/app1/SLAVE2.COM_3306_20140709185647.log if it takes time..
  143. Wed Jul 9 18:56:51 2014 - [info]
  144. Wed Jul 9 18:56:51 2014 - [info] Log messages from SLAVE2.COM ...
  145. Wed Jul 9 18:56:51 2014 - [info]
  146. Wed Jul 9 18:56:50 2014 - [info] Sending binlog..
  147. Wed Jul 9 18:56:51 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to root@SLAVE2.COM:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
  148. Wed Jul 9 18:56:51 2014 - [info] Starting recovery on SLAVE2.COM(192.168.186.146:3306)..
  149. Wed Jul 9 18:56:51 2014 - [info] Generating diffs succeeded.
  150. Wed Jul 9 18:56:51 2014 - [info] Waiting until all relay logs are applied.
  151. Wed Jul 9 18:56:51 2014 - [info] done.
  152. Wed Jul 9 18:56:51 2014 - [info] Getting slave status..
  153. Wed Jul 9 18:56:51 2014 - [info] This slave(SLAVE2.COM)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:214). No need to recover from Exec_Master_Log_Pos.
  154. Wed Jul 9 18:56:51 2014 - [info] Connecting to the target slave host SLAVE2.COM, running recover script..
  155. Wed Jul 9 18:56:51 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=mha_mon --slave_host=SLAVE2.COM --slave_ip=192.168.186.146 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --workdir=/var/tmp --target_version=5.6.10-log --timestamp=20140709185647 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
  156. Wed Jul 9 18:56:51 2014 - [info]
  157. MySQL client version is 5.6.10. Using --binary-mode.
  158. Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog on SLAVE2.COM:3306. This may take long time...
  159. Applying log files succeeded.
  160. Wed Jul 9 18:56:51 2014 - [info] All relay logs were successfully applied.
  161. Wed Jul 9 18:56:51 2014 - [info] Resetting slave SLAVE2.COM(192.168.186.146:3306) and starting replication from the new master 192.168.186.142(192.168.186.142:3306)..
  162. Wed Jul 9 18:56:51 2014 - [info] Executed CHANGE MASTER.
  163. Wed Jul 9 18:56:51 2014 - [info] Slave started.
  164. Wed Jul 9 18:56:51 2014 - [info] End of log messages from SLAVE2.COM.
  165. Wed Jul 9 18:56:51 2014 - [info] -- Slave recovery on host SLAVE2.COM(192.168.186.146:3306) succeeded.
  166. Wed Jul 9 18:56:51 2014 - [info] All new slave servers recovered successfully.
  167. Wed Jul 9 18:56:51 2014 - [info]
  168. Wed Jul 9 18:56:51 2014 - [info] * Phase 5: New master cleanup phease..
  169. Wed Jul 9 18:56:51 2014 - [info]
  170. Wed Jul 9 18:56:51 2014 - [info] Resetting slave info on the new master..
  171. Wed Jul 9 18:56:51 2014 - [info] 192.168.186.142: Resetting slave info succeeded.
  172. Wed Jul 9 18:56:51 2014 - [info] Master failover to 192.168.186.142(192.168.186.142:3306) completed successfully.
  173. Wed Jul 9 18:56:51 2014 - [info]
  174.  
  175. ----- Failover Report -----
  176.  
  177. app1: MySQL Master failover 192.168.186.141 to 192.168.186.142 succeeded
  178.  
  179. Master 192.168.186.141 is
  180.  
  181. Check MHA Manager logs at MANAGER.COM:/masterha/app1/manager.log for details.
  182.  
  183. Started automated(non-interactive) failover.
  184. The latest slave 192.168.186.142(192.168.186.142:3306) has all relay logs for recovery.
  185. Selected 192.168.186.142 as a new master.
  186. 192.168.186.142: OK: Applying all logs succeeded.
  187. SLAVE2.COM: This host has the latest relay log events.
  188. Generating relay diff files from the latest slave succeeded.
  189. SLAVE2.COM: OK: Applying all logs succeeded. Slave started, replicating from 192.168.186.142.
  190. 192.168.186.142: Resetting slave info succeeded.
  191. 192.168.186.142: Resetting slave info succeeded.
  192. 看到192.168.186.142: Resetting slave info succeeded.

  193. Master failover to 192.168.186.142(192.168.186.142:3306) completed successfully. 說明切換成功了

  194. [root@slave2 ~]# mysql -e "show slave status\G"
  195. *************************** 1. row ***************************
  196.           Slave_IO_State: Waiting for master to send event
  197.             Master_Host: 192.168.186.142
  198.             Master_User: repl
  199.             Master_Port: 3306
  200.            Connect_Retry: 60
  201.           Master_Log_File: mysql-bin.000007
  202.        Read_Master_Log_Pos: 504
  203.           Relay_Log_File: slave2-relay-bin.000002
  204.            Relay_Log_Pos: 283
  205.       Relay_Master_Log_File: mysql-bin.000007
  206.          Slave_IO_Running: Yes
  207.         Slave_SQL_Running: Yes
  208. 看到已經切換到和142同步了 本來是和141同步的 此時的SALVE1.COM已經變成主了 說明已然生效


切換後舊master的修復及重新上線
master已經由192.168.17.199  3306 切到了192.168.17.200 3306  實際環境中資料是在不斷的變化的,而在切換點mha沒有記錄當時新master的log-file和log-pos 所以要想直接啟動192.168.17.199 3306 然後change master to 192.168.17.200 3306的話是不行的,只能對新主或slave2做一個全備然後再恢復再change。
另外,當執行切換後管理節點上的masterha_manager程式會自動stop,所以等修復好後要再次執行啟動----不要糾結ip不對,這個地方是個思路


重構

重構我想就不要我做了吧,就是這時候等於你的主掛了 切換在 SLAVE1.COM  上變成了主 因此重構我提供一種方案(方案多種),拿出一臺新的伺服器重新加入以 142為主做成從 再 app1.conf 配置檔案中加入 新機器的資訊

[root@MANAGER app1]# rm -rf app1.failover.complete
刪除該檔案後 再次啟動 manager 端即可




七.擴充套件 Keepalived

1master與slave端都安裝keepalive
  1. [root@MYSQL src]# wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz
  2. [root@MYSQL src]# tar -xf keepalived-1.2.12.tar.gz
  3. [root@MYSQL src]# cd keepalived-1.2.12
  4. [root@MYSQL src]# yum -y install gcc gcc-c++ gcc-g77 ncurses-devel bison libaio-devel cmake libnl* libpopt* popt-static openssl-devel
  5. [root@MYSQL keepalived-1.2.12]# ./configure
  6. [root@MYSQL keepalived-1.2.12]# make && make install
  7. [root@MYSQL src]#mkdir /etc/keepalived/
  8.  
  9. [root@MYSQL src]# cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/
  10. [root@MYSQL src]# cp /usr/local/etc/rc.d/init.d/keepalived /etc/init.d/
  11. [root@MYSQL src]# cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/
  12. [root@MYSQL src]#cp /usr/local/sbin/keepalived /usr/sbin/

2. 配置 keepalived
  1. [root@MYSQL keepalived]# vi keepalived.conf
  2. ! Configuration File for keepalived
  3.  
  4. global_defs {
  5.    notification_email {
  6.      acassen@firewall.loc
  7.      failover@firewall.loc
  8.      sysadmin@firewall.loc
  9.    }
  10.    notification_email_from Alexandre.Cassen@firewall.loc
  11.    smtp_server 192.168.200.1
  12.    smtp_connect_timeout 30
  13.    router_id LVS_DEVEL ##配置是為了標識當前節點,兩個節點的此項設定可相同,也可不相同
  14. }
  15.  
  16. vrrp_instance VI_1 {
  17.     state MASTER                   #指定A節點為主節點 備用節點上設定為BACKUP即可
  18.     interface eth0
  19.     virtual_router_id 51              #VRRP組名,兩個節點的設定必須一樣,以指明各個節點屬於同一VRRP
  20.     priority 100              #主節點的優先順序(1-254之間),備用節點必須比主節點優先順序低
  21.          advert_int 1
  22.     authentication {             #設定驗證資訊,兩個節點必須一致
  23.         auth_type PASS
  24.         auth_pass 1111
  25.     }
  26. virtual_ipaddress {
  27. 192.168.200.16                   =====vip,指定一個vip
  28.     }
  29. }

3. 檢測 mysql 服務指令碼
  1. [root@MYSQL keepalived]# vi /root/check_mysql.sh 
  2. [root@node1 ~]# vi checkmysql.sh 
    #!/bin/bash
    MYSQL=/usr/bin/mysql
    MYSQL_HOST=127.0.0.1
    MYSQL_USER=root
    MYSQL_PASSWORD=ESBecs00
    CHECK_TIME=3
    MYSQL_OK=1
    function check_mysql_helth (){
    $MYSQL -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD -e "show status;" >/dev/null 2>&1
    if [ $? = 0 ] ;then
      MYSQL_OK=1
    else
      MYSQL_OK=0
    fi
      return $MYSQL_OK
    }
    #echo "$MYSQL_OK $CHECK_TIME"


    while [ $CHECK_TIME -ne 0 ]
    do
      let "CHECK_TIME -= 1"
      check_mysql_helth
    if [ $MYSQL_OK = 1 ] ; then
      echo "#### Mysql is alive ####"
      CHECK_TIME=0
      exit 0
    fi
    if [ $MYSQL_OK -eq 0 ] && [ $CHECK_TIME -eq 0 ]
    then
      #pkill keepalived
      echo "####    Mysql Down    ####"
      echo "####now kill keepalive#####"
      /etc/init.d/keepalived stop
    exit 1
    fi
    sleep 1
    done

該指令碼做一個計劃任務每分鐘做一次檢查 MYSQL  服務如果掛掉了的話  VIP  就會在SALVE1.COM  上生效 這樣的話  client  端連線的 VIP  就會從 MYSQL.COM  上變到了SLAVE1.COM  上  此時 SLAVE1.COM  由於 MHA 的生效 已經變為主對外提供服務了 VIP  也在 SALVE1.COM 上 所以從頭到尾 客戶端只需要連線 VIP  就可以了

 [root@MYSQL ~]# crontab  -l */1 * * * * bash /root/checkmysql.sh

以上步驟再在下一次切換的 SLAVE 上做一次也就是 SLAVE1.COM 上做一次 keepalived.conf 配置檔案不同的地方已做註釋

4. 啟動測試

兩邊啟動 keepalived

[root@MYSQL keepalived]# keepalived -f /etc/keepalived/keepalived.conf
 [root@MYSQL keepalived]# ps -ef |grep keep 
root 3230 1 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf
root 3231 3230 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf
root 3232 3230 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf
root 3234 2538 0 23:27 pts/0 00:00:00 grep keep
 
[root@MYSQL keepalived]# ip a 1: lo:  mtu 16436 qdisc noqueue state UNKNOWN  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00  inet 127.0.0.1/8 scope host lo  inet6 ::1/128 scope host   valid_lft forever preferred_lft forever 2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000  link/ether 00:0c:29:c9:85:ba brd ff:ff:ff:ff:ff:ff  inet 192.168.186.141/24 brd 192.168.186.255 scope global eth0  inet 192.168.200.16/32 scope global eth0  inet6 fe80::20c:29ff:fec9:85ba/64 scope link   valid_lft forever preferred_lft forever

測試關閉該主機的 MYSQL  服務看看 VIP  是否飄到配好 keepalive 的 SLAVE1.COM上  MHA 中的 MYSQL  主是不是也變成了 SLAVE1.COM

如是 至此全部完成。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。謝謝!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

一點注意事項:MHA時基於傳統的log + postion開發的,維護GTID肯定不靈光了,但是mha0.56已經開始支援gtid了,可以去作者的部落格看到







原文地址:
參考文章:http://ylw6006.blog.51cto.com/470441/890360/

附上node與manager的下載地址

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29096438/viewspace-1818461/,如需轉載,請註明出處,否則將追究法律責任。

相關文章