MySQL High Availability: MHA Switching Test (switchover & failover)

aaron8219, posted on 2018-08-02
 
Preface
 
I installed MasterHA (MHA) yesterday. Now let's test its master-slave switchover and failover features.
 
Framework
 
| Hostname | IP | Port | Role | OS Version | MySQL Version |
|---|---|---|---|---|---|
| zlm2 | 192.168.1.101 | 3306 | master | CentOS 7.0 | 5.7.21 |
| zlm3 | 192.168.1.102 | 3306 | slave / MHA manager | CentOS 7.0 | 5.7.21 |
| - | 192.168.1.200 | - | VIP | - | - |
Procedure
 
Test 1: Manual master switchover
 
Check the state of the MHA manager on zlm3.
```
[root@zlm3 07:35:00 ~]
#masterha_check_ssh --conf=/etc/masterha/app1.conf
Fri Aug  3 07:37:13 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Aug  3 07:37:13 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug  3 07:37:13 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug  3 07:37:13 2018 - [info] Starting SSH connection tests..
Fri Aug  3 07:37:13 2018 - [debug]
Fri Aug  3 07:37:13 2018 - [debug]  Connecting via SSH from root@192.168.1.101(192.168.1.101:22) to root@192.168.1.102(192.168.1.102:22)..
Fri Aug  3 07:37:13 2018 - [debug]   ok.
Fri Aug  3 07:37:14 2018 - [debug]
Fri Aug  3 07:37:13 2018 - [debug]  Connecting via SSH from root@192.168.1.102(192.168.1.102:22) to root@192.168.1.101(192.168.1.101:22)..
Fri Aug  3 07:37:13 2018 - [debug]   ok.
Fri Aug  3 07:37:14 2018 - [info] All SSH connection tests passed successfully.

[root@zlm3 07:37:14 ~]
#masterha_check_repl --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf
Fri Aug  3 07:37:37 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug  3 07:37:37 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug  3 07:37:37 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug  3 07:37:37 2018 - [info] MHA::MasterMonitor version 0.56.
Fri Aug  3 07:37:38 2018 - [info] GTID failover mode = 1
Fri Aug  3 07:37:38 2018 - [info] Dead Servers:
Fri Aug  3 07:37:38 2018 - [info] Alive Servers:
Fri Aug  3 07:37:38 2018 - [info]   192.168.1.101(192.168.1.101:3306)
Fri Aug  3 07:37:38 2018 - [info]   192.168.1.102(192.168.1.102:3306)
Fri Aug  3 07:37:38 2018 - [info] Alive Slaves:
Fri Aug  3 07:37:38 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 07:37:38 2018 - [info]     GTID ON
Fri Aug  3 07:37:38 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 07:37:38 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  3 07:37:38 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 07:37:38 2018 - [info] Checking slave configurations..
Fri Aug  3 07:37:38 2018 - [info]  read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Fri Aug  3 07:37:38 2018 - [info] Checking replication filtering settings..
Fri Aug  3 07:37:38 2018 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Aug  3 07:37:38 2018 - [info]  Replication filtering check ok.
Fri Aug  3 07:37:38 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Aug  3 07:37:38 2018 - [info] Checking SSH publickey authentication settings on the current master..
ssh_exchange_identification: Connection closed by remote host
Fri Aug  3 07:37:38 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
Fri Aug  3 07:37:38 2018 - [info]
192.168.1.101(192.168.1.101:3306) (current master)
 +--192.168.1.102(192.168.1.102:3306)

Fri Aug  3 07:37:38 2018 - [info] Checking replication health on 192.168.1.102..
Fri Aug  3 07:37:38 2018 - [info]  ok.
Fri Aug  3 07:37:38 2018 - [info] Checking master_ip_failover_script status:
Fri Aug  3 07:37:38 2018 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306  --orig_master_ssh_port=3306
Fri Aug  3 07:37:38 2018 - [info]  OK.
Fri Aug  3 07:37:38 2018 - [warning] shutdown_script is not defined.
Fri Aug  3 07:37:38 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

[root@zlm3 07:40:03 ~]
#Fri Aug  3 07:40:03 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug  3 07:40:03 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug  3 07:40:03 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
ssh_exchange_identification: Connection closed by remote host
^C

[root@zlm3 07:40:11 ~]
#masterha_check_status --conf=/etc/masterha/app1.conf
app1 (pid:5628) is running(0:PING_OK), master:192.168.1.101
```
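
To check the manager from a script (for example a cron job), the one-line output of masterha_check_status can be parsed. The Python below is a minimal sketch that assumes only the line format shown above (`app1 (pid:5628) is running(0:PING_OK), master:192.168.1.101`); any other output is simply treated as "not running". The names `MhaStatus` and `parse_check_status` are hypothetical and not part of MHA.

```python
import re
from typing import NamedTuple, Optional


class MhaStatus(NamedTuple):
    app: str
    pid: Optional[int]
    running: bool
    status: str            # e.g. "0:PING_OK", or the raw line if not running
    master: Optional[str]  # current master host reported by the manager


# Matches the line format seen in the transcript:
#   app1 (pid:5628) is running(0:PING_OK), master:192.168.1.101
_RUNNING = re.compile(
    r"^(?P<app>\S+) \(pid:(?P<pid>\d+)\) is running"
    r"\((?P<status>[^)]*)\), master:(?P<master>\S+)"
)


def parse_check_status(output: str) -> MhaStatus:
    """Parse the one-line output of masterha_check_status."""
    line = output.strip()
    m = _RUNNING.match(line)
    if m:
        return MhaStatus(m["app"], int(m["pid"]), True, m["status"], m["master"])
    # Any other output is treated as "not running". The exact wording of those
    # messages is not shown in this post, so the raw line is kept in `status`.
    app = line.split(" ", 1)[0] if line else ""
    return MhaStatus(app, None, False, line, None)


if __name__ == "__main__":
    print(parse_check_status(
        "app1 (pid:5628) is running(0:PING_OK), master:192.168.1.101"))
```

Keeping the raw line in `status` for the non-matching case avoids guessing at MHA messages that this post does not show.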

 

Switch the master role to the slave (192.168.1.102), and make the original master a slave of the new master.

```
[root@zlm3 08:21:27 ~]
#masterha_master_switch --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf --master_state=alive --new_master_host=192.168.1.102 --orig_master_is_new_slave --running_updates_limit=60
Fri Aug  3 08:21:29 2018 - [info] MHA::MasterRotate version 0.56.
Fri Aug  3 08:21:29 2018 - [info] Starting online master switch..
Fri Aug  3 08:21:29 2018 - [info]
Fri Aug  3 08:21:29 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Aug  3 08:21:29 2018 - [info]
Fri Aug  3 08:21:29 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug  3 08:21:29 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug  3 08:21:29 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug  3 08:21:30 2018 - [info] GTID failover mode = 1
Fri Aug  3 08:21:30 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 08:21:30 2018 - [info] Alive Slaves:
Fri Aug  3 08:21:30 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 08:21:30 2018 - [info]     GTID ON
Fri Aug  3 08:21:30 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 08:21:30 2018 - [info]     Primary candidate for the new Master (candidate_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.101(192.168.1.101:3306)? (YES/no): yes
Fri Aug  3 08:21:33 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Aug  3 08:21:33 2018 - [info]  ok.
Fri Aug  3 08:21:33 2018 - [info] Checking MHA is not monitoring or doing failover..
Fri Aug  3 08:21:33 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
Fri Aug  3 08:21:33 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53.

// This means the MHA manager must be stopped before doing a master switchover.

[root@zlm3 08:21:33 ~]
#masterha_stop --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf
Stopped app1 successfully.
[1]+  Exit 1                  masterha_manager --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf

[root@zlm3 08:28:07 ~]
#masterha_master_switch --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf --master_state=alive --new_master_host=192.168.1.102 --orig_master_is_new_slave --running_updates_limit=60
Fri Aug  3 08:28:21 2018 - [info] MHA::MasterRotate version 0.56.
Fri Aug  3 08:28:21 2018 - [info] Starting online master switch..
Fri Aug  3 08:28:21 2018 - [info]
Fri Aug  3 08:28:21 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Aug  3 08:28:21 2018 - [info]
Fri Aug  3 08:28:21 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug  3 08:28:21 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug  3 08:28:21 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug  3 08:28:22 2018 - [info] GTID failover mode = 1
Fri Aug  3 08:28:22 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 08:28:22 2018 - [info] Alive Slaves:
Fri Aug  3 08:28:22 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 08:28:22 2018 - [info]     GTID ON
Fri Aug  3 08:28:22 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 08:28:22 2018 - [info]     Primary candidate for the new Master (candidate_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.101(192.168.1.101:3306)? (YES/no): yes
Fri Aug  3 08:28:25 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Aug  3 08:28:25 2018 - [info]  ok.
Fri Aug  3 08:28:25 2018 - [info] Checking MHA is not monitoring or doing failover..
Fri Aug  3 08:28:25 2018 - [info] Checking replication health on 192.168.1.102..
Fri Aug  3 08:28:25 2018 - [info]  ok.
Fri Aug  3 08:28:25 2018 - [info] 192.168.1.102 can be new master.
Fri Aug  3 08:28:25 2018 - [info]
From:
192.168.1.101(192.168.1.101:3306) (current master)
 +--192.168.1.102(192.168.1.102:3306)

To:
192.168.1.102(192.168.1.102:3306) (new master)
 +--192.168.1.101(192.168.1.101:3306)

Starting master switch from 192.168.1.101(192.168.1.101:3306) to 192.168.1.102(192.168.1.102:3306)? (yes/NO): yes
Fri Aug  3 08:28:31 2018 - [info] Checking whether 192.168.1.102(192.168.1.102:3306) is ok for the new master..
Fri Aug  3 08:28:31 2018 - [info]  ok.
Fri Aug  3 08:28:31 2018 - [info] 192.168.1.101(192.168.1.101:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Fri Aug  3 08:28:31 2018 - [info] 192.168.1.101(192.168.1.101:3306): Resetting slave pointing to the dummy host.
Fri Aug  3 08:28:31 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug  3 08:28:31 2018 - [info]
Fri Aug  3 08:28:31 2018 - [info] * Phase 2: Rejecting updates Phase..
Fri Aug  3 08:28:31 2018 - [info]
Fri Aug  3 08:28:31 2018 - [info] Executing master ip online change script to disable write on the current master:
Fri Aug  3 08:28:31 2018 - [info]   /etc/masterha/master_ip_online_change --command=stop --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --orig_master_user='zlm' --orig_master_password='zlmzlm' --new_master_host=192.168.1.102 --new_master_ip=192.168.1.102 --new_master_port=3306 --new_master_user='zlm' --new_master_password='zlmzlm' --orig_master_ssh_user=root --new_master_ssh_user=root  --orig_master_ssh_port=3306  --new_master_ssh_port=3306 --orig_master_is_new_slave
Unknown option: new_master_ssh_port
Fri Aug  3 08:28:32 2018 116409 Set read_only on the new master.. ok.
Fri Aug  3 08:28:32 2018 125643 drop vip 10.33.101.239..
ssh_exchange_identification: Connection closed by remote host
Fri Aug  3 08:28:32 2018 142948 Waiting all running 1 threads are disconnected.. (max 1500 milliseconds)
{'Time' => '13435','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
Fri Aug  3 08:28:32 2018 646769 Waiting all running 1 threads are disconnected.. (max 1000 milliseconds)
{'Time' => '13435','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
Fri Aug  3 08:28:33 2018 149221 Waiting all running 1 threads are disconnected.. (max 500 milliseconds)
{'Time' => '13436','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
Fri Aug  3 08:28:33 2018 650816 Set read_only=1 on the orig master.. ok.
Fri Aug  3 08:28:33 2018 653323 Waiting all running 1 queries are disconnected.. (max 500 milliseconds)
{'Time' => '13436','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
Fri Aug  3 08:28:34 2018 154965 Killing all application threads..
Fri Aug  3 08:28:34 2018 167919 done.
Fri Aug  3 08:28:34 2018 - [info]  ok.
Fri Aug  3 08:28:34 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Aug  3 08:28:34 2018 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Aug  3 08:28:34 2018 - [info]  ok.
Fri Aug  3 08:28:34 2018 - [info] Orig master binlog:pos is mysql-bin.000050:2361.
Fri Aug  3 08:28:34 2018 - [info]  Waiting to execute all relay logs on 192.168.1.102(192.168.1.102:3306)..
Fri Aug  3 08:28:34 2018 - [info]  master_pos_wait(mysql-bin.000050:2361) completed on 192.168.1.102(192.168.1.102:3306). Executed 0 events.
Fri Aug  3 08:28:34 2018 - [info]   done.
Fri Aug  3 08:28:34 2018 - [info] Getting new master's binlog name and position..
Fri Aug  3 08:28:34 2018 - [info]  mysql-bin.000003:2321
Fri Aug  3 08:28:34 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.102', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug  3 08:28:34 2018 - [info] Executing master ip online change script to allow write on the new master:
Fri Aug  3 08:28:34 2018 - [info]   /etc/masterha/master_ip_online_change --command=start --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --orig_master_user='zlm' --orig_master_password='zlmzlm' --new_master_host=192.168.1.102 --new_master_ip=192.168.1.102 --new_master_port=3306 --new_master_user='zlm' --new_master_password='zlmzlm' --orig_master_ssh_user=root --new_master_ssh_user=root  --orig_master_ssh_port=3306  --new_master_ssh_port=3306 --orig_master_is_new_slave
Unknown option: new_master_ssh_port
Fri Aug  3 08:28:34 2018 327146 Set read_only=0 on the new master.
Fri Aug  3 08:28:34 2018 328259Add vip 10.33.101.239 on p3p1..
ssh_exchange_identification: Connection closed by remote host
Fri Aug  3 08:28:34 2018 - [info]  ok.
Fri Aug  3 08:28:34 2018 - [info]
Fri Aug  3 08:28:34 2018 - [info] * Switching slaves in parallel..
Fri Aug  3 08:28:34 2018 - [info]
Fri Aug  3 08:28:34 2018 - [info] Unlocking all tables on the orig master:
Fri Aug  3 08:28:34 2018 - [info] Executing UNLOCK TABLES..
Fri Aug  3 08:28:34 2018 - [info]  ok.
Fri Aug  3 08:28:34 2018 - [info] Starting orig master as a new slave..
Fri Aug  3 08:28:34 2018 - [info]  Resetting slave 192.168.1.101(192.168.1.101:3306) and starting replication from the new master 192.168.1.102(192.168.1.102:3306)..
Fri Aug  3 08:28:34 2018 - [info]  Executed CHANGE MASTER.
Fri Aug  3 08:28:35 2018 - [info]  Slave started.
Fri Aug  3 08:28:35 2018 - [info] All new slave servers switched successfully.
Fri Aug  3 08:28:35 2018 - [info]
Fri Aug  3 08:28:35 2018 - [info] * Phase 5: New master cleanup phase..
Fri Aug  3 08:28:35 2018 - [info]
Fri Aug  3 08:28:35 2018 - [info]  192.168.1.102: Resetting slave info succeeded.
Fri Aug  3 08:28:35 2018 - [info] Switching master to 192.168.1.102(192.168.1.102:3306) completed successfully.

[root@zlm3 08:28:35 ~]
#
```
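
The online switch above comes down to two commands in a fixed order: stop the manager first (otherwise masterha_master_switch fails with "Getting advisory lock failed on the current master"), then run the switch with `--master_state=alive`. A small Python helper that assembles exactly those argument lists can make that order explicit in a runbook script. This is a sketch under stated assumptions: `online_switch_commands` is a hypothetical name, the default paths are the ones used in this post, and it only prints the commands instead of running them, because masterha_master_switch still asks for interactive confirmation as the transcript shows.

```python
import shlex
from typing import List

# Paths used in this post; adjust for your own installation.
CONF = "/etc/masterha/app1.conf"
GLOBAL_CONF = "/etc/masterha/masterha_default.conf"


def online_switch_commands(new_master: str, conf: str = CONF,
                           global_conf: str = GLOBAL_CONF,
                           running_updates_limit: int = 60) -> List[List[str]]:
    """Return the argv lists for an online (master alive) switch, in order.

    masterha_stop comes first because masterha_master_switch refuses to run
    while the MHA monitor holds its advisory lock on the current master.
    """
    common = [f"--conf={conf}", f"--global_conf={global_conf}"]
    stop = ["masterha_stop", *common]
    switch = [
        "masterha_master_switch", *common,
        "--master_state=alive",
        f"--new_master_host={new_master}",
        "--orig_master_is_new_slave",  # turn the old master into a slave
        f"--running_updates_limit={running_updates_limit}",
    ]
    return [stop, switch]


if __name__ == "__main__":
    # Print the commands for review; run them by hand to answer the prompts.
    for argv in online_switch_commands("192.168.1.102"):
        print(shlex.join(argv))
```

Returning argv lists rather than a single shell string keeps the helper usable with `subprocess.run` later without any quoting issues.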

 

Check the master-slave replication status.

```
// New master (original slave)
(zlm@192.168.1.102 3306)[(none)]>show master status;
+------------------+----------+--------------+------------------+------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                              |
+------------------+----------+--------------+------------------+------------------------------------------------+
| mysql-bin.000003 |     2321 |              |                  | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259 |
+------------------+----------+--------------+------------------+------------------------------------------------+
1 row in set (0.00 sec)

(zlm@192.168.1.102 3306)[(none)]>show slave status\G
Empty set (0.00 sec)

// New slave (original master)
(zlm@192.168.1.101 3306)[(none)]>show master status;
+------------------+----------+--------------+------------------+------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                              |
+------------------+----------+--------------+------------------+------------------------------------------------+
| mysql-bin.000050 |     2361 |              |                  | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259 |
+------------------+----------+--------------+------------------+------------------------------------------------+
1 row in set (0.01 sec)

(zlm@192.168.1.101 3306)[(none)]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.102
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 2321
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 398
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 2321
              Relay_Log_Space: 591
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1023306
                  Master_UUID: 842ea497-9551-11e8-83ca-080027de0e0e
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
```
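
The check above is done by eye: both replication threads on 192.168.1.101 say Yes, and its Executed_Gtid_Set equals the new master's. A minimal Python sketch of the same check parses the vertical (`\G`) output into a dict. It assumes every value fits on one line (a multi-UUID GTID set printed across several lines would need extra handling) and compares GTID sets as plain strings; MySQL's own `GTID_SUBSET()` function is the more robust way to compare sets. `parse_vertical` and `replication_healthy` are hypothetical names.

```python
from typing import Dict


def parse_vertical(output: str) -> Dict[str, str]:
    """Parse one row of mysql client vertical output (\\G) into a dict.

    Assumes every value fits on one line.
    """
    row: Dict[str, str] = {}
    for line in output.splitlines():
        if line.startswith("*") or ":" not in line:
            continue  # skip the "*** 1. row ***" banner and "1 row in set" footer
        key, _, value = line.partition(":")  # split at the first colon only,
        row[key.strip()] = value.strip()     # so GTID values keep their colons
    return row


def replication_healthy(slave: Dict[str, str], master_gtid_executed: str) -> bool:
    """True if both slave threads run and the slave has executed exactly
    the master's GTID set (plain string comparison)."""
    return (slave.get("Slave_IO_Running") == "Yes"
            and slave.get("Slave_SQL_Running") == "Yes"
            and slave.get("Executed_Gtid_Set") == master_gtid_executed)
```

With the output shown above, `replication_healthy` returns True for 192.168.1.101 against the new master's set `1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259`.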

 

Check the MHA manager log on zlm3.

```
[root@zlm3 08:28:35 ~]
#cd /var/log/masterha/app1

[root@zlm3 08:29:12 /var/log/masterha/app1]
#cat app1.log
Fri Aug  3 07:39:13 2018 - [info] MHA::MasterMonitor version 0.56.
Fri Aug  3 07:39:14 2018 - [info] GTID failover mode = 1
Fri Aug  3 07:39:14 2018 - [info] Dead Servers:
Fri Aug  3 07:39:14 2018 - [info] Alive Servers:
Fri Aug  3 07:39:14 2018 - [info]   192.168.1.101(192.168.1.101:3306)
Fri Aug  3 07:39:14 2018 - [info]   192.168.1.102(192.168.1.102:3306)
Fri Aug  3 07:39:14 2018 - [info] Alive Slaves:
Fri Aug  3 07:39:14 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 07:39:14 2018 - [info]     GTID ON
Fri Aug  3 07:39:14 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 07:39:14 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  3 07:39:14 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 07:39:14 2018 - [info] Checking slave configurations..
Fri Aug  3 07:39:14 2018 - [info]  read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Fri Aug  3 07:39:14 2018 - [info] Checking replication filtering settings..
Fri Aug  3 07:39:14 2018 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Aug  3 07:39:14 2018 - [info]  Replication filtering check ok.
Fri Aug  3 07:39:14 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Aug  3 07:39:14 2018 - [info] Checking SSH publickey authentication settings on the current master..
Fri Aug  3 07:39:14 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
Fri Aug  3 07:39:14 2018 - [info]
192.168.1.101(192.168.1.101:3306) (current master)
 +--192.168.1.102(192.168.1.102:3306)

Fri Aug  3 07:39:14 2018 - [info] Checking master_ip_failover_script status:
Fri Aug  3 07:39:14 2018 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306  --orig_master_ssh_port=3306
Fri Aug  3 07:39:14 2018 - [info]  OK.
Fri Aug  3 07:39:14 2018 - [warning] shutdown_script is not defined.
Fri Aug  3 07:39:14 2018 - [info] Set master ping interval 1 seconds.
Fri Aug  3 07:39:14 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Aug  3 07:39:14 2018 - [info] Starting ping health check on 192.168.1.101(192.168.1.101:3306)..
Fri Aug  3 07:39:14 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug  3 07:39:27 2018 - [info] Got terminate signal. Exit.
Fri Aug  3 07:40:03 2018 - [info] MHA::MasterMonitor version 0.56.
Fri Aug  3 07:40:04 2018 - [info] GTID failover mode = 1
Fri Aug  3 07:40:04 2018 - [info] Dead Servers:
Fri Aug  3 07:40:04 2018 - [info] Alive Servers:
Fri Aug  3 07:40:04 2018 - [info]   192.168.1.101(192.168.1.101:3306)
Fri Aug  3 07:40:04 2018 - [info]   192.168.1.102(192.168.1.102:3306)
Fri Aug  3 07:40:04 2018 - [info] Alive Slaves:
Fri Aug  3 07:40:04 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 07:40:04 2018 - [info]     GTID ON
Fri Aug  3 07:40:04 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 07:40:04 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  3 07:40:04 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 07:40:04 2018 - [info] Checking slave configurations..
Fri Aug  3 07:40:04 2018 - [info]  read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Fri Aug  3 07:40:04 2018 - [info] Checking replication filtering settings..
Fri Aug  3 07:40:04 2018 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Aug  3 07:40:04 2018 - [info]  Replication filtering check ok.
Fri Aug  3 07:40:04 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Aug  3 07:40:04 2018 - [info] Checking SSH publickey authentication settings on the current master..
Fri Aug  3 07:40:04 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
Fri Aug  3 07:40:04 2018 - [info]
192.168.1.101(192.168.1.101:3306) (current master)
 +--192.168.1.102(192.168.1.102:3306)

Fri Aug  3 07:40:04 2018 - [info] Checking master_ip_failover_script status:
Fri Aug  3 07:40:04 2018 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306  --orig_master_ssh_port=3306
Fri Aug  3 07:40:04 2018 - [info]  OK.
Fri Aug  3 07:40:04 2018 - [warning] shutdown_script is not defined.
Fri Aug  3 07:40:04 2018 - [info] Set master ping interval 1 seconds.
Fri Aug  3 07:40:04 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Aug  3 07:40:04 2018 - [info] Starting ping health check on 192.168.1.101(192.168.1.101:3306)..
Fri Aug  3 07:40:04 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug  3 08:28:07 2018 - [info] Got terminate signal. Exit.
```
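
The log contains two monitor runs, each starting with "MHA::MasterMonitor version" and ending with "Got terminate signal. Exit.". A minimal Python sketch that extracts those runs from app1.log is below; it relies only on the timestamp format and the two messages visible in this log, and `monitor_sessions` is a hypothetical name. Lines without a timestamp (such as the topology diagram) are skipped.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Matches e.g. "Fri Aug  3 07:39:13 2018 - [info] MHA::MasterMonitor version 0.56."
_LINE = re.compile(
    r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) - \[(\w+)\](.*)$")

Session = Tuple[datetime, Optional[datetime]]


def monitor_sessions(log_text: str) -> List[Session]:
    """Return (start, end) pairs for each MHA monitor run in the log.

    A run starts at "MHA::MasterMonitor version ..." and ends at
    "Got terminate signal. Exit."; end is None if no such line follows.
    """
    sessions: List[Session] = []
    start: Optional[datetime] = None
    for line in log_text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue  # topology diagram, blank lines, etc.
        # strptime treats the double space in "Aug  3" as ordinary whitespace
        ts = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        msg = m.group(3).strip()
        if msg.startswith("MHA::MasterMonitor version"):
            if start is not None:
                sessions.append((start, None))  # previous run logged no exit
            start = ts
        elif msg.startswith("Got terminate signal") and start is not None:
            sessions.append((start, ts))
            start = None
    if start is not None:
        sessions.append((start, None))  # still running, or killed silently
    return sessions
```

On the log above this yields two runs: 07:39:13 to 07:39:27, and 07:40:03 to 08:28:07, the second one ended by the masterha_stop issued just before the successful switch.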

 

Test 2: Manual master failover
 
Run masterha_master_switch again, this time on zlm2, to perform a manual failover from the current master (zlm3, 192.168.1.102) back to 192.168.1.101.
```
[root@zlm2 10:11:10 ~]
#masterha_master_switch --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf --dead_master_host=192.168.1.102 --master_state=dead --new_master_host=192.168.1.101 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.1.102.
--dead_master_port=<dead_master_port> is not set. Using 3306.
Fri Aug  3 10:11:50 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug  3 10:11:50 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug  3 10:11:50 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug  3 10:11:50 2018 - [info] MHA::MasterFailover version 0.56.
Fri Aug  3 10:11:50 2018 - [info] Starting master failover.
Fri Aug  3 10:11:50 2018 - [info]
Fri Aug  3 10:11:50 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Aug  3 10:11:50 2018 - [info]
Fri Aug  3 10:11:51 2018 - [info] GTID failover mode = 1
Fri Aug  3 10:11:51 2018 - [info] Dead Servers:
Fri Aug  3 10:11:51 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln187] None of server is dead. Stop failover.
Fri Aug  3 10:11:51 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53.

// Stop mysqld on the current master (zlm3).
[root@zlm3 10:13:56 ~]
#mysqladmin shutdown

[root@zlm3 10:14:04 ~]
#ps aux|grep mysqld
mysql     5368  0.0 19.6 1110812 200292 pts/0  Sl   04:44   0:09 mysqld --defaults-file=/data/mysql/mysql3306/my.cnf
root      8827  0.0  0.0 112640   960 pts/0    R+   10:14   0:00 grep --color=auto mysqld

[root@zlm3 10:14:08 ~]
#ps aux|grep mysqld
mysql     5368  0.0 19.6 1110812 200292 pts/0  Sl   04:44   0:09 mysqld --defaults-file=/data/mysql/mysql3306/my.cnf
root      8833  0.0  0.0 112640   960 pts/0    R+   10:14   0:00 grep --color=auto mysqld

[root@zlm3 10:14:12 ~]
#ps aux|grep mysqld
mysql     5368  0.0 19.1 995088 194692 pts/0   Sl   04:44   0:09 mysqld --defaults-file=/data/mysql/mysql3306/my.cnf
root      8839  0.0  0.0 112640   960 pts/0    R+   10:14   0:00 grep --color=auto mysqld

[root@zlm3 10:14:23 ~]
#ps aux|grep mysqld
root      8854  0.0  0.0 112640   960 pts/0    R+   10:14   0:00 grep --color=auto mysqld

// Run the same command on zlm2 again.
[root@zlm2 10:15:43 ~]
#masterha_master_switch --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf --dead_master_host=192.168.1.102 --master_state=dead --new_master_host=192.168.1.101 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.1.102.
--dead_master_port=<dead_master_port> is not set. Using 3306.
Fri Aug  3 10:15:43 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug  3 10:15:43 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug  3 10:15:43 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug  3 10:15:43 2018 - [info] MHA::MasterFailover version 0.56.
Fri Aug  3 10:15:43 2018 - [info] Starting master failover.
Fri Aug  3 10:15:43 2018 - [info]
Fri Aug  3 10:15:43 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Aug  3 10:15:43 2018 - [info]
Fri Aug  3 10:15:44 2018 - [info] GTID failover mode = 1
Fri Aug  3 10:15:44 2018 - [info] Dead Servers:
Fri Aug  3 10:15:44 2018 - [info]   192.168.1.102(192.168.1.102:3306)
Fri Aug  3 10:15:44 2018 - [info] Checking master reachability via MySQL(double check)...
Fri Aug  3 10:15:44 2018 - [info]  ok.
Fri Aug  3 10:15:44 2018 - [info] Alive Servers:
Fri Aug  3 10:15:44 2018 - [info]   192.168.1.101(192.168.1.101:3306)
Fri Aug  3 10:15:44 2018 - [info] Alive Slaves:
Fri Aug  3 10:15:44 2018 - [info]   192.168.1.101(192.168.1.101:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 10:15:44 2018 - [info]     GTID ON
Fri Aug  3 10:15:44 2018 - [info]     Replicating from 192.168.1.102(192.168.1.102:3306)
Fri Aug  3 10:15:44 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Master 192.168.1.102(192.168.1.102:3306) is dead. Proceed? (yes/NO): yes
Fri Aug  3 10:15:46 2018 - [info] Starting GTID based failover.
Fri Aug  3 10:15:46 2018 - [info]
Fri Aug  3 10:15:46 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug  3 10:15:46 2018 - [info]
Fri Aug  3 10:15:46 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug  3 10:15:46 2018 - [info]
ssh: connect to host 192.168.1.102 port 3306: Connection refused
Fri Aug  3 10:15:46 2018 - [warning] HealthCheck: SSH to 192.168.1.102 is NOT reachable.
Fri Aug  3 10:15:46 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug  3 10:15:46 2018 - [info] Executing master IP deactivation script:
Fri Aug  3 10:15:46 2018 - [info]   /etc/masterha/master_ip_failover --orig_master_host=192.168.1.102 --orig_master_ip=192.168.1.102 --orig_master_port=3306 --command=stop  --orig_master_ssh_port=3306
ssh: connect to host 192.168.1.102 port 3306: Connection refused
Fri Aug  3 10:15:48 2018 - [info]  done.
Fri Aug  3 10:15:48 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug  3 10:15:48 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug  3 10:15:48 2018 - [info]
Fri Aug  3 10:15:48 2018 - [info] * Phase 3: Master Recovery Phase..
Fri Aug  3 10:15:48 2018 - [info]
Fri Aug  3 10:15:48 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug  3 10:15:48 2018 - [info]
Fri Aug  3 10:15:48 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000003:2321
Fri Aug  3 10:15:48 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug  3 10:15:48 2018 - [info]   192.168.1.101(192.168.1.101:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 10:15:48 2018 - [info]     GTID ON
Fri Aug  3 10:15:48 2018 - [info]     Replicating from 192.168.1.102(192.168.1.102:3306)
Fri Aug  3 10:15:48 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  3 10:15:48 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000003:2321
Fri Aug  3 10:15:48 2018 - [info] Oldest slaves:
Fri Aug  3 10:15:48 2018 - [info]   192.168.1.101(192.168.1.101:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 10:15:48 2018 - [info]     GTID ON
Fri Aug  3 10:15:48 2018 - [info]     Replicating from 192.168.1.102(192.168.1.102:3306)
Fri Aug  3 10:15:48 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  3 10:15:48 2018 - [info]
Fri Aug  3 10:15:48 2018 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug  3 10:15:48 2018 - [info]
Fri Aug  3 10:15:48 2018 - [info] 192.168.1.101 can be new master.
Fri Aug  3 10:15:48 2018 - [info] New master is 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 10:15:48 2018 - [info] Starting master failover..
Fri Aug  3 10:15:48 2018 - [info]
From:
192.168.1.102(192.168.1.102:3306) (current master)
 +--192.168.1.101(192.168.1.101:3306)

To:
192.168.1.101(192.168.1.101:3306) (new master)

Starting master switch from 192.168.1.102(192.168.1.102:3306) to 192.168.1.101(192.168.1.101:3306)? (yes/NO): yes
Fri Aug  3 10:15:56 2018 - [info] New master decided manually is 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 10:15:56 2018 - [info]
```
116 Fri Aug  3 10:15:56 2018 - [info] * Phase 3.3: New Master Recovery Phase..
117 Fri Aug  3 10:15:56 2018 - [info] 
118 Fri Aug  3 10:15:56 2018 - [info]  Waiting all logs to be applied.. 
119 Fri Aug  3 10:15:56 2018 - [info]   done.
120 Fri Aug  3 10:15:56 2018 - [info] Getting new master`s binlog name and position..
121 Fri Aug  3 10:15:56 2018 - [info]  mysql-bin.000051:190
122 Fri Aug  3 10:15:56 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=`192.168.1.101`, MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER=`repl`, MASTER_PASSWORD=`xxx`;
123 Fri Aug  3 10:15:56 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000051, 190, 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259
124 Fri Aug  3 10:15:56 2018 - [info] Executing master IP activate script:
125 Fri Aug  3 10:15:56 2018 - [info]   /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.1.102 --orig_master_ip=192.168.1.102 --orig_master_port=3306 --new_master_host=192.168.1.101 --new_master_ip=192.168.1.101 --new_master_port=3306 --new_master_user=`zlm` --new_master_password=`zlmzlm`  --orig_master_ssh_port=3306  --new_master_ssh_port=3306
126 Unknown option: new_master_ssh_port
127 Set read_only=0 on the new master.
128 ssh_exchange_identification: Connection closed by remote host
129 Fri Aug  3 10:15:56 2018 - [info]  OK.
130 Fri Aug  3 10:15:56 2018 - [info] ** Finished master recovery successfully.
131 Fri Aug  3 10:15:56 2018 - [info] * Phase 3: Master Recovery Phase completed.
132 Fri Aug  3 10:15:56 2018 - [info] 
133 Fri Aug  3 10:15:56 2018 - [info] * Phase 4: Slaves Recovery Phase..
134 Fri Aug  3 10:15:56 2018 - [info] 
135 Fri Aug  3 10:15:56 2018 - [info] 
136 Fri Aug  3 10:15:56 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
137 Fri Aug  3 10:15:56 2018 - [info] 
138 Fri Aug  3 10:15:56 2018 - [info] All new slave servers recovered successfully.
139 Fri Aug  3 10:15:56 2018 - [info] 
140 Fri Aug  3 10:15:56 2018 - [info] * Phase 5: New master cleanup phase..
141 Fri Aug  3 10:15:56 2018 - [info] 
142 Fri Aug  3 10:15:56 2018 - [info] Resetting slave info on the new master..
143 Fri Aug  3 10:15:56 2018 - [info]  192.168.1.101: Resetting slave info succeeded.
144 Fri Aug  3 10:15:56 2018 - [info] Master failover to 192.168.1.101(192.168.1.101:3306) completed successfully.
145 Fri Aug  3 10:15:56 2018 - [info] 
146 
147 ----- Failover Report -----
148 
149 app1: MySQL Master failover 192.168.1.102(192.168.1.102:3306) to 192.168.1.101(192.168.1.101:3306) succeeded
150 
151 Master 192.168.1.102(192.168.1.102:3306) is down!
152 
153 Check MHA Manager logs at zlm2 for details.
154 
155 Started manual(interactive) failover.
156 Invalidated master IP address on 192.168.1.102(192.168.1.102:3306)
157 Selected 192.168.1.101(192.168.1.101:3306) as a new master.
158 192.168.1.101(192.168.1.101:3306): OK: Applying all logs succeeded.
159 192.168.1.101(192.168.1.101:3306): OK: Activated master IP address.
160 192.168.1.101(192.168.1.101:3306): Resetting slave info succeeded.
161 Master failover to 192.168.1.101(192.168.1.101:3306) completed successfully.
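Every MHA failover ends with a report whose summary line names the application, the old master, the new master and the outcome. When reports from several test runs are collected, a small parser makes them easier to check. The following is a minimal Python sketch. The regular expression is an assumption based only on the summary lines shown in this post, not on a documented MHA output format, and `parse_failover_report` is a hypothetical helper name.

```python
import re

# Assumed layout, taken from the report above:
# "app1: MySQL Master failover OLD(OLD:PORT) to NEW(NEW:PORT) succeeded"
REPORT_RE = re.compile(
    r"^(?P<app>\S+): MySQL Master failover "
    r"(?P<old_host>[^(\s]+)\((?P<old_addr>[^)]+)\) to "
    r"(?P<new_host>[^(\s]+)\((?P<new_addr>[^)]+)\) "
    r"(?P<result>\w+)"
)


def parse_failover_report(text):
    """Return app name, old/new master address and outcome, or None if absent."""
    for line in text.splitlines():
        m = REPORT_RE.match(line.strip())
        if m:
            return {
                "app": m.group("app"),
                "old_master": m.group("old_addr"),
                "new_master": m.group("new_addr"),
                "succeeded": m.group("result") == "succeeded",
            }
    return None  # no summary line in this text


report = """----- Failover Report -----

app1: MySQL Master failover 192.168.1.102(192.168.1.102:3306) to 192.168.1.101(192.168.1.101:3306) succeeded
"""
print(parse_failover_report(report))
```

The parser only reads the single summary line. The per-step lines that follow it ("OK: Applying all logs succeeded." and so on) could be checked the same way if needed.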

 

Check the status of the new master on zlm2.

(zlm@192.168.1.101 3306)[(none)]>show master status;
+------------------+----------+--------------+------------------+------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                              |
+------------------+----------+--------------+------------------+------------------------------------------------+
| mysql-bin.000051 |      190 |              |                  | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259 |
+------------------+----------+--------------+------------------+------------------------------------------------+
1 row in set (0.00 sec)

(zlm@192.168.1.101 3306)[(none)]>show slave status\G
Empty set (0.00 sec)
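The Executed_Gtid_Set shown here is the same set that MHA printed in its "Master Recovery succeeded" line (1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259), so the new master's GTID state matches what MHA reported after applying the relay logs. On the server, MySQL's built-in GTID_SUBSET(set1, set2) function performs this kind of containment check. The Python sketch below shows the same idea on the client side. It only handles the plain uuid:interval syntax used by MySQL 5.7, and the function names are hypothetical.

```python
def merge_intervals(ranges):
    """Sort and merge overlapping or adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(start, end), ...]} (merged)."""
    gtids = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = gtids.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))
    return {uuid: merge_intervals(r) for uuid, r in gtids.items()}


def gtid_subset(subset, superset):
    """Return True if every GTID in `subset` is also contained in `superset`."""
    sup = parse_gtid_set(superset)
    for uuid, ranges in parse_gtid_set(subset).items():
        sup_ranges = sup.get(uuid, [])
        for start, end in ranges:
            # superset intervals are merged, so a covered range fits inside one of them
            if not any(s <= start and end <= e for s, e in sup_ranges):
                return False
    return True


reported = "1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259"  # from MHA's recovery line
executed = "1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259"  # from SHOW MASTER STATUS
print(gtid_subset(reported, executed))  # True
```

Merging the superset's intervals first keeps the check linear in the number of intervals instead of expanding millions of individual transaction numbers.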

 

Check the files and the log in the MHA Manager working directory on zlm2.

[root@zlm2 10:15:56 ~]
#cd /var/log/masterha/app1

[root@zlm2 10:20:04 /var/log/masterha/app1]
#ls -l
total 4
-rw-r--r-- 1 root root    0 Aug  3 10:15 app1.failover.complete
-rw-r--r-- 1 root root 3883 Aug  2 11:29 app1.log

//The --ignore_last_failover option makes MHA ignore an existing app1.failover.complete file. Without it, a failover attempted soon after the previous one (within last_failover_minute, 8 hours by default) is aborted with an error.
//MHA writes this file to the manager's working directory (manager_workdir) after a successful failover, so it appears on the host that ran MHA Manager. Here that host is zlm2, which also became the new master.

[root@zlm2 10:20:05 /var/log/masterha/app1]
#cat app1.log
Thu Aug  2 11:12:03 2018 - [info] MHA::MasterMonitor version 0.56.
Thu Aug  2 11:12:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 192.168.1.101(192.168.1.101:3306) :1045:Access denied for user 'root'@'zlm2' (using password: NO), but this is not a MySQL crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297.
Thu Aug  2 11:12:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 192.168.1.102(192.168.1.102:3306) :1045:Access denied for user 'root'@'zlm2' (using password: NO), but this is not a MySQL crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297.
Thu Aug  2 11:12:05 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Thu Aug  2 11:12:05 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326.
Thu Aug  2 11:12:05 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Aug  2 11:12:05 2018 - [info] Got exit code 1 (Not master dead).
Thu Aug  2 11:13:56 2018 - [info] MHA::MasterMonitor version 0.56.
Thu Aug  2 11:13:57 2018 - [info] GTID failover mode = 1
Thu Aug  2 11:13:57 2018 - [info] Dead Servers:
Thu Aug  2 11:13:57 2018 - [info] Alive Servers:
Thu Aug  2 11:13:57 2018 - [info]   192.168.1.101(192.168.1.101:3306)
Thu Aug  2 11:13:57 2018 - [info]   192.168.1.102(192.168.1.102:3306)
Thu Aug  2 11:13:57 2018 - [info] Alive Slaves:
Thu Aug  2 11:13:57 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu Aug  2 11:13:57 2018 - [info]     GTID ON
Thu Aug  2 11:13:57 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Aug  2 11:13:57 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Aug  2 11:13:57 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Thu Aug  2 11:13:57 2018 - [info] Checking slave configurations..
Thu Aug  2 11:13:57 2018 - [info]  read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Thu Aug  2 11:13:57 2018 - [info] Checking replication filtering settings..
Thu Aug  2 11:13:57 2018 - [info]  binlog_do_db= , binlog_ignore_db=
Thu Aug  2 11:13:57 2018 - [info]  Replication filtering check ok.
Thu Aug  2 11:13:57 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Aug  2 11:13:57 2018 - [info] Checking SSH publickey authentication settings on the current master..
Thu Aug  2 11:13:57 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
Thu Aug  2 11:13:57 2018 - [info]
192.168.1.101(192.168.1.101:3306) (current master)
 +--192.168.1.102(192.168.1.102:3306)

Thu Aug  2 11:13:57 2018 - [info] Checking master_ip_failover_script status:
Thu Aug  2 11:13:57 2018 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306  --orig_master_ssh_port=3306
Thu Aug  2 11:13:57 2018 - [info]  OK.
Thu Aug  2 11:13:57 2018 - [warning] shutdown_script is not defined.
Thu Aug  2 11:13:57 2018 - [info] Set master ping interval 1 seconds.
Thu Aug  2 11:13:57 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Aug  2 11:13:57 2018 - [info] Starting ping health check on 192.168.1.101(192.168.1.101:3306)..
Thu Aug  2 11:13:57 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Thu Aug  2 11:29:51 2018 - [info] Got terminate signal. Exit.
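The effect of app1.failover.complete can be modeled as a time window. The sketch below is a simplified Python model, not MHA's own code. It assumes that MHA compares the file's modification time with last_failover_minute (480 minutes, i.e. 8 hours, by default) and skips the check entirely when --ignore_last_failover is given. `failover_blocked` and `check_workdir` are hypothetical helpers.

```python
import os
import time

LAST_FAILOVER_MINUTE = 480  # assumed default of MHA's --last_failover_minute (8 hours)


def failover_blocked(complete_mtime, now, ignore_last_failover=False,
                     last_failover_minute=LAST_FAILOVER_MINUTE):
    """Model: refuse a new failover if the previous one finished too recently.

    complete_mtime is the mtime of <app>.failover.complete (epoch seconds),
    or None when the file does not exist.
    """
    if ignore_last_failover or complete_mtime is None:
        return False
    return now - complete_mtime < last_failover_minute * 60


def check_workdir(workdir, app="app1", ignore_last_failover=False):
    """Apply the model to a manager working directory (hypothetical helper)."""
    path = os.path.join(workdir, app + ".failover.complete")
    mtime = os.path.getmtime(path) if os.path.exists(path) else None
    return failover_blocked(mtime, time.time(), ignore_last_failover)
```

Under this model, calling check_workdir("/var/log/masterha/app1") on zlm2 shortly after Test 2 would return True. Deleting the file or passing ignore_last_failover=True would make it return False.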

 

Test 3: Automated master failover
 
Repair replication on zlm3 (the old master) by making it a slave of the new master, zlm2.
[root@zlm3 10:14:24 ~]
#mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.21-log MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

(zlm@192.168.1.102 3306)[(none)]>change master to
    -> master_host='192.168.1.101',
    -> master_port=3306,
    -> master_user='repl',
    -> master_password='repl4slave',
    -> master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.02 sec)

(zlm@192.168.1.102 3306)[(none)]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.101
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000051
          Read_Master_Log_Pos: 190
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 355
        Relay_Master_Log_File: mysql-bin.000051
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 190
              Relay_Log_Space: 548
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1013306
                  Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
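Slave_IO_Running and Slave_SQL_Running are both Yes and Seconds_Behind_Master is 0, so zlm3 is again replicating from the new master. If this check has to be repeated, for example before every MHA Manager restart, the vertical (\G) output can be parsed. Below is a minimal Python sketch that assumes the output has already been captured as text. The helper names and the max_lag threshold are illustrative, not part of MySQL or MHA.

```python
# Parse one row of mysql's vertical output (statement terminated with \G)
# into a {column: value} dict; values stay strings exactly as printed.
def parse_vertical(text):
    row = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip().startswith("*"):
            continue  # skip the "*** 1. row ***" banner and lines without a field
        row[key.strip()] = value.strip()
    return row


def replication_ok(row, max_lag=0):
    """True if both replication threads run and the lag is at most max_lag seconds."""
    lag = row.get("Seconds_Behind_Master", "NULL")  # NULL means lag is unknown
    return (row.get("Slave_IO_Running") == "Yes"
            and row.get("Slave_SQL_Running") == "Yes"
            and lag.isdigit()
            and int(lag) <= max_lag)


sample = """*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.101
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
            Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259
"""
row = parse_vertical(sample)
print(replication_ok(row))  # True
```

Splitting each line on its first colon keeps values that contain colons, such as Executed_Gtid_Set, intact.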

 

Start MHA Manager on zlm3 in the background.

[root@zlm3 10:48:00 /var/log/masterha/app1]
#nohup masterha_manager --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf &
[1] 9265
nohup: ignoring input and appending output to ‘nohup.out’

[root@zlm3 10:48:12 /var/log/masterha/app1]
#ls -l
total 24
-rw-r--r-- 1 root root 16370 Aug  3 10:48 app1.log
-rw-r--r-- 1 root root    35 Aug  3 10:48 app1.master_status.health //This file exists only while MHA Manager is running. It records the manager's latest health-check result for the master.
-rw------- 1 root root   371 Aug  3 10:48 nohup.out

[root@zlm3 10:48:14 /var/log/masterha/app1]
#cat nohup.out
Fri Aug  3 10:48:12 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug  3 10:48:12 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug  3 10:48:12 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
ssh_exchange_identification: Connection closed by remote host

[root@zlm3 10:48:26 /var/log/masterha/app1]
#cat app1.master_status.health
9265    0:PING_OK    master:192.168.1.101
[root@zlm3 10:48:31 /var/log/masterha/app1]
#ps aux|grep manager
root      9265  0.6  2.1 299172 21516 pts/1    S    10:48   0:00 perl /usr/bin/masterha_manager --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf
root      9332  0.0  0.0 112640   960 pts/1    R+   10:48   0:00 grep --color=auto manager
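app1.master_status.health holds a single line with the manager's PID, a status code and name, and the monitored master. Below is a minimal Python sketch for reading it, assuming the whitespace-separated layout shown above. The helper names are hypothetical. MHA also ships the masterha_check_status command, which is the supported way to query a running manager.

```python
import os


def parse_health_line(line):
    """Split e.g. '9265    0:PING_OK    master:192.168.1.101' into its fields."""
    pid, status, master = line.split()
    code, _, state = status.partition(":")
    _, _, host = master.partition(":")
    return {"pid": int(pid), "code": int(code), "state": state, "master": host}


def read_health_file(path="/var/log/masterha/app1/app1.master_status.health"):
    """Return the parsed health line, or None if the file is absent (manager not running)."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return parse_health_line(f.read().strip())


print(parse_health_line("9265    0:PING_OK    master:192.168.1.101"))
# {'pid': 9265, 'code': 0, 'state': 'PING_OK', 'master': '192.168.1.101'}
```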

 

Kill mysqld on zlm2 to simulate a master crash.

[root@zlm2 10:54:31 /var/log/masterha/app1]
#pkill mysqld

[root@zlm2 10:55:28 /var/log/masterha/app1]
#ps aux|grep mysqld
root      6067  0.0  0.0 112640   960 pts/1    R+   10:55   0:00 grep --color=auto mysqld

 

Watch app1.log on zlm3.

[root@zlm3 10:54:59 /var/log/masterha/app1]
#echo ''> app1.log

[root@zlm3 10:55:17 /var/log/masterha/app1]
#tail -f app1.log

Fri Aug  3 10:55:29 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Fri Aug  3 10:55:29 2018 - [info] Executing SSH check script: exit 0
Fri Aug  3 10:55:29 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
Fri Aug  3 10:55:30 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.101' (111))
Fri Aug  3 10:55:30 2018 - [warning] Connection failed 2 time(s)..
Fri Aug  3 10:55:31 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.101' (111))
Fri Aug  3 10:55:31 2018 - [warning] Connection failed 3 time(s)..
Fri Aug  3 10:55:32 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.101' (111))
Fri Aug  3 10:55:32 2018 - [warning] Connection failed 4 time(s)..
Fri Aug  3 10:55:32 2018 - [warning] Master is not reachable from health checker!
Fri Aug  3 10:55:32 2018 - [warning] Master 192.168.1.101(192.168.1.101:3306) is not reachable!
Fri Aug  3 10:55:32 2018 - [warning] SSH is NOT reachable.
Fri Aug  3 10:55:32 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/masterha_default.conf and /etc/masterha/app1.conf again, and trying to connect to all servers to check server status..
Fri Aug  3 10:55:32 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Fri Aug  3 10:55:32 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Fri Aug  3 10:55:32 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Fri Aug  3 10:55:33 2018 - [info] GTID failover mode = 1
Fri Aug  3 10:55:33 2018 - [info] Dead Servers:
Fri Aug  3 10:55:33 2018 - [info]   192.168.1.101(192.168.1.101:3306)
Fri Aug  3 10:55:33 2018 - [info] Alive Servers:
Fri Aug  3 10:55:33 2018 - [info]   192.168.1.102(192.168.1.102:3306)
Fri Aug  3 10:55:33 2018 - [info] Alive Slaves:
Fri Aug  3 10:55:33 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 10:55:33 2018 - [info]     GTID ON
Fri Aug  3 10:55:33 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 10:55:33 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  3 10:55:33 2018 - [info] Checking slave configurations..
Fri Aug  3 10:55:33 2018 - [info]  read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Fri Aug  3 10:55:33 2018 - [info] Checking replication filtering settings..
Fri Aug  3 10:55:33 2018 - [info]  Replication filtering check ok.
Fri Aug  3 10:55:33 2018 - [info] Master is down!
Fri Aug  3 10:55:33 2018 - [info] Terminating monitoring script.
Fri Aug  3 10:55:33 2018 - [info] Got exit code 20 (Master dead).
Fri Aug  3 10:55:33 2018 - [info] MHA::MasterFailover version 0.56.
Fri Aug  3 10:55:33 2018 - [info] Starting master failover.
Fri Aug  3 10:55:33 2018 - [info]
Fri Aug  3 10:55:33 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Aug  3 10:55:33 2018 - [info]
Fri Aug  3 10:55:34 2018 - [info] GTID failover mode = 1
Fri Aug  3 10:55:34 2018 - [info] Dead Servers:
Fri Aug  3 10:55:34 2018 - [info]   192.168.1.101(192.168.1.101:3306)
Fri Aug  3 10:55:34 2018 - [info] Checking master reachability via MySQL(double check)...
Fri Aug  3 10:55:34 2018 - [info]  ok.
Fri Aug  3 10:55:34 2018 - [info] Alive Servers:
Fri Aug  3 10:55:34 2018 - [info]   192.168.1.102(192.168.1.102:3306)
Fri Aug  3 10:55:34 2018 - [info] Alive Slaves:
Fri Aug  3 10:55:34 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 10:55:34 2018 - [info]     GTID ON
Fri Aug  3 10:55:34 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 10:55:34 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  3 10:55:34 2018 - [info] Starting GTID based failover.
Fri Aug  3 10:55:34 2018 - [info]
Fri Aug  3 10:55:34 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug  3 10:55:34 2018 - [info]
Fri Aug  3 10:55:34 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug  3 10:55:34 2018 - [info]
Fri Aug  3 10:55:34 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug  3 10:55:34 2018 - [info] Executing master IP deactivation script:
Fri Aug  3 10:55:34 2018 - [info]   /etc/masterha/master_ip_failover --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --command=stop  --orig_master_ssh_port=3306
ssh: connect to host 192.168.1.101 port 3306: Connection refused
Fri Aug  3 10:55:37 2018 - [info]  done.
Fri Aug  3 10:55:37 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug  3 10:55:37 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] * Phase 3: Master Recovery Phase..
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000051:190
Fri Aug  3 10:55:37 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug  3 10:55:37 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 10:55:37 2018 - [info]     GTID ON
Fri Aug  3 10:55:37 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 10:55:37 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  3 10:55:37 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000051:190
Fri Aug  3 10:55:37 2018 - [info] Oldest slaves:
Fri Aug  3 10:55:37 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 10:55:37 2018 - [info]     GTID ON
Fri Aug  3 10:55:37 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 10:55:37 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] Searching new master from slaves..
Fri Aug  3 10:55:37 2018 - [info]  Candidate masters from the configuration file:
Fri Aug  3 10:55:37 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri Aug  3 10:55:37 2018 - [info]     GTID ON
Fri Aug  3 10:55:37 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
Fri Aug  3 10:55:37 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  3 10:55:37 2018 - [info]  Non-candidate masters:
Fri Aug  3 10:55:37 2018 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Fri Aug  3 10:55:37 2018 - [info] New master is 192.168.1.102(192.168.1.102:3306)
Fri Aug  3 10:55:37 2018 - [info] Starting master failover..
Fri Aug  3 10:55:37 2018 - [info]
From:
192.168.1.101(192.168.1.101:3306) (current master)
 +--192.168.1.102(192.168.1.102:3306)

To:
192.168.1.102(192.168.1.102:3306) (new master)
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info]  Waiting all logs to be applied..
Fri Aug  3 10:55:37 2018 - [info]   done.
Fri Aug  3 10:55:37 2018 - [info] Getting new master's binlog name and position..
Fri Aug  3 10:55:37 2018 - [info]  mysql-bin.000004:190
Fri Aug  3 10:55:37 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.102', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug  3 10:55:37 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000004, 190, 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259
Fri Aug  3 10:55:37 2018 - [info] Executing master IP activate script:
Fri Aug  3 10:55:37 2018 - [info]   /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --new_master_host=192.168.1.102 --new_master_ip=192.168.1.102 --new_master_port=3306 --new_master_user='zlm' --new_master_password='zlmzlm'  --orig_master_ssh_port=3306  --new_master_ssh_port=3306
Unknown option: new_master_ssh_port
Set read_only=0 on the new master.
ssh_exchange_identification: Connection closed by remote host
Fri Aug  3 10:55:37 2018 - [info]  OK.
Fri Aug  3 10:55:37 2018 - [info] ** Finished master recovery successfully.
Fri Aug  3 10:55:37 2018 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] All new slave servers recovered successfully.
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] * Phase 5: New master cleanup phase..
Fri Aug  3 10:55:37 2018 - [info]
Fri Aug  3 10:55:37 2018 - [info] Resetting slave info on the new master..
Fri Aug  3 10:55:37 2018 - [info]  192.168.1.102: Resetting slave info succeeded.
Fri Aug  3 10:55:37 2018 - [info] Master failover to 192.168.1.102(192.168.1.102:3306) completed successfully.
Fri Aug  3 10:55:37 2018 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.1.101(192.168.1.101:3306) to 192.168.1.102(192.168.1.102:3306) succeeded

Master 192.168.1.101(192.168.1.101:3306) is down!

Check MHA Manager logs at zlm3:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.1.101(192.168.1.101:3306)
Selected 192.168.1.102(192.168.1.102:3306) as a new master.
192.168.1.102(192.168.1.102:3306): OK: Applying all logs succeeded.
192.168.1.102(192.168.1.102:3306): OK: Activated master IP address.
192.168.1.102(192.168.1.102:3306): Resetting slave info succeeded.
Master failover to 192.168.1.102(192.168.1.102:3306) completed successfully.

//The failover report above lists every step of the automated failover and its result. All steps completed successfully.
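The timestamps in app1.log show how long the automated failover took. The first ping error is logged at 10:55:29 and the failover completes at 10:55:37, about 8 seconds later (the log has one-second resolution). Below is a minimal Python sketch that computes this from the log lines. It assumes that every timestamped MHA log line starts with a ctime-style date followed by " - [". The function names are illustrative.

```python
from datetime import datetime

# MHA log lines look like "Fri Aug  3 10:55:29 2018 - [info] ..."
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def line_time(line):
    """Return the timestamp of an MHA log line, or None for lines without one."""
    stamp, sep, _ = line.partition(" - [")
    if not sep:
        return None
    try:
        return datetime.strptime(stamp.strip(), TS_FORMAT)
    except ValueError:
        return None  # " - [" appeared, but not after a timestamp


def failover_seconds(lines):
    """Seconds from the first [warning] line to the 'completed successfully' line."""
    start = end = None
    for line in lines:
        ts = line_time(line)
        if ts is None:
            continue
        if start is None and "[warning]" in line:
            start = ts
        if "completed successfully" in line:
            end = ts
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


log = [
    "Fri Aug  3 10:55:29 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)",
    "Fri Aug  3 10:55:33 2018 - [info] Got exit code 20 (Master dead).",
    "Fri Aug  3 10:55:37 2018 - [info] Master failover to 192.168.1.102(192.168.1.102:3306) completed successfully.",
]
print(failover_seconds(log))  # 8.0
```

This works here only because app1.log was emptied before the test. Otherwise an earlier warning, such as the secondary_check_script warning the manager logs at startup, would be taken as the start.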

 
