Problems Encountered While Setting Up an MHA Architecture

Posted by 酸蘿蔔別吃 on 2021-03-02

1. Two packages: mha4mysql-manager-0.56-0.el6.noarch.rpm and mha4mysql-node-0.56-0.el6.noarch.rpm

Download: https://code.google.com/archive/p/mysql-master-ha/

 

2. Dependency packages

yum install perl-DBD-MySQL
yum install perl-Config-Tiny
yum install perl-Log-Dispatch
yum install perl-Parallel-ForkManager

Install these on every node; otherwise you may run into errors later.
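A quick way to confirm that the Perl modules behind these packages can actually be loaded on a node is sketched below. The function name `missing_perl_modules` is hypothetical and not part of MHA; the module names are the Perl modules that the four packages above provide.

```shell
# Print every Perl module from the argument list that cannot be loaded.
missing_perl_modules() {
    for mod in "$@"; do
        # "perl -M<module> -e 1" exits non-zero when the module is not found
        if ! perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "$mod"
        fi
    done
}

# Run on each node; no output means all four modules load.
missing_perl_modules DBD::mysql Config::Tiny Log::Dispatch Parallel::ForkManager
```

Any module name that is printed points to a package that still has to be installed on that node.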

 

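3. Tools available on the manager node:

masterha_check_ssh: checks the SSH environment that MHA depends on

masterha_check_repl: checks the replication environment

masterha_manager: the main service program

masterha_check_status: checks the running status of MHA

masterha_stop: stops MHA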
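A typical order is to run both checks first and start the manager only when they pass. The wrapper below is a minimal sketch of that order; the function name `mha_preflight` is hypothetical, and the configuration path in the usage comment is the one used in the rest of this post.

```shell
# Run the SSH check, then the replication check; stop at the first failure.
mha_preflight() {
    conf=$1
    masterha_check_ssh --conf="$conf" || { echo "SSH check failed" >&2; return 1; }
    masterha_check_repl --conf="$conf" || { echo "replication check failed" >&2; return 1; }
    echo "preflight ok"
}

# Usage on the manager node:
# mha_preflight /etc/mha_master/mha.cnf
```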

 

4. Problem encountered when running masterha_check_ssh:

[root@manager ~]# masterha_check_ssh -conf=/etc/mha_master/mha.cnf
Can't locate MHA/SSHCheck.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/bin/masterha_check_ssh line 25.

This looks like an environment variable problem: Perl cannot find the MHA modules in its search path (@INC).

[root@manager ~]# find / -name SSHCheck.pm

/usr/lib/perl5/vendor_perl/MHA/SSHCheck.pm

Add the directory found above to PERL5LIB. (The underlying cause is that the MHA package does not match the OS version.)

export PERL5LIB=$PERL5LIB:/usr/lib/perl5/vendor_perl/
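The export above lasts only for the current shell and appends the path again on every run. A minimal sketch of a helper that adds the directory once is shown below; the function name `add_perl5lib` is hypothetical, and placing it in a login profile script to make the setting permanent is an assumption, not something the original setup did.

```shell
# Append a directory to PERL5LIB unless it is already listed.
add_perl5lib() {
    dir=$1
    case ":${PERL5LIB:-}:" in
        *":$dir:"*) ;;                                  # already present
        *) PERL5LIB="${PERL5LIB:+$PERL5LIB:}$dir" ;;    # append, no leading colon
    esac
    export PERL5LIB
}

add_perl5lib /usr/lib/perl5/vendor_perl
```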

 

5. Problem encountered when running masterha_check_repl (error 1130):

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf
Mon Mar 1 12:27:17 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Mar 1 12:27:17 2021 - [info] Reading application default configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:27:17 2021 - [info] Reading server configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:27:17 2021 - [info] MHA::MasterMonitor version 0.55.
Creating directory /etc/mha_master/app1.. done.
Mon Mar 1 12:27:17 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/ServerManager.pm, ln255] Got MySQL error when connecting 192.168.10.30(192.168.10.30:3306) :1130:Host '192.168.10.10' is not allowed to connect to this MariaDB server, but this is not mysql crash. Check MySQL server settings.
at /usr/lib/perl5/vendor_perl//MHA/ServerManager.pm line 251.
Mon Mar 1 12:27:18 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/ServerManager.pm, ln263] Got fatal error, stopping operations
Mon Mar 1 12:27:18 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm line 300.
Mon Mar 1 12:27:18 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Mar 1 12:27:18 2021 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

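Check that the dependency packages are installed on every node. Error 1130 itself means that the MySQL server has no account that allows connections from the manager host (192.168.10.10), so also check the grants for the MHA user.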
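The number in front of the message in the `[error]` line is the MySQL error code, which narrows down the cause. The helper below is a small sketch that pulls that code out of a log line and prints a hint; the function name `mha_error_hint` and the hint wording are illustrative and are not MHA output.

```shell
# Extract the MySQL error code from a line shaped like
# "... host(ip:port) :CODE:message" and print a short hint.
mha_error_hint() {
    code=$(printf '%s\n' "$1" | sed -n 's/.*) :\([0-9][0-9]*\):.*/\1/p')
    case "$code" in
        1130) echo "1130: host not allowed, no account exists for the connecting host" ;;
        1045) echo "1045: access denied, wrong password or no matching user@host account" ;;
        "")   echo "no MySQL error code found" ;;
        *)    echo "$code: look up this code in the MySQL error reference" ;;
    esac
}

mha_error_hint "Got MySQL error when connecting 192.168.10.30(192.168.10.30:3306) :1130:Host '192.168.10.10' is not allowed to connect to this MariaDB server"
```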

 

6. Problem encountered when running masterha_check_repl (error 1045):

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf
Mon Mar 1 12:29:06 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Mar 1 12:29:06 2021 - [info] Reading application default configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:29:06 2021 - [info] Reading server configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:29:06 2021 - [info] MHA::MasterMonitor version 0.55.
Mon Mar 1 12:29:06 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/ServerManager.pm, ln255] Got MySQL error when connecting 192.168.10.30(192.168.10.30:3306) :1045:Access denied for user 'mhaadmin'@'192.168.10.10' (using password: YES), but this is not mysql crash. Check MySQL server settings.
at /usr/lib/perl5/vendor_perl//MHA/ServerManager.pm line 251.
Mon Mar 1 12:29:07 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/ServerManager.pm, ln263] Got fatal error, stopping operations
Mon Mar 1 12:29:07 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm line 300.
Mon Mar 1 12:29:07 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Mar 1 12:29:07 2021 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

Check that every MySQL instance has a grant for the manager node. If the grant is already there, reload the grant tables (FLUSH PRIVILEGES).
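A minimal sketch of what such a grant might look like follows. The helper only prints the SQL, so it can be reviewed before being piped to the `mysql` client. The function name `mha_grant_sql`, the password placeholder, and the choice of ALL PRIVILEGES are assumptions, since the post does not show the original grant; the user `mhaadmin` and the manager address 192.168.10.10 come from the error message above. The `GRANT ... IDENTIFIED BY` form is valid on the MariaDB 5.5 servers used here, while MySQL 8.0 requires a separate CREATE USER.

```shell
# Print the statements that grant the MHA user access from one host.
mha_grant_sql() {
    user=$1
    host=$2
    pass=$3
    cat <<EOF
GRANT ALL PRIVILEGES ON *.* TO '$user'@'$host' IDENTIFIED BY '$pass';
FLUSH PRIVILEGES;
EOF
}

# Review the output, then run it on every MySQL node, for example:
# mha_grant_sql mhaadmin 192.168.10.10 'your_password' | mysql -uroot -p
mha_grant_sql mhaadmin 192.168.10.10 'your_password'
```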

 

7. Problem encountered when running masterha_check_repl (MHA Node version not found):

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf
Mon Mar 1 12:34:06 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Mar 1 12:34:06 2021 - [info] Reading application default configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:34:06 2021 - [info] Reading server configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:34:06 2021 - [info] MHA::MasterMonitor version 0.55.
Mon Mar 1 12:34:07 2021 - [info] Dead Servers:
Mon Mar 1 12:34:07 2021 - [info] Alive Servers:
Mon Mar 1 12:34:07 2021 - [info] 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 12:34:07 2021 - [info] 192.168.10.30(192.168.10.30:3306)
Mon Mar 1 12:34:07 2021 - [info] 192.168.10.40(192.168.10.40:3306)
Mon Mar 1 12:34:07 2021 - [info] Alive Slaves:
Mon Mar 1 12:34:07 2021 - [info] 192.168.10.30(192.168.10.30:3306) Version=5.5.68-MariaDB (oldest major version between slaves) log-bin:enabled
Mon Mar 1 12:34:07 2021 - [info] Replicating from 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 12:34:07 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Mar 1 12:34:07 2021 - [info] 192.168.10.40(192.168.10.40:3306) Version=5.5.68-MariaDB (oldest major version between slaves) log-bin:enabled
Mon Mar 1 12:34:07 2021 - [info] Replicating from 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 12:34:07 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Mar 1 12:34:07 2021 - [info] Current Alive Master: 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 12:34:07 2021 - [info] Checking slave configurations..
Mon Mar 1 12:34:07 2021 - [info] Checking replication filtering settings..
Mon Mar 1 12:34:07 2021 - [info] binlog_do_db= , binlog_ignore_db=
Mon Mar 1 12:34:07 2021 - [info] Replication filtering check ok.
Mon Mar 1 12:34:07 2021 - [info] Starting SSH connection tests..
Mon Mar 1 12:34:10 2021 - [info] All SSH connection tests passed successfully.
Mon Mar 1 12:34:10 2021 - [info] Checking MHA Node version..
Mon Mar 1 12:34:10 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/ManagerUtil.pm, ln122] Got error when getting node version. Error:
Mon Mar 1 12:34:10 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/ManagerUtil.pm, ln123]
Can't locate MHA/BinlogManager.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/apply_diff_relay_logs line 24.
BEGIN failed--compilation aborted at /usr/bin/apply_diff_relay_logs line 24.
Mon Mar 1 12:34:10 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/ManagerUtil.pm, ln151] node version on 192.168.10.30 not found! Maybe MHA Node package is not installed?
at /usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm line 346.
Mon Mar 1 12:34:10 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. node version on 192.168.10.30 not found! Maybe MHA Node package is not installed?
at /usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm line 346.
...propagated at /usr/lib/perl5/vendor_perl//MHA/ManagerUtil.pm line 152.
Mon Mar 1 12:34:10 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Mar 1 12:34:10 2021 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

Solution: create a symbolic link on every node: ln -s /usr/lib/perl5/vendor_perl/MHA /usr/lib64/perl5/vendor_perl/
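Because the command has to be repeated on every node, a version that is safe to run twice is handy. The sketch below creates the link only when it is missing; the function name `link_mha_modules` is hypothetical, and the paths in the usage comment are the ones from the command above.

```shell
# Link a module directory into dest_dir unless something is already there.
link_mha_modules() {
    src=$1
    dest_dir=$2
    link="$dest_dir/$(basename "$src")"
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "exists: $link"
    else
        ln -s "$src" "$link" && echo "linked: $link -> $src"
    fi
}

# On each node:
# link_mha_modules /usr/lib/perl5/vendor_perl/MHA /usr/lib64/perl5/vendor_perl
```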

 

8. Problem encountered when running masterha_check_repl (access denied on the slave):

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf
Mon Mar 1 15:16:22 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Mar 1 15:16:22 2021 - [info] Reading application default configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 15:16:22 2021 - [info] Reading server configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 15:16:22 2021 - [info] MHA::MasterMonitor version 0.55.
Mon Mar 1 15:16:23 2021 - [info] Dead Servers:
Mon Mar 1 15:16:23 2021 - [info] Alive Servers:
Mon Mar 1 15:16:23 2021 - [info] 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 15:16:23 2021 - [info] 192.168.10.30(192.168.10.30:3306)
Mon Mar 1 15:16:23 2021 - [info] 192.168.10.40(192.168.10.40:3306)
Mon Mar 1 15:16:23 2021 - [info] Alive Slaves:
Mon Mar 1 15:16:23 2021 - [info] 192.168.10.30(192.168.10.30:3306) Version=5.5.68-MariaDB (oldest major version between slaves) log-bin:enabled
Mon Mar 1 15:16:23 2021 - [info] Replicating from 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 15:16:23 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Mar 1 15:16:23 2021 - [info] 192.168.10.40(192.168.10.40:3306) Version=5.5.68-MariaDB (oldest major version between slaves) log-bin:enabled
Mon Mar 1 15:16:23 2021 - [info] Replicating from 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 15:16:23 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Mar 1 15:16:23 2021 - [info] Current Alive Master: 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 15:16:23 2021 - [info] Checking slave configurations..
Mon Mar 1 15:16:23 2021 - [info] Checking replication filtering settings..
Mon Mar 1 15:16:23 2021 - [info] binlog_do_db= , binlog_ignore_db=
Mon Mar 1 15:16:23 2021 - [info] Replication filtering check ok.
Mon Mar 1 15:16:23 2021 - [info] Starting SSH connection tests..
Mon Mar 1 15:16:25 2021 - [info] All SSH connection tests passed successfully.
Mon Mar 1 15:16:25 2021 - [info] Checking MHA Node version..
Mon Mar 1 15:16:26 2021 - [info] Version check ok.
Mon Mar 1 15:16:26 2021 - [info] Checking SSH publickey authentication settings on the current master..
Mon Mar 1 15:16:26 2021 - [info] HealthCheck: SSH to 192.168.10.20 is reachable.
Mon Mar 1 15:16:27 2021 - [info] Master MHA Node version is 0.54.
Mon Mar 1 15:16:27 2021 - [info] Checking recovery script configurations on the current master..
Mon Mar 1 15:16:27 2021 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/mydata/mha_masterapp1/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000002
Mon Mar 1 15:16:27 2021 - [info] Connecting to root@192.168.10.20(192.168.10.20)..
Creating /mydata/mha_masterapp1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000002
Mon Mar 1 15:16:27 2021 - [info] Master setting check done.
Mon Mar 1 15:16:27 2021 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Mar 1 15:16:27 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhaadmin' --slave_host=192.168.10.30 --slave_ip=192.168.10.30 --slave_port=3306 --workdir=/mydata/mha_masterapp1 --target_version=5.5.68-MariaDB --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Mon Mar 1 15:16:27 2021 - [info] Connecting to root@192.168.10.30(192.168.10.30:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay-bin.000005
Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000005
Testing mysql connection and privileges..ERROR 1045 (28000): Access denied for user 'mhaadmin'@'slave1' (using password: YES)
mysql command failed with rc 1:0!
at /usr/bin/apply_diff_relay_logs line 367.
main::check() called at /usr/bin/apply_diff_relay_logs line 486
eval {...} called at /usr/bin/apply_diff_relay_logs line 466
main::main() called at /usr/bin/apply_diff_relay_logs line 112
Mon Mar 1 15:16:28 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln195] Slaves settings check failed!
Mon Mar 1 15:16:28 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln375] Slave configuration failed.
Mon Mar 1 15:16:28 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48.
Mon Mar 1 15:16:28 2021 - [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Mar 1 15:16:28 2021 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

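MHA probably detects the host name automatically here and resolves it (the account is reported as 'mhaadmin'@'slave1'). Adding the host name entries to /etc/hosts fixes it.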
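A sketch of adding such entries without creating duplicates is given below. Only the pairing of `slave1` with 192.168.10.30 can be read from the log above; the function name `add_host_entry` is hypothetical, and the hosts file is a parameter so the helper can be tried on a copy first.

```shell
# Append "ip name" to a hosts file unless the name is already mapped.
add_host_entry() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    # Look for the name in any non-comment line, in field 2 or later
    if awk -v n="$name" '!/^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                         END { exit !found }' "$file" 2>/dev/null; then
        echo "already present: $name"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
        echo "added: $ip $name"
    fi
}

# On every node (as root):
# add_host_entry 192.168.10.30 slave1
```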

 

9. Errors when starting MHA:

Mon Mar 1 16:03:14 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.20' (4))
Mon Mar 1 16:03:14 2021 - [warning] Connection failed 1 time(s)..
Mon Mar 1 16:03:15 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.20' (4))
Mon Mar 1 16:03:15 2021 - [warning] Connection failed 2 time(s)..
Mon Mar 1 16:03:16 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.20' (4))
Mon Mar 1 16:03:16 2021 - [warning] Connection failed 3 time(s)..
Mon Mar 1 16:03:18 2021 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.10.20! at /usr/lib/perl5/vendor_perl//MHA/HealthCheck.pm line 298.
Mon Mar 1 16:03:18 2021 - [warning] Master is not reachable from health checker!
Mon Mar 1 16:03:18 2021 - [warning] Master 192.168.10.20(192.168.10.20:3306) is not reachable!
Mon Mar 1 16:03:18 2021 - [warning] SSH is NOT reachable.
Mon Mar 1 16:03:18 2021 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha_master/mha.cnf again, and trying to connect to all servers to check server status..

The manager cannot connect to the master node here. This is again a host name resolution problem; add the entries to /etc/hosts.

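10. After a master failover, the MHA manager goes down. After adding a new node, MHA has to be restarted. The new slave may not have a grant for the manager yet, so reload the grant tables.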
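Restarting the manager can be sketched as follows. The function name `start_mha` and the log path are hypothetical. `masterha_manager` runs in the foreground, so it is started under `nohup` in the background here, and `masterha_check_status` then reports whether it is running.

```shell
# Start the manager in the background, then report its status.
start_mha() {
    conf=$1
    log=$2
    # Detach from the terminal so the manager survives logout
    nohup masterha_manager --conf="$conf" < /dev/null >> "$log" 2>&1 &
    sleep 1
    masterha_check_status --conf="$conf"
}

# After masterha_check_repl passes again:
# start_mha /etc/mha_master/mha.cnf /var/log/mha_manager.log
```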

 
