Preface
In my last test of pt-heartbeat,both of master and slave were out of disk.And the mysql client was hang.In order to resolve the issue,I`ve tryed to fix the replicaiton environment without using mysqldump to reconfigure the slave.Let`s see the details.
Procedure
I dropped test tables in database “sysbench” to release the disk space on master.
1 [root@zlm2 08:56:51 /data/mysql/mysql3306/logs] 2 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables=1 --table-size=10000000 --mysql-storage-engine=innodb cleanup 3 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2) 4 5 Dropping table `sbtest1`... 6 7 (zlm@192.168.1.101 3306)[(none)]>use sysbench; 8 Reading table information for completion of table and column names 9 You can turn off this feature to get a quicker startup with -A 10 11 Database changed 12 (zlm@192.168.1.101 3306)[sysbench]>show tables; 13 +--------------------+ 14 | Tables_in_sysbench | 15 +--------------------+ 16 | hb | 17 | sbtest2 | 18 | sbtest3 | 19 | sbtest4 | 20 | sbtest5 | 21 +--------------------+ 22 5 rows in set (0.00 sec) 23 24 //Only sbtest1 was deleted.It`s not enough. 25 26 [root@zlm2 08:59:04 ~/sysbench-1.0/src/lua] 27 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables=5 --table-size=10000000 --mysql-storage-engine=innodb cleanup 28 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2) 29 30 Dropping table `sbtest1`... 31 Dropping table `sbtest2`... 32 Dropping table `sbtest3`... 33 Dropping table `sbtest4`... 34 Dropping table `sbtest5`... 35 36 [root@zlm2 08:59:09 ~/sysbench-1.0/src/lua] 37 #df -h 38 Filesystem Size Used Avail Use% Mounted on 39 /dev/mapper/centos-root 8.4G 6.9G 1.5G 83% / //I`d got 27% free space. 40 devtmpfs 488M 0 488M 0% /dev 41 tmpfs 497M 0 497M 0% /dev/shm 42 tmpfs 497M 6.6M 491M 2% /run 43 tmpfs 497M 0 497M 0% /sys/fs/cgroup 44 /dev/sda1 497M 118M 379M 24% /boot 45 none 87G 80G 6.6G 93% /vagrant 46 47 (zlm@192.168.1.101 3306)[(none)]>drop database sysbench; 48 Query OK, 1 row affected (0.04 sec) 49 50 //Further more,I dropped the "sysbench".
The slave hung still and disk space was full.
1 [root@zlm3 08:55:17 ~] 2 #df -h 3 Filesystem Size Used Avail Use% Mounted on 4 /dev/mapper/centos-root 8.4G 8.4G 20K 100% / 5 devtmpfs 488M 0 488M 0% /dev 6 tmpfs 497M 0 497M 0% /dev/shm 7 tmpfs 497M 6.5M 491M 2% /run 8 tmpfs 497M 0 497M 0% /sys/fs/cgroup 9 /dev/sda1 497M 118M 379M 24% /boot 10 none 87G 80G 6.6G 93% /vagrant 11 12 (zlm@192.168.1.102 3306)[(none)]>show slave statusG 13 ^C^C -- query aborted 14 15 16 ^Z 17 [6]+ Stopped mysql 18 19 [root@zlm3 08:55:58 ~] 20 #pkill mysqld 21 22 [root@zlm3 08:56:04 ~] 23 #./mysqld.sh 24 25 [root@zlm3 08:56:07 ~] 26 #mysql 27 ERROR 2003 (HY000): Can`t connect to MySQL server on `192.168.1.102` (111) 28 29 [root@zlm3 09:00:35 ~] 30 #cd /data/mysql/mysql3306/data 31 32 [root@zlm3 09:00:46 /data/mysql/mysql3306/data] 33 #cat error.log |tail -n 30 34 2018-07-19T08:57:02.581937+01:00 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 17039436409 35 2018-07-19T08:57:02.581958+01:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 17039436418 36 2018-07-19T08:57:02.581963+01:00 0 [Note] InnoDB: Database was not shutdown normally! 37 2018-07-19T08:57:02.581965+01:00 0 [Note] InnoDB: Starting crash recovery. 38 2018-07-19T08:57:02.696292+01:00 0 [Note] InnoDB: Transaction 31700139 was in the XA prepared state. 39 2018-07-19T08:57:02.700688+01:00 0 [Note] InnoDB: Transaction 31700139 was in the XA prepared state. 40 2018-07-19T08:57:02.700814+01:00 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 41 2018-07-19T08:57:02.700821+01:00 0 [Note] InnoDB: Trx id counter is 31700480 42 2018-07-19T08:57:02.701719+01:00 0 [Note] InnoDB: Last MySQL binlog file position 0 99139927, file name mysql-bin.000016 43 2018-07-19T08:57:02.805965+01:00 0 [Note] InnoDB: Ignoring tablespace `zlm`.`sbtest2` because the DISCARD flag is set . 44 2018-07-19T08:57:02.806462+01:00 0 [Note] InnoDB: Creating shared tablespace for temporary tables 45 2018-07-19T08:57:02.807316+01:00 0 [Note] InnoDB: Setting file `./ibtmp1` size to 12 MB. Physically writing the file full; Please wait ... 46 2018-07-19T08:57:02.807568+01:00 0 [Note] InnoDB: Starting in background the rollback of uncommitted transactions 47 2018-07-19T08:57:02.807594+01:00 0 [Note] InnoDB: Rollback of non-prepared transactions completed 48 2018-07-19T08:57:02.871396+01:00 0 [Warning] InnoDB: Retry attempts for writing partial data failed. 49 2018-07-19T08:57:02.871423+01:00 0 [ERROR] InnoDB: Write to file ./ibtmp1failed at offset 11534336, 1048576 bytes should have been written, only 0 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded. 50 2018-07-19T08:57:02.871441+01:00 0 [ERROR] InnoDB: Error number 28 means `No space left on device` 51 2018-07-19T08:57:02.871446+01:00 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html 52 2018-07-19T08:57:02.871451+01:00 0 [ERROR] InnoDB: Could not set the file size of `./ibtmp1`. Probably out of disk space 53 2018-07-19T08:57:02.871456+01:00 0 [ERROR] InnoDB: Unable to create the shared innodb_temporary 54 2018-07-19T08:57:02.871459+01:00 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error 55 2018-07-19T08:57:03.273011+01:00 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 56 2018-07-19T08:57:03.273029+01:00 0 [ERROR] Plugin `InnoDB` init function returned error. 57 2018-07-19T08:57:03.273033+01:00 0 [ERROR] Plugin `InnoDB` registration as a STORAGE ENGINE failed. 58 2018-07-19T08:57:03.273037+01:00 0 [ERROR] Failed to initialize builtin plugins. 59 2018-07-19T08:57:03.273040+01:00 0 [ERROR] Aborting 60 61 2018-07-19T08:57:03.273046+01:00 0 [Note] Binlog end 62 2018-07-19T08:57:03.273389+01:00 0 [Note] mysqld: Shutdown complete 63 64 //The mysqld process could not run again because of no free disk space.
I decided to drop all the binlogs on slave to release the disk space.
1 [root@zlm3 09:01:34 /data/mysql/mysql3306] 2 #cd logs 3 4 [root@zlm3 09:01:35 /data/mysql/mysql3306/logs] 5 #ls -l 6 total 2691436 7 -rw-r----- 1 mysql mysql 4333745 Jul 13 11:07 mysql-bin.000001 8 -rw-r----- 1 mysql mysql 653 Jul 16 09:28 mysql-bin.000002 9 -rw-r----- 1 mysql mysql 268435663 Jul 16 10:03 mysql-bin.000003 10 -rw-r----- 1 mysql mysql 268435798 Jul 16 10:22 mysql-bin.000004 11 -rw-r----- 1 mysql mysql 268435798 Jul 16 10:32 mysql-bin.000005 12 -rw-r----- 1 mysql mysql 268435798 Jul 16 10:48 mysql-bin.000006 13 -rw-r----- 1 mysql mysql 268435798 Jul 16 11:29 mysql-bin.000007 14 -rw-r----- 1 mysql mysql 35214498 Jul 16 11:59 mysql-bin.000008 15 -rw-r----- 1 mysql mysql 249 Jul 16 11:59 mysql-bin.000009 16 -rw-r----- 1 mysql mysql 199908536 Jul 18 11:27 mysql-bin.000010 17 -rw-r----- 1 mysql mysql 249 Jul 18 11:27 mysql-bin.000011 18 -rw-r----- 1 mysql mysql 268474318 Jul 19 04:20 mysql-bin.000012 19 -rw-r----- 1 mysql mysql 268817026 Jul 19 04:24 mysql-bin.000013 20 -rw-r----- 1 mysql mysql 268802986 Jul 19 04:28 mysql-bin.000014 21 -rw-r----- 1 mysql mysql 268826152 Jul 19 04:32 mysql-bin.000015 22 -rw-r----- 1 mysql mysql 99422208 Jul 19 08:56 mysql-bin.000016 23 -rw-r----- 1 mysql mysql 704 Jul 19 04:32 mysql-bin.index 24 25 [root@zlm3 09:01:36 /data/mysql/mysql3306/logs] 26 #rm -f * 27 28 [root@zlm3 09:01:38 /data/mysql/mysql3306/logs] 29 #ls -l 30 total 0 31 32 [root@zlm3 09:01:48 ~] 33 #df -h 34 Filesystem Size Used Avail Use% Mounted on 35 /dev/mapper/centos-root 8.4G 4.5G 4.0G 53% / //The free disk space had been reduced to 47%. 36 devtmpfs 488M 0 488M 0% /dev 37 tmpfs 497M 0 497M 0% /dev/shm 38 tmpfs 497M 6.5M 491M 2% /run 39 tmpfs 497M 0 497M 0% /sys/fs/cgroup 40 /dev/sda1 497M 118M 379M 24% /boot 41 none 87G 80G 6.6G 93% /vagrant
Ran the mysqld again and dropped the database “sysbench” on slave.
1 [root@zlm3 09:01:42 /data/mysql/mysql3306/logs] 2 #sh /root/mysqld.sh 3 4 [root@zlm3 09:01:51 /data/mysql/mysql3306/logs] 5 #ps aux|grep mysqld 6 mysql 4346 7.0 17.8 1109984 181576 pts/0 Sl 09:01 0:00 mysqld --defaults-file=/data/mysql/mysql3306/my.cnf 7 root 4380 0.0 0.0 112640 960 pts/0 R+ 09:01 0:00 grep --color=auto mysqld 8 9 [root@zlm3 09:01:55 /data/mysql/mysql3306/logs] 10 #mysql 11 Welcome to the MySQL monitor. Commands end with ; or g. 12 Your MySQL connection id is 2 13 Server version: 5.7.21-log MySQL Community Server (GPL) 14 15 Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. 16 17 Oracle is a registered trademark of Oracle Corporation and/or its 18 affiliates. Other names may be trademarks of their respective 19 owners. 20 21 Type `help;` or `h` for help. Type `c` to clear the current input statement. 22 23 (zlm@192.168.1.102 3306)[(none)]>drop database sysbench; 24 Query OK, 6 rows affected (0.11 sec)
Started the replication threads of slave.
1 (zlm@192.168.1.102 3306)[(none)]>start slave; 2 Query OK, 0 rows affected (0.00 sec) 3 4 (zlm@192.168.1.102 3306)[(none)]>show slave statusG 5 *************************** 1. row *************************** 6 Slave_IO_State: Waiting for master to send event 7 Master_Host: 192.168.1.101 8 Master_User: repl 9 Master_Port: 3306 10 Connect_Retry: 60 11 Master_Log_File: mysql-bin.000019 12 Read_Master_Log_Pos: 155765801 13 Relay_Log_File: relay-bin.000068 14 Relay_Log_Pos: 398 15 Relay_Master_Log_File: mysql-bin.000019 16 Slave_IO_Running: Yes 17 Slave_SQL_Running: No 18 Replicate_Do_DB: 19 Replicate_Ignore_DB: 20 Replicate_Do_Table: 21 Replicate_Ignore_Table: 22 Replicate_Wild_Do_Table: 23 Replicate_Wild_Ignore_Table: 24 Last_Errno: 1146 25 Last_Error: Error executing row event: `Table `sysbench.sbtest1` doesn`t exist` 26 Skip_Counter: 0 27 Exec_Master_Log_Pos: 99148246 28 Relay_Log_Space: 56618146 29 Until_Condition: None 30 Until_Log_File: 31 Until_Log_Pos: 0 32 Master_SSL_Allowed: No 33 Master_SSL_CA_File: 34 Master_SSL_CA_Path: 35 Master_SSL_Cert: 36 Master_SSL_Cipher: 37 Master_SSL_Key: 38 Seconds_Behind_Master: NULL 39 Master_SSL_Verify_Server_Cert: No 40 Last_IO_Errno: 0 41 Last_IO_Error: 42 Last_SQL_Errno: 1146 43 Last_SQL_Error: Error executing row event: `Table `sysbench.sbtest1` doesn`t exist` //Since the database had been droppted.This error was notable. 44 Replicate_Ignore_Server_Ids: 45 Master_Server_Id: 1013306 46 Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e 47 Master_Info_File: mysql.slave_master_info 48 SQL_Delay: 0 49 SQL_Remaining_Delay: NULL 50 Slave_SQL_Running_State: 51 Master_Retry_Count: 86400 52 Master_Bind: 53 Last_IO_Error_Timestamp: 54 Last_SQL_Error_Timestamp: 180719 09:02:07 55 Master_SSL_Crl: 56 Master_SSL_Crlpath: 57 Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:3714549-3730021 //It was stuck on transaction 3714549(which contained error). 58 Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3714548, 59 5c77c31b-4add-11e8-81e2-080027de0e0e:1-168 60 Auto_Position: 1 61 Replicate_Rewrite_DB: 62 Channel_Name: 63 Master_TLS_Version: 64 1 row in set (0.00 sec) 65 66 [root@zlm3 09:10:42 ~] 67 #perror 1146 68 MySQL error code 1146 (ER_NO_SUCH_TABLE): Table `%-.192s.%-.192s` doesn`t exist 69 70 //Error 1146 indicated the absence of table "sbtest1" in "sysbench" database. 71 //Obviously,the slave was replaying the operations relevant to this table on master.The table even the database had been dropped. 72 //How could I do next step?Do I have to generate a new mysqldump file and reconfigure the slave again? 73 //There`s One thing I`m rather sure that there were no other transactions generated in the whole course except the operations on "sysbench" database. 74 //Since I`d drop "sysbentch" database on both master and slave.Maybe I can fix the issue easily.
Checked the Executed_Gtid_Set on master.
1 (zlm@192.168.1.101 3306)[(none)]>show master status; 2 +------------------+-----------+--------------+------------------+------------------------------------------------+ 3 | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | 4 +------------------+-----------+--------------+------------------+------------------------------------------------+ 5 | mysql-bin.000019 | 155765801 | | | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021 | 6 +------------------+-----------+--------------+------------------+------------------------------------------------+ 7 1 row in set (0.00 sec) 8 9 //The executed gtid was upto "3730021".
Tryed to fix the replica of master.
1 (zlm@192.168.1.102 3306)[(none)]>show slave statusG 2 *************************** 1. row *************************** 3 Slave_IO_State: Waiting for master to send event 4 Master_Host: 192.168.1.101 5 Master_User: repl 6 Master_Port: 3306 7 Connect_Retry: 60 8 Master_Log_File: mysql-bin.000019 9 Read_Master_Log_Pos: 155765801 10 Relay_Log_File: relay-bin.000068 11 Relay_Log_Pos: 398 12 Relay_Master_Log_File: mysql-bin.000019 13 Slave_IO_Running: Yes 14 Slave_SQL_Running: No 15 Replicate_Do_DB: 16 Replicate_Ignore_DB: 17 Replicate_Do_Table: 18 Replicate_Ignore_Table: 19 Replicate_Wild_Do_Table: 20 Replicate_Wild_Ignore_Table: 21 Last_Errno: 1146 22 Last_Error: Error executing row event: `Table `sysbench.sbtest1` doesn`t exist` 23 Skip_Counter: 0 24 Exec_Master_Log_Pos: 99148246 25 Relay_Log_Space: 56618627 26 Until_Condition: None 27 Until_Log_File: 28 Until_Log_Pos: 0 29 Master_SSL_Allowed: No 30 Master_SSL_CA_File: 31 Master_SSL_CA_Path: 32 Master_SSL_Cert: 33 Master_SSL_Cipher: 34 Master_SSL_Key: 35 Seconds_Behind_Master: NULL 36 Master_SSL_Verify_Server_Cert: No 37 Last_IO_Errno: 0 38 Last_IO_Error: 39 Last_SQL_Errno: 1146 40 Last_SQL_Error: Error executing row event: `Table `sysbench.sbtest1` doesn`t exist` 41 Replicate_Ignore_Server_Ids: 42 Master_Server_Id: 1013306 43 Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e 44 Master_Info_File: mysql.slave_master_info 45 SQL_Delay: 0 46 SQL_Remaining_Delay: NULL 47 Slave_SQL_Running_State: 48 Master_Retry_Count: 86400 49 Master_Bind: 50 Last_IO_Error_Timestamp: 51 Last_SQL_Error_Timestamp: 180719 09:02:57 52 Master_SSL_Crl: 53 Master_SSL_Crlpath: 54 Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:3714549-3730021 55 Executed_Gtid_Set: 56 Auto_Position: 1 57 Replicate_Rewrite_DB: 58 Channel_Name: 59 Master_TLS_Version: 60 1 row in set (0.00 sec) 61 62 (zlm@192.168.1.102 3306)[(none)]>reset master; 63 Query OK, 0 rows affected (0.02 sec) 64 65 (zlm@192.168.1.102 3306)[(none)]>set @@global.gtid_purged=`1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021`; 66 Query OK, 0 rows affected (0.00 sec) 67 68 //On account of surely knowing there were no other transactions at all.I set the "gtid_purged" variable to the value of "gtid_executed" on master. 69 //It means I guised that all the transactions generated on master had been replayed on slave already.The slave could retrieve new GTID at the moment. 70 71 (zlm@192.168.1.102 3306)[(none)]>start slave sql_thread; 72 Query OK, 0 rows affected (0.02 sec) 73 74 (zlm@192.168.1.102 3306)[(none)]>show slave statusG 75 *************************** 1. row *************************** 76 Slave_IO_State: Waiting for master to send event 77 Master_Host: 192.168.1.101 78 Master_User: repl 79 Master_Port: 3306 80 Connect_Retry: 60 81 Master_Log_File: mysql-bin.000019 82 Read_Master_Log_Pos: 155765801 83 Relay_Log_File: relay-bin.000069 84 Relay_Log_Pos: 438 85 Relay_Master_Log_File: mysql-bin.000019 86 Slave_IO_Running: Yes 87 Slave_SQL_Running: Yes //The sql_thread became "Yes". 88 Replicate_Do_DB: 89 Replicate_Ignore_DB: 90 Replicate_Do_Table: 91 Replicate_Ignore_Table: 92 Replicate_Wild_Do_Table: 93 Replicate_Wild_Ignore_Table: 94 Last_Errno: 0 95 Last_Error: 96 Skip_Counter: 0 97 Exec_Master_Log_Pos: 155765801 98 Relay_Log_Space: 56618434 99 Until_Condition: None 100 Until_Log_File: 101 Until_Log_Pos: 0 102 Master_SSL_Allowed: No 103 Master_SSL_CA_File: 104 Master_SSL_CA_Path: 105 Master_SSL_Cert: 106 Master_SSL_Cipher: 107 Master_SSL_Key: 108 Seconds_Behind_Master: 0 109 Master_SSL_Verify_Server_Cert: No 110 Last_IO_Errno: 0 111 Last_IO_Error: 112 Last_SQL_Errno: 0 113 Last_SQL_Error: 114 Replicate_Ignore_Server_Ids: 115 Master_Server_Id: 1013306 116 Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e 117 Master_Info_File: mysql.slave_master_info 118 SQL_Delay: 0 119 SQL_Remaining_Delay: NULL 120 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates 121 Master_Retry_Count: 86400 122 Master_Bind: 123 Last_IO_Error_Timestamp: 124 Last_SQL_Error_Timestamp: 125 Master_SSL_Crl: 126 Master_SSL_Crlpath: 127 Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:3714549-3730021 128 Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021 //The slave had skipped those GTID(which contained error 1146) of master and waited for newer GTID.The replica had been fixed up. 129 Auto_Position: 1 130 Replicate_Rewrite_DB: 131 Channel_Name: 132 Master_TLS_Version: 133 1 row in set (0.00 sec)
Summary
- The variable “gtid_purged” cannot be set if “gtid_executed” is not empty.
- Caution,”reset master” can only be used on slave.Keep in mind that don`t do it on master anytime.
- This case can be followed only in test environment `cause you cannot guarantee whether all the transactions are really replayed on slave.