週五同事監控報警,有個MySQL從庫複製狀態異常,讓我幫忙排查下,經過排查發現是MySQL5.6並行複製的一個Bug所致,具體處理過程如下:
一、錯誤資訊
登入mysql從庫伺服器,檢查複製狀態
mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: xx.xx.xx.xx Master_User: repuser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.001689 Read_Master_Log_Pos: 145325388 Relay_Log_File: mysql2-relay-bin.004390 Relay_Log_Pos: 147850306 Relay_Master_Log_File: mysql-bin.001673 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1755 Last_Error: Cannot execute the current event group in the parallel mode. Encountered event Format_desc, relay-log name ./mysql2-relay-bin.004392, position 238 which prevents execution of this event group in parallel mode. Reason: possible malformed group of events from an old master. Skip_Counter: 0 Exec_Master_Log_Pos: 147850096 Relay_Log_Space: 4440639819 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1755 Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Format_desc, relay-log name ./mysql2-relay-bin.004392, position 238 which prevents execution of this event group in parallel mode. Reason: possible malformed group of events from an old master. Replicate_Ignore_Server_Ids: Master_Server_Id: 46 Master_UUID: f6d575b8-6a5b-11e5-8313-0050568d23c9 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 180314 01:01:17 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: f6d575b8-6a5b-11e5-8313-0050568d23c9:9393159-70288025 Executed_Gtid_Set: 33a835bb-6a5a-11e5-8308-0050568d685b:1-146, f6d575b8-6a5b-11e5-8313-0050568d23c9:1-69584557 Auto_Position: 1 1 row in set (0.01 sec)
mysql錯誤日誌內容如下
018-03-14 01:01:16 26349 [Note] Event Scheduler: Loaded 0 events 2018-03-14 01:01:16 26349 [Note] /data/mysql/bin/mysqld: ready for connections. Version: '5.6.16-log' socket: '/tmp/mysql.sock' port: 3306 Source distribution 2018-03-14 01:01:17 26349 [ERROR] Slave SQL: Cannot execute the current event group in the parallel mode. Encountered event Format_desc, relay-log name ./mysql2-relay-bin.004392, position 238 which prevents execution of this event group in parallel mode. Reason: possible malformed group of events from an old master. Error_code: 1755 2018-03-14 01:01:17 26349 [Warning] Slave: Cannot execute the current event group in the parallel mode. Encountered event Format_desc, relay-log name ./mysql2-relay-bin.004392, position 238 which prevents execution of this event group in parallel mode. Reason: possible malformed group of events from an old master. Error_code: 1755 2018-03-14 01:01:17 26349 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.001673' position 147850096
二、錯誤原因
從錯誤日誌字面意思理解,出現這個錯誤是因為資料庫開啟了並行複製,一些event group在並行複製下,不支援從而導致複製失敗,主從狀態異常。
檢查資料庫版本:
mysql> select version(); +------------+ | version() | +------------+ | 5.6.16-log | +------------+ 1 row in set (0.07 sec)
檢查資料庫並行複製資訊:
mysql> show variables like '%parall%'; +------------------------+-------+ | Variable_name | Value | +------------------------+-------+ | slave_parallel_workers | 1 | +------------------------+-------+ 1 row in set (0.05 sec)
官方搜尋錯誤資訊,發現此錯誤是一bug,具體bug編號為Bug #71495、Bug #72537
https://bugs.mysql.com/bug.php?id=71495
https://bugs.mysql.com/bug.php?id=72537
三、解決方案
1、既然是bug,通過分析binlog內容,然後跳過的方式跳過事物
(非gtid模式,通過set sql_slave_skip_counter=1;跳過事務)
主庫分析binlog:
mysqlbinlog -vv --base64-output='decode-rows' --start-position=147850096 --stop-position=147850096 mysql-bin.001674 >a.sql
SHOW BINLOG EVENTS in 'mysql-bin.001673' from 147850096 limit 10;
從庫跳過事物(gtid模式)
stop slave;
SET @@SESSION.GTID_NEXT= 'f6d575b8-6a5b-11e5-8313-0050568d23c9:9393159-70288025';
begin;
commit;
SET SESSION GTID_NEXT = AUTOMATIC;
start slave;
2、通過關閉並行複製來恢復(最佳方法)
stop slave;
set global slave_parallel_workers = 0 ;
start slave;
( 更改my.cnf中並行複製引數slave_parallel_workers = 0)
3、如果上述方法都不行,可以使用重新部署從庫的方式來恢復,恢復後關閉並行複製(5.6並行複製存在bug,不推薦使用)
如果使用mysqldump備份,最好指定引數--set-gtid-purged=off,否則需要充值gtid
主庫備份資料庫:
innobackupexp方式:innobackupex --socket=/tmp/mysql.sock /20180316/
從庫恢復資料庫:
innobackupex --redo-only --apply-log /soft/2018-03-16_17-45-30
innobackupex --copy-back /soft/2018-03-16_17-45-30
檢視事物應用binlog資訊:cat xtrabackup_info
binlog_pos = filename 'mysql-bin.001690', position 46997344, GTID of the last change 'f6d575b8-6a5b-11e5-8313-0050568d23c9:1-70316241'
innodb_from_lsn = 0
innodb_to_lsn = 84446557453
設定從庫gtid資訊
set global gtid_purged='f6d575b8-6a5b-11e5-8313-0050568d23c9:1-70316241';
重新部署從庫:
change master to master_host='xx.xx.xx.xx',master_user='usernamer',master_password='password',MASTER_AUTO_POSITION=84446557453;
(上述使用了reset master,否則可以直接指定從MASTER_AUTO_POSITION=1開始)