手動註冊binlog檔案造成主從異常

gaopengtttt發表於2019-05-21

原文網址 : http://blog.itpub.net/7728585/viewspace-2645114/

一、問題來源

有一個朋友@水米田問我，基於POSITION的主從。他做了如下的操作

將備份的一些binlog檔案加入到了目錄中
修改index檔案，加入了這些binlog檔案
flush binary logs

然後整個主從環境大量延遲。

二、朋友的測試

下面是另外一個朋友@徐晨亮的測試：

master上：
(root:db1@xucl:10:30:22)[(none)]> show binary logs;
+---------------------+-----------+| Log_name            | File_size |
+---------------------+-----------+
| mysql-binlog.000031 |      1019 || mysql-binlog.000032 |       424 |
| mysql-binlog.000033 |       244 || mysql-binlog.000034 |      2332 |
| mysql-binlog.000035 |      2134 || mysql-binlog.000036 |    845915 |
| mysql-binlog.000037 |     11735 || mysql-binlog.000038 |       284 |
| mysql-binlog.000039 |       284 || mysql-binlog.000040 |       284 |
| mysql-binlog.000041 |       284 || mysql-binlog.000042 |       234 |
+---------------------+-----------+
12 rows in set (0.00 sec)
(root:db1@xucl:10:30:34)[(none)]> purge binary logs to 'mysql-binlog.000039';
Query OK, 0 rows affected (0.00 sec)
(root:db1@xucl:10:30:49)[(none)]> show binary logs;
+---------------------+-----------+
| Log_name            | File_size |+---------------------+-----------+| mysql-binlog.000039 |       284 |
| mysql-binlog.000040 |       284 || mysql-binlog.000041 |       284 |
| mysql-binlog.000042 |       234 |+---------------------+-----------+4 rows in set (0.00 sec)
執行插入資料
(root:db1@xucl:10:31:23)[xucl]> insert into t values(9,9);
將備份的binlog拷貝回原先的目錄並修改index檔案進行註冊
[root@izbp12nspj47ypto9t6vyez logs]# ll總用量 884-rw-r----- 1 mysql mysql   1019 5月  20 22:03 mysql-binlog.000031-rw-r----- 1 mysql mysql    424 5月  20 22:03 mysql-binlog.000032-rw-r----- 1 mysql mysql    244 5月  20 22:03 mysql-binlog.000033-rw-r----- 1 mysql mysql   2332 5月  20 22:03 mysql-binlog.000034-rw-r----- 1 mysql mysql   2134 5月  20 22:03 mysql-binlog.000035-rw-r----- 1 mysql mysql 845915 5月  20 22:03 mysql-binlog.000036-rw-r----- 1 mysql mysql  11735 5月  20 22:05 mysql-binlog.000037-rw-r----- 1 mysql mysql    284 5月  20 22:06 mysql-binlog.000038
-rw-r----- 1 mysql mysql    284 5月  21 10:28 mysql-binlog.000039
-rw-r----- 1 mysql mysql    284 5月  21 10:28 mysql-binlog.000040-rw-r----- 1 mysql mysql    284 5月  21 10:28 mysql-binlog.000041-rw-r----- 1 mysql mysql    491 5月  21 10:31 mysql-binlog.000042-rw-r----- 1 mysql mysql    204 5月  21 10:30 mysql-binlog.index
主庫flush binary logs
(root:db1@xucl:10:32:51)[(none)]> flush binary logs;
Query OK, 0 rows affected (0.01 sec)
(root:db1@xucl:10:32:57)[(none)]> show binary logs;
+---------------------+-----------+| Log_name            | File_size |
+---------------------+-----------+
| mysql-binlog.000031 |      1019 || mysql-binlog.000032 |       424 |
| mysql-binlog.000033 |       244 || mysql-binlog.000034 |      2332 |
| mysql-binlog.000035 |      2134 || mysql-binlog.000036 |    845915 |
| mysql-binlog.000037 |     11735 || mysql-binlog.000038 |       284 |
| mysql-binlog.000039 |       284 || mysql-binlog.000040 |       284 |
| mysql-binlog.000041 |       284 || mysql-binlog.000042 |       541 |
| mysql-binlog.000043 |       234 |+---------------------+-----------+13 rows in set (0.00 sec)
此時，slave報錯如下：
(root:db1@xucl:10:31:05)[(none)]> show slave status\G
*************************** 1. row ***************************               Slave_IO_State:
                  Master_Host: 127.0.0.1
                  Master_User: repl                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-binlog.000035
          Read_Master_Log_Pos: 194
               Relay_Log_File: izbp12nspj47ypto9t6vyez-relay-bin.000011
                Relay_Log_Pos: 373
        Relay_Master_Log_File: mysql-binlog.000035
             Slave_IO_Running: No            Slave_SQL_Running: Yes              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 194
              Relay_Log_Space: 648
              Until_Condition: None               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULLMaster_SSL_Verify_Server_Cert: No                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate GTID-transaction when @@GLOBAL.GTID_MODE = OFF, at file /storage/single/mysql3306/logs/mysql-binlog.000035, position 194.; the first event 'mysql-binlog.000039' at 234, the last event read from '/storage/single/mysql3306/logs/mysql-binlog.000035' at 259, the last byte read from '/storage/single/mysql33'
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 3306
                  Master_UUID: e8bdf01a-c79b-11e8-82b3-00163e088352
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 190521 10:32:57
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set: 4c423515-6661-11e9-b767-00163e088352:1-7,
e8bdf01a-c79b-11e8-82b3-00163e088352:1-57192
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
從slave上的報錯來看，在主庫flush binary logs後，從庫又讀取註冊的binlog並且又apply了

三、現象說明

從整個測試來看，MySQL視乎將手動註冊的檔案進行了傳輸和應用。報錯是因為這個庫以前是GITD_MODE=ON的，但是測試的時候已經關閉了GTID_MODE，改為了POSITION的模式，這個報錯是因為DUMP執行緒會進行檢測：

image.png

這個圖來自我新寫的一個系列（暫時還沒釋出，大概年底才能寫好）。不管怎麼說，DUMP執行緒已經在開始傳輸老的binlog檔案了。那麼原因是什麼呢？下面我們將進行解釋。

四、flush binary logs對binlog的操作

flush binary logs 將包含如下操作：

獲取新的binlog檔名字
關閉舊的binlog
關閉index file
開啟index file
開啟新的binlog
將新的binlog加入到indexfile

具體可以參考函式MYSQL_BIN_LOG::new_file_impl。其中要說明一下獲取新的binlog檔名字是通過函式find_uniq_filename實現的，其中包含如下程式碼：

  file_info= dir_info->dir_entry;  for (i= dir_info->number_off_files ; i-- ; file_info++)
  {    if (strncmp(file_info->name, start, length) == 0 &&
    is_number(file_info->name+length, &number,0))
    {
      set_if_bigger(max_found, number);
    }
  }
...
 *next= (need_next || max_found == 0) ? max_found + 1 : max_found;

大概意思就是掃描index file檔案裡面的binlog檔案，獲取其序號最高的一個，然後加1。棧幀如下：

#0  find_uniq_filename (name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", next=0x7fffec5ec678, need_next=true)
    at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3679#1  0x000000000187d208 in generate_new_log_name (new_name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", new_ext=0x0, 
    log_name=0x7ffedc011520 "/mysqldata/mysql3340/log/binlog", is_binlog=true) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3767#2  0x0000000001884fdb in MYSQL_BIN_LOG::new_file_impl (this=0x2e83640, need_lock_log=false, extra_description_event=0x0)
    at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7175#3  0x0000000001884cbb in MYSQL_BIN_LOG::new_file_without_locking (this=0x2e83640, extra_description_event=0x0)
    at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7099#4  0x0000000001886b6b in MYSQL_BIN_LOG::rotate (this=0x2e83640, force_rotate=true, check_purge=0x7fffec5ecbfb)
    at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7775#5  0x0000000001886d53 in MYSQL_BIN_LOG::rotate_and_purge (this=0x2e83640, thd=0x7ffedc000b90, force_rotate=true)
    at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7846

因此即便我們手動修改了index file，flush binary logs卻不會有問題，因為它實際掃描了index file檔案。
同時我們也看到flush binary logs重新載入了index file，這個時候手動修改的index file就生效了，使用show binary logs就能檢視到你加入的檔案了。

五、主從問題的產生

binlog切換後，DUMP執行緒也需要讀取下一個binlog檔案。我們來看看它是怎麼確認讀取哪一個檔案的。

在函式sender.run()中可以找到迴圈每個binlog檔案的程式碼。下面一句是尋找下一個binlog檔案：

int error= mysql_bin_log.find_next_log(&m_linfo, 0);

mysql_bin_log.find_next_log 包含的程式碼：

  my_b_seek(&index_file, linfo->index_file_offset);//進行偏移量偏移
  linfo->index_file_start_offset= linfo->index_file_offset;
  length=my_b_gets(&index_file, fname, FN_REFLEN));//讀取檔名字...    if(normalize_binlog_name(full_fname, fname, is_relay_log))
...
  linfo->index_file_offset= my_b_tell(&index_file);//偏移量從新記錄以備下一次使用

我們能夠看到DUMP執行緒並沒有實際掃描整個index檔案，而是通過一個index檔案的偏移量進行讀取。如果手動修改index檔案那麼偏移量就出現了錯亂。因此DUMP傳送的下一個檔案將是不確定的。因此出現了傳送手動註冊的binlog檔案給從庫的現象，這種情況下可能出現下面情況：

如果是GTID_MODE 關閉，使用POSITION那麼這種情況一定報錯，比如重複的行。
如果是GTID_MODE 和AUTO_POSITION=1，那麼DUMP執行緒會進行GTID的過濾不傳送，因為Event不傳送所以延遲會持續一段時間為0。
如果是GTID_MODE 和AUTO_POSITION=0，那麼SQL執行緒應用GITD_EVENT的時候會進行過濾，延遲可能出現很大的情況。

儘管GTID可能可以遮蔽這種問題，但是DUMP執行緒已經在考慮傳送老的binlog檔案了，這是不合適的。

六、purge binary logs能夠維護這個偏移量

為什麼purge binary logs不會出現問題呢，因為在purge binary logs的語句下，會維護這個偏移量如下：

  virtual void operator()(THD *thd)
  {
    LOG_INFO* linfo;
    mysql_mutex_lock(&thd->LOCK_thd_data);    if ((linfo= thd->current_linfo))//b binlog.cc:2829
    {      /*
        Index file offset can be less that purge offset only if
        we just started reading the index file. In that case
        we have nothing to adjust.
      */
      if (linfo->index_file_offset < m_purge_offset)
        linfo->fatal = (linfo->index_file_offset != 0);      else
        linfo->index_file_offset -= m_purge_offset;
    }
    mysql_mutex_unlock(&thd->LOCK_thd_data);

我們可以看到linfo->index_file_offset -= m_purge_offset;這樣一個語句。下面是棧幀：

#0  Adjust_offset::operator() (this=0x7fffec5ec720, thd=0x7ffedc000be0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:2831#1  0x0000000000eef320 in Do_THD::operator() (this=0x7fffec5ec6a0, thd=0x7ffedc000be0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/mysqld_thd_manager.cc:46#2  0x0000000000eefa0f in std::for_each<THD**, Do_THD> (__first=0x3003358, __last=0x3003368, __f=...)
    at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_algo.h:4200#3  0x0000000000eeefc0 in Global_THD_manager::do_for_all_thd (this=0x3003340, func=0x7fffec5ec720)
    at /mysqldata/percona-server-locks-detail-5.7.22/sql/mysqld_thd_manager.cc:273#4  0x000000000187ae0a in adjust_linfo_offsets (purge_offset=0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:2873#5  0x0000000001883239 in MYSQL_BIN_LOG::remove_logs_from_index (this=0x2e83640, log_info=0x7fffec5ec7d0, need_update_threads=true)
    at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:6352#6  0x0000000001883676 in MYSQL_BIN_LOG::purge_logs (this=0x2e83640, to_log=0x7fffec5eca80 "/mysqldata/mysql3340/log/binlog.000001", included=false, 
    need_lock_index=true, need_update_threads=true, decrease_log_space=0x0, auto_purge=false) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:6469#7  0x000000000187b4b5 in purge_master_logs (thd=0x7ffee0000c00, to_log=0x7ffee0006600 "binlog.000001")
    at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3127

七、測試POSITION模式下的報錯

1、環境

mysql> show binary logs;
+---------------+-----------+
| Log_name      | File_size |
+---------------+-----------+
| binlog.000001 |       198 |
| binlog.000002 |       154 |
+---------------+-----------+2 rows in set (0.00 sec)
mysql> show create table testcp \G;
*************************** 1. row ***************************
       Table: testcp
Create Table: CREATE TABLE `testcp` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf81 row in set (0.00 sec)
ERROR: 
No query specified

2、執行一個語句

主庫：

mysql> insert into testcp values(20);
Query OK, 1 row affected (0.43 sec)
mysql> select * from testcp;
+----+| id |+----+| 10 || 20 |+----+2 rows in set (0.01 sec)

從庫：

mysql> show slave status \G;
*************************** 1. row ***************************               Slave_IO_State: Waiting for master to send event                  Master_Host: 192.168.99.41
                  Master_User: repl                  Master_Port: 3340
                Connect_Retry: 60
              Master_Log_File: binlog.000002
          Read_Master_Log_Pos: 417
               Relay_Log_File: relay.000004
                Relay_Log_Pos: 624
        Relay_Master_Log_File: binlog.000002
             Slave_IO_Running: Yes            Slave_SQL_Running: Yes
...
mysql> select * from testcp;
+----+| id |+----+| 10 || 20 |+----+

這個時候DUMP執行緒index file偏移指標如下：

image.png

3、主庫執行purge binary logs

做之前拷貝原有binlog.000001為binlog.000001bak因為等會要拷貝回去

mysql> purge binary logs to  'binlog.000002';
Query OK, 0 rows affected (3.14 sec)
mysql> show binary logs;
+---------------+-----------+| Log_name      | File_size |
+---------------+-----------+
| binlog.000002 |       417 |+---------------+-----------+1 row in set (0.00 sec)

因為purge binary logs命令會維護偏移指標，這個時候DUMP執行緒的index file偏移指標如下：

image.png

4、手動更改

將binlog.000001bak拷貝為binlog.000001，然後修改index file將binlog.000001加入回去，然後flush binary logs如下：

mysql> flush binary logs;
Query OK, 0 rows affected (0.15 sec)
mysql> show binary logs;
+---------------+-----------+| Log_name      | File_size |
+---------------+-----------+
| binlog.000001 |         198 || binlog.000002 |       461 |
+---------------+-----------+
2 rows in set (0.00 sec)

手動完成這個工作並不會維護DUMP執行緒的index file偏移指標，因此如下：

image.png

就這樣DUMP執行緒重新傳送了一次binlog.000002的內容，POSITION的從庫只能報錯了如下：

mysql> select * from replication_applier_status_by_worker  \G
*************************** 1. row ***************************
         CHANNEL_NAME: 
            WORKER_ID: 1
            THREAD_ID: NULL
        SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: ANONYMOUS
    LAST_ERROR_NUMBER: 1062
   LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'ANONYMOUS' at master log binlog.000002, end_log_pos 386; Could not execute Write_rows event on table testmts.testcp; Duplicate entry '20' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000002, end_log_pos 386
 LAST_ERROR_TIMESTAMP: 2019-05-21 14:46:26

八、總結

這種操作非常不規範，如果實在要這麼做考慮如下步驟：

關閉全部的從庫
手動註冊binlog檔案，修改index 檔案
flush binary logs
啟動從庫

這種情況下將會重新初始化DUMP的偏移量指標，應該是沒有問題的。還是儘量不要考慮這麼做。

作者微信：gp_22389860

來自 “ ITPUB部落格 ” ，連結：http://blog.itpub.net/7728585/viewspace-2645114/，如需轉載，請註明出處，否則將追究法律責任。

啟動Dubbo專案註冊Zookeeper時提示zookeeper not connected異常原理解析
2021-10-27
故障分析 | 手動 rm 掉 binlog 導致主從報錯
2022-05-16
binlog 異常暴漲分析
2019-08-05
oracle主動丟擲異常
2021-05-24
Oracle
python的檔案和異常
2024-06-25
Python
.Net8 AddKeyedScoped鍵值key註冊服務異常
2024-06-20
MySQL 主從複製，常見的binlog錯誤及解決方法
2024-07-30
MySql
JAVA類檔案操作和異常
2020-11-25
Java
Win10系統如何註冊ocx檔案？Win10系統註冊ocx檔案的方法
2020-10-22
Win10
Binlog多大導致主從中斷
2021-12-11
解決陌陌註冊新號就提示裝置異常
2020-10-29
python進階(7)--檔案與異常
2021-04-18
Python
註冊 sql語句+後端PHP檔案
2019-10-16
SQL後端PHP
Kafka SimpleStringSchema 可能會造成空指標異常
2023-01-09
Kafka指標
Springboot專案啟動異常排查
2024-04-05
Spring Boot
動態註冊和靜態註冊
2018-05-21
ostgreSQL主從切換-手動
2024-05-20
SQL
造成記憶體洩漏的異常處理
2022-04-10
記憶體
Dubbo配置註冊中心設定application的name使用駝峰命名法存在的隱藏專案啟動異常問題
2021-12-21
APP
Python之錯誤異常和檔案處理
2019-03-04
Python
Python學習之路9-檔案和異常
2019-03-04
Python
docker執行javaWeb服務，操作檔案異常
2024-04-05
DockerJavaWeb
MySQL二進位制檔案（binlog）
2020-12-04
MySql
oracle的靜態註冊和動態註冊
2024-11-11
Oracle
win10如何註冊mscomctl.ocx_win10註冊mscomctl.ocx檔案操作步驟
2020-03-19
Win10
羽毛球群檔案--客戶手冊
2018-04-08
主動學習可減少對標註資料的依賴卻會造成標註冗餘？
2019-12-22
一次意外斷電導致mysql檔案損壞，啟動異常
2022-02-18
MySql
Linux下修改fstab檔案造成系統無法啟動
2021-03-16
Linux
SpringBoot專案使用yml檔案連結資料庫異常
2024-10-03
Spring Boot資料庫
自動註冊appleid
2019-01-16
APP
zabbix自動註冊
2024-06-10
手動造輪子——為Ocelot整合Nacos註冊中心
2020-07-21
Python入門教程：Day11-檔案和異常
2019-04-08
Python
PHP透過pem檔案校驗簽名異常
2024-11-03
PHP
記 QEMU 虛擬磁碟裝置移動檔案拋異常但實際移動成功
2024-11-15
批量註冊歸檔日誌
2018-04-13
MySQL：MGR修改max_binlog_cache_size引數導致異常
2020-10-28
MySql

手動註冊binlog檔案造成主從異常

一、問題來源

二、朋友的測試

三、現象說明

四、flush binary logs對binlog的操作

五、主從問題的產生

六、purge binary logs能夠維護這個偏移量

七、測試POSITION模式下的報錯

1、 環境

2、執行一個語句

3、主庫執行purge binary logs

4、手動更改

八、總結

相關文章

1、環境