MySQL5.7GTID運維實戰

蘭春發表於2017-02-15

GTID 和 START SLAVE

  • START SLAVE 語法
START SLAVE [thread_types] [until_option] [connection_options]

thread_types:
    [thread_type [, thread_type] ... ]

thread_type:
    IO_THREAD | SQL_THREAD

until_option:
    UNTIL {   {SQL_BEFORE_GTIDS | SQL_AFTER_GTIDS} = gtid_set
          |   MASTER_LOG_FILE = `log_name`, MASTER_LOG_POS = log_pos
          |   RELAY_LOG_FILE = `log_name`, RELAY_LOG_POS = log_pos
          |   SQL_AFTER_MTS_GAPS  }


* SQL_BEFORE_GTIDS = $gitd_set : $gtid_set之前的gtid都會被執行

eg. START SLAVE SQL_THREAD UNTIL SQL_BEFORE_GTIDS = 3E11FA47-71CA-11E1-9E33-C80AA9429562:11-56

表示,當SQL_thread 執行到3E11FA47-71CA-11E1-9E33-C80AA9429562:10 的時候停止,下一個事務是11

* SQL_AFTER_GTIDS = $gitd_set : $gtid_set之前,以及$gtid_set包含的gtid都會被執行

eg. START SLAVE SQL_THREAD UNTIL SQL_AFTER_GTIDS = 3E11FA47-71CA-11E1-9E33-C80AA9429562:11-56

表示,當SQL_thread 執行到3E11FA47-71CA-11E1-9E33-C80AA9429562:56 的時候停止,56是最後一個提交的事務。


  • 如何從multi-threaded slave 轉化成 single-threaded mode
START SLAVE UNTIL SQL_AFTER_MTS_GAPS;

SET @@GLOBAL.slave_parallel_workers = 0;

START SLAVE SQL_THREAD;

GTID 和 upgrade

如果 --gtid-mode=ON ,那麼在使用upgrade時候,不推薦使用--write-binlog 選項。
因為,mysql_upgrade 會更新Myisam引擎的系統表. 而同時更新transction table 和 non-trasaction table 是gtid所不允許的

GTID 和 mysql.gtid_executed

  • gtid_mode = (ON|ON_PERMISSIVE), bin_log = off
gtid 會實時的寫入到mysql.gtid_executed表中,且根據executed_gtids_compression_period=N來壓縮
  • gtid_mode = (ON|ON_PERMISSIVE), bin_log = on
gtid 不會實時的寫入到mysql.gtid_executed,executed_gtids_compression_period會失效。
只有當binlog rotate或者mysql shutdown的時候才會寫入mysql.gtid_executed

如果master 異常shutdown,gtid還沒有寫入到mysql.gtid_executed怎麼辦呢?
這種場景,一般通過mysql recover機制寫入到mysql.gtid_executed中

GTID 和 gtid_next

http://dev.mysql.com/doc/refman/5.7/en/replication-options-gtids.html#sysvar_gtid_next

  • 三種取值
* AUTOMATIC: Use the next automatically-generated global transaction ID.

* ANONYMOUS: Transactions do not have global identifiers, and are identified by file and position only.

* A global transaction ID in UUID:NUMBER format.
  • QA: GTID 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 對應的事務順序,從小到大,一定是順序執行的嗎?

答案:錯,一般情況下事務是從小到大,順序執行的。 但是如果再MTS場景,或者是人工設定gtid_next的情況下,就可能不是順序執行了

dba:(none)> show master status;
+--------------------+----------+--------------+------------------+-------------------------------------------+
| File               | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                         |
+--------------------+----------+--------------+------------------+-------------------------------------------+
| xx.000009 |     1719 |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-46 |
+--------------------+----------+--------------+------------------+-------------------------------------------+
1 row in set (0.00 sec)

dba:(none)> set gtid_next=`0923e916-3c36-11e6-82a5-ecf4bbf1f518:50`;
Query OK, 0 rows affected (0.00 sec)

dba:lc> insert into gtid_1 values(5);
Query OK, 1 row affected (0.00 sec)

dba:lc> set gtid_next=AUTOMATIC;
Query OK, 0 rows affected (0.00 sec)


dba:lc> flush logs;
Query OK, 0 rows affected (0.01 sec)

dba:lc> show master status;
+--------------------+----------+--------------+------------------+----------------------------------------------+
| File               | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                            |
+--------------------+----------+--------------+------------------+----------------------------------------------+
| xx.000010 |      210 |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-46:50 |
+--------------------+----------+--------------+------------------+----------------------------------------------+
1 row in set (0.00 sec)

dba:lc> insert into gtid_1 values(6);
Query OK, 1 row affected (0.00 sec)

dba:lc> insert into gtid_1 values(6);
Query OK, 1 row affected (0.00 sec)

dba:lc> insert into gtid_1 values(6);
Query OK, 1 row affected (0.00 sec)

dba:lc> show master status;
+--------------------+----------+--------------+------------------+-------------------------------------------+
| File               | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                         |
+--------------------+----------+--------------+------------------+-------------------------------------------+
| xx.000010 |     1125 |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 |
+--------------------+----------+--------------+------------------+-------------------------------------------+
1 row in set (0.00 sec)


在這裡面,很明顯0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 事務執行順序為: 1-46(最先執行) , 50(其次執行) , 47-49(最後執行)

GTID 和 MHA

請參考MHA原始碼解析

  • GTID模式下,需要relay-log嗎?purge_relay_log設定為on可以嗎?
* replication 架構

host_1(host_1:3306) (current master)
 +--host_2(host_2:3306 candidate master)
 +--host_3(host_3:3306 no candidate)

* 模擬:
1. 大量併發的寫入,一直持續的往host_1寫資料,造成併發寫入很大的樣子
2. host_2: stop slave , 造成host_2 延遲master很多的樣子
3. host_1:  purge binary logs, 造成master刪掉了日誌,導致host_2 修復的時候拿不到master的最新binlog
4. host_3:  一直正常同步master,擁有最新的binlog
5. host_3: flush logs; purge_relay_log=on; flush logs;一直迴圈flush logs,造成host_3已經將最新的relay log刪掉了,host_2 是肯定拿不到host_3的relay 來修復自己了
6. 好了,一切條件均已經準備完畢,這個時候讓master 當機,這樣就能模擬出在relay log沒有的情況下,是否可以正常完成mha 切換了

...............


7. 結果完成了正常切換,那mha是怎麼再gtid模式下,在沒有relay log的情況下,正常切換的恩?
8. 原理:host_2發現自己不是最新的slave,所以就去change master到host_3,通過host_3的binlog來恢復
9. 最後,當host_2和host_3都一致的情況下,再讓host_3 重新指向host_2,完畢...

*結論: gtid模式下,mha恢復切換的原理是不需要relay log的,只需要binlog

GTID 和 備份(物理備份+邏輯備份)

物理備份:xtrabackup,其他等
邏輯備份:mysqldump,mydumper,mysqlpump等

  • 物理備份
備份的時候,只要在備份的時候記錄下Executed_Gtid_Set($gtid_dump)即可,這個可以用於重新change master;

reset master;
SET @@GLOBAL.GTID_PURGED=`$gtid_dump`;
change master to master_auto_position=1;
  • 邏輯備份
* mysqldump 中 sql_log_bin 預設是關閉的。 SET @@SESSION.SQL_LOG_BIN= 0;   所以這裡用途非常重要

* 如果dump檔案,你要在master上執行,那麼必須這樣備份: mysqldump xx  --set-gtid-purged=OFF , 這樣dump檔案不會有SET @@SESSION.SQL_LOG_BIN= 0存在

* 如果dump檔案,你要在slave上執行,想重新搭建一套slave環境。那麼必須這樣備份: mysqldump xx  --set-gtid-purged=ON

GTID 和 crash safe slave

slave relay log 不完整怎麼辦?(relay-log-recover=0)
relay-log-recover=1 不考慮,因為它會捨棄掉relay log

  • 為何要討論這個
* 官方解釋:

1) 非GTID模式下,如何保證slave crash safe 呢?
    relay_log_recovery=1,relay_log_info_repository=TABLE,master_info_repository=TABLE,innodb_flush_log_at_trx_commit=1,sync_binlog=1

2) GTID模式下,如何保證slave crash safe呢?
    relay_log_recovery=(1|0),relay_log_info_repository=TABLE,master_info_repository=TABLE,innodb_flush_log_at_trx_commit=1,sync_binlog=1

以上兩種情況配置,可以保證crash safe

這裡看到區別就是relay_log_recovery了,gtid可以是any,這就需要討論下了。

當relay_log_recovery=1時,當mysql crash的時候,會丟棄掉之前獲取的relay,所以這個不會產生一致性問題。
當relay_log_recovery=0時
    如果是非GTID模式,因為沒辦法保證寫master_info.log和relay log file之間的原子性,會導致slave有可能多拉取一個事務,這樣就有一致性問題。
    如果是GTID模式,因為binlog-dump協議變了,master_info.log已經不用,slave會將已經exected_GTID與retrieve_gtid的並集傳送給master,以此來獲取沒有執行過的gtid,所以沒問題。

這裡面的retrieve_gtid就是IO_thread從master獲取的gtid,會寫入到relay log。
  • 模擬relay log不完整的情況

從上面可以知道,relay log的記錄非常重要,那麼relay log 不完整,會怎麼樣呢?

1) master 建立一張10G的表,然後執行全表更新操作。
2)這時候,slave就在狂寫relay log了
3)此時,去slave kill掉mysql程式
4)這時候,relay log就不完整了

WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set.
This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement.
The event(s) from the partial statement have not been written to output.

總結: relay log不完整,mysql起來後,會重新獲取不完整的這個events,sql_thread在回放的時候,如果發現events不完整,會跳過,不會影響到同步。

GTID 和 MTS

MTS_GAPS

  • 如果MTS遇到Gap transction怎麼辦?
1. 先解決問題

START SLAVE UNTIL SQL_AFTER_MTS_GAPS

2. 考慮設定slave_preserve_commit_order=1

GTID 生產環境中必須考慮的問題

Migration to GTID replication
Non transactionally safe statement will raise errors now
MySQL Performance in GTID
mysql_upgrade script
Errant transactions
Filtration on the slave
Injecting empty transactions
以上問題請參考 GTID原理與實戰

GTID 和 online升級

online升級丟資料?
online升級會報錯嗎?
online升級步驟請參考 GTID原理與實戰

  • 故障案例一

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: `Cannot replicate anonymous transaction when @@GLOBAL.GTID_MODE = ON …`

兩種情況:
1)slave的gtid_mode=on時,卻還接受著來自master的non-gtid transaction的時候,會報以上錯誤。
2)事實上,不管slave的gtid_mode是on,還是off,只要master的gtid_mode=on,那麼整個replication slave,都必須是gtid的事務

解決方案:在master上從gtid_mode=ON_PERMISSIVE 設定到 gtid_mode=ON之前,如何保證現在所有non-gtid事務都已經在slave執行完畢了?

很簡單,兩種方法:

第一種方案:

1) 在master上,當設定gtid_mode=ON_PERMISSIVE的時候,其實就已經產生gtid事務了,這個時候show master status;記下這個位置 $pos

2)然後再每個slave上,執行 SELECT MASTER_POS_WAIT(file, position);

第二種更加直接方案:

0)預設情況下,slave的gtid_mode都是off,所以去slave上show master status 都應該是file,position
1) 先在master上,設定gtid_mode=ON_PERMISSIVE
2)然後再每臺slave上再次執行show master status,如果發現結果由file,position 變成 GTID_EXECUTED,那麼說明slave已經將non-gtid全部執行完畢了
  • 故障案例二

Last_IO_Error: The replication receiver thread cannot start because the master has GTID_MODE = ON and this server has GTID_MODE = OFF.

slave的gtid_mode=off時,卻還接受著來自master的gtid transaction的時候,會報以上錯誤。

GTID 和 mysqlbinlog

mysqlbinlog 引數:

* --exclude-gtids : 排除這些gtid
* --include-gtids : 只列印這些gtid
* --skip-gtids    : 所有gtid都不列印

可以用--skip-gtids 做傳統模式的恢復。但是這個是官方不推薦的。
mysqlbinlog --skip-gtids binlog.000001 >  /tmp/dump.sql

GTID 和 重要函式

gtid_set 用引號擴起來

Name Description
GTID_SUBSET(subset,set) returns true (1) if all GTIDs in subset are also in set
GTID_SUBTRACT(set,subset) returns only those GTIDs from set that are not in subset
WAIT_FOR_EXECUTED_GTID_SET(gtid_set[, timeout]) Wait until the given GTIDs have executed on slave.
WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(gtid_set, timeout) Wait until the given GTIDs have executed on slave
  • GTID_SUBSET(subset,set)

subset 是否是 set 的子集,如果是返回1,不是返回0

dba:(none)> SELECT GTID_SUBSET(`3E11FA47-71CA-11E1-9E33-C80AA9429562:23`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`);
+-----------------------------------------------------------------------------------------------------+
| GTID_SUBSET(`3E11FA47-71CA-11E1-9E33-C80AA9429562:23`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`) |
+-----------------------------------------------------------------------------------------------------+
|                                                                                                   1 |
+-----------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:(none)> SELECT GTID_SUBSET(`3E11FA47-71CA-11E1-9E33-C80AA9429562:23-25`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`);
+--------------------------------------------------------------------------------------------------------+
| GTID_SUBSET(`3E11FA47-71CA-11E1-9E33-C80AA9429562:23-25`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`) |
+--------------------------------------------------------------------------------------------------------+
|                                                                                                      1 |
+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:(none)> SELECT GTID_SUBSET(`3E11FA47-71CA-11E1-9E33-C80AA9429562:23`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:23`);
+--------------------------------------------------------------------------------------------------+
| GTID_SUBSET(`3E11FA47-71CA-11E1-9E33-C80AA9429562:23`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:23`) |
+--------------------------------------------------------------------------------------------------+
|                                                                                                1 |
+--------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:(none)> SELECT GTID_SUBSET(`3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`);
+--------------------------------------------------------------------------------------------------------+
| GTID_SUBSET(`3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`) |
+--------------------------------------------------------------------------------------------------------+
|                                                                                                      0 |
+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
  • GTID_SUBTRACT(set,subset)

哪些gtids僅僅是set獨有的,subset沒有的

dba:(none)> SELECT GTID_SUBTRACT(`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:21`);
+-------------------------------------------------------------------------------------------------------+
| GTID_SUBTRACT(`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:21`) |
+-------------------------------------------------------------------------------------------------------+
| 3e11fa47-71ca-11e1-9e33-c80aa9429562:22-57                                                            |
+-------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:(none)> SELECT GTID_SUBTRACT(`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25`);
+----------------------------------------------------------------------------------------------------------+
| GTID_SUBTRACT(`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25`) |
+----------------------------------------------------------------------------------------------------------+
| 3e11fa47-71ca-11e1-9e33-c80aa9429562:26-57                                                               |
+----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:(none)> SELECT GTID_SUBTRACT(`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:23-24`);
+----------------------------------------------------------------------------------------------------------+
| GTID_SUBTRACT(`3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57`,`3E11FA47-71CA-11E1-9E33-C80AA9429562:23-24`) |
+----------------------------------------------------------------------------------------------------------+
| 3e11fa47-71ca-11e1-9e33-c80aa9429562:21-22:25-57                                                         |
+----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

以上兩個函式可以用來幹嘛呢?
通過GTID_SUBSET,master可以知道slave是否是自己的子集,可以很方便的檢查資料一致性
通過GTID_SUBTRACT,假設slave是master的子集,那麼可以很輕鬆的將slave沒有,master有的gtid傳送給slave,以便達到最終一致性

  • WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(gtid_set, timeout)

timeout 預設為0,表示無限等待slave gtid_set全部執行完畢
如果全部執行完畢,會返回執行的gtid的數量。如果沒有執行完,會等待timeout秒。如果slave沒有起來,或者沒有開啟gtid,會返回NULL

dba:lc> SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(`0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3`);
+---------------------------------------------------------------------------------+
| WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(`0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3`,1) |
+---------------------------------------------------------------------------------+
|                                                                               0 |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

stop slave;

dba:lc> SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(`0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3`);
+---------------------------------------------------------------------------------+
| WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(`0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3`,1) |
+---------------------------------------------------------------------------------+
|                                                                            NULL |  ## 如果slave的IO,SQL thread 沒有running,返回NULL,不管gtid set 有木有執行完畢
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
  • WAIT_FOR_EXECUTED_GTID_SET(gtid_set[, timeout])

含義跟WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS一樣,唯一一個區別就是:如果slave 的replication 執行緒沒有起來,不會返回NULL。

stop slave;

dba:lc> SELECT  WAIT_FOR_EXECUTED_GTID_SET(`0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3`);
+------------------------------------------------------------------------+
| WAIT_FOR_EXECUTED_GTID_SET(`0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3`) |
+------------------------------------------------------------------------+
|                                                                      0 |          ## 如果都執行了,返回0 , 跟slave的IO,SQL thread 起沒起來無關
+------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:lc> SELECT  WAIT_FOR_EXECUTED_GTID_SET(`0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-4`,1);
+--------------------------------------------------------------------------+
| WAIT_FOR_EXECUTED_GTID_SET(`0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-4`,1) |
+--------------------------------------------------------------------------+
|                                                                        1 |
+--------------------------------------------------------------------------+
1 row in set (1.00 sec)

GTID 的限制和缺點

  • 同事更新nontransactional和transactional的表,會導致gtid問題
  • CREATE TABLE … SELECT statements 語法對GTID來說是不安全的
  • CREATE TEMPORARY TABLE and DROP TEMPORARY TABLE 對GTID也是不安全的
  • enforce-gtid-consistency 必須設定on,可以避免以上2,3 不安全的statement
  • sql_slave_skip_counter 不允許執行,可以通過 Injecting empty transactions 來解決
  • GTID 和 mysqldump的問題,mysqldump 中 sql_log_bin 預設是關閉的.會導致匯入master後,不會寫入gtid到binlog. ( 可以通過 –set-gtid-purged=OFF 避免 )
  • GTID and mysql_upgrade, 因為部分系統表是myisam引擎的,會有問題。 (可以通過–write-binlog=off來避免 )

參考文件

  • 官方資料:

1.5 Server and Status Variables and Options Added, Deprecated, or Removed in MySQL 5.7

5.5.4 mysqldump — A Database Backup Program
5.6.7 mysqlbinlog — Utility for Processing Binary Log Files


13.17 Functions Used with Global Transaction IDs

14.4.2.1 CHANGE MASTER TO Syntax
14.4.2.6 START SLAVE Syntax
14.7.5.34 SHOW SLAVE STATUS Syntax



18.1.3 Replication with Global Transaction Identifiers
    18.1.3.1 GTID Concepts
    18.1.3.2 Setting Up Replication Using GTIDs
    18.1.3.3 Using GTIDs for Failover and Scaleout
    18.1.3.4 Restrictions on Replication with GTIDs
    18.1.5.1 Replication Mode Concepts
    18.1.5.2 Enabling GTID Transactions Online
    18.1.5.3 Disabling GTID Transactions Online
    18.1.6.1 Replication and Binary Logging Option and Variable Reference
    18.1.6.5 Global Transaction ID Options and Variables
    18.3.2 Handling an Unexpected Halt of a Replication Slave
    18.4.1.34 Replication and Transaction Inconsistencies
    18.4.3 Upgrading a Replication Setup


19.2.1.5 Adding Instances to the Group

24.10.7.1 The events_transactions_current Table
24.10.11.6 The replication_applier_status_by_worker Table
  • 第三方資料
> http://www.fromdual.ch/things-you-should-consider-before-using-gtid
> http://www.fromdual.ch/gtid_in_action
> http://www.fromdual.ch/replication-troubleshooting-classic-vs-gtid
> http://www.fromdual.ch/replication-in-a-star
> http://www.fromdual.com/controlling-worldwide-manufacturing-plants-with-mysql
> https://www.percona.com/blog/2014/05/19/errant-transactions-major-hurdle-for-gtid-based-failover-in-mysql-5-6/
> https://www.percona.com/blog/2016/12/01/database-daily-ops-series-gtid-replication-binary-logs-purge/
> https://www.percona.com/blog/2016/11/10/database-daily-ops-series-gtid-replication/
> https://www.percona.com/blog/2015/12/02/gtid-failover-with-mysqlslavetrx-fix-errant-transactions/
> https://www.percona.com/blog/2014/05/09/gtids-in-mysql-5-6-new-replication-protocol-new-ways-to-break-replication/


相關文章