技術分享 | 使用 sync_diff_inspector 對兩個 MySQL 進行資料校驗

ITPUB社群發表於2023-03-01

作者:沈光宇

愛可生南區 DBA 團隊成員,主要負責 MySQL 故障處理和效能最佳化。對技術執著,為客戶負責。

本文來源:原創投稿

*愛可生開源社群出品,原創內容未經授權不得隨意使用,轉載請聯絡小編並註明來源。


一、sync-diff-inspector簡介

sync-diff-inspector 是由 PingCAP 開源的資料校驗工具,用於校驗 MySQL/TiDB 中兩份資料是否一致。
主要功能如下:
  • 對比表結構和資料

  • 如果資料不一致,則生成用於修復資料的 SQL 語句

  • 支援不同庫名或表名的資料校驗

  • 支援分庫分表場景下的資料校驗

  • 支援 TiDB 主從叢集的資料校驗

  • 支援從 TiDB DM 拉取配置的資料校驗

sync-diff-inspector 的使用限制
  • 對於 MySQL 和 TiDB 之間的資料同步不支援線上校驗,需要保證上下游校驗的表中沒有資料寫入,或者保證某個範圍內的資料不再變更,透過配置 range 來校驗這個範圍內的資料。

  • FLOAT、DOUBLE 等浮點數型別在 TiDB 和 MySQL 中的實現方式不同,在計算 checksum 時會分別取 6 位和 15 位有效數字。如果不使用該特性,需要設定 ignore-columns 忽略這些列的檢查。

  • 支援對不包含主鍵或者唯一索引的表進行校驗,但是如果資料不一致,生成的用於修復的 SQL 可能無法正確修復資料。

本文將介紹使用 sync-diff-inspector 工具對兩個 MySQL 例項中的資料進行校驗,兩個 MySQL 例項之間使用 DTS 工具來同步資料。

二、sync-diff-inspector 工具下載安裝

#sync-diff-inspector已整合在TiDB工具包中,直接下載TiDB工具包即可
shell> wget
shell> tar zxvf tidb-community-toolkit-v6.4.0-linux-amd64.tar.gz
shell> ls -lh tidb-community-toolkit-v6.4.0-linux-amd64 | grep sync_diff_inspector
-rwxr-xr-x 1 tidb tidb  98M Nov 17 11:41 sync_diff_inspector
shell> ./sync_diff_inspector  -V
App Name: sync_diff_inspector v2.0
Release Version: v6.4.0
Git Commit Hash: f7e65073b35538def61ae094cd4a8e57e705344b
Git Branch: heads/refs/tags/v6.4.0
UTC Build Time: 2022-11-04 07:21:08
Go Version: go1.19.2

三、sync-diff-inspector工具使用示例

1. 配置檔案通用部分

shell> cat config.toml
check-thread-count = 4             # 檢查資料的執行緒數量
export-fix-sql = true              # 如果開啟,若表資料存在不一致,則輸出用於修復的SQL語句
check-struct-only = false          # 只對比表結構而不對比資料
[data-sources]                    
[data-sources.mysql1]              # 上游MySQL資料庫配置(源端)
   host = "10.186.65.57"          
   port = 3306
   user = "sgy"
   password = "admin"
   route-rules = ["rule1"]        # 對映匹配規則,透過配置相應的規則可以對單個、多個schema或table進行校驗
                                  # 如有多個rule時,可配置成 ["rule1", "rule2"]
[data-sources.mysql2]              # 下游MySQL資料庫配置(目標端)
   host = "10.186.65.89"
   port = 3309
   user = "sgy"
   password = "admin"
2. 基於 schema 的資料校驗
  • 對單個 schema 進行資料校驗
#對映匹配規則部分,需要將此部分放到置配置檔案通用部分的後面
[routes]                               # 對映關係,如上下游schema不同名可在此配置
[routes.rule1]
   schema-pattern = "sbtest"          # 匹配上游資料庫的的庫名
   target-schema = "sbtest"           # 匹配下游資料庫的庫名

[task]
   output-dir = "./output"
   source-instances = ["mysql1"]   # 上游資料庫,內容是 data-sources 宣告的唯一標識 id,分庫分表場景下支援多個上游資料庫,如:["mysql10", "mysql20"]
   target-instance = "mysql2"      # 下游資料庫,內容是 data-sources 宣告的唯一標識 id
   target-check-tables = ["sbtest.*"]    # 需要比對的下游資料庫的表
   
# 進行資料校驗
shell> ./sync_diff_inspector  --config=./config.toml
A total of 8 tables need to be compared

Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sbtest`.`sbtest4`` ... equivalent     #表結構一致
Comparing the table structure of ``sbtest`.`sbtest5`` ... equivalent
.......................................................
Comparing the table data of ``sbtest`.`sbtest2`` ... equivalent          #表中資料一致
Comparing the table data of ``sbtest`.`sbtest8`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 8 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
  • 對多個 schema 進行資料校驗
#由於對多個schema進行資料校驗,routes包含了rule1、rule2,配置檔案通用部分需要做以下修改
[data-sources.mysql1]
   route-rules = ["rule1","rule2"]
   
#對映匹配規則部分,需要將此部分放到置配置檔案通用部分的後面
[routes]
[routes.rule1]
   schema-pattern = "sbtest*"       #使用正則匹配sbtest開頭的schema,如sbtest,sbtest1,sbtest2,sbtest3
   target-schema = "sbtest*"

[routes.rule2]
   schema-pattern = "sgy"           #匹配schema:sgy
   target-schema = "sgy"

[task]
   output-dir = "./output"
   source-instances = ["mysql1"]     # 上游資料庫,內容是 data-sources 宣告的唯一標識 id,分庫分表場景下支援多個上游資料庫,如:["mysql10", "mysql20"]
   target-instance = "mysql2"        # 下游資料庫,內容是 data-sources 宣告的唯一標識 id
   target-check-tables = [ "sbtest*.*","sgy.*"]   #對源、目標例項中的sgy及以sbtest開頭的schema所有表進行校驗

#進行資料校驗
shell> ./sync_diff_inspector --config=./config.toml
A total of 24 tables need to be compared

Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sgy`.`sbtest2`` ... equivalent
Comparing the table data of ``sgy`.`sbtest2`` ... equivalent
.......................................................
Comparing the table data of ``sbtest`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest3`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest2`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest1`.`sbtest2`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 24 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'

3. 基於 table 的資料校驗

  • 對單個 table 進行資料校驗
#對映匹配規則部分,需要將此部分放到配置檔案通用部分的後面
[routes]
[routes.rule1]
   schema-pattern = "sbtest"
   target-schema = "sbtest"

[task]
   output-dir = "./output"
   source-instances = ["mysql1"]      # 上游資料庫,內容是 data-sources 宣告的唯一標識 id,分庫分表場景下支援多個上游資料庫,如:["mysql10", "mysql20"]
   target-instance = "mysql2"         # 下游資料庫,內容是 data-sources 宣告的唯一標識 id
   target-check-tables = ["sbtest.sbtest1"]      #只校驗表:sbtest.sbtest1

#進行資料校驗
shell> ./sync_diff_inspector --config=./config.toml
A total of 1 tables need to be compared

Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sbtest`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest`.`sbtest1`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 1 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
  • 對多個 table 進行資料校驗
#對schema進行資料校驗也是對多個table進行資料校驗的一種,這裡以指定多個具體表名為例
#由於對多個table進行資料校驗,routes包含了rule1、rule2,配置檔案通用部分需要做以下修改
[data-sources.mysql1]
   route-rules = ["rule1","rule2"]
   
#對映匹配規則部分,需要將此部分放置到配置檔案通用部分的後面
[routes]
[routes.rule1]
   schema-pattern = "sbtest*"
   target-schema = "sbtest*"

[routes.rule2]
   schema-pattern = "sgy"
   target-schema = "sgy"

[task]
   output-dir = "./output"
   source-instances = ["mysql1"]       # 上游資料庫,內容是 data-sources 宣告的唯一標識 id,分庫分表場景下支援多個上游資料庫,如:["mysql10", "mysql20"]
   target-instance = "mysql2"          # 下游資料庫,內容是 data-sources 宣告的唯一標識 id
   #只校驗sbtest.sbtest8,sgy.sbtest4,sbtest1.sbtest1,sbtest2.sbtest2這四個表
   target-check-tables = [ "sbtest.sbtest8","sgy.sbtest4","sbtest1.sbtest1","sbtest2.sbtest2"]
   
#進行資料校驗
shell> ./sync_diff_inspector --config=./config.toml
A total of 4 tables need to be compared

Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sbtest2`.`sbtest2`` ... equivalent
Comparing the table data of ``sbtest2`.`sbtest2`` ... equivalent
Comparing the table structure of ``sgy`.`sbtest4`` ... equivalent
Comparing the table structure of ``sbtest1`.`sbtest1`` ... equivalent
Comparing the table data of ``sgy`.`sbtest4`` ... equivalent
Comparing the table data of ``sbtest1`.`sbtest1`` ... equivalent
Comparing the table structure of ``sbtest`.`sbtest8`` ... equivalent
Comparing the table data of ``sbtest`.`sbtest8`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 4 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
  • 對單表進行範圍校驗

#對映匹配規則部分,需要將此部分放置到配置檔案通用部分的後面
[routes]
[routes.rule1]
   schema-pattern = "sbtest"
   target-schema = "sbtest"

[task]
   output-dir = "./output"
   source-instances = ["mysql1"]      # 上游資料庫,內容是 data-sources 宣告的唯一標識 id,分庫分表場景下支援多個上游資料庫,如:["mysql10", "mysql20"]
   target-instance = "mysql2"         # 下游資料庫,內容是 data-sources 宣告的唯一標識 id
   target-check-tables = ["sbtest.sbtest1"]      # 指定校驗目標例項上的sbtest.sbtest1表
   target-configs = ["config1"]                  # 對部分表的額外配置

[table-configs.config1]
target-tables = ["sbtest.sbtest1"]                # 指定校驗目標例項上的sbtest.sbtest1表
range = "id > 10 AND id < 100"                    # 指定校驗目標表的具體範圍,相當於SQL中的where條件
#資料校驗見資料修復功能部分

4. 資料修復功能

#為了演示資料修復,在目標端表中刪除一些資料
shell> mysql -usgy -padmin -h 10.186.65.89 -P 3309 -e "delete from sbtest.sbtest1 where id in (11,22,33,44,55,66,77,88,99);"

# 使用對單表進行範圍校驗配置,對sbtest.sbtest1表進行範圍校驗
shell> ./sync_diff_inspector  --config=./config.toml
A total of 1 tables need to be compared

Comparing the table structure of ``sbtest`.`sbtest1`` ... equivalent     #sbtest.sbtest1表結構一致
Comparing the table data of ``sbtest`.`sbtest1`` ... failure             #sbtest.sbtest1表資料有差異
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
The data of `sbtest`.`sbtest1` is not equal

The rest of tables are all equal.
The patch file has been generated in
       'output/fix-on-mysql2/'                 # 生成修復SQL輸出到此目錄
You can view the comparision details through './output/sync_diff.log'

# 檢視輸出修復SQL
shell> cat output/fix-on-mysql2/sbtest\:sbtest1\:0\:0-0\:0.sql
-- table: sbtest.sbtest1
-- range in sequence: Full
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (11,50148,'69183965773-14680923687-92934799461-07606242492-78811530738-23241332728-92911647895-70477201282-85254929997-06214236905','33737501839-63208420999-35708593012-95906952636-68691055996');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (22,41538,'14140185946-16271766410-68340573738-46226480462-08989140676-29936780681-56784925909-45742390296-67137862436-18242076592','25112986220-19824650341-42825248958-70186905082-33867163574');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (33,50105,'48402160130-78797253227-05588677001-93556313541-39295466587-91364622063-58862572731-27837539373-64526858273-89372384747','72073637794-12055602042-16862397531-87496431032-85451396141');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (44,49917,'98096713237-15265478716-72025332919-62964308854-01270604715-12000922788-50929365082-43513513022-28543412388-57790852446','33907865533-62267179125-36062850111-84091551774-69847376840');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (55,50259,'68826640374-18002055907-53999869701-72145793168-90893177888-85273641163-24331745145-62755454379-79511152711-99618812770','02724012569-91405199011-30257626349-21678066897-42535351703');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (66,41995,'58396552954-26907336026-99506693837-77815822050-42927030403-40927779227-58101279219-11438233008-00344004393-35806649113','02348992414-65327666387-20632806790-74456238429-90933031209');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (77,49757,'73705207329-00308504929-05904865650-29498186065-09990420614-84131302024-40320022420-77358683577-34731688411-70665402097','92567035674-84728177369-79087155038-84461952379-45481760225');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (88,50497,'77112982215-98755241853-36424062009-45217742824-83650985380-60232607362-47569976121-30091332050-97996374956-97911403909','88977046077-43519705750-51246090615-77629911610-94055348738');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (99,49977,'63255973711-62890656114-72914458941-22277906368-32619356110-31219579310-16762665782-69578495131-76043317830-28240408380','08589883275-03392784968-00244590156-39735355951-95769933801');

#將生成的修復SQL匯入目標庫
shell> mysql -usgy -padmin -h 10.186.65.89 -P 3309 < output/fix-on-mysql2/sbtest\:sbtest1\:0\:0-0\:0.sql
mysql: [Warning] Using a password on the command line interface can be insecure.

#再次檢驗,發現資料已經一致
shell> ./sync_diff_inspector  --config=./config.toml
A total of 1 tables need to be compared

Comparing the table structure of ``sbtest`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest`.`sbtest1`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 1 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'

四、相關問題

  • 修改配置檔案後需要手動刪除 outputDir 目錄
shell> vim config.toml
shell> ./sync_diff_inspector --config=./config.toml     #將output-dir刪除即可解決
Fail to initialize config.
failed to init Task: config changes breaking the checkpoint, please use another outputDir and start over again!
  • 表建議使用 utf8mb4 字符集,不支援 MySQL8.0 的 utf8mb3 字符集
mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.30    |
+-----------+
1 row in set (0.00 sec)

mysql> alter table sgy.sbtest4 CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 1

mysql> show create table sgy.sbtest4 \G
*************************** 1. row ***************************
      Table: sbtest4
Create Table: CREATE TABLE `sbtest4` (
 `id` int NOT NULL AUTO_INCREMENT,
 `k` int NOT NULL DEFAULT '0',
 `c` char(120) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',
 `pad` char(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_4` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb3    
1 row in set (0.00 sec)  

#執行資料校驗時報錯
shell> ./sync_diff_inspector --config=./config.toml
There is something error when initialize diff, please check log info in output/sync_diff.log

#檢視日誌檔案
shell> cat output/sync_diff.log  |grep utf8mb3
[2023/02/19 11:13:04.980 +08:00] [FATAL] [main.go:120] ["failed to initialize diff process"] [error="get table sgy.sbtest4's information error [parser:1115]Unknown character set: 'utf8mb3'\ngithub.com/pingcap/errors.
更詳細的使用說明,請參考 sync-diff-inspector 官方文件:

本文關鍵字#資料校驗#

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/70024420/viewspace-2937582/,如需轉載,請註明出處,否則將追究法律責任。

相關文章