技術分享 | 使用 sync_diff_inspector 對兩個 MySQL 進行資料校驗
作者:沈光宇
愛可生南區 DBA 團隊成員,主要負責 MySQL 故障處理和效能最佳化。對技術執著,為客戶負責。
本文來源:原創投稿
*愛可生開源社群出品,原創內容未經授權不得隨意使用,轉載請聯絡小編並註明來源。
一、sync-diff-inspector簡介
對比表結構和資料
如果資料不一致,則生成用於修復資料的 SQL 語句
支援不同庫名或表名的資料校驗
支援分庫分表場景下的資料校驗
支援 TiDB 主從叢集的資料校驗
支援從 TiDB DM 拉取配置的資料校驗
對於 MySQL 和 TiDB 之間的資料同步不支援線上校驗,需要保證上下游校驗的表中沒有資料寫入,或者保證某個範圍內的資料不再變更,透過配置 range 來校驗這個範圍內的資料。
FLOAT、DOUBLE 等浮點數型別在 TiDB 和 MySQL 中的實現方式不同,在計算 checksum 時會分別取 6 位和 15 位有效數字。如果不使用該特性,需要設定 ignore-columns 忽略這些列的檢查。
支援對不包含主鍵或者唯一索引的表進行校驗,但是如果資料不一致,生成的用於修復的 SQL 可能無法正確修復資料。
二、sync-diff-inspector 工具下載安裝
#sync-diff-inspector已整合在TiDB工具包中,直接下載TiDB工具包即可
shell> wget
shell> tar zxvf tidb-community-toolkit-v6.4.0-linux-amd64.tar.gz
shell> ls -lh tidb-community-toolkit-v6.4.0-linux-amd64 | grep sync_diff_inspector
-rwxr-xr-x 1 tidb tidb 98M Nov 17 11:41 sync_diff_inspector
shell> ./sync_diff_inspector -V
App Name: sync_diff_inspector v2.0
Release Version: v6.4.0
Git Commit Hash: f7e65073b35538def61ae094cd4a8e57e705344b
Git Branch: heads/refs/tags/v6.4.0
UTC Build Time: 2022-11-04 07:21:08
Go Version: go1.19.2
三、sync-diff-inspector工具使用示例
1. 配置檔案通用部分
shell> cat config.toml
check-thread-count = 4 # 檢查資料的執行緒數量
export-fix-sql = true # 如果開啟,若表資料存在不一致,則輸出用於修復的SQL語句
check-struct-only = false # 只對比表結構而不對比資料
[data-sources]
[data-sources.mysql1] # 上游MySQL資料庫配置(源端)
host = "10.186.65.57"
port = 3306
user = "sgy"
password = "admin"
route-rules = ["rule1"] # 對映匹配規則,透過配置相應的規則可以對單個、多個schema或table進行校驗
# 如有多個rule時,可配置成 ["rule1", "rule2"]
[data-sources.mysql2] # 下游MySQL資料庫配置(目標端)
host = "10.186.65.89"
port = 3309
user = "sgy"
password = "admin"
對單個 schema 進行資料校驗
#對映匹配規則部分,需要將此部分放到置配置檔案通用部分的後面
[routes] # 對映關係,如上下游schema不同名可在此配置
[routes.rule1]
schema-pattern = "sbtest" # 匹配上游資料庫的的庫名
target-schema = "sbtest" # 匹配下游資料庫的庫名
[task]
output-dir = "./output"
source-instances = ["mysql1"] # 上游資料庫,內容是 data-sources 宣告的唯一標識 id,分庫分表場景下支援多個上游資料庫,如:["mysql10", "mysql20"]
target-instance = "mysql2" # 下游資料庫,內容是 data-sources 宣告的唯一標識 id
target-check-tables = ["sbtest.*"] # 需要比對的下游資料庫的表
# 進行資料校驗
shell> ./sync_diff_inspector --config=./config.toml
A total of 8 tables need to be compared
Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sbtest`.`sbtest4`` ... equivalent #表結構一致
Comparing the table structure of ``sbtest`.`sbtest5`` ... equivalent
.......................................................
Comparing the table data of ``sbtest`.`sbtest2`` ... equivalent #表中資料一致
Comparing the table data of ``sbtest`.`sbtest8`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 8 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
對多個 schema 進行資料校驗
#由於對多個schema進行資料校驗,routes包含了rule1、rule2,配置檔案通用部分需要做以下修改
[data-sources.mysql1]
route-rules = ["rule1","rule2"]
#對映匹配規則部分,需要將此部分放到置配置檔案通用部分的後面
[routes]
[routes.rule1]
schema-pattern = "sbtest*" #使用正則匹配sbtest開頭的schema,如sbtest,sbtest1,sbtest2,sbtest3
target-schema = "sbtest*"
[routes.rule2]
schema-pattern = "sgy" #匹配schema:sgy
target-schema = "sgy"
[task]
output-dir = "./output"
source-instances = ["mysql1"] # 上游資料庫,內容是 data-sources 宣告的唯一標識 id,分庫分表場景下支援多個上游資料庫,如:["mysql10", "mysql20"]
target-instance = "mysql2" # 下游資料庫,內容是 data-sources 宣告的唯一標識 id
target-check-tables = [ "sbtest*.*","sgy.*"] #對源、目標例項中的sgy及以sbtest開頭的schema所有表進行校驗
#進行資料校驗
shell> ./sync_diff_inspector --config=./config.toml
A total of 24 tables need to be compared
Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sgy`.`sbtest2`` ... equivalent
Comparing the table data of ``sgy`.`sbtest2`` ... equivalent
.......................................................
Comparing the table data of ``sbtest`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest3`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest2`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest1`.`sbtest2`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 24 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
3. 基於 table 的資料校驗
對單個 table 進行資料校驗
#對映匹配規則部分,需要將此部分放到配置檔案通用部分的後面
[routes]
[routes.rule1]
schema-pattern = "sbtest"
target-schema = "sbtest"
[task]
output-dir = "./output"
source-instances = ["mysql1"] # 上游資料庫,內容是 data-sources 宣告的唯一標識 id,分庫分表場景下支援多個上游資料庫,如:["mysql10", "mysql20"]
target-instance = "mysql2" # 下游資料庫,內容是 data-sources 宣告的唯一標識 id
target-check-tables = ["sbtest.sbtest1"] #只校驗表:sbtest.sbtest1
#進行資料校驗
shell> ./sync_diff_inspector --config=./config.toml
A total of 1 tables need to be compared
Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sbtest`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest`.`sbtest1`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 1 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
對多個 table 進行資料校驗
#對schema進行資料校驗也是對多個table進行資料校驗的一種,這裡以指定多個具體表名為例
#由於對多個table進行資料校驗,routes包含了rule1、rule2,配置檔案通用部分需要做以下修改
[data-sources.mysql1]
route-rules = ["rule1","rule2"]
#對映匹配規則部分,需要將此部分放置到配置檔案通用部分的後面
[routes]
[routes.rule1]
schema-pattern = "sbtest*"
target-schema = "sbtest*"
[routes.rule2]
schema-pattern = "sgy"
target-schema = "sgy"
[task]
output-dir = "./output"
source-instances = ["mysql1"] # 上游資料庫,內容是 data-sources 宣告的唯一標識 id,分庫分表場景下支援多個上游資料庫,如:["mysql10", "mysql20"]
target-instance = "mysql2" # 下游資料庫,內容是 data-sources 宣告的唯一標識 id
#只校驗sbtest.sbtest8,sgy.sbtest4,sbtest1.sbtest1,sbtest2.sbtest2這四個表
target-check-tables = [ "sbtest.sbtest8","sgy.sbtest4","sbtest1.sbtest1","sbtest2.sbtest2"]
#進行資料校驗
shell> ./sync_diff_inspector --config=./config.toml
A total of 4 tables need to be compared
Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sbtest2`.`sbtest2`` ... equivalent
Comparing the table data of ``sbtest2`.`sbtest2`` ... equivalent
Comparing the table structure of ``sgy`.`sbtest4`` ... equivalent
Comparing the table structure of ``sbtest1`.`sbtest1`` ... equivalent
Comparing the table data of ``sgy`.`sbtest4`` ... equivalent
Comparing the table data of ``sbtest1`.`sbtest1`` ... equivalent
Comparing the table structure of ``sbtest`.`sbtest8`` ... equivalent
Comparing the table data of ``sbtest`.`sbtest8`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 4 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
對單表進行範圍校驗
#對映匹配規則部分,需要將此部分放置到配置檔案通用部分的後面
[routes]
[routes.rule1]
schema-pattern = "sbtest"
target-schema = "sbtest"
[task]
output-dir = "./output"
source-instances = ["mysql1"] # 上游資料庫,內容是 data-sources 宣告的唯一標識 id,分庫分表場景下支援多個上游資料庫,如:["mysql10", "mysql20"]
target-instance = "mysql2" # 下游資料庫,內容是 data-sources 宣告的唯一標識 id
target-check-tables = ["sbtest.sbtest1"] # 指定校驗目標例項上的sbtest.sbtest1表
target-configs = ["config1"] # 對部分表的額外配置
[table-configs.config1]
target-tables = ["sbtest.sbtest1"] # 指定校驗目標例項上的sbtest.sbtest1表
range = "id > 10 AND id < 100" # 指定校驗目標表的具體範圍,相當於SQL中的where條件
#資料校驗見資料修復功能部分
4. 資料修復功能
#為了演示資料修復,在目標端表中刪除一些資料
shell> mysql -usgy -padmin -h 10.186.65.89 -P 3309 -e "delete from sbtest.sbtest1 where id in (11,22,33,44,55,66,77,88,99);"
# 使用對單表進行範圍校驗配置,對sbtest.sbtest1表進行範圍校驗
shell> ./sync_diff_inspector --config=./config.toml
A total of 1 tables need to be compared
Comparing the table structure of ``sbtest`.`sbtest1`` ... equivalent #sbtest.sbtest1表結構一致
Comparing the table data of ``sbtest`.`sbtest1`` ... failure #sbtest.sbtest1表資料有差異
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
The data of `sbtest`.`sbtest1` is not equal
The rest of tables are all equal.
The patch file has been generated in
'output/fix-on-mysql2/' # 生成修復SQL輸出到此目錄
You can view the comparision details through './output/sync_diff.log'
# 檢視輸出修復SQL
shell> cat output/fix-on-mysql2/sbtest\:sbtest1\:0\:0-0\:0.sql
-- table: sbtest.sbtest1
-- range in sequence: Full
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (11,50148,'69183965773-14680923687-92934799461-07606242492-78811530738-23241332728-92911647895-70477201282-85254929997-06214236905','33737501839-63208420999-35708593012-95906952636-68691055996');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (22,41538,'14140185946-16271766410-68340573738-46226480462-08989140676-29936780681-56784925909-45742390296-67137862436-18242076592','25112986220-19824650341-42825248958-70186905082-33867163574');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (33,50105,'48402160130-78797253227-05588677001-93556313541-39295466587-91364622063-58862572731-27837539373-64526858273-89372384747','72073637794-12055602042-16862397531-87496431032-85451396141');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (44,49917,'98096713237-15265478716-72025332919-62964308854-01270604715-12000922788-50929365082-43513513022-28543412388-57790852446','33907865533-62267179125-36062850111-84091551774-69847376840');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (55,50259,'68826640374-18002055907-53999869701-72145793168-90893177888-85273641163-24331745145-62755454379-79511152711-99618812770','02724012569-91405199011-30257626349-21678066897-42535351703');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (66,41995,'58396552954-26907336026-99506693837-77815822050-42927030403-40927779227-58101279219-11438233008-00344004393-35806649113','02348992414-65327666387-20632806790-74456238429-90933031209');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (77,49757,'73705207329-00308504929-05904865650-29498186065-09990420614-84131302024-40320022420-77358683577-34731688411-70665402097','92567035674-84728177369-79087155038-84461952379-45481760225');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (88,50497,'77112982215-98755241853-36424062009-45217742824-83650985380-60232607362-47569976121-30091332050-97996374956-97911403909','88977046077-43519705750-51246090615-77629911610-94055348738');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (99,49977,'63255973711-62890656114-72914458941-22277906368-32619356110-31219579310-16762665782-69578495131-76043317830-28240408380','08589883275-03392784968-00244590156-39735355951-95769933801');
#將生成的修復SQL匯入目標庫
shell> mysql -usgy -padmin -h 10.186.65.89 -P 3309 < output/fix-on-mysql2/sbtest\:sbtest1\:0\:0-0\:0.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
#再次檢驗,發現資料已經一致
shell> ./sync_diff_inspector --config=./config.toml
A total of 1 tables need to be compared
Comparing the table structure of ``sbtest`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest`.`sbtest1`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 1 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
四、相關問題
修改配置檔案後需要手動刪除 outputDir 目錄
shell> vim config.toml
shell> ./sync_diff_inspector --config=./config.toml #將output-dir刪除即可解決
Fail to initialize config.
failed to init Task: config changes breaking the checkpoint, please use another outputDir and start over again!
表建議使用 utf8mb4 字符集,不支援 MySQL8.0 的 utf8mb3 字符集
mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.30 |
+-----------+
1 row in set (0.00 sec)
mysql> alter table sgy.sbtest4 CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 1
mysql> show create table sgy.sbtest4 \G
*************************** 1. row ***************************
Table: sbtest4
Create Table: CREATE TABLE `sbtest4` (
`id` int NOT NULL AUTO_INCREMENT,
`k` int NOT NULL DEFAULT '0',
`c` char(120) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',
`pad` char(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_4` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb3
1 row in set (0.00 sec)
#執行資料校驗時報錯
shell> ./sync_diff_inspector --config=./config.toml
There is something error when initialize diff, please check log info in output/sync_diff.log
#檢視日誌檔案
shell> cat output/sync_diff.log |grep utf8mb3
[2023/02/19 11:13:04.980 +08:00] [FATAL] [main.go:120] ["failed to initialize diff process"] [error="get table sgy.sbtest4's information error [parser:1115]Unknown character set: 'utf8mb3'\ngithub.com/pingcap/errors.
本文關鍵字:#資料校驗#
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/70024420/viewspace-2937582/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 技術分享 | 兩個單機 MySQL 該如何校驗資料一致性MySql
- 科學甩鍋技術: Typescript 執行時資料校驗TypeScript
- MySQL手動資料校驗+雲資料庫資料校驗MySql資料庫
- Node 在 Controller 層如何進行資料校驗Controller
- 行式填報 資料校驗 --- 小計校驗
- 前端使用bcrypt對密碼加密,伺服器對密碼進行校驗前端密碼加密伺服器
- 使用Mysql工具進行資料清洗MySql
- 火山引擎VeDI資料技術分享:兩個步驟,為Parquet降本提效
- 使用Moya庫,進行https證書校驗HTTP
- spring-boot 使用hibernate validation對引數進行優雅的校驗Springboot
- 技術分享 | MySQL 的幾種資料遷移方案MySql
- 【進階篇】使用 Stream 流對比兩個集合的常用操作分享
- springMVC:校驗框架:多規則校驗,巢狀校驗,分組校驗;ssm整合技術SpringMVC框架巢狀SSM
- java stream()流對兩個集合進行比對Java
- PHP 對資料進行驗證和過濾PHP
- 資料校驗
- SpringBoot介面 - 如何優雅的對引數進行校驗?Spring Boot
- 使用MySQL Workbench進行資料庫備份MySql資料庫
- 使用spring validation 作為資料校驗Spring
- 全網開發都可搭建技術資料庫篇-如何看待LINUX和IOS兩個程式的使用體驗資料庫LinuxiOS
- 封裝axios請求並對提交引數進行校驗封裝iOS
- 好程式設計師大資料培訓分享大資料兩大核心技術程式設計師大資料
- 使用SpringBoot進行優雅的資料驗證Spring Boot
- [資料校驗/資料質量] 資料校驗框架(Java):hibernate-validation框架Java
- 技術分享 | 你通常選用什麼工具在 Linux 中進行資料傳輸?Linux
- easypoi資料校驗
- 技術分享:Python如何進行記憶體管理?Python記憶體
- python的ORM技術:使用sqlalchemy操作mysql資料庫PythonORMMySql資料庫
- 大資料競賽技術分享大資料
- 使用@Validated校驗List集合中資料失效
- Jmeter使用beanshell對資料進行加密傳輸JMeterBean加密
- GaussDB(for MySQL)雲原生資料庫技術演進和挑戰MySql資料庫
- 轉轉業務資料校驗平臺實踐分享
- 使用 locust 對 mysql 語句進行壓測MySql
- [譯] 使用 Pandas 對 Kaggle 資料集進行統計資料分析
- 如何利用區塊鏈技術進行資料儲存?區塊鏈
- 從技術方面來看:資料治理怎樣進行?
- 從技術方面來說,資料治理怎麼進行?