Advanced Usage of MyDumper/MyLoader

Published by 海布里_MySQL on 2024-04-04

1. Introduction

Starting with version 0.11.5, mydumper provides a --load-data option. SQL files exported with this option no longer contain INSERT statements but LOAD DATA statements. The MySQL reference manual describes LOAD DATA as follows: "When loading a table from a text file, use LOAD DATA. This is usually 20 times faster than using INSERT statements." A 20x speedup over INSERT is hard to reach in practice, but for MyDumper/MyLoader the --load-data option still brings a clearly noticeable improvement in import speed.

2. The --load-data option

The options related to --load-data are listed below. With --load-data, the export generates .dat data files by default; adding the --csv option generates CSV data files instead. You can also customize the field and line delimiters used in the data files; if you do not, the defaults are used, which is usually fine.

--load-data                      Instead of creating INSERT INTO statements, it creates LOAD DATA statements and .dat files
--csv                            Automatically enables --load-data and set variables to export in CSV format.
--fields-terminated-by           Defines the character that is written between fields
--fields-enclosed-by             Defines the character to enclose fields. Default: "
--fields-escaped-by              Single character that is going to be used to escape characters in the LOAD DATA statement, default: '\'
--lines-starting-by              Adds the string at the beginning of each row. When --load-data is used it is added to the LOAD DATA statement. It affects INSERT INTO statements also when it is used.
--lines-terminated-by            Adds the string at the end of each row. When --load-data is used it is added to the LOAD DATA statement. It affects INSERT INTO statements also when it is used.
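As a sketch of how these options combine, the commands below build two hypothetical export invocations (host, credentials, and paths are placeholders, and the delimiter values shown are simply the defaults made explicit). They are echoed rather than executed, since running them needs a live MySQL server.

```shell
# Hypothetical mydumper invocations; host/user/password/paths are placeholders.

# Tab-separated .dat files, spelling out the default delimiters explicitly:
DAT_CMD="mydumper -h 127.0.0.1 -u root -p secret -B test -o /mydata/backup/shemafile \
--load-data --fields-terminated-by '\t' --fields-escaped-by '\\\\' --lines-terminated-by '\n'"

# CSV data files instead; --csv enables --load-data automatically:
CSV_CMD="mydumper -h 127.0.0.1 -u root -p secret -B test -o /mydata/backup/shemafile --csv"

echo "$DAT_CMD"
echo "$CSV_CMD"
```

Since the delimiter values shown are the defaults, the first command is equivalent to passing --load-data alone.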

3. Exporting data

Export a single database using the --load-data option:

mydumper -h $host -u $user -p $password -B test -o /mydata/backup/shemafile --load-data --less-locking

As the exported files below show, each table gets, in addition to the SQL file with its CREATE TABLE statement, an SQL file containing the LOAD DATA statement and a .dat data file holding all of the table's data.

[root@wusl shemafile]# ll
total 7651704
-rw-r--r-- 1 root root        698 Apr  2 11:18 metadata
-rw-r--r-- 1 root root 1958809525 Apr  2 11:18 test.sbtest1.00000.dat
-rw-r--r-- 1 root root        319 Apr  2 11:18 test.sbtest1.00000.sql
-rw-r--r-- 1 root root        402 Apr  2 11:18 test.sbtest1-schema.sql
-rw-r--r-- 1 root root 1958809801 Apr  2 11:18 test.sbtest2.00000.dat
-rw-r--r-- 1 root root        319 Apr  2 11:18 test.sbtest2.00000.sql
-rw-r--r-- 1 root root        402 Apr  2 11:18 test.sbtest2-schema.sql
-rw-r--r-- 1 root root 1958809332 Apr  2 11:18 test.sbtest3.00000.dat
-rw-r--r-- 1 root root        319 Apr  2 11:18 test.sbtest3.00000.sql
-rw-r--r-- 1 root root        402 Apr  2 11:18 test.sbtest3-schema.sql
-rw-r--r-- 1 root root 1958809678 Apr  2 11:18 test.sbtest4.00000.dat
-rw-r--r-- 1 root root        319 Apr  2 11:18 test.sbtest4.00000.sql
-rw-r--r-- 1 root root        402 Apr  2 11:18 test.sbtest4-schema.sql
-rw-r--r-- 1 root root        153 Apr  2 11:18 test-schema-create.sql
-rw-r--r-- 1 root root          0 Apr  2 11:18 test-schema-triggers.sql

[root@wusl shemafile]# more test.sbtest1.00000.dat
1       5014614 68487932199-96439406143-93774651418-41631865787-96406072701-20604855487-25459966574-28203206787-41238978918-19503783441 22195207048-70116052123-74140395089-76317954521-98
694025897
2       5024801 13241531885-45658403807-79170748828-69419634012-13605813761-77983377181-01582588137-21344716829-87370944992-02457486289 28733802923-10548894641-11867531929-71265603657-36
546888392
3       4989423 51185622598-89397522786-28007882305-52050087550-68686337807-48942386476-96555734557-05264042377-33586177817-31986479495 00592560354-80393027097-78244247549-39135306455-88
936868384
4       5026450 54133149494-75722987476-23015721680-47254589498-40242947469-55055884969-23675271222-20181439230-74473404563-55407972672 88488171626-98596569412-94026374972-58040528656-38
000028170

[root@wusl shemafile]# more test.sbtest1.00000.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
LOAD DATA LOCAL INFILE 'test.sbtest1.00000.dat' REPLACE INTO TABLE `sbtest1` CHARACTER SET binary FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`id`,`k`,`c`,`pad`);
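A chunk like this can also be replayed by hand with the mysql client. A sketch follows (host and credentials are placeholders): the .sql file references the .dat file by a relative path, so the client must run from the dump directory, and LOAD DATA LOCAL INFILE additionally requires local_infile=ON on the server and --local-infile=1 on the client.

```shell
# Sketch of replaying a single exported chunk manually; the command is
# echoed rather than executed, since it needs a live MySQL server.
DUMP_DIR=/mydata/backup/shemafile
CHUNK=test.sbtest1.00000.sql

# Run from the dump directory so the relative .dat path inside the
# LOAD DATA statement resolves; --local-infile=1 permits LOCAL INFILE.
REPLAY="cd $DUMP_DIR && mysql --local-infile=1 -h 127.0.0.1 -u root -p test < $CHUNK"
echo "$REPLAY"
```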

4. Import speed comparison

The test tables were generated with sysbench, 9.5 GB of data in total. Compared with the INSERT import, the --load-data import is about 15% faster. That is a real improvement, but not a dramatic one, mainly because myloader already optimizes its INSERT path heavily, for example by batching multiple INSERT statements and committing every 1,000 per transaction.

# Import with --load-data
[root@wusl soft]# time myloader -h 10.xx.xx.xx -u root -p xxxxxxxx -d /mydata/backup/shemafile
** (myloader:119586): WARNING **: 13:36:34.783: zstd command not found on any static location, use --exec-per-thread for non default locations

real    2m49.765s
user    0m4.755s
sys     0m4.599s

# Import with INSERT
[root@wusl soft]# time myloader -h 10.xx.xx.xx -u root -p xxxxxxxx -d /mydata/backup/shemafile
** (myloader:122550): WARNING **: 13:45:00.895: zstd command not found on any static location, use --exec-per-thread for non default locations

real    3m20.258s
user    0m11.874s
sys     0m5.455s
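The INSERT-path batching mentioned above can also be made explicit on the command line. A sketch (host and credentials are placeholders; the values shown are myloader's documented defaults, so passing them changes nothing):

```shell
# Hypothetical myloader invocation spelling out the batching defaults;
# --queries-per-transaction groups statements per commit (default 1000),
# and --threads sets the number of parallel loader threads (default 4).
INSERT_LOAD="myloader -h 127.0.0.1 -u root -p secret \
--queries-per-transaction 1000 --threads 4 -d /mydata/backup/shemafile"
echo "$INSERT_LOAD"
```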

5. Comparison with MySQL Shell's official dump & load utilities

The Dump & Load utilities introduced in MySQL Shell 8.0.21 also use the LOAD DATA command to import data, so in theory their efficiency should match mydumper/myloader with --load-data. For details on MySQL Shell's Dump & Load utilities, see this article by 陳臣: https://mp.weixin.qq.com/s/RC6MykrGbZ850xh3AOjrtw. To compare the two tools fairly, the parameters and environment must be kept consistent. First, MySQL is restarted before each import to empty the buffer pool. MySQL Shell enables compression and table chunking by default on export, so both are disabled. In addition, MySQL Shell writes binlog by default, so myloader is run with the -e option to enable binlog as well.

# Export data with MySQL Shell
MySQL  10.xx.xx.xx:33060+ ssl  JS > util.dumpSchemas(['test'],'/mydata/backup/shemafile',{compression: "none",chunking: "false"})
Acquiring global read lock
Global read lock acquired
Initializing - done 
1 schemas will be dumped and within them 8 tables, 0 views.
Gathering information - done 
All transactions have been started
Locking instance for backup
Global read lock has been released
Writing global DDL files
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done       
Writing DDL - done       
Writing table metadata - done       
Starting data dump
106% (40.00M rows / ~37.59M rows), 323.53K rows/s, 63.45 MB/s                 
Dump duration: 00:01:05s                                     
Total duration: 00:01:05s                                    
Schemas dumped: 1                                            
Tables dumped: 8                                             
Data size: 7.84 GB                                           
Rows written: 40001025                                       
Bytes written: 7.84 GB                                       
Average throughput: 119.52 MB/s

# Import data with MySQL Shell
MySQL  10.xx.xx.xx:33060+ ssl  JS > util.loadDump("/mydata/backup/shemafile")
Loading DDL and Data from '/mydata/backup/shemafile' using 4 threads.
Opening dump...
Target is MySQL 8.0.31. Dump was produced from MySQL 8.0.31
Scanning metadata - done       
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done       
Executing view DDL - done       
Starting data load
2 thds loading | 100% (7.84 GB / 7.84 GB), 24.59 MB/s, 6 / 8 tables done 
Recreating indexes - done       
Executing common postamble SQL                                          
8 chunks (40.00M rows, 7.84 GB) for 8 tables in 1 schemas were loaded in 4 min 6 sec (avg throughput 32.03 MB/s)
0 warnings were reported during the load.

# Import with myloader (binlog enabled)
[root@wusl soft]# time myloader -h 10.xx.xx.xx -u root -p xxxxxxxx -e -d /mydata/backup/shemafile
** (myloader:24020): WARNING **: 15:48:03.796: zstd command not found on any static location, use --exec-per-thread for non default locations

real    3m44.150s
user    0m4.855s
sys     0m4.606s

The experiments above show that myloader is still slightly faster than MySQL Shell's load utility. The likely reason is that myloader is more lightweight, whereas MySQL Shell's utilities offer richer functionality, such as progress and throughput reporting during the migration.
