1. Introduction
Starting with mydumper v0.11.5, mydumper provides the --load-data option. With this option, the exported SQL files no longer contain INSERT statements but LOAD DATA statements instead. The MySQL official documentation describes LOAD DATA as follows: "When loading a table from a text file, use LOAD DATA. This is usually 20 times faster than using INSERT statements." Although it is hard to reach that 20x figure in practice, for mydumper/myloader the --load-data option brings a clear improvement in import speed.
2. The --load-data option
The options related to --load-data are listed below. With this option, the export generates .dat data files by default; adding the --csv option generates CSV data files instead. You can also customize the field and line separators used in the data files; if you do not, the defaults are used, and the defaults are usually fine.
--load-data              Instead of creating INSERT INTO statements, it creates LOAD DATA statements and .dat files
--csv                    Automatically enables --load-data and sets variables to export in CSV format
--fields-terminated-by   Defines the character that is written between fields
--fields-enclosed-by     Defines the character used to enclose fields. Default: "
--fields-escaped-by      Single character used to escape characters in the LOAD DATA statement. Default: '\'
--lines-starting-by      Adds the string at the beginning of each row. When --load-data is used it is added to the LOAD DATA statement. It also affects INSERT INTO statements when used.
--lines-terminated-by    Adds the string at the end of each row. When --load-data is used it is added to the LOAD DATA statement. It also affects INSERT INTO statements when used.
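To make the default format concrete, here is a minimal sketch of how the generated .dat files are laid out: fields separated by a tab, rows separated by a newline, with no enclosing character. The `parse_dat` helper and the sample data are hypothetical, for illustration only (a real parser would also have to honor the '\' escape character):

```python
# Minimal sketch: split a mydumper .dat payload using the default
# separators (fields terminated by '\t', lines terminated by '\n').
# NOTE: this naive split ignores the '\\' escape character, so it is
# only correct for data that contains no embedded tabs or newlines.
def parse_dat(text):
    rows = []
    for line in text.rstrip("\n").split("\n"):
        rows.append(line.split("\t"))
    return rows

# Hypothetical sample mimicking sysbench rows (id, k, c, pad):
sample = "1\t5014614\tabc\tpad1\n2\t5024801\tdef\tpad2\n"
print(parse_dat(sample))  # → [['1', '5014614', 'abc', 'pad1'], ['2', '5024801', 'def', 'pad2']]
```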
3. Exporting data
Export a single database with the --load-data option:
mydumper -h $host -u $user -p $password -B test -o /mydata/backup/shemafile --load-data --less-locking
From the export contents below, you can see that for each table, besides the SQL file with the CREATE TABLE statement, there is an SQL file containing the LOAD statement and a .dat data file holding all of the table's rows.
[root@wusl shemafile]# ll
total 7651704
-rw-r--r-- 1 root root        698 Apr  2 11:18 metadata
-rw-r--r-- 1 root root 1958809525 Apr  2 11:18 test.sbtest1.00000.dat
-rw-r--r-- 1 root root        319 Apr  2 11:18 test.sbtest1.00000.sql
-rw-r--r-- 1 root root        402 Apr  2 11:18 test.sbtest1-schema.sql
-rw-r--r-- 1 root root 1958809801 Apr  2 11:18 test.sbtest2.00000.dat
-rw-r--r-- 1 root root        319 Apr  2 11:18 test.sbtest2.00000.sql
-rw-r--r-- 1 root root        402 Apr  2 11:18 test.sbtest2-schema.sql
-rw-r--r-- 1 root root 1958809332 Apr  2 11:18 test.sbtest3.00000.dat
-rw-r--r-- 1 root root        319 Apr  2 11:18 test.sbtest3.00000.sql
-rw-r--r-- 1 root root        402 Apr  2 11:18 test.sbtest3-schema.sql
-rw-r--r-- 1 root root 1958809678 Apr  2 11:18 test.sbtest4.00000.dat
-rw-r--r-- 1 root root        319 Apr  2 11:18 test.sbtest4.00000.sql
-rw-r--r-- 1 root root        402 Apr  2 11:18 test.sbtest4-schema.sql
-rw-r--r-- 1 root root        153 Apr  2 11:18 test-schema-create.sql
-rw-r--r-- 1 root root          0 Apr  2 11:18 test-schema-triggers.sql
[root@wusl shemafile]# more test.sbtest1.00000.dat
1	5014614	68487932199-96439406143-93774651418-41631865787-96406072701-20604855487-25459966574-28203206787-41238978918-19503783441	22195207048-70116052123-74140395089-76317954521-98694025897
2	5024801	13241531885-45658403807-79170748828-69419634012-13605813761-77983377181-01582588137-21344716829-87370944992-02457486289	28733802923-10548894641-11867531929-71265603657-36546888392
3	4989423	51185622598-89397522786-28007882305-52050087550-68686337807-48942386476-96555734557-05264042377-33586177817-31986479495	00592560354-80393027097-78244247549-39135306455-88936868384
4	5026450	54133149494-75722987476-23015721680-47254589498-40242947469-55055884969-23675271222-20181439230-74473404563-55407972672	88488171626-98596569412-94026374972-58040528656-38000028170
[root@wusl shemafile]# more test.sbtest1.00000.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
LOAD DATA LOCAL INFILE 'test.sbtest1.00000.dat' REPLACE INTO TABLE `sbtest1` CHARACTER SET binary FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`id`,`k`,`c`,`pad`);
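The structure of the generated LOAD DATA statement can be sketched as a simple template over the .dat file name, table name, and column list. The `build_load_data` helper below is hypothetical, shown only to make the statement's fixed parts (character set, separators, escape character) explicit:

```python
# Hypothetical helper that reproduces the shape of the LOAD DATA
# statement mydumper generates with default separators.
def build_load_data(datfile, table, columns):
    cols = ",".join(f"`{c}`" for c in columns)
    return (
        f"LOAD DATA LOCAL INFILE '{datfile}' "
        f"REPLACE INTO TABLE `{table}` CHARACTER SET binary "
        "FIELDS TERMINATED BY '\\t' ENCLOSED BY '' ESCAPED BY '\\\\' "
        "LINES STARTING BY '' TERMINATED BY '\\n' "
        f"({cols})"
    )

stmt = build_load_data("test.sbtest1.00000.dat", "sbtest1",
                       ["id", "k", "c", "pad"])
print(stmt)
```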
4. Import speed comparison
The test tables were generated with sysbench, 9.5 GB of data in total. Compared with the INSERT-based import, the load-data import was about 15% faster. That is a real improvement, but not a dramatic one, mainly because myloader already optimizes INSERT-based imports as well, for example by batching multiple INSERTs per commit, 1000 rows at a time.
#--load-data import
[root@wusl soft]# time myloader -h 10.xx.xx.xx -u root -p xxxxxxxx -d /mydata/backup/shemafile
** (myloader:119586): WARNING **: 13:36:34.783: zstd command not found on any static location, use --exec-per-thread for non default locations
real	2m49.765s
user	0m4.755s
sys	0m4.599s

#INSERT import
[root@wusl soft]# time myloader -h 10.xx.xx.xx -u root -p xxxxxxxx -d /mydata/backup/shemafile
** (myloader:122550): WARNING **: 13:45:00.895: zstd command not found on any static location, use --exec-per-thread for non default locations
real	3m20.258s
user	0m11.874s
sys	0m5.455s
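A quick arithmetic check of the ~15% figure, from the real times measured above:

```python
# Real (wall-clock) times from the two myloader runs above.
load_data_s = 2 * 60 + 49.765   # --load-data import: 2m49.765s
insert_s = 3 * 60 + 20.258      # INSERT import:     3m20.258s

# Relative time saved by the load-data import.
speedup_pct = (insert_s - load_data_s) / insert_s * 100
print(round(speedup_pct, 1))  # → 15.2
```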
5. Comparison with MySQL Shell's official dump tools
The Dump & Load utilities introduced in MySQL Shell 8.0.21 also use the LOAD DATA command to import data, so in theory they should be just as efficient as mydumper/myloader with --load-data. For more on MySQL Shell's Dump & Load utilities, see this article by Chen Chen: https://mp.weixin.qq.com/s/RC6MykrGbZ850xh3AOjrtw. To compare the two tools, the comparison has to be reasonably fair, so parameters and environment are kept consistent. First, MySQL is restarted before each import to flush the buffer pool. MySQL Shell enables compression and table chunking on export by default, so both are disabled here. In addition, MySQL Shell writes binlog by default during import, so myloader must also be run with the -e option to enable binlog.
#Export with mysqlshell
 MySQL  10.xx.xx.xx:33060+ ssl  JS > util.dumpSchemas(['test'],'/mydata/backup/shemafile',{compression: "none",chunking: "false"})
Acquiring global read lock
Global read lock acquired
Initializing - done
1 schemas will be dumped and within them 8 tables, 0 views.
Gathering information - done
All transactions have been started
Locking instance for backup
Global read lock has been released
Writing global DDL files
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
106% (40.00M rows / ~37.59M rows), 323.53K rows/s, 63.45 MB/s
Dump duration: 00:01:05s
Total duration: 00:01:05s
Schemas dumped: 1
Tables dumped: 8
Data size: 7.84 GB
Rows written: 40001025
Bytes written: 7.84 GB
Average throughput: 119.52 MB/s

#Import with mysqlshell
 MySQL  10.xx.xx.xx:33060+ ssl  JS > util.loadDump("/mydata/backup/shemafile")
Loading DDL and Data from '/mydata/backup/shemafile' using 4 threads.
Opening dump...
Target is MySQL 8.0.31. Dump was produced from MySQL 8.0.31
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
2 thds loading | 100% (7.84 GB / 7.84 GB), 24.59 MB/s, 6 / 8 tables done
Recreating indexes - done
Executing common postamble SQL
8 chunks (40.00M rows, 7.84 GB) for 8 tables in 1 schemas were loaded in 4 min 6 sec (avg throughput 32.03 MB/s)
0 warnings were reported during the load.

#Import with myloader (binlog enabled)
[root@wusl soft]# time myloader -h 10.xx.xx.xx -u root -p xxxxxxxx -e -d /mydata/backup/shemafile
** (myloader:24020): WARNING **: 15:48:03.796: zstd command not found on any static location, use --exec-per-thread for non default locations
real	3m44.150s
user	0m4.855s
sys	0m4.606s
The experiment above shows that myloader is still slightly faster than the MySQL Shell load utility. I suspect the main reason is that myloader is more lightweight, while the MySQL Shell utilities offer considerably richer functionality, such as reporting progress and throughput during data migration.
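To put a number on "slightly faster", the gap between the two imports above works out to roughly 9%:

```python
# Import times from the comparison above.
myloader_s = 3 * 60 + 44.150   # myloader with -e (binlog): 3m44.150s
shell_s = 4 * 60 + 6           # MySQL Shell util.loadDump: 4 min 6 sec

# Relative time saved by myloader versus MySQL Shell.
diff_pct = (shell_s - myloader_s) / shell_s * 100
print(round(diff_pct, 1))  # → 8.9
```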