在虛擬機器上測評了下MySQL 和 PostgreSQL 的各種LOAD FILE方式以及時間。 因為是虛擬機器上的測評,所以時間只做參考,不要太較真, 看看就好了。

MySQL 工具:

   1. 自帶mysqlimport工具。

   2. 命令列 load data infile …

   3. 利用mysql-connector-python Driver來寫的指令碼。

PostgreSQL 工具:

   1. pgloader 第三方工具。

   2. 命令列 copy … from …

   3. 利用psycopg2寫的python 指令碼。

測試表結構:

mysql> desc t1;
+----------+-----------+------+-----+-------------------+-------+
| Field    | Type      | Null | Key | Default           | Extra |
+----------+-----------+------+-----+-------------------+-------+
| id       | int(11)   | NO   | PRI | NULL              |       |
| rank     | int(11)   | NO   |     | NULL              |       |
| log_time | timestamp | YES  |     | CURRENT_TIMESTAMP |       |
+----------+-----------+------+-----+-------------------+-------+
3 rows in set (0.00 sec)
mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (6.80 sec)

測試CSV檔案:

t1.csv

MySQL 自身的loader: (時間24妙)

mysql> load data infile `/tmp/t1.csv` into table t1 fields terminated by `,` enclosed by `"` lines terminated by `
`;
Query OK, 1000000 rows affected (24.21 sec)
Records: 1000000  Deleted: 0  Skipped: 0  Warnings: 0

MySQL python 指令碼:(時間23秒)

>>>

Running 23.289 Seconds

MySQL 自帶mysqlimport:(時間23秒)

[root@mysql56-master ~]# time mysqlimport t_girl `/tmp/t1.csv` --fields-terminated-by=`,` --fields-enclosed-by=`"` --lines-terminated-by=`
` --use-threads=2 -uroot -proot
t_girl.t1: Records: 1000000  Deleted: 0  Skipped: 0  Warnings: 0
real    0m23.664s
user    0m0.016s
sys     0m0.037s

PostgreSQL 自身COPY:(時間7秒)

t_girl=# copy t1 from `/tmp/t1.csv` with delimiter `,`;
COPY 1000000
Time: 7700.332 ms

Psycopg2 驅動copy_to方法:(時間6秒)

[root@postgresql-instance scripts]# python load_data.py
Running 5.969 Seconds.

Pgloader 匯入CSV:(時間33秒)

[root@postgresql-instance ytt]# pgloader commands.load     
                    table name       read   imported     errors            time
                        ytt.t1    1000000    1000000          0         33.514s
------------------------------  ---------  ---------  ---------  --------------
------------------------------  ---------  ---------  ---------  --------------
             Total import time    1000000    1000000          0         33.514s
Pgloader 直接從MySQL 拉資料:(時間51秒)
[root@postgresql-instance ytt]# pgloader commands.mysql
                    table name       read   imported     errors            time
               fetch meta data          2          2          0          0.138s
------------------------------  ---------  ---------  ---------  --------------
                            t1    1000000    1000000          0         51.136s
------------------------------  ---------  ---------  ---------  --------------
------------------------------  ---------  ---------  ---------  --------------
------------------------------  ---------  ---------  ---------  --------------
             Total import time    1000000    1000000          0         51.274s

附上commands.load和commands.mysql

commands.load:
LOAD CSV 
   FROM `/tmp/ytt.csv` WITH ENCODING UTF-8
        ( 
           id, rank, log_time 
        ) 
   INTO postgresql://t_girl:t_girl@127.0.0.1:5432/t_girl?ytt.t1
   WITH skip header = 0, 
        fields optionally enclosed by `"`, 
        fields escaped by backslash-quote, 
        fields terminated by `,` 
    SET work_mem to `32 MB`, maintenance_work_mem to `64 MB`;
commands.mysql:
LOAD DATABASE 
     FROM mysql://python_user:python_user@192.168.1.131:3306/t_girl?t1
     INTO postgresql://t_girl:t_girl@127.0.0.1:5432/t_girl?ytt.t1
 with data only
  SET maintenance_work_mem to `64MB`, 
      work_mem to `3MB`, 
      search_path to `ytt`;

附pgloader 手冊:

http://pgloader.io/howto/pgloader.1.html