Datax助力輕鬆遷移SQLServer資料至GreatSQL

GreatSQL發表於2024-04-22

1.環境說明

1.1源端SQLSserver

版本 IP
Microsoft SQL Server 2017 192.168.140.160 1433

1.2目標端GreatSQL

版本 IP
GreatSQL-8.0.32 192.168.139.86 3308

2.安裝環境

2.1安裝SQLServer環境

環境說明:藉助Docker使用映象啟動資料庫

2.1.1安裝docker

1.安裝基礎軟體包

$ yum install -y wget net-tools nfs-utils lrzsz gcc gcc-c++ make cmake libxml2-devel openssl-devel curl curl-devel unzip sudo ntp libaio-devel wget vim ncurses-devel autoconf automake zlib-devel python-devel epel-release openssh-server socat ipvsadm conntrack yum-utils

2.配置 docker-ce 國內 yum 源(阿里雲)

$ yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

3.安裝 docker 依賴包

$ yum install -y  device-mapper-persistent-data lvm2 

4.安裝 docker-ce

$ yum install docker-ce -y 

5.啟動容器

$ systemctl start docker && systemctl enable docker 

2.1.2 拉取映象

$ docker pull mcr.microsoft.com/mssql/server:2017-latest 

2.1.3執行容器

$ docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=********" \
-p 1433:1433 --name sqlserver2017 \
-d mcr.microsoft.com/mssql/server:2017-latest 

此處記得密碼設定為複雜密碼

引數解釋:

  • -e "ACCEPT_EULA=Y":預設選擇同意協議許可證

  • -e "SA_PASSWORD=********":設定連線密碼,密碼不能太短太簡單,否則不滿足sqlserver密碼規範,容器會停止執行

  • -p 1433:1433: 宿主機埠對映到容器埠(前者為宿主機)

  • --name sqlserver2017:容器別名

  • -d: 後臺執行

  • mcr.microsoft.com/mssql/server:2017-latest:映象名稱:標籤

2.1.4使用資料庫

1.進入容器

$ docker exec -it sqlserver2017 bash

2.連線資料庫

$ /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "********"

3.查詢資料庫

1> select name from sys.Databases;
2> go

4.建立資料庫

1> create database testdb;
2> go

5.建立表並插入資料

use testdb
create table t1(id int)
go
Insert into t1 values(1),(2)
go

2.2安裝GreatSQL環境

使用Docker映象進行安裝,直接拉取GreatSQL映象即可

$ docker pull greatsql/greatsql

並建立GreatSQL容器

$ docker run -d --name greatsql --hostname=greatsql -e MYSQL_ALLOW_EMPTY_PASSWORD=1 greatsql/greatsql

2.3安裝datax

DataX安裝需要依賴的環境

  • JDK(1.8以上,推薦1.8)

  • Python(推薦Python2.6.X及以上)

安裝步驟,解壓即用,但是解壓完成不做其他操作執行job會報錯

$ cd /soft

$ ll
total 3764708
-rw-r--r--  1 root root  853734462 Dec  9 04:06 datax.tar.gz

$ tar xf datax.tar.gz

$ python /soft/datax/bin/datax.py /soft/datax/job/job.json 

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2023-07-19 11:19:17.483 [main] WARN  ConfigParser - 外掛[streamreader,streamwriter]載入失敗,1s後重試... Exception:Code:[Common-00], Describe:[您提供的配置檔案存在錯誤資訊,請檢查您的作業配置 .] - 配置資訊錯誤,您提供的配置檔案[/soft/datax/plugin/reader/._mysqlreader/plugin.json]不存在. 請檢查您的配置檔案. 
2023-07-19 11:19:18.488 [main] ERROR Engine - 
經DataX智慧分析,該任務最可能的錯誤原因是:
com.alibaba.datax.common.exception.DataXException: Code:[Common-00], Describe:[您提供的配置檔案存在錯誤資訊,請檢查您的作業配置 .] - 配置資訊錯誤,您提供的配置檔案[/soft/datax/plugin/reader/._mysqlreader/plugin.json]不存在. 請檢查您的配置檔案.
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
at com.alibaba.datax.common.util.Configuration.from(Configuration.java:95)
at com.alibaba.datax.core.util.ConfigParser.parseOnePluginConfig(ConfigParser.java:153)
at com.alibaba.datax.core.util.ConfigParser.parsePluginConfig(ConfigParser.java:125)
at com.alibaba.datax.core.util.ConfigParser.parse(ConfigParser.java:63)
at com.alibaba.datax.core.Engine.entry(Engine.java:137)
at com.alibaba.datax.core.Engine.main(Engine.java:204)

解決報錯:刪除plugin目錄下以及plugin/reader和plugin/writer目錄下的所有._開頭的檔案

需要刪除三個目錄下的隱藏檔案

  • plugin/
  • plugin/reader/
  • plugin/writer/
$ rm -rf /opt/app/datax/plugin/._*
$ rm -rf /opt/app/datax/plugin/reader/._*
$ rm -rf /opt/app/datax/plugin/writer/._*

執行一個測例,檢測datax是否安裝成功

$ python /soft/datax/bin/datax.py  /soft/datax/job/job.json 

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
 2023-07-19 11:22:12.298 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-07-19 11:22:12.305 [main] INFO  Engine - the machine info  => 
osInfo: Oracle Corporation 1.8 25.251-b08
jvmInfo: Linux amd64 4.19.25-200.1.el7.bclinux.x86_64
cpu num: 48
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]

MEMORY_NAME                    | allocation_size                | init_size                      
PS Eden Space                  | 256.00MB                       | 256.00MB                       
Code Cache                     | 240.00MB                       | 2.44MB                         
Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
PS Survivor Space              | 42.50MB                        | 42.50MB                        
PS Old Gen                     | 683.00MB                       | 683.00MB                       
Metaspace                      | -0.00MB                        | 0.00MB  

2023-07-19 11:22:12.320 [main] INFO  Engine - 
{"content":[{"reader":{"name":"streamreader",
"parameter":{"column":[
{"type":"string","value":"DataX"},
{"type":"long","value":19890604},
{"type":"date","value":"1989-06-04 00:00:00"},
{"type":"bool","value":true},
{"type":"bytes","value":"test"}
],"sliceRecordCount":100000}
},"writer":{"name":"streamwriter","parameter":{"encoding":"UTF-8","print":false}}}],
"setting":{"errorLimit":{"percentage":0.02,"record":0},
"speed":{"byte":10485760}}}

2023-07-19 11:22:12.336 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-07-19 11:22:12.337 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-07-19 11:22:12.338 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-07-19 11:22:12.339 [main] INFO  JobContainer - Set jobId = 0
2023-07-19 11:22:12.352 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-07-19 11:22:12.352 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2023-07-19 11:22:12.352 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2023-07-19 11:22:12.352 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-07-19 11:22:12.353 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2023-07-19 11:22:12.354 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2023-07-19 11:22:12.354 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2023-07-19 11:22:12.371 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-07-19 11:22:12.375 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-07-19 11:22:12.376 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-07-19 11:22:12.384 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-07-19 11:22:12.388 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-07-19 11:22:12.388 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-07-19 11:22:12.396 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-07-19 11:22:12.697 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[302]ms
2023-07-19 11:22:12.698 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-07-19 11:22:22.402 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.033s | Percentage 100.00%
2023-07-19 11:22:22.402 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-07-19 11:22:22.402 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2023-07-19 11:22:22.403 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2023-07-19 11:22:22.403 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-07-19 11:22:22.403 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-07-19 11:22:22.404 [job-0] INFO  JobContainer - 
 [total cpu info] => 
averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
-1.00%                         | -1.00%                         | -1.00%
 [total gc info] => 
 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2023-07-19 11:22:22.404 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-07-19 11:22:22.404 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.033s | Percentage 100.00%
2023-07-19 11:22:22.406 [job-0] INFO  JobContainer - 
任務啟動時刻                    : 2023-07-19 11:22:12
任務結束時刻                    : 2023-07-19 11:22:22
任務總計耗時                    :                 10s
任務平均流量                    :          253.91KB/s
記錄寫入速度                    :          10000rec/s
讀出記錄總數                    :              100000
讀寫失敗總數                    :                   0

3.SQLServer2GreatSQL全量遷移

3.1 源端(SQLServer)造測試資料

$ docker exec -it 47bd0ed79c26 /bin/bash

$ /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "********"

1> create database testdb
1> use testdb
1> insert into t1 values(1),(2),(3);
2> go
1> select * from t1;
2> go
id         
\-----------
​          1
​          2
​          3

3.2 目標端(GreatSQL)建立表結構

greatsql> create database testdb;
greatsql> use testdb;
greatsql> create table t1 (id int primary key);

3.3 編寫Datax的job檔案

$ cat /soft/datax/job/sqlserver_to_greatsql.json
{
​    "job": {
​        "content": [
​            {
​                "reader": {
​                    "name": "sqlserverreader",
​                    "parameter": {
​                        "connection": [
​                            {
​                                "jdbcUrl": ["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
​                                "table": ["t1"]
​                            }
​                        ],
​                        "password": "********",
​                        "username": "SA",
​                        "column": ["*"]
​                    }
​                },
​                "writer": {
​                    "name": "mysqlwriter",
​                    "parameter": {
​                        "column": ["*"],
​                        "connection": [
​                            {
​                                "jdbcUrl": "jdbc:mysql://10.17.139.86:3308/testdb",
​                                "table": ["t1"]
​                            }
​                       ],
​                        "password": "******",
​                        "session": [],
​                        "username": "admin",
​                        "writeMode": "insert"
​                    }
​                }
​            }
​        ],
​        "setting": {
​            "speed": {
​                "channel": "5"
​            }
​        }
​    }
}

3.4 執行Datax遷移任務

$ python /soft/datax/bin/datax.py /soft/datax/job/sqlserver_to_greatsql.json 

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-11-28 09:58:44.087 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-11-28 09:58:44.104 [main] INFO  Engine - the machine info  => 
osInfo: Oracle Corporation 1.8 25.181-b13
jvmInfo: Linux amd64 3.10.0-957.el7.x86_64

cpu num: 8
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME                    | allocation_size                | init_size                      
PS Eden Space                  | 256.00MB                       | 256.00MB                       
Code Cache                     | 240.00MB                       | 2.44MB                         
Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
PS Survivor Space              | 42.50MB                        | 42.50MB                        
PS Old Gen                     | 683.00MB                       | 683.00MB                       
Metaspace                      | -0.00MB                        | 0.00MB                         

2023-11-28 09:58:44.137 [main] INFO  Engine - 
{
"content":[
{"reader":{
"name":"sqlserverreader",
"parameter":{
"column":["*"],
"connection":[
{"jdbcUrl":["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
"table":["t1"]}],
"password":"*************",
"username":"SA"}},

"writer":{"name":"mysqlwriter","parameter":{"column":["*"],

"connection":[{"jdbcUrl":"jdbc:mysql://10.17.139.86:3308/testdb",
"table":["t1"]}],
"password":"********",
"session":[],
"username":"admin",
"writeMode":"insert"}}}],
"setting":{"speed":{"channel":"5"}}}

2023-11-28 09:58:44.176 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-11-28 09:58:44.179 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-11-28 09:58:44.180 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-11-28 09:58:44.183 [main] INFO  JobContainer - Set jobId = 0
2023-11-28 09:58:44.542 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb.
2023-11-28 09:58:44.544 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置檔案中的列配置存在一定的風險. 因為您未配置讀取資料庫表的列,當您的表欄位個數、型別有變動時,可能影響任務正確性甚至會執行出錯。請檢查您的配置並作出修改.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2023-11-28 09:58:45.099 [job-0] INFO  OriginalConfPretreatmentUtil - table:[t1] all columns:[id].
2023-11-28 09:58:45.099 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置檔案中的列配置資訊存在風險. 因為您配置的寫入資料庫表的列為*,當您的表欄位個數、型別有變動時,可能影響任務正確性甚至會執行出錯。請檢查您的配置並作出修改.
2023-11-28 09:58:45.102 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id) VALUES(?)
], which jdbcUrl like:[jdbc:mysql://10..17.139.86:16310/testdb?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2023-11-28 09:58:45.103 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-11-28 09:58:45.103 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do prepare work 
2023-11-28 09:58:45.104 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2023-11-28 09:58:45.104 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-11-28 09:58:45.105 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2023-11-28 09:58:45.112 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2023-11-28 09:58:45.114 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2023-11-28 09:58:45.135 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-11-28 09:58:45.139 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-11-28 09:58:45.142 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-11-28 09:58:45.151 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-11-28 09:58:45.157 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-11-28 09:58:45.158 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-11-28 09:58:45.173 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-11-28 09:58:45.181 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from t1 
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 09:58:45.398 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from t1 
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 09:58:45.454 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[284]ms
2023-11-28 09:58:45.455 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-11-28 09:58:55.175 [job-0] INFO  StandAloneJobContainerCommunicator - Total 3 records, 3 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 09:58:55.175 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-11-28 09:58:55.175 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2023-11-28 09:58:55.176 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2023-11-28 09:58:55.176 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-11-28 09:58:55.176 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-11-28 09:58:55.177 [job-0] INFO  JobContainer - 
 [total cpu info] => 
averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
-1.00%                         | -1.00%                         | -1.00%
 [total gc info] => 
 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
 PS MarkSweep         | 1                  | 1                  | 1                  | 0.061s             | 0.061s             | 0.061s             
 PS Scavenge          | 1                  | 1                  | 1                  | 0.039s             | 0.039s             | 0.039s             
2023-11-28 09:58:55.177 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-11-28 09:58:55.177 [job-0] INFO  StandAloneJobContainerCommunicator - Total 3 records, 3 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 09:58:55.179 [job-0] INFO  JobContainer - 
任務啟動時刻                    : 2023-11-28 09:58:44
任務結束時刻                    : 2023-11-28 09:58:55
任務總計耗時                    :                 10s
任務平均流量                    :                0B/s
記錄寫入速度                    :              0rec/s
讀出記錄總數                    :                   3
讀寫失敗總數                    :                   0

3.5 到目標端驗證資料

greatsql> select * from t1;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.01 sec)

4. SQLServer to GreatSQL增量遷移

4.1 源端(SQLServer)建立測試資料

2> create table t2 (id int,createtime datetime);
3> go
1> insert into t2 values(1,GETDATE());
2> g
(1 rows affected)
1> insert into t2 values(2,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(3,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(4,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(5,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(6,GETDATE());
2> go
(1 rows affected)
1> select * from t2;
2> go
id          createtime             
---------- -----------------------
​          1 2023-11-28 02:18:20.790
​          2 2023-11-28 02:18:27.040
​          3 2023-11-28 02:18:32.103
​          4 2023-11-28 02:18:37.690
​          5 2023-11-28 02:18:41.450
​          6 2023-11-28 02:18:46.330

4.2 編寫Datax的全量遷移job檔案

$ cat sqlserver_to_greatsql_inc.json 
{
​    "job": {
​        "content": [
​            {
​                "reader": {
​                    "name": "sqlserverreader",
​                    "parameter": {
​                        "connection": [
​                            {
​                                "jdbcUrl": ["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
​                                "table": ["t2"]
​                            }
​                        ],
​                        "password": "********",
​                        "username": "SA",
​                        "column": ["*"]
​                    }
​                },
​                "writer": {
​                    "name": "mysqlwriter",
​                    "parameter": {
​                        "column": ["*"],
​                        "connection": [
​                            {
​                                "jdbcUrl": "jdbc:mysql://10.17.139.86:3308/testdb",
​                                "table": ["t2"]
​                            }
​                        ],
​                        "password": "!QAZ2wsx",
​                        "session": [],
​                        "username": "admin",
​                        "writeMode": "insert"
​                    }
​                }
​            }
​        ],
​        "setting": {
​            "speed": {
​                "channel": "5"
​            }
​        }
​    }
}

4.3 執行Datax全量遷移任務

$ python /soft/datax/bin/datax.py /soft/datax/job/sqlserver_to_greatsql_inc.json 

 DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2023-11-28 10:19:59.279 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-11-28 10:19:59.286 [main] INFO  Engine - the machine info  => 
osInfo: Oracle Corporation 1.8 25.181-b13
jvmInfo: Linux amd64 3.10.0-957.el7.x86_64
cpu num: 8
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]

MEMORY_NAME                    | allocation_size                | init_size                      
PS Eden Space                  | 256.00MB                       | 256.00MB                       
Code Cache                     | 240.00MB                       | 2.44MB                         
Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
PS Survivor Space              | 42.50MB                        | 42.50MB                        
PS Old Gen                     | 683.00MB                       | 683.00MB                       
Metaspace                      | -0.00MB                        | 0.00MB                         

2023-11-28 10:19:59.302 [main] INFO  Engine - 
{"content":[{"reader":{"name":"sqlserverreader","parameter":{"column":[
"*"],"connection":[{"jdbcUrl":["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
"table":["t2"]}],"password":"*************","username":"SA"}},
"writer":{"name":"mysqlwriter","parameter":{"column":["*"],
"connection":[{"jdbcUrl":"jdbc:mysql://10..17.139.86:16310/testdb","table":["t2"]}],
"password":"********",
"session":[],
"username":"admin",
"writeMode":"insert"}}}],
"setting":{"speed":{"channel":"5"}}}

2023-11-28 10:19:59.319 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-11-28 10:19:59.321 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-11-28 10:19:59.321 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-11-28 10:19:59.324 [main] INFO  JobContainer - Set jobId = 0
2023-11-28 10:19:59.629 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb.
2023-11-28 10:19:59.630 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置檔案中的列配置存在一定的風險. 因為您未配置讀取資料庫表的列,當您的表欄位個數、型別有變動時,可能影響任務正確性甚至會執行出錯。請檢查您的配置並作出修改.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2023-11-28 10:20:00.027 [job-0] INFO  OriginalConfPretreatmentUtil - table:[t2] all columns:[
id,createtime].
2023-11-28 10:20:00.027 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置檔案中的列配置資訊存在風險. 因為您配置的寫入資料庫表的列為*,當您的表欄位個數、型別有變動時,可能影響任務正確性甚至會執行出錯。請檢查您的配置並作出修改.
2023-11-28 10:20:00.029 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id,createtime) VALUES(?,?)
], which jdbcUrl like:[jdbc:mysql://10..17.139.86:16310/testdb?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2023-11-28 10:20:00.030 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-11-28 10:20:00.031 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do prepare work .
2023-11-28 10:20:00.031 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2023-11-28 10:20:00.032 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-11-28 10:20:00.032 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2023-11-28 10:20:00.037 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2023-11-28 10:20:00.038 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2023-11-28 10:20:00.060 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-11-28 10:20:00.063 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-11-28 10:20:00.066 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-11-28 10:20:00.073 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-11-28 10:20:00.080 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-11-28 10:20:00.080 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-11-28 10:20:00.093 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-11-28 10:20:00.101 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from t2 
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:20:00.262 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from t2 
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:20:00.334 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[243]ms
2023-11-28 10:20:00.335 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-11-28 10:20:10.087 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6 records, 54 bytes | Speed 5B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 10:20:10.088 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-11-28 10:20:10.088 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2023-11-28 10:20:10.089 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2023-11-28 10:20:10.090 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-11-28 10:20:10.091 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-11-28 10:20:10.094 [job-0] INFO  JobContainer - 
 [total cpu info] => 
averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
-1.00%                         | -1.00%                         | -1.00%                      
 [total gc info] => 
 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
 PS MarkSweep         | 1                  | 1                  | 1                  | 0.034s             | 0.034s             | 0.034s             
 PS Scavenge          | 1                  | 1                  | 1                  | 0.031s             | 0.031s             | 0.031s             

2023-11-28 10:20:10.094 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-11-28 10:20:10.095 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6 records, 54 bytes | Speed 5B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 10:20:10.097 [job-0] INFO  JobContainer - 
任務啟動時刻                    : 2023-11-28 10:19:59
任務結束時刻                    : 2023-11-28 10:20:10
任務總計耗時                    :                 10s
任務平均流量                    :                5B/s
記錄寫入速度                    :              0rec/s
讀出記錄總數                    :                   6
讀寫失敗總數                    :                   0

4.4 驗證全量遷移的資料

greatsql> select * from t2;
+----+---------------------+
| id | createtime          |
+----+---------------------+
|  1 | 2023-11-28 02:18:21 |
|  2 | 2023-11-28 02:18:27 |
|  3 | 2023-11-28 02:18:32 |
|  4 | 2023-11-28 02:18:38 |
|  5 | 2023-11-28 02:18:41 |
|  6 | 2023-11-28 02:18:46 |
+----+---------------------+ 

還可以用 checksum table x 來驗證,大表就不能全表select * 了

4.5 源端(SQLServer)插入增量資料

2> insert into t2 values(7,'202311-28 03:18:46.330');
3> go
Changed database context to 'jem_db'.
(1 rows affected)
1> insert into t2 values(8,'2023-11-28 03:20:46.330');
2> go
(1 rows affected)
1> insert into t2 values(9,'2023-11-28 03:25:46.330');
2> go
(1 rows affected)
1> insert into t2 values(10,'2023-11-28 03:30:46.330');
2> go
(1 rows affected)
1> select * from t2;
2> go
id          createtime             
----------- -----------------------
​          1 2023-11-28 02:18:20.790
​          2 2023-11-28 02:18:27.040
​          3 2023-11-28 02:18:32.103
​          4 2023-11-28 02:18:37.690
​          5 2023-11-28 02:18:41.450
​          6 2023-11-28 02:18:46.330
​          7 2023-11-28 03:18:46.330
​          8 2023-11-28 03:20:46.330
​          9 2023-11-28 03:25:46.330
​         10 2023-11-28 03:30:46.330

4.6 編寫Datax增量遷移job檔案

$ cat sqlserver_to_greatsql_inc.json 
{
​    "job": {
​        "content": [
​            {
​                "reader": {
​                    "name": "sqlserverreader",
​                    "parameter": {
​                        "connection": [
​                            {
​                                "jdbcUrl": ["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
​                                "table": ["t2"]
​                            }
​                        ],
​                        "password": "********",
​                        "username": "SA",
​                        "column": ["*"],
​                        "where":"createtime > '${start_time}' and createtime < '${end_time}'"
​                    }
​                },
​                "writer": {
​                    "name": "mysqlwriter",
​                    "parameter": {
​                        "column": ["*"],
​                        "connection": [
​                            {
​                                "jdbcUrl": "jdbc:mysql://10..17.139.86:16310/testdb",
​                                "table": ["t2"]
​                            }
​                        ],
​                        "password": "!QAZ2wsx",
​                        "session": [],
​                        "username": "admin",
​                        "writeMode": "insert"
​                    }
​                }
​            }
​        ],
​        "setting": {
​            "speed": {
​                "channel": "5"
​            }
​        }
​    }
}

4.7 執行Datax增量遷移任務

$ python /soft/datax/bin/datax.py /soft/datax/job/sqlserver_to_mysql_inc.json -p "-Dstart_time='2023-11-28 03:17:46.330' -Dend_time='2023-11-28 03:31:46.330'"

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-11-28 10:29:24.492 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl

2023-11-28 10:29:24.504 [main] INFO  Engine - the machine info  => 

osInfo: Oracle Corporation 1.8 25.181-b13
jvmInfo: Linux amd64 3.10.0-957.el7.x86_64
cpu num: 8

totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME                    | allocation_size                | init_size                      
PS Eden Space                  | 256.00MB                       | 256.00MB                       
Code Cache                     | 240.00MB                       | 2.44MB                         
Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
PS Survivor Space              | 42.50MB                        | 42.50MB                        
PS Old Gen                     | 683.00MB                       | 683.00MB                       
Metaspace                      | -0.00MB                        | 0.00MB                         

2023-11-28 10:29:24.524 [main] INFO  Engine - 
{"content":[{"reader":{"name":"sqlserverreader","parameter":{"column":["*"],
"connection":[{"jdbcUrl":["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
"table":["t2"]}],"password":"*************","username":"SA",
"where":"createtime > '2023-11-28 03:17:46.330' and createtime < '2023-11-28 03:31:46.330'"}},
"writer":{"name":"mysqlwriter","parameter":{"column":["*"],"connection":[{"jdbcUrl":"jdbc:mysql://10..17.139.86:16310/testdb","table":["t2"]}],
"password":"********",
"session":[],
"username":"admin",
"writeMode":"insert"}}}],
"setting":{"speed":{"channel":"5"}}}

2023-11-28 10:29:24.542 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-11-28 10:29:24.544 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-11-28 10:29:24.544 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-11-28 10:29:24.546 [main] INFO  JobContainer - Set jobId = 0
2023-11-28 10:29:24.830 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb.
2023-11-28 10:29:24.831 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置檔案中的列配置存在一定的風險. 因為您未配置讀取資料庫表的列,當您的表欄位個數、型別有變動時,可能影響任務正確性甚至會執行出錯。請檢查您的配置並作出修改.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2023-11-28 10:29:25.113 [job-0] INFO  OriginalConfPretreatmentUtil - table:[t2] all columns:[id,createtime].
2023-11-28 10:29:25.113 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置檔案中的列配置資訊存在風險. 因為您配置的寫入資料庫表的列為*,當您的表欄位個數、型別有變動時,可能影響任務正確性甚至會執行出錯。請檢查您的配置並作出修改.
2023-11-28 10:29:25.115 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id,createtime) VALUES(?,?)
], which jdbcUrl like:[jdbc:mysql://10..17.139.86:16310/testdb?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2023-11-28 10:29:25.116 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-11-28 10:29:25.117 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do prepare work .
2023-11-28 10:29:25.117 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2023-11-28 10:29:25.118 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-11-28 10:29:25.118 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2023-11-28 10:29:25.123 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2023-11-28 10:29:25.124 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2023-11-28 10:29:25.146 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-11-28 10:29:25.150 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-11-28 10:29:25.153 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-11-28 10:29:25.159 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-11-28 10:29:25.165 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-11-28 10:29:25.165 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-11-28 10:29:25.176 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-11-28 10:29:25.183 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from t2 where (createtime > '2023-11-28 03:17:46.330' and createtime < '2023-11-28 03:31:46.330')
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:29:25.344 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from t2 where (createtime > '2023-11-28 03:17:46.330' and createtime < '2023-11-28 03:31:46.330')
] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:29:25.606 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[431]ms
2023-11-28 10:29:25.607 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-11-28 10:29:35.173 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 37 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 10:29:35.173 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-11-28 10:29:35.174 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2023-11-28 10:29:35.175 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2023-11-28 10:29:35.175 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-11-28 10:29:35.177 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-11-28 10:29:35.179 [job-0] INFO  JobContainer - 

 [total cpu info] => 
averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
-1.00%                         | -1.00%                         | -1.00%

 [total gc info] => 
 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
 PS MarkSweep         | 1                  | 1                  | 1                  | 0.052s             | 0.052s             | 0.052s             
 PS Scavenge          | 1                  | 1                  | 1                  | 0.024s             | 0.024s             | 0.024s             
2023-11-28 10:29:35.180 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-11-28 10:29:35.181 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 37 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%

2023-11-28 10:29:35.183 [job-0] INFO  JobContainer - 
任務啟動時刻                    : 2023-11-28 10:29:24
任務結束時刻                    : 2023-11-28 10:29:35
任務總計耗時                    :                 10s
任務平均流量                    :                3B/s
記錄寫入速度                    :              0rec/s
讀出記錄總數                    :                   4
讀寫失敗總數                    :                   0

4.8 到目標端(GreatSQL)驗證增量資料

greatsql> select * from t2;

+----+---------------------+
| id | createtime          |
+----+---------------------+
|  1 | 2023-11-28 02:18:21 |
|  2 | 2023-11-28 02:18:27 |
|  3 | 2023-11-28 02:18:32 |
|  4 | 2023-11-28 02:18:38 |
|  5 | 2023-11-28 02:18:41 |
|  6 | 2023-11-28 02:18:46 |
|  7 | 2023-11-28 03:18:46 |
|  8 | 2023-11-28 03:20:46 |
|  9 | 2023-11-28 03:25:46 |
| 10 | 2023-11-28 03:30:46 |
+----+---------------------+
10 rows in set (0.00 sec)

增量遷移總結:透過增加過濾條件達到增量遷移的目的。主要是透過過濾條件過濾掉了全量遷移的資料,進而變相的完成了增量遷移。


Enjoy GreatSQL 😃

關於 GreatSQL

GreatSQL是適用於金融級應用的國內自主開源資料庫,具備高效能、高可靠、高易用性、高安全等多個核心特性,可以作為MySQL或Percona Server的可選替換,用於線上生產環境,且完全免費併相容MySQL或Percona Server。

相關連結: GreatSQL社群 Gitee GitHub Bilibili

GreatSQL社群:

社群部落格有獎徵稿詳情:https://greatsql.cn/thread-100-1-1.html

image-20230105161905827

技術交流群:

微信:掃碼新增GreatSQL社群助手微信好友,傳送驗證資訊加群

image-20221030163217640

相關文章