一、前提
國產資料庫的崛起元年,不得不提人大金倉(Kingbase)、南大通用資料庫(Gbase)、達夢資料庫(DM)、華為資料庫(GaussDB)、阿里資料庫(Oceanbase)等,此文章介紹採用datax作為同步人大金倉Kingbase86資料庫的工具。目前github上的datax版本功能僅支援Kingbase82系列產品。而專案上如果要用Kingbase86版本作為資料庫,所以要對Datax原始碼進行二次開發,自己構建Kingbasees86Reader和Kingbasees86Writer外掛。
二、實施
Kingbase的背景不贅述,同樣基於JDBC協議進行遠端連線資料庫並執行相應的SQL語句將資料從KingbaseES庫中SELECT出來,以前玩過Datax工具的同學可以簡單把Kingbase理解成Mysql的同步指令碼。
2.1 Kingbasees86Reader外掛開發
目前Kingbasees86Reader支援大部分KingbaseES型別,但也存在部分個別型別沒有支援的情況,請注意檢查你的型別。
下面列出Kingbasees86Reader針對KingbaseES型別轉換列表:
DataX內部型別 | KingbaseES資料型別 |
Long | bigint, bigserial, integer, smallint, serial |
Double | double precision, money, numeric, real |
String | varchar, char, text, bit, inet |
Date | date, time, timestamp |
Boolean | bool |
Bytes | bytea |
2.1.1 配置樣例
下面是一個從KingbaseES資料庫中同步抽取資料到本地作業的展示指令碼
{ "job": { "setting": { "speed": { //設定傳輸速度,單位為byte/s,DataX執行會盡可能達到該速度但是不超過它. "byte": 1048576 }, //出錯限制 "errorLimit": { //出錯的record條數上限,當大於該值即報錯。 "record": 0, //出錯的record百分比上限 1.0表示100%,0.02表示2% "percentage": 0.02 } }, "content": [ { "reader": { "name": "kingbasees86reader", "parameter": { // 資料庫連線使用者名稱 "username": "xx", // 資料庫連線密碼 "password": "xx", "column": [ "id","name" ], //切分主鍵 "splitPk": "id", "connection": [ { "table": [ "table" ], "jdbcUrl": [ "jdbc:kingbase86://host:port/database" ] } ] } }, "writer": { //writer型別 "name": "streamwriter", //是否列印內容 "parameter": { "print":true, } } } ] } }
純淨版(驗證可執行透過)
{ "job": { "setting": { "speed": { "byte": 1048576 }, "errorLimit": { "record": 0, "percentage": 0.02 } }, "content": [ { "reader": { "name": "kingbasees86reader", "parameter": { "username": "root", "password": "123456", "column": ["id","name"], "splitPk": "id", "connection": [ { "table": [ "t1" ], "jdbcUrl": [ "jdbc:kingbase8://192.168.12.104:54321/test" ] } ] } }, "writer": { "name": "streamwriter", "parameter": { "print":true } } } ] } }
配置一個自定義SQL的資料庫同步任務到本地內容的作業:
{ "job": { "setting": { "speed": 1048576 }, "content": [ { "reader": { "name": "kingbasees86reader", "parameter": { "username": "xx", "password": "xx", "where": "", "connection": [ { "querySql": [ "select db_id,on_line_flag from db_info where db_id < 10;" ], "jdbcUrl": [ "jdbc:kingbase86://host:port/database", "jdbc:kingbase86://host:port/database" ] } ] } }, "writer": { "name": "streamwriter", "parameter": { "print": false, "encoding": "UTF-8" } } } ] } }
2.1.2 程式碼實現
程式碼架構
程式碼-package.xml
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd"> <id></id> <formats> <format>dir</format> </formats> <includeBaseDirectory>false</includeBaseDirectory> <fileSets> <fileSet> <directory>src/main/resources</directory> <includes> <include>plugin.json</include> <include>plugin_job_template.json</include> </includes> <outputDirectory>plugin/reader/kingbasees86reader</outputDirectory> </fileSet> <fileSet> <directory>target/</directory> <includes> <include>kingbasees86reader-0.0.1-SNAPSHOT.jar</include> </includes> <outputDirectory>plugin/reader/kingbasees86reader</outputDirectory> </fileSet> <fileSet> <directory>src/main/libs</directory> <includes> <include>*.*</include> </includes> <outputDirectory>plugin/reader/kingbasees86reader/libs</outputDirectory> </fileSet> </fileSets> <dependencySets> <dependencySet> <useProjectArtifact>false</useProjectArtifact> <outputDirectory>plugin/reader/kingbasees86reader/libs</outputDirectory> <scope>runtime</scope> </dependencySet> </dependencySets> </assembly>
程式碼-Constant
package com.alibaba.datax.plugin.reader.kingbasees86reader; public class Constant { public static final int DEFAULT_FETCH_SIZE = 1000; }
程式碼-Kingbasees86Reader
package com.alibaba.datax.plugin.reader.kingbasees86reader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import java.util.List; public class Kingbasees86Reader extends Reader { private static final DataBaseType DATABASE_TYPE = DataBaseType.KingbaseES86; public static class Job extends Reader.Job { private Configuration originalConfig; private CommonRdbmsReader.Job commonRdbmsReaderMaster; @Override public void init() { this.originalConfig = super.getPluginJobConf(); int fetchSize = this.originalConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, Constant.DEFAULT_FETCH_SIZE); if (fetchSize < 1) { throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE, String.format("您配置的fetchSize有誤,根據DataX的設計,fetchSize : [%d] 設定值不能小於 1.", fetchSize)); } this.originalConfig.set(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize); this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE); this.commonRdbmsReaderMaster.init(this.originalConfig); } @Override public List<Configuration> split(int adviceNumber) { return this.commonRdbmsReaderMaster.split(this.originalConfig, adviceNumber); } @Override public void post() { this.commonRdbmsReaderMaster.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsReaderMaster.destroy(this.originalConfig); } } public static class Task extends Reader.Task { private Configuration readerSliceConfig; private CommonRdbmsReader.Task commonRdbmsReaderSlave; @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId()); this.commonRdbmsReaderSlave.init(this.readerSliceConfig); } @Override public void startRead(RecordSender recordSender) { int fetchSize = this.readerSliceConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE); this.commonRdbmsReaderSlave.startRead(this.readerSliceConfig, recordSender, super.getTaskPluginCollector(), fetchSize); } @Override public void post() { this.commonRdbmsReaderSlave.post(this.readerSliceConfig); } @Override public void destroy() { this.commonRdbmsReaderSlave.destroy(this.readerSliceConfig); } } }
程式碼-plugin.json
{ "name": "kingbasees86reader", "class": "com.alibaba.datax.plugin.reader.kingbasees86reader.Kingbasees86Reader", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba" }
程式碼-plugin_job_template.json
{ "name": "kingbasees86reader", "parameter": { "username": "", "password": "", "connection": [ { "table": [], "jdbcUrl": [] } ] } }
注意1:在根目錄的package.xml檔案下新增
<fileSet> <directory>kingbasees86reader/target/datax/</directory> <includes> <include>**/*.*</include> </includes> <outputDirectory>datax</outputDirectory> </fileSet>
注意2:在根目錄的pom.xml檔案下新增
<module>kingbasees86reader</module>
注意3:在DataBaseType中註冊Reader資訊
2.1.3 打包上傳
可以在根目錄下注釋掉不需要的module,加速打包過程。
將下面的幾個檔案複製到Kingbase安裝目錄下對應的plugin資料夾下
2.1.4 KingbaseES建立測試表
注意:需要先啟動kingbase Server服務以及檢查防火牆是否關閉
啟動Kingbase Server服務
cd /opt/Kingbase/ES/V8/Server/bin
./sys_ctl start -D /opt/Kingbase/ES/V8/data
2.1.5 執行DataX同步指令碼進行測試
2.1.6 可能遇到的問題
Description:[DataX引擎配置錯誤,該問題通常是由於DataX安裝錯誤引起,請聯絡您的運維解決 .]. - 在有總bps限速條件下,單個channel的bps值不能為空,也不能為非正數
解決辦法:
進入DataX安裝目錄,修改檔案datax/conf/core.json
修改core -> transport -> channel -> speed -> “byte”: 2000000,將單個channel的大小改為2MB即可。
相關文件
GitHUb:https://github.com/alibaba/DataX/tree/master
GitEE:https://gitee.com/mirrors/DataX/tree/master
阿里雲Maven倉庫:https://developer.aliyun.com/mvn/search