DataX二次開發詳解-Kingbasees86Reader、Kingbasees86Writer外掛

明明就-發表於2024-04-23

一、前提

國產資料庫的崛起元年,不得不提人大金倉(Kingbase)、南大通用資料庫(Gbase)、達夢資料庫(DM)、華為資料庫(GaussDB)、阿里資料庫(Oceanbase)等,此文章介紹採用datax作為同步人大金倉Kingbase86資料庫的工具。目前github上的datax版本功能僅支援Kingbase82系列產品。而專案上如果要用Kingbase86版本作為資料庫,所以要對Datax原始碼進行二次開發,自己構建Kingbasees86Reader和Kingbasees86Writer外掛。

二、實施

Kingbase的背景不贅述,同樣基於JDBC協議進行遠端連線資料庫並執行相應的SQL語句將資料從KingbaseES庫中SELECT出來,以前玩過Datax工具的同學可以簡單把Kingbase理解成Mysql的同步指令碼。

2.1 Kingbasees86Reader外掛開發

目前Kingbasees86Reader支援大部分KingbaseES型別,但也存在部分個別型別沒有支援的情況,請注意檢查你的型別。

下面列出Kingbasees86Reader針對KingbaseES型別轉換列表:

DataX內部型別 KingbaseES資料型別
Long bigint, bigserial, integer, smallint, serial
Double double precision, money, numeric, real
String varchar, char, text, bit, inet
Date date, time, timestamp
Boolean bool
Bytes bytea

2.1.1 配置樣例

下面是一個從KingbaseES資料庫中同步抽取資料到本地作業的展示指令碼

{
    "job": {
        "setting": {
            "speed": {
            //設定傳輸速度,單位為byte/s,DataX執行會盡可能達到該速度但是不超過它.
                 "byte": 1048576
            },
            //出錯限制
                "errorLimit": {
                //出錯的record條數上限,當大於該值即報錯。
                "record": 0,
                //出錯的record百分比上限 1.0表示100%,0.02表示2%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "kingbasees86reader",
                    "parameter": {
                        // 資料庫連線使用者名稱
                        "username": "xx",
                        // 資料庫連線密碼
                        "password": "xx",
                        "column": [
                            "id","name"
                        ],
                        //切分主鍵
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
     "jdbc:kingbase86://host:port/database"
                                ]
                            }
                        ]
                    }
                },
               "writer": {
                    //writer型別
                    "name": "streamwriter",
                    //是否列印內容
                    "parameter": {
                        "print":true,
                    }
                }
            }
        ]
    }
}

  

純淨版(驗證可執行透過)

{
    "job": {
        "setting": {
            "speed": {
                 "byte": 1048576
            },
                "errorLimit": {
                 "record": 0,
          "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "kingbasees86reader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "column": ["id","name"],
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "t1"
                                ],
                                "jdbcUrl": [
                                    "jdbc:kingbase8://192.168.12.104:54321/test"
                                ]
                            }
                        ]
                    }
                },
               "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print":true
                    }
                }
            }
        ]
    }
}

  

配置一個自定義SQL的資料庫同步任務到本地內容的作業:

{
    "job": {
        "setting": {
            "speed": 1048576
        },
        "content": [
            {
                "reader": {
                    "name": "kingbasees86reader",
                    "parameter": {
                        "username": "xx",
                        "password": "xx",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10;"
                                ],
                                "jdbcUrl": [
                                    "jdbc:kingbase86://host:port/database", "jdbc:kingbase86://host:port/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}

  

2.1.2 程式碼實現

程式碼架構

程式碼-package.xml

<assembly
        xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <id></id>
    <formats>
        <format>dir</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>src/main/resources</directory>
            <includes>
                <include>plugin.json</include>
                <include>plugin_job_template.json</include>
            </includes>
            <outputDirectory>plugin/reader/kingbasees86reader</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>target/</directory>
            <includes>
                <include>kingbasees86reader-0.0.1-SNAPSHOT.jar</include>
            </includes>
            <outputDirectory>plugin/reader/kingbasees86reader</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>src/main/libs</directory>
            <includes>
                <include>*.*</include>
            </includes>
            <outputDirectory>plugin/reader/kingbasees86reader/libs</outputDirectory>
        </fileSet>
    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>false</useProjectArtifact>
            <outputDirectory>plugin/reader/kingbasees86reader/libs</outputDirectory>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>

  

程式碼-Constant

package com.alibaba.datax.plugin.reader.kingbasees86reader;

public class Constant {

    public static final int DEFAULT_FETCH_SIZE = 1000;

}

  

程式碼-Kingbasees86Reader

package com.alibaba.datax.plugin.reader.kingbasees86reader;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;

import java.util.List;

public class Kingbasees86Reader extends Reader {

    private static final DataBaseType DATABASE_TYPE = DataBaseType.KingbaseES86;

    public static class Job extends Reader.Job {

        private Configuration originalConfig;
        private CommonRdbmsReader.Job commonRdbmsReaderMaster;

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            int fetchSize = this.originalConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE,
                    Constant.DEFAULT_FETCH_SIZE);
            if (fetchSize < 1) {
                throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE,
                        String.format("您配置的fetchSize有誤,根據DataX的設計,fetchSize : [%d] 設定值不能小於 1.", fetchSize));
            }
            this.originalConfig.set(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize);

            this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE);
            this.commonRdbmsReaderMaster.init(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            return this.commonRdbmsReaderMaster.split(this.originalConfig, adviceNumber);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderMaster.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderMaster.destroy(this.originalConfig);
        }

    }

    public static class Task extends Reader.Task {

        private Configuration readerSliceConfig;
        private CommonRdbmsReader.Task commonRdbmsReaderSlave;

        @Override
        public void init() {
            this.readerSliceConfig = super.getPluginJobConf();
            this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId());
            this.commonRdbmsReaderSlave.init(this.readerSliceConfig);
        }

        @Override
        public void startRead(RecordSender recordSender) {
            int fetchSize = this.readerSliceConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE);

            this.commonRdbmsReaderSlave.startRead(this.readerSliceConfig, recordSender,
                    super.getTaskPluginCollector(), fetchSize);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderSlave.post(this.readerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderSlave.destroy(this.readerSliceConfig);
        }

    }

}

  

程式碼-plugin.json

{
  "name": "kingbasees86reader",
  "class": "com.alibaba.datax.plugin.reader.kingbasees86reader.Kingbasees86Reader",
  "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.",
  "developer": "alibaba"
}

  

程式碼-plugin_job_template.json

{
  "name": "kingbasees86reader",
  "parameter": {
    "username": "",
    "password": "",
    "connection": [
      {
        "table": [],
        "jdbcUrl": []
      }
    ]
  }
}

  

注意1:在根目錄的package.xml檔案下新增

		<fileSet>
            <directory>kingbasees86reader/target/datax/</directory>
            <includes>
                <include>**/*.*</include>
            </includes>
            <outputDirectory>datax</outputDirectory>
        </fileSet>

  

注意2:在根目錄的pom.xml檔案下新增

        <module>kingbasees86reader</module>

  

注意3:在DataBaseType中註冊Reader資訊

2.1.3 打包上傳

可以在根目錄下注釋掉不需要的module,加速打包過程。

將下面的幾個檔案複製到Kingbase安裝目錄下對應的plugin資料夾下

2.1.4 KingbaseES建立測試表

注意:需要先啟動kingbase Server服務以及檢查防火牆是否關閉

啟動Kingbase Server服務

cd /opt/Kingbase/ES/V8/Server/bin

./sys_ctl start -D /opt/Kingbase/ES/V8/data

2.1.5 執行DataX同步指令碼進行測試

2.1.6 可能遇到的問題

Description:[DataX引擎配置錯誤,該問題通常是由於DataX安裝錯誤引起,請聯絡您的運維解決 .]. - 在有總bps限速條件下,單個channel的bps值不能為空,也不能為非正數

解決辦法:

進入DataX安裝目錄,修改檔案datax/conf/core.json

修改core -> transport -> channel -> speed -> “byte”: 2000000,將單個channel的大小改為2MB即可。

相關文件

GitHUb:https://github.com/alibaba/DataX/tree/master

GitEE:https://gitee.com/mirrors/DataX/tree/master

阿里雲Maven倉庫:https://developer.aliyun.com/mvn/search

相關文章