一文講清楚FusionInsight MRS CDL如何使用

華為雲開發者社群發表於2021-10-06
摘要:CDL是一種簡單、高效的資料實時整合服務,能夠從各種OLTP資料庫中抓取Data Change事件,然後推送至Kafka中,最後由Sink Connector消費Topic中的資料並匯入到大資料生態軟體應用中,從而實現資料的實時入湖。

本文分享自華為雲社群《華為FusionInsight MRS CDL使用指南》,作者:晉紅輕。

說明

CDL是一種簡單、高效的資料實時整合服務,能夠從各種OLTP資料庫中抓取Data Change事件,然後推送至Kafka中,最後由Sink Connector消費Topic中的資料並匯入到大資料生態軟體應用中,從而實現資料的實時入湖。

CDL服務包含了兩個重要的角色:CDLConnector和CDLService。CDLConnector是具體執行資料抓取任務的例項,CDLService是負責管理和建立任務的例項。

本此實踐介紹以mysql作為資料來源進行資料抓取

前提條件

  • MRS叢集已安裝CDL服務。
  • MySQL資料庫需要開啟mysql的bin log功能(預設情況下是開啟的)。

檢視MySQL是否開啟bin log:

使用工具或者命令列連線MySQL資料庫(本示例使用navicat工具連線),執行show variables like 'log_%'命令檢視。

例如在navicat工具選擇"File > New Query"新建查詢,輸入如下SQL命令,單擊"Run"在結果中"log_bin"顯示為"ON"則表示開啟成功。

show variables like 'log_%'
image.png

工具準備

現在cdl只能使用rest api的方式進行命令提交,所以需要提前安裝工具進行除錯。本文使用VSCode工具。
image.png

完成之後安裝rest client外掛:
image.png

完成之後建立一個cdl.http的檔案進行編輯:
image.png

建立CDL任務

CDL任務建立的流程圖如下所示:
image.png

說明:需要先建立一個MySQL link, 在建立一個Kafka link, 然後再建立一個CDL同步任務並啟動。

MySQL link部分rest請求程式碼

@hostname = 172.16.9.113
@port = 21495
@host = {{hostname}}:{{port}}
@bootstrap = "172.16.9.113:21007"
@bootstrap_normal = "172.16.9.113:21005"
@mysql_host = "172.16.2.118"
@mysql_port = "3306"
@mysql_database = "hudi"
@mysql_user = "root"
@mysql_password = "Huawei@123"

### get links
get https://{{host}}/api/v1/cdl/link

### mysql link validate

post https://{{host}}/api/v1/cdl/link?validate=true
content-type: application/json

{
"name": "MySQL_link", //link名,全域性唯一,不能重複
"description":"MySQL connection", //link描述
"link-type":"mysql", //link的型別
"enabled":"true",
"link-config-values":  {
"inputs": [
        { "name": "host", "value": {{mysql_host}} }, //資料庫安裝節點的ip
        { "name": "port", "value": {{mysql_port}} },//資料庫監聽的埠
        { "name": "database.name", "value": {{mysql_database}} }, //連線的資料庫名
        { "name": "user", "value": {{mysql_user}} }, //使用者
        { "name": "password","value": {{mysql_password}} } ,//密碼
        { "name":"schema", "value": {{mysql_database}}}//同資料庫名
        ]
    }
}

### mysql link create

post https://{{host}}/api/v1/cdl/link
content-type: application/json

{
"name": "MySQL_link", //link名,全域性唯一,不能重複
"description":"MySQL connection", //link描述
"link-type":"mysql", //link的型別
"enabled":"true",
"link-config-values":  {
"inputs": [
        { "name": "host", "value": {{mysql_host}} }, //資料庫安裝節點的ip
        { "name": "port", "value": {{mysql_port}} },//資料庫監聽的埠
        { "name": "database.name", "value": {{mysql_database}} }, //連線的資料庫名
        { "name": "user", "value": {{mysql_user}} }, //使用者
        { "name": "password","value": {{mysql_password}} } ,//密碼
        { "name":"schema", "value": {{mysql_database}}}//同資料庫名
        ]
    }
}

### mysql link update

put https://{{host}}/api/v1/cdl/link/MySQL_link
content-type: application/json

{
"name": "MySQL_link", //link名,全域性唯一,不能重複
"description":"MySQL connection", //link描述
"link-type":"mysql", //link的型別
"enabled":"true",
"link-config-values":  {
"inputs": [
        { "name": "host", "value": {{mysql_host}} }, //資料庫安裝節點的ip
        { "name": "port", "value": {{mysql_port}} },//資料庫監聽的埠
        { "name": "database.name", "value": {{mysql_database}} }, //連線的資料庫名
        { "name": "user", "value": {{mysql_user}} }, //使用者
        { "name": "password","value": {{mysql_password}} } ,//密碼
        { "name":"schema", "value": {{mysql_database}}}//同資料庫名
        ]
    }
}

Kafka link部分rest請求程式碼

### get links
get https://{{host}}/api/v1/cdl/link

### kafka link validate

post https://{{host}}/api/v1/cdl/link?validate=true
content-type: application/json

{
"name": "kafka_link",
"description":"test kafka link",
"link-type":"kafka",
"enabled":"true",
"link-config-values":  {
"inputs": [
        { "name": "bootstrap.servers", "value": "172.16.9.113:21007" },
        { "name": "sasl.kerberos.service.name", "value": "kafka" },
        { "name": "security.protocol","value": "SASL_PLAINTEXT" }//安全模式為SASL_PLAINTEXT,普通模式為PLAINTEXT
        ]
    }
}

### kafka link create

post https://{{host}}/api/v1/cdl/link
content-type: application/json

{
"name": "kafka_link",
"description":"test kafka link",
"link-type":"kafka",
"enabled":"true",
"link-config-values":  {
"inputs": [
        { "name": "bootstrap.servers", "value": "172.16.9.113:21007" },
        { "name": "sasl.kerberos.service.name", "value": "kafka" },
        { "name": "security.protocol","value": "SASL_PLAINTEXT" }//安全模式為SASL_PLAINTEXT,普通模式為PLAINTEXT
        ]
    }
}

### kafka link update

put https://{{host}}/api/v1/cdl/link/kafka_link
content-type: application/json

{
"name": "kafka_link",
"description":"test kafka link",
"link-type":"kafka",
"enabled":"true",
"link-config-values":  {
"inputs": [
        { "name": "bootstrap.servers", "value": "172.16.9.113:21007" },
        { "name": "sasl.kerberos.service.name", "value": "kafka" },
        { "name": "security.protocol","value": "SASL_PLAINTEXT" }//安全模式為SASL_PLAINTEXT,普通模式為PLAINTEXT
        ]
    }
}

CDL任務命令部分rest請求程式碼

@hostname = 172.16.9.113
@port = 21495
@host = {{hostname}}:{{port}}
@bootstrap = "172.16.9.113:21007"
@bootstrap_normal = "172.16.9.113:21005"
@mysql_host = "172.16.2.118"
@mysql_port = "3306"
@mysql_database = "hudi"
@mysql_user = "root"
@mysql_password = "Huawei@123"

### create job
post https://{{host}}/api/v1/cdl/job
content-type: application/json

{
    "job_type": "CDL_JOB", //job型別,目前只支援CDL_JOB這一種
    "name": "mysql_to_kafka", //job名稱
    "description":"mysql_to_kafka", //job描述
    "from-link-name": "MySQL_link",  //資料來源Link
    "to-link-name": "kafka_link", //目標源Link
    "from-config-values": {
        "inputs": [
            {"name" : "connector.class", "value" : "com.huawei.cdc.connect.mysql.MysqlSourceConnector"},
            {"name" : "schema", "value" : "hudi"},
            {"name" : "db.name.alias", "value" : "hudi"},
            {"name" : "whitelist", "value" : "hudisource"},
            {"name" : "tables", "value" : "hudisource"},
            {"name" : "tasks.max", "value" : "10"},
            {"name" : "mode", "value" : "insert,update,delete"},
            {"name" : "parse.dml.data", "value" : "true"},
            {"name" : "schema.auto.creation", "value" : "false"},
            {"name" : "errors.tolerance", "value" : "all"},
            {"name" : "multiple.topic.partitions.enable", "value" : "false"},
            {"name" : "topic.table.mapping", "value" : "[
                    {\"topicName\":\"huditableout\", \"tableName\":\"hudisource\"}
                ]"
            },
              {"name" : "producer.override.security.protocol", "value" : "SASL_PLAINTEXT"},//安全模式為SASL_PLAINTEXT,普通模式為PLAINTEXT
            {"name" : "consumer.override.security.protocol", "value" : "SASL_PLAINTEXT"}//安全模式為SASL_PLAINTEXT,普通模式為PLAINTEXT
        ]
    },
    "to-config-values": {"inputs": []},
    "job-config-values": {
        "inputs": [
            {"name" : "global.topic", "value" : "demo"}
        ]
    }
}

### get all job
get https://{{host}}/api/v1/cdl/job
### submit job
put https://{{host}}/api/v1/cdl/job/mysql_to_kafka/start
### get job status
get https://{{host}}/api/v1/cdl/submissions?jobName=mysql_to_kafka
### stop job
put https://{{host}}/api/v1/cdl/job/mysql_to_kafka/submissions/13/stop
### delete job
DELETE https://{{host}}/api/v1/cdl/job/mysql_to_kafka

場景驗證

生產庫MySQL原始資料如下:
image.png

提交CDL任務之後
image.png

增加操作: insert into hudi.hudisource values (11,“蔣語堂”,38,“女”,“圖”,“播放器”,28732);

對應kafka訊息體:
image.png

更改操作: UPDATE hudi.hudisource SET uname=‘Anne Marie333’ WHERE uid=11;

對應kafka訊息體:
image.png

刪除操作:delete from hudi.hudisource where uid=11;

對應kafka訊息體:
image.png

點選關注,第一時間瞭解華為雲新鮮技術~

相關文章