分庫分表—4.資料遷移系統文件

东阳马生架构發表於2024-12-07

原文網址 : https://www.cnblogs.com/mjunz/p/18592749

大綱

1.資料庫設計

2.列舉類

3.接⼝設計

4.定時任務設計

(1)定時核對校驗資料的定時任務

(2)資料量統計定時任務

(3)增量資料落地定時任務

(4)失敗重試定時任務

5.技術亮點

(1)滾動拉取方案

(2)巧妙的統計滾動進度方案

(3)防止增量同步資料丟失和高效寫入方案

(4)資料擴容方案

6.全量同步和增量同步整體流程圖

7.功能升級

(1)資料遷移系統資料來源動態化配置

(2)遷移資料庫操作對應的xml動態⽣成

(3)擴容遷移資料時的問題

1.資料庫設計

(1)訂單表——order_info

create table order_info (
    id bigint(32) auto_increment,
    order_no varchar(32) not null comment '訂單號',
    order_amount decimal(8, 2) not null comment '訂單⾦額',
    merchant_id bigint(32) not null comment '商戶ID',
    user_id bigint(32) not null comment '⽤戶ID',
    order_freight decimal(8, 2) default 0.00 not null comment '運費',
    order_status tinyint(3) default 0 not null comment '訂單狀態：10待付款，20待接單，30已接單，40配送中，50已完成，55部分退款，60全部退款，70取消訂單',
    trans_time timestamp default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment '交易時間',
    pay_status tinyint(3) default 2 not null comment '⽀付狀態：1待⽀付、2⽀付成功、3⽀付失敗',
    recharge_time timestamp default CURRENT_TIMESTAMP not null comment '⽀付完成時間',
    pay_amount decimal(8, 2) default 0.00 not null comment '實際⽀付⾦額',
    pay_discount_amount decimal(8, 2) default 0.00 not null comment '⽀付優惠⾦額',
    address_id bigint(32) not null comment '收貨地址ID',
    delivery_type tinyint(3) default 2 not null comment '配送⽅式：1⾃提，2配送',
    delivery_status tinyint(3) default 0 null comment '配送狀態：0 配送中，2已送達，3待收貨，4已送達',
    delivery_expect_time timestamp null comment '配送預計送達時間',
    delivery_complete_time timestamp null comment '配送送達時間',
    delivery_amount decimal(8, 2) default 0.00 not null comment '配送運費',
    coupon_id bigint(32) null comment '優惠券id',
    cancel_time timestamp null comment '訂單取消時間',
    confirm_time timestamp null comment '訂單確認時間',
    remark varchar(512) null comment '訂單備註留⾔',
    create_user bigint(32) null comment '建立⽤戶',
    update_user bigint(32) null comment '更新⽤戶',
    create_time timestamp default CURRENT_TIMESTAMP not null comment '建立時間',
    update_time timestamp null on update CURRENT_TIMESTAMP comment '更新時間',
    delete_flag tinyint default 0 not null comment '邏輯刪除標記',
    primary key (id, order_no)
) comment '訂單表'; 
create index inx_order_no on order_info (order_no); 
create index inx_create_time on order_info (create_time, order_no);

(2)訂單詳情表——order_info

-- auto-generated definition 
create table order_item_detail (
    id bigint(32) auto_increment primary key,
    order_no varchar(32) not null comment '訂單號',
    product_id bigint(32) not null comment '商品ID',
    category_id bigint(32) not null comment '商品分類ID',
    goods_num int(8) default 1 not null comment '商品購買數量',
    goods_price decimal(8, 2) default 0.00 not null comment '商品單價',
    goods_amount decimal(8, 2) default 0.00 not null comment '商品總價',
    product_name varchar(64) null comment '商品名',
    discount_amount decimal(8, 2) default 0.00 not null comment '商品優惠⾦額',
    discount_id bigint(32) null comment '參與活動ID',
    product_picture_url varchar(128) null comment '商品圖⽚',
    create_user bigint(32) null comment '建立⽤戶',
    update_user bigint(32) null comment '更新⽤戶',
    create_time timestamp default CURRENT_TIMESTAMP not null comment '建立時間',
    update_time timestamp default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment '更新時間',
    delete_flag tinyint default 0 not null comment '邏輯刪除標記' 
) comment '訂單明細表' charset = utf8; 
create index inx_create_time on order_item_detail (create_time, order_no); 
create index inx_item_order_no on order_item_detail (order_no);

(3)binlog訊息消費記錄表——etl_binlog_consume_record

create table etl_binlog_consume_record (
    id bigint auto_increment comment '主鍵' primary key,
    queue_id int null comment '訊息佇列id（即：queueId）',
    offset bigint null comment '訊息偏移量（唯⼀定位該訊息在佇列中的位置）',
    topic varchar(500) null comment '訊息所屬主題',
    broker_name varchar(255) null comment '訊息所在broker名稱',
    consume_status tinyint(1) null comment '消費狀態：0未消費，1消費成功，2已提交',
    create_time datetime null comment '記錄建立時間',
    update_time datetime null comment '記錄更新時間',
    constraint queue_id unique (queue_id, offset)
) comment 'binlog訊息消費記錄表' charset = utf8mb4;

(4)遷移明細表——etl_dirty_record

create table etl_dirty_record (
    id bigint auto_increment comment '主鍵' primary key,
    logic_model varchar(255) not null comment '邏輯模型名(邏輯表或模型名稱)',
    ticket varchar(32) not null comment '遷移批次',
    cur_ticket_stage int(10) not null comment '當前所屬批次階段號',
    record_key varchar(60) not null comment '欄位名',
    record_value varchar(128) null comment '欄位值',
    status int(12) null comment '遷移狀態',
    error_msg varchar(500) null comment '錯誤訊息',
    retry_times int(12) null comment '已重試次數',
    last_retry_time datetime null comment '上次重試時間',
    is_deleted tinyint(1) default 0 null comment '0未被刪除，1已刪除',
    create_time datetime default CURRENT_TIMESTAMP not null comment '建立時間',
    update_time datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment '修改時間',
    sync_size int default 0 null comment '每次同步數量' 
) comment '遷移明細表' charset = utf8mb4;

(5)遷移表——etl_progress

create table etl_progress (
    id bigint auto_increment comment '主鍵' primary key,
    logic_model varchar(255) null comment '邏輯模型名(邏輯表或模型名稱)',
    ticket varchar(32) null comment '遷移批次',
    cur_ticket_stage int(10) null comment '當前所屬批次階段號',
    progress_type int(10) null comment '進度型別(0滾動查詢資料，1核對查詢資料)',
    status int(12) null comment '遷移狀態：1同步中,2同步完成,3同步失敗',
    retry_times int default 0 null comment '已同步次數',
    finish_record bigint default 0 null comment '已完成記錄數',
    scroll_id varchar(100) default '0' null comment '記錄上⼀次滾動最後記錄欄位值',
    scroll_time datetime null comment '開始滾動時間',
    scroll_end_time datetime null comment '滾動截⽌時間',
    is_deleted tinyint(1) default 0 null comment '0:未被刪除,1:已刪除',
    create_time datetime default CURRENT_TIMESTAMP not null comment '建立時間',
    update_time datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment '修改時間' 
) comment '遷移表' charset = utf8mb4;

(6)遷移配置表——etl_progress_config

create table etl_progress_config (
    id bigint auto_increment comment '主鍵' primary key,
    logic_model varchar(255) null comment '邏輯模型名(邏輯表或模型名稱)',
    record_key varchar(32) null comment '遷移批次模型欄位名稱',
    record_type int(10) null comment '遷移欄位匹配型別(0唯⼀欄位,1查詢匹配欄位)',
    is_deleted tinyint(1) default 0 null comment '0:未被刪除,1:已刪除',
    create_time datetime default CURRENT_TIMESTAMP not null comment '建立時間',
    update_time datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment '修改時間' 
) comment '遷移配置表' charset = utf8mb4;

(7)需要遷移的表的資料量統計表——etl_statistical

create table etl_statistical (
    id bigint auto_increment comment '主鍵' primary key,
    logic_model varchar(255) null comment '邏輯模型名(邏輯表或模型名稱)',
    statistical_count bigint null comment '統計資料量',
    statistical_time int(8) null comment '統計時間(按天為單位)',
    is_deleted tinyint(1) default 0 null comment '0:未被刪除，1:已刪除',
    create_time datetime default CURRENT_TIMESTAMP not null comment '建立時間',
    update_time datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment '修改時間' 
) comment '需要遷移的表的資料量統計表' charset = utf8mb4; 
create index inx_logic_model on etl_statistical (logic_model); 
create index inx_statistical_time on etl_statistical (statistical_time);

2.列舉類

(1)增量資料同步的操作型別——BinlogType

public enum BinlogType {
    INSERT ("新增", "INSERT"),
    UPDATE ("修改", "UPDATE"),
    DELETE ("刪除", "DELETE");
}

(2)增量同步消費操作結果——ConsumerStatus

public enum ConsumerStatus {
    NOT_CONSUME ("未消費", 0),
    CONSUME_SUCCESS ("消費成功", 1),
    COMMITTED("已提交", 2); 
}

(3)DB資料庫渠道——DBChannel

public enum DBChannel {
    //渠道⼀
    CHANNEL_1 ("歷史資料庫", "1"),
    //渠道⼆
    CHANNEL_2 ("新的資料庫", "2");
}

(4)操作結果列舉值——OperateResult

public enum OperateResult {
    FAILED ("失敗", "1"),
    SUCCESS ("成功", "0"); 
}

(5)操作型別——OperateType

public enum OperateType {
    ADD ("增量", 1),
    ALL ("全量", 2);
}

(6)滾動型別——ProgressType

public enum ProgressType {
    RANGE_SCROLL("滾動查詢資料", 0),
    CHECK_DATA ("核對查詢資料", 1); 
}

3.接⼝設計

(1)訪問遷移看板界⾯

http://localhost:8080/migrate/toIndex

(2)查詢同步進度接⼝

//取得遷移進度資訊
//@param queryCondition 查詢條件
@RequestMapping(value = "/getEtlProgresses", method = RequestMethod.POST)
public Map<String, Object> getEtlProgresses(@RequestBody EtlProgressReq queryCondition) {
    Map<String, Object> resultMap = new HashMap<>();
    resultMap.put("resultCode", OperateResult.SUCCESS.getValue());
    resultMap.put("resultMsg", OperateResult.SUCCESS.getName());
    EtlProgress etlProgress = new EtlProgress();
    BeanUtils.copyProperties(queryCondition, etlProgress);
    List<EtlProgress> resultList = migrateService.getEtlProgresses(etlProgress);
    resultMap.put("resultList", resultList);
    return resultMap;
}

(3)發起全量同步接⼝

需要同步的表和時間段

//新增全量同步 將前端傳過來的世界格式化
//@param rangeScroll 全量同步條件
//@return 儲存結果
@RequestMapping(value = "/addScroll", method = RequestMethod.POST)
public Map<String, Object> addScroll(@RequestBody RangeScroll rangeScroll) {
    rangeScroll.setStartTime(DateUtils.getStartTimeOfDate(rangeScroll.getStartTime()));
    rangeScroll.setEndTime(DateUtils.getDayEndTime(rangeScroll.getEndTime()));
    Map<String, Object> resultMap = new HashMap<>();
    resultMap.put("resultCode", OperateResult.SUCCESS.getValue());
    resultMap.put("resultMsg", OperateResult.SUCCESS.getName());
    scrollProcessor.scroll(rangeScroll);
    return resultMap; 
}

4.定時任務設計

(1)定時核對校驗資料的定時任務

程式碼入口：CheckDataTask

public void CheckData() {
    log.info("資料核對校驗開始");
    if (lock.tryLock()) {
        try {
            CheckDataProcessor checkDataProcessor = CheckDataProcessor.getInstance();
            //查詢已同步完成的批次，未核對的資料進⾏核對處理
            List<RangeScroll> rangeScrollList = checkDataProcessor.queryCheckDataList();
            for (RangeScroll rangeScroll:rangeScrollList) {// 已經在核對，本次跳過
                checkDataProcessor.checkData(rangeScroll);
            }
        } catch (Exception e) {
            log.error("資料核對過程中發⽣異常 {}", e.getMessage(), e);
        } finally {
            log.info("資料核對校驗結束");
            lock.unlock();
        }
    } 
}

流程圖：

核心程式碼如下：

//核驗資料
//@param rangeScroll 要檢查的資料抽取模型
public void checkData(RangeScroll rangeScroll) {
    EtlProgress etlProgress = addEtlProgress(rangeScroll);
    try {
        //1.先獲取⽼庫的⼀批資料
        List<Map<String, Object>> sourceList = querySourceList(rangeScroll);
        //2.再獲取新庫的⼀批資料
        List<Map<String, Object>> targetList = queryTargetList(sourceList, rangeScroll);
        //3.對資料進⾏核對校驗
        Map<BinlogType, List<Map<String, Object>>> comparisonMap = comparison(sourceList,targetList,rangeScroll);
        //4.對資料進⾏歸正處理
        updateComparisonData(comparisonMap,rangeScroll);
        //5.完成資料核對校驗，更改狀態
        updateEtlDirtyRecord(etlProgress, EtlProgressStatus.CHECK_SUCCESS.getValue(),rangeScroll, null);
    } catch (Exception e) {
        //資料核對過程失敗，只記錄資料核對錯誤資訊 
        updateEtlDirtyRecord(etlProgress, EtlProgressStatus.SUCCESS.getValue(),rangeScroll, e.getMessage());
        log.error("資料核對過程中發⽣異常 {"+e.getMessage()+"}", etlProgress);
    }
}

(2)資料量統計定時任務

程式碼入口：CountCacheTask

@Scheduled(cron = "0 0 0/1 * * ? *")
void countRefresh() {
    try {
        Constants.statisticalCountMap.clear();
        //獲取所有配置的需要同步的表
        List<String> filedKeyList = MergeConfig.getFiledKey(STATISTICAL_KEY);
        for (String fileKey : filedKeyList) {
            log.info("開始同步：" + fileKey + "的表資料");
            EtlStatistical etlStatistical = new EtlStatistical();
            etlStatistical.setLogicModel(fileKey);
            EtlStatistical etlStatistical1 = migrateScrollMapper.getMaxDateEtlStatistical(etlStatistical);
            //驗證單個表的資料明細是否已統計，如果未統計則預設從最⼩時間天數開始統計
            //否則只更新最近15天的資料(超過15天的資料變化機率很低，暫不考慮更新)
            if (ObjectUtils.isNotEmpty(etlStatistical1)) {
                //已統計的最⼤時間格式
                Integer statisticalTime = etlStatistical1.getStatisticalTime();
                Long distanceDays = DateUtils.getDistanceDays(String.valueOf(statisticalTime), DateUtils.format(new Date()) + "");
                Date minDate = null;
                if (distanceDays < 15) {
                    //更新最近15天的該表統計數量
                    minDate = DateUtils.addDays( -15);
                } else {
                    minDate = DateUtils.parseStrToDate(String.valueOf(statisticalTime), DateUtils.DATE_FORMAT_YYYYMMDD);
                }
                saveStatistical(minDate, fileKey, false);
            } else {
                //先取最⼩的⽇期，然後以該⽇期，以天為單位開始統計
                Date minDate = getMinDate(fileKey);
                saveStatistical(minDate, fileKey, true);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } 
}

流程圖：

(3)增量資料落地定時任務

程式碼入口：IncrementTask

void IncrementTask() {
    //獲取阻塞佇列的⽅法
    LocalQueue localQueue = LocalQueue.getInstance();
    //驗證讀佇列的資料已被處理完畢
    if (!localQueue.getIsRead()) {
        log.info("增量資料執⾏寫⼊");
        //執⾏資料寫⼊
        localQueue.doCommit();
    } 
}

流程圖：

(4)失敗重試定時任務

程式碼入口：ScrollTask

public void Scroll() {
    List<EtlProgress> progressList = queryFailAndStopProgressList();
    for (EtlProgress progress:progressList) {
        if (progress.getRetryTimes() < 3) {
            RangeScroll rangeScroll = new RangeScroll();
            rangeScroll.setId(progress.getId());
            rangeScroll.setStartScrollId(progress.getScrollId());
            rangeScroll.setTableName(progress.getLogicModel());
            rangeScroll.setPageSize(progress.getFinishRecord());
            rangeScroll.setStartTime(progress.getScrollTime());
            rangeScroll.setEndTime(progress.getScrollEndTime());
            rangeScroll.setCurTicketStage(progress.getCurTicketStage());
            rangeScroll.setTicket(progress.getTicket());
            rangeScroll.setRetryFlag(true);
            scrollProcessor.scroll(rangeScroll);
       }
    } 
}

流程圖：

5.技術亮點

(1)滾動拉取方案

全量同步時，是需要分批次查詢資料的，這裡的分批次拉取資料就是滾動拉取資料。

可以看到的具體流程是：

步驟一：每次進⾏全量同步時都會往遷移表中新增⼀條記錄。

步驟二：然後每次最多查詢500條資料作為⼀個批次，該批次會在遷移明細表中對應新增⼀條記錄。

步驟三：其中會進行滾動查詢。也就是會根據當前選擇資料同步的時間範圍內，到訂單表中查詢最⼩訂單號，然後將這個最⼩的訂單號會儲存在⼀個RangeScroll的實體類中。當查詢訂單資料時，查詢條件會⽐較簡單，就是訂單號⼤於最⼩的訂單號。然後經過時間過濾以及過濾掉目標庫已有的資料後，剩下的就是本次全量同步的⽬標資料了。當這些目標資料同步到目標庫後，會更新遷移明細狀態，以及將當前已查到的訂單資料中的最⼤訂單號重置到RangeScroll類中。這樣在下⼀輪查詢時，查詢引數中的最⼩訂單號會從RangeScroll類中獲取，從而實現滾動查詢。

舉個例⼦：有四條訂單資料，對應四個訂單號：1001、1002、1003、1004。初始查詢發現最⼩訂單號為1001，此時會對1001減1=1000，保證訂單號為1001這條資料能查到，也能同步過去(這個細節容易忽視)。然後假設每次只能查詢2條資料，第⼀次查詢，因為查詢條件是⼤於最⼩訂單號減1也就是1000，查詢到了1001、1002這兩個訂單號的訂單資料。處理完第⼀輪後，會把本次最⼤的訂單號也就是1002，作為下⼀輪查詢的最⼩訂單號。下⼀輪查詢情況，會查詢訂單號⼤於1002的訂單資料，此時就會查詢出1003、1004對應的訂單資料，並且將1004作為下⼀輪的最⼩訂單號查詢。再下⼀輪查詢時，條件就變為訂單號⼤於1004，此時就查詢不到資料了，資料遷移結束。本次全量同步經過兩個批次查詢，⼀共新增了⼀條遷移記錄，還有兩條遷移明細。最後遷移記錄和遷移明細的狀態，都會更新為同步成功狀態。

步驟四：根據表名分組進行批次插⼊。當查詢源資料庫的資料並過濾掉⼀些資料後，並不是⼀條⼀條插⼊到目標庫中的。⽽是會根據表名進⾏分組，然後批次插⼊目標庫。相⽐於⼀條條資料插⼊，這樣效率也會更⾼⼀點，這是需要做的⼀個最佳化點。

步驟五：更新遷移明細表記錄和遷移記錄為同步狀態。迴圈滾動查詢出來的每一批次資料處理完後，就更新遷移明細表中該批次的同步狀態。當迴圈滾動從源資料庫查不出資料後，就更新本次遷移記錄為同步狀態。

(2)巧妙的統計滾動進度方案

在進行資料遷移時，怎麼能在不影響效能的情況下快速計算出現在遷移進度執行到多少呢？

從圖中可以看到：會有⼀個定時任務，每隔⼀個⼩時就會統計⼀下不同時間段的資料量。⾸先會建立⼀張訂單資料統計表，用於統計每天的資料量，⽅便後續計算遷移進度。定時任務會從訂單資料統計表中，查詢最新的⼀條記錄，也就是最近⼀次統計的記錄。

第⼀次統計時定時任務肯定是查不到的，所以這時定時任務會到訂單表中查詢訂單的最小建立時間，並計算距今的天數。然後開始統計訂單數量，統計⽅式是，對每天的訂單數量⽣成⼀條訂單資料統計記錄。

⽐如，最早的⼀條訂單建立時間為2020-01-01。下⼀步會計算這個時間，距離當前全量資料遷移任務執⾏時有⼏天，假設有1年。然後為這1年來的每⼀天建立每天訂單資料統計記錄，即每⼀條記錄會統計每⼀天的訂單量。比如查詢2020-01-01 00:00:00~2020-01-01 23:59:59這⼀天時間範圍內的訂單資料量，然後建立⼀條資料遷移統計記錄，以此類推。

當第⼆次執⾏這個定時任務時，就可以查到最新⽣成的⼀條訂單資料統計記錄了。此時會看下這條記錄的時間，距離現在是否超過2天，防⽌跨天資料產⽣。

有了以上資料統計後，在遷移明細表中的每條記錄中，同步完⼀批資料都會記錄⽬前已遷移完多少資料。這樣就可⽤已同步完畢的資料量，除以訂單資料統計記錄的訂單總資料量，結果就是遷移進度。

(3)防止增量同步資料丟失和高效寫入方案

在增量同步中，首先會透過Canal監聽源資料庫中的binlog⽇志，然後Canal再將監聽到的binlog⽇志傳送放到RocketMQ中，接著資料遷移系統會消費RocketMQ中的binlog訊息，把增刪改操作同步到目標資料庫。

問題一：資料遷移系統消費MQ訊息時，如何保證從MQ獲取到的binlog訊息不會丟失

如果源資料庫增刪改操作了，但由於消費異常導致binlog訊息丟失了，那麼目標資料庫中就沒有對應的增量資料操作，這樣源資料庫和目標資料庫的資料就會不⼀致。為了避免消費異常導致binlog訊息丟失，需要設定禁止自動提交訊息。

消費MQ的binlog訊息時，為了提升消費速度，可以採用多執行緒進行消費。比如每消費一條MQ訊息，就向執行緒池提交一個任務，任務執行完才提交訊息。當這些任務的執行速度慢於消費MQ訊息的速度時，執行緒池的阻塞佇列中就會積壓一些任務。如果此時機器釋出重啟，那麼就可能會導致執行緒池中阻塞佇列裡積壓的任務丟失。但是由於禁止訊息自動提交，所以這些丟失任務對應的MQ訊息後續還可以重新被消費，然後再次被提交到執行緒池中進行處理。

為了方便對binlog訊息進行管理和確保binlog訊息不丟失且有記錄可查，這裡引⼊訊息拉取落庫和非同步訊息提交機制，由兩個定時任務來完成。如下所示：

⾸先源資料庫中會有⼀張消費記錄表，定時任務1每次從MQ拉取並消費⼀條訊息時，都會先在消費記錄表中新增⼀條消費記錄，每條消費記錄的初始狀態都為未消費。然後定時任務1再將獲取到的binlog訊息，在目標資料庫中重做binlog⽇志。也就是將舊庫中的增刪改操作，在目標資料庫中重做⼀遍。重做完成後，再來更新剛剛新增的消費記錄的狀態，將記錄的狀態從未消費更新為已消費狀態。

此時需要注意：定時任務1消費MQ的binlog訊息後，並不是自動向MQ提交訊息，⽽是需要進行⼿動提交。否則如果訊息都沒有消費成功，就自動向MQ提交訊息，則可能會出現訊息丟失的情況。所以為了保證binlog訊息不丟失，不會⾃動提交訊息，⽽是將提交訊息的任務交給定時任務2來處理。

定時任務2會專⻔從消費記錄表中，查詢已消費的那些記錄，然後向MQ提交訊息，這樣下次就不會從MQ中消費到了。向MQ提交完訊息後，同時會將消費記錄表中的記錄狀態，從已消費更新為已提交。⾄此，⼀個訊息的消費流程才算結束。

問題二：如何提高增量同步時的資料寫入效率

為了提高資料寫入目標資料庫的效率，這裡引入了資料合併、過濾、讀寫佇列的機制，讀寫佇列和資料合併流程圖如下：

定時任務1新增完消費記錄後，並不會⻢上把資料寫入目標庫，⽽是把binlog日誌先放到⼀個寫佇列中，與寫佇列相對的還有⼀個讀佇列。讀佇列是專⻔用於提供給定時任務3進行處理訊息寫⼊操作的。

資料合併提升寫入效率：如果源資料庫中的資料在短時間內進⾏了多次操作，其實只需要保留最新的binlog⽇志即可。所以才使用了一個記憶體佇列來存放binlog訊息，而且會每隔15秒批次處理一次記憶體佇列的所有binlog訊息，以此減少同一條資料對應多條binlog的寫入處理。

binlog日誌的處理細節：從合併後的binlog⽇志中獲取主鍵ID，根據主鍵ID到目標庫中查詢對應的資料。

如果目標庫中能查到這條資料，那麼需要和源資料庫的binlog資料進⾏對⽐。只有當源資料庫的更新時間⼤於目標庫的更新時間，才允許更新資料到目標庫中。如果當前的binlog⽇志的操作型別為刪除操作，則可不⽤對⽐更新時間，直接在目標庫中重做這條binlog⽇志，畢竟源資料庫在刪除⼀條資料時不會更新修改時間。

如果源資料庫的⼀條binlog⽇志對應的資料在目標庫中沒有查到，那麼繼續判斷。如果binlog⽇志是刪除操作，那就沒必要在目標庫中重做這條⽇志了，直接過濾掉。目標庫都沒有資料了，就沒必要執⾏刪除操作。如果binlog⽇志的型別為修改操作，那也沒必要執⾏修改操作。因為目標庫沒資料，直接update也不⾏，可以將binlog的操作型別修改為新增操作。畢竟在binlog⽇志中，包含了⼀條訂單資料的所有欄位的值，⾜以滿⾜新增資料需要的所有欄位。

經過以上的資料過濾操作，⼀⽅⾯避免源資料庫中的舊資料覆蓋了目標庫的新資料，另⼀⽅⾯避免沒必要執⾏的刪除和更新操作也在目標庫中繼續執⾏。

程式碼入口如下：

@Component 
public class CanalConsumeTask implements ApplicationRunner { 
    //rocketmq的nameServer地址
    @Value("${rocketmq.name-server:127.0.0.1:9876}")
    private String nameServerUrl; 
    @Autowired
    private MigrateConfigService migrateConfigService;
    
    @Override
    public void run(ApplicationArguments args) throws Exception {
        // 當前配置的全部遷移系統
        List<ScrollDomain> scrollDomainList = migrateConfigService.queryScrollDomainList();
        ExecutorService executors = Executors.newFixedThreadPool(scrollDomainList.size());
        for (ScrollDomain scrollDomain:scrollDomainList) {
            if (scrollDomain.getDataSourceType().equals(1)) {
                //執⾏拉取任務
                executors.execute(new CanalPullRunner(scrollDomain.getDomainTopic(), nameServerUrl));
                //執⾏提交任務
                executors.execute(new CanalPullCommitRunner(scrollDomain.getDomainTopic(), nameServerUrl));
            } 
        } 
    } 
}

(4)資料擴容方案

擴容的過程其實就是將一個8庫8表的源資料庫資料遷移到16庫16表的目標庫中，如下：

單庫 -> 8庫8表和 8庫8表->16庫16表是⾮常相似的。因為它們都需要全量同步、增量同步、資料驗證的功能，這些處理其實是⼀樣的。可以直接使⽤單庫 -> 8庫8表的資料遷移系統的程式碼。

資料遷移系統的程式碼需要如下改動：

改動一：源資料庫和目標庫配置

讀取源資料來源(單庫)和⽬標資料來源(8庫8表)的地⽅，就是讀取單庫的地⽅，修改為讀取8庫8表，⽽寫⼊⽬標資料來源由8庫8表修改為16庫16表。

改動二：修改Canal配置

因為單庫 -> 8庫8表增量同步時，Canal監聽的是單庫的binlog。擴容時，Canal需要監聽8庫的binlog。

6.全量同步和增量同步整體流程圖

7.功能升級

現在資料遷移系統的遷移來源資料來源和寫入資料來源都是配置在migrate.properties⽂件⾥的，這種寫死資料來源的⽅式不⽅便擴充套件和修改。假設在使⽤過程中需要對其他庫和表進⾏資料遷移，則需要在程式碼⾥更改原資料來源和寫⼊資料來源，並且重新部署新項⽬進⾏相關配置後，再來進⾏遷移操作，所以我們採⽤資料庫配置的⽅式來進⾏。

(1)資料遷移系統資料來源動態化配置

新建4張表存放動態化配置：

-- auto-generated definition
create table scroll_db_config (
    id bigint auto_increment comment '主鍵' primary key,
    domain_id bigint null comment '歸屬系統資料來源配置ID',
    jdbc_url varchar(256) null comment '資料庫連線串',
    jdbc_username varchar(64) null comment '⽤戶名',
    jdbc_pasword varchar(256) null comment '密碼',
    is_deleted tinyint(1) default 0 null comment '0:未被刪除,1:已刪除',
    create_time datetime default CURRENT_TIMESTAMP not null comment '建立時間',
    update_time datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment '修改時間' 
) comment '遷移資料來源配置表' charset = utf8mb4; 


-- auto-generated definition 
create table scroll_domain (
    id bigint auto_increment comment '主鍵' primary key,
    domain varchar(128) null comment '所屬系統(會員、訂單、交易)',
    domain_topic varchar(128) null comment '當資料來源為來源的時候，配置對應的訊息topic',
    data_source_type tinyint(1) null comment '資料來源型別，1資料來源配置，2資料寫⼊配置',
    sql_show tinyint(1) default 0 null comment '是否顯示 shardingsphere sql執⾏⽇志,預設不列印(0不列印，1列印)',
    table_num int(8) null comment '每個邏輯庫中表的數量',
    is_deleted tinyint(1) default 0 null comment '0:未被刪除,1:已刪除',
    create_time datetime default CURRENT_TIMESTAMP not null comment '建立時間',
    update_time datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment '修改時間' 
) comment '遷移業務配置表' charset = utf8mb4;


-- auto-generated definition
create table scroll_shard_config (
    id bigint auto_increment comment '主鍵' primary key,
    domain_id bigint not null comment '資料業務系統源ID',
    logic_model varchar(128) not null comment '邏輯模型名(邏輯表或物理表名稱)',
    db_sharding_columns varchar(256) not null comment '庫分⽚列名稱，多個列以逗號分隔',
    table_sharding_columns varchar(256) not null comment '表分⽚列名稱，多個列以逗號分隔',
    db_sharding_algorithm varchar(256) not null comment '庫分⽚策略類名',
    table_sharding_algorithm varchar(256) not null comment '表分⽚策略類名',
    is_deleted tinyint(1) default 0 null comment '0:未被刪除,1:已刪除',
    create_time datetime default CURRENT_TIMESTAMP not null comment '建立時間',
    update_time datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment '修改時間' 
) comment '資料來源分⽚規則配置表' charset = utf8mb4;


-- auto-generated definition
create table scroll_table_config (
    id bigint auto_increment comment '主鍵' primary key,
    domain varchar(128) null comment '業務系統',
    source_table_name varchar(128) null comment '源表名',
    target_table_name varchar(128) null comment '⽬標表名',
    create_time datetime default CURRENT_TIMESTAMP not null comment '建立時間',
    update_time datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment '修改時間' 
) comment '遷移表對映配置' charset = utf8mb4;

流程圖：

表配置簡述：

一.⾸先配置scroll_domain(遷移業務配置表)表。這裡會配置2條資料，⼀個是訂單的來源資料庫資料，⼀個是訂單的⽬標庫資料。其中源資料庫需要配置domain_topic，也就是Canal讀取資料庫⽇志要傳送的MQ的topic。

二.配置scroll_db_config(遷移資料來源配置表)指定遷移資料來源。

三.配置scroll_shard_config(資料來源分⽚規則配置表)⽤來指定分庫分表的規則。

四.配置scroll_table_config(遷移表對映配置)，⽤來對映訂單系統兩個表需要遷移到分庫分表⾥的哪個表⾥⾯去。

(2)遷移資料庫操作對應的xml動態⽣成

關於動態SQL我們採⽤PreparedStatement⽅式拼接SQL，PreparedStatement的第⼀次執⾏消耗是很⾼的，它的效能體現在後⾯的重複執⾏(快取的作⽤)。相關程式碼如下：

@Override
public String queryMinScrollId(RangeScroll rangeScroll) {
    //驗證必填引數
    if (StrUtil.isNotBlank(rangeScroll.getTableName())) {
        SqlSession session = null;
        PreparedStatement pst = null;
        try {
            rangeScroll.setScrollName(MergeConfig.getSingleKey(rangeScroll.getTableName()));
            String sql = " select " + rangeScroll.getScrollName() + " from " + rangeScroll.getTableName() + " where create_time >= ?" + " order by create_time asc, " + rangeScroll.getScrollName() + " asc LIMIT 1";
            //獲取指定的資料來源
            session = scrollDataSourceConfig.getSqlSession(rangeScroll.getDomain(), 1);
            pst = session.getConnection().prepareStatement(sql);
            pst.setDate(1, new Date(rangeScroll.getStartTime().getTime()));
            ResultSet result = pst.executeQuery();
            while (result.next()) {
                return String.valueOf(Long.parseLong(result.getString(1)) - 1);
            }
            return null;
        } catch (Exception e) {
            log.error("queryInfoList⽅法執⾏出錯", e);
            return "0";
        } finally {
            closeSqlSession(session,pst,rangeScroll.getDomain(), 1);
        }
    }
    return "0"; 
}


//負責分⻚滾動資料
//@param rangeScroll 查詢條件
@Override 
@SuppressWarnings({"unchecked"}) 
public List<Map<String, Object>> queryInfoList(RangeScroll rangeScroll) {
    if (StrUtil.isNotBlank(rangeScroll.getTableName()) && StrUtil.isNotBlank(rangeScroll.getStartScrollId())) {
        SqlSession session = null;
        PreparedStatement pst = null;
        try {
            String sql = "select * from " + rangeScroll.getTableName() + " where " + rangeScroll.getScrollName() + " > ? " + " order by " + rangeScroll.getScrollName() + " asc LIMIT " + rangeScroll.getPageSize();
            //獲取指定的資料連線
            session = scrollDataSourceConfig.getSqlSession(rangeScroll.getDomain(), 1);
            pst = session.getConnection().prepareStatement(sql);
            pst.setString(1,rangeScroll.getStartScrollId());
            ResultSet resultSet = pst.executeQuery();
            return converter(resultSet);
        } catch (Exception e) {
            log.error("queryInfoList⽅法執⾏出錯", e);
            return new ArrayList<>();
        } finally {
            closeSqlSession(session,pst,rangeScroll.getDomain(), 1);
        }
    }
    return new ArrayList<>();
}

這⾥需要注意批次插⼊資料時，需要對資料進⾏格式識別和處理。

//SQL預處理，需要什麼加什麼
//@param preparedStatement SQL預處理
//@param pos 定位
//@param value 值
public static void buildPerParedStatement(PreparedStatement preparedStatement, int pos, Object value) {
    try {
        if (value instanceof String) {
            preparedStatement.setString(pos, (String) value);
        } else if (value instanceof Long) {
            preparedStatement.setLong(pos, (Long) value);
        } else if (value instanceof Float) {
            preparedStatement.setFloat(pos, (Long) value);
        } else if (value instanceof Double) {
            preparedStatement.setDouble(pos, (Long) value);
        } else if (value instanceof Integer) {
            preparedStatement.setInt(pos, (Integer) value);
        } else if (value instanceof java.util.Date) {
            preparedStatement.setTimestamp(pos, new Timestamp(((Date) value).getTime()));
        } else if (value instanceof Boolean) {
            preparedStatement.setBoolean(pos, (Boolean) value);
        } else if (value instanceof BigDecimal) {
            preparedStatement.setBigDecimal(pos, (BigDecimal) value);
        }
    } catch (Exception e) {
        log.error("SQL預處理失敗：{}", e);
    } 
}

(3)擴容遷移資料時的問題

當全域性滾動和增量滾動進⾏資料過濾時：會根據滾動號order_no進⾏in查詢，這時候會從分庫⾥⾯進⾏批次查詢。但如下的這個查詢⽅法⽐較耗時，因為ShardingSphere在in查詢操作時，會去8個庫8張表⾥⾯去輪流查詢，導致效率很低。雖然SQL的執⾏結果是正確的，但並未達到最優的查詢效率。

//批次查詢資料資訊
//@param scroll 資料物件
//@param identifiers 唯⼀標識List
//@param dbChannel 指向具體的BD庫
@Override
@SuppressWarnings({"unchecked"})
public List<Map<String, Object>> findByIdentifiers(RangeScroll scroll, List<String> identifiers,String dbChannel) {
    if (!Objects.isNull(scroll) && CollUtil.isNotEmpty(identifiers)) {
        SqlSession session = null;
        PreparedStatement pst = null;
        Integer dataSourceType = 2;
        try {
            if (DBChannel.CHANNEL_1.getValue().equals(dbChannel)) {
                dataSourceType = 1;
            }
            session = scrollDataSourceConfig.getSqlSession(scroll.getDomain(), dataSourceType);
            if (null != session) {
                StringBuffer sql = new StringBuffer();
                sql.append("select * from " + scroll.getTargetTableName() + " where " + scroll.getScrollName() + " in (");
                for (String id : identifiers) {
                    sql.append("?,");
                }
                String sqlStr = sql.substring(0, sql.length() - 1) + ")";
                pst = session.getConnection().prepareStatement(sqlStr);
                for (int i=1; i<=identifiers.size(); i++) {
                    pst.setString(i,identifiers.get(i-1));
                }
                ResultSet resultSet = pst.executeQuery();
                return converter(resultSet);
            }
        } catch (Exception e) {
            log.error("findByIdentifiers⽅法執⾏出錯", e);
            return new ArrayList<>();
        } finally {
            closeSqlSession(session,pst,scroll.getDomain(), dataSourceType);
        }
    }
    return new ArrayList<>(); 
}

效能優化之資料庫篇5-分庫分表與資料遷移
2021-07-29
優化資料庫
分庫分表？如何做到永不遷移資料和避免熱點？
2019-05-07
256變4096：分庫分表擴容如何實現平滑資料遷移？
2021-03-02
分庫分表插入資料
2020-12-09
大資料資料庫讀寫分離分庫分表
2021-01-01
大資料資料庫
資料庫怎麼分庫分表
2019-11-12
資料庫
系統資料遷移
2024-03-19
達夢資料庫系統表空間資料檔案遷移過程
2022-01-06
資料庫
資料庫分庫分表的總結
2019-02-16
資料庫
5 分鐘完成 ZooKeeper 資料遷移
2022-10-08
[資料庫][分庫分表]分庫分表之後，id主鍵如何處理
2019-02-19
資料庫
《資料儲存》之《分庫，分表》
2021-07-20
MariaDB Spider 資料庫分庫分表實踐
2022-02-07
IDE資料庫
oracle分表效率,資料庫分庫分表是什麼，什麼情況下需要用分庫分表
2022-09-12
Oracle資料庫
資料庫遷移
2020-11-30
資料庫
好好的系統，為什麼要分庫分表？
2022-11-25
3分鐘短文：一看就是乾貨！Laravel遷移資料庫！
2020-10-28
Laravel資料庫
關係型資料庫分庫分表系列之一
2022-01-05
資料庫
MySQL資料庫之分庫分表方案
2019-07-24
MySql資料庫
.Net下極限生產力之efcore分表分庫全自動化遷移CodeFirst
2022-07-07
redis資料庫遷移
2018-07-06
Redis資料庫
資料庫遷移：理解
2020-01-09
資料庫
laravel資料庫遷移
2023-02-14
Laravel資料庫
報表資料分庫儲存
2019-12-10
淺談高效能資料庫叢集——分庫分表
2018-07-03
資料庫
Java實戰：教你如何進行資料庫分庫分表
2021-07-28
Java資料庫
分庫分表系列：分庫分表的前世今生
2022-03-19
分庫分表
2024-08-20
百億級資料分庫分表後怎麼分頁查詢？
2022-12-05
基於代理的資料庫分庫分表框架 Mycat實踐
2018-12-19
資料庫框架
MySQL 資料庫之網際網路常用分庫分表方案
2020-02-08
MySql資料庫
分散式資料庫中介軟體 MyCat | 分庫分表實踐
2020-08-20
分散式資料庫
3分鐘短文：書接上回，Laravel資料庫遷移的那些個小技巧
2020-09-29
Laravel資料庫
某行XX系統DB2資料庫遷移實施方案
2021-06-12
DB2資料庫
資料遷移（1）——通過資料泵表結構批量遷移
2018-10-30
linux mysql資料庫遷移
2018-12-01
LinuxMySql資料庫
django資料庫遷移-15
2018-11-14
Django資料庫
用rman遷移資料庫
2019-07-21
資料庫

分庫分表—4.資料遷移系統文件

相關文章