PostgreSQL 10.0 preview feature enhancement: logical replication supports parallel COPY for initial data load

Posted by 德哥 on 2017-03-28

Original URL: https://flycode.co/archives/186962

Background

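PostgreSQL already supports logical replication, and it now adds an enhancement for the initial synchronization: the COPY command is run over the WAL receiver protocol (this is built into the logical replication core code), and multiple tables can be copied in parallel.

In other words, you can use PostgreSQL logical replication to migrate one instance to another quickly (streaming and in parallel).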

Logical replication support for initial data copy  
  
Add functionality for a new subscription to copy the initial data in the  
tables and then sync with the ongoing apply process.  
  
For the copying, add a new internal COPY option to have the COPY source  
data provided by a callback function.  The initial data copy works on  
the subscriber by receiving COPY data from the publisher and then  
providing it locally into a COPY that writes to the destination table.  
  
A WAL receiver can now execute full SQL commands.  This is used here to  
obtain information about tables and publications.  
  
Several new options were added to CREATE and ALTER SUBSCRIPTION to  
control whether and when initial table syncing happens.  
  
Change pg_dump option --no-create-subscription-slots to  
--no-subscription-connect and use the new CREATE SUBSCRIPTION  
... NOCONNECT option for that.  
  
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>  
Tested-by: Erik Rijkers <er@xs4all.nl>

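The initial COPY performed as part of logical replication works as follows.

A transaction snapshot is opened on the primary (snapshots can be shared across multiple sessions, which is one of PostgreSQL's distinctive strengths), the data is copied with COPY, the snapshot is released once the COPY finishes, and incremental changes are then received starting from the WAL LSN that corresponds to the snapshot.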
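The flow above can be sketched with a toy in-memory model. This is only an illustration under assumptions: the `Publisher` class, its `wal` list, the `initial_sync` function, and the integer LSNs are hypothetical stand-ins invented for this example, not PostgreSQL APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    """Toy publisher: a table (dict) plus an ordered change log standing in for WAL."""
    table: dict = field(default_factory=dict)
    wal: list = field(default_factory=list)  # entries are (lsn, key, value)

    def write(self, key, value):
        """Apply a write and record it in the change log; return its LSN."""
        lsn = len(self.wal) + 1
        self.wal.append((lsn, key, value))
        self.table[key] = value
        return lsn

    def snapshot(self):
        """Return (snapshot_lsn, frozen copy of the table) as one consistent view."""
        return len(self.wal), dict(self.table)

    def changes_after(self, lsn):
        """Return all logged changes with an LSN greater than the given one."""
        return [change for change in self.wal if change[0] > lsn]


def initial_sync(pub, writes_during_copy=()):
    # 1. Open a snapshot and remember the LSN it corresponds to.
    snap_lsn, snap_rows = pub.snapshot()
    # Writes that arrive while the copy is running are not part of the snapshot.
    for key, value in writes_during_copy:
        pub.write(key, value)
    # 2. COPY the snapshot contents to the subscriber.
    subscriber = dict(snap_rows)
    # 3. The snapshot is no longer needed; replay the changes after snap_lsn.
    for _lsn, key, value in pub.changes_after(snap_lsn):
        subscriber[key] = value
    return subscriber


pub = Publisher()
pub.write("a", 1)
pub.write("b", 2)
replica = initial_sync(pub, writes_during_copy=[("b", 20), ("c", 3)])
```

In this model the subscriber ends up identical to the publisher's table even though writes happened during the copy, because every change after the snapshot LSN is replayed exactly once.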

/*-------------------------------------------------------------------------
 * tablesync.c
 *    PostgreSQL logical replication
 *
 * Copyright (c) 2012-2016, PostgreSQL Global Development Group
 *
 * IDENTIFICATION
 *    src/backend/replication/logical/tablesync.c
 *
 * NOTES
 *    This file contains code for initial table data synchronization for
 *    logical replication.
 *
 *    The initial data synchronization is done separately for each table,
 *    in separate apply worker that only fetches the initial snapshot data
 *    from the publisher and then synchronizes the position in stream with
 *    the main apply worker.
 *
 *    The are several reasons for doing the synchronization this way:
 *     - It allows us to parallelize the initial data synchronization
 *       which lowers the time needed for it to happen.
 *     - The initial synchronization does not have to hold the xid and LSN
 *       for the time it takes to copy data of all tables, causing less
 *       bloat and lower disk consumption compared to doing the
 *       synchronization in single process for whole database.
 *     - It allows us to synchronize the tables added after the initial
 *       synchronization has finished.
 *
 *    The stream position synchronization works in multiple steps.
 *     - Sync finishes copy and sets table state as SYNCWAIT and waits
 *       for state to change in a loop.
 *     - Apply periodically checks tables that are synchronizing for SYNCWAIT.
 *       When the desired state appears it will compare its position in the
 *       stream with the SYNCWAIT position and based on that changes the
 *       state to based on following rules:
 *        - if the apply is in front of the sync in the wal stream the new
 *          state is set to CATCHUP and apply loops until the sync process
 *          catches up to the same LSN as apply
 *        - if the sync is in front of the apply in the wal stream the new
 *          state is set to SYNCDONE
 *        - if both apply and sync are at the same position in the wal stream
 *          the state of the table is set to READY
 *     - If the state was set to CATCHUP sync will read the stream and
 *       apply changes until it catches up to the specified stream
 *       position and then sets state to READY and signals apply that it
 *       can stop waiting and exits, if the state was set to something
 *       else than CATCHUP the sync process will simply end.
 *     - If the state was set to SYNCDONE by apply, the apply will
 *       continue tracking the table until it reaches the SYNCDONE stream
 *       position at which point it sets state to READY and stops tracking.
 *
 *    The catalog pg_subscription_rel is used to keep information about
 *    subscribed tables and their state and some transient state during
 *    data synchronization is kept in shared memory.
 *
 *    Example flows look like this:
 *     - Apply is in front:
 *        sync:8
 *          -> set SYNCWAIT
 *        apply:10
 *          -> set CATCHUP
 *          -> enter wait-loop
 *        sync:10
 *          -> set READY
 *          -> exit
 *        apply:10
 *          -> exit wait-loop
 *          -> continue rep
 *     - Sync in front:
 *        sync:10
 *          -> set SYNCWAIT
 *        apply:8
 *          -> set SYNCDONE
 *          -> continue per-table filtering
 *        sync:10
 *          -> exit
 *        apply:10
 *          -> set READY
 *          -> stop per-table filtering
 *          -> continue rep
 *-------------------------------------------------------------------------
 */
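The handoff rules described in the comment above can be modelled in a few lines. This is a simplified sketch, not the code in tablesync.c: the names `State`, `apply_decides`, and `handoff` are hypothetical, LSNs are plain integers, and the real implementation coordinates two processes through pg_subscription_rel and shared memory rather than a single function call.

```python
from enum import Enum


class State(Enum):
    SYNCWAIT = "syncwait"
    CATCHUP = "catchup"
    SYNCDONE = "syncdone"
    READY = "ready"


def apply_decides(apply_lsn, sync_lsn):
    """State the apply worker sets when it finds a table in SYNCWAIT."""
    if apply_lsn > sync_lsn:
        return State.CATCHUP   # apply is in front: sync must catch up
    if apply_lsn < sync_lsn:
        return State.SYNCDONE  # sync is in front: apply keeps tracking the table
    return State.READY         # both are at the same position


def handoff(apply_lsn, sync_lsn):
    """Return (states the table passes through, final apply LSN, final sync LSN)."""
    states = [State.SYNCWAIT]
    decided = apply_decides(apply_lsn, sync_lsn)
    states.append(decided)
    if decided is State.CATCHUP:
        # Sync reads the stream until it reaches apply's position, then sets READY.
        sync_lsn = apply_lsn
        states.append(State.READY)
    elif decided is State.SYNCDONE:
        # Sync exits; apply continues until it reaches the SYNCDONE position.
        apply_lsn = sync_lsn
        states.append(State.READY)
    return states, apply_lsn, sync_lsn


# The two example flows from the comment: apply in front, then sync in front.
apply_in_front = handoff(apply_lsn=10, sync_lsn=8)
sync_in_front = handoff(apply_lsn=8, sync_lsn=10)
```

Both example flows end with the table in READY and both workers at LSN 10, matching the "Apply is in front" and "Sync in front" traces in the comment.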

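For the discussion of this patch, see the mailing list via the URL at the end of this article.

The PostgreSQL community works very rigorously. A patch may be discussed on the mailing list for months or even years and revised repeatedly based on feedback, so by the time it is merged into master it is already very mature. This is a large part of why PostgreSQL is so well known for its stability.

References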

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7c4f52409a8c7d85ed169bbbc1f6092274d03920
