PostgreSQL10流式物理、邏輯主從最佳實踐

德哥發表於2017-07-13

標籤

PostgreSQL , 流複製 , 主從 , 邏輯訂閱


背景

流複製起源

PostgreSQL 自從2010年推出的9.0版本開始,支援流式物理複製,使用者可以通過流式複製,構建只讀備庫(主備物理複製,塊級別一致)。流式物理複製可以做到極低的延遲(通常在1毫秒以內)。

同步流複製

2011年推出的9.1版本,支援同步複製,當時只支援一個同步流複製備節點(例如配置了3個備,只有一個是同步模式的,其他都是非同步模式)。

在同步複製模式下,當使用者提交事務時,需要等待這筆事務的WAL日誌複製到同步流複製備節點,才會返回提交成功的ACK給客戶端。

同步模式下,可以確保資料的0丟失。(只要客戶端收到了事務提交成功的ACK,這筆事務的WAL就有兩份。)

級聯流複製

2012年推出的9.2版本,支援級聯流複製。意思是備庫還可以再連備庫。

級聯複製特別適合跨機房的使用,例如主庫在A機房,備庫在B機房,但是B機房需要建立多個備庫時,那麼B機房只需要建立一個直連主庫的備庫,其他的備庫可以通過B機房的這個備庫級聯產生。從而減少網路開銷。

流式虛擬備庫

2012年推出的9.2版本,除了支援級聯複製,還支援虛擬備庫,什麼是虛擬備庫呢?就是隻有WAL,沒有資料檔案的備庫。

通過虛擬備庫,可以流式的接收WAL,進行實時的流式WAL歸檔。提高備份或歸檔的實時性。

邏輯複製的基礎

2014年推出的9.4版本,在WAL中增加了邏輯複製需要的基礎資訊,通過外掛,可以實現邏輯複製。

邏輯複製可以做到對主庫的部分複製,例如表級複製,而不是整個叢集的塊級一致複製。

邏輯複製的備庫不僅僅是隻讀的,也可以執行寫操作。

增加幾種同步級別

2016年推出的9.6版本,PG的流式複製,通過複製WAL達到同步的目的,因此同步級別也和WAL有關。通過synchronous_commit引數,可以配置事務的同步級別。

1、on, 表示本地WAL fsync,同步standby WAL fsync。即兩份持久化的WAL。

2、remote_apply, 表示本地WAL fsync,同步standby WAL 已恢復。這個帶來的RT最高。

3、remote_write, 表示本地WAL fsync,同步standby WAL 非同步write完成。一份持久化,備庫的WAL可能還在OS CACHE中。

4、local, 表示本地WAL fsync。

5、off, 表示本地WAL寫到wal buffer中即返回客戶端事務提交成功的ACK,為非同步提交(資料庫CRASH可能導致事務丟失,但不會導致資料庫不一致)。

RT影響,從低到高如下:

off, local, remote_write, on, remote_apply。

流式備份壓縮

2017年推出的10版本,pg_basebackup, pg_receivewal支援流式壓縮備份WAL。

quorum based 同步流複製

2017年推出的10版本,支援quorum based的同步流複製,例如有3個備節點,你可以告訴主庫,這個事務需要3份WAL副本,那麼主庫需要等待至少2個備節點已將WAL同步過去的反饋,才會將事務提交成功的ACK返回給客戶端。

quorum based同步流複製,結合raft協議,可以實現零資料丟失的高可用、高可靠架構。

內建邏輯訂閱、複製

2017年推出的10版本,內建了邏輯訂閱的功能。

多master

2017年推出的10版本,通過邏輯訂閱的功能,可以實現多Master架構。

一、流式 物理主從 最佳實踐

以一主兩從,quorum based 同步(一副本)為例,介紹PG 10的主從最佳實踐。

環境

三臺機器(假設主機有64G記憶體),同一區域網,相互網路互通(至少資料庫監聽埠應該互通)。

Linux CentOS 7.x x64

HOSTA : 監聽埠1921

HOSTB : 監聽埠1921

HOSTC : 監聽埠1921

(本文使用一臺物理機來模擬,IP為127.0.0.1,埠分別為2921,2922,2923,讀者請根據實際環境修改)

軟體安裝略

請參考

《PostgreSQL on Linux 最佳部署手冊》

初始化主庫

mkdir /disk1/digoal/pgdata2921  
  
chown digoal /disk1/digoal/pgdata2921  
  
initdb -D /disk1/digoal/pgdata2921 -E UTF8 --locale=C -U postgres  

配置postgresql.conf

cd /disk1/digoal/pgdata2921  
  
vi postgresql.conf  
  
listen_addresses = `0.0.0.0`  
port = 2921  
max_connections = 1000  
unix_socket_directories = `.`  
tcp_keepalives_idle = 60  
tcp_keepalives_interval = 10  
tcp_keepalives_count = 10  
shared_buffers = 8GB  
maintenance_work_mem = 1GB  
dynamic_shared_memory_type = posix  
vacuum_cost_delay = 0  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 500  
bgwriter_lru_multiplier = 5.0  
bgwriter_flush_after = 0  
effective_io_concurrency = 0  
max_worker_processes = 16  
backend_flush_after = 0  
wal_level = replica  
fsync = on  
synchronous_commit = remote_write  
full_page_writes = on  
wal_buffers = 128MB  
wal_writer_delay = 10ms  
wal_writer_flush_after = 0  
checkpoint_timeout = 30min  
max_wal_size = 16GB  
min_wal_size = 8GB  
checkpoint_completion_target = 0.5  
checkpoint_flush_after = 0  
max_wal_senders = 10  
wal_keep_segments = 1024  
synchronous_standby_names = `ANY 1 (*)`  
hot_standby = on  
max_standby_archive_delay = 300s  
max_standby_streaming_delay = 300s  
wal_receiver_status_interval = 1s  
hot_standby_feedback = off  
log_destination = `csvlog`  
logging_collector = on  
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_error_verbosity = verbose      
log_timezone = `PRC`  
log_autovacuum_min_duration = 0  
autovacuum_vacuum_scale_factor = 0.1  
autovacuum_analyze_scale_factor = 0.1  
autovacuum_freeze_max_age = 1000000000  
autovacuum_multixact_freeze_max_age = 1200000000  
autovacuum_vacuum_cost_delay = 0  
autovacuum_vacuum_cost_limit = 0  
vacuum_freeze_table_age = 800000000  
vacuum_multixact_freeze_table_age = 800000000  
datestyle = `iso, mdy`  
timezone = `PRC`  
lc_messages = `C`  
lc_monetary = `C`  
lc_numeric = `C`  
lc_time = `C`  
default_text_search_config = `pg_catalog.english`  

配置pg_hba.conf

cd /disk1/digoal/pgdata2921  
  
vi pg_hba.conf  
  
# "local" is for Unix domain socket connections only  
local   all             all                                     trust  
# IPv4 local connections:  
host    all             all             127.0.0.1/32            trust  
# IPv6 local connections:  
host    all             all             ::1/128                 trust  
# Allow replication connections from localhost, by a user with the  
# replication privilege.  
local   replication     all                                     trust  
host    replication     all             127.0.0.1/32            trust  
host    replication     all             ::1/128                 trust  
host replication all 0.0.0.0/0 md5  

配置recovery.done

cd /disk1/digoal/pgdata2921  
  
vi recovery.done  
  
recovery_target_timeline = `latest`  
standby_mode = on  
primary_conninfo = `host=127.0.0.1 port=2921 user=rep password=pwd`  
# recovery_min_apply_delay = 0   #延遲多少分鐘應用,使用者可以配置延遲的備庫,例如給一點誤操作的緩衝時間。在備庫不會這麼早被應用。  

啟動主庫

pg_ctl start -D /disk1/digoal/pgdata2921  

建立流複製角色

psql -h 127.0.0.1 -p 2921  
  
psql (10beta1)  
Type "help" for help.  
  
postgres=# set synchronous_commit =off;  
SET  
postgres=# create role rep login replication encrypted password `pwd`;  
CREATE ROLE  

生成備庫1

mkdir /disk1/digoal/pgdata2922  
chown digoal /disk1/digoal/pgdata2922  
chmod 700 /disk1/digoal/pgdata2922  
  
export PGPASSWORD="pwd"  
pg_basebackup -D /disk1/digoal/pgdata2922 -F p -X stream -h 127.0.0.1 -p 2921 -U rep  

配置備庫1 postgresql.conf

cd /disk1/digoal/pgdata2922  
  
vi postgresql.conf  
  
port = 2922  

配置備庫1 recovery.conf

cd /disk1/digoal/pgdata2922  
  
mv recovery.done recovery.conf  

啟動備庫1

pg_ctl start -D /disk1/digoal/pgdata2922  

生成備庫2

mkdir /disk1/digoal/pgdata2923  
chown digoal /disk1/digoal/pgdata2923  
chmod 700 /disk1/digoal/pgdata2923  
  
export PGPASSWORD="pwd"  
pg_basebackup -D /disk1/digoal/pgdata2923 -F p -X stream -h 127.0.0.1 -p 2921 -U rep  

配置備庫2 postgresql.conf

cd /disk1/digoal/pgdata2923  
  
vi postgresql.conf  
  
port = 2923  

配置備庫2 recovery.conf

cd /disk1/digoal/pgdata2923  
  
mv recovery.done recovery.conf  

啟動備庫2

pg_ctl start -D /disk1/digoal/pgdata2923  

流複製節點的狀態監控

主庫查詢

psql -h 127.0.0.1 -p 2921  
psql (10beta1)  
Type "help" for help.  
  
postgres=# x  
Expanded display is on.  
  
postgres=# select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) as sent_delay,   
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn)) as write_delay,   
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)) as flush_delay,   
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as replay_delay,   
  *  
from pg_stat_replication;  
  
-[ RECORD 1 ]----+------------------------------  
sent_delay       | 0 bytes  
write_delay      | 0 bytes  
flush_delay      | 0 bytes  
replay_delay     | 0 bytes  
pid              | 11962  
usesysid         | 16384  
usename          | rep  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63083  
backend_start    | 2017-07-11 17:15:31.231492+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 1/C0000060  
write_lsn        | 1/C0000060  
flush_lsn        | 1/C0000060  
replay_lsn       | 1/C0000060  
write_lag        |   
flush_lag        |   
replay_lag       |   
sync_priority    | 1  
sync_state       | quorum  
-[ RECORD 2 ]----+------------------------------  
sent_delay       | 0 bytes  
write_delay      | 0 bytes  
flush_delay      | 0 bytes  
replay_delay     | 0 bytes  
pid              | 11350  
usesysid         | 16384  
usename          | rep  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63077  
backend_start    | 2017-07-11 17:15:13.818043+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 1/C0000060  
write_lsn        | 1/C0000060  
flush_lsn        | 1/C0000060  
replay_lsn       | 1/C0000060  
write_lag        |   
flush_lag        |   
replay_lag       |   
sync_priority    | 1  
sync_state       | quorum  

備庫查詢

psql -h 127.0.0.1 -p 2922  
  
-- 檢視當前WAL應用是否暫停了  
postgres=# select pg_is_wal_replay_paused();  
 pg_is_wal_replay_paused   
-------------------------  
 f  
(1 row)  
  
-- 檢視WAL接收到的位點  
postgres=# select pg_last_wal_receive_lsn();  
 pg_last_wal_receive_lsn   
-------------------------  
 1/C0000060  
(1 row)  
  
-- 檢視WAL的應用位點  
postgres=# select pg_last_wal_replay_lsn();  
 pg_last_wal_replay_lsn   
------------------------  
 1/C0000060  
(1 row)  
  
-- 檢視wal receiver的統計資訊  
postgres=# x  
Expanded display is on.  
postgres=# select * from pg_stat_get_wal_receiver();  
-[ RECORD 1 ]---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------  
pid                   | 11349  
status                | streaming  
receive_start_lsn     | 1/C0000000  
receive_start_tli     | 1  
received_lsn          | 1/C0000060  
received_tli          | 1  
last_msg_send_time    | 2017-07-11 17:23:14.372327+08  
last_msg_receipt_time | 2017-07-11 17:23:14.372361+08  
latest_end_lsn        | 1/C0000060  
latest_end_time       | 2017-07-11 17:15:13.819553+08  
slot_name             |   
conninfo              | user=rep password=******** dbname=replication host=127.0.0.1 port=2921 fallback_application_name=walreceiver sslmode=disable sslcompression=1 target_session_attrs=any  
  
  
postgres=# select pg_wal_replay_pause();  
-[ RECORD 1 ]-------+-  
pg_wal_replay_pause |   
  
-- 暫停WAL的應用,例如要做一些排錯時  
postgres=# select pg_is_wal_replay_paused();  
-[ RECORD 1 ]-----------+--  
pg_is_wal_replay_paused | t  
  
postgres=# select pg_wal_replay_resume();  
-[ RECORD 1 ]--------+-  
pg_wal_replay_resume |   
  
-- 繼續應用WAL  
postgres=# select pg_is_wal_replay_paused();  
-[ RECORD 1 ]-----------+--  
pg_is_wal_replay_paused | f  

注意事項

1、如果要防止主庫刪除備庫還沒有接收的WAL檔案,有兩種方法。

使用slot,或者配置足夠大的wal keep。

但是這兩種方法都有一定的風險或問題,例如當備庫掛了,或者備庫不再使用了,而使用者忘記刪除對應的SLOT時。可能導致主庫WAL無限膨脹。

而wal keep則會導致主庫的WAL預留足夠的個數,佔用一定空間。

相關引數

主 postgresql.conf  
# max_replication_slots = 10  
# wal_keep_segments = 1024  
  
備 recovery.conf  
# primary_slot_name = ``  

2、如果不想通過以上方法預防備庫需要的WAL已被刪除,那麼可以配置主庫的歸檔,同時備庫需要能獲取到已歸檔的WAL檔案。

相關引數

主 postgresql.conf  
#archive_mode = off             # enables archiving; off, on, or always  
                                # (change requires restart)  
#archive_command = ``           # command to use to archive a logfile segment  
                                # placeholders: %p = path of file to archive  
                                #               %f = file name only  
                                # e.g. `test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f`  
  
備 recovery.conf  
# restore_command = ``           # e.g. `cp /mnt/server/archivedir/%f %p`  

3、保護好recovery.conf檔案中的密碼,因為配置的是明文。

4、主備之間的頻寬請足夠大,否則可能導致主備延遲。

壓測

連線主庫進行TPC-B的壓測

pgbench -i -s 100 -h 127.0.0.1 -p 2921 -U postgres  
  
pgbench -n -r -P 1 -h 127.0.0.1 -p 2921 -U postgres -c 32 -j 32 -T 120  

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 32
number of threads: 32
duration: 120 s
number of transactions actually processed: 1326066
latency average = 2.896 ms
latency stddev = 2.030 ms
tps = 11050.199659 (including connections establishing)
tps = 11051.140876 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.002  set aid random(1, 100000 * :scale)
         0.001  set bid random(1, 1 * :scale)
         0.001  set tid random(1, 10 * :scale)
         0.000  set delta random(-5000, 5000)
         0.043  BEGIN;
         0.154  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.112  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.159  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.423  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.092  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         1.910  END;

觀察主備的延遲

psql -h 127.0.0.1 -p 2921  
psql (10beta1)  
Type "help" for help.  
  
postgres=# x  
Expanded display is on.  
postgres=# select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) as sent_delay,   
postgres-#   pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn)) as write_delay,   
postgres-#   pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)) as flush_delay,   
postgres-#   pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as replay_delay,   
postgres-#   *  
postgres-# from pg_stat_replication;  
-[ RECORD 1 ]----+------------------------------  
sent_delay       | 4024 bytes  
write_delay      | 4024 bytes  
flush_delay      | 9080 bytes  
replay_delay     | 13 kB  
pid              | 11962  
usesysid         | 16384  
usename          | rep  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63083  
backend_start    | 2017-07-11 17:15:31.231492+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 2/1C61E398  
write_lsn        | 2/1C61E398  
flush_lsn        | 2/1C61CFD8  
replay_lsn       | 2/1C61BEF8  
write_lag        | 00:00:00.000129  
flush_lag        | 00:00:00.001106  
replay_lag       | 00:00:00.001626  
sync_priority    | 1  
sync_state       | quorum  
-[ RECORD 2 ]----+------------------------------  
sent_delay       | 0 bytes  
write_delay      | 4024 bytes  
flush_delay      | 9080 bytes  
replay_delay     | 12 kB  
pid              | 11350  
usesysid         | 16384  
usename          | rep  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63077  
backend_start    | 2017-07-11 17:15:13.818043+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 2/1C61F350  
write_lsn        | 2/1C61E398  
flush_lsn        | 2/1C61CFD8  
replay_lsn       | 2/1C61C388  
write_lag        | 00:00:00.000542  
flush_lag        | 00:00:00.001582  
replay_lag       | 00:00:00.001952  
sync_priority    | 1  
sync_state       | quorum  
  
postgres=# watch 1  

二、流式 邏輯訂閱、邏輯主從 最佳實踐

《PostgreSQL 邏輯訂閱 – 給業務架構帶來了什麼希望?》

《PostgreSQL 10.0 preview 邏輯複製 – 原理與最佳實踐》

《使用PostgreSQL邏輯訂閱實現multi-master》


相關文章