一、簡介
pg常用高可用方案有repmgr,patroni等,本文介紹patroni方案。Patroni,是專門為PostgreSQL資料庫設計的一款以Python語言實現的高可用軟體。其使用外部共享儲存軟體(kubernetes、etcd、etcd3、zookeeper、aws等)來儲存patroni監控到的pg叢集狀態資訊,實現 PostgreSQL 叢集的自動故障恢復、自動故障轉移等能力。
etcd是一個分散式鍵值對儲存,設計用來可靠而快速的儲存關鍵資料並提供訪問,Etcd按照Raft演算法和協議開發的,是一個強一致性的、分散式的key-value資料庫。它為分散式系統提供了可靠的資料儲存訪問機制,patroni配合etcd一起使用來保障pg高可用。
本文透過patroni+etcd方式實現postgresql的高可用方案,以下是個軟體版本。
python | patroni | etcd |
---|---|---|
3.7.4 | 2.0.2 | 3.3.11 |
二、部署etcd叢集
1、yum方式部署etcd叢集
#安裝依賴包
yum install -y gcc python-devel epel-release libyaml
#安裝etcd
yum install -y etcd libyaml
安裝完成後etcd目錄:
[root@xl001 ~]# ll /usr/bin/etcd*
-rwxr-xr-x 1 root root 25581944 Feb 14 2019 /usr/bin/etcd
-rwxr-xr-x 1 root root 21223848 Feb 14 2019 /usr/bin/etcdctl
2、二進位制包安裝
wget https://github.com/etcd-io/etcd/releases/download/v3.2.32/etcd-v3.2.32-linux-amd64.tar.gz
解壓即可
tar -zxvf etcd-v3.2.32-linux-amd64.tar.gz
然後將 etcd etcdctl 複製到 /usr/local/bin 下
etcd --version #etcd 為服務端執行檔案
etcdctl --version #etcdctl 為客戶端執行檔案
以上兩種安裝方式任選一種即可,pg叢集三個節點上都要安裝。
3、建立目錄並授權
mkdir -p /opt/etcd
chown -R postgres:postgres /opt/etcd
4、編輯etcd配置檔案
chown -R postgres:postgres /etc/etcd/etcd.conf
vim /etc/etcd/etcd.conf
ETCD_NAME="etcd-1" #節點名稱,隨便起個名字,三個節點不同即可,我這裡三個節點叫做etcd-1,etcd-2,etcd-3
ETCD_DATA_DIR="/opt/etcd" #資料目錄
#監聽客戶端請求的地址列表url,可以監聽多個,多個用逗號分割,或者寫成0.0.0.0,監聽所有請求地址。
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
#建議使用的客戶端通訊 url,該值用於 etcd 代理或 etcd 成員與 etcd 節點通訊,192.168.167.11是本節點ip。
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.167.11:2379"
監聽的用於節點之間通訊的url,可監聽多個,叢集內部將透過這些url進行資料互動(如選舉,資料同步等)
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
#建議用於節點之間通訊的url,節點間將以該值進行通訊,192.168.167.11是本節點ip。
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.167.11:2380"
#也就是叢集中所有的 initial-advertise-peer-urls 的合集。etcd啟動的時候,透過這個配置找到其他etcd節點的列表。
ETCD_INITIAL_CLUSTER="etcd-1=http://192.168.167.11:2380,etcd-2=http://192.168.167.12:2380,etcd-3=http://192.168.167.13:2380"
#節點的 token 值,設定該值後叢集將生成唯一 id,併為每個節點也生成唯一 id,當使用相同配置檔案再啟動一個叢集時,只要該 token 值不一樣,etcd 叢集就不會相互影響。
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-token-1"
#初始化的時候,叢集的狀態:new 和 existing 兩種狀態。new代表新建的叢集,existing 代表加入已經存在的叢集
ETCD_INITIAL_CLUSTER_STATE="new"
# 這裡只寫了一個節點的配置檔案作為示例,pg三個節點都要配置
5、啟動etcd,在postgres賬號下啟動
啟動之前檢查一下時間是否同步,當各節點時間差大於1s時,etcd叢集會報錯,所以要配置時間同步伺服器
這裡的原則是誰先啟動,誰就是etcd資料庫主節點
su - postgres
etcd --config-file=/etc/etcd/etcd.conf
注意,有的版本配置檔案只能yaml格式(不知道為什麼),如果是這種情況,直接在啟動命令的命令列上指定引數,例如下面這樣:
su - postgres
nohup etcd --name etcd-1 --initial-advertise-peer-urls http://192.168.167.11:2380 --listen-peer-urls http://0.0.0.0:2380 -data-dir /opt/etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://192.168.167.11:2379 --initial-cluster-state new --initial-cluster-token etcd-cluster-token-1 --initial-cluster etcd-1=http://192.168.167.11:2380,etcd-2=http://192.168.167.12:2380,etcd-3=http://192.168.167.13:2380 &
nohup etcd --name etcd-2 --initial-advertise-peer-urls http://192.168.167.12:2380 --listen-peer-urls http://0.0.0.0:2380 -data-dir /opt/etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://192.168.167.12:2379 --initial-cluster-state new --initial-cluster-token etcd-cluster-token-1 --initial-cluster etcd-1=http://192.168.167.11:2380,etcd-2=http://192.168.167.12:2380,etcd-3=http://192.168.167.13:2380 &
nohup etcd --name etcd-3 --initial-advertise-peer-urls http://192.168.167.13:2380 --listen-peer-urls http://0.0.0.0:2380 -data-dir /opt/etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://192.168.167.13:2379 --initial-cluster-state new --initial-cluster-token etcd-cluster-token-1 --initial-cluster etcd-1=http://192.168.167.11:2380,etcd-2=http://192.168.167.12:2380,etcd-3=http://192.168.167.13:2380 &
6、etcd常用運維命令
#檢視叢集狀態
etcdctl cluster-health
eg:
[root@xl001 etcd]# etcdctl cluster-health
member 802aec41d9fb61a6 is healthy: got healthy result from http://192.168.167.11:2379
member b79cb086020dff09 is healthy: got healthy result from http://192.168.167.12:2379
member fec7c7f897b1265c is healthy: got healthy result from http://192.168.167.13:2379
cluster is healthy
#檢視叢集成員
etcdctl member list
eg:
[root@xl001 etcd]# etcdctl member list
802aec41d9fb61a6: name=etcd-1 peerURLs=http://192.168.167.11:2380 clientURLs=http://192.168.167.11:2379 isLeader=true
b79cb086020dff09: name=etcd-2 peerURLs=http://192.168.167.12:2380 clientURLs=http://192.168.167.12:2379 isLeader=false
fec7c7f897b1265c: name=etcd-3 peerURLs=http://192.168.167.13:2380 clientURLs=http://192.168.167.13:2379 isLeader=false
# 停止etcd 服務
ps -ef |grep etcd
kill -9
etcdctl mkdir v #建立目錄v
etcdctl mkdir v/sub_v #在目錄v下建立目錄sub_v
etcdctl rmdir v/sub_v #刪除目錄
etcdctl mk v/sub_v/key1 value1 #在目錄v/sub_v下建立鍵key1以及對應的值value1
etcdctl ls -r #檢視所有的目錄及子目錄以及所有目錄和子目錄下的鍵值對
etcdctl ls v/sub_v/ #檢視 該目錄下的所有鍵值對
etcdctl get v/sub_v/key1 #v/sub_v 目錄下 key1鍵的值
etcdctl delete v/sub_v/key1 #刪除鍵值對
7、如果中途出現了什麼問題,想重新初始化etcd叢集,可以這樣做
三個節點都做
1、停止etcd 服務
ps -ef |grep etcd
kill -9
2、刪除etcd資料目錄
rm -rf /opt/etcd/*
3、啟動節點重新初始化叢集
etcd --config-file=/etc/etcd/etcd.conf
三、安裝patroni
3.1、安裝python
patroni是用python寫的一個元件,需要先安裝python,然後再用python安裝patroni這個包,三個節點都要安裝pythonpython安裝見我另一篇部落格:
https://www.cnblogs.com/sunjiwei/articles/18066608
3.2、下面開始安裝patroni
python3 -m pip install --upgrade pip -i https://mirrors.aliyun.com/pypi/simple
python3 -m pip install --upgrade setuptools -i https://mirrors.aliyun.com/pypi/simple
python3 -m pip install psycopg2_binary==2.8.6 -i https://mirrors.aliyun.com/pypi/simple
python3 -m pip install patroni[etcd]==2.0.2 -i https://mirrors.aliyun.com/pypi/simple
# 檢視patroni安裝版本
patroni --version #檢視patroni 是否安裝成功
如果報錯:ImportError: urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.0.2k-fips 26 Jan 2017'. See: https://github.com/urllib3/urllib3/issues/2168
則執行 pip install urllib3==1.26.15 -i https://mirrors.aliyun.com/pypi/simple
# 檢視patronictl是否可用
patronictl --help
如果報錯:Traceback (most recent call last):
File "./patronictl", line 5, in <module>
from patroni.ctl import ctl
File "/usr/local/python3.7.4/lib/python3.7/site-packages/patroni/ctl.py", line 33, in <module>
from cdiff import markup_to_pager, PatchStream
ModuleNotFoundError: No module named 'cdiff'
則執行 pip install cdiff -i https://mirrors.aliyun.com/pypi/simple
3.3、編輯patroni配置檔案
# patroni需要個rewind的賬號,這個使用者主要是用來主備切換時,timeline不一致的情況會用到
create user rewind_user encrypted password '111111';
這個賬號要給相應的許可權,或者直接使用postgres這個賬號
mkdir /opt/patroni
chown -R postgres:postgres /opt/patroni
vim /etc/patroni.yml
scope: pg_cluster1 # 給叢集起個名字
namespace: /service/ # 這個是etcd裡會建立的目錄,不配置的話預設就儲存在/service,就用這個就行,
name: xl001 # 節點名稱,可以用主機名
restapi:
listen: 0.0.0.0:8008
connect_address: 192.168.167.11:8008 # 本節點ip:8008
etcd:
#Provide host to do the initial discovery of the cluster topology:
hosts: 192.168.167.11:2379,192.168.167.12:2379,192.168.167.13:2379
bootstrap:
# this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
# and all other cluster members will use it as a `global configuration`
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 60000
maximum_lag_on_failover: 1048576
master_start_timeout: 300
synchronous_mode: true
postgresql:
use_pg_rewind: true
use_slots: false
parameters:
max_connections: 8000
max_worker_processes: 16
synchronous_commit: "on"
wal_level: "logical"
hot_standby: "on"
wal_keep_segments: 5120
max_wal_senders: 10
max_replication_slots: 10
wal_log_hints: "on"
archive_mode: "on"
archive_timeout: 1800s
# 初始化資料庫用,也可以手動初始化好資料庫,不用patroni來初始化
# initdb:
# - encoding: UTF8
# - locale: C
# - lc-ctype: zh_CN.UTF-8
# - data-checksums
postgresql:
listen: 0.0.0.0:5432
connect_address: 192.168.167.11:5432
data_dir: /opt/pgdata/data
bin_dir: /usr/local/pgsql-12.6/bin
config_dir: /opt/pgdata/data
authentication:
replication:
username: repuser
password: 111111
superuser:
username: postgres
password: 111111
#rewind:
#username: rewind
#password: 111111
callbacks:
on_start: /bin/bash /opt/patroni/pg_cluster1.vip
on_restart: /bin/bash /opt/patroni/pg_cluster1.vip
on_role_change: /bin/bash /opt/patroni/pg_cluster.vip
#watchdog:
# mode: automatic # Allowed values: off, automatic, required
# device: /dev/watchdog
# safety_margin: 5
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
# 修改所屬使用者
chown postgres:postgres /etc/patroni.yml
三個節點都要配置,注意三個節點不同的引數是:
name: xl001
restapi裡的 connect_address: 192.168.167.11:8008
postgresql 裡的 connect_address: 192.168.167.11:5432
# 啟動patroni
#要在 postgres 賬號下 啟動
su - postgres
patroni /etc/patroni.yml > /opt/patroni/patroni.log 2>&1 &
#檢視pg叢集狀態
patronictl -c /etc/patroni.yml list pg_cluster1
# 檢視某個節點狀態等資訊
yum install jq -y # jq,格式化輸出json格式
curl -s http://192.168.167.161:8008 | jq
{
"state": "running",
"postmaster_start_time": "2023-09-27 10:44:33.542765+08:00",
"role": "master",
"server_version": 120015,
"xlog": {
"location": 167772160
},
"timeline": 1,
"replication": [
{
"usename": "replica",
"application_name": "ob11",
"client_addr": "192.168.167.162",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
},
{
"usename": "replica",
"application_name": "ob10",
"client_addr": "192.168.167.161",
"state": "streaming",
"sync_state": "sync",
"sync_priority": 1
}
],
"dcs_last_seen": 1695797837,
"database_system_identifier": "7283070492616413950",
"patroni": {
"version": "3.0.4",
"scope": "pg_cluster1"
}
}
附上vip切換指令碼,callbacks 那裡要用到
vim /opt/patroni/cluster1.vip
#!/bin/bash
VIP=192.168.167.10
GATEWAY=192.168.167.254
DEV=bond0 # 網路卡名稱
action=$1
role=$2
cluster=$3
log()
{
echo "recon: $*"|logger
}
load_vip()
{
ip a|grep -w ${DEV}|grep -w ${VIP} >/dev/null
if [ $? -eq 0 ] ;then
log "vip exists, skip load vip"
else
sudo ip addr add ${VIP}/24 dev ${DEV} label ${DEV}:2 >/dev/null
rc=$?
if [ $rc -ne 0 ] ;then
log "fail to add vip ${VIP} at dev ${DEV} rc=$rc"
exit 1
fi
log "added vip ${VIP} at dev ${DEV}"
arping -U -I ${DEV} -s ${VIP} ${GATEWAY} -c 5 >/dev/null
rc=$?
if [ $rc -ne 0 ] ;then
log "fail to call arping to gateway ${GATEWAY} rc=$rc"
exit 1
fi
log "called arping to gateway ${GATEWAY}"
fi
}
unload_vip()
{
ip a|grep -w ${DEV}|grep -w ${VIP} >/dev/null
if [ $? -eq 0 ] ;then
sudo ip addr del ${VIP}/24 dev ${DEV} label ${DEV}:2 >/dev/null
rc=$?
if [ $rc -ne 0 ] ;then
log "fail to delete vip ${VIP} at dev ${DEV} rc=$rc"
exit 1
fi
log "deleted vip ${VIP} at dev ${DEV}"
else
log "vip not exists, skip delete vip"
fi
}
log "recon start args:'$*'"
case $action in
on_start|on_restart|on_role_change)
case $role in
master)
load_vip
;;
replica)
unload_vip
;;
*)
log "wrong role '$role'"
exit 1
;;
esac
;;
*)
log "wrong action '$action'"
exit 1
;;
esac
注意:patroni 和 psycopg有可能出現版本不相容的情況,導致patroni部署不成功,我自己測試是patroni2.0.2和psycopg2.9.9不相容。
我測試可以的是 patroni2.0.2 和 psycopg2.8.6,patroni3.0.4 和 psycopg2.9.6這兩個組合,python用的都是3.7.4版本。
四、一些說明
1、patroni 對資料庫的侵入比較大,它會修改pg原有的配置檔案(pg_hba.conf,postgresql.conf),把它們變成base.conf,然後根據patroni配置檔案中的資料庫引數自己生成一份新的配置檔案,和老的配置檔案一起組成資料庫配置,新的配置檔案優先順序比老配置檔案高。
2、patroni可以自己初始化資料庫和部署流複製叢集(不建議這樣做),也可以自己部署好資料庫和流複製叢集,然後在此基礎上部署patroni,這樣的話patroni會重啟備庫(不會重啟主庫)