posgresql資料庫高可用方案-patroni

有形无形發表於2024-11-07
一、簡介
pg常用高可用方案有repmgr,patroni等,本文介紹patroni方案。

Patroni,是專門為PostgreSQL資料庫設計的一款以Python語言實現的高可用軟體。其使用外部共享儲存軟體(kubernetes、etcd、etcd3、zookeeper、aws等)來儲存patroni監控到的pg叢集狀態資訊,實現 PostgreSQL 叢集的自動故障恢復、自動故障轉移等能力。

etcd是一個分散式鍵值對儲存,設計用來可靠而快速的儲存關鍵資料並提供訪問,Etcd按照Raft演算法和協議開發的,是一個強一致性的、分散式的key-value資料庫。它為分散式系統提供了可靠的資料儲存訪問機制,patroni配合etcd一起使用來保障pg高可用。

本文透過patroni+etcd方式實現postgresql的高可用方案,以下是個軟體版本。

python patroni etcd
3.7.4 2.0.2 3.3.11
二、部署etcd叢集
1、yum方式部署etcd叢集
#安裝依賴包
yum install -y gcc python-devel epel-release libyaml
#安裝etcd
yum install -y etcd libyaml

安裝完成後etcd目錄:
[root@xl001 ~]# ll /usr/bin/etcd*
-rwxr-xr-x 1 root root 25581944 Feb 14  2019 /usr/bin/etcd
-rwxr-xr-x 1 root root 21223848 Feb 14  2019 /usr/bin/etcdctl


2、二進位制包安裝
wget https://github.com/etcd-io/etcd/releases/download/v3.2.32/etcd-v3.2.32-linux-amd64.tar.gz

解壓即可
tar -zxvf etcd-v3.2.32-linux-amd64.tar.gz
然後將 etcd  etcdctl  複製到 /usr/local/bin  下

etcd --version                    #etcd 為服務端執行檔案
etcdctl --version                 #etcdctl 為客戶端執行檔案

以上兩種安裝方式任選一種即可,pg叢集三個節點上都要安裝。

3、建立目錄並授權
mkdir -p /opt/etcd
chown -R postgres:postgres /opt/etcd


4、編輯etcd配置檔案
chown -R postgres:postgres /etc/etcd/etcd.conf
vim /etc/etcd/etcd.conf

ETCD_NAME="etcd-1"       #節點名稱,隨便起個名字,三個節點不同即可,我這裡三個節點叫做etcd-1,etcd-2,etcd-3
ETCD_DATA_DIR="/opt/etcd"  #資料目錄

#監聽客戶端請求的地址列表url,可以監聽多個,多個用逗號分割,或者寫成0.0.0.0,監聽所有請求地址。
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"  
 
#建議使用的客戶端通訊 url,該值用於 etcd 代理或 etcd 成員與 etcd 節點通訊,192.168.167.11是本節點ip。   
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.167.11:2379"

監聽的用於節點之間通訊的url,可監聽多個,叢集內部將透過這些url進行資料互動(如選舉,資料同步等)
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"

#建議用於節點之間通訊的url,節點間將以該值進行通訊,192.168.167.11是本節點ip。
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.167.11:2380"

#也就是叢集中所有的 initial-advertise-peer-urls 的合集。etcd啟動的時候,透過這個配置找到其他etcd節點的列表。
ETCD_INITIAL_CLUSTER="etcd-1=http://192.168.167.11:2380,etcd-2=http://192.168.167.12:2380,etcd-3=http://192.168.167.13:2380"

#節點的 token 值,設定該值後叢集將生成唯一 id,併為每個節點也生成唯一 id,當使用相同配置檔案再啟動一個叢集時,只要該 token 值不一樣,etcd 叢集就不會相互影響。
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-token-1"

#初始化的時候,叢集的狀態:new 和 existing 兩種狀態。new代表新建的叢集,existing 代表加入已經存在的叢集
ETCD_INITIAL_CLUSTER_STATE="new"

# 這裡只寫了一個節點的配置檔案作為示例,pg三個節點都要配置

5、啟動etcd,在postgres賬號下啟動
啟動之前檢查一下時間是否同步,當各節點時間差大於1s時,etcd叢集會報錯,所以要配置時間同步伺服器
這裡的原則是誰先啟動,誰就是etcd資料庫主節點
su - postgres
etcd --config-file=/etc/etcd/etcd.conf 

注意,有的版本配置檔案只能yaml格式(不知道為什麼),如果是這種情況,直接在啟動命令的命令列上指定引數,例如下面這樣:
su - postgres
nohup etcd --name etcd-1 --initial-advertise-peer-urls http://192.168.167.11:2380 --listen-peer-urls http://0.0.0.0:2380 -data-dir /opt/etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://192.168.167.11:2379 --initial-cluster-state new --initial-cluster-token etcd-cluster-token-1 --initial-cluster etcd-1=http://192.168.167.11:2380,etcd-2=http://192.168.167.12:2380,etcd-3=http://192.168.167.13:2380 &
nohup etcd --name etcd-2 --initial-advertise-peer-urls http://192.168.167.12:2380 --listen-peer-urls http://0.0.0.0:2380 -data-dir /opt/etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://192.168.167.12:2379 --initial-cluster-state new --initial-cluster-token etcd-cluster-token-1 --initial-cluster etcd-1=http://192.168.167.11:2380,etcd-2=http://192.168.167.12:2380,etcd-3=http://192.168.167.13:2380 &
nohup etcd --name etcd-3 --initial-advertise-peer-urls http://192.168.167.13:2380 --listen-peer-urls http://0.0.0.0:2380 -data-dir /opt/etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://192.168.167.13:2379 --initial-cluster-state new --initial-cluster-token etcd-cluster-token-1 --initial-cluster etcd-1=http://192.168.167.11:2380,etcd-2=http://192.168.167.12:2380,etcd-3=http://192.168.167.13:2380 &

6、etcd常用運維命令
#檢視叢集狀態
etcdctl cluster-health

eg:
[root@xl001 etcd]# etcdctl cluster-health
member 802aec41d9fb61a6 is healthy: got healthy result from http://192.168.167.11:2379
member b79cb086020dff09 is healthy: got healthy result from http://192.168.167.12:2379
member fec7c7f897b1265c is healthy: got healthy result from http://192.168.167.13:2379
cluster is healthy


#檢視叢集成員
etcdctl member list

eg:
[root@xl001 etcd]# etcdctl member list
802aec41d9fb61a6: name=etcd-1 peerURLs=http://192.168.167.11:2380 clientURLs=http://192.168.167.11:2379 isLeader=true
b79cb086020dff09: name=etcd-2 peerURLs=http://192.168.167.12:2380 clientURLs=http://192.168.167.12:2379 isLeader=false
fec7c7f897b1265c: name=etcd-3 peerURLs=http://192.168.167.13:2380 clientURLs=http://192.168.167.13:2379 isLeader=false


# 停止etcd 服務
ps -ef |grep etcd
kill -9 

etcdctl mkdir v   #建立目錄v
etcdctl mkdir v/sub_v  #在目錄v下建立目錄sub_v
etcdctl rmdir v/sub_v    #刪除目錄
etcdctl mk v/sub_v/key1 value1  #在目錄v/sub_v下建立鍵key1以及對應的值value1
etcdctl ls -r  #檢視所有的目錄及子目錄以及所有目錄和子目錄下的鍵值對
etcdctl ls v/sub_v/  #檢視 該目錄下的所有鍵值對
etcdctl get v/sub_v/key1   #v/sub_v 目錄下 key1鍵的值
etcdctl delete v/sub_v/key1  #刪除鍵值對


7、如果中途出現了什麼問題,想重新初始化etcd叢集,可以這樣做 
三個節點都做
1、停止etcd 服務
ps -ef |grep etcd
kill -9

2、刪除etcd資料目錄
rm -rf /opt/etcd/*

3、啟動節點重新初始化叢集
etcd --config-file=/etc/etcd/etcd.conf

三、安裝patroni
3.1、安裝python
patroni是用python寫的一個元件,需要先安裝python,然後再用python安裝patroni這個包,三個節點都要安裝python

python安裝見我另一篇部落格:
https://www.cnblogs.com/sunjiwei/articles/18066608

3.2、下面開始安裝patroni
python3 -m pip install --upgrade pip -i https://mirrors.aliyun.com/pypi/simple
python3 -m pip install --upgrade setuptools -i https://mirrors.aliyun.com/pypi/simple
python3 -m pip install psycopg2_binary==2.8.6 -i https://mirrors.aliyun.com/pypi/simple
python3 -m pip install patroni[etcd]==2.0.2 -i https://mirrors.aliyun.com/pypi/simple

# 檢視patroni安裝版本
patroni --version   #檢視patroni 是否安裝成功
如果報錯:ImportError: urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.0.2k-fips  26 Jan 2017'. See: https://github.com/urllib3/urllib3/issues/2168
則執行   pip install urllib3==1.26.15 -i https://mirrors.aliyun.com/pypi/simple


# 檢視patronictl是否可用
patronictl --help

如果報錯:Traceback (most recent call last):
  File "./patronictl", line 5, in <module>
    from patroni.ctl import ctl
  File "/usr/local/python3.7.4/lib/python3.7/site-packages/patroni/ctl.py", line 33, in <module>
    from cdiff import markup_to_pager, PatchStream
ModuleNotFoundError: No module named 'cdiff'

則執行 pip install cdiff -i https://mirrors.aliyun.com/pypi/simple

3.3、編輯patroni配置檔案
# patroni需要個rewind的賬號,這個使用者主要是用來主備切換時,timeline不一致的情況會用到
create user rewind_user encrypted password '111111';
這個賬號要給相應的許可權,或者直接使用postgres這個賬號

mkdir /opt/patroni
chown -R postgres:postgres /opt/patroni

vim /etc/patroni.yml
scope: pg_cluster1             # 給叢集起個名字
namespace: /service/           # 這個是etcd裡會建立的目錄,不配置的話預設就儲存在/service,就用這個就行,
name: xl001                    # 節點名稱,可以用主機名

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.167.11:8008   # 本節點ip:8008

etcd:
  #Provide host to do the initial discovery of the cluster topology:
  hosts: 192.168.167.11:2379,192.168.167.12:2379,192.168.167.13:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 60000
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: true
    postgresql:
      use_pg_rewind: true
      use_slots: false
      parameters:
        max_connections: 8000
        max_worker_processes: 16
        synchronous_commit: "on"
        wal_level: "logical"
        hot_standby: "on"
        wal_keep_segments: 5120
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "on"
        archive_timeout: 1800s
  # 初始化資料庫用,也可以手動初始化好資料庫,不用patroni來初始化
  # initdb:
  # - encoding: UTF8
  # - locale: C
  # - lc-ctype: zh_CN.UTF-8
  # - data-checksums

	
postgresql:   
  listen: 0.0.0.0:5432
  connect_address: 192.168.167.11:5432
  data_dir: /opt/pgdata/data
  bin_dir: /usr/local/pgsql-12.6/bin
  config_dir: /opt/pgdata/data
  authentication:
    replication:
      username: repuser
      password: 111111
    superuser:
      username: postgres
      password: 111111
    #rewind:
      #username: rewind
      #password: 111111
   callbacks:
     on_start: /bin/bash /opt/patroni/pg_cluster1.vip
     on_restart: /bin/bash /opt/patroni/pg_cluster1.vip
     on_role_change: /bin/bash /opt/patroni/pg_cluster.vip


#watchdog:
#    mode: automatic # Allowed values: off, automatic, required
#    device: /dev/watchdog
#    safety_margin: 5



tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false


# 修改所屬使用者
chown postgres:postgres /etc/patroni.yml

三個節點都要配置,注意三個節點不同的引數是:
name: xl001
restapi裡的 connect_address: 192.168.167.11:8008
postgresql 裡的 connect_address: 192.168.167.11:5432



# 啟動patroni
#要在 postgres 賬號下 啟動 
su - postgres
patroni /etc/patroni.yml > /opt/patroni/patroni.log 2>&1 &

#檢視pg叢集狀態
patronictl -c /etc/patroni.yml list pg_cluster1


# 檢視某個節點狀態等資訊
yum install jq -y   # jq,格式化輸出json格式
curl -s http://192.168.167.161:8008 | jq

{
  "state": "running",
  "postmaster_start_time": "2023-09-27 10:44:33.542765+08:00",
  "role": "master",
  "server_version": 120015,
  "xlog": {
    "location": 167772160
  },
  "timeline": 1,
  "replication": [
    {
      "usename": "replica",
      "application_name": "ob11",
      "client_addr": "192.168.167.162",
      "state": "streaming",
      "sync_state": "async",
      "sync_priority": 0
    },
    {
      "usename": "replica",
      "application_name": "ob10",
      "client_addr": "192.168.167.161",
      "state": "streaming",
      "sync_state": "sync",
      "sync_priority": 1
    }
  ],
  "dcs_last_seen": 1695797837,
  "database_system_identifier": "7283070492616413950",
  "patroni": {
    "version": "3.0.4",
    "scope": "pg_cluster1"
  }
}

附上vip切換指令碼,callbacks 那裡要用到

vim /opt/patroni/cluster1.vip
#!/bin/bash
VIP=192.168.167.10
GATEWAY=192.168.167.254
DEV=bond0   # 網路卡名稱
action=$1
role=$2
cluster=$3
log()
{
  echo "recon: $*"|logger
}

load_vip()
{
ip a|grep -w ${DEV}|grep -w ${VIP} >/dev/null
if [ $? -eq 0 ] ;then
  log "vip exists, skip load vip"
else
  sudo ip addr add ${VIP}/24 dev ${DEV} label ${DEV}:2 >/dev/null
  rc=$?
  if [ $rc -ne 0 ] ;then
    log "fail to add vip ${VIP} at dev ${DEV} rc=$rc"
    exit 1
  fi

  log "added vip ${VIP} at dev ${DEV}"

  arping -U -I ${DEV} -s ${VIP} ${GATEWAY} -c 5 >/dev/null
  rc=$?
  if [ $rc -ne 0 ] ;then
    log "fail to call arping to gateway ${GATEWAY} rc=$rc"
    exit 1
  fi
  
  log "called arping to gateway ${GATEWAY}"
fi
}

unload_vip()
{
ip a|grep -w ${DEV}|grep -w ${VIP} >/dev/null
if [ $? -eq 0 ] ;then
  sudo ip addr del ${VIP}/24 dev ${DEV} label ${DEV}:2 >/dev/null
  rc=$?
  if [ $rc -ne 0 ] ;then
    log "fail to delete vip ${VIP} at dev ${DEV} rc=$rc"
    exit 1
  fi

  log "deleted vip ${VIP} at dev ${DEV}"
else
  log "vip not exists, skip delete vip"
fi
}

log "recon start args:'$*'"

case $action in
  on_start|on_restart|on_role_change)
    case $role in
      master)
        load_vip
        ;;
      replica)
        unload_vip
        ;;
      *)
        log "wrong role '$role'"
        exit 1
        ;;
    esac
    ;;
  *)
    log "wrong action '$action'"
    exit 1
    ;;
esac

注意:patroni 和 psycopg有可能出現版本不相容的情況,導致patroni部署不成功,我自己測試是patroni2.0.2和psycopg2.9.9不相容。
我測試可以的是 patroni2.0.2 和 psycopg2.8.6,patroni3.0.4 和 psycopg2.9.6這兩個組合,python用的都是3.7.4版本。

四、一些說明
1、patroni 對資料庫的侵入比較大,它會修改pg原有的配置檔案(pg_hba.conf,postgresql.conf),把它們變成base.conf,然後根據patroni配置檔案中的資料庫引數自己生成一份新的配置檔案,和老的配置檔案一起組成資料庫配置,新的配置檔案優先順序比老配置檔案高。
2、patroni可以自己初始化資料庫和部署流複製叢集(不建議這樣做),也可以自己部署好資料庫和流複製叢集,然後在此基礎上部署patroni,這樣的話patroni會重啟備庫(不會重啟主庫)

相關文章