Handling a Dual-Primary Problem in a repmgr Cluster
HighGo PG Lab, published 2022-01-10
Environment
System platform:
N/A
Version:
5.6.5
Symptom
This document describes how to handle a dual-primary (split-brain) situation in a repmgr cluster.
Cause
After a repmgr switchover, the old primary is stopped. Manually starting the old primary puts the database cluster into an abnormal state with two primaries.
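Before manually starting a stopped former primary, it is safer to first confirm from the surviving nodes whether a failover has already taken place. A minimal sketch of such a check, assuming the binary and data-directory paths shown later in this article (adjust to your installation):

# From any running node: see which node repmgr currently records as primary.
repmgr cluster show

# On the stopped old primary, inspect the control file instead of starting the server;
# "Database cluster state" shows how the instance last shut down.
/opt/HighGo5.6.5-cluster/bin/pg_controldata /opt/HighGo5.6.5-cluster/data | grep 'cluster state'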
Solution
Querying the cluster status from the old primary (192.168.80.221) shows that standby 1 (192.168.80.222) has already become the new primary, while the old primary (192.168.80.221) is still running in primary mode, so the cluster now has two primaries. A further check shows that standby 2 (192.168.80.236) is already connected to the new primary (192.168.80.222), so the old primary (192.168.80.221) can be rejoined to the cluster as a standby.
[highgo@localhost HighGo5.6.5-cluster]$ repmgr cluster show
 ID | Name           | Role    | Status               | Upstream       | Location | Priority | Replication lag | Last replayed LSN
----+----------------+---------+----------------------+----------------+----------+----------+-----------------+-------------------
 1  | 192.168.80.221 | primary | * running            |                | default  | 100      | n/a             | none
 2  | 192.168.80.222 | standby | ! running as primary | 192.168.80.221 | default  | 100      | 96 MB           | 2/D2013F28
 3  | 192.168.80.236 | standby |   running            | 192.168.80.221 | default  | 100      | 96 MB           | 2/D2013F28

WARNING: following issues were detected
  - node "192.168.80.222" (ID: 2) is registered as standby but running as primary

[highgo@localhost HighGo5.6.5-cluster]$ pg_ctl stop -m f
waiting for server to shut down..... done
server stopped
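Before attempting the rejoin, it is worth confirming that the pg_rewind prerequisites checked in the dry run below are in place: either wal_log_hints = on or data page checksums must be enabled. A hedged check on the stopped old primary, using the same paths as the commands above:

# pg_rewind needs wal_log_hints = on or data page checksums enabled.
# pg_controldata reads the control file directly, so the server can stay stopped.
/opt/HighGo5.6.5-cluster/bin/pg_controldata /opt/HighGo5.6.5-cluster/data | grep -E 'wal_log_hints|checksum'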
Test whether this node can be rejoined to the cluster (dry run):
[highgo@localhost HighGo5.6.5-cluster]$ repmgr node rejoin -d 'host=192.168.80.222 dbname=highgo user=highgo' --force-rewind --config-files=postgresql.conf --verbose --dry-run
INFO: looking for configuration file in "/opt/HighGo5.6.5-cluster"
INFO: configuration file found at: "/opt/HighGo5.6.5-cluster/conf/hg_repmgr.conf"
DEBUG: set_config(): SET synchronous_commit TO 'local'
DEBUG: get_primary_node_id(): SELECT node_id FROM repmgr.nodes WHERE type = 'primary' AND active IS TRUE
DEBUG: get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name FROM repmgr.nodes n WHERE n.node_id = 2
DEBUG: connecting to: "user=highgo connect_timeout=2 dbname=highgo host=192.168.80.222 port=5866 fallback_application_name=repmgr"
DEBUG: set_config(): SET synchronous_commit TO 'local'
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 6823598896860049498
DEBUG: local timeline: 25; rejoin target timeline: 26
DEBUG: get_timeline_history(): TIMELINE_HISTORY 26
DEBUG: local tli: 25; local_xlogpos: 2/DE000028; follow_target_history->tli: 25; follow_target_history->end: 2/D2013F28
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 26 forked off current database system timeline 25 before current recovery point 2/DE000028
DEBUG: guc_set(): SELECT true FROM pg_catalog.pg_settings WHERE name = 'full_page_writes' AND setting = 'off'
DEBUG: guc_set(): SELECT true FROM pg_catalog.pg_settings WHERE name = 'wal_log_hints' AND setting = 'on'
INFO: prerequisites for using pg_rewind are met
DEBUG: using archive directory "/tmp/repmgr-config-archive-192.168.80.221"
INFO: temporary archive directory "/tmp/repmgr-config-archive-192.168.80.221" created
INFO: file "postgresql.conf" would be copied to "/tmp/repmgr-config-archive-192.168.80.221/postgresql.conf"
INFO: 1 files would have been copied to "/tmp/repmgr-config-archive-192.168.80.221"
INFO: temporary archive directory "/tmp/repmgr-config-archive-192.168.80.221" deleted
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is: /opt/HighGo5.6.5-cluster/bin/pg_rewind -D '/opt/HighGo5.6.5-cluster/data' --source-server='host=192.168.80.222 user=highgo dbname=highgo port=5866 connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met
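Since repmgr returns a zero exit code on success, the dry run can gate the real rejoin when scripting this procedure; a minimal sketch reusing the options from the command above:

# Only run the real rejoin if the dry run confirms the prerequisites are met.
if repmgr node rejoin -d 'host=192.168.80.222 dbname=highgo user=highgo' \
       --force-rewind --config-files=postgresql.conf --dry-run
then
    repmgr node rejoin -d 'host=192.168.80.222 dbname=highgo user=highgo' \
           --force-rewind --config-files=postgresql.conf --verbose
else
    echo "node rejoin dry run failed; investigate before proceeding" >&2
fi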
Now rejoin the node to the cluster for real:
[highgo@localhost HighGo5.6.5-cluster]$ repmgr node rejoin -d 'host=192.168.80.222 dbname=highgo user=highgo' --force-rewind --config-files=postgresql.conf --verbose
INFO: looking for configuration file in "/opt/HighGo5.6.5-cluster"
INFO: configuration file found at: "/opt/HighGo5.6.5-cluster/conf/hg_repmgr.conf"
DEBUG: set_config(): SET synchronous_commit TO 'local'
DEBUG: get_primary_node_id(): SELECT node_id FROM repmgr.nodes WHERE type = 'primary' AND active IS TRUE
DEBUG: get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name FROM repmgr.nodes n WHERE n.node_id = 2
DEBUG: connecting to: "user=highgo connect_timeout=2 dbname=highgo host=192.168.80.222 port=5866 fallback_application_name=repmgr"
DEBUG: set_config(): SET synchronous_commit TO 'local'
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: local timeline: 25; rejoin target timeline: 26
DEBUG: get_timeline_history(): TIMELINE_HISTORY 26
DEBUG: local tli: 25; local_xlogpos: 2/DE000028; follow_target_history->tli: 25; follow_target_history->end: 2/D2013F28
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 26 forked off current database system timeline 25 before current recovery point 2/DE000028
DEBUG: guc_set(): SELECT true FROM pg_catalog.pg_settings WHERE name = 'full_page_writes' AND setting = 'off'
DEBUG: guc_set(): SELECT true FROM pg_catalog.pg_settings WHERE name = 'wal_log_hints' AND setting = 'on'
INFO: prerequisites for using pg_rewind are met
DEBUG: using archive directory "/tmp/repmgr-config-archive-192.168.80.221"
DEBUG: copying "postgresql.conf" to "/tmp/repmgr-config-archive-192.168.80.221/postgresql.conf"
INFO: 1 files copied to "/tmp/repmgr-config-archive-192.168.80.221"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/opt/HighGo5.6.5-cluster/bin/pg_rewind -D '/opt/HighGo5.6.5-cluster/data' --source-server='host=192.168.80.222 user=highgo dbname=highgo port=5866 connect_timeout=2'"
DEBUG: executing: /opt/HighGo5.6.5-cluster/bin/pg_rewind -D '/opt/HighGo5.6.5-cluster/data' --source-server='host=192.168.80.222 user=highgo dbname=highgo port=5866 connect_timeout=2'
DEBUG: result of command was 0 (13)
DEBUG: local_command(): output returned was:
servers diverged at WAL location 2/D2013F28 on timeline 25
DEBUG: using archive directory "/tmp/repmgr-config-archive-192.168.80.221"
DEBUG: copying "/tmp/repmgr-config-archive-192.168.80.221/postgresql.conf" to "/opt/HighGo5.6.5-cluster/data/postgresql.conf"
NOTICE: 1 files copied to /opt/HighGo5.6.5-cluster/data
INFO: directory "/tmp/repmgr-config-archive-192.168.80.221" deleted
INFO: deleting "recovery.done"
DEBUG: get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name FROM repmgr.nodes n WHERE n.node_id = 1
NOTICE: setting node 1's upstream to node 2
DEBUG: create_recovery_file(): creating "/opt/HighGo5.6.5-cluster/data/recovery.conf"...
DEBUG: recovery file is:
standby_mode = 'on'
primary_conninfo = 'user=highgo connect_timeout=2 host=192.168.80.222 port=5866 application_name=192.168.80.221'
recovery_target_timeline = 'latest'
DEBUG: is_server_available(): ping status for "host=192.168.80.221 user=highgo dbname=highgo password=highgo@123 port=5866 connect_timeout=2" is PQPING_NO_RESPONSE
WARNING: unable to ping "host=192.168.80.221 user=highgo dbname=highgo password=highgo@123 port=5866 connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/opt/HighGo5.6.5-cluster/bin/pg_ctl -w -D '/opt/HighGo5.6.5-cluster/data' start"
DEBUG: executing: /opt/HighGo5.6.5-cluster/bin/pg_ctl -w -D '/opt/HighGo5.6.5-cluster/data' start
DEBUG: result of command was 0 (13)
DEBUG: local_command(): output returned was:
waiting for server to start ....2020-06-25 09:41:54.523 CST [5308] WARNING: 01000: gdb version should large than 7.10
DEBUG: update_node_record_status(): UPDATE repmgr.nodes SET type = 'standby', upstream_node_id = 2, active = TRUE WHERE node_id = 1
DEBUG: is_server_available(): ping status for "host=192.168.80.221 user=highgo dbname=highgo password=highgo@123 port=5866 connect_timeout=2" is PQPING_REJECT
WARNING: unable to ping "host=192.168.80.221 user=highgo dbname=highgo password=highgo@123 port=5866 connect_timeout=2"
DETAIL: PQping() returned "PQPING_REJECT"
INFO: waiting for node 1 to respond to pings; 1 of max 60 attempts
DEBUG: is_server_available(): ping status for "host=192.168.80.221 user=highgo dbname=highgo password=highgo@123 port=5866 connect_timeout=2" is PQPING_REJECT
WARNING: unable to ping "host=192.168.80.221 user=highgo dbname=highgo password=highgo@123 port=5866 connect_timeout=2"
DETAIL: PQping() returned "PQPING_REJECT"
DEBUG: sleeping 1 second waiting for node 1 to respond to pings; 2 of max 60 attempts
DEBUG: is_server_available(): ping status for "host=192.168.80.221 user=highgo dbname=highgo password=highgo@123 port=5866 connect_timeout=2" is PQPING_REJECT
WARNING: unable to ping "host=192.168.80.221 user=highgo dbname=highgo password=highgo@123 port=5866 connect_timeout=2"
DETAIL: PQping() returned "PQPING_REJECT"
DEBUG: sleeping 1 second waiting for node 1 to respond to pings; 3 of max 60 attempts
DEBUG: is_server_available(): ping status for "host=192.168.80.221 user=highgo dbname=highgo password=highgo@123 port=5866 connect_timeout=2" is PQPING_OK
INFO: demoted primary is pingable
INFO: node 1 has attached to its upstream node
DEBUG: _create_event(): event is "node_rejoin" for node 1
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: _create_event(): INSERT INTO repmgr.events ( node_id, event, successful, details ) VALUES ($1, $2, $3, $4) RETURNING event_timestamp
DEBUG: _create_event(): Event timestamp is "2020-06-30 10:15:47.522104+08"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
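Once NODE REJOIN reports success, a quick sanity check is to run the same pg_is_in_recovery() query that repmgr issues in the log above; on the rejoined node it should now return true. A minimal check, assuming the port and credentials used throughout this article:

# On 192.168.80.221: "t" confirms the node is now running as a standby.
psql -h 192.168.80.221 -p 5866 -U highgo -d highgo -c 'SELECT pg_catalog.pg_is_in_recovery();'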
Query the cluster status again; the cluster is now back to normal.
[highgo@localhost HighGo5.6.5-cluster]$ repmgr cluster show
 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Replication lag | Last replayed LSN
----+----------------+---------+-----------+----------------+----------+----------+-----------------+-------------------
 1  | 192.168.80.221 | standby |   running | 192.168.80.221 | default  | 100      | 0 MB            | 2/D4013F28
 2  | 192.168.80.222 | primary |   running |                | default  | 100      | n/a             | none
 3  | 192.168.80.236 | standby |   running | 192.168.80.221 | default  | 100      | 0 MB            | 2/D4013F28
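As an extra verification step (not part of the original procedure), the new primary can confirm that both standbys are streaming from it by querying pg_stat_replication; a sketch assuming the same port and credentials as above:

# On 192.168.80.222: expect one row per attached standby (192.168.80.221 and 192.168.80.236).
psql -h 192.168.80.222 -p 5866 -U highgo -d highgo -c "SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;"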
Source: ITPUB Blog, link: http://blog.itpub.net/69994931/viewspace-2851503/. Please credit the source when republishing; otherwise legal liability may be pursued.