案例說明:
KingbaseES V8R6叢集,在license中禁用“讀寫分離模組”後,備庫的資料庫引數,hot_standby只能配置為off,導致備庫處於"starting up"狀態,無法正常連線訪問;對於repmgr管理的叢集,主備流複製正常,但叢集狀態異常,repmgrd無法正常監控叢集節點狀態。
Tips:
hot_standby=on ,備庫處於熱備狀態,資料庫可以連線訪問。
適用版本:
KingbaseES V8R6
一、問題現象
1、備庫資料庫狀態
如下所示,在license中禁用“讀寫分離模組”後,備庫資料庫服務啟動後,sys_log日誌提示“hot_standby is not supported....",此時備庫的hot_standby只能配置為off,處於”starting up“狀態,無法正常連線訪問。
備庫sys_log日誌:
2、叢集狀態
如下所示,叢集中備庫狀態異常:
[kingbase@node201 bin]$ ./repmgr cluster show
string
----+-------+---------+---------------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 67 | | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | ? unreachable | ? node1 | default | 100 | | ? | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
[WARNING] following issues were detected
- unable to connect to node "node2" (ID: 2)
- node "node2" (ID: 2) is registered as an active standby but is unreachable
備庫資料庫日誌:
如下所示,備庫資料庫服務一直在”starting up“狀態:
......
CST,,0,LOG,00000,"started streaming WAL from primary at 3/6000000 on timeline 67",,,,,,,,,""
2024-09-06 10:25:10.463 CST,"esrep","esrep",5392,"192.168.1.201:65494",66da6806.1510,1,"",2024-09-06 10:25:10 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
2024-09-06 10:25:37.185 CST,"esrep","esrep",5529,"192.168.1.201:9005",66da6821.1599,1,"",2024-09-06 10:25:37 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
2024-09-06 10:25:37.190 CST,"esrep","esrep",5530,"192.168.1.201:9007",66da6821.159a,1,"",2024-09-06 10:25:37 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
二、問題分析
1、檢查流複製狀態
如下所示,叢集主備流複製狀態正常:
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend
_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_l
ag | replay_lag | sync_priority | sync_state | reply_time
------+----------+---------+------------------+---------------+-----------------+-------------+----------------
---------------+--------------+-----------+----------+-----------+-----------+------------+-----------+--------
---+------------+---------------+------------+-------------------------------
3420 | 16384 | esrep | node2 | 192.168.1.202 | | 30761 | 2024-09-06 10:2
5:09.409266+08 | | streaming | | 3/70000D0 | 3/70000D0 | 3/70000D0 | |
| | 1 | quorum | 2024-09-06 10:29:18.758652+08
(1 row)
2、檢視備庫hamgr.log
如下所示,備庫repmgrd程序檢測到資料庫服務一直處於”startping up“狀態,無法透過pqping()連線:
[2024-09-06 10:32:23] [NOTICE] rwcmgrd (repmgrd 5.0.0) starting up
[2024-09-06 10:32:23] [INFO] connecting to database "host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000"
[2024-09-06 10:32:23] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.202 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000 fallback_application_name=internal_rwcmgr options=-csearch_path="
[2024-09-06 10:32:23] [ERROR] connection to database failed
[2024-09-06 10:32:23] [DETAIL]
FATAL: the database system is starting up
3、啟動備庫repmgrd程序
如下所示,備庫啟動repmgrd程序失敗:
[kingbase@node202 bin]$ ./repmgrd -d
[2024-09-09 14:48:17] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C8/HAC8/kingbase/log/hamgr.log"
[kingbase@node202 bin]$ ps -ef |grep repmgrd
三、問題總結
讀寫分離模組被禁用,導致備庫hot_standby=off:
1、備庫資料庫服務只能啟動到recovery(starting up)狀態,無法建立連線訪問。
2、備庫repmgrd程序無法啟動,失去叢集的高可用守候功能。
3、透過repmgr cluster show檢視叢集狀態,備庫狀態異常。
4、主備流複製正常。