KingbaseES V8R6叢集運維案例之---license禁用讀寫分離模組

天涯客1224發表於2024-09-09

案例說明:
KingbaseES V8R6叢集,在license中禁用“讀寫分離模組”後,備庫的資料庫引數,hot_standby只能配置為off,導致備庫處於"starting up"狀態,無法正常連線訪問;對於repmgr管理的叢集,主備流複製正常,但叢集狀態異常,repmgrd無法正常監控叢集節點狀態。
Tips:
hot_standby=on ,備庫處於熱備狀態,資料庫可以連線訪問。

適用版本:
KingbaseES V8R6

一、問題現象
1、備庫資料庫狀態
如下所示,在license中禁用“讀寫分離模組”後,備庫資料庫服務啟動後,sys_log日誌提示“hot_standby is not supported....",此時備庫的hot_standby只能配置為off,處於”starting up“狀態,無法正常連線訪問。

備庫sys_log日誌:

2、叢集狀態
如下所示,叢集中備庫狀態異常:

[kingbase@node201 bin]$ ./repmgr cluster show
string                                                                                                               
----+-------+---------+---------------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running     |          | default  | 100      | 67       |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby | ? unreachable | ? node1  | default  | 100      |          | ?       | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000

[WARNING] following issues were detected
  - unable to connect to node "node2" (ID: 2)
  - node "node2" (ID: 2) is registered as an active standby but is unreachable

備庫資料庫日誌:
如下所示,備庫資料庫服務一直在”starting up“狀態:

......
CST,,0,LOG,00000,"started streaming WAL from primary at 3/6000000 on timeline 67",,,,,,,,,""
2024-09-06 10:25:10.463 CST,"esrep","esrep",5392,"192.168.1.201:65494",66da6806.1510,1,"",2024-09-06 10:25:10 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
2024-09-06 10:25:37.185 CST,"esrep","esrep",5529,"192.168.1.201:9005",66da6821.1599,1,"",2024-09-06 10:25:37 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
2024-09-06 10:25:37.190 CST,"esrep","esrep",5530,"192.168.1.201:9007",66da6821.159a,1,"",2024-09-06 10:25:37 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""

二、問題分析
1、檢查流複製狀態
如下所示,叢集主備流複製狀態正常:

test=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend
_start         | backend_xmin |   state   | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_l
ag | replay_lag | sync_priority | sync_state |          reply_time
------+----------+---------+------------------+---------------+-----------------+-------------+----------------
---------------+--------------+-----------+----------+-----------+-----------+------------+-----------+--------
---+------------+---------------+------------+-------------------------------
 3420 |    16384 | esrep   | node2            | 192.168.1.202 |                 |       30761 | 2024-09-06 10:2
5:09.409266+08 |              | streaming |          | 3/70000D0 | 3/70000D0 | 3/70000D0  |           |
   |            |             1 | quorum     | 2024-09-06 10:29:18.758652+08
(1 row)

2、檢視備庫hamgr.log
如下所示,備庫repmgrd程序檢測到資料庫服務一直處於”startping up“狀態,無法透過pqping()連線:

[2024-09-06 10:32:23] [NOTICE] rwcmgrd (repmgrd 5.0.0) starting up
[2024-09-06 10:32:23] [INFO] connecting to database "host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000"
[2024-09-06 10:32:23] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.202 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000 fallback_application_name=internal_rwcmgr options=-csearch_path="
[2024-09-06 10:32:23] [ERROR] connection to database failed
[2024-09-06 10:32:23] [DETAIL]
FATAL:  the database system is starting up

3、啟動備庫repmgrd程序
如下所示,備庫啟動repmgrd程序失敗:

[kingbase@node202 bin]$ ./repmgrd -d
[2024-09-09 14:48:17] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C8/HAC8/kingbase/log/hamgr.log"

[kingbase@node202 bin]$ ps -ef |grep repmgrd

三、問題總結
讀寫分離模組被禁用,導致備庫hot_standby=off:
1、備庫資料庫服務只能啟動到recovery(starting up)狀態,無法建立連線訪問。
2、備庫repmgrd程序無法啟動,失去叢集的高可用守候功能。
3、透過repmgr cluster show檢視叢集狀態,備庫狀態異常。
4、主備流複製正常。

相關文章