案例說明:
主庫主機系統重啟,觸發failover切換,原主庫系統啟動後,資料庫服務被啟動,出現“雙主”,使用“repmgr node rejoin --force-rewind”恢復叢集。透過本案例瞭解,叢集“雙主”產生的原因及解決方案,並熟悉sys_rewind在叢集恢復中的應用。
資料庫版本:
KingbaseES V8R6
一、叢集架構
如下所示,叢集原節點狀態:
[kingbase@node202 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 17 | | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 17 | 0 bytes | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
二、主庫系統重啟
[root@node201 ~]# reboot
三、主庫重啟後叢集狀態
如下所示,原主庫主機系統重啟後,資料庫服務自動啟動,導致出現“雙主”:
1、failover後叢集節點狀態
[kingbase@node202 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | - failed | ? | default | 100 | | | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | primary | * running | | default | 100 | 18 | | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
[WARNING] following issues were detected
- unable to connect to node "node1" (ID: 1)
2、檢視節點上timeline變化
如下所示,新主庫觸發promote後,timeline為(18=1*16^1 + 2 * 16^0),原主庫的timeline為(17):
新主庫:
[kingbase@node202 bin]$ ls -lh ../data/sys_wal
-rw------- 1 kingbase kingbase 16M Jan 31 10:52 0000001100000000000000EB
-rw------- 1 kingbase kingbase 16M Jan 31 10:57 0000001100000000000000EC.partial
-rw------- 1 kingbase kingbase 694 Jan 26 09:41 00000011.history
-rw------- 1 kingbase kingbase 16M Jan 31 11:08 0000001200000000000000EC
-rw------- 1 kingbase kingbase 738 Jan 31 11:07 00000012.history
原主庫:
[kingbase@node201 bin]$ ls -lh ../data/sys_wal
-rw------- 1 kingbase kingbase 16M Jan 31 10:52 0000001100000000000000EB
-rw------- 1 kingbase kingbase 16M Jan 31 11:06 0000001100000000000000EC
-rw------- 1 kingbase kingbase 16M Jan 31 11:06 0000001100000000000000ED
-rw------- 1 kingbase kingbase 694 Jan 3 16:27 00000011.history
drwx------ 2 kingbase kingbase 16K Jan 31 11:06 archive_status
四、模擬“雙主”故障
1、觸發failover切換
# 叢集引數配置
[kingbase@node201 bin]$ cat ../etc/repmgr.conf |grep -E 'recovery|failover'
failover='automatic'
recovery='manual'
# 關閉主庫資料庫服務
[kingbase@node201 bin]$ ./sys_ctl stop -D ../data/
主庫資料庫服務被關閉後,將觸發叢集主備failover切換,在引數recovery=manual情況下,原主庫將需要人工參與恢復為備庫加入叢集。
2、誤啟動原主庫資料庫服務
如下所示,如果需要人工將原主庫恢復為新的備庫加入到叢集,需要在原主庫data目錄下建立standby.signal檔案後,再啟動資料庫服務,資料庫服務將啟動到備庫狀態,如果在沒有建立standby.signal標識檔案下,直接啟動原主庫資料庫服務,將直接進入主節點模式。
[kingbase@node201 bin]$ ./sys_ctl start -D ../data/
2、檢視資料庫服務程序
如下所示,原主庫啟動後,資料庫服務程序啟動為主庫,叢集出現“雙主”。
[kingbase@node201 bin]$ ps -ef |grep kingbase
kingbase 4137 1 0 11:14 ? 00:00:00 /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/kingbase -D ../data
kingbase 4138 4137 0 11:14 ? 00:00:00 kingbase: logger
kingbase 4140 4137 0 11:14 ? 00:00:00 kingbase: checkpointer
kingbase 4141 4137 0 11:14 ? 00:00:00 kingbase: background writer
kingbase 4142 4137 0 11:14 ? 00:00:00 kingbase: walwriter
kingbase 4143 4137 0 11:14 ? 00:00:00 kingbase: autovacuum launcher
kingbase 4144 4137 0 11:14 ? 00:00:00 kingbase: archiver last was 0000001100000000000000EC
kingbase 4145 4137 0 11:14 ? 00:00:00 kingbase: stats collector
kingbase 4146 4137 0 11:14 ? 00:00:00 kingbase: kwr collector
kingbase 4147 4137 0 11:14 ? 00:00:00 kingbase: ksh writer
kingbase 4148 4137 0 11:14 ? 00:00:00 kingbase: ksh collector
kingbase 4149 4137 0 11:14 ? 00:00:00 kingbase: logical replication launcher
# 資料庫狀態
test=# select sys_is_in_recovery();
sys_is_in_recovery()
----------------------
f
(1 row)
3、檢視資料庫日誌
[kingbase@node201 sys_log]$ tail -1000 kingbase-2024-01-31_111429.csv
2024-01-31 11:14:29.129 CST,,,4137,,65b9bb15.1029,1,,2024-01-31 11:14:29 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2024-01-31 11:14:29.149 CST,,,4139,,65b9bb15.102b,1,,2024-01-31 11:14:29 CST,,0,LOG,00000,"database system was shut down at 2024-01-31 11:06:32 CST",,,,,,,,,""
2024-01-31 11:14:29.164 CST,,,4137,,65b9bb15.1029,2,,2024-01-31 11:14:29 CST,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""
2024-01-31 11:14:31.775 CST,"esrep","esrep",4156,"192.168.1.201:32776",65b9bb17.103c,1,"ALTER SYSTEM",2024-01-31 11:14:31 CST,4/21,0,LOG,XX000,"attention:user esrep is modifying synchronous_standby_names by ALTER SYSTEM SET statement",,,,,,"ALTER SYSTEM SET synchronous_standby_names = ''",,,"internal_rwcmgr"
2024-01-31 11:14:31.783 CST,,,4137,,65b9bb15.1029,3,,2024-01-31 11:14:29 CST,,0,LOG,00000,"received SIGHUP, reloading configuration files",,,,,,,,,""
2024-01-31 11:14:31.784 CST,,,4137,,65b9bb15.1029,4,,2024-01-31 11:14:29 CST,,0,LOG,00000,"parameter ""synchronous_standby_names"" changed to """"",,,,,,,,,""
4、叢集節點狀態
如下所示,叢集出現“雙主”狀態:
[kingbase@node202 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | - failed | ? | default | 100 | | | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | primary | * running | | default | 100 | 18 | | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
[WARNING] following issues were detected
- unable to connect to node "node1" (ID: 1)
五、恢復叢集
如下所示,將原主庫透過node rejoin方式作為備庫加入到叢集,以下為日誌資訊:
1、執行‘repmgr node rejoin’
[kingbase@node201 bin]$ ./repmgr node rejoin -h 192.168.1.202 -U esrep -d esrep --force-rewind
......
[NOTICE] rejoin target is node "node2" (ID: 2)
[DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.202 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=internal_rwcmgr options=-csearch_path="
[DEBUG] local tli: 17; local_xlogpos: 0/EE000028; follow_target_history->tli: 17; follow_target_history->end: 0/EC000FA0
[NOTICE] sys_rewind execution required for this node to attach to rejoin target node 2
[DETAIL] rejoin target server's timeline 18 forked off current database system timeline 17 before current recovery point 0/EE000028
[NOTICE] executing sys_rewind
[DETAIL] sys_rewind command is "/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rewind -D '/home/kingbase/cluster/R6C8/HAC8/kingbase/data' --source-server='host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'"
sys_rewind: servers diverged at WAL location 0/EC000FA0 on timeline 17
sys_rewind: rewinding from last common checkpoint at 0/EB000088 on timeline 17
sys_rewind: find last common checkpoint start time from 2024-01-31 11:18:07.003968 CST to 2024-01-31 11:18:07.204522 CST, in "0.200554" seconds.
sys_rewind: collect the number of WAL files to be processed:3, start time from 2024-01-31 11:18:07.204754 CST to 2024-01-31 11:18:07.320593 CST, cost "0.115839" seconds.
sys_rewind: read WAL in target server from 0/EB000088 to 0/EE0000A0 (endpoint 0/EE0000A0)
sys_rewind: read the local Wal file information, start time from 2024-01-31 11:18:07.204754 CST to 2024-01-31 11:18:07.321418 CST, cost "0.000793" seconds.
sys_rewind: file replication start time from 2024-01-31 11:18:07.321453 CST to 2024-01-31 11:18:07.827427 CST, cost "0.505974" seconds.
sys_rewind: update the control file: minRecoveryPoint is '0/EC013BE0', minRecoveryPointTLI is '18', and database state is 'in archive recovery'
sys_rewind: we will remove the dir '/home/kingbase/cluster/R6C8/HAC8/kingbase/data/sys_replslot/repmgr_slot_2.rewind' and all the file/dir in it.
sys_rewind: rewind start wal location 0/EB000058 (file 0000001100000000000000EB), end wal location 0/EC013BE0 (file 0000001200000000000000EC). wal data increment:16462(kB). time from 2024-01-31 11:18:07.321453 CST to 2024-01-31 11:18:08.543478 CST, in "1.539510" seconds.
sys_rewind: Done!
[NOTICE] 0 files copied to /home/kingbase/cluster/R6C8/HAC8/kingbase/data
[INFO] creating replication slot as user "esrep"
[DEBUG] CreateSlotBySQL(): creating slot "repmgr_slot_1" on upstream
[NOTICE] setting node 1's upstream to node 2
[WARNING] unable to ping "host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000"
[DETAIL] KCIping() returned "KCIPING_NO_RESPONSE"
[NOTICE] begin to start server at 2024-01-31 11:18:08.556667
[NOTICE] starting server using "/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_ctl -w -t 90 -D '/home/kingbase/cluster/R6C8/HAC8/kingbase/data' -l /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/logfile start"
[NOTICE] start server finish at 2024-01-31 11:18:09.265062
[NOTICE] NODE REJOIN successful
[DETAIL] node 1 is now attached to node 2
2、rewind對比原主庫和新主庫history檔案
如下所示,sys_rewind透過對比源和目標庫的history檔案,查詢節點之間的分叉點。
原主庫:
[kingbase@node201 sys_wal]$ cat 00000011.history
1 0/180000A0 no recovery target specified
2 0/190000A0 no recovery target specified
3 0/250000A0 no recovery target specified
4 0/260000A0 no recovery target specified
5 0/AB0000A0 no recovery target specified
6 0/AD0000A0 no recovery target specified
7 0/B10000A0 no recovery target specified
8 0/B30000A0 no recovery target specified
9 0/B60000A0 no recovery target specified
10 0/B70000A0 no recovery target specified
11 0/BE0000A0 no recovery target specified
12 0/BF0000A0 no recovery target specified
13 0/C00000A0 no recovery target specified
14 0/C10000A0 no recovery target specified
15 0/C90000A0 no recovery target specified
16 0/CA0000A0 no recovery target specified
新主庫:
[kingbase@node202 bin]$ cat ../data/sys_wal/00000012.history
1 0/180000A0 no recovery target specified
2 0/190000A0 no recovery target specified
3 0/250000A0 no recovery target specified
4 0/260000A0 no recovery target specified
5 0/AB0000A0 no recovery target specified
6 0/AD0000A0 no recovery target specified
7 0/B10000A0 no recovery target specified
8 0/B30000A0 no recovery target specified
9 0/B60000A0 no recovery target specified
10 0/B70000A0 no recovery target specified
11 0/BE0000A0 no recovery target specified
12 0/BF0000A0 no recovery target specified
13 0/C00000A0 no recovery target specified
14 0/C10000A0 no recovery target specified
15 0/C90000A0 no recovery target specified
16 0/CA0000A0 no recovery target specified
17 0/EC000FA0 no recovery target specified --history分叉點
源庫和目標庫的分叉點:
如下所示,在rewind過程中,源庫和目標庫之間透過history檔案獲取分叉點,然後從分叉點之前最近的checkpoint開始對wal日誌執行recovery(必須保證從checkpoint開始,源庫上的wal日誌必須存在,否則將會因為缺失wal日誌,rewind失敗。):
3、檢視新主庫資料庫日誌
2024-01-31 11:18:09.261 CST,"esrep","",13596,"192.168.1.201:32307",65b9bbf1.351c,1,"idle",2024-01-31 11:18:09 CST,4/0,0,LOG,00000,"received replication command: IDENTIFY_SYSTEM",,,,,,,,,"node1"
2024-01-31 11:18:09.261 CST,"esrep","",13596,"192.168.1.201:32307",65b9bbf1.351c,2,"idle",2024-01-31 11:18:09 CST,4/0,0,LOG,00000,"received replication command: START_REPLICATION SLOT ""repmgr_slot_1"" 0/EC000000 TIMELINE 18",,,,,,,,,"node1"
2024-01-31 11:18:11.237 CST,"esrep","esrep",3552,"192.168.1.202:49745",65b9b688.de0,3,"ALTER SYSTEM",2024-01-31 10:55:04 CST,2/4414,0,LOG,XX000,"attention:user esrep is modifying synchronous_standby_names by ALTER SYSTEM SET statement",,,,,,"ALTER SYSTEM SET synchronous_standby_names = 'ANY 1(node1)'",,,"internal_rwcmgr"
2024-01-31 11:18:11.239 CST,,,2823,,65b9b66f.b07,25,,2024-01-31 10:54:39 CST,,0,LOG,00000,"received SIGHUP, reloading configuration files",,,,,,,,,""
2024-01-31 11:18:11.241 CST,,,2823,,65b9b66f.b07,26,,2024-01-31 10:54:39 CST,,0,LOG,00000,"parameter ""synchronous_standby_names"" changed to ""ANY 1(node1)""",,,,,,,,,""
2024-01-31 11:18:11.402 CST,"esrep","",13596,"192.168.1.201:32307",65b9bbf1.351c,3,"streaming 0/EC015300",2024-01-31 11:18:09 CST,4/0,0,LOG,00000,"standby ""node1"" is now a candidate for quorum synchronous standby",,,,,,,,,"node1"
4、叢集恢復完成
[kingbase@node202 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 17 | 0 bytes | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | primary | * running | | default | 100 | 18 | | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
六、總結
sys_rewind檢查源和目標叢集的時間線歷史以確定它們的分叉點,從分叉點之前最近的checkpoint位置開始解析WAL,解析出變動的資料頁,然後將變動的資料頁複製過來,並從分叉點最近的checkpoint開始應用wal日誌,最終保證源庫和目標庫資料一致。
sys_rewind 的使用不限於故障轉移,例如,可以提升備用伺服器主庫,執行一些寫入事務,然後重新回滾再次成為備用伺服器。