GaussDB: Handling a CN repair failure caused by an openssl version dependency

Posted by 我爱睡莲 on 2024-09-05

1. Problem background

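After a GaussDB lightweight distributed cluster was installed, openssh and openssl were upgraded. The database installs without problems on the existing versions, openssh-8.2p1-9.p03.ky10.x86_64 and openssl-1.1.1f-2.ky10.x86_64. The two packages were then upgraded to openssh-8.2p1-9.p15.ky10.x86_64 and openssl-1.1.1f-4.p17.ky10.x86_64.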

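Command tests after the installation passed, including starting and stopping cluster nodes. However, once a coordinator node (CN) had been removed from the cluster, repairing it failed with an error, the same problem that appeared on the first cluster that was installed. [Screenshot: error output]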

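Cluster status at this point: one CN is shown as removed, the cluster state has changed to degraded, the DNs are normal, and the cluster is still available.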

2. Working around the openssh and openssl versions

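Changes:

1. Edit the GaussDB(DWS) environment file /opt/huawei/Bigdata/mppdb/.mppdbgs_profile and change the order in which LD_LIBRARY_PATH is built, so that the GaussDB library directories are appended after the existing value instead of being placed in front of it.
Before the change: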
[omm@redhat-4 ~]$ cat  /opt/huawei/Bigdata/mppdb/.mppdbgs_profile  | grep -in LD_LIBRARY_PATH
5:export LD_LIBRARY_PATH=$GPHOME/lib:$LD_LIBRARY_PATH
7:export LD_LIBRARY_PATH=$GPHOME/lib/libsimsearch:$LD_LIBRARY_PATH
11:export LD_LIBRARY_PATH=$GAUSSHOME/lib:$LD_LIBRARY_PATH
12:export LD_LIBRARY_PATH=$GAUSSHOME/lib/libsimsearch:$LD_LIBRARY_PATH

After the change:

[omm@redhat-4 ~]$ cat  /opt/huawei/Bigdata/mppdb/.mppdbgs_profile  | grep -in LD_LIBRARY_PATH
5:export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GPHOME/lib
7:export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GPHOME/lib/libsimsearch
11:export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GAUSSHOME/lib
12:export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GAUSSHOME/lib/libsimsearch
Then add the following line (at the end of the file, as the full listing in 3.10 shows):
export LD_LIBRARY_PATH=/lib64:$LD_LIBRARY_PATH
2. Add the LD_LIBRARY_PATH variable to /etc/profile. /lib64 is the directory that holds the libraries the ssh binary depends on.
Add the following line:
export LD_LIBRARY_PATH=/lib64:$LD_LIBRARY_PATH
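
The reordering in step 1 can be scripted instead of edited by hand. The following is a minimal sketch and not part of the original procedure: the function name `append_style` is hypothetical, and it assumes every affected line has exactly the form `export LD_LIBRARY_PATH=$G...:$LD_LIBRARY_PATH` shown in the grep output above. It prints the rewritten profile to standard output so the result can be reviewed before it replaces the real file, and it deliberately leaves the `/lib64` line in prepend form.

```shell
# Rewrite  "export LD_LIBRARY_PATH=$GPHOME/...:$LD_LIBRARY_PATH"  (prepend)
# into     "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GPHOME/..."  (append).
# Only directories that start with $G (GPHOME, GAUSSHOME) are touched, so a
# line such as "export LD_LIBRARY_PATH=/lib64:$LD_LIBRARY_PATH" is kept as is.
# Hypothetical helper: prints the result and never edits the file in place.
append_style() {
    sed -E 's|^export LD_LIBRARY_PATH=(\$G[^:]+):\$LD_LIBRARY_PATH$|export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:\1|' "$1"
}
```

One possible way to use it is `append_style /opt/huawei/Bigdata/mppdb/.mppdbgs_profile > /tmp/mppdbgs_profile.new`, followed by a `diff` against the original and a backup copy before the new file is moved into place.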

3. Repairing the CN again

3.1 Re-running gs_replace to repair the coordinator node, which now fails with a different error

[omm@DN01 ~]$ gs_replace -t config -h DN02
Checking all the cm_agent instances.
There are [0] cm_agents need to be repaired in cluster.
Fixing all the CMAgents instances.
Checking and restoring the secondary standby instance.
The secondary standby instance does not need to be restored.
Configuring
Waiting for promote peer instances.
.
Successfully upgraded standby instances.
Configuring replacement instances.
Successfully configured replacement instances.
Deleting abnormal CN from pgxc_node on the normal CN.
No abnormal CN needs to be deleted.
Unlocking cluster.
Successfully unlocked cluster.
Locking cluster.
Successfully locked cluster.
Unlocking cluster.
Successfully unlocked cluster.
Creating all fixed CN on the normal CN.
No CN needs to be created.
Warning: failed to turn off O&M management. Please re-execute "cm_ctl set --maintenance=off" once again.
[GAUSS-51400] : Failed to execute the command: source /opt/huawei/Bigdata/mppdb/.mppdbgs_profile ; cm_ctl set --maintenance=on  -n 2. Error:
cm_ctl: Starting to enable the maintenance mode.
cm_ctl: Close maintenance mode on cm instances.
cm_ctl: Close maintenance mode on cm instances failed.

3.2 Running the command named in the error message above

[omm@DN01 ~]$ source /opt/huawei/Bigdata/mppdb/.mppdbgs_profile
[omm@DN01 ~]$
[omm@DN01 ~]$ cm_ctl set --maintenance=on  -n 2
cm_ctl: Starting to enable the maintenance mode.
cm_ctl: Close maintenance mode on cm instances.
cm_ctl: Close maintenance mode on cm instances failed.

3.3 Checking the log

[omm@DN01 ~]$ cd $GAUSSLOG/bin/cm_ctl
[omm@DN01 cm_ctl]$ less cm_ctl-2024-07-13_191612-current.log

[Screenshot: error in the cm_ctl log]

3.4 Moving the pssh file aside on all three nodes

[omm@DN01 cm_ctl]$ sudo mv /usr/bin/pssh /usr/bin/pssh.bak
[omm@DN02 cm_ctl]$ sudo mv /usr/bin/pssh /usr/bin/pssh.bak
[omm@DN03 cm_ctl]$ sudo mv /usr/bin/pssh /usr/bin/pssh.bak
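
The source does not say why moving /usr/bin/pssh aside helps. One plausible reading, offered here as an assumption, is that the profile appends `$GPHOME/script/gspylib/pssh/bin` to the end of `PATH` (see the listing in 3.10), so a system-wide /usr/bin/pssh is found first and shadows the bundled one. The sketch below uses a hypothetical helper, `first_in_path`, to show which copy of a command wins for a given `PATH` value.

```shell
# Print the first executable file named $1 found along the colon-separated
# search path $2, scanning left to right the way the shell resolves commands.
# Hypothetical helper for illustration only.
first_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Running `first_in_path pssh "$PATH"` before and after the `mv` on each node shows whether the resolved copy has changed. The shell built-in `type -a pssh` lists every match in order.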

3.5 Re-running the command from the error message

[omm@DN01 cm_ctl]$ cm_ctl set --maintenance=on  -n 2
cm_ctl: Starting to enable the maintenance mode.
cm_ctl: Close maintenance mode on cm instances.
cm_ctl: Close maintenance mode on cm instances successfully.
cm_ctl: Generate and distribute the maintenance white-list file.
cm_ctl: Generate and distribute the maintenance white-list file successfully.
cm_ctl: Set maintenance mode on related cm instances.
cm_ctl: Set maintenance mode on related cm instances successfully.
cm_ctl: Reload configuration on related cm instances.
cm_ctl: Reload configuration on related cm instances successfully.
cm_ctl: Query the maintenance mode from the primary cm server.
cm_ctl: Enable the maintenance mode successfully.

The following nodes enter the maintenance mode:
node_2

3.6 Re-running gs_replace

[omm@DN01 cm_ctl]$ gs_replace -t config -h DN02
Checking all the cm_agent instances.
There are [0] cm_agents need to be repaired in cluster.
Fixing all the CMAgents instances.
Checking and restoring the secondary standby instance.
The secondary standby instance does not need to be restored.
Configuring
Waiting for promote peer instances.
.
Successfully upgraded standby instances.
Configuring replacement instances.
Successfully configured replacement instances.
Deleting abnormal CN from pgxc_node on the normal CN.
No abnormal CN needs to be deleted.
Unlocking cluster.
Successfully unlocked cluster.
Locking cluster.
Successfully locked cluster.
Incremental building CN from the Normal CN.
Successfully incremental built CN from the Normal CN.
Creating fixed CN on the normal CN.
Successfully created fixed CN on the normal CN.
Starting the fixed cns.
Successfully started the fixed cns.
Creating fixed CN on the fixed CN.
Successfully created fixed CN on the fixed CN.
Unlocking cluster.
Successfully unlocked cluster.
Creating unfixed CN on the fixed and normal CN.
No CN needs to be created.
Configuration succeeded.

3.7 Starting the CN with gs_replace

[omm@DN01 cm_ctl]$ gs_replace -t start -h DN02
Starting.
======================================================================
.
Successfully started instance process. Waiting to become Normal.
======================================================================

======================================================================
Start succeeded.

3.8 Rebalancing the cluster

[omm@DN01 cm_ctl]$ gs_om -t switch --reset
Operating: Switch reset.
cm_ctl: cmserver is rebalancing the cluster automatically.
.......
cm_ctl: switchover successfully.
Operation succeeded: Switch reset.

3.9 Cluster status

The cluster has been repaired:

[omm@DN01 cm_ctl]$ gs_om -t status --detail
[  CMServer State   ]

node    node_ip         instance                                    state
---------------------------------------------------------------------------
1  DN01 10.254.21.75    1    /opt/huawei/Bigdata/mppdb/cm/cm_server Primary
3  DN03 10.254.21.77    2    /opt/huawei/Bigdata/mppdb/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes

[ Coordinator State ]

node    node_ip         instance                                   state
--------------------------------------------------------------------------
1  DN01 10.254.21.75    5001 /srv/BigData/mppdb/data1/coordinator Normal
2  DN02 10.254.21.76    5002 /srv/BigData/mppdb/data1/coordinator Normal
3  DN03 10.254.21.77    5003 /srv/BigData/mppdb/data1/coordinator Normal

[ Central Coordinator State ]

node    node_ip         instance                                  state
-------------------------------------------------------------------------
3  DN03 10.254.21.77    5003 /srv/BigData/mppdb/data1/coordinator Normal

[     GTM State     ]

node    node_ip         instance                           state                    sync_state
---------------------------------------------------------------
3  DN03 10.254.21.77    1001 /opt/huawei/Bigdata/mppdb/gtm P Primary Connection ok  Sync
1  DN01 10.254.21.75    1002 /opt/huawei/Bigdata/mppdb/gtm S Standby Connection ok  Sync

[  Datanode State   ]

node    node_ip         instance                                  state            | node    node_ip         instance                                  state            | node    node_ip         instance                                  state
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  DN01 10.254.21.75    6001 /srv/BigData/mppdb/data1/master1     P Primary Normal | 2  DN02 10.254.21.76    6002 /srv/BigData/mppdb/data1/slave1      S Standby Normal | 3  DN03 10.254.21.77    3002 /srv/BigData/mppdb/data1/dummyslave1 R Secondary Normal
1  DN01 10.254.21.75    6003 /srv/BigData/mppdb/data2/master2     P Primary Normal | 3  DN03 10.254.21.77    6004 /srv/BigData/mppdb/data1/slave2      S Standby Normal | 2  DN02 10.254.21.76    3003 /srv/BigData/mppdb/data1/dummyslave2 R Secondary Normal
2  DN02 10.254.21.76    6005 /srv/BigData/mppdb/data1/master1     P Primary Normal | 3  DN03 10.254.21.77    6006 /srv/BigData/mppdb/data2/slave1      S Standby Normal | 1  DN01 10.254.21.75    3004 /srv/BigData/mppdb/data1/dummyslave1 R Secondary Normal
2  DN02 10.254.21.76    6007 /srv/BigData/mppdb/data2/master2     P Primary Normal | 1  DN01 10.254.21.75    6008 /srv/BigData/mppdb/data1/slave2      S Standby Normal | 3  DN03 10.254.21.77    3005 /srv/BigData/mppdb/data2/dummyslave2 R Secondary Normal
3  DN03 10.254.21.77    6009 /srv/BigData/mppdb/data1/master1     P Primary Normal | 1  DN01 10.254.21.75    6010 /srv/BigData/mppdb/data2/slave1      S Standby Normal | 2  DN02 10.254.21.76    3006 /srv/BigData/mppdb/data2/dummyslave1 R Secondary Normal
3  DN03 10.254.21.77    6011 /srv/BigData/mppdb/data2/master2     P Primary Normal | 2  DN02 10.254.21.76    6012 /srv/BigData/mppdb/data2/slave2      S Standby Normal | 1  DN01 10.254.21.75    3007 /srv/BigData/mppdb/data2/dummyslave2 R Secondary Normal
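
For repeated checks, the two summary fields in the `[ Cluster State ]` block can be tested by a script instead of read by eye. This is a minimal sketch: `cluster_is_healthy` is a hypothetical helper, and it assumes the `key : value` layout shown in the output above.

```shell
# Read `gs_om -t status --detail` output on stdin and succeed only when
# cluster_state is Normal and balanced is Yes.
# Hypothetical helper; relies on the "key : value" layout shown above.
cluster_is_healthy() {
    awk -F':' '
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)      # strip padding around the key
            gsub(/[ \t\r]/, "", val)    # strip padding around the value
        }
        key == "cluster_state" { state = val }
        key == "balanced"      { balanced = val }
        END { exit !(state == "Normal" && balanced == "Yes") }
    '
}
```

A possible call is `gs_om -t status --detail | cluster_is_healthy && echo "cluster healthy"`. The per-instance tables are ignored, so this complements the full listing and does not replace it.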

3.10 Database environment variables in the working state

[root@DN01 ~]# tail -5f /etc/profile
fi
#TMOUT=600
export TMOUT=0
#LD_LIBRARY_PATH=/usr/local/lib/
export LD_LIBRARY_PATH=/lib64:$LD_LIBRARY_PATH
[omm@DN01 ~]$ cat .bash_profile
# Source /root/.bashrc if user has one
[ -f ~/.bashrc ] && . ~/.bashrc
source /home/omm/.profile

LD_LIBRARY_PATH=/usr/local/lib/
export LD_LIBRARY_PATH=/lib64:$LD_LIBRARY_PATH
[omm@DN01 ~]$ cat /opt/huawei/Bigdata/mppdb/.mppdbgs_profile
#LD_LIBRARY_PATH=/usr/local/lib
export MPPDB_ENV_SEPARATE_PATH=/opt/huawei/Bigdata/mppdb/.mppdbgs_profile
export LDAPCONF=/opt/huawei/Bigdata/mppdb/ldap.conf
export GPHOME=/opt/huawei/Bigdata/mppdb/wisequery
export PATH=$PATH:$GPHOME/script/gspylib/pssh/bin:$GPHOME/script
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GPHOME/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GPHOME/lib/libsimsearch
export PYTHONPATH=$GPHOME/lib
export GAUSS_WARNING_TYPE=1
export GAUSSHOME=/opt/huawei/Bigdata/mppdb/core
export PATH=$GAUSSHOME/bin:$PATH
export S3_CLIENT_CRT_FILE=$GAUSSHOME/lib/client.crt
export GAUSS_VERSION=8.2.1
export PGHOST=/opt/huawei/Bigdata/mppdb/mppdb_tmp
export GS_CLUSTER_NAME=FI-MPPDB
export GAUSSLOG=/var/log/Bigdata/mpp/omm
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GAUSSHOME/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GAUSSHOME/lib/libsimsearch
export ETCD_UNSUPPORTED_ARCH=386
if [ -f '/opt/huawei/Bigdata/mppdb/core/utilslib/env_ec' ] && [ `id -u` -ne 0 ]; then source '/opt/huawei/Bigdata/mppdb/core/utilslib/env_ec'; fi
export GAUSS_ENV=2
export LD_LIBRARY_PATH=/lib64:$LD_LIBRARY_PATH
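
The workaround depends on /lib64 being searched before the GaussDB library directories. After sourcing the profile, that order can be checked with a small helper. This is a sketch only: `searched_first` is a hypothetical name, and it does nothing more than compare positions in a colon-separated list.

```shell
# Succeed when directory $1 is present in the colon-separated list $3 and
# no entry equal to $2 appears ahead of the first $1.
# Hypothetical helper; pure string handling, nothing is loaded or executed.
searched_first() {
    list=":$3:"
    case $list in
        *":$1:"*) ;;
        *) return 1 ;;           # $1 is not in the list at all
    esac
    before=${list%%":$1:"*}:     # everything ahead of the first $1
    case $before in
        *":$2:"*) return 1 ;;    # $2 would be searched earlier
        *) return 0 ;;
    esac
}
```

For example, `searched_first /lib64 "$GAUSSHOME/lib" "$LD_LIBRARY_PATH" && echo "order ok"` can be run as omm on each node after `source /opt/huawei/Bigdata/mppdb/.mppdbgs_profile`.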
