Multipath misconfiguration and enabled ACFS prevent RAC node 2 from starting normally

Posted by lmxx2020 on 2024-01-15

When node 2 was started, crsd never came up. Running crsctl stat res -t -init showed crsd in the OFFLINE state:

ora.asm
      1        ONLINE  ONLINE       rac2                     Started,STABLE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE      rac2                     STABLE
ora.crf
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       rac2                     STABLE
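Resources that should be running but are not can be picked out of this output with a small filter. The following is only a sketch: it assumes the tabular layout shown above (a resource name line followed by an instance line with target and state in the second and third columns), and the function name offline_init_resources is made up for illustration.

```shell
# Print init resources whose target is ONLINE but whose state is OFFLINE.
# Reads "crsctl stat res -t -init" style output on stdin.
# Assumes the layout shown above: name line, then "<n> TARGET STATE ...".
offline_init_resources() {
    awk '
        /^ora\./ { name = $1; next }      # remember the resource name
        name != "" && $2 == "ONLINE" && $3 == "OFFLINE" { print name }
    '
}

# Usage (as root or the Grid Infrastructure owner):
#   crsctl stat res -t -init | offline_init_resources
```

Fed the output above, this would list ora.cluster_interconnect.haip and ora.crsd.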

The cluster alert log shows that once the ACFS drivers finish loading, there are no further messages about CRSD starting:

[ohasd(44185)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
[client(45793)]CRS-10001:31-Jul-23 15:10 ACFS-9391: Checking for existing ADVM/ACFS installation.
[client(45798)]CRS-10001:31-Jul-23 15:10 ACFS-9392: Validating ADVM/ACFS installation files for operating system.
[client(45800)]CRS-10001:31-Jul-23 15:10 ACFS-9393: Verifying ASM Administrator setup.
[client(45803)]CRS-10001:31-Jul-23 15:10 ACFS-9308: Loading installed ADVM/ACFS drivers.
[client(45806)]CRS-10001:31-Jul-23 15:10 ACFS-9154: Loading 'oracleoks.ko' driver.
[client(45839)]CRS-10001:31-Jul-23 15:10 ACFS-9154: Loading 'oracleadvm.ko' driver.
[client(45877)]CRS-10001:31-Jul-23 15:10 ACFS-9154: Loading 'oracleacfs.ko' driver.
[client(45986)]CRS-10001:31-Jul-23 15:10 ACFS-9327: Verifying ADVM/ACFS devices.
[client(45994)]CRS-10001:31-Jul-23 15:10 ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
[client(45998)]CRS-10001:31-Jul-23 15:10 ACFS-9156: Detecting control device '/dev/ofsctl'.
[client(46003)]CRS-10001:31-Jul-23 15:10 ACFS-9322: completed

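Judging by this log, cluster startup on node 2 hangs when starting the crsd process. The OS log shows that right after the ACFS drivers load, the multipathd process tries to handle the ASM paths: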

Jul 29 12:40:35 rac2 kernel: OKSK-00004: Module load succeeded. Build information: (LOW DEBUG) USM_11.2.0.4.0ACFSPSU_LINUX.X64_160211 2016/02/11 10:45:33
Jul 29 12:40:36 rac2 kernel: ADVMK-00001: Module load succeeded. Build information: (LOW DEBUG) - USM_11.2.0.4.0ACFSPSU_LINUX.X64_160211 built on 2016/02/11 11:04:07.
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_spec: add path (uevent)
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_spec: failed to get path uid
Jul 29 12:40:36 rac2 multipathd: uevent trigger error
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vmb: add path (uevent)
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vmb: failed to get path uid
Jul 29 12:40:36 rac2 multipathd: uevent trigger error
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vdbg: add path (uevent)
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vdbg: failed to get path uid
Jul 29 12:40:36 rac2 multipathd: uevent trigger error
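To check another node for the same symptom, the syslog can be filtered for multipathd messages that mention the ACFS/ADVM control devices. This is a minimal sketch: the function name asm_multipath_messages is hypothetical, and /var/log/messages is assumed as the default syslog location.

```shell
# Print multipathd messages that mention ACFS/ADVM control devices
# such as asm!.asm_ctl_spec.
# $1: syslog file (defaults to /var/log/messages, an assumed location).
asm_multipath_messages() {
    # The optional "[pid]" after multipathd covers syslog formats that log it.
    grep -E 'multipathd(\[[0-9]+\])?: asm!\.asm_ctl_' "${1:-/var/log/messages}"
}

# Usage:
#   asm_multipath_messages /var/log/messages
```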

This multipathd error is covered in a MOS note:

Error 'Multipathd: Asm!.Asm_ctl_spec: Failed To Store Path Info' found In /var/log/messages (Doc ID 1268895.1)    

It can be resolved by changing the multipath configuration. According to the note, the following should be added to multipath.conf:

blacklist {
    devnode "^asm/*"
    devnode "ofsctl"
}
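Whether an existing configuration already carries these two entries can be checked with a plain text search. The sketch below assumes the file lives at /etc/multipath.conf, and the function name has_acfs_blacklist is made up for illustration. It only looks for the two devnode lines and does not verify that they sit inside a blacklist section.

```shell
# Succeed only if both devnode entries from the MOS note are present.
# Text-only check: commented-out lines are ignored, but the enclosing
# blacklist { } section is not validated.
# $1: config file (defaults to /etc/multipath.conf, an assumed location).
has_acfs_blacklist() {
    conf="${1:-/etc/multipath.conf}"
    grep -Eq '^[[:space:]]*devnode[[:space:]]+"\^asm/\*"' "$conf" &&
    grep -Eq '^[[:space:]]*devnode[[:space:]]+"ofsctl"' "$conf"
}

# Usage:
#   if has_acfs_blacklist; then echo "entries present"; else echo "entries missing"; fi
```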

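Checking the multipath configuration confirmed that these entries were indeed missing. Which ASM paths exactly does this refer to?

It refers to /dev/ofsctl and /dev/asm/*, that is, the devices listed below, which need to be excluded.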

[root@rac01 ~]# ls -la /dev/asm*
total 0
drwxrwx---  2 root asmadmin     280 Aug 23 14:38 .
drwxr-xr-x 20 root root        7280 Aug 23 14:38 ..
brwxrwx---  1 root asmadmin 252,  0 Aug 23 14:38 .asm_ctl_spec
brwxrwx---  1 root asmadmin 252, 10 Aug 23 14:38 .asm_ctl_vbg0
brwxrwx---  1 root asmadmin 252, 11 Aug 23 14:38 .asm_ctl_vbg1
brwxrwx---  1 root asmadmin 252, 12 Aug 23 14:38 .asm_ctl_vbg2
brwxrwx---  1 root asmadmin 252, 13 Aug 23 14:38 .asm_ctl_vbg3
brwxrwx---  1 root asmadmin 252, 14 Aug 23 14:38 .asm_ctl_vbg4
brwxrwx---  1 root asmadmin 252, 15 Aug 23 14:38 .asm_ctl_vbg5
brwxrwx---  1 root asmadmin 252, 16 Aug 23 14:38 .asm_ctl_vbg6
brwxrwx---  1 root asmadmin 252, 17 Aug 23 14:38 .asm_ctl_vbg7
brwxrwx---  1 root asmadmin 252, 18 Aug 23 14:38 .asm_ctl_vbg8
brwxrwx---  1 root asmadmin 252,  1 Aug 23 14:38 .asm_ctl_vdbg
brwxrwx---  1 root asmadmin 252,  2 Aug 23 14:38 .asm_ctl_vmb

Also check whether multipath has /dev/ofsctl open:

lsof /dev/ofsctl

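Because the other node in this production environment could not be stopped, the multipath configuration could not be changed for the time being.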

So, based on what the cluster alert log showed, I tried disabling ACFS temporarily, following Doc ID 1417294.1:

"crsctl stop crs" Fails to Stop ora.drivers.acfs With CRS-2675 (Doc ID 1417294.1)

Go to $GRID_HOME/bin and disable ACFS with this command:

./acfsroot disable

After ACFS was disabled, the cluster and database on node 2 came up normally.


Source: ITPUB blog, link: https://blog.itpub.net/22967847/viewspace-3003842/. Please credit the source when reposting; otherwise legal action may be pursued.
