Oracle 11g RAC: A Firewall Between the Two Nodes Allows Only One Node to Run

Posted by 騎驢射大飛機 on 2021-11-18


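#Background:

Once the distributed firewall built into the Sangfor (深信服) cloud platform is enabled, a firewall sits between every pair of hosts. At first we did not know that the distributed firewall had been turned on.


#Symptoms:

When the faulty node starts the cluster stack, an error occurs and the cluster cannot finish starting. The cluster alert log shows: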


2021-10-28 07:30:56.944: 

[ohasd(2313)]CRS-2112:The OLR service started on node cmsorcl2.

2021-10-28 07:30:56.955: 

[ohasd(2313)]CRS-1301:Oracle High Availability Service started on node cmsorcl2.

2021-10-28 07:30:56.958: 

[ohasd(2313)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred

2021-10-28 07:30:57.283: 

[/u01/app/11.2.0/grid/bin/oraagent.bin(7198)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0/grid/log/cmsorcl2/agent/ohasd/oraagent_grid/oraagent_grid.log"

2021-10-28 07:31:00.326: 

[/u01/app/11.2.0/grid/bin/orarootagent.bin(7202)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running). 

2021-10-28 07:31:02.503: 

[ohasd(2313)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running). 

2021-10-28 07:31:02.504: 

[gpnpd(8934)]CRS-2328:GPNPD started on node cmsorcl2. 

2021-10-28 07:31:04.840: 

[cssd(10891)]CRS-1713:CSSD daemon is started in clustered mode

2021-10-28 07:31:06.672: 

[ohasd(2313)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE

2021-10-28 07:31:06.673: 

[ohasd(2313)]CRS-2769:Unable to failover resource 'ora.diskmon'.

2021-10-28 07:31:30.624: 

[cssd(10891)]CRS-1707:Lease acquisition for node cmsorcl2 number 2 completed

2021-10-28 07:31:32.008: 

[cssd(10891)]CRS-1605:CSSD voting file is online: /dev/asm-diskc; details in /u01/app/11.2.0/grid/log/cmsorcl2/cssd/ocssd.log.

2021-10-28 07:31:32.048: 

[cssd(10891)]CRS-1605:CSSD voting file is online: /dev/asm-diskb; details in /u01/app/11.2.0/grid/log/cmsorcl2/cssd/ocssd.log.

2021-10-28 07:31:32.089: 

[cssd(10891)]CRS-1605:CSSD voting file is online: /dev/asm-diska; details in /u01/app/11.2.0/grid/log/cmsorcl2/cssd/ocssd.log.

2021-10-28 15:38:34.926: 

[/u01/app/11.2.0/grid/bin/cssdagent(10858)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/cmsorcl2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.

2021-10-28 15:38:34.926: 

[cssd(10891)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid/log/cmsorcl2/cssd/ocssd.log

2021-10-28 15:38:34.926: 

[cssd(10891)]CRS-1603:CSSD on node cmsorcl2 shutdown by user.

2021-10-28 15:38:40.254: 

[ohasd(2313)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'cmsorcl2'.

2021-10-28 15:38:41.668: 

[cssd(6922)]CRS-1713:CSSD daemon is started in clustered mode

2021-10-28 15:38:43.435: 

[ohasd(2313)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE

2021-10-28 15:38:43.435: 

[ohasd(2313)]CRS-2769:Unable to failover resource 'ora.diskmon'.

2021-10-28 15:38:57.400: 

[cssd(6922)]CRS-1707:Lease acquisition for node cmsorcl2 number 2 completed
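
A log in this layout (a timestamp line followed by the message line) is easier to scan with a small script. The following is only a minimal Python sketch and was not part of the original troubleshooting. The names `parse_alert` and `css_failures` are hypothetical, and treating CRS-5818 and CRS-1656 as the codes of interest is an assumption based on the excerpt above.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above: a timestamp line, then a message line.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):\s*$")
MSG_RE = re.compile(r"^\[(?P<proc>[^\]]+)\](?P<code>CRS-\d+):(?P<text>.*)$")

# Assumption: the two codes that mark the CSS failure in the excerpt above.
CSS_FAILURE_CODES = {"CRS-5818", "CRS-1656"}


def parse_alert(lines):
    """Pair each timestamp line with the next CRS message line."""
    events = []
    ts = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = TS_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            continue
        m = MSG_RE.match(line)
        if m and ts is not None:
            events.append((ts, m.group("proc"), m.group("code"),
                           m.group("text").strip()))
            ts = None
    return events


def css_failures(events):
    """Keep only the events whose code marks a CSS start failure."""
    return [e for e in events if e[2] in CSS_FAILURE_CODES]


# Three entries copied from the alert log above.
SAMPLE = """\
2021-10-28 07:31:04.840: 

[cssd(10891)]CRS-1713:CSSD daemon is started in clustered mode

2021-10-28 15:38:34.926: 

[/u01/app/11.2.0/grid/bin/cssdagent(10858)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/cmsorcl2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.

2021-10-28 15:38:34.926: 

[cssd(10891)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid/log/cmsorcl2/cssd/ocssd.log
"""

events = parse_alert(SAMPLE.splitlines())
for ts, proc, code, text in css_failures(events):
    print(ts.isoformat(sep=" ", timespec="milliseconds"), proc, code, text)
```

Filtering on these codes only narrows down where to look. As the log entries themselves say, the details are in ocssd.log.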



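#Solution:

CSS failures are usually caused by communication problems between the nodes. ocssd.log holds the details, including messages about failed access to private-network ports.

In the firewall, remove all restrictions between the database's public, private, VIP, and SCAN IPs.

If HAIP is not in use, the database should be back to normal at this point. If HAIP is in use (it is enabled by default and uses 169.254.x.x addresses), the restrictions between the HAIP addresses must be removed as well. Otherwise the ASM instance cannot start, and its alert log looks like this: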

Wed Nov 17 22:05:45 2021

MMON started with pid=20, OS id=6979 

Wed Nov 17 22:05:45 2021

MMNL started with pid=21, OS id=6981 

lmon registered with NM - instance number 2 (internal mem no 1)

Wed Nov 17 22:07:45 2021

PMON (ospid: 6911): terminating the instance due to error 481              <----- the symptom; see http://blog.itpub.net/29615408/viewspace-1384760/

Wed Nov 17 22:07:45 2021

ORA-1092 : opitsk aborting process

Wed Nov 17 22:07:45 2021

System state dump requested by (instance=2, osid=6911 (PMON)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_6949_20211117220745.trc

Dumping diagnostic data in directory=[cdmp_20211117220745], requested by (instance=2, osid=6911 (PMON)), summary=[abnormal instance termination].

Instance terminated by PMON, pid = 6911

Wed Nov 17 22:08:06 2021

NOTE: No asm libraries found in the system


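A firewall between the HAIP addresses of the two nodes does not produce an obvious symptom. The ASM alert log is shown above. Starting ASM manually to nomount fails with ORA-03113: end-of-file on communication channel, and neither the cluster alert log nor the trace files point explicitly at HAIP. The possible causes of "PMON (ospid: 6911): terminating the instance due to error 481" are: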

Case1: link local IP (169.254.x.x) is being used by other adapter/network

Case2: firewall exists between nodes on private network (iptables etc)

Case3: HAIP is up on some nodes but not on all

Case4: HAIP is up on all nodes but some do not have route info
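
Because neither log names HAIP directly, it can help to flag the error 481 line automatically and print the four candidate causes next to it. The following is a minimal Python sketch under that assumption; `find_error_481` is a hypothetical helper, and the sample lines are copied from the ASM alert log above.

```python
import re

# Matches the PMON termination message shown in the ASM alert log above.
ERROR_481_RE = re.compile(r"terminating the instance due to error 481\b")

# The four cases listed above, kept verbatim.
CANDIDATE_CAUSES = [
    "link local IP (169.254.x.x) is being used by other adapter/network",
    "firewall exists between nodes on private network (iptables etc)",
    "HAIP is up on some nodes but not on all",
    "HAIP is up on all nodes but some do not have route info",
]


def find_error_481(alert_lines):
    """Return (line_number, text) for every line reporting error 481."""
    return [
        (number, line.strip())
        for number, line in enumerate(alert_lines, start=1)
        if ERROR_481_RE.search(line)
    ]


# Sample lines copied from the ASM alert log above.
ASM_ALERT = """\
Wed Nov 17 22:07:45 2021
PMON (ospid: 6911): terminating the instance due to error 481
Wed Nov 17 22:07:45 2021
ORA-1092 : opitsk aborting process
"""

hits = find_error_481(ASM_ALERT.splitlines())
for number, text in hits:
    print(f"line {number}: {text}")
    for index, cause in enumerate(CANDIDATE_CAUSES, start=1):
        print(f"  Case{index}: {cause}")
```

The script only points at the symptom. Which of the four cases applies still has to be checked on the nodes themselves; in this incident it was Case2.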

Once the firewall policy between the HAIP addresses was finally removed, the database recovered.
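
To make the fix concrete, here is a rough Python sketch that lists the address pairs between which the firewall must not interfere: public, VIP, and SCAN addresses, the private interconnect, and HAIP. Every address and the node name `cmsorcl1` are hypothetical placeholders (only `cmsorcl2` appears in the logs above), and `allow_pairs` and `check_haip` are illustrative helpers, not part of any Oracle or Sangfor tool.

```python
import ipaddress
from itertools import combinations

# Hypothetical addresses for illustration only; substitute the real ones.
NODES = {
    "cmsorcl1": {"pub": "192.0.2.11", "vip": "192.0.2.21",
                 "priv": "10.0.0.1", "haip": "169.254.10.1"},
    "cmsorcl2": {"pub": "192.0.2.12", "vip": "192.0.2.22",
                 "priv": "10.0.0.2", "haip": "169.254.10.2"},
}
SCAN_IPS = ["192.0.2.30"]  # hypothetical SCAN address


def check_haip(nodes):
    """HAIP addresses are link-local (169.254.0.0/16); reject anything else."""
    for node, addrs in nodes.items():
        if not ipaddress.ip_address(addrs["haip"]).is_link_local:
            raise ValueError(f"{node}: {addrs['haip']} is not link-local")


def allow_pairs(nodes, scan_ips):
    """List (role, ip_a, ip_b) pairs that must be reachable in both directions."""
    groups = {"public": [], "private": [], "haip": []}
    for node, addrs in nodes.items():
        groups["public"] += [(node, addrs["pub"]), (node, addrs["vip"])]
        groups["private"].append((node, addrs["priv"]))
        groups["haip"].append((node, addrs["haip"]))
    # A SCAN IP can run on either node, so treat it as a separate owner.
    groups["public"] += [("scan", ip) for ip in scan_ips]

    pairs = []
    for role, members in groups.items():
        for (owner_a, ip_a), (owner_b, ip_b) in combinations(members, 2):
            if owner_a != owner_b:  # skip pairs that live on the same node
                pairs.append((role, ip_a, ip_b))
    return pairs


check_haip(NODES)
pairs = allow_pairs(NODES, SCAN_IPS)
for role, ip_a, ip_b in pairs:
    print(f"{role:8s} {ip_a} <-> {ip_b}")
```

The HAIP pair is the easy one to miss, because 169.254.x.x addresses are assigned by Grid Infrastructure and do not appear in the usual list of planned cluster IPs.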


From the ITPUB blog, link: http://blog.itpub.net/69900971/viewspace-2842920/. If you repost this article, please credit the source; otherwise legal action may be taken.
