On Oracle 11G RAC: a firewall between the two nodes lets only one node run
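#Background:
After the distributed firewall inside the Sangfor cloud platform is enabled, a firewall sits between every pair of hosts. At first we did not know the distributed firewall had been turned on.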
#Symptoms:
Starting the clusterware on the failed node runs into errors and the cluster never finishes starting. Alert log:
2021-10-28 07:30:56.944:
[ohasd(2313)]CRS-2112:The OLR service started on node cmsorcl2.
2021-10-28 07:30:56.955:
[ohasd(2313)]CRS-1301:Oracle High Availability Service started on node cmsorcl2.
2021-10-28 07:30:56.958:
[ohasd(2313)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2021-10-28 07:30:57.283:
[/u01/app/11.2.0/grid/bin/oraagent.bin(7198)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0/grid/log/cmsorcl2/agent/ohasd/oraagent_grid/oraagent_grid.log"
2021-10-28 07:31:00.326:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7202)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2021-10-28 07:31:02.503:
[ohasd(2313)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2021-10-28 07:31:02.504:
[gpnpd(8934)]CRS-2328:GPNPD started on node cmsorcl2.
2021-10-28 07:31:04.840:
[cssd(10891)]CRS-1713:CSSD daemon is started in clustered mode
2021-10-28 07:31:06.672:
[ohasd(2313)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2021-10-28 07:31:06.673:
[ohasd(2313)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2021-10-28 07:31:30.624:
[cssd(10891)]CRS-1707:Lease acquisition for node cmsorcl2 number 2 completed
2021-10-28 07:31:32.008:
[cssd(10891)]CRS-1605:CSSD voting file is online: /dev/asm-diskc; details in /u01/app/11.2.0/grid/log/cmsorcl2/cssd/ocssd.log.
2021-10-28 07:31:32.048:
[cssd(10891)]CRS-1605:CSSD voting file is online: /dev/asm-diskb; details in /u01/app/11.2.0/grid/log/cmsorcl2/cssd/ocssd.log.
2021-10-28 07:31:32.089:
[cssd(10891)]CRS-1605:CSSD voting file is online: /dev/asm-diska; details in /u01/app/11.2.0/grid/log/cmsorcl2/cssd/ocssd.log.
2021-10-28 15:38:34.926:
[/u01/app/11.2.0/grid/bin/cssdagent(10858)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/cmsorcl2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2021-10-28 15:38:34.926:
[cssd(10891)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid/log/cmsorcl2/cssd/ocssd.log
2021-10-28 15:38:34.926:
[cssd(10891)]CRS-1603:CSSD on node cmsorcl2 shutdown by user.
2021-10-28 15:38:40.254:
[ohasd(2313)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'cmsorcl2'.
2021-10-28 15:38:41.668:
[cssd(6922)]CRS-1713:CSSD daemon is started in clustered mode
2021-10-28 15:38:43.435:
[ohasd(2313)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2021-10-28 15:38:43.435:
[ohasd(2313)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2021-10-28 15:38:57.400:
[cssd(6922)]CRS-1707:Lease acquisition for node cmsorcl2 number 2 completed
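#Solution:
CSS failures are usually caused by broken communication between the nodes. ocssd.log has the details and shows errors accessing ports on the private network.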
In the firewall, remove every restriction between the database's public, private, VIP and SCAN IPs.
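In practice this means allowing all traffic among every public, private, VIP and SCAN address of the cluster. A minimal sketch for enumerating those address pairs, for example as input to an allow list, is shown below. The node names and IPs are hypothetical placeholders, not values from this environment, and the sketch knows nothing about the Sangfor firewall's own rule format.

```python
from itertools import permutations

# Hypothetical two-node inventory; replace with the real addresses.
NODES = {
    "node1": {"public": "192.0.2.11", "vip": "192.0.2.21", "private": "10.10.10.11"},
    "node2": {"public": "192.0.2.12", "vip": "192.0.2.22", "private": "10.10.10.12"},
}
SCAN_IPS = ["192.0.2.31", "192.0.2.32", "192.0.2.33"]


def cluster_addresses(nodes, scan_ips):
    """Collect every public, VIP, private and SCAN address of the cluster."""
    addrs = {ip for roles in nodes.values() for ip in roles.values()}
    addrs.update(scan_ips)
    return sorted(addrs)


def allow_pairs(nodes, scan_ips):
    """Every ordered (source, destination) pair that should be left unrestricted."""
    return list(permutations(cluster_addresses(nodes, scan_ips), 2))


pairs = allow_pairs(NODES, SCAN_IPS)
print(len(pairs))  # 72
```

Ordered pairs are used because many firewalls evaluate rules per direction; if yours uses bidirectional rules, half of the list is redundant.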
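If the database does not use HAIP, it should now be back to normal. If HAIP is enabled (the default; it uses 169.254.x.x addresses), the restrictions between the HAIP addresses must be removed as well; otherwise the ASM instance cannot start. The ASM alert log then shows: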
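Because HAIP addresses are assigned automatically from the link-local range 169.254.0.0/16, they are easy to leave out of a rule set built from the documented public, private, VIP and SCAN addresses. A minimal sketch for picking them out of each node's address list and producing the extra cross-node pairs follows; the input dictionary (which you might fill in by hand from `ip addr` on each node) and every address in it are hypothetical.

```python
import ipaddress
from itertools import permutations

# Hypothetical per-node address lists, e.g. copied by hand from `ip addr` output.
NODE_ADDRS = {
    "node1": ["192.0.2.11", "10.10.10.11", "169.254.18.200"],
    "node2": ["192.0.2.12", "10.10.10.12", "169.254.110.7"],
}


def haip_addresses(node_addrs):
    """Map node -> its link-local (169.254.0.0/16) addresses, where HAIP lives."""
    return {
        node: [a for a in addrs if ipaddress.ip_address(a).is_link_local]
        for node, addrs in node_addrs.items()
    }


def haip_pairs(node_addrs):
    """Ordered (src, dst) pairs of HAIP addresses that belong to different nodes."""
    owner = {a: node for node, addrs in haip_addresses(node_addrs).items() for a in addrs}
    return [(s, d) for s, d in permutations(sorted(owner), 2) if owner[s] != owner[d]]


print(haip_pairs(NODE_ADDRS))
```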
Wed Nov 17 22:05:45 2021
MMON started with pid=20, OS id=6979
Wed Nov 17 22:05:45 2021
MMNL started with pid=21, OS id=6981
lmon registered with NM - instance number 2 (internal mem no 1)
Wed Nov 17 22:07:45 2021
PMON (ospid: 6911): terminating the instance due to error 481 <----- the symptom; see http://blog.itpub.net/29615408/viewspace-1384760/
Wed Nov 17 22:07:45 2021
ORA-1092 : opitsk aborting process
Wed Nov 17 22:07:45 2021
System state dump requested by (instance=2, osid=6911 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_6949_20211117220745.trc
Dumping diagnostic data in directory=[cdmp_20211117220745], requested by (instance=2, osid=6911 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 6911
Wed Nov 17 22:08:06 2021
NOTE: No asm libraries found in the system
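A firewall between the two nodes' HAIP addresses gives no obvious hint. The ASM alert log is shown above; starting ASM manually to nomount fails with ORA-03113: end-of-file on communication channel, and neither the clusterware alert log nor the trace files point explicitly at HAIP. The error 481 in "PMON (ospid: 6911): terminating the instance due to error 481" has these possible causes: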
Case1: link local IP (169.254.x.x) is being used by other adapter/network
Case2: firewall exists between nodes on private network (iptables etc)
Case3: HAIP is up on some nodes but not on all
Case4: HAIP is up on all nodes but some do not have route info
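Cases 1, 3 and 4 can be checked from facts gathered on each node, while Case 2 needs an actual traffic test between the nodes. A minimal sketch of that triage follows, assuming each node's interfaces, addresses and routes have already been collected by hand; the `NodeInfo` structure, the function name, interface names and addresses are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class NodeInfo:
    """Hypothetical per-node facts, e.g. collected from `ip addr` and `ip route`."""
    name: str
    interconnect: set  # interfaces used for the private interconnect, e.g. {"eth1"}
    addrs: dict        # interface (aliases such as "eth1:1" allowed) -> IPv4 addresses
    routes: list       # destination networks in the routing table, e.g. "169.254.0.0/16"


def triage_error_481(nodes):
    """Return findings for Case1, Case3 and Case4; Case2 (firewall) needs a live test."""
    findings, haip_up = [], {}
    for n in nodes:
        routes = [ipaddress.ip_network(r) for r in n.routes]
        haip = []
        for iface, addrs in sorted(n.addrs.items()):
            base = iface.split(":")[0]  # treat alias eth1:1 as eth1
            for a in addrs:
                ip = ipaddress.ip_address(a)
                if not ip.is_link_local:
                    continue
                if base in n.interconnect:
                    haip.append(ip)
                else:  # Case1: link-local address on some other adapter
                    findings.append(f"Case1: {n.name} has {a} on non-interconnect {iface}")
        haip_up[n.name] = bool(haip)
        for ip in haip:  # Case4: HAIP is up but no route covers it
            if not any(ip in net for net in routes):
                findings.append(f"Case4: {n.name} has no route covering HAIP {ip}")
    # Case3: HAIP is up on some nodes but not on all
    if any(haip_up.values()) and not all(haip_up.values()):
        down = ", ".join(sorted(k for k, up in haip_up.items() if not up))
        findings.append(f"Case3: HAIP is not up on {down}")
    return findings


nodes = [
    NodeInfo("node1", {"eth1"}, {"eth1": ["10.10.10.11"], "eth1:1": ["169.254.18.200"]},
             ["10.10.10.0/24", "169.254.0.0/16"]),
    NodeInfo("node2", {"eth1"}, {"eth1": ["10.10.10.12"]}, ["10.10.10.0/24"]),
]
for finding in triage_error_481(nodes):
    print(finding)  # Case3: HAIP is not up on node2
```

An empty result only rules out Cases 1, 3 and 4; as in this article, the firewall (Case 2) can still be the cause.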
Removing the firewall policy between the HAIP addresses finally brought the database back!
From the ITPUB blog, link: http://blog.itpub.net/69900971/viewspace-2842920/. If you repost this, please credit the source; otherwise legal liability will be pursued.