Learning from Oracle HAIP Startup Failures

Posted by 綠茶有點甜 on 2021-09-25

Original URL: https://www.cnblogs.com/lvcha001/p/15327558.html


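1. Goal: Oracle HAIP startup errors

Motivation: I have now run into HAIP-related problems twice during routine operations, so I am recording them here.

For this incident I watched an expert, 李海清, work through the fix and wrote it up once it was done. The previous HAIP problem took four hours to sort out.

Oracle HAIP provides high availability for Oracle's private network. When there are multiple private NICs there are multiple HAIPs, so one private NIC going down does not affect communication across the RAC cluster.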

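2. Troubleshooting approach for HAIP startup failures

There are many possible causes, so how do you narrow the problem down?

References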

https://blog.csdn.net/m0_38048955/article/details/115345414
https://www.cnblogs.com/jyzhao/p/7686243.html

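1) Faulty heartbeat (private interconnect) NIC

2) Multicast not working properly

3) Firewall or similar filtering

4) Oracle bug

Note whether the problem first appeared during initial installation or during routine maintenance; that makes it easier to identify what changed.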
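As a starting point for the checklist above, the sketch below collects a few read-only checks. `crsctl stat res ora.cluster_interconnect.haip -init` and `oifcfg getif` are standard Grid Infrastructure commands; the helper `list_haip_addrs` is a hypothetical name introduced here, and it assumes a Linux node where `ip -4 -o addr show` is available. Treat it as a triage aid under those assumptions, not a complete diagnosis.

```shell
#!/bin/sh
# Hypothetical helper: print "<interface> <address>" for every link-local
# (169.254.x.x) IPv4 address found in `ip -4 -o addr show` output on stdin.
list_haip_addrs() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "inet" && $(i + 1) ~ /^169\.254\./)
                print $2, $(i + 1)
    }'
}

# Typical use on a cluster node, with the Grid Infrastructure bin directory in PATH:
#   crsctl stat res ora.cluster_interconnect.haip -init   # state of the HAIP resource
#   oifcfg getif                                          # NICs registered as cluster_interconnect
#   ip -4 -o addr show | list_haip_addrs                  # NICs currently carrying a 169.254 address
```

If the HAIP resource is OFFLINE and no interface carries a 169.254 address, the NIC and bug causes are the first to look at; if the addresses exist on both nodes but cannot reach each other, network policy or multicast is more likely.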

3. HAIP cases

3.1 Network policy prevents HAIP communication

I blogged about this one earlier:

https://www.cnblogs.com/lvcha001/p/12155042.html

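Summary:
1. While installing 11g R2 RAC, GI installed cleanly on node 1 and HAIP started.
2. While installing GI on node 2, HAIP failed to start.
3. That pattern points to the HAIPs on the two nodes being unable to reach each other.
4. A ping from node 1 to node 2 confirmed that the HAIP network was unreachable.
5. The customer's network engineer confirmed that HAIP traffic could not be opened up.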


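Source link: https://blog.csdn.net/evils798/article/details/27248263

Action taken: configure a gateway for the private network, restart the network service, and reinstall. (It may be worth trying the approach in the link above, which sets a route for the NIC with the specified IP.)

After reconfiguring the private network gateway address, I reinstalled RAC and ran root.sh again, and it failed again. This time the 169.254 Oracle HAIP address was placed on eth1, the private NIC, but node 1 still could not ping node 2's HAIP address.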
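The reachability test described above can be scripted so that it is run the same way on each node. This is a minimal sketch, assuming Linux iputils `ping` (options `-c`, `-W`, `-I`); `check_peer_haip` is a hypothetical helper, and the address in the usage comment is a placeholder for the other node's real HAIP address.

```shell
#!/bin/sh
# Hypothetical helper: ping a peer's HAIP address through a given private NIC.
# Returns 0 if reachable, 1 if unreachable, 2 if the address is not link-local.
check_peer_haip() {
    peer_ip=$1
    iface=$2
    case "$peer_ip" in
        169.254.*) ;;
        *) echo "not a link-local HAIP address: $peer_ip" >&2; return 2 ;;
    esac
    # 3 probes, 2-second timeout each, forced out through the private interface
    if ping -c 3 -W 2 -I "$iface" "$peer_ip" >/dev/null 2>&1; then
        echo "reachable: $peer_ip via $iface"
    else
        echo "unreachable: $peer_ip via $iface"
        return 1
    fi
}

# Usage (placeholder address; substitute node 2's HAIP as shown by `ip addr`):
#   check_peer_haip 169.254.x.x eth1
```

Binding the probe to the private interface avoids a false result when the link-local route happens to point at a different NIC.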

ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481 (Doc ID 1383737.1)

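In addition, starting ASM manually as the grid user:

grid$ sqlplus / as sysasm

SQL> startup

fails with the error quoted above, so this is still the problem of the private-network HAIP being unreachable.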

Case5. HAIP is up on all nodes and route info is presented but HAIP is not pingable
Symptom:
HAIP is presented on both nodes and route information is also presented, but both nodes can not ping or traceroute against the other node HAIP.

······

Solution: 

For Openstack Cloud implementation, engage system admin to create another neutron port to map link-local traffic. For other environment, 
engage SysAdmin/NetworkAdmin to review routing/network setup.

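The documented solution is to have the network engineers adjust the network, but with a cloud provider it is hard to get connectivity between HAIP addresses opened up.

In this case we chose to disable the HAIP service so that the installation could complete in the cloud environment.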

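3.2 AIX: HAIP fails to start after replacing the private NIC because the bpf device was not refreshed

Symptom: a RAC 11.2.0.4 cluster was running normally. After a new private NIC was added and the original private NIC was removed, the cluster (CRS) was restarted and HAIP failed to start on node 1. Node 2's clusterware was shut down at the time, so inter-node network communication was not a factor.
Starting the HAIP resource manually: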

crsctl start res ora.cluster_interconnect.haip -init

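The start failed. The log matched the MOS known issue quoted next, which recommends running a command as root to refresh the device. After the refresh, the problem was resolved.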
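When reading the agent log for the failed start, it can help to reduce the error lines to their key fields. The sketch below assumes lines shaped like the `OS error` entry quoted from the MOS note in this section; `summarize_haip_errors` is a hypothetical helper, and the log path in the usage comment is an assumption about a typical 11.2 layout that should be checked against the actual Grid home.

```shell
#!/bin/sh
# Hypothetical helper: extract operation, location, OS error number and detail
# from "OS error" lines in a log file passed as the first argument.
summarize_haip_errors() {
    grep 'OS error:' "$1" |
    sed -n 's/.*operation: \([^,]*\), loc: \([^,]*\),.*OS error: \([0-9]*\), other: \(.*\)$/op=\1 loc=\2 oserr=\3 detail=\4/p'
}

# Usage (path is an assumption for 11.2; adjust to your environment):
#   summarize_haip_errors "$GRID_HOME/log/$(hostname)/agent/ohasd/orarootagent_root/orarootagent_root.log"
```

A `loc` beginning with `bpfopen` together with a `/dev/bpf` device in the detail field is the pattern that matches the known issue in this case.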


Known Issues: Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1640865.1)
2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} (null) category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0, ifr en2  
Solution/Workaround:
It's known on AIX and Solaris that command executed via sudo etc may not have full root environment, which could cause HAIP startup failure.
The solution is to obtain and apply patch 16445624 on AIX.
The workaround is to execute root script (root.sh or rootupgrade.sh) as real root user directly.
If root script already failed, try one or all of the following:
 - reboot the node
 - execute "/usr/sbin/tcpdump -D" as root user, if the timestamp of the bpf device didn't get updated, delete the device and re-run the same "tcpdump -D" command
Before re-running root script, verify whether the following exists and the timestamp is updated
ls -ltr /dev/bpf*
cr--------   1 root     system       42,  0 Oct 03 10:32 /dev/bpf0
..
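The workaround steps quoted above can be wrapped in a small check, to be run as the real root user. This is a sketch only: `refresh_bpf` is a hypothetical helper, the `BPF_DIR` and `TCPDUMP` variables are introduced here so the paths can be overridden, and comparing `ls -l` output only detects changes at the granularity that `ls` displays. It reports whether the device listing changed and leaves any deletion of the device to the administrator.

```shell
#!/bin/sh
# Assumed defaults; override if the devices or tcpdump live elsewhere.
BPF_DIR=${BPF_DIR:-/dev}
TCPDUMP=${TCPDUMP:-/usr/sbin/tcpdump}

# Hypothetical helper: run "tcpdump -D" and report whether the bpf device
# listing (including its timestamps) changed as a result.
refresh_bpf() {
    before=$(ls -l "$BPF_DIR"/bpf* 2>/dev/null)
    $TCPDUMP -D >/dev/null 2>&1
    after=$(ls -l "$BPF_DIR"/bpf* 2>/dev/null)
    if [ "$before" = "$after" ]; then
        # Per the MOS text: delete the stale device and re-run "tcpdump -D"
        echo "unchanged"
        return 1
    fi
    echo "refreshed"
}

# Usage, as root:
#   refresh_bpf && crsctl start res ora.cluster_interconnect.haip -init
```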

Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1210883.1)
https://blog.csdn.net/m0_38048955/article/details/115345414
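For a faulty heartbeat NIC: if there is only one heartbeat NIC, pinging the other nodes' private IPs is enough to verify it, so this cause is easy to rule out.
For multicast problems, use the mcasttest.pl script provided by Oracle (see Grid Infrastructure Startup During Patching, Install or Upgrade May Fail Due to Multicasting Requirement (ID 1212703.1)).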
