gipchaLowerProcessNode: no valid interfaces found to node

Posted by raysuen on 2023-02-23
Oracle: 11.2.0.3
OS: AIX 6

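##Problem description
The customer reported that the cluster would not start on one node. Connecting remotely, I found that the clusterware was healthy on one node but failed to start on the other. The logs on the failing node showed the following: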


##Errors in crsd.log
2023-02-22 14:33:58.101: [GIPCHALO][2314] gipchaLowerSend: deffering startup of hdr 111870478 { len 232, seq 0, type gipchaHdrTypeSend (1), lastSeq 0, lastAck 0, minAck 0, flags 0x0, srcLuid 00000000-00000000, dstLuid 00000000-00000000, msgId 0 }, node 111899c50 { host 'udb01', haName '4cc9-a30f-f951-d124', srcLuid 5863b4e3-af05b441, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 4267810059, sentRegister 0, localMonitor 0, flags 0x0 }
2023-02-22 14:33:58.101: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 4267810060 ms, node 111899c50 { host 'udb01', haName '4cc9-a30f-f951-d124', srcLuid 5863b4e3-af05b441, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 4267810059, sentRegister 0, localMonitor 0, flags 0x4 }
2023-02-22 14:34:03.110: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 4267815068 ms, node 111899c50 { host 'udb01', haName '4cc9-a30f-f951-d124', srcLuid 5863b4e3-af05b441, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [5 : 5], createTime 4267810059, sentRegister 1, localMonitor 0, flags 0x4 }
2023-02-22 14:34:08.113: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 4267820072 ms, node 111899c50 { host 'udb01', haName '4cc9-a30f-f951-d124', srcLuid 5863b4e3-af05b441, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [10 : 10], createTime 4267810059, sentRegister 1, localMonitor 0, flags 0x4 }
2023-02-22 14:34:13.119: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 4267825077 ms, node 111899c50 { host 'udb01', haName '4cc9-a30f-f951-d124', srcLuid 5863b4e3-af05b441, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [15 : 15], createTime 4267810059, sentRegister 1, localMonitor 0, flags 0x4 }
2023-02-22 14:34:18.123: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 4267830081 ms, node 111899c50 { host 'udb01', haName '4cc9-a30f-f951-d124', srcLuid 5863b4e3-af05b441, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [20 : 20], createTime 4267810059, sentRegister 1, localMonitor 0, flags 0x4 }
2023-02-22 14:34:23.130: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 4267835088 ms, node 111899c50 { host 'udb01', haName '4cc9-a30f-f951-d124', srcLuid 5863b4e3-af05b441, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [25 : 25], createTime 4267810059, sentRegister 1, localMonitor 0, flags 0x4 }
2023-02-22 14:34:26.408: [ CRSMAIN][515] Policy Engine is not initialized yet!
2023-02-22 14:34:28.103: [GIPCXCPT][3342] gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'udb02', port '1fa6-abec-891e-9ad7', hctx 110eb29f0 [0000000000000010] { gipchaContext : host 'udb02', name '11d9-f6f9-6ba2-657d', luid '5863b4e3-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x1 }, ret gipcretKeyNotFound (36)
2023-02-22 14:34:28.103: [GIPCHGEN][3342] gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 804]: EXCEPTION[ ret gipcretKeyNotFound (36) ]  failed to resolve ctx 110eb29f0 [0000000000000010] { gipchaContext : host 'udb02', name '11d9-f6f9-6ba2-657d', luid '5863b4e3-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x1 }, host 'udb02', port '1fa6-abec-891e-9ad7', flags 0x0
2023-02-22 14:34:28.107: [GIPCHALO][2314] gipchaLowerSend: deffering startup of hdr 1118a9938 { len 232, seq 0, type gipchaHdrTypeSend (1), lastSeq 0, lastAck 0, minAck 0, flags 0x0, srcLuid 00000000-00000000, dstLuid 00000000-00000000, msgId 0 }, node 111899c50 { host 'udb01', haName '4cc9-a30f-f951-d124', srcLuid 5863b4e3-af05b441, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [30 : 30], createTime 4267810059, sentRegister 1, localMonitor 0, flags 0x4 }
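
With this many near-identical messages it can help to condense them. The sketch below is not part of the original troubleshooting session; it counts the "no valid interfaces found to node" messages per peer host and reports the `numInf` value logged with them. The function name `summarize_gipc_errors` is hypothetical, each log record is assumed to sit on a single line (as in the real file, not as wrapped in a terminal), and reading `numInf 0` as "no usable interface to that peer" is my interpretation of the field name rather than something documented here.

```shell
#!/bin/sh
# summarize_gipc_errors FILE
# Count "no valid interfaces found to node" records per peer host,
# together with the numInf value reported in the same record.
# Hypothetical helper; assumes one log record per line.
summarize_gipc_errors() {
    grep 'gipchaLowerProcessNode: no valid interfaces found to node' "$1" |
        sed -n "s/.*host '\([^']*\)'.*numInf \([0-9][0-9]*\).*/\1 numInf=\2/p" |
        sort | uniq -c
}

# Example call (the path is an assumption, adjust it to your Grid home):
#   summarize_gipc_errors "$GRID_HOME/log/udb02/crsd/crsd.log"
```

Each output line has the form `<count> <host> numInf=<n>`, so a single line for `udb01` with `numInf=0` would match what the excerpt above shows.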

##gipc.log
2023-02-22 14:56:33.871: [GIPCDMON][1286] gipcdMonitorCssCheck: found node udb01
2023-02-22 14:56:33.871: [GIPCDMON][1286] gipcdMonitorCssCheck: updating timeout node udb01
2023-02-22 14:56:33.871: [GIPCDMON][1286] gipcdMonitorCssCheck: updating timeout node udb01
2023-02-22 14:56:33.872: [GIPCDMON][1286] gipcdMonitorCssCheck: found node udb02
2023-02-22 14:56:34.093: [ CLSINET][1286] Returning NETDATA: 1 interfaces
2023-02-22 14:56:34.093: [ CLSINET][1286] # 0 Interface 'en9',ip='10.0.1.4',mac='34-40-b5-f3-88-0c',mask='255.255.255.0',net='10.0.1.0',use='cluster_interconnect'
2023-02-22 14:56:35.349: [GIPCDCLT][772] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000012d
2023-02-22 14:56:35.349: [GIPCDCLT][772] gipcdClientInterfaceMetrics: Received type(gipcdmsgtypeInterfaceMetrics), endp(000000000000012d), len(1032), buf(111634598), inf(ip: 10.0.1.4, mask: 255.255.255.0, subnet: 10.0.1.0, mac: , ifname: ) time(2), retry(0), stamp(5), send(5), recv(5)
2023-02-22 14:56:35.349: [GIPCDMON][1286] gipcdMonitorFailZombieNodes: skipping live node 'udb01', time 0 ms, endp 0000000000000000, 00000000000009fe
2023-02-22 14:56:35.349: [GIPCDMON][1286] gipcdMonitorFailZombieNodes: skipping live node 'udb01', time 0 ms, endp 0000000000000000, 0000000000000ad4
2023-02-22 14:56:35.480: [GIPCDCLT][772] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000035e
2023-02-22 14:56:35.480: [GIPCDCLT][772] gipcdClientInterfaceMetrics: Received type(gipcdmsgtypeInterfaceMetrics), endp(000000000000035e), len(1032), buf(111634598), inf(ip: 10.0.1.4, mask: 255.255.255.0, subnet: 10.0.1.0, mac: , ifname: ) time(0), retry(0), stamp(0), send(0), recv(0)
2023-02-22 14:56:35.784: [GIPCDCLT][772] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000040d
2023-02-22 14:56:35.784: [GIPCDCLT][772] gipcdClientInterfaceMetrics: Received type(gipcdmsgtypeInterfaceMetrics), endp(000000000000040d), len(1032), buf(111634598), inf(ip: 10.0.1.4, mask: 255.255.255.0, subnet: 10.0.1.0, mac: , ifname: ) time(18), retry(0), stamp(21), send(20), recv(21)
2023-02-22 14:56:36.552: [GIPCDCLT][772] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000646
2023-02-22 14:56:36.552: [GIPCDCLT][772] gipcdClientInterfaceMetrics: Received type(gipcdmsgtypeInterfaceMetrics), endp(0000000000000646), len(1032), buf(111634598), inf(ip: 10.0.1.4:9556, mask: 255.255.255.0, subnet: 10.0.1.0, mac: , ifname: ) time(0), retry(0), stamp(0), send(0), recv(0)
2023-02-22 14:56:38.871: [GIPCDCLT][772] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000096e
2023-02-22 14:56:38.871: [GIPCDCLT][772] gipcdClientInterfaceMetrics: Received type(gipcdmsgtypeInterfaceMetrics), endp(000000000000096e), len(1032), buf(111634598), inf(ip: 10.0.1.4:9771, mask: 255.255.255.0, subnet: 10.0.1.0, mac: , ifname: ) time(0), retry(0), stamp(0), send(0), recv(0)


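##Analysis
First, the gipc service is mainly responsible for the private-network interfaces and related services, so, taken together with the log messages, the problem most likely lies in the private network (the cluster interconnect).
Second, a search on MOS turned up a similar note, << Grid Infrastructure Upgrade/Install CRSD fails to start on the 2nd node with "gipchaLowerProcessNode: no valid interfaces found to node" in crsd.log (Doc ID 1280234.1)>>, which mentions the udp_sendspace parameter setting.
Comparing the two nodes showed that the relevant network parameters did not match the values recommended in Oracle's official documentation.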
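
The post does not show how the two nodes were compared. One possible way, sketched here under assumptions, is to save the output of `/usr/sbin/no -a` on each node and diff only the tunables of interest. The function name `compare_tunables` and the file names are hypothetical, and the dump lines are assumed to look like `udp_sendspace = 65536` with optional leading whitespace.

```shell
#!/bin/sh
# compare_tunables FILE_A FILE_B NAME...
# Print every named tunable whose value differs between two saved
# `no -a` dumps. Hypothetical helper; assumes "name = value" lines.
compare_tunables() {
    file_a=$1
    file_b=$2
    shift 2
    for name in "$@"; do
        # Pick the value column from the line whose first field is the name.
        val_a=$(awk -v n="$name" '$1 == n && $2 == "=" { print $3 }' "$file_a")
        val_b=$(awk -v n="$name" '$1 == n && $2 == "=" { print $3 }' "$file_b")
        if [ "$val_a" != "$val_b" ]; then
            echo "$name: $val_a vs $val_b"
        fi
    done
}

# Example (file names are assumptions):
#   on each node:  /usr/sbin/no -a > /tmp/no_$(hostname).txt
#   then:          compare_tunables /tmp/no_udb01.txt /tmp/no_udb02.txt \
#                      tcp_recvspace tcp_sendspace udp_recvspace udp_sendspace
```

No output means the named tunables are identical on both nodes. That alone does not show they match Oracle's recommendation, so the values still need checking against the installation guide.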

##Resolution steps
#1 Modify the parameters
/usr/sbin/no -po tcp_recvspace=65536
/usr/sbin/no -po tcp_sendspace=65536
/usr/sbin/no -po udp_recvspace=655360
/usr/sbin/no -po udp_sendspace=65536
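
After changing the values it may be worth confirming that they took effect. The following is only a sketch: `check_tunables` is a hypothetical helper that reads `no -a` style output (`name = value` lines, an assumed format) on standard input and compares the four tunables with the values set in this post.

```shell
#!/bin/sh
# check_tunables: read `no -a` style output on stdin and compare the four
# tunables changed in this post with the values that were set.
# Exit status 0 if all match, 1 if any is missing or different.
check_tunables() {
    awk '
        BEGIN {
            bad = 0
            want["tcp_recvspace"] = 65536
            want["tcp_sendspace"] = 65536
            want["udp_recvspace"] = 655360
            want["udp_sendspace"] = 65536
        }
        ($1 in want) && $2 == "=" {
            seen[$1] = 1
            if ($3 + 0 != want[$1]) {
                printf "MISMATCH %s: %s (expected %d)\n", $1, $3, want[$1]
                bad = 1
            }
        }
        END {
            for (n in want)
                if (!(n in seen)) { printf "MISSING %s\n", n; bad = 1 }
            exit bad
        }
    '
}

# Example on AIX:
#   /usr/sbin/no -a | check_tunables && echo "all four tunables match"
```

The expected values are copied from the commands above. If your platform documentation recommends different sizes, the `want` table is the only place to edit.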
#2 Restart the network services
/etc/rc.tcpip or refresh -s inetd

#3 Start CRS
#as the root user
./crsctl start crs
#CRS and the cluster services started normally; problem solved


Source: "ITPUB Blog", link: http://blog.itpub.net/28572479/viewspace-2936548/. If you repost this article, please credit the source; otherwise legal action may be taken.
