Multicast problem prevents nodes from joining the cluster

Posted by liaoxiaomao on 2011-11-29

1. CRS failure description
Only one node can start CRS; the other three nodes fail to start. cssd.log shows the following errors:
2011-11-17 21:14:15.799: [ CSSD][2833]clssnmvDHBValidateNCopy: node 1, raca1, has a disk HB, but no network HB, DHB has rcfg 212474945, wrtcnt, 4392389, LATS 56951623, lastSeqNo 4392388, uniqueness 1321584882, timestamp 1321586055/1228391238
2011-11-17 21:14:15.799: [ CSSD][2833]clssnmvDHBValidateNCopy: node 2, raca2, has a disk HB, but no network HB, DHB has rcfg 212474945, wrtcnt, 4351379, LATS 56951623, lastSeqNo 4351378, uniqueness 1321585233, timestamp 1321586055/1228149573
2011-11-17 21:14:15.799: [ CSSD][2833]clssnmvDHBValidateNCopy: node 4, racb2, has a disk HB, but no network HB, DHB has rcfg 212474928, wrtcnt, 4325534, LATS 56951623, lastSeqNo 4325533, uniqueness 1321585745, timestamp 1321586055/3135065466
2011-11-17 21:14:16.322: [ CSSD][3604]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2011-11-17 21:14:16.811: [ CSSD][2833]clssnmvDHBValidateNCopy: node 1, raca1, has a disk HB, but no network HB, DHB has rcfg 212474945, wrtcnt, 4392390, LATS 56952634, lastSeqNo 4392389, uniqueness 1321584882, timestamp 1321586056/1228392250
2011-11-17 21:14:16.811: [ CSSD][2833]clssnmvDHBValidateNCopy: node 2, raca2, has a disk HB, but no network HB, DHB has rcfg 212474945, wrtcnt, 4351380, LATS 56952634, lastSeqNo 4351379, uniqueness 1321585233, timestamp 1321586056/1228150582
2011-11-17 21:14:16.811: [ CSSD][2833]clssnmvDHBValidateNCopy: node 4, racb2, has a disk HB, but no network HB, DHB has rcfg 212474928, wrtcnt, 4325535, LATS 56952634, lastSeqNo 4325534, uniqueness 1321585745, timestamp 1321586056/3135066472

"has a disk HB, but no network HB": the peer nodes are visible through the disk heartbeat, but there is no network heartbeat.

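The symptom above can be pulled out of cssd.log mechanically. The following is a minimal sketch, assuming the message format is exactly as in the excerpt above; the function name `nodes_without_network_hb` and the shortened sample lines are illustrative and not part of any Oracle tool.

```python
import re

# Matches the clssnmvDHBValidateNCopy message as it appears in the excerpt above.
MISSING_NHB = re.compile(
    r"clssnmvDHBValidateNCopy: node (?P<num>\d+), (?P<name>[^,]+), "
    r"has a disk HB, but no network HB"
)


def nodes_without_network_hb(lines):
    """Return {node_number: node_name} for peers seen on disk but not on the network."""
    found = {}
    for line in lines:
        match = MISSING_NHB.search(line)
        if match:
            found[int(match.group("num"))] = match.group("name")
    return found


# Sample lines shortened from the log excerpt above (trailing fields dropped).
sample = [
    "2011-11-17 21:14:15.799: [ CSSD][2833]clssnmvDHBValidateNCopy: node 1, raca1, has a disk HB, but no network HB, DHB has rcfg 212474945",
    "2011-11-17 21:14:15.799: [ CSSD][2833]clssnmvDHBValidateNCopy: node 2, raca2, has a disk HB, but no network HB, DHB has rcfg 212474945",
    "2011-11-17 21:14:15.799: [ CSSD][2833]clssnmvDHBValidateNCopy: node 4, racb2, has a disk HB, but no network HB, DHB has rcfg 212474928",
    "2011-11-17 21:14:16.322: [ CSSD][3604]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0",
]

print(nodes_without_network_hb(sample))
```

In practice the lines would come from the real cssd.log on the node that fails to join, for example by passing an open file object instead of `sample`.
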
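2. According to Note 1212703.1, Grid Infrastructure 11.2.0.2 introduces support for redundant interconnects. To support redundant interconnects, multicasting must be enabled; otherwise nodes cannot join the cluster!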
Oracle Grid Infrastructure 11.2.0.2 introduces a new feature called "Redundant Interconnect Usage", which provides an Oracle internal mechanism to make use of physically redundant network interfaces for the Oracle (private) interconnect
As part of this new feature, multicast based communication on the private interconnect is utilized to establish communication with peers in the cluster on each startup of the stack on a node
Multicasting on either of these IPs and the respective port must, however, be enabled and functioning across the network and on each node meant to be part of the cluster.
If multicasting is not enabled as required, nodes will fail to join the cluster with the symptoms discussed.
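To make the requirement concrete, here is a minimal sketch of what "multicast working on the private network" means at the socket level: one host joins a group and waits for a datagram, another host sends one. This is not how Oracle Clusterware or mcasttest.pl is implemented; the function names, the default payload, and the choice of address and port (taken from the test output later in this post) are assumptions for illustration only.

```python
import socket
import struct

GROUP = "230.0.1.0"  # multicast address named in this post
PORT = 42000         # port printed by mcasttest.pl in this post; illustrative here


def membership_request(group, local_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq value (group address + local interface address)."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def listen_once(group=GROUP, port=PORT, local_ip="0.0.0.0", timeout=5.0):
    """Join the group and wait for one datagram; return (data, sender) or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group, local_ip))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(1024)
        except socket.timeout:
            return None
    finally:
        sock.close()


def send_probe(group=GROUP, port=PORT, local_ip="0.0.0.0", ttl=1, payload=b"mcast-probe"):
    """Send one datagram to the group, optionally pinned to a local interface address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
        if local_ip != "0.0.0.0":
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                            socket.inet_aton(local_ip))
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

A manual check would run `listen_once()` on one node and `send_probe()` on another, passing each node's private interconnect address as `local_ip`. If the listener times out while ordinary unicast traffic works, the switch or host is probably filtering that multicast group.
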


Multicast test:
perl mcasttest.pl -n raca1,raca2,racb1,racb2 -i en18
########### Setup for node raca1 ##########
Checking node access 'raca1'
Checking node login 'raca1'
Checking/Creating Directory /tmp/mcasttest for binary on node 'raca1'
Distributing mcast2 binary to node 'raca1'
########### Setup for node raca2 ##########
Checking node access 'raca2'
Checking node login 'raca2'
Checking/Creating Directory /tmp/mcasttest for binary on node 'raca2'
Distributing mcast2 binary to node 'raca2'
########### Setup for node racb1 ##########
Checking node access 'racb1'
Checking node login 'racb1'
Checking/Creating Directory /tmp/mcasttest for binary on node 'racb1'
Distributing mcast2 binary to node 'racb1'
########### Setup for node racb2 ##########
Checking node access 'racb2'
Checking node login 'racb2'
Checking/Creating Directory /tmp/mcasttest for binary on node 'racb2'
Distributing mcast2 binary to node 'racb2'
########### testing Multicast on all nodes ##########

Test for Multicast address 230.0.1.0

Nov 18 15:48:23 | Multicast Failed for en18 using address 230.0.1.0:42000

Test for Multicast address 224.0.0.251

Nov 18 15:48:25 | Multicast Succeeded for en18 using address 224.0.0.251:42001

Test result:
Multicast is not enabled on address 230.0.1.0, so the nodes cannot join the cluster.

The test has failed for the 230.0.1.0 address, but succeeded for the 224.0.0.251 multicast address. In this case, Patch: 9974223 must be applied to enable Oracle Grid Infrastructure to use the 224.0.0.251 multicast address.
Should the mcasttest.pl test-tool have failed for both, the 230.0.1.0 address only
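The rule quoted above (230.0.1.0 fails, 224.0.0.251 succeeds, so Patch 9974223 is needed) can be checked against saved mcasttest.pl output. This is a minimal sketch, assuming the result lines look exactly like the ones in this post; `parse_mcasttest` and `needs_patch_9974223` are hypothetical helper names, and other outcomes are deliberately left undecided.

```python
import re

# Matches result lines such as:
# "Nov 18 15:48:23 | Multicast Failed for en18 using address 230.0.1.0:42000"
RESULT_LINE = re.compile(
    r"Multicast (Succeeded|Failed) for (\S+) using address ([\d.]+):(\d+)"
)


def parse_mcasttest(output):
    """Return {multicast_address: True/False} from mcasttest.pl output text."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            status, _interface, address, _port = match.groups()
            results[address] = (status == "Succeeded")
    return results


def needs_patch_9974223(results):
    """True only for the case described above: 230.0.1.0 failed, 224.0.0.251 succeeded."""
    return results.get("230.0.1.0") is False and results.get("224.0.0.251") is True


# Result lines copied from the test run shown in this post.
output = """\
Test for Multicast address 230.0.1.0

Nov 18 15:48:23 | Multicast Failed for en18 using address 230.0.1.0:42000

Test for Multicast address 224.0.0.251

Nov 18 15:48:25 | Multicast Succeeded for en18 using address 224.0.0.251:42001
"""

results = parse_mcasttest(output)
print(results, needs_patch_9974223(results))
```
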

Solution:

Patch 9974223 must be applied. 11.2.0.2 Grid Infrastructure PSU1 and later already include 9974223, so applying a PSU is sufficient.

Apply Patch: 9974223 or any subsequent GI Bundle (or PSU) Patch including Patch: 9974223
Patch: 9974223 is included in Oracle Grid Infrastructure Bundle Patch 1 and later and it is recommended to apply Patch: 9974223 via a Bundle Patch rather than applying it individually

After applying PSU3 (patch 12419353), the nodes joined the cluster normally.


Source: ITPUB blog, link: http://blog.itpub.net/90901/viewspace-1056602/. If you repost this article, please credit the source; otherwise legal liability will be pursued.