HAIP, link-local IP, RFC 3927, and the IPv4 169.254/16 link-local range

Posted by yyp2009 on 2017-10-27

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.
Covers release 11.2

PURPOSE

The purpose of this note is to supplement the HAIP information available in Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip, Note 1210883.1.

 

 

QUESTIONS AND ANSWERS

1. What does HAIP stand for?

Highly Available IP (address)

2. What clusterware version was HAIP introduced in?

Oracle Grid Infrastructure 11.2.0.2

3. What is the purpose of HAIP?  

HAIP has two primary functions:

HAIP allows for redundant cluster interconnect NICs on cluster nodes without requiring any OS-level bonding, teaming or aggregation of the NICs.
HAIP provides a layer of transparency between multiple Network Interface Cards (NICs) on a given node that are used for the cluster_interconnect, which is used by the RDBMS and ASM instances. However, HAIP is not used for Clusterware network communication. The Clusterware uses a feature called "Redundant Interconnect" for communication, which essentially makes use of all available cluster_interconnect paths.

4. How does HAIP create a layer of transparency between the cluster interconnect NICs on a given node and the RDBMS/ASM instances?

When clusterware on a given node is started, a virtual IP address (HAIP) is dynamically allocated and linked (link-local) to each NIC configured for use as a cluster interconnect. Each NIC defined as an interconnect (using oifcfg setif) will be assigned a unique HAIP during clusterware startup. The HAIP from each interconnect NIC is then used by the ASM and RDBMS instances and ACFS. The static IP address associated with the physical NIC is not used by the ASM and RDBMS instances. The NIC name is not used by the ASM or RDBMS instances but is used by other clusterware components.

Each NIC defined as a cluster interconnect (using oifcfg setif) on a given node will have a static ip address assigned to it and each cluster interconnect NIC on a given node must be on a unique subnet. Clusterware requires that each subnet used by the cluster interconnect have a functioning NIC on each active cluster node. If any one of the cluster interconnect NICs is down on a node, then the subnet associated with the down NIC is considered not usable by any node of the cluster.
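The unique-subnet requirement above can be sketched as a quick validation. Here check_unique_subnets is a hypothetical helper (not an Oracle utility); the sample addresses mirror the eth1/eth6/eth7 example later in this note:

```python
import ipaddress

def check_unique_subnets(nics):
    """Sanity check: each cluster interconnect NIC on a node must sit on
    its own subnet.  `nics` maps NIC name -> (ip, prefix length)."""
    subnets = {}
    for name, (ip, prefix) in nics.items():
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        if net in subnets:
            raise ValueError(f"{name} and {subnets[net]} share subnet {net}")
        subnets[net] = name
    return sorted(str(n) for n in subnets)

# Addresses modeled on the multi-adapter example in this note.
print(check_unique_subnets({
    "eth1": ("10.1.0.168", 25),
    "eth6": ("10.11.0.188", 25),
    "eth7": ("10.12.0.208", 25),
}))
```

With the three adapters on distinct /25 subnets the check passes; placing two interconnect NICs on the same subnet raises an error instead.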

5. How does HAIP obtain an IP address for use in the virtual link?

HAIP takes advantage of link-local IP addresses as defined in the Internet Engineering Task Force RFC 3927. This RFC reserves the IPv4 prefix 169.254/16 specifically for link-local use.
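Python's standard ipaddress module knows the RFC 3927 range, which makes the reservation easy to demonstrate (the HAIP address is taken from the alert log example in question 6; the static IP is a typical private interconnect address):

```python
import ipaddress

# RFC 3927 reserves 169.254.0.0/16 for IPv4 link-local use.
link_local = ipaddress.ip_network("169.254.0.0/16")

haip = ipaddress.ip_address("169.254.89.182")    # from the Q6 alert log example
static_ip = ipaddress.ip_address("10.1.0.168")   # an ordinary static private IP

print(haip in link_local)       # True
print(haip.is_link_local)       # True - ipaddress implements the RFC 3927 range
print(static_ip in link_local)  # False
```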

6. How can I check if the HAIP is in use?

Use the OS command "ifconfig -a", or look in the ASM and RDBMS instance alert logs: during instance startup the interfaces and their associated HAIPs are shown.

Alert log example showing two NICs
Starting ORACLE instance (normal)
...
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
  [name='eth1:1', type=1, ip=169.254.89.182, mac=08-00-27-06-a4-37, net=169.254.0.0/17, mask=255.255.128.0, use=haip:cluster_interconnect/62]
Private Interface 'eth1:2' configured from GPnP for use as a private interconnect.
  [name='eth1:2', type=1, ip=169.254.224.204, mac=08-00-27-06-a4-37, net=169.254.128.0/17, mask=255.255.128.0, use=haip:cluster_interconnect/62]
...
 Cluster communication is configured to use the following interface(s) for this instance
  169.254.89.182
  169.254.224.204
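As an illustration, a few lines of Python can pull the HAIP addresses out of ifconfig-style output. The sample text below is fabricated to mirror the alert log values above, and find_haips is a hypothetical helper, not an Oracle tool:

```python
import re

# Fabricated "ifconfig -a" excerpt mirroring the Q6 alert log values.
IFCONFIG_SAMPLE = """\
eth1:1    Link encap:Ethernet  HWaddr 08:00:27:06:A4:37
          inet addr:169.254.89.182  Bcast:169.254.127.255  Mask:255.255.128.0
eth1:2    Link encap:Ethernet  HWaddr 08:00:27:06:A4:37
          inet addr:169.254.224.204  Bcast:169.254.255.255  Mask:255.255.128.0
"""

def find_haips(text):
    """Return all 169.254.x.x (link-local) addresses in ifconfig output."""
    return re.findall(r"inet addr:(169\.254\.\d+\.\d+)", text)

print(find_haips(IFCONFIG_SAMPLE))  # ['169.254.89.182', '169.254.224.204']
```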

7. Which Oracle clusterware log shows the failover or failback of a HAIP?

The ohasd orarootagent_root logs under <GRID_HOME>/log/<node>/agent/ohasd/orarootagent_root/orarootagent_root.log.

8. What happens to the HAIP if a physical NIC fails or is disabled?

The HAIP associated with the failed NIC will fail over and be linked (link-local) to one of the remaining working physical NICs designated for the cluster interconnect, so the HAIP will now be on a different subnet, one that is working on all nodes.
An ARP broadcast will be made using the new MAC address of the NIC that the HAIP has attached to. Since the ASM and RDBMS instances are using the HAIP, they transparently continue to work using the NIC that the HAIP failed over to. An important point to note is that the HAIP will perform the same failover on EVERY active cluster node. This is done to ensure that the HAIP is attached to a NIC on the same subnet across all nodes.
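The cluster-wide nature of the failover can be modeled with a small sketch (a conceptual model only, not Oracle's implementation): a subnet is usable only if its NIC is up on every node, and a HAIP whose home subnet becomes unusable is moved on all nodes at once:

```python
def usable_subnets(nic_status):
    """nic_status: {node: {subnet: NIC up? (bool)}}.
    A subnet is usable only if its NIC is up on EVERY node."""
    subnets = next(iter(nic_status.values())).keys()
    return [s for s in subnets if all(nic_status[n][s] for n in nic_status)]

def place_haips(haips, nic_status):
    """Return {haip: subnet} after moving HAIPs off unusable subnets."""
    ok = usable_subnets(nic_status)
    placement = {}
    for i, (haip, home) in enumerate(haips.items()):
        placement[haip] = home if home in ok else ok[i % len(ok)]
    return placement

# eth6's subnet loses its NIC on node2 only, yet the whole subnet is unusable.
status = {
    "node1": {"10.1.0.128/25": True, "10.11.0.128/25": True, "10.12.0.128/25": True},
    "node2": {"10.1.0.128/25": True, "10.11.0.128/25": False, "10.12.0.128/25": True},
}
haips = {"169.254.112.250": "10.11.0.128/25"}
print(place_haips(haips, status))  # the HAIP leaves 10.11.0.128/25 on ALL nodes
```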

9. What happens to the HAIP if the failed or disabled physical NIC is brought back up?

The HAIP will fail back to the physical NIC it was linked to before the failover. The HAIP will be unlinked from the redundant NIC it was temporarily on and linked (link-local) to the interconnect NIC that was brought back online. As with a failover, this happens on all nodes of the cluster to ensure the HAIP is on the same subnet on all nodes. An ARP broadcast is made to publish the changed MAC address of the NIC that each HAIP failed back to.

10. Does HAIP require NIC bonding or teaming?

No. HAIP allows you to provide redundant NICs for the cluster interconnect without using OS-level NIC bonding, teaming or aggregation. Each physical interface is presented to and registered with the clusterware via the Oracle clusterware command "oifcfg setif".

11. Can OS bonding or teaming still be used with HAIP?

Yes, however Oracle's best practice is to use unbonded, unteamed NICs, without any additional layers of virtualization.

12. Is the HAIP active even if there is only one NIC per node for the cluster interconnect?

Yes. This cannot be disabled manually (the exception being Solaris where, when IPMP is in use, HAIP will not be enabled during installation).

13. Is there any limit to how many redundant interconnect NICs that can be used by clusterware?

Yes. Clusterware will use up to four NICs, even if more are available.

14. With HAIP, are all the interconnect NICs active concurrently?

The default is for all interconnect NICs to be active. The Oracle best practice is to have all NICs active.

15. Can I manually failover or failback HAIP?

No. There is no command that allows you to directly control failover and failback of HAIP.  Clusterware does this automatically.

16. Is HAIP supported with pre-11.2.0.2 databases?

The default behaviour for pre-11.2.0.2 databases is to use the fixed IP address associated with NICs and not use the HAIP.

The recommendation is to use OS level NIC bonding/teaming for pre 11.2.0.2 databases.

17. What is the recommendation regarding using OS level NIC virtualization with HAIP?

Oracle recommends using just HAIP, with no additional OS level NIC virtualization.

18. Can HAIP feature be disabled manually?

No. It is NOT supported to disable HAIP while the cluster is up and running. There are certain situations where the HAIP feature is disabled during installation; refer to Note 1210883.1, Section Miscellaneous. For an IPMP implementation on Solaris, the cluster_interconnects parameter is required to be set to the IPMP IP. For other platforms, OS bonding technology should be used if more than one private network is used.

19. Can RDS be used with HAIP?

Yes, with restrictions. See Doc ID 745616.1.

REFERENCES

NOTE:1054902.1 - How to Validate Network and Name Resolution Setup for the Clusterware and RAC
NOTE:1210883.1 - Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip
Oracle® Clusterware Administration and Deployment Guide 11g Release 2 (11.2) E41959-03
NOTE:745616.1 - Oracle Reliable Datagram Sockets (RDS) and InfiniBand (IB) Support for RAC Interconnect and Exadata Storage

#####################################################################################

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.

PURPOSE

This document is intended to explain what is ora.cluster_interconnect.haip resource in 11gR2 Grid Infrastructure.

DETAILS

Redundant Interconnect without any 3rd-party IP failover technology (bond, IPMP or similar) is supported natively by Grid Infrastructure starting with 11.2.0.2. Multiple private network adapters can be defined either during the installation phase or afterward using the oifcfg command. Oracle Database, CSS, OCR, CRS, CTSS, and EVM components in 11.2.0.2 employ it automatically.

Grid Infrastructure can activate a maximum of four private network adapters at a time, even if more are defined. The ora.cluster_interconnect.haip resource will start one to four link-local HAIPs on private network adapters for interconnect communication for Oracle RAC, Oracle ASM, Oracle ACFS, etc.

Grid automatically picks free link-local addresses from the reserved 169.254.*.* subnet for HAIP. According to RFC 3927, the link-local subnet 169.254.*.* should not be used for any other purpose. With HAIP, by default, interconnect traffic will be load balanced across all active interconnect interfaces, and the corresponding HAIP address will be failed over transparently to other adapters if one fails or becomes non-communicative.

After GI is configured, more private network interfaces can be added with the "<GRID_HOME>/bin/oifcfg setif" command. The number of HAIP addresses is decided by how many private network adapters are active when Grid comes up on the first node in the cluster. If there is only one active private network, Grid will create one; if two, Grid will create two; and if more than two, Grid will create four HAIPs. The number of HAIPs won't change even if more private network adapters are activated later; a restart of clusterware on all nodes is required for the number to change. However, the newly activated adapters can still be used for failover purposes.
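The 1/2/4 rule, together with the subnet masks visible in the alert-log examples in this note (/16 for one HAIP, /17 for two, /18 for four), can be sketched as follows; haip_count and haip_subnets are illustrative helpers, not Oracle APIs:

```python
import ipaddress

def haip_count(active_adapters):
    """HAIP count at first-node startup: 1 adapter -> 1, 2 -> 2, more -> 4."""
    if active_adapters <= 0:
        raise ValueError("need at least one private network adapter")
    return 1 if active_adapters == 1 else (2 if active_adapters == 2 else 4)

def haip_subnets(active_adapters):
    """Carve 169.254.0.0/16 into equal link-local subnets, one per HAIP."""
    n = haip_count(active_adapters)
    base = ipaddress.ip_network("169.254.0.0/16")
    return [str(s) for s in base.subnets(prefixlen_diff=n.bit_length() - 1)]

print(haip_count(3))    # 4
print(haip_subnets(2))  # ['169.254.0.0/17', '169.254.128.0/17']
```

For three or more adapters this yields the four /18 subnets (169.254.0.0/18 through 169.254.192.0/18) seen in the Case 2 alert log.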


 

NOTE: If using the 11.2.0.2 (and above) Redundant Interconnect/HAIP feature (as documented in CASE 2 below), it is at present REQUIRED that all interconnect interfaces be placed on separate subnets. If the interfaces are all on the same subnet and the cable is pulled from the first NIC in the routing table, a rebootless restart or node reboot will occur.

 

At the time of this writing, a redundant private network requires a different subnet for each network adapter; for example, if eth1, eth2 and eth3 are used for the private network, each should be on a different subnet. Refer to Case 2.

When Oracle Clusterware is fully up, the haip resource should show a status of ONLINE:

$GRID_HOME/bin/crsctl stat res -t -init
..
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       racnode1

Case 1: Single Private Network Adapter

If multiple physical network adapters are bonded together at the OS level and presented as a single device name, for example bond0, it is still considered a single network adapter environment. A single private network adapter does not offer true redundancy, as there is only one adapter; at least two are recommended. If only one private network adapter is defined, such as eth1 in the example below, one virtual IP will be created by HAIP. Here is what's expected when Grid is up and running:

$GRID_HOME/bin/oifcfg getif
eth1  10.1.0.128  global  cluster_interconnect
eth3  10.1.0.0  global  public

$GRID_HOME/bin/oifcfg iflist -p -n
eth1  10.1.0.128  PRIVATE  255.255.255.128
eth1  169.254.0.0  UNKNOWN  255.255.0.0
eth3  10.1.0.0  PRIVATE  255.255.255.128

Note: 1. Subnet 169.254.0.0 on eth1 is started by resource haip. 2. Refer to Note 1386709.1 for an explanation of the output.

ifconfig
..
eth1      Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:10.1.0.168  Bcast:10.1.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1122/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6369306 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4270790 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3037449975 (2.8 GiB)  TX bytes:2705797005 (2.5 GiB)

eth1:1    Link encap:Ethernet  HWaddr 00:16:3E:11:22:22
          inet addr:169.254.167.163  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Instance alert.log (ASM and database):

Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
  [name='eth1:1', type=1, ip=169.254.167.163, mac=00-16-3e-11-11-22, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth3' configured from GPnP for use as a public interface.
  [name='eth3', type=1, ip=10.1.0.68, mac=00-16-3e-11-11-44, net=10.1.0.0/25, mask=255.255.255.128, use=public/1]
..
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 3
..
Cluster communication is configured to use the following interface(s) for this instance
  169.254.167.163

Note: the interconnect will use the virtual private IP 169.254.167.163 instead of the real private IP. A pre-11.2.0.2 instance will by default still use the real private IP; to take advantage of the new feature, the init.ora parameter cluster_interconnects can be updated each time Grid is restarted.


For 11.2.0.2 and above, v$cluster_interconnects will show haip info:

SQL> select name,ip_address from v$cluster_interconnects;

NAME            IP_ADDRESS
--------------- ----------------
eth1:1          169.254.167.163

Case 2: Multiple Private Network Adapters

Multiple switches can be deployed if there is more than one private network adapter on each node. If one network adapter fails, the HAIP on that network segment will be failed over to other adapters on all nodes.

2.1. Default Status

Here is an example of three private networks, eth1, eth6 and eth7, when Grid is up and running:

$GRID_HOME/bin/oifcfg getif
eth1 10.1.0.128   global  cluster_interconnect
eth3 10.1.0.0  global  public
eth6 10.11.0.128  global  cluster_interconnect
eth7 10.12.0.128  global  cluster_interconnect

$GRID_HOME/bin/oifcfg iflist -p -n
eth1  10.1.0.128  PRIVATE  255.255.255.128
eth1  169.254.0.0  UNKNOWN  255.255.192.0
eth1  169.254.192.0  UNKNOWN  255.255.192.0
eth3  10.1.0.0  PRIVATE  255.255.255.128
eth6  10.11.0.128  PRIVATE  255.255.255.128
eth6  169.254.64.0  UNKNOWN  255.255.192.0
eth7  10.12.0.128  PRIVATE  255.255.255.128
eth7  169.254.128.0  UNKNOWN  255.255.192.0

Note: resource haip started four virtual private IPs: two on eth1, and one each on eth6 and eth7.

ifconfig
..
eth1      Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:10.1.0.168  Bcast:10.1.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1122/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15176906 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10239298 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7929246238 (7.3 GiB)  TX bytes:5768511630 (5.3 GiB)

eth1:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:169.254.30.98  Bcast:169.254.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:2    Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:169.254.244.103  Bcast:169.254.255.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth6      Link encap:Ethernet  HWaddr 00:16:3E:11:11:77
          inet addr:10.11.0.188  Bcast:10.11.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1177/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7068185 errors:0 dropped:0 overruns:0 frame:0
          TX packets:595746 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2692567483 (2.5 GiB)  TX bytes:382357191 (364.6 MiB)

eth6:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:77
          inet addr:169.254.112.250  Bcast:169.254.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7      Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:10.12.0.208  Bcast:10.12.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6435829 errors:0 dropped:0 overruns:0 frame:0
          TX packets:314780 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2024577502 (1.8 GiB)  TX bytes:172461585 (164.4 MiB)

eth7:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.178.237  Bcast:169.254.191.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Instance alert.log (ASM and database):

Private Interface 'eth1:1'
 configured from GPnP for use as a private interconnect.
  [name='eth1:1', type=1, ip=169.254.30.98, mac=00-16-3e-11-11-22, net=169.254.0.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'eth6:1' configured from GPnP for use as a private interconnect.
  [name='eth6:1', type=1, ip=169.254.112.250, mac=00-16-3e-11-11-77, net=169.254.64.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'eth7:1' configured from GPnP for use as a private interconnect.
  [name='eth7:1', type=1, ip=169.254.178.237, mac=00-16-3e-11-11-88, net=169.254.128.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'eth1:2' configured from GPnP for use as a private interconnect.
  [name='eth1:2', type=1, ip=169.254.244.103, mac=00-16-3e-11-11-22, net=169.254.192.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Public Interface 'eth3' configured from GPnP for use as a public interface.
  [name='eth3', type=1, ip=10.1.0.68, mac=00-16-3e-11-11-44, net=10.1.0.0/25, mask=255.255.255.128, use=public/1]
Picked latch-free SCN scheme 3

..
Cluster communication is configured to use the following interface(s) for this instance
  169.254.30.98
  169.254.112.250
  169.254.178.237
  169.254.244.103

Note: interconnect communication will use all four virtual private IPs; in case of network failure, as long as there is one private network adapter functioning, all four IPs will remain active.

2.2. When Private Network Adapter Fails

If one private network adapter fails, in this example eth6, the virtual private IP on eth6 will be relocated automatically to a healthy adapter; this is transparent to instances (ASM or database).

$GRID_HOME/bin/oifcfg iflist -p -n
eth1  10.1.0.128  PRIVATE  255.255.255.128
eth1  169.254.0.0  UNKNOWN  255.255.192.0
eth1  169.254.128.0  UNKNOWN  255.255.192.0
eth7  10.12.0.128  PRIVATE  255.255.255.128
eth7  169.254.64.0  UNKNOWN  255.255.192.0
eth7  169.254.192.0  UNKNOWN  255.255.192.0

Note: virtual private IP on eth6 subnet 169.254.64.0 relocated to eth7

ifconfig
..
eth1      Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:10.1.0.168  Bcast:10.1.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1122/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15183840 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10245071 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7934311823 (7.3 GiB)  TX bytes:5771878414 (5.3 GiB)

eth1:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:169.254.30.98  Bcast:169.254.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:3    Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:169.254.178.237  Bcast:169.254.191.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7      Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:10.12.0.208  Bcast:10.12.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6438985 errors:0 dropped:0 overruns:0 frame:0
          TX packets:315877 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2026266447 (1.8 GiB)  TX bytes:173101641 (165.0 MiB)

eth7:2    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.112.250  Bcast:169.254.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7:3    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.244.103  Bcast:169.254.255.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

2.3. When Another Private Network Adapter Fails

If another private network adapter goes down, in this example eth1, the virtual private IPs on it will be relocated automatically to the other healthy adapter with no impact on instances (ASM or database).

$GRID_HOME/bin/oifcfg iflist -p -n
eth7  10.12.0.128  PRIVATE  255.255.255.128
eth7  169.254.64.0  UNKNOWN  255.255.192.0
eth7  169.254.192.0  UNKNOWN  255.255.192.0
eth7  169.254.0.0  UNKNOWN  255.255.192.0
eth7  169.254.128.0  UNKNOWN  255.255.192.0

ifconfig
..
eth7      Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:10.12.0.208  Bcast:10.12.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6441559 errors:0 dropped:0 overruns:0 frame:0
          TX packets:317271 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2027824788 (1.8 GiB)  TX bytes:173810658 (165.7 MiB)

eth7:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.30.98  Bcast:169.254.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7:2    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.112.250  Bcast:169.254.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7:3    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.244.103  Bcast:169.254.255.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7:4    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.178.237  Bcast:169.254.191.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

2.4. When Private Network Adapter Restores

If private network adapter eth6 is restored, it will be activated automatically and virtual private IPs will be reassigned to it:

$GRID_HOME/bin/oifcfg iflist -p -n
..
eth6  10.11.0.128  PRIVATE  255.255.255.128
eth6  169.254.128.0  UNKNOWN  255.255.192.0
eth6  169.254.0.0  UNKNOWN  255.255.192.0
eth7  10.12.0.128  PRIVATE  255.255.255.128
eth7  169.254.64.0  UNKNOWN  255.255.192.0
eth7  169.254.192.0  UNKNOWN  255.255.192.0

ifconfig 
..
eth6      Link encap:Ethernet  HWaddr 00:16:3E:11:11:77
          inet addr:10.11.0.188  Bcast:10.11.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1177/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:398 errors:0 dropped:0 overruns:0 frame:0
          TX packets:121 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:185138 (180.7 KiB)  TX bytes:56439 (55.1 KiB)

eth6:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:77
          inet addr:169.254.178.237  Bcast:169.254.191.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth6:2    Link encap:Ethernet  HWaddr 00:16:3E:11:11:77
          inet addr:169.254.30.98  Bcast:169.254.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7      Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:10.12.0.208  Bcast:10.12.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6442552 errors:0 dropped:0 overruns:0 frame:0
          TX packets:317983 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2028404133 (1.8 GiB)  TX bytes:174103017 (166.0 MiB)

eth7:2    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.112.250  Bcast:169.254.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7:3    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.244.103  Bcast:169.254.255.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Miscellaneous

It's NOT supported to disable or stop HAIP while the cluster is up and running unless otherwise advised by Oracle Support/Development.

1. The feature is disabled in 11.2.0.2/11.2.0.3 if Sun Cluster exists

2. The feature does not exist in Windows 11.2.0.2/11.2.0.3

3. The feature is disabled in 11.2.0.2/11.2.0.3 if Fujitsu PRIMECLUSTER exists



4. With the fix for bug 11077756 (fixed in 11.2.0.2 GI PSU6 and 11.2.0.3), HAIP will be disabled if it fails to start while running the root script (root.sh or rootupgrade.sh). For more details, refer to Section bug 11077756
  
5. The feature is disabled on Solaris 11 if IPMP is used for private network. Tracking bug 16982332

6. The feature is disabled on HP-UX and AIX if cluster_interconnect/"private network" is Infiniband

 

 

HAIP Log File

Resource haip is managed by ohasd.bin; its logs are located in $GRID_HOME/log/<nodename>/ohasd/ohasd.log and $GRID_HOME/log/<nodename>/agent/ohasd/orarootagent_root/orarootagent_root.log

1. Log Sample When Private Network Adapter Fails

In a multiple private network adapter environment, if one of the adapters fails:

  • ohasd.log
2010-09-24 09:10:00.891: [GIPCHGEN][1083025728]gipchaInterfaceFail: marking interface failing 0x2aaab0269a10 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x4d }
2010-09-24 09:10:00.902: [GIPCHGEN][1138145600]gipchaInterfaceDisable: disabling interface 0x2aaab0269a10 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x1cd }
2010-09-24 09:10:00.902: [GIPCHDEM][1138145600]gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x2aaab0269a10 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x1ed }
  • orarootagent_root.log
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} failed to receive ARP request
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} Assigned IP 169.254.112.250 no longer valid on inf eth6
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} VipActions::startIp {
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} Adding 169.254.112.250 on eth6:1
2010-09-24 09:09:57.719: [ USRTHRD][1129138496] {0:0:2} VipActions::startIp }
2010-09-24 09:09:57.719: [ USRTHRD][1129138496] {0:0:2} Reassigned IP:  169.254.112.250 on interface eth6
2010-09-24 09:09:58.013: [ USRTHRD][1082325312] {0:0:2} HAIP:  Updating member info HAIP1;10.11.0.128#0;10.11.0.128#1
2010-09-24 09:09:58.015: [ USRTHRD][1082325312] {0:0:2} HAIP:  Moving ip '169.254.112.250' from inf 'eth6' to inf 'eth7'
2010-09-24 09:09:58.015: [ USRTHRD][1082325312] {0:0:2} pausing thread
2010-09-24 09:09:58.015: [ USRTHRD][1082325312] {0:0:2} posting thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start {
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start }
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} HAIP:  Moving ip '169.254.244.103' from inf 'eth1' to inf 'eth7'
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} pausing thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} posting thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start {
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start }
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} HAIP:  Moving ip '169.254.178.237' from inf 'eth7' to inf 'eth1'
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} pausing thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} posting thread
2010-09-24 09:09:58.017: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start {
2010-09-24 09:09:58.017: [ USRTHRD][1116531008] {0:0:2} [NetHAWork] thread started
2010-09-24 09:09:58.017: [ USRTHRD][1116531008] {0:0:2}  Arp::sCreateSocket {
2010-09-24 09:09:58.017: [ USRTHRD][1093232960] {0:0:2} [NetHAWork] thread started
2010-09-24 09:09:58.017: [ USRTHRD][1093232960] {0:0:2}  Arp::sCreateSocket {
2010-09-24 09:09:58.017: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start }
2010-09-24 09:09:58.018: [ USRTHRD][1143847232] {0:0:2} [NetHAWork] thread started
2010-09-24 09:09:58.018: [ USRTHRD][1143847232] {0:0:2}  Arp::sCreateSocket {
2010-09-24 09:09:58.034: [ USRTHRD][1116531008] {0:0:2}  Arp::sCreateSocket }
2010-09-24 09:09:58.034: [ USRTHRD][1116531008] {0:0:2} Starting Probe for ip 169.254.112.250
2010-09-24 09:09:58.034: [ USRTHRD][1116531008] {0:0:2} Transitioning to Probe State
2010-09-24 09:09:58.034: [ USRTHRD][1093232960] {0:0:2}  Arp::sCreateSocket }
2010-09-24 09:09:58.035: [ USRTHRD][1093232960] {0:0:2} Starting Probe for ip 169.254.244.103
2010-09-24 09:09:58.035: [ USRTHRD][1093232960] {0:0:2} Transitioning to Probe State
2010-09-24 09:09:58.050: [ USRTHRD][1143847232] {0:0:2}  Arp::sCreateSocket }
2010-09-24 09:09:58.050: [ USRTHRD][1143847232] {0:0:2} Starting Probe for ip 169.254.178.237
2010-09-24 09:09:58.050: [ USRTHRD][1143847232] {0:0:2} Transitioning to Probe State
2010-09-24 09:09:58.231: [ USRTHRD][1093232960] {0:0:2}  Arp::sProbe {
2010-09-24 09:09:58.231: [ USRTHRD][1093232960] {0:0:2} Arp::sSend:  sending type 1
2010-09-24 09:09:58.231: [ USRTHRD][1093232960] {0:0:2}  Arp::sProbe }

2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2}  Arp::sAnnounce {
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Arp::sSend:  sending type 1
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2}  Arp::sAnnounce }
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Transitioning to Defend State
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} VipActions::startIp {
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Adding 169.254.112.250 on eth7:2
2010-09-24 09:10:04.880: [ USRTHRD][1116531008] {0:0:2} VipActions::startIp }
2010-09-24 09:10:04.880: [ USRTHRD][1116531008] {0:0:2} Assigned IP:  169.254.112.250 on interface eth7

2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2}  Arp::sAnnounce {
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} Arp::sSend:  sending type 1
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2}  Arp::sAnnounce }
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} Transitioning to Defend State
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} VipActions::startIp {
2010-09-24 09:10:05.151: [ USRTHRD][1143847232] {0:0:2} Adding 169.254.178.237 on eth1:3
2010-09-24 09:10:05.151: [ USRTHRD][1143847232] {0:0:2} VipActions::startIp }
2010-09-24 09:10:05.151: [ USRTHRD][1143847232] {0:0:2} Assigned IP:  169.254.178.237 on interface eth1
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2}  Arp::sAnnounce {
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} Arp::sSend:  sending type 1
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2}  Arp::sAnnounce }
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} Transitioning to Defend State
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} VipActions::startIp {
2010-09-24 09:10:05.471: [ USRTHRD][1093232960] {0:0:2} Adding 169.254.244.103 on eth7:3
2010-09-24 09:10:05.471: [ USRTHRD][1093232960] {0:0:2} VipActions::startIp }
2010-09-24 09:10:05.471: [ USRTHRD][1093232960] {0:0:2} Assigned IP:  169.254.244.103 on interface eth7
2010-09-24 09:10:06.047: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop {
2010-09-24 09:10:06.282: [ USRTHRD][1129138496] {0:0:2} [NetHAWork] thread stopping
2010-09-24 09:10:06.282: [ USRTHRD][1129138496] {0:0:2} Thread:[NetHAWork]isRunning is reset to false here
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop }
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp {
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp {
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} Stopping ip '169.254.112.250', inf 'eth6', mask '10.11.0.128'
2010-09-24 09:10:06.288: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp }
2010-09-24 09:10:06.288: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp }
2010-09-24 09:10:06.288: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop {
2010-09-24 09:10:06.298: [ USRTHRD][1131239744] {0:0:2} [NetHAWork] thread stopping
2010-09-24 09:10:06.298: [ USRTHRD][1131239744] {0:0:2} Thread:[NetHAWork]isRunning is reset to false here
2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop }
2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp {

2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp {
2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} Stopping ip '169.254.178.237', inf 'eth7', mask '10.12.0.128'
2010-09-24 09:10:06.299: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp }
2010-09-24 09:10:06.299: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp }
2010-09-24 09:10:06.299: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop {
2010-09-24 09:10:06.802: [ USRTHRD][1133340992] {0:0:2} [NetHAWork] thread stopping
2010-09-24 09:10:06.802: [ USRTHRD][1133340992] {0:0:2} Thread:[NetHAWork]isRunning is reset to false here
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop }
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp {
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp {
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} Stopping ip '169.254.244.103', inf 'eth1', mask '10.1.0.128'
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp }
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp }
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[  0 ]:  eth7 - 169.254.112.250
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[  1 ]:  eth1 - 169.254.178.237
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[  2 ]:  eth7 - 169.254.244.103
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[  3 ]:  eth1 - 169.254.30.98

Note: from above, even though only NIC eth6 failed, multiple virtual private IPs (HAIPs) can move among the surviving NICs
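The redistribution visible in the "USING HAIP" lines above can be pictured as a simple round-robin of the four HAIPs across the surviving NICs. This is only an illustrative sketch — the actual placement algorithm is internal to Clusterware, and the function and data below are hypothetical:

```python
def place_haips(haips, surviving_nics):
    """Round-robin HAIP addresses across the surviving interconnect NICs.
    Illustrative only: not Clusterware's real placement logic."""
    return {hip: surviving_nics[i % len(surviving_nics)]
            for i, hip in enumerate(haips)}

# The four HAIPs from the log sample above:
haips = ["169.254.112.250", "169.254.178.237",
         "169.254.244.103", "169.254.30.98"]

# With eth6 down, all four HAIPs land on the two surviving NICs:
for hip, nic in place_haips(haips, ["eth7", "eth1"]).items():
    print(f"USING HAIP: {nic} - {hip}")
```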
  • ocssd.log
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: [network]  failed send attempt endp 0xe1b9150 [0000000000000399] { gipcEndpoint : localAddr 'udp://10.11.0.188:60169', remoteAddr '', numPend 5, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x2, usrFlags 0x4000 }, req 0x2aaab00117f0 [00000000004b0cae] { gipcSendRequest : addr 'udp://10.11.0.189:41486', data 0x2aaab0050be8, len 80, olen 0, parentEndp 0xe1b9150, ret gipcretEndpointNotAvailable (40), objFlags 0x0, reqFlags 0x2 }
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos op  :  sgipcnValidateSocket
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos dep :  Invalid argument (22)
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos loc :  address not
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos info:  addr '10.11.0.188:60169', len 80, buf 0x2aaab0050be8, cookie 0x2aaab00117f0
2010-09-24 09:09:58.314: [GIPCXCPT][1089964352] gipcInternalSendSync: failed sync request, ret gipcretEndpointNotAvailable (40)
2010-09-24 09:09:58.314: [GIPCXCPT][1089964352] gipcSendSyncF [gipchaLowerInternalSend : gipchaLower.c : 755]: EXCEPTION[ ret gipcretEndpointNotAvailable (40) ]  failed to send on endp 0xe1b9150 [0000000000000399] { gipcEndpoint : localAddr 'udp://10.11.0.188:60169', remoteAddr '', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x2, usrFlags 0x4000 }, addr 0xe4e6d10 [00000000000007ed] { gipcAddress : name 'udp://10.11.0.189:41486', objFlags 0x0, addrFlags 0x1 }, buf 0x2aaab0050be8, len 80, flags 0x0
2010-09-24 09:09:58.314: [GIPCHGEN][1089964352] gipchaInterfaceFail: marking interface failing 0xe2bd5f0 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.0.189:41486', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x6 }
2010-09-24 09:09:58.314: [GIPCHALO][1089964352] gipchaLowerInternalSend: failed to initiate send on interface 0xe2bd5f0 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.0.189:41486', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x86 }, hctx 0xde81d10 [0000000000000010] { gipchaContext : host 'racnode1', name 'CSS_a2b2', luid '4f06f2aa-00000000', numNode 1, numInf 3, usrFlags 0x0, flags 0x7 }
2010-09-24 09:09:58.326: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0x2aaaac2098e0 { host '', haName 'CSS_a2b2', local (nil), ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 1, flags 0x14d }
2010-09-24 09:09:58.326: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0xe2bd5f0 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.0.189:41486', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x86 }
2010-09-24 09:09:58.327: [GIPCHALO][1089964352] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0xe2bd5f0 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.0.189:41486', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0xa6 }
2010-09-24 09:09:58.327: [GIPCHGEN][1089964352] gipchaInterfaceReset: resetting interface 0xe2bd5f0 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.0.189:41486', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0xa6 }
2010-09-24 09:09:58.338: [GIPCHDEM][1089964352] gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x2aaaac2098e0 { host '', haName 'CSS_a2b2', local (nil), ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x16d }
2010-09-24 09:09:58.338: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created remote interface for node 'racnode2', haName 'CSS_a2b2', inf 'udp://10.11.0.189:41486'
2010-09-24 09:09:58.338: [GIPCHGEN][1089964352] gipchaWorkerAttachInterface: Interface attached inf 0xe2bd5f0 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaaac2014f0, ip '10.11.0.189:41486', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x6 }
2010-09-24 09:10:00.454: [    CSSD][1108904256]clssnmSendingThread: sending status msg to all nodes

Note: from above, ocssd.bin will not fail as long as at least one private network adapter is working

L2. Log Sample When a Private Network Adapter Is Restored

In a multiple private network adapter environment, if one of the failed adapters is restored:

  • ohasd.log
2010-09-24 09:14:30.962: [GIPCHGEN][1083025728]gipchaNodeAddInterface: adding interface information for inf 0x2aaaac1a53d0 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x41 }
2010-09-24 09:14:30.972: [GIPCHTHR][1138145600]gipchaWorkerUpdateInterface: created local bootstrap interface for node 'eyrac1f', haName 'CLSFRAME_a2b2', inf 'mcast://230.0.1.0:42424/10.11.0.188'
2010-09-24 09:14:30.972: [GIPCHTHR][1138145600]gipchaWorkerUpdateInterface: created local interface for node 'eyrac1f', haName 'CLSFRAME_a2b2', inf '10.11.0.188:13235'
  • ocssd.log
2010-09-24 09:14:30.961: [GIPCHGEN][1091541312] gipchaNodeAddInterface: adding interface information for inf 0x2aaab005af00 { host '', haName 'CSS_a2b2', local (nil), ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x41 }
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created local bootstrap interface for node 'racnode1', haName 'CSS_a2b2', inf 'mcast://230.0.1.0:42424/10.11.0.188'
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created local interface for node 'racnode1', haName 'CSS_a2b2', inf '10.11.0.188:10884'
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaNodeAddInterface: adding interface information for inf 0x2aaab0035490 { host 'racnode2', haName 'CSS_a2b2', local (nil), ip '10.21.0.208', subnet '10.12.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x42 }
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaNodeAddInterface: adding interface information for inf 0x2aaab00355c0 { host 'racnode2', haName 'CSS_a2b2', local (nil), ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x42 }
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created remote interface for node 'racnode2', haName 'CSS_a2b2', inf 'mcast://230.0.1.0:42424/10.12.0.208'
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaWorkerAttachInterface: Interface attached inf 0x2aaab0035490 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.12.0.208', subnet '10.12.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created remote interface for node 'racnode2', haName 'CSS_a2b2', inf 'mcast://230.0.1.0:42424/10.11.0.188'
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaWorkerAttachInterface: Interface attached inf 0x2aaab00355c0 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:31.437: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0x2aaab00355c0 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:31.437: [GIPCHALO][1089964352] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x2aaab00355c0 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.11.0.188', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x66 }
2010-09-24 09:14:31.446: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0x2aaab0035490 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.12.0.208', subnet '10.12.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:31.446: [GIPCHALO][1089964352] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x2aaab0035490 { host 'racnode2', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.12.0.208', subnet '10.12.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x66 }

 ##############################################################################################################################

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.

PURPOSE

This document lists known HAIP issues in 11gR2/12c Grid Infrastructure. Refer to Note 1210883.1 for an explanation of the HAIP feature.

DETAILS

Bug 10332426 - HAIP fails to start due to network mismatch

Issue: HAIP fails to start while running rootupgrade.sh

Symptom:

  • Output of root script:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" 
encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
  • $GRID_HOME/log/<hostname>/gipcd/gipcd.log
2010-12-12 09:41:35.201: [ CLSINET][1088543040] Returning NETDATA: 0 interfaces
2010-12-12 09:41:40.201: [ CLSINET][1088543040] Returning NETDATA: 0 interfaces

Solution:

The cause is a mismatch of private network information between the OCR and the OS. The output of the following commands should be consistent with each other regarding network adapter name, subnet and netmask - see Note 1296579.1 for what to check.

oifcfg iflist -p -n
oifcfg getif
ifconfig
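The subnet comparison the note calls for can be sketched in Python with the standard `ipaddress` module. The interface names and addresses below are hypothetical placeholders for values collected from the three commands:

```python
import ipaddress

def subnet_of(ip: str, netmask: str) -> str:
    """Return the network, e.g. '10.11.0.128/25', for an IP/netmask pair."""
    return str(ipaddress.ip_network(f"{ip}/{netmask}", strict=False))

# Hypothetical values gathered on one node:
ocr_getif = {"eth1": "10.11.0.128"}                       # from: oifcfg getif
os_nics   = {"eth1": ("10.11.0.188", "255.255.255.128")}  # from: ifconfig

for nic, (ip, mask) in os_nics.items():
    # The OCR stores the subnet address; it must match what the OS reports.
    os_net = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
    if ocr_getif.get(nic) != str(os_net.network_address):
        print(f"MISMATCH on {nic}: OCR {ocr_getif.get(nic)} vs OS {os_net}")
```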

Bug 19270660 - AIX: category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: ARP device /dev/bpf4, interface en8 

Issue: HAIP fails to start on AIX

Symptom:

  • $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log

2014-07-21 16:38:59.240: [ USRTHRD][4372]{0:0:2} failed to create arp 
2014-07-21 16:38:59.240: [ USRTHRD][4115]{0:0:2} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: ARP device /dev/bpf4, interface en8

Solution/Workaround:

Bug 19270660 is fixed in 12.1.0.2; apply interim patch 19270660 if the issue is encountered.

 

Bug 16445624 - AIX: HAIP fails to start

Issue: HAIP fails to start if the root script (root.sh or rootupgrade.sh) is executed via sudo (rather than as the root user directly), or if the bpf device is not functioning properly


Symptom:

  • Output of root script:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
  • $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log
2010-12-04 17:19:54.893: [ USRTHRD][2084] {0:3:37} failed to create arp
2010-12-04 17:19:54.893: [ USRTHRD][2084] {0:3:37} (null) category: -2, operation: ioctl, loc: bpfopen:2,os, OS error: 14, other:

OR

2011-09-29 16:44:46.771: [ USRTHRD][3600] {0:3:14} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: 

OR

2011-09-29 16:44:46.771: [ USRTHRD][3600] {0:3:14} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 22, other: 


OR

 

2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} (null) category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0, ifr en2  

OR 


Various other OS error codes can be seen as well.

Solution/Workaround:

It is known that on AIX and Solaris, commands executed via sudo and similar tools may not have the full root environment, which can cause HAIP startup failure.

The solution is to obtain and apply patch 16445624 on AIX.

The workaround is to execute root script (root.sh or rootupgrade.sh) as real root user directly.

If root script already failed, try one or all of the following:

 - reboot the node

 - execute "/usr/sbin/tcpdump -D" as the root user; if the timestamp of the bpf device did not get updated, delete the device and re-run the same "tcpdump -D" command

Before re-running the root script, verify that the following devices exist and that their timestamps are updated:

ls -ltr /dev/bpf*
cr--------   1 root     system       42,  0 Oct 03 10:32 /dev/bpf0
..

 


 

Bug 13989181 - AIX: HAIP fails to start with: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0

Duplicate Bug 14358011

Issue: HAIP fails to start on AIX

Symptom:

  • $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log

2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} failed to create arp 
2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} (null) category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0, ifr en2 
... 
2012-04-21 17:12:41.086: [ora.cluster_interconnect.haip][3343] {0:0:2} [start] Start of HAIP aborted 
2012-04-21 17:12:41.086: [   AGENT][3343] {0:0:2} UserErrorException: Locale is 
2012-04-21 17:12:41.087: [ora.cluster_interconnect.haip][3343] {0:0:2} [start] clsnUtils::error Exception type=2 string=

CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error: Start action for HAIP aborted

Solution/Workaround:

Bug 13989181 is fixed in 11.2.0.4; apply interim patch 13989181 if the issue is encountered.

 

 

Note 1447517.1 - AIX: HAIP fails to start if bpf and other devices using same major/minor number

Issue: HAIP fails to start on AIX because other system devices use the same major/minor number as the bpf devices

orarootagent_root.log shows: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 22, other: dev /dev/bpf0, ifr en15

The solution is to ensure no other device uses the same major/minor number as the bpf devices; refer to Note 1447517.1 for more details.

 

Bug 10253028 - "oifcfg iflist -p -n" not showing HAIP on AIX as expected

Issue: "oifcfg iflist -p -n" not showing HAIP on AIX

Fixed in: N/A - this is expected behaviour on AIX

Symptom:

  • "oifcfg getif" output
en12  10.0.1.0  global  public
en13  10.1.1.0  global  cluster_interconnect
  • "ifconfig -a" output
en13: flags=5e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
       inet 10.1.1.143 netmask 0xffffff00 broadcast 10.1.1.255
       inet 169.254.228.154 netmask 0xffff0000 broadcast 169.254.255.255
        tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
..
Note: the HAIP (169.254.228.154) is present on en13
  • v$cluster_interconnects
SQL> select * from gv$cluster_interconnects;

  INST_ID NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- --- 
        1 en13            169.254.228.154  NO
        2 en13            169.254.55.162   NO
  • "oifcfg iflist -p -n" output
en12  10.0.1.0  PUBLIC  255.255.255.0
en13  10.1.1.0  PUBLIC  255.255.255.0

Note: usually we expect HAIP to be listed here as well; however, it is not listed on AIX.

  

Bug 13332363 - Wrong MTU for HAIP on Solaris

Issue: Wrong MTU size for HAIP on Solaris; refer to Note 1290585.1 for more details.

Fixed in: 11.2.0.2 GI PSU5, 11.2.0.3 GI PSU1, 11.2.0.4

 

Bug 10114953 - only one HAIP is created on HP-UX

Issue: Only one HAIP created on HP-UX

2013-05-29 17:21:31.280: [ USRTHRD][29499] {0:0:56578} Arp::sCreateSocket {
2013-05-29 17:21:31.280: [ USRTHRD][29499] {0:0:56578} failed to create arp
2013-05-29 17:21:31.281: [ USRTHRD][29499] {0:0:56578} (null) category: -2, operation: ssclsi_dlpi_request, loc: dlpireq:8,na, OS error: 4, other:


The bug is fixed in 11.2.0.4; on releases before 11.2.0.4, interim patch 10114953 is required.

The OS kernel parameter dlpi_max_ub_promisc must be set to a value greater than 1 for the patch to be effective.

To find out the value of dlpi_max_ub_promisc: kctune -v dlpi_max_ub_promisc

Refer to bug 15940367

 

Bug 10363902 - HAIP Infiniband support for Linux and Solaris

Issue: GIPC HA is disabled or HAIP fails to start if the cluster interconnect is InfiniBand or any other network hardware with a hardware (MAC) address longer than 6 bytes

Fixed in: 11.2.0.3 for Linux and Solaris

Symptom:

  • Output of root script:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" 
encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
  • $GRID_HOME/log/<hostname>/gipcd/gipcd.log
2010-12-07 13:23:08.560: [ USRTHRD][3858] {0:0:62} Arp::sCreateSocket {
2010-12-07 13:23:08.560: [ USRTHRD][3858] {0:0:62} failed to create arp
2010-12-07 13:23:08.561: [ USRTHRD][3858] {0:0:62} (null) category: -2, 
operation: ssclsi_aix_get_phys_addr, loc: aixgetpa:4,n, OS error: 2, other:


 

 

Bug 10357258 - Many HAIP started on Solaris IPMP - not affecting 11.2.0.3

Issue: many HAIPs are created after the active NIC fails in IPMP

Fixed in: 11.2.0.3 and 11.2.0.2 GI PSU3; interim patch 10357258 exists for 11.2.0.2 and patch 11865154 for 11.2.0.2.1. Affects Solaris only.

Symptom:

  • ifconfig output:
nxge3:2: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu  1500 index 5
          inet 169.254.20.88 netmask ffff0000 broadcast 169.254.255.255
nxge3:3: flags=21000842<BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500  index 5
          inet 169.254.20.88 netmask ffff0000 broadcast 169.254.255.255
..

Note: the same HAIP shows up multiple times.

 

Bug 10397652/ 12767231 - HAIP not failing over when private network fails - not affecting 11.2.0.3

Issue: HAIP does not fail over even when the private network experiences a problem (e.g. a disabled switch port) because the OS does not provide reliable link information

Fixed in: Bug 12767231 is fixed in 11.2.0.2 GI PSU4, 11.2.0.3

The workaround on AIX is to set the "MONITOR" flag for all private network adapters:

ifconfig en1 monitor
ifconfig en1
en1: flags=5e080863,2c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,
GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN,MONITOR
        inet 192.168.10.83 netmask 0xfffffc00 broadcast 192.168.11.255
        inet 169.254.74.136 netmask 0xffff8000 broadcast 169.254.127.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0

Bug 11077756 - allow root script to continue upon HAIP failure

Issue: a startup failure of HAIP fails the root script; the fix allows the root script to continue so the HAIP issue can be worked on later.

Fixed in: 11.2.0.2 GI PSU6, 11.2.0.3 and above

Note: the consequence is that HAIP will be disabled. Once the cause is identified and the solution is implemented, HAIP needs to be re-enabled during an outage window. To enable, as root on ALL nodes:

# $GRID_HOME/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=1" -init
# $GRID_HOME/bin/crsctl stop crs
# $GRID_HOME/bin/crsctl start crs

 

Bug 12546712 - not affecting 11.2.0.3

Issue: ASM crashes as HAIP does not fail over when two or more private networks fail; refer to Note 1323995.1 for more details.

 

HAIP fails to start if default gateway is configured for VLAN for private network on network switch

Issue: HAIP fails to start if default gateway is configured for VLAN for private network on network switch

orarootagent_root.log shows: PROBE: conflict detected src { 169.254.12.247, <gateway MAC on switch> }, target { 0.0.0.0, <private NIC MAC> }

The solution is to remove the default gateway setting on the network switch for the private network (VLAN); refer to Note 1366211.1 for more details.

 

Bug 12425730 - HAIP does not start, 11.2.0.3 not affected

Issue: HAIP fails to start; gipcd.log shows rank 0 or "-1" for the private network

Fixed in: 11.2.0.2 GI PSU6, 11.2.0.3 and onward; refer to Note 1374360.1 for details.

 


 

ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481

HAIP not running can affect instance startup. Refer to Note 1383737.1 for details.

 

11gR2 GI HAIP Resource Not Created in Solaris 11 if IPMP is Used for Private Network

HAIP will not be enabled on Solaris 11 if IPMP is configured for the private network. This is by design. Refer to Note 1512141.1.

 

BUG 20900984 - HAIP CONFLICTION DETECTED BETWEEN TWO RAC CLUSTERS IN THE SAME VLAN

Issue: the same HAIP address is found on two different clusters in the same VLAN


The solution is to apply the bug fix; the corresponding bug number on AIX is bug 21124045.

 

note 2059802.1 - AIX HAIP: failed to create arp - operation: open, loc: bpfopen:1,os, OS error: 2

Issue: HAIP doesn't start on AIX

 

note 2059461.1 - AIX: HAIP Fails to Start: operation: open, loc: bpfopen:1,os, OS error: 2

Issue: HAIP doesn't start on AIX

 

note 1963248.1 - AIX HAIP: operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0

Issue: HAIP doesn't start on AIX

note 2066903.1 - operation: routedel, loc: system, OS error: 76, other: failed to execute route del, ret 256

Issue: the aforementioned message is printed repeatedly.

 

HAIP failover improvement

Bug 20579351 - HAIP failover operation causing session spike with gc waits

Fixed in 12.1.0.2 Jul2016PSU, 12.2

BUG 23728054 - IMPROVE HAIP FAILOVER TIME IN MULTI-INTERCONNECT ENV

Fixed in 12.2

  

BUG 23472436 - HAIP USING ONLY 2 OF 4 NETWORK INTERFACES ASSIGNED TO THE INTERCONNECT

Fixed in 12.2 

 

BUG 25073182 - ONE GIPC RANK FOR EACH PRIVATE NETWORK INTERFACE

Fixed in 12.2.0.2, refer to the following for details:

note 2236102.1 - Extended cluster instance crashes if one interconnect network fails

 

Bug 24509481 - HAIP TO HAVE SMALLER SUBNET INSTEAD OF WHOLE LINK LOCAL 169.254.*.*

Fixed in 12.2.0.2. With the fix, HAIP uses a small subnet of the link-local range instead of the whole 169.254.*.*
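HAIP addresses are drawn from the RFC 3927 IPv4 link-local block (169.254.0.0/16), as the addresses in the logs above show. A quick membership check with Python's standard `ipaddress` module:

```python
import ipaddress

# RFC 3927 IPv4 link-local block, from which HAIP addresses are allocated.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")

def in_link_local(addr: str) -> bool:
    """True if addr falls inside the IPv4 link-local range."""
    return ipaddress.ip_address(addr) in LINK_LOCAL

print(in_link_local("169.254.112.250"))  # a HAIP from the logs above -> True
print(in_link_local("10.11.0.188"))      # a static private IP -> False
```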


From the "ITPUB blog", link: http://blog.itpub.net/13750068/viewspace-2146518/. If reproducing, please credit the source; otherwise legal responsibility may be pursued.
