Understanding the Effect of the CLUSTER_INTERCONNECTS Parameter on Instances Through Experiments

Posted by 尛樣兒 on 2015-10-20

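    In an Oracle RAC environment, Cache Fusion traffic between RAC instances normally travels over the Clusterware private interconnect (heartbeat) network. From 11.2.0.2 onward, HAIP is commonly used. HAIP raises bandwidth (up to 4 interconnect networks) and also gives the interconnect fault tolerance. For example, if a RAC node has 4 interconnect networks, 3 of them can fail at the same time without bringing down Oracle RAC or Clusterware.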

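    However, when several databases are deployed in one RAC environment, the Cache Fusion activity of the different database instances can interfere with each other: some databases need more bandwidth, others less. To keep the interconnect traffic of multiple databases in the same RAC environment from affecting each other, Oracle provides the cluster_interconnects parameter at the database level. It overrides the default interconnect and makes the database instance use the specified networks for Cache Fusion. This parameter, however, provides no fault tolerance, as the following experiments show: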

Oracle RAC environment: 12.1.0.2.0 Standard Cluster on Oracle Linux 5.9 x64.

1. Network configuration.

Node 1:
[root@rhel1 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:50:56:A8:16:15                       <<<< eth0: management network.
          inet addr:172.168.4.20  Bcast:172.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13701 errors:0 dropped:522 overruns:0 frame:0
          TX packets:3852 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1122408 (1.0 MiB)  TX bytes:468021 (457.0 KiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:A8:25:6B                       <<<< eth1: public network.
          inet addr:10.168.4.20  Bcast:10.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23074 errors:0 dropped:520 overruns:0 frame:0
          TX packets:7779 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:15974971 (15.2 MiB)  TX bytes:2980403 (2.8 MiB)

eth1:1    Link encap:Ethernet  HWaddr 00:50:56:A8:25:6B  
          inet addr:10.168.4.22  Bcast:10.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:2    Link encap:Ethernet  HWaddr 00:50:56:A8:25:6B  
          inet addr:10.168.4.24  Bcast:10.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 00:50:56:A8:21:0A                       <<<< eth2: interconnect network, one of the Clusterware HAIP interfaces.
          inet addr:10.0.1.20  Bcast:10.0.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11322 errors:0 dropped:500 overruns:0 frame:0
          TX packets:10279 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:6765147 (6.4 MiB)  TX bytes:5384321 (5.1 MiB)

eth2:1    Link encap:Ethernet  HWaddr 00:50:56:A8:21:0A   
          inet addr:169.254.10.239  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth3      Link encap:Ethernet  HWaddr 00:50:56:A8:F7:F7                       <<<< eth3: interconnect network, one of the Clusterware HAIP interfaces.
          inet addr:10.0.2.20  Bcast:10.0.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:347096 errors:0 dropped:500 overruns:0 frame:0
          TX packets:306170 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:210885992 (201.1 MiB)  TX bytes:173504069 (165.4 MiB)

eth3:1    Link encap:Ethernet  HWaddr 00:50:56:A8:F7:F7  
          inet addr:169.254.245.28  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth4      Link encap:Ethernet  HWaddr 00:50:56:A8:DC:CC                      <<<< eth4~eth9: interconnect networks, but not part of Clusterware HAIP.
          inet addr:10.0.3.20  Bcast:10.0.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7247 errors:0 dropped:478 overruns:0 frame:0
          TX packets:6048 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3525191 (3.3 MiB)  TX bytes:2754275 (2.6 MiB)

eth5      Link encap:Ethernet  HWaddr 00:50:56:A8:A1:86  
          inet addr:10.0.4.20  Bcast:10.0.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:40028 errors:0 dropped:480 overruns:0 frame:0
          TX packets:23700 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:15139172 (14.4 MiB)  TX bytes:9318750 (8.8 MiB)

eth6      Link encap:Ethernet  HWaddr 00:50:56:A8:F7:53  
          inet addr:10.0.5.20  Bcast:10.0.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13324 errors:0 dropped:470 overruns:0 frame:0
          TX packets:128 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1075873 (1.0 MiB)  TX bytes:16151 (15.7 KiB)

eth7      Link encap:Ethernet  HWaddr 00:50:56:A8:E4:78  
          inet addr:10.0.6.20  Bcast:10.0.6.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13504 errors:0 dropped:457 overruns:0 frame:0
          TX packets:120 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1158553 (1.1 MiB)  TX bytes:14643 (14.2 KiB)

eth8      Link encap:Ethernet  HWaddr 00:50:56:A8:C0:B0  
          inet addr:10.0.7.20  Bcast:10.0.7.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13272 errors:0 dropped:442 overruns:0 frame:0
          TX packets:126 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1072609 (1.0 MiB)  TX bytes:15999 (15.6 KiB)

eth9      Link encap:Ethernet  HWaddr 00:50:56:A8:5E:F6  
          inet addr:10.0.8.20  Bcast:10.0.8.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14316 errors:0 dropped:431 overruns:0 frame:0
          TX packets:127 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1169023 (1.1 MiB)  TX bytes:15293 (14.9 KiB)

Node 2:
[root@rhel2 ~]# ifconfig -a                                                       <<<< Network configuration matches node 1.
eth0      Link encap:Ethernet  HWaddr 00:50:56:A8:C2:66  
          inet addr:172.168.4.21  Bcast:172.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19156 errors:0 dropped:530 overruns:0 frame:0
          TX packets:278 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4628107 (4.4 MiB)  TX bytes:37558 (36.6 KiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:A8:18:1A  
          inet addr:10.168.4.21  Bcast:10.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21732 errors:0 dropped:531 overruns:0 frame:0
          TX packets:7918 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4110335 (3.9 MiB)  TX bytes:14783715 (14.0 MiB)

eth1:2    Link encap:Ethernet  HWaddr 00:50:56:A8:18:1A  
          inet addr:10.168.4.23  Bcast:10.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 00:50:56:A8:1B:DD  
          inet addr:10.0.1.21  Bcast:10.0.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:410244 errors:0 dropped:524 overruns:0 frame:0
          TX packets:433865 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:206461212 (196.8 MiB)  TX bytes:283858870 (270.7 MiB)

eth2:1    Link encap:Ethernet  HWaddr 00:50:56:A8:1B:DD  
          inet addr:169.254.89.158  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth3      Link encap:Ethernet  HWaddr 00:50:56:A8:2B:68  
          inet addr:10.0.2.21  Bcast:10.0.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:323060 errors:0 dropped:512 overruns:0 frame:0
          TX packets:337911 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:176652414 (168.4 MiB)  TX bytes:212347379 (202.5 MiB)

eth3:1    Link encap:Ethernet  HWaddr 00:50:56:A8:2B:68  
          inet addr:169.254.151.103  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth4      Link encap:Ethernet  HWaddr 00:50:56:A8:81:DB  
          inet addr:10.0.3.21  Bcast:10.0.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:37308 errors:0 dropped:507 overruns:0 frame:0
          TX packets:27565 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:10836885 (10.3 MiB)  TX bytes:14973305 (14.2 MiB)

eth5      Link encap:Ethernet  HWaddr 00:50:56:A8:43:EA  
          inet addr:10.0.4.21  Bcast:10.0.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:38506 errors:0 dropped:496 overruns:0 frame:0
          TX packets:27985 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:10940661 (10.4 MiB)  TX bytes:14859794 (14.1 MiB)

eth6      Link encap:Ethernet  HWaddr 00:50:56:A8:84:76  
          inet addr:10.0.5.21  Bcast:10.0.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13653 errors:0 dropped:484 overruns:0 frame:0
          TX packets:114 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1102617 (1.0 MiB)  TX bytes:14161 (13.8 KiB)

eth7      Link encap:Ethernet  HWaddr 00:50:56:A8:B6:4F  
          inet addr:10.0.6.21  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13633 errors:0 dropped:474 overruns:0 frame:0
          TX packets:115 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1101251 (1.0 MiB)  TX bytes:14343 (14.0 KiB)

eth8      Link encap:Ethernet  HWaddr 00:50:56:A8:97:62  
          inet addr:10.0.7.21  Bcast:10.0.7.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13633 errors:0 dropped:459 overruns:0 frame:0
          TX packets:115 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1102065 (1.0 MiB)  TX bytes:14343 (14.0 KiB)

eth9      Link encap:Ethernet  HWaddr 00:50:56:A8:28:10  
          inet addr:10.0.8.21  Bcast:10.0.8.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13764 errors:0 dropped:446 overruns:0 frame:0
          TX packets:115 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1159479 (1.1 MiB)  TX bytes:14687 (14.3 KiB)


2. Current cluster interconnect configuration.

[grid@rhel1 ~]$ oifcfg getif
eth1  10.168.4.0  global  public
eth2  10.0.1.0  global  cluster_interconnect
eth3  10.0.2.0  global  cluster_interconnect


3. Before changing the cluster_interconnects parameter.

SQL> show parameter cluster_interconnect

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_interconnects                string

cluster_interconnects is empty by default.

SQL> select * from v$cluster_interconnects;

NAME            IP_ADDRESS       IS_ SOURCE                              CON_ID
--------------- ---------------- --- ------------------------------- ----------
eth2:1          169.254.10.239   NO                                           0
eth3:1          169.254.245.28   NO                                           0

V$CLUSTER_INTERCONNECTS displays one or more interconnects that are being used for cluster communication.

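    Querying v$cluster_interconnects shows that the RAC environment is currently using HAIP. Note that the addresses displayed here are the HAIP addresses, not the addresses configured at the OS level. This differs from what is displayed later.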
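
The distinction between HAIP addresses and OS-configured addresses can be checked mechanically. The following is a minimal sketch, assuming HAIP addresses always fall in the link-local range 169.254.0.0/16, as they do in the output above. The function name is_haip_addr is a hypothetical helper, not an Oracle tool.

```shell
# is_haip_addr is a hypothetical helper name, not part of any Oracle tool.
# Returns success if the IPv4 address is in the link-local range
# 169.254.0.0/16 (assumed here to identify HAIP addresses), failure otherwise.
is_haip_addr() {
    case "$1" in
        169.254.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Classify the addresses that appear in this article.
for ip in 169.254.10.239 169.254.245.28 10.0.1.20; do
    if is_haip_addr "$ip"; then
        echo "$ip: HAIP (link-local) address"
    else
        echo "$ip: OS-configured address"
    fi
done
```

The first two addresses are the ones reported by v$cluster_interconnects above; the third is the static address configured on eth2.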


4. Changing the cluster_interconnects parameter.

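    Change the cluster_interconnects parameter. To maximize interconnect bandwidth, we specify 9 interconnect addresses for each node. Note that the IPs are separated by colons and the whole list is enclosed in double quotes. Setting cluster_interconnects overrides the Clusterware interconnect networks reported by oifcfg getif, which are otherwise the default networks for RAC interconnect traffic:
SQL> alter system set cluster_interconnects="10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20:10.0.5.20:10.0.6.20:10.0.7.20:10.0.8.20:10.0.9.20" scope=spfile sid='orcl1';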

System altered.

SQL> alter system set cluster_interconnects="10.0.1.21:10.0.2.21:10.0.3.21:10.0.4.21:10.0.5.21:10.0.6.21:10.0.7.21:10.0.8.21:10.0.9.21" scope=spfile sid='orcl2';

System altered.

Restarting the database instances produces the following errors:
[oracle@rhel1 ~]$ srvctl stop database -d orcl
[oracle@rhel1 ~]$ srvctl start database -d orcl
PRCR-1079 : Failed to start resource ora.orcl.db
CRS-5017: The resource action "ora.orcl.db start" encountered the following error: 
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:ip_list failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini
ORA-27303: additional information: Too many IPs specified to SKGXP.  Max supported is 4, given 9.
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/rhel2/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.orcl.db' on 'rhel2' failed
CRS-5017: The resource action "ora.orcl.db start" encountered the following error: 
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:ip_list failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini
ORA-27303: additional information: Too many IPs specified to SKGXP.  Max supported is 4, given 9.
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/rhel1/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.orcl.db' on 'rhel1' failed
CRS-2632: There are no more servers to try to place resource 'ora.orcl.db' on that would satisfy its placement policy

So even with cluster_interconnects, no more than 4 network addresses can be specified, the same limit as HAIP.

We therefore drop the last 5 IPs and keep the first 4 for the interconnect:
Node 1: 10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20
Node 2: 10.0.1.21:10.0.2.21:10.0.3.21:10.0.4.21
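
The limit can be checked before the parameter is written to the spfile. The following is a minimal sketch that counts the colon-separated addresses and trims the list to the first 4, the maximum reported by ORA-27303 in this test. The function names count_ips and first_n_ips are hypothetical helpers, and the limit of 4 is taken from the error message above rather than from Oracle documentation.

```shell
# count_ips and first_n_ips are hypothetical helper names.

# Print the number of colon-separated addresses in a cluster_interconnects value.
count_ips() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -c .
}

# Print only the first N addresses of a colon-separated list.
first_n_ips() {
    printf '%s\n' "$1" | cut -d: -f1-"$2"
}

# The 9-address value that was rejected for node 1.
list="10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20:10.0.5.20:10.0.6.20:10.0.7.20:10.0.8.20:10.0.9.20"
max=4   # limit reported by ORA-27303 in this test

if [ "$(count_ips "$list")" -gt "$max" ]; then
    echo "too many addresses: $(count_ips "$list"), keeping the first $max"
    list=$(first_n_ips "$list" "$max")
fi
echo "$list"
```

Run against the node 1 value, the final line prints the same 4-address list shown above for node 1.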


5. Testing the fault tolerance of the cluster_interconnects parameter.

Now let's test the fault tolerance of cluster_interconnects:

SQL> set linesize 200
SQL> select * from v$cluster_interconnects;

NAME            IP_ADDRESS       IS_ SOURCE                              CON_ID
--------------- ---------------- --- ------------------------------- ----------
eth2            10.0.1.20        NO  cluster_interconnects parameter          0
eth3            10.0.2.20        NO  cluster_interconnects parameter          0
eth4            10.0.3.20        NO  cluster_interconnects parameter          0
eth5            10.0.4.20        NO  cluster_interconnects parameter          0

After the instances are restarted, RAC uses the 4 specified IPs for the interconnect.

The instances on both RAC nodes are running normally:
[oracle@rhel1 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node rhel1
Instance orcl2 is running on node rhel2

Manually bring down one of the interconnect NICs on node 1:
[root@rhel1 ~]# ifdown eth4                     <<<<  This NIC is not one of the HAIP interfaces.

[oracle@rhel1 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node rhel1
Instance orcl2 is running on node rhel2
srvctl still reports the instances as running.

Log in locally with sqlplus:
[oracle@rhel1 ~]$ sql

SQL*Plus: Release 12.1.0.2.0 Production on Tue Oct 20 18:11:35 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected.
SQL>    
This state is clearly not right.

Checking the alert log shows the following errors:
2015-10-20 18:10:22.996000 +08:00
SKGXP: ospid 32107: network interface query failed for IP address 10.0.3.20.
SKGXP: [error 32607] 
2015-10-20 18:10:31.600000 +08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_qm03_453.trc  (incident=29265) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27501: IPC error creating a port
ORA-27300: OS system dependent operation:bind failed with status: 99
ORA-27301: OS failure message: Cannot assign requested address
ORA-27302: failure occurred at: sskgxpsock
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_29265/orcl1_qm03_453_i29265.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_cjq0_561.trc  (incident=29297) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27544: Failed to map memory region for export
ORA-27300: OS system dependent operation:bind failed with status: 99
ORA-27301: OS failure message: Cannot assign requested address
ORA-27302: failure occurred at: sskgxpsock
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_29297/orcl1_cjq0_561_i29297.trc
2015-10-20 18:10:34.724000 +08:00
Dumping diagnostic data in directory=[cdmp_20151020181034], requested by (instance=1, osid=561 (CJQ0)), summary=[incident=29297].
2015-10-20 18:10:35.819000 +08:00
Dumping diagnostic data in directory=[cdmp_20151020181035], requested by (instance=1, osid=453 (QM03)), summary=[incident=29265].

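Judging from the log, the instance has not gone down; it is hung. The alert log of the database instance on the other node shows no errors, so the other RAC instance is unaffected.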
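
The failing address can be pulled out of the alert log automatically. The following is a minimal sketch that matches only the SKGXP message format seen in the log excerpt above; other Oracle versions may word the message differently. The function name failed_interconnect_ips is a hypothetical helper.

```shell
# failed_interconnect_ips is a hypothetical helper name.
# Reads alert-log text on stdin and prints each IP address for which SKGXP
# reported "network interface query failed", one address per line.
# The pattern is based solely on the message format shown in this article.
failed_interconnect_ips() {
    sed -n 's/^SKGXP: .*network interface query failed for IP address \([0-9.]*[0-9]\)\.*[[:space:]]*$/\1/p' | sort -u
}

# Example input: lines copied from the alert log excerpt above.
printf '%s\n' \
    '2015-10-20 18:10:22.996000 +08:00' \
    'SKGXP: ospid 32107: network interface query failed for IP address 10.0.3.20.' \
    'SKGXP: [error 32607] ' |
    failed_interconnect_ips
```

In practice the instance's alert log would be passed on stdin instead of the sample lines.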

Manually restore the NIC:
[root@rhel1 ~]# ifup eth4

The instance returns to normal immediately. It never went down during the whole process.

Would bringing down a NIC that HAIP uses affect the instance as well? To find out, bring down eth2:
[root@rhel1 ~]# ifdown eth2

In this test the instance hangs as well, just as when a non-HAIP NIC goes down, and it recovers as soon as the NIC is restored.

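    Summary: in these tests, regardless of whether the specified NICs are HAIP interfaces or not, setting cluster_interconnects leaves the interconnect without fault tolerance. A failure of any one specified NIC hangs the instance until that NIC is restored. The cluster_interconnects parameter also supports at most 4 IP addresses.
In a RAC environment hosting multiple databases, setting each instance's cluster_interconnects initialization parameter overrides the default Clusterware interconnect and isolates the interconnect traffic of the different databases from one another. However, a failure of any specified NIC hangs the instance, so high availability is not guaranteed.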
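
Because a single missing address is enough to hang the instance, it may be worth checking that every listed address is still assigned on the node. The following is a minimal sketch, assuming the failure shows up as the address no longer being assigned to any interface, as it does after ifdown in this test. A link failure that leaves the address assigned would not be detected. The function name missing_interconnect_ips is a hypothetical helper, and the sample input lines are illustrative, written in the format of `ip -o -4 addr show`.

```shell
# missing_interconnect_ips is a hypothetical helper name.
# $1:    colon-separated cluster_interconnects value.
# stdin: output of `ip -o -4 addr show` (each line contains "inet A.B.C.D/len").
# Prints every listed IP that is not assigned to any local interface.
missing_interconnect_ips() {
    # Collect the addresses currently assigned, one per line.
    assigned=$(sed -n 's|.* inet \([0-9.]*\)/.*|\1|p')
    printf '%s\n' "$1" | tr ':' '\n' | while read -r ip; do
        [ -n "$ip" ] || continue
        # -x: whole-line match, -F: fixed string (dots are not wildcards).
        printf '%s\n' "$assigned" | grep -qxF "$ip" || echo "$ip"
    done
}

# Illustrative input: node 1 with eth4 (10.0.3.20) brought down.
printf '%s\n' \
    '4: eth2    inet 10.0.1.20/24 brd 10.0.1.255 scope global eth2' \
    '5: eth3    inet 10.0.2.20/24 brd 10.0.2.255 scope global eth3' \
    '7: eth5    inet 10.0.4.20/24 brd 10.0.4.255 scope global eth5' |
    missing_interconnect_ips "10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20"
```

On a live node the input would come from `ip -o -4 addr show` rather than the sample lines, and any output would indicate an address that the instance can no longer bind to.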


Related article:
   "Oracle CLUSTER_INTERCONNECTS Parameter Explained": http://blog.itpub.net/23135684/viewspace-714734/

--end--

From the ITPUB Blog, link: http://blog.itpub.net/23135684/viewspace-1815252/. If you repost this article, please credit the source; otherwise legal action may be taken.
