11gR2: Fixing a CRS startup failure caused by an incorrect private IP change

Posted by 梓沐 on 2016-06-01

--oifcfg usage help
[root@rac1 grid]# oifcfg -help

Name:

    oifcfg - Oracle Interface Configuration Tool.

Usage:  oifcfg iflist [-p [-n]]

    oifcfg setif {-node <nodename> | -global} {<if_name>/<subnet>:<if_type>}...

    oifcfg getif [-node <nodename> | -global] [ -if <if_name>[/<subnet>] [-type <if_type>] ]

    oifcfg delif {{-node <nodename> | -global} [<if_name>[/<subnet>]] [-force] | -force}

    oifcfg [-help]

    <nodename> - name of the host, as known to a communications network

    <if_name>  - name by which the interface is configured in the system

    <subnet>   - subnet address of the interface

    <if_type>  - type of the interface { cluster_interconnect | public }

Pay particular attention to one point: setif takes the subnet, not a host IP. If the subnet is written incorrectly, CRS will fail to start.

--hosts configuration on my machine

[root@rac1 grid]# cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

#node1

192.168.8.221   rac1 rac1.oracle.com

192.168.8.242   rac1-vip 

172.168.0.18    rac1-priv

#node2

192.168.8.223   rac2 rac2.oracle.com

192.168.8.244   rac2-vip

172.168.0.19    rac2-priv

#scan-ip

192.168.8.245   rac-cluster rac-cluster-scan


--Check the IP configuration

[root@rac1 grid]# oifcfg getif

eth0  192.168.8.0  global  public

eth1  172.168.0.0  global  cluster_interconnect

[root@rac1 grid]# ifconfig eth1

eth1      Link encap:Ethernet  HWaddr 08:00:27:72:5A:8F 

          inet addr:172.168.0.18  Bcast:172.168.255.255  Mask:255.255.0.0

          

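Note that the mask here is 255.255.0.0, not 255.255.255.0. So when the private IP in the hosts file is changed from 172.168.0.18 to 172.168.8.18, take care with the subnet passed to setif. The following shows why.

Because ifconfig eth1 reports Mask 255.255.0.0: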

[root@rac1 grid]# ipcalc -bnm 172.168.0.18 255.255.0.0

NETMASK=255.255.0.0

BROADCAST=172.168.255.255

NETWORK=172.168.0.0

[root@rac1 grid]# ipcalc -bnm 172.168.8.18 255.255.0.0

NETMASK=255.255.0.0

BROADCAST=172.168.255.255

NETWORK=172.168.0.0

As shown above, with mask 255.255.0.0 the subnet stays the same (NETWORK=172.168.0.0) even though the IP has changed.

If the mask were 255.255.255.0 instead:

[root@rac1 grid]# ipcalc -bnm 172.168.0.18 255.255.255.0

NETMASK=255.255.255.0

BROADCAST=172.168.0.255

NETWORK=172.168.0.0

[root@rac1 grid]# ipcalc -bnm 172.168.8.18 255.255.255.0

NETMASK=255.255.255.0

BROADCAST=172.168.8.255

NETWORK=172.168.8.0

With this mask the subnet does change, so this step needs particular care.
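The ipcalc comparison above can be turned into a pre-check before running setif. The following is a minimal sketch, assuming a POSIX shell; `network_of` is a hypothetical helper, not part of oifcfg or ipcalc, and the addresses are the ones used in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of oifcfg): AND each IPv4 octet with the
# matching netmask octet to get the subnet that setif expects.
network_of() {
    old_ifs=$IFS
    IFS=.
    # Unquoted on purpose: splits "a.b.c.d" and "m.n.o.p" into $1..$8.
    set -- $1 $2
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Values from the article: the new private IP and the mask reported by ifconfig.
proposed_subnet=172.168.8.0
actual_subnet=$(network_of 172.168.8.18 255.255.0.0)

if [ "$proposed_subnet" = "$actual_subnet" ]; then
    echo "OK: $proposed_subnet matches the interface subnet"
else
    echo "MISMATCH: setif should use $actual_subnet, not $proposed_subnet"
fi
```

With the values above the script prints the MISMATCH line, because 172.168.8.18 under mask 255.255.0.0 still belongs to 172.168.0.0.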

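Unfortunately, this is exactly where I slipped: when changing the private IP, I wrote the subnet as 172.168.8.0 by mistake. The failure is reproduced below.

--Delete the private configuration

[grid@rac1 ~]$ oifcfg delif -global eth1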

PRIF-31: Failed to delete the specified network interface because it is the last private interface

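From 11.2.0.2 onward, the last private interface cannot be deleted directly. To remove it, first add a new one, restart CRS, and then delete the old private entry.

--Check the interface configuration

[grid@rac1 ~]$ oifcfg getif -global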

eth0  192.168.8.0  global  public

eth1  172.168.0.0  global  cluster_interconnect

--Add the new private configuration (note: this is the mistake; the correct subnet is 172.168.0.0)

[grid@rac1 ~]$ oifcfg setif -global eth1/172.168.8.0:cluster_interconnect

--Check the configuration after the change

[grid@rac1 ~]$ oifcfg getif -global

eth0  192.168.8.0  global  public

eth1  172.168.0.0  global  cluster_interconnect

eth1  172.168.8.0  global  cluster_interconnect

--Delete the old configuration

[grid@rac1 ~]$ oifcfg delif -global eth1/172.168.0.0

--Verify again:

[grid@rac1 ~]$ oifcfg getif -global

eth0  192.168.8.0  global  public

eth1  172.168.8.0  global  cluster_interconnect

--As root, stop clusterware on all nodes

[root@rac1 ~]# crsctl stop crs -f

[root@rac2 ~]# crsctl stop crs -f

--On restarting CRS, the log shows the following errors:

[/u01/app/11.2.0/grid/bin/orarootagent.bin(3196)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/rac1/agent/ohasd/orarootagent_root/orarootagent_root.log.

2016-06-01 06:47:30.200:

[ohasd(3017)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/rac1/ohasd/ohasd.log.

2016-06-01 06:48:30.227:

....

[/u01/app/11.2.0/grid/bin/orarootagent.bin(17207)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:44} in /u01/app/11.2.0/grid/log/rac1/agent/crsd/orarootagent_root/orarootagent_root.log.

--crsctl check crs also reports errors

[root@rac1 rac1]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager

The recovery steps are as follows:

--Stop CRS on both nodes

[root@rac1 rac1]# crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'

CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'

CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'

CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded

CRS-2673: Attempting to stop 'ora.crf' on 'rac1'

CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded

CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded

CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'

CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded

CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'

CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded

CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed

CRS-4133: Oracle High Availability Services has been stopped.

--Start CRS in exclusive mode (HAIP fails to start at this point; this can be ignored)

[root@rac1 rac1]# crsctl start crs -excl -nocrs

CRS-4123: Oracle High Availability Services has been started.

CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'

CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'

CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'

CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'

CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded

CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.cssd' on 'rac1'

CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'

CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded

CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded

CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'

CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'

CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'

CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded

CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:

Start action for HAIP aborted. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/rac1/agent/ohasd/orarootagent_root/orarootagent_root.

CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'rac1' failed

CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'

CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded

CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'

CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded

CRS-4000: Command Start failed, or completed with errors.

--Back up the CRS configuration

[root@rac1 ~]# mkdir /home/oracle/gpnp

[root@rac1 ~]# export GPNPDIR=/home/oracle/gpnp

[root@rac1 ~]# gpnptool get -o=$GPNPDIR/profile.xml

Resulting profile written to "/home/oracle/gpnp/profile.xml".

Success.

[root@rac1 ~]# cat /home/oracle/gpnp/profile.xml

+DlT/meivHNPx1yzXh/Lh5gpB6w=jUZ0IQfYt5dvupziJf8nPo/KtWu2aPl3nl0ute/RAPYYkIOw3ZvDqHuREggvNsgDKGv28mLeDVzmt0N1aU0QprVrg3Rxlt1R3AFxREukvqawQ4BLwbiEo2yoBcBNhP1AQV7ZVdgQqX9FYntVcKNZeP7pMnMpJcmG2Cp87iop05U=

--View the CRS configuration

[root@rac1 ~]# gpnptool get

Warning: some command line parameters were defaulted. Resulting command line:

         /u01/app/11.2.0/grid/bin/gpnptool.bin get -o-

+DlT/meivHNPx1yzXh/Lh5gpB6w=jUZ0IQfYt5dvupziJf8nPo/KtWu2aPl3nl0ute/RAPYYkIOw3ZvDqHuREggvNsgDKGv28mLeDVzmt0N1aU0QprVrg3Rxlt1R3AFxREukvqawQ4BLwbiEo2yoBcBNhP1AQV7ZVdgQqX9FYntVcKNZeP7pMnMpJcmG2Cp87iop05U=

Success.

--Edit the backed-up CRS configuration

--Make a working copy of the profile

[root@rac1 ~]# cp $GPNPDIR/profile.xml $GPNPDIR/p.xml

--Get the current profile sequence number

[root@rac1 ~]# gpnptool getpval -p=$GPNPDIR/p.xml -prf_sq -o-

29

--Get the identifiers of the public and private networks (these differ from the actual NIC names and can be found in the profile)

[root@rac1 ~]# gpnptool getpval -p=$GPNPDIR/p.xml -net -o-

net1 net2

--Update the sequence number in the profile (old value plus 1, i.e. 29 + 1 = 30) and set the private network to the correct subnet (172.168.0.0):

[root@rac1 ~]# gpnptool edit -p=$GPNPDIR/p.xml -o=$GPNPDIR/p.xml -ovr -prf_sq=30 -net2:net_ip=172.168.0.0

Resulting profile written to "/home/oracle/gpnp/p.xml".

Success.
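The sequence bump in the step above can be scripted so the new value is never typed by hand. This is a minimal sketch, assuming a POSIX shell; `next_prf_sq` is a hypothetical helper, the value 29 is the one printed by getpval in this article, and the gpnptool command is only printed for review, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print the current sequence number plus one,
# rejecting anything that is not a plain non-negative integer.
next_prf_sq() {
    case $1 in
        ''|*[!0-9]*) echo "not a sequence number: $1" >&2; return 1 ;;
    esac
    echo $(( $1 + 1 ))
}

current_sq=29   # value printed by "gpnptool getpval ... -prf_sq -o-" above
new_sq=$(next_prf_sq "$current_sq") || exit 1

# Print the edit command for review instead of running it.
echo "gpnptool edit -p=\$GPNPDIR/p.xml -o=\$GPNPDIR/p.xml -ovr -prf_sq=$new_sq -net2:net_ip=172.168.0.0"
```

On a live system `current_sq` would come from the getpval command itself rather than a literal.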

--Re-sign the profile with the private key

[root@rac1 ~]# gpnptool sign -p=$GPNPDIR/p.xml -o=$GPNPDIR/p.xml -ovr -w=cw-fs:peer

Resulting profile written to "/home/oracle/gpnp/p.xml".

Success.

--Write the profile back to CRS

[root@rac1 ~]# gpnptool put -p=$GPNPDIR/p.xml

Success.

--Verify the configuration held by CRS

[root@rac1 ~]# gpnptool find -c=rac-cluster    # rac-cluster is the SCAN name from the hosts file, which is also the cluster name here

Found 1 instances of service 'gpnp'.

    mdns:service:gpnp._tcp.local.://rac1:44022/agent=gpnpd,cname=rac-cluster,host=rac1,pid=23802/gpnpd h:rac1 c:rac-cluster

[root@rac1 ~]# gpnptool rget -h=rac1    # rac1 is the hostname of node 1

Warning: some command line parameters were defaulted. Resulting command line:

         /u01/app/11.2.0/grid/bin/gpnptool.bin rget -h=rac1 -o-

Found 1 gpnp service instance(s) to rget profile from.

RGET from tcp://rac1:44022 (mdns:service:gpnp._tcp.local.://rac1:44022/agent=gpnpd,cname=rac-cluster,host=rac1,pid=23802/gpnpd h:rac1 c:rac-cluster):

0dwyjB220ul3DWEmv5pAz1GzH4w=fuboD8S5uj1LH7A/Wdg321x6QGfQ4wkzSj/yXk9SnTVYuGwi2E9+XXaVk/pos8pVHqiChsuiWwGhjXZxnIuJrMrRF+t06PGqGlBxf0JQ557OmT1WZOvgsb1QPbRjb2tSqaazDIfG+y0ps0nNZMO5E4d2zITqmcBRUkV5UBnrvj8=

Success.

--Start the crsd process

[root@rac1 ~]# crsctl start res ora.crsd -init

CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'

CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'

CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded

CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.asm' on 'rac1'

CRS-2676: Start of 'ora.asm' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.crsd' on 'rac1'

CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded

--Check the private network configuration (a warning is shown)

[root@rac1 ~]# oifcfg getif

eth0  192.168.8.0  global  public

eth1  172.168.0.0  global  cluster_interconnect

Only in OCR: eth1  172.168.8.0  global  cluster_interconnect

PRIF-30: Network information in OCR and GPnP profile differs
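The warning above can also be detected in a script before CRS is restarted. This is a minimal sketch, assuming a POSIX shell with grep; `oifcfg_consistent` is a hypothetical helper that only looks for the two marker strings seen in this article's output, and the sample text is copied from above.

```shell
#!/bin/sh
# Hypothetical helper: reads "oifcfg getif" output on stdin and returns 0
# only when no OCR/GPnP mismatch is reported.
oifcfg_consistent() {
    ! grep -Eq 'PRIF-30|^Only in OCR'
}

# Sample output copied from the article (mismatch case).
sample='eth0  192.168.8.0  global  public
eth1  172.168.0.0  global  cluster_interconnect
Only in OCR: eth1  172.168.8.0  global  cluster_interconnect
PRIF-30: Network information in OCR and GPnP profile differs'

if printf '%s\n' "$sample" | oifcfg_consistent; then
    echo "OCR and GPnP profile agree"
else
    echo "OCR and GPnP profile differ: fix with oifcfg setif before restarting"
fi
```

On a real node the input would be piped in directly, for example `oifcfg getif | oifcfg_consistent`.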

--Correct the private network configuration

[root@rac1 ~]# oifcfg setif -global eth1/172.168.0.0:cluster_interconnect

--Check again: the warning is gone

[root@rac1 ~]# oifcfg getif

eth0  192.168.8.0  global  public

eth1  172.168.0.0  global  cluster_interconnect

--Restart CRS

[root@rac1 ~]# crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'

CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'

CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded

CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'

CRS-2673: Attempting to stop 'ora.asm' on 'rac1'

CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'

CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded

CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded

CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'

CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded

CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded

CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'

CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded

CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'

CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded

CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'

CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded

CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed

CRS-4133: Oracle High Availability Services has been stopped.

--Start CRS on both nodes

[root@rac1 ~]# crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root@rac2 ~]# crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

--Verify CRS

[root@rac1 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

[root@rac2 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4534: Cannot communicate with Event Manager

[root@rac1 ~]# crs_stat -t

Name           Type           Target    State     Host       

------------------------------------------------------------

ora.DATADG.dg  ora....up.type ONLINE    ONLINE    rac1       

ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac1       

ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac1       

ora....EMDG.dg ora....up.type ONLINE    ONLINE    rac1       

ora.asm        ora.asm.type   ONLINE    ONLINE    rac1       

ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac1       

ora.gsd        ora.gsd.type   OFFLINE   OFFLINE              

ora....network ora....rk.type ONLINE    ONLINE    rac1       

ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac1       

ora.ons        ora.ons.type   ONLINE    ONLINE    rac1       

ora.orcl.db    ora....se.type ONLINE    ONLINE    rac1       

ora....taf.svc ora....ce.type ONLINE    ONLINE    rac1       

ora....SM1.asm application    ONLINE    ONLINE    rac1       

ora....C1.lsnr application    ONLINE    ONLINE    rac1       

ora.rac1.gsd   application    OFFLINE   OFFLINE              

ora.rac1.ons   application    ONLINE    ONLINE    rac1       

ora.rac1.vip   ora....t1.type ONLINE    ONLINE    rac1       

ora....SM2.asm application    ONLINE    ONLINE    rac2       

ora....C2.lsnr application    ONLINE    ONLINE    rac2       

ora.rac2.gsd   application    OFFLINE   OFFLINE              

ora.rac2.ons   application    ONLINE    ONLINE    rac2       

ora.rac2.vip   ora....t1.type ONLINE    ONLINE    rac2       

ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac1
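The `crsctl check crs` output above still contains "Cannot communicate" lines, most likely because it was captured while the stack was still starting; the later crs_stat listing shows the resources online. A minimal sketch of waiting for a clean check, assuming a POSIX shell with grep: `crs_all_online`, `wait_for_crs` and the `CRS_POLL_SECONDS` variable are hypothetical names, and the sample text is the rac1 output from above.

```shell
#!/bin/sh
# Hypothetical helper: reads "crsctl check crs" output on stdin and returns 0
# only when no daemon is reported as unreachable.
crs_all_online() {
    ! grep -Eq 'Cannot communicate|Communications failure'
}

# Hypothetical polling loop: runs the given command until its output passes
# crs_all_online or the attempts run out.
wait_for_crs() {
    attempts=$1; shift
    while [ "$attempts" -gt 0 ]; do
        if "$@" | crs_all_online; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "${CRS_POLL_SECONDS:-10}"
    done
    return 1
}

# Sample output copied from the article (rac1, shortly after start).
sample='CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online'

if printf '%s\n' "$sample" | crs_all_online; then
    echo "stack fully online"
else
    echo "stack not fully online yet"
fi
```

On a real node the loop would be called as `wait_for_crs 30 crsctl check crs`, which polls up to 30 times at the default interval.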

From the ITPUB blog, link: http://blog.itpub.net/29812844/viewspace-2112502/. If you repost this article, please credit the source; otherwise legal action may be pursued.
