[Repost] How to Recover from a Failed root.sh on 11.2 Grid Infrastructure

Posted by ewelamb on 2013-09-05

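From 10g Clusterware to 11g Release 2 Grid Infrastructure, Oracle has packed a great deal into the RAC framework. Even if you install 11.2.0.1 GI carefully by following the Step by Step Installation guide, running the root.sh script in practice still tends to throw one error or another. Here is one example: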

[root@rh3 grid]# ./root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= maclean
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]: 

The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: 

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2011-03-28 20:43:13: Parsing the host name
2011-03-28 20:43:13: Checking for super user privileges
2011-03-28 20:43:13: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting

ADVM/ACFS is not supported on oraclelinux-release-5-5.0.2

On one node, root.sh reported that ADVM/ACFS is not supported on OEL 5.5, even though Red Hat 5 and OEL 5 are among the few platforms that currently do support ACFS (The ACFS install would be on a supported Linux release – either Oracle Enterprise Linux 5 or Red Hat 5).

A search on Metalink shows that this is an issue on the Linux platform.

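The Not Supported error above did not appear when root.sh was run on the other node (also Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)), so finding the difference between the two nodes will usually be enough to resolve the problem: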

Release-related rpm packages on the node that did not fail:

[maclean@rh6 tmp]$ cat /etc/issue
Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)
Kernel \r on an \m

[maclean@rh6 tmp]$ rpm -qa|grep release
enterprise-release-notes-5Server-17
enterprise-release-5-0.0.22

Release-related rpm packages on the failing node:

[root@rh3 tmp]# rpm -qa | grep release
oraclelinux-release-5-5.0.2
enterprise-release-5-0.0.22
enterprise-release-notes-5Server-17
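
The comparison of the two package lists can also be done mechanically. The following is a minimal sketch, assuming the output of `rpm -qa | grep release` from each node has been saved to a file; the function name `extra_packages` and the file names in the usage comment are hypothetical and not part of the original post.

```shell
# Hypothetical helper: print packages that appear in the second list
# but not in the first. Each argument is a file holding the output of
# `rpm -qa | grep release` from one node.
extra_packages() {
    good_list=$1   # list from the node where root.sh succeeded
    bad_list=$2    # list from the node where root.sh failed
    # -F: fixed strings, -x: whole-line match, -v: invert the match,
    # -f: read the patterns from a file
    grep -Fxv -f "$good_list" "$bad_list"
}

# Usage (file names are assumptions):
#   rpm -qa | grep release > rh3.txt      # on the failing node
#   rpm -qa | grep release > rh6.txt      # on the healthy node
#   extra_packages rh6.txt rh3.txt
```

Matching whole lines keeps a package such as enterprise-release-5-0.0.22 from being treated as a match for a longer name that merely contains it.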

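Compared with the healthy node, the failing node has one extra rpm package installed, oraclelinux-release-5-5.0.2, so let's see whether uninstalling it resolves the problem. As a side note, the problem can also be resolved by changing the contents of the /tmp/.linux_release file to enterprise-release-5-0.0.17, without uninstalling the oraclelinux-release-5* package as we do here: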

[root@rh3 install]# rpm -e oraclelinux-release-5-5.0.2
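
For the alternative workaround, editing /tmp/.linux_release instead of removing the package, a rough sketch might look like the following. The file path and release string come from the text; the function name `set_linux_release` and the backup step are assumptions added for illustration. When the file is created or read is not covered by the original post, so treat this as untested against a real GI install.

```shell
# Hypothetical helper: overwrite the release marker file, keeping a backup.
# $1 = marker file (default: /tmp/.linux_release, the path named in the text)
# $2 = release string (default: enterprise-release-5-0.0.17, also from the text)
set_linux_release() {
    marker=${1:-/tmp/.linux_release}
    release=${2:-enterprise-release-5-0.0.17}
    # Keep a copy of the previous contents so the change can be undone
    # (the backup step is an assumption, not part of the original post).
    if [ -f "$marker" ]; then
        cp -p "$marker" "$marker.bak" || return 1
    fi
    printf '%s\n' "$release" > "$marker"
}

# Usage, as root on the failing node:
#   set_linux_release
```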

[root@rh3 grid]# ./root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= maclean
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: 

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2011-03-28 20:57:21: Parsing the host name
2011-03-28 20:57:21: Checking for super user privileges
2011-03-28 20:57:21: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
CRS is already configured on this node for crshome=0
Cannot configure two CRS instances on the same cluster.
Please deconfigure before proceeding with the configuration of new home.

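Running root.sh again on the failed node produces a message saying the node must first be deconfigured before it can be configured again. The official documentation describes how to deconfigure Grid Infrastructure in 11g Release 2 (Deconfiguring Oracle Clusterware Without Removing Binaries):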

/* This is still Grid Infrastructure administration, so the following steps must be run as root */

[root@rh3 grid]# pwd
/u01/app/11.2.0/grid

/* Currently in the GRID_HOME directory */

[root@rh3 grid]# cd crs/install

/* Run the rootcrs.pl script with the -deconfig option */

[root@rh3 install]# ./rootcrs.pl -deconfig
2011-03-28 21:03:05: Parsing the host name
2011-03-28 21:03:05: Checking for super user privileges
2011-03-28 21:03:05: User has super user privileges
Using configuration parameter file: ./crsconfig_params
VIP exists.:rh3
VIP exists.: //192.168.1.105/255.255.255.0/eth0
VIP exists.:rh6
VIP exists.: //192.168.1.103/255.255.255.0/eth0
GSD exists.
ONS daemon exists. Local port 6100, remote port 6200
eONS daemon exists. Multicast port 20796, multicast IP address 234.227.83.81, listening port 2016
Please confirm that you intend to remove the VIPs rh3 (y/[n]) y
ACFS-9200: Supported
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.crsd' on 'rh3'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rh3'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rh3' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rh3' has completed
CRS-2677: Stop of 'ora.crsd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rh3'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rh3'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rh3'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rh3'
CRS-2673: Attempting to stop 'ora.evmd' on 'rh3'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rh3' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rh3'
CRS-2677: Stop of 'ora.cssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rh3'
CRS-2673: Attempting to stop 'ora.gipcd' on 'rh3'
CRS-2677: Stop of 'ora.gipcd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'rh3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rh3' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node

/* If the deconfig operation above does not succeed, run rootcrs.pl again with the -force option added */

[root@rh3 install]# ./rootcrs.pl -deconfig -force
2011-03-28 21:41:00: Parsing the host name
2011-03-28 21:41:00: Checking for super user privileges
2011-03-28 21:41:00: User has super user privileges
Using configuration parameter file: ./crsconfig_params
VIP exists.:rh3
VIP exists.: //192.168.1.105/255.255.255.0/eth0
VIP exists.:rh6
VIP exists.: //192.168.1.103/255.255.255.0/eth0
GSD exists.
ONS daemon exists. Local port 6100, remote port 6200
eONS daemon exists. Multicast port 20796, multicast IP address 234.227.83.81, listening port 2016
ACFS-9200: Supported
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.crsd' on 'rh3'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rh3'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rh3' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rh3' has completed
CRS-2677: Stop of 'ora.crsd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rh3'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rh3'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rh3'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rh3'
CRS-2673: Attempting to stop 'ora.evmd' on 'rh3'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rh3' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rh3'
CRS-2677: Stop of 'ora.cssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rh3'
CRS-2673: Attempting to stop 'ora.gipcd' on 'rh3'
CRS-2677: Stop of 'ora.gipcd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'rh3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rh3' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node

/* Fortunately this trick always works; otherwise we would have to completely uninstall and reinstall GI every time */
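
The deconfigure-then-force sequence shown above could be wrapped in a small script. This is only a sketch: the function name `deconfig_gi` is hypothetical, and it assumes success can be recognised by the "Successfully deconfigured Oracle clusterware stack on this node" line that appears in both transcripts, rather than by the script's exit status, which the post does not show. Note that the plain -deconfig run in the transcript asks for confirmation before removing the VIPs, so it is not fully unattended.

```shell
# Hypothetical helper: deconfigure GI on the local node, retrying with -force.
# $1 = path to rootcrs.pl (default: ./rootcrs.pl, as in the transcript)
deconfig_gi() {
    rootcrs=${1:-./rootcrs.pl}
    # Success marker taken from the transcripts above.
    ok_msg='Successfully deconfigured Oracle clusterware stack on this node'
    log=$(mktemp) || return 1

    # First attempt: plain -deconfig, output shown and saved to the log.
    "$rootcrs" -deconfig 2>&1 | tee "$log"
    if grep -Fq "$ok_msg" "$log"; then
        rm -f "$log"
        return 0
    fi

    # Fallback: same script with -force added.
    echo "plain -deconfig did not report success; retrying with -force" >&2
    "$rootcrs" -deconfig -force 2>&1 | tee "$log"
    if grep -Fq "$ok_msg" "$log"; then
        rm -f "$log"
        return 0
    fi

    rm -f "$log"
    return 1
}

# Usage, as root from $GRID_HOME/crs/install:
#   deconfig_gi ./rootcrs.pl && echo "node is ready for root.sh"
```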

With CRS successfully deconfigured, we can try running the ill-fated root.sh once more:

[root@rh3 grid]# pwd
/u01/app/11.2.0/grid

[root@rh3 grid]# ./root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= maclean
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: 

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2011-03-28 21:07:29: Parsing the host name
2011-03-28 21:07:29: Checking for super user privileges
2011-03-28 21:07:29: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
FATAL: Module oracleoks not found.
FATAL: Module oracleadvm not found.
FATAL: Module oracleacfs not found.
acfsroot: ACFS-9121: Failed to detect /dev/asm/.asm_ctl_spec.

acfsroot: ACFS-9310: ADVM/ACFS installation failed.

acfsroot: ACFS-9311: not all components were detected after the installation.

CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rh6, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'rh3'
CRS-2676: Start of 'ora.mdnsd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rh3'
CRS-2676: Start of 'ora.gipcd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rh3'
CRS-2676: Start of 'ora.gpnpd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rh3'
CRS-2676: Start of 'ora.cssdmonitor' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rh3'
CRS-2672: Attempting to start 'ora.diskmon' on 'rh3'
CRS-2676: Start of 'ora.diskmon' on 'rh3' succeeded
CRS-2676: Start of 'ora.cssd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rh3'
CRS-2676: Start of 'ora.ctssd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rh3'
CRS-2676: Start of 'ora.crsd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'rh3'
CRS-2676: Start of 'ora.evmd' on 'rh3' succeeded
/u01/app/11.2.0/grid/bin/srvctl start vip -i rh3 ... failed
Preparing packages for installation...
cvuqdisk-1.0.7-1
Configure Oracle Grid Infrastructure for a Cluster ... failed
Updating inventory properties for clusterware
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 5023 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /s01/oraInventory
'UpdateNodeList' was successful.

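Although this got past the "ADVM/ACFS is not supported" problem, a new one appeared: "FATAL: Module oracleoks/oracleadvm/oracleacfs not found", meaning the ACFS/ADVM kernel modules could not be found and loaded on Linux. A Metalink search shows that these are two confirmed bugs in GI 11.2.0.2, whereas what I actually installed is GI 11.2.0.1. Fortunately my current environment uses NFS storage, so problems with storage options such as ADVM/ACFS can be ignored; I plan to test again on 11.2.0.2.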
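
Since each root.sh run here failed with a different message, it may help to scan a saved root.sh log for the signatures seen in this article. The sketch below is an illustration only: the function name `classify_rootsh_log` and the labels it prints are hypothetical, and the patterns are limited to the messages quoted in the transcripts above.

```shell
# Hypothetical helper: read a captured root.sh log on stdin and print one
# label per failure signature found (signatures taken from this article).
classify_rootsh_log() {
    log=$(cat)
    matched=0
    case $log in
        *"ADVM/ACFS is not supported"*) echo "acfs-not-supported"; matched=1 ;;
    esac
    case $log in
        *"CRS is already configured on this node"*) echo "needs-deconfig"; matched=1 ;;
    esac
    case $log in
        *"FATAL: Module oracle"*) echo "acfs-modules-missing"; matched=1 ;;
    esac
    case $log in
        *"... failed"*) echo "step-reported-failed"; matched=1 ;;
    esac
    # Nothing from the list above was seen in the log.
    [ "$matched" -eq 1 ] || echo "no-known-signature"
}

# Usage (log file name is an assumption):
#   ./root.sh 2>&1 | tee root_sh.log
#   classify_rootsh_log < root_sh.log
```

A "needs-deconfig" result corresponds to the case where rootcrs.pl -deconfig has to be run before root.sh is retried.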

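It has to be said that installing GI 11.2.0.1 has far too many problems, so many that Oracle Support has had to write quite a few troubleshooting documents for it. I have not tried GI 11.2.0.2 yet; I hope it is not as bad as the previous release!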

Source: "ITPUB Blog", link: http://blog.itpub.net/7191998/viewspace-772226/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
