Managing the ONS Service in Oracle 10gR2 RAC Clusterware

Posted by 尛樣兒 on 2012-06-12
       
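        The following walks through a real case to discuss how the ONS service is managed.
        In a 10gR2 RAC environment, the voting disk data was lost and there was no backup, so I decided to clear the Clusterware configuration and rerun the root.sh script to bring Clusterware back. Following the article http://space.itpub.net/23135684/viewspace-721081, I successfully ran /u01/crs/bin/racgons add_config rhel:6251 rhel2:6251 and then ran the vipca script to create the nodeapps for both nodes. (Note: vipca creates the ons service automatically, so creating it beforehand with racgons was unnecessary.) During creation and startup, however, the ons service on the second node failed to start, so I checked the ons log on that node: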
The ons log is located at:
/u01/app/oracle/crs/log/rhel2/racg/ora.rhel2.ons.log
The general form is: $ORA_CRS_HOME/log/<nodename>/racg/ora.<nodename>.ons.log
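As a small illustration of that naming pattern, here is a minimal Python sketch that builds the log path for a given node. The function name `ons_log_path` is hypothetical, and the pattern itself is inferred from the single concrete path shown above, so treat it as an assumption rather than a documented rule.

```python
import posixpath


def ons_log_path(crs_home: str, node: str) -> str:
    """Build the per-node ons log path under a Clusterware home.

    Assumption: the layout is <crs_home>/log/<node>/racg/ora.<node>.ons.log,
    inferred from the concrete path quoted in this post.
    """
    return posixpath.join(crs_home, "log", node, "racg", f"ora.{node}.ons.log")


if __name__ == "__main__":
    print(ons_log_path("/u01/app/oracle/crs", "rhel2"))
```

With the Clusterware home and node name from this case, the result is the same path as the one quoted above.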
Tracing the log shows the following:
2012-06-12 17:21:05.030: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: GETHOSTBYNAME(rhel): 2
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
Adding remote host rhel:6251
1: {node = rhel2, port = 6251}

2012-06-12 17:21:05.032: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: GETHOSTBYNAME(rhel): 2
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.
GETHOSTBYNAME(rhel): 2
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.

2012-06-12 17:21:05.032: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
Adding remote host rhel:6251
1: {node = rhel2, port = 6251}
onsctl: ons failed to start

2012-06-12 17:21:05.133: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/oracle/crs

2012-06-12 17:21:05.133: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: cmd = /u01/app/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/app/oracle/crs/bin/onsctl start

2012-06-12 17:21:05.133: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: rc = 1, time = 1.650s

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: GETHOSTBYNAME(rhel): 2
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
Adding remote host rhel:6251
1: {node = rhel2, port = 6251}

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: ons is not running ...

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/oracle/crs

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: cmd = /u01/app/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/app/oracle/crs/bin/onsctl ping

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: rc = 1, time = 0.310s

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: end for resource = ora.rhel2.ons, action = start, status = 1, time = 2.060s

2012-06-12 17:21:07.228: [    RACG][740729408] [13260][740729408][ora.rhel2.ons]: onsctl: shutting down ons daemon ...
GETHOSTBYNAME(rhel): 2
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}

2012-06-12 17:21:07.228: [    RACG][740729408] [13260][740729408][ora.rhel2.ons]: Adding remote host rhel:6251
1: {node = rhel2, port = 6251}
onsctl: shutdown of ons failed!

2012-06-12 17:21:07.228: [    RACG][740729408] [13260][740729408][ora.rhel2.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/oracle/crs

2012-06-12 17:21:07.228: [    RACG][740729408] [13260][740729408][ora.rhel2.ons]: clsrcexecut: cmd = /u01/app/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/app/oracle/crs/bin/onsctl stop

2012-06-12 17:21:07.228: [    RACG][740729408] [13260][740729408][ora.rhel2.ons]: clsrcexecut: rc = 3, time = 0.470s

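        The log above suggests that the problem is a port mismatch between the two nodes: the ONS entries I created manually use port 6251, whereas the ones created by vipca probably use a different port, so the two sides do not agree.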
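Because the same few messages repeat many times in the log, a short script can summarize them. The sketch below is only an illustration: the function name `summarize_ons_log` is hypothetical, and the two patterns it looks for are taken verbatim from the log excerpt in this post, not from any documented log format.

```python
import re

# Both patterns are copied from the log excerpt above (an assumption about
# the format, not a documented specification).
NODE_RE = re.compile(r"\{node = (?P<node>[^,]+), port = (?P<port>\d+)\}")
MISMATCH = "Remote port for local node in local config does not match that from OCR."


def summarize_ons_log(text: str):
    """Return ({node: port}, number_of_mismatch_messages) for a log excerpt."""
    nodes = {}
    mismatches = 0
    for line in text.splitlines():
        match = NODE_RE.search(line)
        if match:
            nodes[match.group("node")] = int(match.group("port"))
        if MISMATCH in line:
            mismatches += 1
    return nodes, mismatches


if __name__ == "__main__":
    sample = (
        "GETHOSTBYNAME(rhel): 2\n"
        "Remote port for local node in local config does not match that from OCR.\n"
        "Number of configuration nodes retrieved: 2\n"
        "0: {node = rhel, port = 6251}\n"
        "Adding remote host rhel:6251\n"
        "1: {node = rhel2, port = 6251}\n"
        "onsctl: ons failed to start\n"
    )
    print(summarize_ons_log(sample))
```

The summary shows which node and port pairs were read from the OCR and how often the mismatch message appears, which is the information the diagnosis above relies on.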

1. The onsctl tool
The help output of onsctl is as follows:
[root@rhel1 bin]# ./onsctl
usage: ./onsctl start|stop|ping|reconfig|debug

start                            - Start opmn only.
stop                             - Stop ons daemon
ping                             - Test to see if ons daemon is running
debug                            - Display debug information for the ons daemon
reconfig                         - Reload the ons configuration
help                             - Print a short syntax description (this).
detailed                         - Print a verbose syntax description.

[root@rhel1 bin]# ./onsctl detailed
usage: ./onsctl start|stop|ping|reconfig|debug

start
    Start ons daemon

stop
    Shutdown ons daemon

reconfig
    Trigger ons to re-read it's configuration files.

ping
    Test to see if ons daemon is alive

debug
    Display debug information about the ons daemon

help
    Print a short syntax description.

detailed
    Print a verbose syntax description (this message).


Run onsctl ping on the first node:
[root@rhel1 bin]# ./onsctl ping
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
GETHOSTBYNAME(rhel): 2
Adding remote host rhel:6251
GETHOSTBYNAME(rhel): 2
1: {node = rhel2, port = 6251}
Adding remote host rhel2:6251
ons is running ...
        ons is already running on the first node.

Run onsctl ping on the second node:
[root@rhel2 bin]# ./onsctl ping
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
GETHOSTBYNAME(rhel): 2
Adding remote host rhel:6251
GETHOSTBYNAME(rhel): 2
1: {node = rhel2, port = 6251}
Remote port for local node in local config does not match that from OCR.
ons is not running ...

ons on the second node did not start because its port does not match that of the first node.
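If this check has to be repeated on several nodes, the ping output can be interpreted by a script. The following is a minimal sketch, assuming the output of `onsctl ping` has already been captured as text (for example with `subprocess.run`). The helper name `ons_is_running` is hypothetical, and the status strings are the ones that appear in the transcripts above.

```python
def ons_is_running(ping_output: str) -> bool:
    """Return True if the captured `onsctl ping` output reports a running daemon.

    Assumption: the status line reads exactly "ons is running ..." as in the
    transcripts in this post; "ons is not running ..." therefore does not match.
    """
    lines = [line.strip() for line in ping_output.splitlines()]
    return "ons is running ..." in lines


if __name__ == "__main__":
    node1 = "Adding remote host rhel2:6251\nons is running ...\n"
    node2 = (
        "Remote port for local node in local config does not match that from OCR.\n"
        "ons is not running ...\n"
    )
    print(ons_is_running(node1), ons_is_running(node2))
```

Matching the whole line rather than a substring keeps the check explicit about which message it expects.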

2. Checking the processes on each node
Check the ons processes on the first node:
[root@rhel1 bin]# ps -ef | grep ons
root      2412     1  0 16:47 ?        00:00:00 sendmail: accepting connections
oracle   13513     1  0 17:17 ?        00:00:00 /u01/app/oracle/crs/opmn/bin/ons -d
oracle   13515 13513  0 17:17 ?        00:00:00 /u01/app/oracle/crs/opmn/bin/ons -d
root     15646  3340  0 17:22 pts/0    00:00:00 grep ons

Check the ons processes on the second node:
[root@rhel2 bin]# ps -ef | grep ons
root      2400     1  0 16:45 ?        00:00:00 sendmail: accepting connections
root     13847  3546  0 17:22 pts/0    00:00:00 grep ons
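Note that `grep ons` also matches the sendmail line ("accepting connections") and the grep command itself. As a sketch of a stricter filter, the Python function below keeps only processes whose executable path ends in `/opmn/bin/ons`. The function name `find_ons_daemons` is hypothetical, and the eight-column `ps -ef` layout is assumed from the output above.

```python
def find_ons_daemons(ps_output: str, daemon_suffix: str = "/opmn/bin/ons"):
    """Return (pid, command) pairs for ons daemons found in `ps -ef` output.

    Assumption: columns are UID PID PPID C STIME TTY TIME CMD, as shown above.
    """
    found = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # the 8th field is the full command line
        if len(fields) < 8:
            continue
        command = fields[7]
        executable = command.split()[0]
        if executable.endswith(daemon_suffix):
            found.append((int(fields[1]), command))
    return found


if __name__ == "__main__":
    node1 = (
        "root      2412     1  0 16:47 ?        00:00:00 sendmail: accepting connections\n"
        "oracle   13513     1  0 17:17 ?        00:00:00 /u01/app/oracle/crs/opmn/bin/ons -d\n"
        "oracle   13515 13513  0 17:17 ?        00:00:00 /u01/app/oracle/crs/opmn/bin/ons -d\n"
        "root     15646  3340  0 17:22 pts/0    00:00:00 grep ons\n"
    )
    print(find_ons_daemons(node1))
```

Run against the first node's output, this keeps only the two ons processes; against the second node's output it returns an empty list.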


3. The ONS configuration file
Running find located the following ons configuration files:
./opmn/conf/ons.config.tmp
./opmn/conf/ons.config
./opmn/conf/ons.config.backup.10205

[root@rhel1 crs]# cat ./opmn/conf/ons.config
localport=6113
remoteport=6200
loglevel=3
useocr=on

Clearly, the ports in the configuration file do not match the 6251 that was configured with racgons.
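The comparison can also be made explicit in code. The sketch below parses the key=value lines of ons.config shown above and compares remoteport with the port registered through racgons. The function name `parse_ons_config` is hypothetical, and the parser assumes nothing about the file beyond the simple key=value lines visible in this post.

```python
def parse_ons_config(text: str) -> dict:
    """Parse simple key=value lines, as seen in the ons.config listing above."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        key, separator, value = line.partition("=")
        if separator:  # ignore lines without an '=' sign
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    config_text = "localport=6113\nremoteport=6200\nloglevel=3\nuseocr=on\n"
    ocr_port = 6251  # the port passed to racgons add_config in this case

    config = parse_ons_config(config_text)
    if int(config["remoteport"]) != ocr_port:
        print(f"mismatch: file has {config['remoteport']}, OCR has {ocr_port}")
```

Here remoteport is 6200 while the OCR entry is 6251, which is the mismatch described above.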

4. The racgons tool
The help output of racgons is as follows:
[root@rhel1 bin]# ./racgons
To add ONS daemons configuration:
./racgons.bin add_config hostname:port [hostname:port] ...
To remove ONS daemons configuration:
./racgons.bin remove_config hostname[:port] [hostname:port] ...

        The OCR probably now holds two sets of ONS entries. Run the following command to remove the earlier port 6251 configuration:
[root@rhel1 bin]# ./racgons remove_config rhel:6251 rhel2:6251
racgons: Existing key value on rhel = 6251.
racgons: rhel:6251 removed from OCR.
racgons: Existing key value on rhel2 = 6251.
racgons: rhel2:6251 removed from OCR.

Restart nodeapps:
[root@rhel1 bin]# ./srvctl start nodeapps -n rhel2
[root@rhel1 bin]# ./srvctl start nodeapps -n rhel1

Check the status of both nodes:
[root@rhel1 bin]# ./crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rhel1.gsd  application    ONLINE    ONLINE    rhel1
ora.rhel1.ons  application    ONLINE    ONLINE    rhel1
ora.rhel1.vip  application    ONLINE    ONLINE    rhel1
ora.rhel2.gsd  application    ONLINE    ONLINE    rhel2
ora.rhel2.ons  application    ONLINE    ONLINE    rhel2
ora.rhel2.vip  application    ONLINE    ONLINE    rhel2

Everything is back to normal.
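For repeated checks, the `crs_stat -t` table can be scanned for anything that is not ONLINE. This is a minimal sketch under the assumption that the output has the column layout shown above (Name, Type, Target, State, Host). The function name `resources_not_online` is hypothetical.

```python
def resources_not_online(crs_stat_output: str):
    """Return (name, state) pairs for resources whose State is not ONLINE.

    Assumption: resource rows start with "ora." and carry the columns
    Name, Type, Target, State and optionally Host, as in the table above.
    """
    problems = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue  # skip the header, the separator and blank lines
        name, state = fields[0], fields[3]
        if state != "ONLINE":
            problems.append((name, state))
    return problems


if __name__ == "__main__":
    table = (
        "Name           Type           Target    State     Host\n"
        "------------------------------------------------------------\n"
        "ora.rhel1.ons  application    ONLINE    ONLINE    rhel1\n"
        "ora.rhel2.ons  application    ONLINE    ONLINE    rhel2\n"
    )
    print(resources_not_online(table))
```

An empty list corresponds to the healthy state shown in the table above.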
--end--



From the ITPUB blog, link: http://blog.itpub.net/23135684/viewspace-732562/. Please credit the source when reposting; otherwise legal liability will be pursued.
