CRS-0184: Cannot communicate with the CRS daemon

Posted by jqs on 2008-04-15
After rebooting the RAC nodes, checking the CRS status returned the following error:
CRS-0184: Cannot communicate with the CRS daemon
A few days ago I built a two-node RAC database.
I then noticed a problem: after rebooting both hosts,
all the virtual IPs (VIPs) ended up on one node:
# ifconfig -a
en0: flags=5e080863,c0
inet 182.1.21.151 netmask 0xffff0000 broadcast 182.1.255.255
inet 182.1.21.156 netmask 0xffff0000 broadcast 182.1.255.255
inet 182.1.21.157 netmask 0xffff0000 broadcast 182.1.255.255
tcp_sendspace 131072 tcp_recvspace 65536
en1: flags=5e080863,c0
inet 192.168.128.1 netmask 0xffffff00 broadcast 192.168.128.255
lo0: flags=e08084b
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
#
I then started the other node, but its VIP was not brought up on that host during startup:
# ifconfig -a
en0: flags=5e080863,c0
inet 182.1.21.152 netmask 0xffff0000 broadcast 182.1.255.255 tcp_sendspace 131072 tcp_recvspace 65536
en1: flags=5e080863,c0
inet 192.168.128.2 netmask 0xffffff00 broadcast 192.168.128.255
lo0: flags=e08084b
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
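The check above can be scripted rather than read by eye. The following is a minimal sketch, not part of the original troubleshooting: `has_inet_addr` is a hypothetical helper name, and it assumes the `inet <address>` line layout shown in the `ifconfig -a` output above.

```shell
#!/bin/sh
# has_inet_addr ADDR (hypothetical helper): read `ifconfig -a` output on
# stdin and succeed only if ADDR appears as an exact "inet" address.
has_inet_addr() {
    awk -v addr="$1" '
        $1 == "inet" && $2 == addr { found = 1 }
        END { exit !found }
    '
}

# Example for the second node, whose VIP in this post is 182.1.21.157:
#   ifconfig -a | has_inet_addr 182.1.21.157 || echo "VIP missing on this node"
```

Comparing the whole address field avoids false matches on prefixes, for example 182.1.21.15 against 182.1.21.157.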
Checking the CRS processes on this node showed nothing obviously wrong:
# ps -ef|grep crs
oracle 30076 40192 2 15:20:51 - 0:00 /u01/crs/oracle/product/10.2.0/crs/bin/ocssd.bin
root 30902 37254 11 15:21:18 - 0:00 /bin/sh /u01/crs/oracle/product/10.2.0/crs/bin/racgwrap check
oracle 36098 37854 0 15:20:59 - 0:00 /u01/crs/oracle/product/10.2.0/crs/bin/evmlogger.bin -o /u01/crs/oracle/product/10.2.0/crs/evm/log/evmlogger.info -l /u01/crs/oracle/product/10.2.0/crs/evm/log/evmlogger.log
root 37254 1 15 15:21:14 - 0:00 /u01/crs/oracle/product/10.2.0/crs/bin/crsd.bin restart
oracle 37854 1 0 15:18:38 - 0:00 /u01/crs/oracle/product/10.2.0/crs/bin/evmd.bin
root 39134 41568 0 15:20:49 - 0:00 /u01/crs/oracle/product/10.2.0/crs/bin/oprocd run -t 1000 -m 500 -f
root 39514 30902 7 15:21:18 - 0:00 /u01/crs/oracle/product/10.2.0/crs/bin/racgmain check
oracle 40192 35124 0 15:20:51 - 0:00 /bin/sh -c ulimit -c unlimited; cd /u01/crs/oracle/product/10.2.0/crs/log/zj1/cssd; /u01/crs/oracle/product/10.2.0/crs/bin/ocssd || exit $?
root 41376 38730 0 15:21:18 pts/1 0:00 grep crs
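However, the database and related services had not started on this node; only those on the first node were running normally.
Checking the CRS status returned:
$ crs_stat -t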
CRS-0184: Cannot communicate with the CRS daemon
I then restarted CRS:
# /etc/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Apr 14 15:13:45.107 | INF | daemon shutting down

Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
# /etc/init.crs start
Startup will be queued to init within 30 seconds.
After that, everything returned to normal.
===================================
But the problem came back on the next reboot, so I needed to find the root cause rather than restart CRS by hand every time.
Checking crsd.log,
I found:
2008-04-15 09:38:33.286: [ CRSMAIN][10548]32Failed to spawn a thread for UI connection. error=-11
2008-04-15 09:49:15.441: [ CRSMAIN][10548]32Failed to spawn a thread for UI connection. error=-11
......
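To see how often this error occurs, the log can be searched directly. This is a rough sketch only: `count_spawn_errors` is a hypothetical helper, and the log path in the usage comment is an assumption inferred from the CRS home and the `log/zj1/cssd` directory visible in the process listing above, so adjust it for your installation.

```shell
#!/bin/sh
# count_spawn_errors LOGFILE (hypothetical helper): print the number of
# lines in LOGFILE that contain the thread-spawn failure quoted above.
count_spawn_errors() {
    grep -c 'Failed to spawn a thread for UI connection' "$1"
}

# Example; this path is an assumption, not taken from the original post:
#   count_spawn_errors /u01/crs/oracle/product/10.2.0/crs/log/zj1/crsd/crsd.log
```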
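Surprisingly, Metalink had no solution for this specific error.
Searching Metalink for CRS-0184 turned up mentions of /etc/security/limits, suggesting that root's ulimit settings might need to be changed, although Oracle's installation documentation does not mention changing any settings for root.
I edited /etc/security/limits on both nodes
and changed the root stanza as follows: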
root:
fsize = -1
core = 2097151
cpu = -1
data = -1
rss = -1
stack = -1
nofiles = -1
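Before rebooting, it can help to confirm that the stanza was saved as intended on each node. The sketch below is an illustration only: `show_stanza` is a hypothetical helper that assumes the usual AIX stanza layout (a `name:` header line followed by `attribute = value` lines, with `*` marking comments).

```shell
#!/bin/sh
# show_stanza USER [FILE] (hypothetical helper): print the attribute lines
# of USER's stanza from an AIX-style limits file.
# FILE defaults to /etc/security/limits.
show_stanza() {
    awk -v user="$1" '
        # A stanza header: starts in column 1, is not a comment, ends in ":"
        /^[^ \t*][^=]*:[ \t]*$/ { in_stanza = ($1 == (user ":")); next }
        # Inside the wanted stanza: print non-empty, non-comment lines
        in_stanza && NF && $1 !~ /^\*/ { print }
    ' "${2:-/etc/security/limits}"
}

# Example: show_stanza root
```

The file only records the configured values; running `ulimit -a` in a fresh root session shows the limits actually in effect.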
After rebooting the systems,
everything was normal:
# ifconfig -a
en0: flags=5e080863,c0
inet 182.1.21.151 netmask 0xffff0000 broadcast 182.1.255.255
inet 182.1.21.156 netmask 0xffff0000 broadcast 182.1.255.255

tcp_sendspace 131072 tcp_recvspace 65536
en1: flags=5e080863,c0
inet 192.168.128.1 netmask 0xffffff00 broadcast 192.168.128.255
lo0: flags=e08084b
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
# ifconfig -a
en0: flags=5e080863,c0
inet 182.1.21.152 netmask 0xffff0000 broadcast 182.1.255.255
inet 182.1.21.157 netmask 0xffff0000 broadcast 182.1.255.255
tcp_sendspace 131072 tcp_recvspace 65536
en1: flags=5e080863,c0
inet 192.168.128.2 netmask 0xffffff00 broadcast 192.168.128.255
lo0: flags=e08084b
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.mdzj.db    application    ONLINE    ONLINE    zj1
ora....j1.inst application    ONLINE    ONLINE    zj1
ora....j2.inst application    ONLINE    ONLINE    zj2
ora.....zj1.cs application    ONLINE    ONLINE    zj2
ora....zj1.srv application    ONLINE    ONLINE    zj1
ora....zj2.srv application    ONLINE    ONLINE    zj2
ora.....zj2.cs application    ONLINE    ONLINE    zj2
ora....zj1.srv application    ONLINE    ONLINE    zj1
ora....zj2.srv application    ONLINE    ONLINE    zj2
ora.....zj3.cs application    ONLINE    ONLINE    zj2
ora....zj1.srv application    ONLINE    ONLINE    zj1
ora....zj2.srv application    ONLINE    ONLINE    zj2
ora....J1.lsnr application    ONLINE    ONLINE    zj1
ora.zj1.gsd    application    ONLINE    ONLINE    zj1
ora.zj1.ons    application    ONLINE    ONLINE    zj1
ora.zj1.vip    application    ONLINE    ONLINE    zj1
ora....J2.lsnr application    ONLINE    ONLINE    zj2
ora.zj2.gsd    application    ONLINE    ONLINE    zj2
ora.zj2.ons    application    ONLINE    ONLINE    zj2
ora.zj2.vip    application    ONLINE    ONLINE    zj2
Because the operating system was freshly installed, I had overlooked changing root's ulimit settings, which caused this problem.

Source: ITPUB blog, link: http://blog.itpub.net/15385/viewspace-1002548/. If you reproduce this post, please credit the source; otherwise legal action may be taken.
