Failure 1 contacting Cluster Synchronization Services daemon_1466098.1

Posted by rongshiyuan on 2014-05-08

Failure 1 contacting Cluster Synchronization Services daemon (Doc ID 1466098.1)


In this Document

Purpose
Troubleshooting Steps
References

Applies to:

Oracle Database - Enterprise Edition - Version 10.1.0.2 to 11.1.0.7 [Release 10.1 to 11.1]
Information in this document applies to any platform.

Purpose

This note provides steps to troubleshoot a CSSD check failure reported by the "crsctl check crs" or "crsctl check cssd" command. Depending on the version, the error message differs slightly:

10gR2

Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM


11gR1

Failure 1 contacting Cluster Synchronization Services daemon
Cannot communicate with Cluster Ready Services
Cannot communicate with Event Manager


11gR2

CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
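
When scripting the check, the three CSSD message variants above can be matched in one pass. The following is a minimal sketch, assuming the output of "crsctl check crs" is piped in on standard input; the function name cssd_check_failed is hypothetical and not part of any Oracle tool.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if the text on stdin contains one of the CSSD
# failure messages quoted above for 10gR2, 11gR1 or 11gR2, non-zero otherwise.
cssd_check_failed() {
    grep -F -q \
        -e 'Failure 1 contacting CSS daemon' \
        -e 'Failure 1 contacting Cluster Synchronization Services daemon' \
        -e 'CRS-4530'
}
```

Possible usage: crsctl check crs 2>&1 | cssd_check_failed && echo "CSSD check failed"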

 

Note:

1. This note discusses the CSSD check error only; CRSD and EVMD are out of scope.

2. This note applies only when CRS is running in cluster mode; it does not apply to the local cssd (localconfig) that is required to run single-instance ASM.

3. For 11gR2, refer to Document 1050908.1 section "Case 3: OCSSD.BIN does not start"

 

 

Troubleshooting Steps


1. The first step is to check whether CRS is up. Good signs that CRS is up are that the RAC database is running fine and that the following daemons are running:

# ps -ef| egrep 'crsd.bin|ocssd.bin|evmd.bin' | grep -v grep
crsuser      5031     1  0 Jun26 ?        00:10:37 /ocw/crs/bin/ocssd.bin
crsuser      5156     1  0 Jun26 ?        00:00:43 /ocw/crs/bin/evmd.bin
crsuser      5479     1  0 Jun26 ?        00:02:32 /ocw/crs/bin/crsd.bin reboot
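
The daemon check above can be turned into a small filter that names whichever daemon is absent. This is a sketch only: the function name missing_crs_daemons is hypothetical, and it assumes "ps -ef" output is supplied on standard input.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef` output on stdin and prints the name of
# each clusterware daemon from the listing above that is NOT present.
missing_crs_daemons() {
    ps_output=$(cat)
    for daemon in ocssd.bin evmd.bin crsd.bin; do
        # Look for the daemon as a path component, e.g. /ocw/crs/bin/ocssd.bin
        if ! printf '%s\n' "$ps_output" | grep -F -q "/$daemon"; then
            echo "$daemon"
        fi
    done
}
```

Possible usage: ps -ef | missing_crs_daemons (no output means all three daemons were found).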

If CRS is up and only the check fails, see Document 370605.1. If that document does not apply, check cssdOUT.log, ocssd.log and ocssd.trc in /log//cssd/

If CRS is down, start it with the "crsctl start crs" command. If it does not start, go to Step 2.

If the problem is that CRS does not start automatically after a node reboot, check whether CRS is enabled (by default, CRS is enabled for automatic start on node reboot).

To verify whether it is currently enabled:

# cat $SCRBASE/$HOSTNAME/root/crsstart
enable

SCRBASE is /etc/oracle/scls_scr on Linux and AIX, and /var/opt/oracle/scls_scr on HP-UX and Solaris.

Note: NEVER EDIT THE FILE MANUALLY, use "crsctl enable/disable crs" command instead.

To enable:

# $GRID_HOME/bin/crsctl enable crs
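
The SCRBASE lookup and the read-only check of the crsstart file can be sketched as follows. The function names are hypothetical. The sketch assumes that "uname -s" reports Linux, AIX, HP-UX or SunOS (Solaris), and that the node directory under SCRBASE matches $HOSTNAME or the output of "hostname"; verify that assumption on your system.

```shell
#!/bin/sh
# Hypothetical helper: maps an OS name (as printed by `uname -s`) to the
# SCRBASE directory listed above.
scr_base_for_os() {
    case "$1" in
        Linux|AIX)   echo /etc/oracle/scls_scr ;;
        HP-UX|SunOS) echo /var/opt/oracle/scls_scr ;;
        *)           echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: prints the auto-start setting for this node.
# It only reads the file; the file must never be edited by hand.
crs_autostart_state() {
    base=$(scr_base_for_os "$(uname -s)") || return 1
    # Assumption: the node directory is named after $HOSTNAME / `hostname`.
    cat "$base/${HOSTNAME:-$(hostname)}/root/crsstart"
}
```

Run as root, crs_autostart_state should print "enable" when auto start is on; any change still goes through "crsctl enable/disable crs".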

 

2. Execute the following as root user and check the exit code:

# su - -c "/bin/crsctl check boot"
# echo $?


For example:

# su - crsuser -c "/ocw/crs/bin/crsctl check boot"
# echo $?
0

If there is an error, the error message is printed on the screen and the exit code is non-zero.
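
To avoid losing the exit code between the command and "echo $?", the two can be wrapped together. This is a minimal sketch; run_and_report is a hypothetical name, and it simply runs whatever command it is given.

```shell
#!/bin/sh
# Hypothetical wrapper: runs the command given as arguments, lets its output
# through unchanged, then reports and returns the exit code.
run_and_report() {
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "exit code 0: check passed"
    else
        echo "exit code $rc: check failed, see the message above"
    fi
    return "$rc"
}
```

Possible usage, with the user and path from the example above: run_and_report su - crsuser -c "/ocw/crs/bin/crsctl check boot"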



3. Common causes of CSSD not starting:

3.1. OS is not at appropriate run level:

The OS needs to be at the specified run level before CRS will try to start up.

To find out at which run level the clusterware will come up:

grep init.cssd /etc/inittab
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1

The above shows that CRS will run at run levels 3 and 5. Note that, depending on the platform, CRS comes up at different run levels.

To find out current run level:

who -r
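
The comparison between the configured and the current run level can be sketched as below. All three function names are hypothetical. The sketch assumes the inittab entry has the colon-separated form shown above and that "who -r" prints the word "run-level" followed by the level, which is the usual layout but may differ on some platforms.

```shell
#!/bin/sh
# Hypothetical helper: reads inittab content on stdin and prints the run
# levels (second colon-separated field) of the first uncommented init.cssd entry.
cssd_runlevels() {
    awk -F: '/init\.cssd/ && $1 !~ /^#/ { print $2; exit }'
}

# Hypothetical helper: reads `who -r` output on stdin and prints the word
# that follows "run-level".
current_runlevel() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "run-level") { print $(i + 1); exit } }'
}

# Succeeds if the current run level ($2) is among the configured levels ($1).
runlevel_allows_crs() {
    [ -n "$2" ] || return 1
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

Possible usage: runlevel_allows_crs "$(cssd_runlevels < /etc/inittab)" "$(who -r | current_runlevel)" || echo "OS is not at a run level where CRS starts"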

 

3.2. "init.cssd fatal" is not up

On Linux/UNIX, as "init.cssd fatal" is configured in /etc/inittab, process init (pid 1, /sbin/init on Linux, Solaris and hp-ux, /usr/sbin/init on AIX) will start and respawn "init.cssd fatal" if it fails. Without "init.cssd fatal" up and running, ocssd.bin will not start:

ps -ef|grep "init.cssd fatal"|grep -v grep
root      4519     1  0 22:29 ?        00:00:00 /bin/sh /etc/init.d/init.cssd fatal


The init process may fail to start "/etc/init.d/init.cssd fatal" for various reasons (e.g. an rc Snncommand script being stuck, located in rcn.d, for example S98gcstartup). Engage the OS vendor to find out why init.cssd is not being loaded.



3.3. The file system on which CRS_HOME resides is not online when init script S96init.crs is executed



3.4. ocssd.bin is not able to access network socket files:

[    CSSD]2012-07-11 10:21:22.211 [1108330816] >TRACE:   clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=racnode1DBG_CSSD))

[  clsdmt]Fail to listen to (ADDRESS=(PROTOCOL=ipc)(KEY=racnode1DBG_CSSD))
[  clsdmt]Terminating clsdm listening thread

OR

[    CSSD]2012-07-11 10:21:24.418 [1150290240] >TRACE:   clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crsclu_1))

[    CSSD]2012-07-11 10:21:24.418 [1150290240] >ERROR:   clssgmclientlsnr: listening failed for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crsclu_1)) (3)

The solution is to disable CRS from auto starting, reboot the node, remove all network socket files, and enable CRS from auto starting.
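
Before applying the fix, it can help to list which socket keys ocssd.bin was denied access to. The following sketch extracts the KEY values from "clsclisten: Permission denied" lines like those quoted above; the function name is hypothetical, and it assumes ocssd.log content is supplied on standard input.

```shell
#!/bin/sh
# Hypothetical helper: reads ocssd.log content on stdin and prints each
# distinct IPC key that clsclisten reported "Permission denied" for.
socket_permission_keys() {
    sed -n 's/.*clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=\([^)]*\))).*/\1/p' |
        sort -u
}
```

No output means the log contains no such permission errors, in which case this cause probably does not apply.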


3.5. Vendor clusterware is not up (if using vendor clusterware)

CRS provides full clusterware functionality and does not need vendor clusterware to be installed. However, if CRS runs on top of vendor clusterware in your environment, the vendor clusterware needs to come up fully before CRS can be started. To verify, as the crs user:

$ $CRS_HOME/bin/lsnodes -n
racnode1    1
racnode1    0

If vendor clusterware is not fully up, ocssd.log will likely contain messages similar to the following:

2010-08-30 18:28:13.207: [    CSSD][36]clssnm_skgxninit: skgxncin failed, will retry
2010-08-30 18:28:14.207: [    CSSD][36]clssnm_skgxnmon: skgxn init failed
2010-08-30 18:28:14.208: [    CSSD][36]###################################
2010-08-30 18:28:14.208: [    CSSD][36]clssscExit: CSSD signal 11 in thread skgxnmon
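
A quick way to look for this signature is to count the skgxn failure lines in ocssd.log. This is a sketch under the assumption that the messages match the two wordings shown above; the function name skgxn_failure_count is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads ocssd.log content on stdin and prints how many
# lines report an skgxn initialization failure like the ones quoted above.
skgxn_failure_count() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c -E 'skgxncin failed|skgxn init failed' || true
}
```

A non-zero count suggests checking the state of the vendor clusterware before looking further at CSSD itself.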



3.6. If CSSD still fails to start, refer to:

  • OS messages (/var/log/messages on Linux, /var/adm/messages on AIX and /var/adm/syslog/syslog.log on HP-UX):
Jan 25 14:46:43 racnode1 logger: Cluster Ready Services completed waiting on dependencies.
Jan 25 14:46:43 racnode1 last message repeated 2 times
Jan 25 14:46:43 racnode1 logger: Oracle CSS Family monitor starting.
Jan 25 14:46:43 racnode1 logger: Running CRSD with TZ =
Jan 25 14:46:44 racnode1 logger: Oracle CSS restart. 0, 1

##>> CRS is coming up, good example

OR 

Jul 19 16:45:33 racnode1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.2372.

##>> refer to the /tmp/crsctl. for error message

  

  • cssdOUT.log, ocssd.log and ocssd.trc in /log//cssd/
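
For the "waiting on dependencies" case in the OS messages, the diagnostics file named in the log line can be pulled out automatically. This is a minimal sketch that assumes the message wording shown in the example above; the function name crs_diag_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads OS log content on stdin and prints the
# /tmp/crsctl.* diagnostics file named in each "waiting on dependencies" line.
crs_diag_files() {
    sed -n 's|.*Cluster Ready Services waiting on dependencies\. Diagnostics in \(/tmp/crsctl\.[0-9][0-9]*\).*|\1|p'
}
```

Possible usage on Linux: crs_diag_files < /var/log/messages, then read each file that is printed for the actual error message.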


 

 

References

NOTE:1050908.1 - Troubleshoot Grid Infrastructure Startup Issues




From the ITPUB blog, link: http://blog.itpub.net/17252115/viewspace-1156736/. If you repost, please credit the source; otherwise legal liability will be pursued.
