ORACLE RAC UNKNOWN

Posted by renjixinchina on 2013-03-26

In this Document
  Symptoms
  Cause
  Solution


Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.1.0.7 - Release: 10.2 to 11.1
Information in this document applies to any platform.

Symptoms

Attempts to start CRS resources, such as a VIP or an instance, fail, and those resources go into the UNKNOWN state.

Running "crs_stat -t" shows that the resources are in the UNKNOWN state.
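As a rough sketch, the UNKNOWN resources can be picked out of the "crs_stat -t" listing with a small filter. The function name below is hypothetical, and it assumes the usual five-column layout (Name, Type, Target, State, Host); note that "crs_stat -t" may abbreviate long resource names.

```shell
# Print the names of resources whose State column is UNKNOWN.
# Reads "crs_stat -t" style output on stdin; assumes the columns are
# Name, Type, Target, State, Host, separated by whitespace.
list_unknown_resources() {
    awk '$4 == "UNKNOWN" { print $1 }'
}

# Usage on a cluster node (not run here):
#   crs_stat -t | list_unknown_resources
```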

Sometimes resources go into the UNKNOWN state after they have started successfully.

crsd.log might report:
operation: scls_canexec, loc: , OS error: 0, other: no exe permission
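A minimal sketch for locating this message: the function below (a hypothetical name) searches a given crsd.log for the scls_canexec string and prints matching lines with their line numbers. The log location differs between installations, so the path is passed in as an argument rather than assumed.

```shell
# Print crsd.log lines that report a scls_canexec failure, with line numbers.
# $1: path to crsd.log (location varies by installation; pass it in).
find_exec_permission_errors() {
    grep -n -F -e 'scls_canexec' "$1"
}

# Usage (the path is a placeholder for your own crsd.log):
#   find_exec_permission_errors /path/to/crsd.log
```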

Cause

The CRS resources go into the UNKNOWN state because the action script fails while starting, stopping, or checking the state of the resources.

CRS checks the health of the resources regularly, so a failed check can move a resource into the UNKNOWN state even after it started successfully.

The common problems that cause action scripts to fail and put resources into UNKNOWN state are:

1.  The permission of the resource trace file is incorrect.
2.  The permissions of the action script and other racg scripts are incorrect. For example, the racgvip script is missing execute permission.
3.  The server load is very heavy and the action script times out.
4.  A lookup to NIS hangs or takes a very long time, causing the action script to time out.
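One way to get a feel for whether name-service lookups are slow is to time a single lookup by hand. The helper below is only a sketch: the function name is hypothetical, it relies on "date +%s" (widely available but not strictly POSIX), and the "getent passwd oracle" command in the usage comment is an illustrative lookup, not something the note prescribes.

```shell
# Run a command and print how many whole seconds it took.
# The command's own output is discarded; only the elapsed time is printed.
time_command() (
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
)

# Usage (illustrative lookup; substitute a user relevant to your setup):
#   time_command getent passwd oracle
```

A result of several seconds for a single lookup would suggest looking more closely at the name service.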

Solution

Find the action script name and its location by issuing:

"crs_stat -p | grep ACTION_SCRIPT"

Issue "crs_stat | grep -i name" to find the resource names.
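The two commands above can be combined so that each resource name appears next to its action script. This is a sketch under the assumption that "crs_stat -p" prints one KEY=value pair per line with NAME preceding ACTION_SCRIPT for each resource; the function name is hypothetical.

```shell
# Pair each resource name with its action script, tab-separated.
# Reads "crs_stat -p" style output (KEY=value lines) on stdin.
list_action_scripts() {
    awk '
        # Everything after the first "=" is the value.
        { key = $0; sub(/=.*/, "", key); val = substr($0, index($0, "=") + 1) }
        key == "NAME"          { name = val }
        key == "ACTION_SCRIPT" { print name "\t" val }
    '
}

# Usage on a cluster node (not run here):
#   crs_stat -p | list_action_scripts
```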

Check and correct the following:

1.  The permission of the resource trace file, if it is incorrect.
The resource trace file is in the HOME/log//racg directory, where HOME is the HOME directory of the action script for the resource. The file should be readable and writable by the resource owner.
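As a sketch of this check, the function below (a hypothetical name) lists files in a given directory tree whose mode lacks owner read or owner write permission. It inspects mode bits only; confirming that the file owner is actually the resource owner still needs a look at "ls -l". The directory is passed in as an argument because the real path depends on the installation.

```shell
# List regular files under a directory that lack owner read or owner write
# permission. "-perm -600" matches files with BOTH bits set, so the negation
# selects files missing at least one of them.
# $1: the racg trace directory for your installation.
list_trace_files_without_owner_rw() {
    find "$1" -type f ! -perm -600
}

# Usage (the path is a placeholder):
#   list_trace_files_without_owner_rw /path/to/racg/trace/dir
```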

2.  The permissions of the racg scripts in the /bin directory, if they are incorrect.
HOME is the HOME directory of the action script for the resource. Issue "ls -l /bin/racg*" as user oracle, or as the user who normally starts the failing resources, to get the permissions of the racg scripts.
If any of the racg scripts is a soft link to another file, check the permissions of the file it links to. The racg* scripts, for example racgvip and racgwrap, should have execute permission for everyone.
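The permission check, including the soft-link case, can be sketched with find. The function name is hypothetical; "-L" makes find examine the file a soft link points to rather than the link itself, and "-maxdepth" is a GNU/BSD extension rather than POSIX. The bin directory is passed as an argument.

```shell
# List racg* files in a directory that are NOT executable by user, group,
# and other alike. With -L, a soft link is judged by the permissions of the
# file it points to.
# $1: the bin directory that holds the racg scripts.
list_racg_without_execute() {
    find -L "$1" -maxdepth 1 -type f -name 'racg*' ! -perm -111
}

# Usage (the path is a placeholder):
#   list_racg_without_execute /path/to/home/bin
```

An empty result means every racg* file found carries execute permission for everyone.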

3.  Check crsd.log and see if the resource action script timed out.
If it did, check whether the server load was heavy (95% used or higher) for a minute or longer at the time of the failure. Setting up OSWatcher or IPD/OS can help troubleshoot this if the timeout occurs intermittently. Also check whether the NIC was having a problem at the time of the failure.

From the "ITPUB Blog", link: http://blog.itpub.net/15747463/viewspace-757158/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
