Database suddenly hung

Posted by yeahokay on 2012-02-01

rac2: crsd.log

2012-02-01 16:10:05.708: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Could not join /oracle/product/10.2.0/crs_1/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2012-02-01 16:10:05.763: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Action Script /oracle/product/10.2.0/crs_1/bin/racgwrap(check) timed out for ora.perac2.vip! (timeout=60)
2012-02-01 16:10:05.763: [ CRSAPP][1500211520]0CheckResource error for ora.perac2.vip error code = -2
2012-02-01 16:11:48.986: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Could not join /oracle/product/10.2.0/crs_1/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2012-02-01 16:11:49.389: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Action Script /oracle/product/10.2.0/crs_1/bin/racgwrap(check) timed out for ora.perac2.vip! (timeout=60)
2012-02-01 16:11:49.390: [ CRSAPP][1500211520]0CheckResource error for ora.perac2.vip error code = -2
2012-02-01 16:18:27.234: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Could not join /oracle/product/10.2.0/crs_1/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2012-02-01 16:18:27.373: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Action Script /oracle/product/10.2.0/crs_1/bin/racgwrap(check) timed out for ora.perac2.vip! (timeout=60)
2012-02-01 16:18:27.374: [ CRSAPP][1500211520]0CheckResource error for ora.perac2.vip error code = -2


rac1: crsd.log
2012-02-01 16:18:18.476: [ CRSEVT][1497348416]0CAAMonitorHandler :: 0:Action Script /oracle/product/10.2.0/crs_1/bin/racgwrap(check) timed out for ora.perac1.vip! (timeout=60)
2012-02-01 16:18:19.574: [ CRSAPP][1497348416]0CheckResource error for ora.perac1.vip error code = -2
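To see at a glance which resources keep timing out, the crsd.log excerpts above can be summarised per resource. This is only a minimal sketch: the function name `count_action_timeouts` is made up for illustration, and it assumes the "timed out for <resource>!" wording shown in the excerpts.

```shell
# Hypothetical helper (not part of Oracle CRS): read crsd.log text on stdin
# and print "<count> <resource>" for every action-script timeout.
count_action_timeouts() {
    # Keep only lines containing "timed out for <resource>!", extract the
    # resource name, then count how often each name occurs.
    sed -n 's/.*timed out for \([^!]*\)!.*/\1/p' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }'
}

# Usage (run once per node, against that node's crsd.log):
#   count_action_timeouts < crsd.log
```

A steadily growing count for the VIP resources on both nodes would be consistent with the check script itself hanging rather than with a failure of one particular VIP.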

Neither node's ocssd.log nor the system messages log shows any error or warning.

The alert log reports ORA-3136 errors.


Related document
10g/11gR1: Many Orphaned Or Hanging "racgmain" Processes Running [ID 732086.1]

Cause
crsd.bin invokes the racgmain to check the status of the resources that are managed by CRS. The racgmain is invoked through the wrapper script racgwrap.

If the resource action times out, crsd kills the action script (racgwrap), but the racgmain process it started is not killed. Over time this can leave many orphaned racgmain processes on the system, which eventually slows the system down because of resource contention at the OS level.

Internal bug:6196746 addresses this issue.
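The mechanism described above can be reproduced with a small self-contained experiment. This is a sketch under assumptions, not Oracle code: `compare_wrappers` and the temporary scripts `child`, `plain` and `with_exec` are stand-ins for racgmain and the two styles of racgwrap. It shows that a wrapper which merely runs its child has a different PID from the child, so killing the wrapper leaves the child alive, whereas a wrapper that uses `exec` becomes the child.

```shell
# Hypothetical demo: compare a wrapper that runs its child with one that
# replaces itself via exec. Requires mktemp -d (widely available, not POSIX).
compare_wrappers() {
    dir=$(mktemp -d) || return 1

    # Stand-in for racgmain: it only reports its own PID.
    printf '%s\n' 'echo "child=$$"' > "$dir/child"

    # Original wrapper style: start the child, then exit with its status.
    printf '%s\n' 'echo "wrapper=$$"' "sh \"$dir/child\"" \
        'status=$?' 'exit $status' > "$dir/plain"

    # Workaround style: exec replaces the wrapper process with the child.
    printf '%s\n' 'echo "wrapper=$$"' "exec sh \"$dir/child\"" \
        > "$dir/with_exec"

    echo "plain:"
    sh "$dir/plain"
    echo "with exec:"
    sh "$dir/with_exec"

    rm -rf "$dir"
}

# Usage:
#   compare_wrappers
```

In the "plain" run the two PIDs differ; in the "with exec" run they are identical, which is why a kill aimed at the wrapper also ends the process doing the actual work.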


Solution


- This is fixed in the 11.1.0.7 patchset. If you are running into this issue in 10gR2, apply the 10.2.0.4 patchset and the latest CRS bundle patch. The fix is included in the CRS bundle patches from bundle #2 onwards.

- The following option can be used as a temporary workaround until the patch is applied.


1. Make a copy of racgwrap located under $ORACLE_HOME/bin and $CRS_HOME/bin on ALL Nodes

2. Edit the file racgwrap and modify the last 3 lines from:

~~~
$ORACLE_HOME/bin/racgmain "$@"
status=$?
exit $status
~~~

to:

~~~
# Line added to fix for Bug 6196746
exec $ORACLE_HOME/bin/racgmain "$@"
~~~

3. Kill all the orphan racgmain processes running.

$ ps -ef|grep "racgmain check"
oracle 18701 1 0 Aug 1 ? 0:00 /oracle/product/10.2.0/database/bin/racgmain check
oracle 14653 1 0 Aug 1 ? 0:00 /oracle/product/10.2.0/database/bin/racgmain check
oracle 24517 1 0 Aug 1 ? 0:00 /oracle/product/10.2.0/database/bin/racgmain check

$ kill -9 <pid>
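When many orphans have piled up, picking PIDs out of the `ps` listing by eye is error-prone. The filter below is a minimal sketch: the name `orphan_racgmain_pids` is hypothetical, and it assumes input in the "PID PPID ARGS" layout produced by `ps -e -o pid,ppid,args`, which may need adjusting on some platforms. It prints only those `racgmain check` processes whose parent is PID 1, that is, those whose original parent has already exited.

```shell
# Hypothetical helper: read "PID PPID ARGS..." lines on stdin and print the
# PIDs of "racgmain check" processes that have been re-parented to PID 1.
orphan_racgmain_pids() {
    awk '$2 == 1 && $3 ~ /\/racgmain$/ && $4 == "check" { print $1 }'
}

# Usage: list the candidates first, review them, then kill them by hand.
#   ps -e -o pid,ppid,args | orphan_racgmain_pids
```

The function deliberately only prints PIDs and leaves the `kill` to the operator, so a healthy `racgmain check` that crsd is still waiting on is not terminated by accident.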

References
BUG:7009245 - "RACGMAIN CHECK" PROCESS NOT TERMINATING


Source: ITPUB blog, link: http://blog.itpub.net/786540/viewspace-1057250/. If you repost this, please credit the source; otherwise legal liability will be pursued.
