clssnmPollingThread Missed Checkins Messages in the Oracle CRS Log

Posted by zhang41082 on 2019-05-04

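Recently I noticed a lot of missed checkin messages in the CRS log, sometimes nearly 20 in a row. We had a RAC outage once before, so I am wary of this sort of thing. CSS periodically checks heartbeat information, including what is written to the voting disk, to decide whether each node in the cluster is still alive. If a node keeps missing its checkins for too long, it is considered dead and is evicted from the cluster. The tolerance is controlled by the misscount parameter, which can be set with the crsctl command; the default is 60 (strictly speaking, misscount is a timeout in seconds rather than a number of attempts, and the default varies by platform). The command crsctl get css misscount returns the current setting, and crsctl set css misscount 120 raises it to 120. Normally there is no need to change this value, and if you do change it, do not set it too high. Let's see what Metalink has to say.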


Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.2
This problem can occur on any platform.
Symptoms: The ocssd.log contains irregular messages about missed checkins from the remote node(s) or the local node. The missed checkins are not increasing.

Extract from ocssd.log:

[ CSSD]2006-06-01 14:10:25.138 [10] >TRACE: clssgmClientConnectMsg: Connect from con(100aca070) proc(100acd0f0) pid() proto(10:2:1:1)
[ CSSD]2006-06-01 14:11:24.288 [14] >TRACE: clssnmPollingThread: node myrac2 (2) missed(2) checkin(s)
[ CSSD]2006-06-01 14:11:25.547 [10] >TRACE: clssgmClientConnectMsg: Connect from con(100aca070) proc(100ae74e0) pid() proto(10:2:1:1)
[ CSSD]2006-06-01 14:11:50.468 [14] >TRACE: clssnmPollingThread: node myrac1 (1) missed(2) checkin(s)
[ CSSD]2006-06-01 14:12:25.960 [10] >TRACE: clssgmClientConnectMsg: Connect from con(100acd800) proc(100acd0f0) pid() proto(10:2:1:1)

Changes: Install/upgrade to 10.2.
Cause: The missed checkins are printed for diagnostic purposes only.
Solution: These harmless messages can be ignored. They have been removed with the patch for bug 5037871, which will be included in 10.2.0.3.


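Basically, when there is a real problem, missed(2) turns into missed(3), missed(4), and so on, and keeps climbing until the MISSCOUNT threshold is exceeded, at which point the node is evicted from the cluster.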
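To tell the harmless missed(2) noise apart from a count that really is climbing, the log can be scanned for the highest missed value per node. The following is only a rough sketch in Python, assuming the line format shown in the ocssd.log extract above. The function names and the `alert_at` threshold of 4 are my own illustrative choices, not anything defined by Oracle.

```python
import re
from collections import defaultdict

# Matches lines shaped like the ocssd.log extract above, for example:
# [ CSSD]2006-06-01 14:11:24.288 [14] >TRACE: clssnmPollingThread: node myrac2 (2) missed(2) checkin(s)
MISSED_RE = re.compile(
    r"\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*"
    r"clssnmPollingThread: node (?P<node>\S+) \((?P<num>\d+)\) "
    r"missed\((?P<missed>\d+)\) checkin\(s\)"
)


def parse_missed_checkins(lines):
    """Yield (timestamp, node, missed_count) for every missed-checkin line."""
    for line in lines:
        match = MISSED_RE.search(line)
        if match:
            yield match.group("ts"), match.group("node"), int(match.group("missed"))


def find_escalating_nodes(lines, alert_at=4):
    """Return {node: highest missed count} for nodes that reached alert_at.

    alert_at is a hypothetical threshold chosen for illustration only;
    pick a value that suits your own misscount setting.
    """
    peak = defaultdict(int)
    for _ts, node, missed in parse_missed_checkins(lines):
        peak[node] = max(peak[node], missed)
    return {node: count for node, count in peak.items() if count >= alert_at}


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py /path/to/ocssd.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for node, count in sorted(find_escalating_nodes(log).items()):
            print(f"node {node} reached missed({count})")
```

Run against the extract quoted above, this flags nothing, because both nodes stay at missed(2). It only reports a node once its count has climbed to the chosen threshold.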

Source: "ITPUB Blog", link: http://blog.itpub.net/25016/viewspace-1005674/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
