HP-UX: CRS Does Not Start on One of the Nodes in a Two Node Cluster With HP MC Service Guard (967090.1)

Posted by rongshiyuan on 2014-09-15

hp-ux: CRS Does not Start on One of the Nodes in a Two Node Cluster With HP MC Service Guard (Doc ID 967090.1)


In this Document
  Symptoms
  Changes
  Cause
  Solution
  References


Applies to:

Oracle Server - Enterprise Edition - Version: 11.1.0.6 and later [Release: 11.1 and later]
HP-UX Itanium

Symptoms

This is a two-node RAC cluster. Service Guard was upgraded from 11.17.01 to 11.19 last week, on both nodes of the cluster, using a rolling upgrade. Since then CRS cannot start on one node; it starts and stops on the other node without issue.

1. crsd.log shows:
2009-10-13 16:49:01.872: [ CSSCLNT][1]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_linus_)), rc 9

2009-10-13 16:49:01.873: [ CRSRTI][1] CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2009-10-13 16:49:03.182: [ COMMCRS][1339]clsc_connect: (6000000000340570) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_linus_))

2009-10-13 16:49:03.182: [ CSSCLNT][1]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_linus_)), rc 9

2009-10-13 16:49:03.183: [ CRSRTI][1] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
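When a crsd.log is long, it can help to pull out only the entries that show crsd failing to reach CSS. The following is a minimal Python sketch, not an Oracle tool: the function name `css_connect_failures` and the three search phrases are assumptions taken from the excerpt above, and the regular expression only covers the line layout shown here.

```python
import re

# Layout assumed from the excerpt: "<timestamp>: [ COMPONENT][thread]message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<component>\w+)\]\[(?P<thread>\d+)\]\s*(?P<message>.*)$"
)

# Phrases seen in the excerpt; hypothetical choice, extend as needed.
PHRASES = ("CSS is not ready", "failed to connect", "no listener")


def css_connect_failures(lines):
    """Return (timestamp, component, message) for entries matching PHRASES."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(p in m.group("message") for p in PHRASES):
            hits.append((m.group("ts"), m.group("component"), m.group("message")))
    return hits


sample = [
    "2009-10-13 16:49:03.182: [ CSSCLNT][1]clsssInitNative: failed to connect to "
    "(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_linus_)), rc 9",
    "",
    "2009-10-13 16:49:03.183: [ CRSRTI][1] CSS is not ready. "
    "Received status 3 from CSS. Waiting for good status ..",
]

hits = css_connect_failures(sample)
for ts, component, message in hits:
    print(ts, component, message)
```

In practice the `lines` argument would be an open file handle on the crsd.log of the failing node.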


2. ocssd.log shows "clssgmWaitOnEventValue" repeated about once per second:

[ CSSD]2009-10-13 15:01:19.750 [1] >TRACE: clssnmStartNM: Initializing with OCR id (1317466550)
[ CSSD]2009-10-13 15:01:19.750 [6] >TRACE: clssnm_skgxninit: initialized skgxn version (2/0/Hewlett-Packard SKGXN 2.0)
[ CSSD]2009-10-13 15:01:20.752 [5] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
[ CSSD]2009-10-13 15:01:21.762 [5] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
[ CSSD]2009-10-13 15:01:22.772 [5] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
[ CSSD]2009-10-13 15:01:23.782 [5] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
......
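To confirm that ocssd is stuck in this wait rather than passing through it once, the repeated entries can be counted and their time span reported. This is a rough Python sketch under the assumption that the log lines look exactly like the excerpt above; `wait_loop_summary` is a hypothetical helper name, and the sketch does not interpret what the `val` and `eval` numbers mean.

```python
import re

# Layout assumed from the excerpt:
# "[ CSSD]<timestamp> [thread] >TRACE: clssgmWaitOnEventValue: after CmInfo State val N, eval N waited N"
WAIT_RE = re.compile(
    r"^\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<thread>\d+)\] "
    r">TRACE:\s+clssgmWaitOnEventValue: after CmInfo State val (?P<val>\d+), "
    r"eval (?P<eval>\d+) waited (?P<waited>\d+)"
)


def wait_loop_summary(lines):
    """Summarise clssgmWaitOnEventValue entries: count, first/last time, distinct (val, eval)."""
    matches = [m for m in (WAIT_RE.match(line.strip()) for line in lines) if m]
    if not matches:
        return None
    return {
        "count": len(matches),
        "first": matches[0].group("ts"),
        "last": matches[-1].group("ts"),
        "states": sorted({(int(m.group("val")), int(m.group("eval"))) for m in matches}),
    }


sample = [
    "[ CSSD]2009-10-13 15:01:19.750 [1] >TRACE: clssnmStartNM: Initializing with OCR id (1317466550)",
    "[ CSSD]2009-10-13 15:01:20.752 [5] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0",
    "[ CSSD]2009-10-13 15:01:21.762 [5] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0",
    "[ CSSD]2009-10-13 15:01:22.772 [5] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0",
]

summary = wait_loop_summary(sample)
print(summary)
```

A large count with a single unchanged (val, eval) pair over a long time span matches the symptom described here.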


3. The stack trace for ocssd.bin shows a thread waiting in skgxnreg():

Attaching to process 26772
Reading symbols from /u01/crs/product/crs/bin/ocssd.bin...done.
0xc000000000435e10:0 in __ksleep+0x30 () from /usr/lib/hpux64/libc.so.1
(gdb)
Thread 6 (system thread 5883298):
#0 0xc000000000439990:0 in _select_sys+0x30 () from /usr/lib/hpux64/libc.so.1
#1 0xc00000000044ef70:0 in select+0xf0 () from /usr/lib/hpux64/libc.so.1
#2 0xc0000000050f1cf0:0 in wait_for_socket () at utils/cl_msg.c:733
#3 0xc0000000050f87c0:0 in cl_msg_tcp_recv () at utils/cl_msg.c:1936
#4 0xc0000000051234b0:0 in gm_service () at gmapi/gmapi.c:1325
#5 0xc000000005122170:0 in gm_primary_attach () at gmapi/gmapi.c:375
#6 0xc0000000050dc180:0 in skgxnreg () at nmapi2/nmapi2.c:1400
#7 0x400000000005ffa0:0 in clssnm_skgxnmon+0x480 ()
#8 0x40000000000229b0:0 in clssscthrdmain+0xf0 ()
#9 0xc0000000000fb220:0 in __pthread_bound_body+0x190 ()
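With many threads in the gdb output, a small script can report which threads have a given function on their stack. The Python sketch below assumes the HP-UX gdb layout shown above ("Thread N (...)" headers followed by "#n address:slot in function" frames); `threads_in_function` is a hypothetical helper, not part of gdb.

```python
import re

# Thread header, e.g. "Thread 6 (system thread 5883298):"
THREAD_RE = re.compile(r"^Thread (?P<num>\d+) \(")
# Frame line, e.g. "#6 0xc0000000050dc180:0 in skgxnreg () at nmapi2/nmapi2.c:1400"
FRAME_RE = re.compile(r"^#(?P<idx>\d+)\s+0x[0-9a-f]+:\d+ in (?P<func>[A-Za-z_]\w*)")


def threads_in_function(trace_lines, func_name):
    """Return the thread numbers whose backtrace contains func_name."""
    found = []
    current = None
    for line in trace_lines:
        line = line.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group("num"))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and frame.group("func") == func_name:
            if current not in found:
                found.append(current)
    return found


sample = [
    "Thread 6 (system thread 5883298):",
    "#0 0xc000000000439990:0 in _select_sys+0x30 () from /usr/lib/hpux64/libc.so.1",
    "#5 0xc000000005122170:0 in gm_primary_attach () at gmapi/gmapi.c:375",
    "#6 0xc0000000050dc180:0 in skgxnreg () at nmapi2/nmapi2.c:1400",
    "#7 0x400000000005ffa0:0 in clssnm_skgxnmon+0x480 ()",
]

print(threads_in_function(sample, "skgxnreg"))
```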

Changes

HP MC Service Guard has been upgraded from 11.17 to 11.19.

Cause

This is caused by an HP Service Guard issue related to the rolling upgrade to MC Service Guard A.11.19.

HP reference is CR QXCR1000984570.

Solution

The workaround is to halt the entire cluster and restart it. After that, CRS starts fine on both nodes.
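The note does not list the exact commands. As a hedged illustration only, the Python sketch below builds a dry-run checklist and executes nothing. It assumes that "the entire cluster" means the Service Guard cluster, that CRS is stopped first and started last, and that the standard `crsctl` and Serviceguard `cmhaltcl`/`cmruncl`/`cmviewcl` commands are used; the node names and the ordering are hypothetical and should be checked against HP and Oracle documentation for the installed versions.

```python
def restart_plan(nodes):
    """Build an ordered (node, command) checklist; nothing is executed."""
    steps = []
    # Assumption: stop Oracle CRS on every node before halting Service Guard.
    for node in nodes:
        steps.append((node, "crsctl stop crs"))
    # Assumption: cluster-wide Service Guard commands are run from one node.
    steps.append((nodes[0], "cmhaltcl -f -v"))  # halt the whole cluster
    steps.append((nodes[0], "cmruncl -v"))      # start the cluster again
    steps.append((nodes[0], "cmviewcl -v"))     # confirm all nodes are up
    # Assumption: start CRS on each node and check it afterwards.
    for node in nodes:
        steps.append((node, "crsctl start crs"))
        steps.append((node, "crsctl check crs"))
    return steps


# Hypothetical node names, for illustration only.
plan = restart_plan(["node1", "node2"])
for number, (node, command) in enumerate(plan, start=1):
    print(f"{number}. [{node}] {command}")
```

Keeping the plan as data rather than running the commands makes it easy to review the order with the system administrator before any downtime window.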

References

BUG:9160308 - CRS NOT START AFTER REBOOT ON NODE1 CLSSOMON: TIMED OUT CONNECTING TO CSS.
 

Document Details

Type: PROBLEM
Status: PUBLISHED
Last Major Update: 9/22/2011
Last Update: 9/22/2011

Related Products

 
Oracle Database - Enterprise Edition
     
 





From the "ITPUB Blog", link: http://blog.itpub.net/17252115/viewspace-1270208/. If you repost this, please credit the source; otherwise legal liability will be pursued.
