修改系統時間導致RAC環境的一個例項重啟

yangtingkun發表於2007-05-25

由於RAC的兩個節點要進行時間同步。目前採用的方法是一個節點從另一個節點直接同步時間。結果發現RAC環境中的同步時間的那個節點上的Oracle例項重啟了。


前一段時間看了Kamus在itpub上的帖子:http://www.itpub.net/showthread.php?s=&threadid=747833。還特意測試了一下,沒有發現類似的問題。

結果沒過多長時間,自己就碰到了。不過雖然情況類似,但是沒有那麼野蠻,只是將節點二的資料庫重啟了,而沒有重啟系統。而且,資料庫關閉是採用的還是正常的關閉方式。

摘錄一些log如下,節點1上的alert檔案:

Thu May 24 21:15:37 2007
Reconfiguration started (old inc 4, new inc 6)
List of nodes:
0
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Thu May 24 21:15:37 2007
LMS 0: 0 GCS shadows cancelled, 0 closed
Thu May 24 21:15:37 2007
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Thu May 24 21:15:38 2007
Instance recovery: looking for dead threads
Thu May 24 21:15:38 2007
Beginning instance recovery of 1 threads
Thu May 24 21:15:39 2007
LMS 0: 453637 GCS shadows traversed, 0 replayed
Thu May 24 21:15:39 2007
LMS 1: 456482 GCS shadows traversed, 0 replayed
Thu May 24 21:15:39 2007
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Thu May 24 21:15:41 2007
parallel recovery started with 7 processes
Thu May 24 21:15:41 2007
Started redo scan
Thu May 24 21:15:42 2007
Completed redo scan
182 redo blocks read, 27 data blocks need recovery
Thu May 24 21:15:42 2007
Started redo application at
Thread 2: logseq 52, block 379401
Thu May 24 21:15:42 2007
Recovery of Online Redo Log: Thread 2 Group 8 Seq 52 Reading mem 0
Mem# 0: /dev/vx/rdsk/datadg/tradedb_redo2_4_1_1g
Mem# 1: /dev/vx/rdsk/datadg/tradedb_redo2_4_2_1g
Thu May 24 21:15:42 2007
Completed redo application
Thu May 24 21:15:42 2007
Completed instance recovery at
Thread 2: logseq 52, block 379583, scn 5137925732
27 data blocks read, 27 data blocks written, 182 redo blocks read
Switch log for thread 2 to sequence 53
Thu May 24 21:21:16 2007
Reconfiguration started (old inc 6, new inc 8)
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Thu May 24 21:21:16 2007
LMS 0: 0 GCS shadows cancelled, 0 closed
Thu May 24 21:21:16 2007
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Thu May 24 21:21:17 2007
LMS 0: 11529 GCS shadows traversed, 4001 replayed
Thu May 24 21:21:17 2007
LMS 1: 11363 GCS shadows traversed, 4001 replayed
LMS 1: 11403 GCS shadows traversed, 4001 replayed
Thu May 24 21:21:17 2007
LMS 0: 11539 GCS shadows traversed, 4001 replayed
LMS 0: 11294 GCS shadows traversed, 4001 replayed
Thu May 24 21:21:17 2007
LMS 1: 11492 GCS shadows traversed, 4001 replayed
.
.
.
LMS 1: 11344 GCS shadows traversed, 4001 replayed
Thu May 24 21:21:21 2007
LMS 0: 5661 GCS shadows traversed, 1995 replayed
Thu May 24 21:21:21 2007
LMS 1: 10713 GCS shadows traversed, 3746 replayed
Thu May 24 21:21:21 2007
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete

節點2上的alert檔案:

Thu May 24 21:20:44 2007
Starting ORACLE instance (normal)
Thu May 24 21:21:06 2007
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 ce1 10.0.0.0 configured from OCR for use as a cluster interconnect
Interface type 1 ce0 172.0.2.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.3.0.
System parameters with non-default values:
processes = 300
sessions = 335
.
.
.
pga_aggregate_target = 4395630592
Cluster communication is configured to use the following interface(s) for this instance 10.0.0.2
Thu May 24 21:21:09 2007
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=4474
.
.
.
MMON started with pid=26, OS id=4537
Thu May 24 21:21:10 2007
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=28, OS id=4539
Thu May 24 21:21:10 2007
starting up 1 shared server(s) ...
Thu May 24 21:21:11 2007
lmon registered with NM - instance id 2 (internal mem no 1)
Thu May 24 21:21:16 2007
Reconfiguration started (old inc 0, new inc 8)
List of nodes:
0 1
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid = 1 according to instance 0
Thu May 24 21:21:16 2007
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Thu May 24 21:21:16 2007
LMS 0: 0 GCS shadows cancelled, 0 closed
Thu May 24 21:21:16 2007
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Thu May 24 21:21:17 2007
LMS 0: 0 GCS shadows traversed, 0 replayed
Thu May 24 21:21:17 2007
LMS 1: 0 GCS shadows traversed, 0 replayed
Thu May 24 21:21:17 2007
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=34, OS id=4908
Thu May 24 21:21:24 2007
ALTER DATABASE MOUNT
Thu May 24 21:21:28 2007
Setting recovery target incarnation to 1
Thu May 24 21:21:28 2007
Successful mount of redo thread 2, with mount id 4181908813
Thu May 24 21:21:28 2007
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: ALTER DATABASE MOUNT
Thu May 24 21:21:29 2007
ALTER DATABASE OPEN
Thu May 24 21:21:29 2007
Picked broadcast on commit scheme to generate SCNs
Thu May 24 21:21:30 2007
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=38, OS id=5133
Thu May 24 21:21:31 2007
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=9, OS id=5164
Thu May 24 21:21:32 2007
Thread 2 opened at log sequence 53
Current log# 5 seq# 53 mem# 0: /dev/vx/rdsk/datadg/tradedb_redo2_1_1_1g
Current log# 5 seq# 53 mem# 1: /dev/vx/rdsk/datadg/tradedb_redo2_1_2_1g
Successful open of redo thread 2
Thu May 24 21:21:32 2007
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu May 24 21:21:32 2007
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu May 24 21:21:32 2007
ARC1: Becoming the heartbeat ARCH
Thu May 24 21:21:32 2007
SMON: enabling cache recovery
Thu May 24 21:21:38 2007
Successfully onlined Undo Tablespace 4.
Thu May 24 21:21:38 2007
SMON: enabling tx recovery
Thu May 24 21:21:38 2007
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=40, OS id=5400
Thu May 24 21:21:43 2007
Completed: ALTER DATABASE OPEN

節點1上$ORACLE_CRS/log/hostname目錄下alterhostname.log檔案如下:

2007-05-24 21:15:33.720
[cssd(2665)]CRS-1607:CSSD evicting node ahrac2. Details in /opt/oracle/product/10.2/crs/log/ahrac1/cssd/ocssd.log.
2007-05-24 21:15:33.825
[cssd(2665)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ahrac1 .
2007-05-24 21:15:40.004
[crsd(1000)]CRS-1204:Recovering CRS resources for node ahrac2.
2007-05-24 21:20:00.197
[cssd(2665)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ahrac1 ahrac2 .

節點2上$ORACLE_CRS/log/hostname目錄下alterhostname.log檔案如下:

2007-05-24 21:19:56.369
[cssd(2252)]CRS-1605:CSSD voting file is online: /dev/vx/rdsk/datadg/vot2. Details in /opt/oracle/product/10.2/crs/log/ahrac2/cssd/ocssd.log.
2007-05-24 21:19:56.382
[cssd(2252)]CRS-1605:CSSD voting file is online: /dev/vx/rdsk/datadg/vot3. Details in /opt/oracle/product/10.2/crs/log/ahrac2/cssd/ocssd.log.
2007-05-24 21:19:56.382
[cssd(2252)]CRS-1605:CSSD voting file is online: /dev/vx/rdsk/datadg/vot1. Details in /opt/oracle/product/10.2/crs/log/ahrac2/cssd/ocssd.log.
2007-05-24 21:19:59.975
[cssd(2252)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ahrac1 ahrac2 .
2007-05-24 21:20:00.890
[crsd(965)]CRS-1012:The OCR service started on node ahrac2.
2007-05-24 21:20:00.961
[evmd(2126)]CRS-1401:EVMD started on node ahrac2.
2007-05-24 21:20:05.985
[crsd(965)]CRS-1201:CRSD started on node ahrac2.

節點1上$ORACLE_CRS/log/hostname/cssd目錄下ossd.log檔案:

[ CSSD]2007-05-24 21:14:26.057 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b33dd0) proc(100b34800) pid() proto(10:2:1:1)
[ CSSD]2007-05-24 21:15:02.410 [15] >TRACE: clssnmPollingThread: node ahrac2 (2) missed(2) checkin(s)
[ CSSD]2007-05-24 21:15:03.420 [15] >TRACE: clssnmPollingThread: node ahrac2 (2) missed(3) checkin(s)
.
.
.
[ CSSD]2007-05-24 21:15:29.680 [15] >TRACE: clssnmPollingThread: node ahrac2 (2) missed(29) checkin(s)
[ CSSD]2007-05-24 21:15:30.690 [15] >TRACE: clssnmPollingThread: Eviction started for node ahrac2 (2), flags 0x000d, state 3, wt4c 0
[ CSSD]2007-05-24 21:15:30.690 [17] >TRACE: clssnmDoSyncUpdate: Initiating sync 3
[ CSSD]2007-05-24 21:15:30.690 [17] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2007-05-24 21:15:30.690 [17] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
[ CSSD]2007-05-24 21:15:30.690 [17] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[ CSSD]2007-05-24 21:15:30.690 [17] >TRACE: clssnmSendSync: syncSeqNo(3)
[ CSSD]2007-05-24 21:15:30.690 [10] >TRACE: clssnmHandleSync: Acknowledging sync: src[1] srcName[ahrac1] seq[9] sync[3]
[ CSSD]2007-05-24 21:15:30.690 [17] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(2)
[ CSSD]2007-05-24 21:15:30.790 [1] >USER: NMEVENT_SUSPEND [00][00][00][06]
[ CSSD]2007-05-24 21:15:31.700 [17] >TRACE: clssnmWaitForAcks: node(2) is expiring, msg type(11)
[ CSSD]2007-05-24 21:15:31.700 [17] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2007-05-24 21:15:31.700 [17] >TRACE: clssnmDoSyncUpdate: node(0) missCount(2679266) state(0)
[ CSSD]2007-05-24 21:15:31.700 [17] >TRACE: clssnmDoSyncUpdate: node(2) missCount(30) state(3)
[ CSSD]2007-05-24 21:15:31.700 [17] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2007-05-24 21:15:31.700 [17] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[ CSSD]2007-05-24 21:15:31.700 [17] >TRACE: clssnmSendVote: syncSeqNo(3)
[ CSSD]2007-05-24 21:15:31.700 [17] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2007-05-24 21:15:31.700 [10] >TRACE: clssnmSendVoteInfo: node(1) syncSeqNo(3)
[ CSSD]2007-05-24 21:15:32.710 [17] >TRACE: clssnmWaitForAcks: done, msg type(13)
[ CSSD]2007-05-24 21:15:32.710 [17] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2007-05-24 21:15:33.720 [17] >TRACE: clssnmEvict: Start
[ CSSD]2007-05-24 21:15:33.720 [17] >TRACE: clssnmEvict: Evicting node 2, birth 2, death 3, killme 1
[ CSSD]2007-05-24 21:15:33.731 [17] >TRACE: clssnmEvict: Evicting Node(2), timeout(31804)
[ CSSD]2007-05-24 21:15:33.731 [17] >TRACE: clssnmSendShutdown: req to node 2, kill time 2704220850
[ CSSD]2007-05-24 21:15:33.732 [17] >TRACE: clssnmDiscHelper: node ahrac2 (2) connection failed
[ CSSD]2007-05-24 21:15:33.732 [17] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2007-05-24 21:15:33.732 [17] >TRACE: clssnmWaitOnEvictions: Node(0) down, LATS(0),timeout(-1590746434)
[ CSSD]2007-05-24 21:15:33.732 [17] >TRACE: clssnmWaitOnEvictions: Node(2) down, LATS(2704189046),timeout(31816)
[ CSSD]2007-05-24 21:15:33.732 [17] >TRACE: clssnmSetupAckWait: Ack message type (15)
[ CSSD]2007-05-24 21:15:33.732 [17] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[ CSSD]2007-05-24 21:15:33.732 [17] >TRACE: clssnmSendUpdate: syncSeqNo(3)
[ CSSD]2007-05-24 21:15:33.732 [17] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(1)
[ CSSD]2007-05-24 21:15:33.732 [10] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2007-05-24 21:15:33.732 [10] >TRACE: clssnmDeactivateNode: node 0 () left cluster

[ CSSD]2007-05-24 21:15:33.732 [10] >TRACE: clssnmUpdateNodeState: node 1, state (3/3) unique (1177308817/1177308817) prevConuni(0) birth (1/1) (old/new)
[ CSSD]2007-05-24 21:15:33.732 [10] >TRACE: clssnmUpdateNodeState: node 2, state (0/0) unique (1177308827/1177308827) prevConuni(1177308827) birth (2/0) (old/new)
[ CSSD]2007-05-24 21:15:33.732 [10] >TRACE: clssnmDeactivateNode: node 2 (ahrac2) left cluster

[ CSSD]2007-05-24 21:15:33.732 [10] >USER: clssnmHandleUpdate: SYNC(3) from node(1) completed
[ CSSD]2007-05-24 21:15:33.732 [10] >USER: clssnmHandleUpdate: NODE 1 (ahrac1) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2007-05-24 21:15:33.810 [20] >TRACE: clssgmReconfigThread: started for reconfig (3)
[ CSSD]2007-05-24 21:15:33.810 [20] >USER: NMEVENT_RECONFIG [00][00][00][02]
[ CSSD]2007-05-24 21:15:33.810 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2
[ CSSD]2007-05-24 21:15:33.810 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(crs_version) birth(2/2)
[ CSSD]2007-05-24 21:15:33.810 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock DAALL_DB type 2
[ CSSD]2007-05-24 21:15:33.810 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DAALL_DB) birth(2/2)
[ CSSD]2007-05-24 21:15:33.810 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[ CSSD]2007-05-24 21:15:33.810 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(CRSDMAIN) birth(2/2)
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(EVMDMAIN) birth(2/2)
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock DBTRADEDB type 2
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DBTRADEDB) birth(2/2)
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_tradedb type 2
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_tradedb type 3
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_tradedb type 2
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_2_tradedb) birth(2/2)
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_tradedb type 3
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_2_tradedb) birth(2/2)
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock DGTRADEDB type 2
[ CSSD]2007-05-24 21:15:33.811 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DGTRADEDB) birth(2/2)
[ CSSD]2007-05-24 21:15:33.812 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock ocr_crs type 2
[ CSSD]2007-05-24 21:15:33.812 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(ocr_crs) birth(2/2)
[ CSSD]2007-05-24 21:15:33.812 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_ahrac1 type 3
[ CSSD]2007-05-24 21:15:33.812 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_ahrac2 type 3
[ CSSD]2007-05-24 21:15:33.812 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_MEMBER_ahrac2) birth(2/2)
[ CSSD]2007-05-24 21:15:33.812 [20] >TRACE: clssgmCleanupGrocks: cleaning up grock IGTRADEDBALL type 2
[ CSSD]2007-05-24 21:15:33.812 [20] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(IGTRADEDBALL) birth(2/2)
[ CSSD]2007-05-24 21:15:33.812 [20] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 3
[ CSSD]2007-05-24 21:15:33.812 [14] >TRACE: clssgmPeerDeactivate: node 2 (ahrac2), death 3, state 0x0 connstate 0xf
[ CSSD]2007-05-24 21:15:33.812 [14] >TRACE: clssgmPeerListener: connects done (1/1)
[ CSSD]2007-05-24 21:15:33.812 [20] >TRACE: clssgmEstablishMasterNode: MASTER for 3 is node(1) birth(1)
[ CSSD]2007-05-24 21:15:33.812 [20] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2007-05-24 21:15:33.813 [20] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 3 with 1 nodes

[ CSSD]CLSS-3001: local node number 1, master node number 1

[ CSSD]2007-05-24 21:15:33.825 [20] >TRACE: clssgmReconfigThread: completed for reconfig(3), with status(1)
[ CSSD]2007-05-24 21:15:34.740 [17] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2007-05-24 21:15:34.740 [17] >TRACE: clssnmDoSyncUpdate: Sync Complete!
[ CSSD]2007-05-24 21:15:34.991 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b31180) proc(100b5ac80) pid() proto(10:2:1:1)
[ CSSD]2007-05-24 21:15:35.296 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b5b3c0) proc(100b438f0) pid() proto(10:2:1:1)
.
.
.
[ CSSD]2007-05-24 21:19:41.850 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b286b0) proc(100b30850) pid() proto(10:
2:1:1)
[ CSSD]2007-05-24 21:19:54.935 [8] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(3) LATS(2704482065) Disk lastSeqNo(3)
[ CSSD]2007-05-24 21:19:54.939 [7] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(2) LATS(2704482069) Disk lastSeqNo(2)
[ CSSD]2007-05-24 21:19:54.978 [6] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(1) LATS(2704482108) Disk lastSeqNo(1)
[ CSSD]2007-05-24 21:19:56.661 [10] >TRACE: clssnmConnComplete: connected to node 2 (con 100b286b0), state 1 birth 0, unique 1180012793/1180012793 prevConuni(0)
[ CSSD]2007-05-24 21:19:57.120 [17] >TRACE: clssnmDoSyncUpdate: Initiating sync 4
[ CSSD]2007-05-24 21:19:57.120 [17] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2007-05-24 21:19:57.120 [17] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
[ CSSD]2007-05-24 21:19:57.120 [17] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[ CSSD]2007-05-24 21:19:57.120 [17] >TRACE: clssnmSendSync: syncSeqNo(4)
[ CSSD]2007-05-24 21:19:57.120 [10] >TRACE: clssnmHandleSync: Acknowledging sync: src[1] srcName[ahrac1] seq[13] sync[4]
[ CSSD]2007-05-24 21:19:57.120 [17] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(2)
[ CSSD]2007-05-24 21:19:57.210 [1] >USER: NMEVENT_SUSPEND [00][00][00][02]
[ CSSD]2007-05-24 21:19:58.130 [17] >TRACE: clssnmWaitForAcks: done, msg type(11)
.
.
.
[ CSSD]2007-05-24 21:19:59.140 [17] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2007-05-24 21:20:00.150 [17] >TRACE: clssnmEvict: Start
[ CSSD]2007-05-24 21:20:00.150 [17] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2007-05-24 21:20:00.150 [17] >TRACE: clssnmWaitOnEvictions: Node(0) down, LATS(0),timeout(-1590480016)
[ CSSD]2007-05-24 21:20:00.150 [17] >TRACE: clssnmSetupAckWait: Ack message type (15)
[ CSSD]2007-05-24 21:20:00.150 [17] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[ CSSD]2007-05-24 21:20:00.150 [17] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2007-05-24 21:20:00.150 [17] >TRACE: clssnmSendUpdate: syncSeqNo(4)
[ CSSD]2007-05-24 21:20:00.150 [10] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2007-05-24 21:20:00.150 [10] >TRACE: clssnmDeactivateNode: node 0 () left cluster

[ CSSD]2007-05-24 21:20:00.150 [10] >TRACE: clssnmUpdateNodeState: node 1, state (3/3) unique (1177308817/1177308817) prevConuni(0) birth (1/1) (old/new)
[ CSSD]2007-05-24 21:20:00.150 [10] >TRACE: clssnmUpdateNodeState: node 2, state (2/2) unique (1180012793/1180012793) prevConuni(0) birth (4/4) (old/new)
[ CSSD]2007-05-24 21:20:00.150 [10] >USER: clssnmHandleUpdate: SYNC(4) from node(1) completed
[ CSSD]2007-05-24 21:20:00.150 [10] >USER: clssnmHandleUpdate: NODE 1 (ahrac1) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2007-05-24 21:20:00.150 [10] >USER: clssnmHandleUpdate: NODE 2 (ahrac2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2007-05-24 21:20:00.150 [17] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(1)
[ CSSD]2007-05-24 21:20:00.190 [21] >TRACE: clssgmReconfigThread: started for reconfig (4)
[ CSSD]2007-05-24 21:20:00.190 [21] >USER: NMEVENT_RECONFIG [00][00][00][06]
[ CSSD]2007-05-24 21:20:00.190 [21] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 4
[ CSSD]2007-05-24 21:20:00.192 [14] >TRACE: clssgmInitialRecv: (100b3d4b0) accepted a new connection from node 2 born at 4 active (2, 2), vers (10,3,1,2)
[ CSSD]2007-05-24 21:20:00.192 [14] >TRACE: clssgmInitialRecv: conns done (2/2)
[ CSSD]2007-05-24 21:20:00.192 [21] >TRACE: clssgmEstablishMasterNode: MASTER for 4 is node(1) birth(1)
[ CSSD]2007-05-24 21:20:00.192 [21] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2007-05-24 21:20:00.196 [21] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 4 with 2 nodes

[ CSSD]CLSS-3001: local node number 1, master node number 1

[ CSSD]2007-05-24 21:20:00.197 [21] >TRACE: clssgmReconfigThread: completed for reconfig(4), with status(1)
[ CSSD]2007-05-24 21:20:01.160 [17] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2007-05-24 21:20:01.160 [17] >TRACE: clssnmDoSyncUpdate: Sync Complete!
[ CSSD]2007-05-24 21:20:07.797 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b28450) proc(100b47b10) pid() proto(10:2:1:1)

節點2上上$ORACLE_CRS/log/hostname/cssd目錄下ossd.log檔案

[ CSSD]2007-05-24 21:19:54.209 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[ CSSD]2007-05-24 21:19:54.209 >USER: CSS daemon log for node ahrac2, number 2, in cluster crs
[ CSSD]2007-05-24 21:19:54.238 [1] >TRACE: clssscmain: local-only set to false
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ahrac2DBG_CSSD))
[ CSSD]2007-05-24 21:19:54.302 [1] >TRACE: clssnmReadNodeInfo: added node 1 (ahrac1) to cluster
[ CSSD]2007-05-24 21:19:54.309 [1] >TRACE: clssnmReadNodeInfo: added node 2 (ahrac2) to cluster
[ CSSD]2007-05-24 21:19:54.330 [5] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
[ CSSD]2007-05-24 21:19:54.331 [1] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[ CSSD]2007-05-24 21:19:54.332 [1] >TRACE: clssnmInitNMInfo: misscount set to 30
[ CSSD]2007-05-24 21:19:54.335 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/vx/rdsk/datadg/vot1)
[ CSSD]2007-05-24 21:19:54.338 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (1//dev/vx/rdsk/datadg/vot2)
[ CSSD]2007-05-24 21:19:54.340 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (2//dev/vx/rdsk/datadg/vot3)
[ CSSD]2007-05-24 21:19:56.350 [7] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (1//dev/vx/rdsk/datadg/vot2)
[ CSSD]2007-05-24 21:19:56.382 [8] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (2//dev/vx/rdsk/datadg/vot3)
[ CSSD]2007-05-24 21:19:56.382 [6] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/vx/rdsk/datadg/vot1)
[ CSSD]2007-05-24 21:19:56.400 [1] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2007-05-24 21:19:56.408 [7] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(4) wrtcnt(2614157) LATS(0) Disk lastSeqNo(2614157)
[ CSSD]2007-05-24 21:19:56.417 [8] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(4) wrtcnt(2614157) LATS(0) Disk lastSeqNo(2614157)
[ CSSD]2007-05-24 21:19:56.425 [6] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(4) wrtcnt(2614157) LATS(0) Disk lastSeqNo(2614157)
[ CSSD]2007-05-24 21:19:56.431 [10] >TRACE: clssnmconnect: connecting to node 2, flags 0x0001, connector 1
[ CSSD]2007-05-24 21:19:56.436 [10] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[ CSSD]2007-05-24 21:19:56.436 [10] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 0
[ CSSD]2007-05-24 21:19:56.440 [10] >TRACE: clssnmConnComplete: connected to node 1 (con 100775550), state 3 birth 0, unique 1177308817/1177308817 prevConuni(0)
[ CSSD]2007-05-24 21:19:56.518 [11] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_2))
[ CSSD]2007-05-24 21:19:56.518 [11] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ahrac2_crs))
[ CSSD]2007-05-24 21:19:56.523 [15] >TRACE: clssnmPollingThread: Connection complete
[ CSSD]2007-05-24 21:19:56.523 [16] >TRACE: clssnmSendingThread: Connection complete
[ CSSD]2007-05-24 21:19:56.523 [17] >TRACE: clssnmRcfgMgrThread: Connection complete
[ CSSD]2007-05-24 21:19:56.523 [17] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2007-05-24 21:19:56.523 [17] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
[ CSSD]2007-05-24 21:19:56.899 [10] >TRACE: clssnmHandleSync: Acknowledging sync: src[1] srcName[ahrac1] seq[13] sync[4]
[ CSSD]2007-05-24 21:19:57.530 [17] >TRACE: clssnmRcfgMgrThread: lastleader(1) unique(1180012793)
[ CSSD]2007-05-24 21:19:57.909 [10] >TRACE: clssnmSendVoteInfo: node(1) syncSeqNo(4)
[ CSSD]2007-05-24 21:19:59.929 [10] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2007-05-24 21:19:59.929 [10] >TRACE: clssnmDeactivateNode: node 0 () left cluster

[ CSSD]2007-05-24 21:19:59.929 [10] >TRACE: clssnmUpdateNodeState: node 1, state (4/3) unique (1177308817/1177308817) prevConuni(0) birth (0/1) (old/new)
[ CSSD]2007-05-24 21:19:59.929 [10] >TRACE: clssnmUpdateNodeState: node 2, state (1/2) unique (1180012793/1180012793) prevConuni(0) birth (0/4) (old/new)
[ CSSD]2007-05-24 21:19:59.929 [10] >USER: clssnmHandleUpdate: SYNC(4) from node(1) completed
[ CSSD]2007-05-24 21:19:59.929 [10] >USER: clssnmHandleUpdate: NODE 1 (ahrac1) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2007-05-24 21:19:59.929 [10] >USER: clssnmHandleUpdate: NODE 2 (ahrac2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2007-05-24 21:19:59.970 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2007-05-24 21:19:59.970 [18] >TRACE: clssgmReconfigThread: started for reconfig (4)
[ CSSD]2007-05-24 21:19:59.970 [18] >USER: NMEVENT_RECONFIG [00][00][00][06]
[ CSSD]2007-05-24 21:19:59.970 [18] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 4
[ CSSD]2007-05-24 21:19:59.970 [14] >TRACE: clssgmInitialRecv: (100b1d0d0) accepted a new connection from node 1 born at 1 active (2, 2), vers (10,3,1,2)
[ CSSD]2007-05-24 21:19:59.970 [14] >TRACE: clssgmInitialRecv: conns done (2/2)
[ CSSD]2007-05-24 21:19:59.971 [18] >TRACE: clssgmEstablishMasterNode: MASTER for 4 is node(1) birth(1)
[ CSSD]2007-05-24 21:19:59.971 [18] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2007-05-24 21:19:59.975 [14] >TRACE: clssgmHandleDBDone(): src/dest (1/65535) size(72) incarn 4
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 4 with 2 nodes

[ CSSD]CLSS-3001: local node number 2, master node number 1

[ CSSD]2007-05-24 21:19:59.975 [18] >TRACE: clssgmReconfigThread: completed for reconfig(4), with status(1)
[ CSSD]2007-05-24 21:20:00.426 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b23560) proc(100b28170) pid() proto(10: 2:1:1)

從上面的日誌可以看到,由於作業系統的日期修改導致node1和node2在30秒內無法建立通訊連線,從而使得Oracle的Rac將例項2重新啟動。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/4227/viewspace-69287/,如需轉載,請註明出處,否則將追究法律責任。

相關文章