Oracle RAC (Cluster) Reconfiguration Notes (3)
alert.log on node2:
Sat Jul 09 16:41:28 CST 2011
Reconfiguration started (old inc 2, new inc 4)
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Jul 09 16:41:29 CST 2011
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Sat Jul 09 16:41:30 CST 2011
LMS 0: 5074 GCS shadows traversed, 2242 replayed
Sat Jul 09 16:41:30 CST 2011
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
alert.log on node1 (node2 was stopped with shutdown abort):
Sat Jul 09 17:32:37 CST 2011
Reconfiguration started (old inc 4, new inc 6)
List of nodes:
0
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Jul 09 17:32:38 CST 2011
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Sat Jul 09 17:32:39 CST 2011
LMS 0: 5947 GCS shadows traversed, 0 replayed
Sat Jul 09 17:32:39 CST 2011
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Sat Jul 09 17:32:40 CST 2011
Instance recovery: looking for dead threads
Sat Jul 09 17:32:40 CST 2011
Beginning instance recovery of 1 threads
Sat Jul 09 17:32:42 CST 2011
Started redo scan
Sat Jul 09 17:32:46 CST 2011
Completed redo scan
3 redo blocks read, 5 data blocks need recovery
Sat Jul 09 17:32:46 CST 2011
Started redo application at
Thread 2: logseq 5, block 1884
Sat Jul 09 17:32:47 CST 2011
Recovery of Online Redo Log: Thread 2 Group 3 Seq 5 Reading mem 0
Mem# 0: +RAC_DISK/racdb/onlinelog/group_3.258.751759681
Sat Jul 09 17:32:47 CST 2011
Completed redo application
Sat Jul 09 17:32:47 CST 2011
Completed instance recovery at
Thread 2: logseq 5, block 1887, scn 532837
3 data blocks read, 5 data blocks written, 3 redo blocks read
Sat Jul 09 17:32:48 CST 2011
Thread 2 advanced to log sequence 6 (thread recovery)
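The two alert.log excerpts above follow the same step sequence, so the key facts of each reconfiguration (old and new incarnation, surviving node list, whether a dead instance was detected) can be pulled out mechanically. The following is a minimal sketch that assumes the line formats shown in these excerpts; the function name `parse_reconfigurations` and the dictionary keys are illustrative choices, not anything Oracle provides.

```python
import re

# Matches e.g. "Reconfiguration started (old inc 4, new inc 6)"
RECONFIG_RE = re.compile(r"Reconfiguration started \(old inc (\d+), new inc (\d+)\)")


def parse_reconfigurations(lines):
    """Return one dict per reconfiguration found in alert.log lines."""
    events = []
    current = None          # the reconfiguration currently being read
    expect_nodes = False    # True right after a "List of nodes:" line
    for raw in lines:
        line = raw.strip()
        match = RECONFIG_RE.search(line)
        if match:
            current = {
                "old_inc": int(match.group(1)),
                "new_inc": int(match.group(2)),
                "nodes": [],
                "dead_instance": False,
                "complete": False,
            }
            events.append(current)
            expect_nodes = False
            continue
        if current is None:
            continue
        if line == "List of nodes:":
            expect_nodes = True
        elif expect_nodes:
            # The line after "List of nodes:" holds the node numbers.
            current["nodes"] = [int(n) for n in line.split()]
            expect_nodes = False
        elif "dead instance detected" in line:
            current["dead_instance"] = True
        elif line == "Reconfiguration complete":
            current["complete"] = True
            current = None
    return events


# A shortened copy of the node1 excerpt above.
sample = """\
Sat Jul 09 17:32:37 CST 2011
Reconfiguration started (old inc 4, new inc 6)
List of nodes:
0
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Reconfiguration complete
""".splitlines()

events = parse_reconfigurations(sample)
print(events[0]["old_inc"], events[0]["new_inc"])        # 4 6
print(events[0]["nodes"], events[0]["dead_instance"])    # [0] True
```

Run against the node2 excerpt, the same function would report nodes 0 and 1 and no dead instance, which is the quickest way to tell a join from a failure.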
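An important service is involved here, the Cluster Group Service (CGS):
LMON: the LMON processes of all instances communicate with one another periodically to check the health of every node in the cluster, and LMON drives cluster reconfiguration when a node fails. The service it provides is called Cluster Group Service (CGS). Oracle Clusterware relies on the Process Monitor Daemon to resolve split-brain, but if an instance on some node hangs abnormally, the problem may go undetected when only the network, OS, and Clusterware layers are examined. The database therefore needs its own self-monitoring mechanism. The LMON process provides the Node Monitor function, which records the health of each node at the application layer. Node health is recorded in a bitmap in the GRD, one bit per node: 0 means the node is down, 1 means it is running normally. The LMON processes on all nodes communicate with one another to confirm that this bitmap is consistent.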
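The one-bit-per-node idea can be illustrated with a toy model. This is only a sketch: the real layout of the bitmap inside the GRD is internal to Oracle, and the helper names `set_node`, `alive_nodes`, and `views_consistent` are hypothetical.

```python
def set_node(bitmap: int, node_id: int, alive: bool) -> int:
    """Return a copy of the bitmap with the bit for node_id set (1) or cleared (0)."""
    mask = 1 << node_id
    return (bitmap | mask) if alive else (bitmap & ~mask)


def alive_nodes(bitmap: int, node_count: int) -> list[int]:
    """List the node ids whose bit is 1."""
    return [n for n in range(node_count) if (bitmap >> n) & 1]


def views_consistent(views: dict[int, int]) -> bool:
    """True when every node reports the same bitmap."""
    return len(set(views.values())) <= 1


bitmap = 0
bitmap = set_node(bitmap, 0, True)
bitmap = set_node(bitmap, 1, True)
print(alive_nodes(bitmap, 2))            # [0, 1]

bitmap = set_node(bitmap, 1, False)      # node 1 goes down
print(alive_nodes(bitmap, 2))            # [0]

# Node 0 already sees node 1 as down, node 1 still believes both are up.
print(views_consistent({0: 0b01, 1: 0b11}))   # False
```

A disagreement such as the last line is the kind of condition that would have to be settled before the membership can be considered stable.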
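LMON can cooperate with the underlying Clusterware or work on its own. When LMON detects an instance-level split-brain, it expects Clusterware to resolve it, but RAC does not assume that Clusterware will always succeed. LMON therefore does not wait indefinitely for the Clusterware layer: once the wait times out, the LMON process automatically triggers IMR (Instance Membership Recovery). IMR can be regarded as the split-brain and I/O fencing mechanism that Oracle provides at the database layer.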
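The "wait for Clusterware, then fall back to IMR" behaviour described above is a bounded wait. The sketch below shows that pattern only; it is not Oracle's implementation, and the function name `resolve_split_brain`, the callback, and the returned labels are all assumptions made for illustration. The clock and sleep functions are injectable so the example runs instantly.

```python
import time


def resolve_split_brain(clusterware_resolved, timeout_sec, poll_sec=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll a (hypothetical) Clusterware check until it succeeds or the timeout expires.

    Returns "clusterware" if the lower layer resolved the problem in time,
    otherwise "imr" to indicate that the database layer takes over.
    """
    deadline = clock() + timeout_sec
    while clock() < deadline:
        if clusterware_resolved():
            return "clusterware"
        sleep(poll_sec)
    return "imr"


class FakeClock:
    """Deterministic clock so the example does not really sleep."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake = FakeClock()
# Clusterware never reports success, so the wait expires.
print(resolve_split_brain(lambda: False, 3, clock=fake.time, sleep=fake.sleep))  # imr

fake = FakeClock()
print(resolve_split_brain(lambda: True, 3, clock=fake.time, sleep=fake.sleep))   # clusterware
```

The LMON trace later in this post prints the timeouts in effect for this system under "CGS/IMR TIMEOUTS".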
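LMON relies mainly on two kinds of heartbeat for health monitoring:
1. The network heartbeat between nodes.
2. The disk heartbeat in the control file. The CKPT process of each instance updates the Checkpoint Progress Record block in the control file every 3 seconds. Because the control file is shared, instances can check whether the others are updating it on time and judge their state from that.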
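The disk heartbeat check amounts to comparing each instance's last update time with the current time. Here is a minimal sketch of that comparison. The 3-second interval comes from the text above; the `missed_allowed` threshold, the function name `stale_instances`, and the timestamps are made up for illustration and do not reflect Oracle's actual tolerance.

```python
from datetime import datetime, timedelta

# From the text: CKPT updates its record every 3 seconds.
HEARTBEAT_INTERVAL = timedelta(seconds=3)


def stale_instances(last_update, now, missed_allowed=10):
    """Return the ids of instances whose last update is older than the allowed gap.

    last_update maps instance id -> datetime of its most recent heartbeat.
    missed_allowed is an assumed tolerance, expressed in heartbeat intervals.
    """
    limit = HEARTBEAT_INTERVAL * missed_allowed
    return sorted(inst for inst, ts in last_update.items() if now - ts > limit)


# Illustrative timestamps only.
now = datetime(2011, 7, 9, 17, 32, 33)
last_update = {
    0: datetime(2011, 7, 9, 17, 32, 31),   # 2 seconds ago, healthy
    1: datetime(2011, 7, 9, 17, 31, 50),   # 43 seconds ago, stale
}
print(stale_instances(last_update, now))   # [1]
```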
The corresponding LMON trace log:
*** 2011-07-09 16:41:25.412
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 2 0.
*** 2011-07-09 16:41:25.570
Name Service frozen
kjxgmcs: Setting state to 2 1.
kjxgrssvote: reconfig bitmap chksum 0xccd0ae50 cnt 2 master 0 ret 0
kjxggpoll: change poll time to 50 ms
*** 2011-07-09 16:41:25.665
Obtained RR update lock for sequence 3, RR seq 2
*** 2011-07-09 16:41:25.752
Voting results, upd 0, seq 4, bitmap: 0 1
CGS/IMR TIMEOUTS:
CSS recovery timeout = 71 sec
IMR Reconfig timeout = 300 sec
CGS rcfg timeout = 300 sec
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 4 2.
kjfmuin: bitmap 0 1
kjfmmhi: received msg from 0 (inc 2)
kjfmmhi: received msg from 1 (inc 4)
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 4 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 4 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 4 5.
Name Service normal
Name Service recovery done
*** 2011-07-09 16:41:27.200
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 4 6.
kjxggpoll: change poll time to 600 ms
*** 2011-07-09 16:41:28.279
kjfcrfg: DRM window size = 128->128 (min lognb = 10)
*** 2011-07-09 16:41:28.279
Reconfiguration started (old inc 2, new inc 4)
Synchronization timeout interval: 900 sec
List of nodes:
0 1
Undo tsn affinity 1
*** 2011-07-09 16:41:28.311
*** 2011-07-09 16:41:28.311
kjfcrfg: query of NESTED_RECONFIGURATION for node 1 failed with 7
Global Resource Directory frozen
node 0
node 1
release 10 2 0 5
asby init, 0/0/x2
asby returns, 0/0/x2/false
* Domain maps before reconfiguration:
* DOMAIN 0 (valid 1): 0
* End of domain mappings
* Domain maps after recomputation:
* DOMAIN 0 (valid 1): 0 1
* End of domain mappings
Dead inst
Join inst 1
Exist inst 0
Active Sendback Threshold = 50 %
Communication channels reestablished
sent syncr inc 4 lvl 1 to 0 (4,5/0/0)
sent synca inc 4 lvl 1 (4,5/0/0)
received all domreplay (4.6)
sent master 0 (4.6)
*** 2011-07-09 16:41:29.535
KJBDOMHVMAP: BEGINS
*** 2011-07-09 16:41:29.560
KJBDOMHVMAP: ENDS
sent dom info (4.6)
sent hv info (4.6)
sent syncr inc 4 lvl 2 to 0 (4,7/0/0)
sent synca inc 4 lvl 2 (4,7/0/0)
Master broadcasted resource hash value bitmaps
* kjfcrfg: domain 0 valid, valid_ver = 4
Non-local Process blocks cleaned out
Set master node info
sent syncr inc 4 lvl 3 to 0 (4,13/0/0)
sent synca inc 4 lvl 3 (4,13/0/0)
Submitted all remote-enqueue requests
kjfcrfg: Number of mesgs sent to node 1 = 774
sent syncr inc 4 lvl 4 to 0 (4,15/0/0)
sent synca inc 4 lvl 4 (4,15/0/0)
Dwn-cvts replayed, VALBLKs dubious
sent syncr inc 4 lvl 5 to 0 (4,18/0/0)
sent synca inc 4 lvl 5 (4,18/0/0)
All grantable enqueues granted
sent syncr inc 4 lvl 6 to 0 (4,20/0/0)
sent synca inc 4 lvl 6 (4,20/0/0)
Submitted all GCS cache requests
sent syncr inc 4 lvl 7 to 0 (4,22/0/0)
sent synca inc 4 lvl 7 (4,22/0/0)
Post SMON to start 1st pass IR
Fix write in gcs resources
sent syncr inc 4 lvl 8 to 0 (4,24/0/0)
sent synca inc 4 lvl 8 (4,24/0/0)
*** 2011-07-09 16:41:31.006
Reconfiguration complete
*** 2011-07-09 17:32:33.682
kjxgmpoll reconfig bitmap: 0
*** 2011-07-09 17:32:33.745
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 4 0.
*** 2011-07-09 17:32:34.157
Name Service frozen
kjxgmcs: Setting state to 4 1.
kjxgrssvote: reconfig bitmap chksum 0x6668604e cnt 1 master 0 ret 0
kjxggpoll: change poll time to 50 ms
*** 2011-07-09 17:32:34.464
Obtained RR update lock for sequence 5, RR seq 4
*** 2011-07-09 17:32:37.539
Voting results, upd 0, seq 6, bitmap: 0
CGS/IMR TIMEOUTS:
CSS recovery timeout = 71 sec
IMR Reconfig timeout = 300 sec
CGS rcfg timeout = 300 sec
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 6 2.
kjfmSendAbortInstMsg: send an abort message to node 1
kjfmSendAbortInstMsg: unique id 0x0 reason 0x1
kjfmuin: bitmap 0
kjfmmhi: received msg from 0 (inc 2)
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 6 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 6 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 6 5.
Name Service normal
Name Service recovery done
*** 2011-07-09 17:32:37.598
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 6 6.
kjxggpoll: change poll time to 600 ms
kjfmact: call ksimdic on instance (1)
*** 2011-07-09 17:32:37.843
kjfcrfg: DRM window size = 128->128 (min lognb = 10)
*** 2011-07-09 17:32:37.845
Reconfiguration started (old inc 4, new inc 6)
Synchronization timeout interval: 900 sec
List of nodes:
0
Undo tsn affinity 1
*** 2011-07-09 17:32:37.906
Global Resource Directory frozen
node 0
asby init, 0/0/x2
asby returns, 0/0/x2/false
* Domain maps before reconfiguration:
* DOMAIN 0 (valid 1): 0 1
* End of domain mappings
* kjbdomrcfg2: domain 0 invalid = TRUE
* Domain maps after recomputation:
* DOMAIN 0 (valid 0): 0
* End of domain mappings
Active Sendback Threshold = 50 %
Communication channels reestablished
sent syncr inc 6 lvl 1 to 0 (6,5/0/0)
sent syncr inc 6 lvl 2 to 0 (6,7/0/0)
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Set master node info
sent syncr inc 6 lvl 3 to 0 (6,13/0/0)
Submitted all remote-enqueue requests
sent syncr inc 6 lvl 4 to 0 (6,15/0/0)
Dwn-cvts replayed, VALBLKs dubious
sent syncr inc 6 lvl 5 to 0 (6,18/0/0)
All grantable enqueues granted
sent syncr inc 6 lvl 6 to 0 (6,20/0/0)
*** 2011-07-09 17:32:39.351
Post SMON to start 1st pass IR
Submitted all GCS cache requests
sent syncr inc 6 lvl 7 to 0 (6,22/0/0)
Fix write in gcs resources
sent syncr inc 6 lvl 8 to 0 (6,24/0/0)
*** 2011-07-09 17:32:39.673
Reconfiguration complete
* domain 0 valid?: 0
kjxgfipccb: msg 0x0xb7db2a6c, mbo 0x0xb7db2a68, type 19, ack 0, ref 0, stat 34
Source: ITPUB blog, http://blog.itpub.net/758322/viewspace-702235/. Please credit the source when reposting.