One RAC node down and inaccessible after a memory failure
Environment:
Application: SAP ERP 6.0
Database: Oracle 11g RAC + ASM
During a routine inspection, transaction code DB02 was run inside SAP.
Checking the clusterware status showed only one node left:
bash-3.00$ ./crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ACFS.dg
ONLINE ONLINE r3prddb01
ora.ARCH.dg
ONLINE ONLINE r3prddb01
ora.DATA.dg
ONLINE ONLINE r3prddb01
ora.LISTENER.lsnr
ONLINE ONLINE r3prddb01
ora.MLOG.dg
ONLINE ONLINE r3prddb01
ora.OLOG.dg
ONLINE ONLINE r3prddb01
ora.RECO.dg
ONLINE ONLINE r3prddb01
ora.VCR.dg
ONLINE ONLINE r3prddb01
ora.acfs.acfs.acfs
ONLINE ONLINE r3prddb01
ora.asm
ONLINE ONLINE r3prddb01
ora.gsd
OFFLINE OFFLINE r3prddb01
ora.net1.network
ONLINE ONLINE r3prddb01
ora.ons
ONLINE ONLINE r3prddb01
ora.registry.acfs
ONLINE ONLINE r3prddb01
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE r3prddb01
ora.cvu
1 ONLINE ONLINE r3prddb01
ora.oc4j
1 ONLINE ONLINE r3prddb01
ora.p01.db
1 ONLINE ONLINE r3prddb01 Open
2 ONLINE OFFLINE
ora.r3prddb01.vip
1 ONLINE ONLINE r3prddb01
ora.r3prddb02.vip
1 ONLINE INTERMEDIATE r3prddb01 FAILED OVER
ora.scan1.vip
1 ONLINE ONLINE r3prddb01
bash-3.00$
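A quick way to spot trouble in output like the above is to filter for any state other than plain ONLINE. The helper below is only a sketch: the file name and sample lines are stand-ins for real `crsctl status resource -t` output.

```shell
# Sketch: flag resources whose state is not plain ONLINE in saved
# `crsctl status resource -t` output. /tmp/crs_status.txt and its
# contents are stand-ins for the real command's output.
check_states() {
  grep -E 'OFFLINE|INTERMEDIATE' "$1"
}

cat > /tmp/crs_status.txt <<'EOF'
ora.gsd
      OFFLINE OFFLINE r3prddb01
ora.p01.db
      2 ONLINE OFFLINE
ora.r3prddb02.vip
      1 ONLINE INTERMEDIATE r3prddb01 FAILED OVER
EOF

check_states /tmp/crs_status.txt
```

Note that ora.gsd showing OFFLINE is normal on 11gR2 (it exists only for 9i compatibility); the lines that matter here are the offline database instance and the failed-over VIP.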
Alert log of the database instance on db01:
Sat Aug 23 11:17:20 2014
Reconfiguration started (old inc 4, new inc 6)
List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Aug 23 11:17:20 2014
LMS 1: 6 GCS shadows cancelled, 0 closed, 0 Xw survived
Sat Aug 23 11:17:20 2014
LMS 0: 5 GCS shadows cancelled, 1 closed, 0 Xw survived
Sat Aug 23 11:17:20 2014
LMS 2: 5 GCS shadows cancelled, 2 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Sat Aug 23 11:17:20 2014
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
parallel recovery started with 31 processes
Started redo scan
Completed redo scan
read 58319 KB redo, 11434 data blocks need recovery
Started redo application at
Thread 2: logseq 49560, block 81359
Recovery of Online Redo Log: Thread 2 Group 44 Seq 49560 Reading mem 0
Mem# 0: +OLOG/p01/onlinelog/log_g44m1.dbf
Mem# 1: +MLOG/p01/onlinelog/log_g44m2.dbf
Sat Aug 23 11:17:25 2014
Setting Resource Manager plan SCHEDULER[0x447BF]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Sat Aug 23 11:17:25 2014
minact-scn: master found reconf/inst-rec before recscn scan old-inc#:6 new-inc#:6
Completed redo application of 48.31MB
Completed instance recovery at
Thread 2: logseq 49560, block 197998, scn 11518227963
10738 data blocks read, 11483 data blocks written, 58319 redo k-bytes read
Thread 2 advanced to log sequence 49561 (thread recovery)
Redo thread 2 internally disabled at seq 49561 (SMON)
Sat Aug 23 11:17:27 2014
Archived Log entry 91800 added for thread 2 sequence 49560 ID 0x592ddd4a dest 1:
Sat Aug 23 11:17:27 2014
ARC0: Archiving disabled thread 2 sequence 49561
Archived Log entry 91801 added for thread 2 sequence 49561 ID 0x592ddd4a dest 1:
minact-scn: master continuing after IR
minact-scn: Master considers inst:2 dead
Sat Aug 23 11:17:28 2014
Beginning log switch checkpoint up to RBA [0xa4e4.2.10], SCN: 11518240393
Thread 1 advanced to log sequence 42212 (LGWR switch)
Current log# 35 seq# 42212 mem# 0: +OLOG/p01/onlinelog/log_g35m1.dbf
Current log# 35 seq# 42212 mem# 1: +MLOG/p01/onlinelog/log_g35m2.dbf
Archived Log entry 91802 added for thread 1 sequence 42211 ID 0x592ddd4a dest 1:
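When reading a long alert log after a crash like this, it helps to pull out just the instance-recovery milestones. A minimal sketch; the sample file below mimics the entries quoted above, and in a real system you would point the grep at the instance's actual alert log instead.

```shell
# Sketch: extract instance-recovery milestones from an alert log.
# /tmp/alert_sample.log mimics the entries quoted above; point this
# at the real alert log under the diagnostic dest in practice.
cat > /tmp/alert_sample.log <<'EOF'
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
read 58319 KB redo, 11434 data blocks need recovery
Completed redo application of 48.31MB
Completed instance recovery at
EOF

grep -iE 'instance recovery|need recovery|Completed redo' /tmp/alert_sample.log
```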
Alert log of the ASM instance ASM01 on db01:
Sat Aug 23 11:17:20 2014
Reconfiguration started (old inc 4, new inc 6)
List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* dead instance detected - domain 1 invalid = TRUE
* dead instance detected - domain 2 invalid = TRUE
* dead instance detected - domain 3 invalid = TRUE
* dead instance detected - domain 4 invalid = TRUE
* dead instance detected - domain 5 invalid = TRUE
* dead instance detected - domain 6 invalid = TRUE
* dead instance detected - domain 7 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Aug 23 11:17:20 2014
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Sat Aug 23 11:17:20 2014
NOTE: SMON starting instance recovery for group ACFS domain 1 (mounted)
NOTE: F1X0 found on disk 1 au 49 fcn 0.12248
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
NOTE: starting recovery of thread=2 ckpt=19.43 group=1 (ACFS)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 1 (ACFS)
NOTE: SMON successfully validated lock domain 1
NOTE: advancing ckpt for thread=2 ckpt=19.43
NOTE: SMON did instance recovery for group ACFS domain 1
Sat Aug 23 11:17:20 2014
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group ARCH domain 2 (mounted)
NOTE: F1X0 found on disk 9 au 113 fcn 0.41343439
NOTE: starting recovery of thread=2 ckpt=77.3254 group=2 (ARCH)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 2 (ARCH)
NOTE: SMON successfully validated lock domain 2
NOTE: advancing ckpt for thread=2 ckpt=77.3254
NOTE: SMON did instance recovery for group ARCH domain 2
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group DATA domain 3 (mounted)
NOTE: F1X0 found on disk 15 au 60241 fcn 0.5143392
NOTE: starting recovery of thread=2 ckpt=22.3858 group=3 (DATA)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 3 (DATA)
NOTE: SMON successfully validated lock domain 3
NOTE: advancing ckpt for thread=2 ckpt=22.3858
NOTE: SMON did instance recovery for group DATA domain 3
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group MLOG domain 4 (mounted)
NOTE: F1X0 found on disk 3 au 639 fcn 0.120137
NOTE: starting recovery of thread=2 ckpt=23.2161 group=4 (MLOG)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 4 (MLOG)
NOTE: SMON successfully validated lock domain 4
NOTE: advancing ckpt for thread=2 ckpt=23.2161
NOTE: SMON did instance recovery for group MLOG domain 4
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group OLOG domain 5 (mounted)
NOTE: F1X0 found on disk 3 au 637 fcn 0.121291
NOTE: starting recovery of thread=2 ckpt=23.2261 group=5 (OLOG)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 5 (OLOG)
NOTE: SMON successfully validated lock domain 5
NOTE: advancing ckpt for thread=2 ckpt=23.2261
NOTE: SMON did instance recovery for group OLOG domain 5
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group RECO domain 6 (mounted)
NOTE: F1X0 found on disk 11 au 11 fcn 0.2264
NOTE: starting recovery of thread=2 ckpt=19.6 group=6 (RECO)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 6 (RECO)
NOTE: SMON successfully validated lock domain 6
NOTE: advancing ckpt for thread=2 ckpt=19.6
NOTE: SMON did instance recovery for group RECO domain 6
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group VCR domain 7 (mounted)
NOTE: F1X0 found on disk 2 au 177 fcn 0.1216
NOTE: starting recovery of thread=2 ckpt=16.13 group=7 (VCR)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 7 (VCR)
NOTE: SMON successfully validated lock domain 7
NOTE: advancing ckpt for thread=2 ckpt=16.13
NOTE: SMON did instance recovery for group VCR domain 7
Judging from the logs above, the failure occurred at 11:17 on the 23rd (August). We contacted colleagues at the data center, who found an alarm on the M9000 for this machine: a memory failure. A repair request was filed.
The machine went in for repair.
The next day the memory fault on node 02 was fixed and its operating system was booted. After a short wait, clusterware came up on its own and the database instance on 02 recovered as well.
Clusterware on this machine is set to start automatically with the operating system, which is also the default after installation.
To turn autostart off, run crsctl disable crs; crsctl enable crs turns it back on.
With autostart disabled, the stack must be started manually with crsctl start crs.
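The autostart toggle described above can be wrapped in a tiny helper. This is only a sketch: crsctl needs root and a real Grid Infrastructure install, so the DRY_RUN flag (an invention for illustration, not a crsctl feature) just prints what would run.

```shell
# Sketch: toggle Clusterware autostart. DRY_RUN is a made-up flag for
# illustration; with it set, the function only prints the command,
# since crsctl requires root and a real Grid Infrastructure install.
crs_autostart() {   # usage: crs_autostart enable|disable
  case "$1" in
    enable|disable) cmd="crsctl $1 crs" ;;
    *) echo "usage: crs_autostart enable|disable" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: $cmd"
  else
    $cmd    # unquoted on purpose so the command string word-splits
  fi
}

DRY_RUN=1
crs_autostart disable   # prints: would run: crsctl disable crs
crs_autostart enable    # prints: would run: crsctl enable crs
```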
bash-3.00$ ./crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ACFS.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.ARCH.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.DATA.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.LISTENER.lsnr
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.MLOG.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.OLOG.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.RECO.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.VCR.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.acfs.acfs.acfs
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.asm
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.gsd
OFFLINE OFFLINE r3prddb01
OFFLINE OFFLINE r3prddb02
ora.net1.network
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.ons
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.registry.acfs
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE r3prddb01
ora.cvu
1 ONLINE ONLINE r3prddb01
ora.oc4j
1 ONLINE ONLINE r3prddb01
ora.p01.db
1 ONLINE ONLINE r3prddb01 Open
2 ONLINE ONLINE r3prddb02 Open
ora.r3prddb01.vip
1 ONLINE ONLINE r3prddb01
ora.r3prddb02.vip
1 ONLINE ONLINE r3prddb02
ora.scan1.vip
1 ONLINE ONLINE r3prddb01
Transaction DB02 in SAP now shows both instances.
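The two-instance state can also be confirmed by counting ONLINE/Open entries for the database resource. A sketch, with saved output standing in for the live command:

```shell
# Sketch: count database instances that are ONLINE and Open in saved
# crsctl output. /tmp/db_res.txt stands in for the live output of a
# command such as `crsctl status resource ora.p01.db -t`.
cat > /tmp/db_res.txt <<'EOF'
ora.p01.db
      1 ONLINE ONLINE r3prddb01 Open
      2 ONLINE ONLINE r3prddb02 Open
EOF

open_count=$(grep -c 'ONLINE ONLINE .* Open' /tmp/db_res.txt)
echo "open instances: $open_count"   # prints: open instances: 2
```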
From the ITPUB blog; link: http://blog.itpub.net/27771627/viewspace-1257887/. Please credit the source when reposting; otherwise legal liability may be pursued.