[RAC] Handling an abnormal termination of the RAC node 1 instance caused by an ASM instance failure

Posted by secooler on 2011-05-08
  The instance on the first node of a RAC stopped abnormally because its ASM instance failed. I am recording the incident here.

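1. Symptom
On a two-node RAC, the database instance on the first node stopped. On inspection, the ASM instance on that node had also terminated abnormally.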

2. Analysis
Check the alert logs of the database instance and the ASM instance for clues.
1) Database alert log contents
Sun May  8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_asmb_21478.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Sun May  8 06:59:06 2011
ASMB: terminating instance due to error 15064
Sun May  8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms1_21275.trc:
ORA-15064: communication failure with ASM instance
Sun May  8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lgwr_21283.trc:
ORA-15064: communication failure with ASM instance
Sun May  8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms0_21271.trc:
ORA-15064: communication failure with ASM instance
Sun May  8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lmon_21267.trc:
ORA-15064: communication failure with ASM instance
Sun May  8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lmd0_21269.trc:
ORA-15064: communication failure with ASM instance
Sun May  8 06:59:06 2011
System state dump is made for local instance
System State dumped to trace file /oracle/app/oracle/admin/racdb/bdump/racdb1_diag_21263.trc
Sun May  8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mman_21279.trc:
ORA-15064: communication failure with ASM instance
Sun May  8 06:59:07 2011
Shutting down instance (abort)
License high water mark = 7
Sun May  8 06:59:07 2011
Trace dumping is performing id=[cdmp_20110508065906]
Sun May  8 06:59:11 2011
Instance terminated by ASMB, pid = 21478
Sun May  8 06:59:12 2011
Instance terminated by USER, pid = 4110
Mon May  9 13:44:05 2011
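
As a rough aid for reading an excerpt like the one above, the sketch below tallies the ORA codes and picks out the process that terminated the instance. It is an illustration in Python and was not part of the original troubleshooting. The function name `summarize_alert` and the regular expressions are assumptions based only on the 10g-style lines shown here.

```python
import re
from collections import Counter

# Timestamp lines in a 10g-style alert log, e.g. "Sun May  8 06:59:06 2011".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
# ORA codes such as "ORA-15064".
ORA_RE = re.compile(r"\bORA-(\d{5})\b")
# Lines such as "ASMB: terminating instance due to error 15064".
TERM_RE = re.compile(r"^(\w+): terminating instance due to error (\d+)")


def summarize_alert(text):
    """Return (Counter of ORA codes, list of (timestamp, process, error))."""
    ora_counts = Counter()
    terminations = []
    current_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            # Remember the most recent timestamp line for later entries.
            current_ts = line
            continue
        ora_counts.update(ORA_RE.findall(line))
        m = TERM_RE.match(line)
        if m:
            terminations.append((current_ts, m.group(1), int(m.group(2))))
    return ora_counts, terminations


if __name__ == "__main__":
    sample = (
        "Sun May  8 06:59:06 2011\n"
        "ORA-15064: communication failure with ASM instance\n"
        "ORA-03113: end-of-file on communication channel\n"
        "Sun May  8 06:59:06 2011\n"
        "ASMB: terminating instance due to error 15064\n"
    )
    counts, terms = summarize_alert(sample)
    print(counts.most_common())
    print(terms)
```

Fed the excerpt above, this would single out ASMB as the process that brought the instance down with error 15064, which matches reading the log by eye.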

2) The following failure details were captured from the trace file
kjctseventdump-end tail 14 heads 0 @ 0 14 @ -1115894656
 DEFER MSG QUEUE ON LMS1 IS EMPTY
 SEQUENCES:
  0:0.0  1:2933.0
error 15064 detected in background process
ORA-15064: communication failure with ASM instance

3) The ASM alert log recorded the following
Thu Feb 10 19:17:58 2011
NOTE: cache recovered group 1 to fcn 0.20162635
Thu Feb 10 19:17:58 2011
NOTE: opening chunk 1 at fcn 0.20162635 ABA
NOTE: seq=79 blk=1597
Thu Feb 10 19:17:58 2011
NOTE: cache mounting group 1/0xBA97DAE1 (ORADATA) succeeded
SUCCESS: diskgroup ORADATA was mounted
Thu Feb 10 19:18:01 2011
NOTE: recovering COD for group 1/0xba97dae1 (ORADATA)
SUCCESS: completed COD recovery for group 1/0xba97dae1 (ORADATA)
Thu Feb 10 19:18:01 2011
Starting background process ASMB
ASMB started with pid=17, OS id=7767
Thu Feb 10 19:21:06 2011
NOTE: ASMB process exiting due to lack of ASM file activity
Sun May  8 06:48:33 2011
Shutting down instance (abort)
License high water mark = 6
Instance terminated by USER, pid = 20819

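My initial judgment is that this failure was caused by a problem in ASM. It is not related, however, to the message "NOTE: ASMB process exiting due to lack of ASM file activity" shown here. That message is purely informational and appears many times elsewhere in the ASM log.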
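
One way to sanity-check that judgment is to measure how long before each abort the informational message last appeared. The following is a minimal Python sketch under that assumption. The helper names `events` and `gaps_before_abort` are hypothetical, and the timestamp pattern is inferred only from the log lines quoted above.

```python
import re
from datetime import datetime

# Timestamp lines such as "Thu Feb 10 19:21:06 2011" or "Sun May  8 06:48:33 2011".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")


def events(text):
    """Yield (timestamp, message line) pairs from alert-log text."""
    ts = None
    for raw in text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            # Collapse the double space used for single-digit days before parsing.
            ts = datetime.strptime(" ".join(line.split()), "%a %b %d %H:%M:%S %Y")
        elif line and ts is not None:
            yield ts, line


def gaps_before_abort(text, needle):
    """For each abort, return the time elapsed since the last line containing needle."""
    last_seen = None
    gaps = []
    for ts, line in events(text):
        if needle in line:
            last_seen = ts
        elif line.startswith("Shutting down instance (abort)") and last_seen is not None:
            gaps.append(ts - last_seen)
    return gaps


if __name__ == "__main__":
    excerpt = (
        "Thu Feb 10 19:21:06 2011\n"
        "NOTE: ASMB process exiting due to lack of ASM file activity\n"
        "Sun May  8 06:48:33 2011\n"
        "Shutting down instance (abort)\n"
    )
    for gap in gaps_before_abort(excerpt, "lack of ASM file activity"):
        print(gap)
```

In the excerpt above the note is dated Feb 10 and the abort May 8, nearly three months apart, which is consistent with treating the note as unrelated.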

3. First recovery attempt
1) An attempt to start ASM failed.

2) Starting the ASM instance manually succeeded
racdb1@racdb1 /home/oracle$ export ORACLE_SID=+ASM1
+ASM1@racdb1 /home/oracle$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.3.0 - Production on Sun May 8 13:43:06 2011

Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters and Data Mining options

NotConnected@> shutdown immediate;
ASM diskgroups dismounted
ASM instance shutdown
NotConnected@> startup;
ASM instance started

Total System Global Area  130023424 bytes
Fixed Size                  2071000 bytes
Variable Size             102786600 bytes
ASM Cache                  25165824 bytes

3) Starting the database instance, however, raised the errors "ORA-01105" and "ORA-38767".
racdb1@racdb1 /home/oracle$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.3.0 - Production on Sun May 8 13:43:53 2011

Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.

Connected to an idle instance.

NotConnected@> startup;
ORACLE instance started.

Total System Global Area 8388608000 bytes
Fixed Size                  2086096 bytes
Variable Size            1644170032 bytes
Database Buffers         6727663616 bytes
Redo Buffers               14688256 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-38767: flashback retention target parameter mismatch
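
The text of ORA-38767 points at the flashback retention target, so one check that may be worth making is whether `db_flashback_retention_target` is the same on every instance. The post does not report those values, so the sketch below is only an assumption about how such a comparison could be done in Python. It expects `(inst_id, value)` rows, for example collected by querying the parameter views on each instance. The function name `find_mismatch` and the values in the usage example are hypothetical.

```python
from collections import defaultdict


def find_mismatch(rows):
    """Group instance ids by parameter value.

    rows: iterable of (inst_id, value) pairs.
    Returns {value: [inst_ids]} when more than one distinct value exists,
    otherwise an empty dict.
    """
    by_value = defaultdict(list)
    for inst_id, value in rows:
        by_value[str(value).strip()].append(inst_id)
    return dict(by_value) if len(by_value) > 1 else {}


if __name__ == "__main__":
    # Hypothetical values, not taken from the incident.
    print(find_mismatch([(1, "1440"), (2, "1440")]))
    print(find_mismatch([(1, "1440"), (2, "2880")]))
```

An empty result would suggest the parameter itself is consistent and that the error came from elsewhere.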

4. Second recovery attempt
I restarted all CRS resources except the VIP. The ASM instance and the database instance still could not be started.

5. Final fix
As a last resort I restarted all CRS resources on the first node, and all resources on RAC node 1 finally came up.

6. Summary
After a series of recovery attempts, the RAC database was finally brought back to normal.

Good luck.

secooler
11.05.08

-- The End --

From the "ITPUB blog", link: http://blog.itpub.net/519536/viewspace-694867/. Please credit the source when reposting; otherwise legal liability will be pursued.
