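[RAC] Handling an abnormal termination of the RAC node 1 instance caused by an ASM instance failure
The database instance on the first RAC node stopped abnormally because its ASM instance failed. This post records the incident.
1. Symptoms
On a two-node RAC, the instance on the first node had stopped. On inspection, the ASM instance had also terminated abnormally.
2. Analysis
Check the alert logs of the database instance and the ASM instance for clues.
1) Alert log contents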
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_asmb_21478.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Sun May 8 06:59:06 2011
ASMB: terminating instance due to error 15064
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms1_21275.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lgwr_21283.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms0_21271.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lmon_21267.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lmd0_21269.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
System state dump is made for local instance
System State dumped to trace file /oracle/app/oracle/admin/racdb/bdump/racdb1_diag_21263.trc
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mman_21279.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:07 2011
Shutting down instance (abort)
License high water mark = 7
Sun May 8 06:59:07 2011
Trace dumping is performing id=[cdmp_20110508065906]
Sun May 8 06:59:11 2011
Instance terminated by ASMB, pid = 21478
Sun May 8 06:59:12 2011
Instance terminated by USER, pid = 4110
Mon May 9 13:44:05 2011
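With this many near-identical entries, it can help to condense the alert log before reading it line by line. The following is a minimal sketch, not part of the original troubleshooting: it counts the ORA- codes and lists the "Instance terminated by" events with their timestamps. The function name `summarize_alert_log` and the `SAMPLE` excerpt are illustrative assumptions; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches alert-log timestamp lines such as "Sun May 8 06:59:06 2011".
TIMESTAMP_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)
# Matches error codes such as "ORA-15064".
ORA_RE = re.compile(r"\bORA-\d{5}\b")


def summarize_alert_log(text):
    """Return (Counter of ORA codes, list of (timestamp, line) terminations)."""
    codes = Counter()
    terminations = []
    current_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        if TIMESTAMP_RE.match(line):
            # Remember the most recent timestamp; it applies to the lines below it.
            current_ts = line
            continue
        codes.update(ORA_RE.findall(line))
        if line.startswith("Instance terminated by"):
            terminations.append((current_ts, line))
    return codes, terminations


# Hypothetical excerpt for illustration, copied from the alert log above.
SAMPLE = """\
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_asmb_21478.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Sun May 8 06:59:06 2011
ASMB: terminating instance due to error 15064
Sun May 8 06:59:11 2011
Instance terminated by ASMB, pid = 21478
"""

if __name__ == "__main__":
    codes, terminations = summarize_alert_log(SAMPLE)
    for code, count in codes.most_common():
        print(code, count)
    for ts, message in terminations:
        print(ts, "|", message)
```

Run against the full alert log, a summary like this would show ORA-15064 dominating, which matches the reading that every background process lost its connection to ASM at the same moment.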
2) The following failure details were captured from the trace file
kjctseventdump-end tail 14 heads 0 @ 0 14 @ -1115894656
DEFER MSG QUEUE ON LMS1 IS EMPTY
SEQUENCES:
0:0.0 1:2933.0
error 15064 detected in background process
ORA-15064: communication failure with ASM instance
3) The ASM alert log recorded the following
Thu Feb 10 19:17:58 2011
NOTE: cache recovered group 1 to fcn 0.20162635
Thu Feb 10 19:17:58 2011
NOTE: opening chunk 1 at fcn 0.20162635 ABA
NOTE: seq=79 blk=1597
Thu Feb 10 19:17:58 2011
NOTE: cache mounting group 1/0xBA97DAE1 (ORADATA) succeeded
SUCCESS: diskgroup ORADATA was mounted
Thu Feb 10 19:18:01 2011
NOTE: recovering COD for group 1/0xba97dae1 (ORADATA)
SUCCESS: completed COD recovery for group 1/0xba97dae1 (ORADATA)
Thu Feb 10 19:18:01 2011
Starting background process ASMB
ASMB started with pid=17, OS id=7767
Thu Feb 10 19:21:06 2011
NOTE: ASMB process exiting due to lack of ASM file activity
Sun May 8 06:48:33 2011
Shutting down instance (abort)
License high water mark = 6
Instance terminated by USER, pid = 20819
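The preliminary conclusion is that this failure was caused by a problem in ASM. It is, however, unrelated to the message "NOTE: ASMB process exiting due to lack of ASM file activity" shown here. That message is purely informational and appears many times elsewhere in the ASM log.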
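One way to support the conclusion that ASM went down first is to put the two alert logs on a single timeline. The sketch below is an illustration under stated assumptions rather than a step from the original analysis: it parses the ASM abort timestamp and the first database-instance error timestamp quoted above and prints the gap between them. The helper name `parse_alert_timestamp` is hypothetical, and the month table avoids relying on the locale-dependent `%a`/`%b` directives.

```python
from datetime import datetime

# Month abbreviations as they appear in the alert log.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_alert_timestamp(line):
    """Parse a line such as 'Sun May 8 06:59:06 2011' into a datetime."""
    _weekday, month, day, clock, year = line.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hour, minute, second)


# Timestamps copied from the logs quoted above.
asm_abort = parse_alert_timestamp("Sun May 8 06:48:33 2011")       # ASM: Shutting down instance (abort)
db_first_error = parse_alert_timestamp("Sun May 8 06:59:06 2011")  # DB: first ORA-15064

gap = db_first_error - asm_abort

if __name__ == "__main__":
    print("ASM aborted before the first database error by", gap)
```

The ordering is consistent with the ASM instance failing first, although the gap alone does not explain why ASM aborted.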
3. First recovery attempt
1) An attempt to start ASM failed.
2) Starting the ASM instance manually from SQL*Plus succeeded
racdb1@racdb1 /home/oracle$ export ORACLE_SID=+ASM1
+ASM1@racdb1 /home/oracle$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Sun May 8 13:43:06 2011
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters and Data Mining options
NotConnected@> shutdown immediate;
ASM diskgroups dismounted
ASM instance shutdown
NotConnected@> startup;
ASM instance started
Total System Global Area 130023424 bytes
Fixed Size 2071000 bytes
Variable Size 102786600 bytes
ASM Cache 25165824 bytes
3) However, starting the database instance raised "ORA-01105" and "ORA-38767".
racdb1@racdb1 /home/oracle$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Sun May 8 13:43:53 2011
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected to an idle instance.
NotConnected@> startup;
ORACLE instance started.
Total System Global Area 8388608000 bytes
Fixed Size 2086096 bytes
Variable Size 1644170032 bytes
Database Buffers 6727663616 bytes
Redo Buffers 14688256 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-38767: flashback retention target parameter mismatch
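The text of ORA-38767 suggests that this instance saw a different flashback retention target than the instance already mounted on the other node. The original post did not investigate this, so the following is only a sketch of one possible check: given parameter-file text in the usual `sid.parameter=value` / `*.parameter=value` form (for example, exported with `create pfile from spfile`), it resolves the value each instance would use. The sample contents, the values 1440 and 2880, and the instance name `racdb2` are hypothetical.

```python
def effective_values(pfile_text, parameter, instances):
    """Resolve the value each instance would see for one parameter.

    A line 'sid.param=value' overrides the '*.param=value' default for that SID.
    """
    default = None
    overrides = {}
    for raw in pfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        sid, _, name = key.partition(".")
        if not name:
            # No SID prefix: treat the entry as applying to all instances.
            sid, name = "*", sid
        if name.lower() != parameter.lower():
            continue
        if sid == "*":
            default = value
        else:
            overrides[sid] = value
    return {inst: overrides.get(inst, default) for inst in instances}


def has_mismatch(values):
    """True when the instances do not all resolve to the same value."""
    return len(set(values.values())) > 1


# Hypothetical parameter-file excerpt; the values are made up for illustration.
SAMPLE_PFILE = """\
*.db_name='racdb'
*.db_flashback_retention_target=1440
racdb1.db_flashback_retention_target=2880
"""

if __name__ == "__main__":
    values = effective_values(
        SAMPLE_PFILE, "db_flashback_retention_target", ["racdb1", "racdb2"]
    )
    print(values, "mismatch:", has_mismatch(values))
```

If a check like this reports a mismatch, aligning the per-instance value with the cluster-wide one would be the natural next thing to try. Whether that was the actual cause here is not established by the post.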
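4. Second recovery attempt
All CRS resources except the VIP were restarted, but the ASM instance and the database instance still could not be started.
5. Final fix
Finally, restarting all CRS resources on the first node brought every resource on that node back up.
6. Summary
After a series of recovery attempts, the RAC database was finally restored.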
Good luck.
secooler
11.05.08
-- The End --
Source: "ITPUB Blog", link: http://blog.itpub.net/519536/viewspace-694867/. Please credit the source when reposting; otherwise legal action may be pursued.