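【RAC】Handling an abnormal termination of the first RAC node's instance caused by an ASM instance failure
The database instance on the first node of a RAC cluster stopped abnormally because its ASM instance failed. I am recording the case here.
1. Symptoms
On a two-node RAC, the instance on the first node stopped; on inspection, the ASM instance on that node had also terminated abnormally.
2. Analysis
I checked the alert logs of both the database instance and the ASM instance to work out how to proceed.
1) Database instance alert log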
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_asmb_21478.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Sun May 8 06:59:06 2011
ASMB: terminating instance due to error 15064
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms1_21275.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lgwr_21283.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms0_21271.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lmon_21267.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lmd0_21269.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:06 2011
System state dump is made for local instance
System State dumped to trace file /oracle/app/oracle/admin/racdb/bdump/racdb1_diag_21263.trc
Sun May 8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mman_21279.trc:
ORA-15064: communication failure with ASM instance
Sun May 8 06:59:07 2011
Shutting down instance (abort)
License high water mark = 7
Sun May 8 06:59:07 2011
Trace dumping is performing id=[cdmp_20110508065906]
Sun May 8 06:59:11 2011
Instance terminated by ASMB, pid = 21478
Sun May 8 06:59:12 2011
Instance terminated by USER, pid = 4110
Mon May 9 13:44:05 2011
2) The following error content was taken from a trace file:
kjctseventdump-end tail 14 heads 0 @ 0 14 @ -1115894656
DEFER MSG QUEUE ON LMS1 IS EMPTY
SEQUENCES:
0:0.0 1:2933.0
error 15064 detected in background process
ORA-15064: communication failure with ASM instance
3) The ASM alert log recorded the following:
Thu Feb 10 19:17:58 2011
NOTE: cache recovered group 1 to fcn 0.20162635
Thu Feb 10 19:17:58 2011
NOTE: opening chunk 1 at fcn 0.20162635 ABA
NOTE: seq=79 blk=1597
Thu Feb 10 19:17:58 2011
NOTE: cache mounting group 1/0xBA97DAE1 (ORADATA) succeeded
SUCCESS: diskgroup ORADATA was mounted
Thu Feb 10 19:18:01 2011
NOTE: recovering COD for group 1/0xba97dae1 (ORADATA)
SUCCESS: completed COD recovery for group 1/0xba97dae1 (ORADATA)
Thu Feb 10 19:18:01 2011
Starting background process ASMB
ASMB started with pid=17, OS id=7767
Thu Feb 10 19:21:06 2011
NOTE: ASMB process exiting due to lack of ASM file activity
Sun May 8 06:48:33 2011
Shutting down instance (abort)
License high water mark = 6
Instance terminated by USER, pid = 20819
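The ASM log shows the ASM instance being shut down with abort at 06:48:33 on May 8, before the database instance reported ORA-15064 at 06:59:06, so my initial judgment was that this failure was caused by an ASM problem. It has nothing to do with the message "NOTE: ASMB process exiting due to lack of ASM file activity" shown here; that message is purely informational and appears many times elsewhere in the ASM log.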
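To repeat this kind of timeline check on another alert log, a small shell sketch can print every ORA- error together with the timestamp line that precedes it, and count how often the informational ASMB message appears. It assumes the 10g alert-log layout shown above (each timestamp on its own line); the function names are hypothetical, and the log paths in the usage comments are assumptions based on the bdump directory seen in the trace-file names.

```shell
#!/usr/bin/env bash
# summarize_ora_errors: print every ORA- line from a 10g-style alert log,
# prefixed with the most recent timestamp line seen before it.
# Reads the files given as arguments, or stdin if none are given.
summarize_ora_errors() {
  awk '
    # 10g alert logs write timestamps on their own line, e.g. "Sun May 8 06:59:06 2011"
    /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z][a-z] +[0-9]+ [0-9:]+ [0-9][0-9][0-9][0-9]$/ {
      ts = $0
      next
    }
    /^ORA-[0-9]+:/ { print ts " | " $0 }
  ' "$@"
}

# count_asmb_idle_notes: count the informational ASMB message that this
# post treats as unrelated to the failure.
count_asmb_idle_notes() {
  grep -c 'ASMB process exiting due to lack of ASM file activity' "$@"
}

# Example usage (paths are assumptions; adjust to your environment):
#   summarize_ora_errors /oracle/app/oracle/admin/racdb/bdump/alert_racdb1.log
#   count_asmb_idle_notes /path/to/alert_+ASM1.log
```

Keeping the timestamp in a variable rather than joining lines up front means multi-line error blocks, such as the ORA-15064/ORA-03113 pair from the ASMB trace, are each tagged with the same event time.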
3. First troubleshooting attempt
1) Trying to start ASM got nowhere.
2) Starting the ASM instance manually did succeed:
racdb1@racdb1 /home/oracle$ export ORACLE_SID=+ASM1
+ASM1@racdb1 /home/oracle$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Sun May 8 13:43:06 2011
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters and Data Mining options
NotConnected@> shutdown immediate;
ASM diskgroups dismounted
ASM instance shutdown
NotConnected@> startup;
ASM instance started
Total System Global Area 130023424 bytes
Fixed Size 2071000 bytes
Variable Size 102786600 bytes
ASM Cache 25165824 bytes
3) However, starting the database instance raised ORA-01105 and ORA-38767:
racdb1@racdb1 /home/oracle$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Sun May 8 13:43:53 2011
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected to an idle instance.
NotConnected@> startup;
ORACLE instance started.
Total System Global Area 8388608000 bytes
Fixed Size 2086096 bytes
Variable Size 1644170032 bytes
Database Buffers 6727663616 bytes
Redo Buffers 14688256 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-38767: flashback retention target parameter mismatch
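As its text suggests, ORA-38767 indicates that the starting instance's db_flashback_retention_target differs from the value used by the instance that already has the database mounted; in RAC this parameter is expected to be identical on every instance. The post resolved the problem by restarting CRS resources, but a quick comparison can confirm or rule out a genuine parameter difference first. The sketch below assumes you have collected one "instance value" line per instance (for example with the query in the comments, run on each node; the failed instance stays in NOMOUNT after the mount error, where v$instance and v$parameter can still be queried). The function name and input format are my own, not part of any Oracle tool.

```shell
#!/usr/bin/env bash
# check_flashback_retention: compare db_flashback_retention_target across
# RAC instances. Input: one "instance value" pair per line (stdin or files).
# Prints each pair, then MATCH (exit 0), MISMATCH (exit 1) or NO DATA (exit 2).
#
# One way to collect a line per instance (as SYSDBA in SQL*Plus on each node):
#   set heading off feedback off pagesize 0
#   select i.instance_name, p.value
#     from v$instance i, v$parameter p
#    where p.name = 'db_flashback_retention_target';
check_flashback_retention() {
  awk '
    NF >= 2 {
      printf "%s: %s\n", $1, $2
      # remember the first value and flag any instance that differs from it
      if (n++ == 0) first = $2
      else if ($2 != first) mismatch = 1
    }
    END {
      if (n == 0) { print "NO DATA"; exit 2 }
      if (mismatch) { print "MISMATCH"; exit 1 }
      print "MATCH"
    }
  ' "$@"
}
```

If the check reports MISMATCH, aligning the parameter in the spfile is the obvious next thing to look at; if it reports MATCH, the error points elsewhere, as it apparently did in this case.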
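4. Second troubleshooting attempt
I restarted all CRS resources except the VIP, but the ASM instance and the database instance still would not start.
5. Final resolution
Finally, I restarted all CRS resources on the first node, this time including the VIP, and every resource on the first RAC node came up.
6. Summary
After this series of troubleshooting attempts, the first RAC node was brought back and the database recovered from the failure.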
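The post does not show the exact commands used to restart the node's CRS resources. As an assumption, on 10g R2 this could be done with srvctl: stop the database instance, ASM, and nodeapps (which include the VIP), then start them again in reverse order. The sketch only prints the commands unless DRY_RUN=0 is set. NODE, DB and INST default to names inferred from this post (node racdb1 from the shell prompt, database racdb, instance racdb1) and are hypothetical; check them against crs_stat -t on your own cluster before running anything for real.

```shell
#!/usr/bin/env bash
# Hypothetical names inferred from this post; override via the environment.
NODE=${NODE:-racdb1}
DB=${DB:-racdb}
INST=${INST:-racdb1}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# run: print the command in dry-run mode, otherwise execute it
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

restart_node_resources() {
  # Stop from the top of the stack down: DB instance, ASM, then nodeapps (incl. VIP)
  run srvctl stop instance -d "$DB" -i "$INST" -o abort
  run srvctl stop asm -n "$NODE"
  run srvctl stop nodeapps -n "$NODE"
  # Start in the reverse order
  run srvctl start nodeapps -n "$NODE"
  run srvctl start asm -n "$NODE"
  run srvctl start instance -d "$DB" -i "$INST"
  # Show resource status afterwards
  run crs_stat -t
}
```

Defaulting to a dry run makes it easy to review the sequence first; the difference from step 4 is simply that nodeapps, and therefore the VIP, are included in the restart.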
Good luck.
secooler
11.05.08
-- The End --
From the ITPUB blog, link: http://blog.itpub.net/519536/viewspace-694867/. Please credit the source when reposting; otherwise legal liability will be pursued.