ASM instance automatically dismounting a disk group caused one RAC node to go down
ASM alert log, located in:
/u01/app/grid/diag/asm/+asm/+ASM1/trace

Thu Jul 30 02:10:46 2015
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
Thu Jul 30 02:10:47 2015
NOTE: process _b000_+asm1 (38695) initiating offline of disk 0.3915941304 (DATA2_0000) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 1.3915941302 (DATA2_0001) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 2.3915941303 (DATA2_0002) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 12 for pid 28, osid 38695
ERROR: no read quorum in group: required 2, found 0 disks
.............
Dirty Detach Reconfiguration complete
Thu Jul 30 02:10:47 2015
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xB368755B (DATA2) <-- dismounted on its own
SQL> alter diskgroup DATA2 dismount force /* ASM SERVER:3009967451 */
.............
Thu Jul 30 02:11:24 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA2 was mounted <-- mounted again on its own
SUCCESS: ALTER DISKGROUP DATA2 MOUNT /* asm agent *//* {0:31:15779} */
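On a busy system the relevant lines are scattered through a long alert log, so a small filter makes it easier to line up the PST write waits with the offline, dismount and mount events that follow them. The following is a minimal Python sketch. It assumes the alert-log line formats shown in the excerpt above. The function name `scan_pst_events` and the event labels are hypothetical, and this is not an Oracle tool.

```python
import re
from collections import defaultdict

# Patterns for the alert-log lines in the excerpt above (format assumed from that excerpt).
TS_RE = re.compile(r"^\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
WAIT_RE = re.compile(r"Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)")
EVENT_RES = (
    ("offline", re.compile(r"initiating offline of disk \S+ \((\w+)\)")),
    ("dismount", re.compile(r"cache dismounted group \S+ \((\w+)\)")),
    ("mount", re.compile(r"SUCCESS: diskgroup (\w+) was mounted")),
)


def scan_pst_events(lines):
    """Return ({(group, disk): max PST write wait in secs}, [(timestamp, event, name), ...])."""
    max_wait = defaultdict(int)
    events = []
    ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            ts = line  # a timestamp header applies to the lines that follow it
            continue
        m = WAIT_RE.search(line)
        if m:
            secs, disk, group = (int(g) for g in m.groups())
            max_wait[(group, disk)] = max(max_wait[(group, disk)], secs)
            continue
        for label, regex in EVENT_RES:
            m = regex.search(line)
            if m:
                events.append((ts, label, m.group(1)))
                break
    return dict(max_wait), events
```

You can pass it the lines of the ASM alert log, for example an open file object. For the 02:10 incident, it would report a maximum wait of 15 s for PST disks 0, 1 and 2 of group 1. It would then list three offline events for DATA2_0000 to DATA2_0002, one dismount of DATA2, and the remount at 02:11:24.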

Reference:
ASM diskgroup dismount with "Waited 15 secs for write IO to PST" (Doc ID 1581684.1)

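The alert log shows the ASM disk group being dismounted, preceded by the error "Waited 15 secs for write IO to PST". This is ASM's own heartbeat timeout detection. The ASM instance periodically issues heartbeat writes to the disks holding the PST (Partnership and Status Table) to check whether each ASM disk still responds normally.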

Generally, this kind of message appears in the ASM alert log in the following situation:
PST heartbeats on the ASM disks of a normal or high redundancy disk group are delayed, so the ASM instance dismounts the disk group. By default, the timeout is 15 seconds.
Heartbeat delays are largely ignored for external redundancy disk groups. The ASM instance stops issuing further PST heartbeats until PST revalidation succeeds, but the heartbeat delays do not dismount an external redundancy disk group directly.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
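The description above boils down to the following points:
1. The ASM instance periodically checks the disks of each disk group to verify that communication with them is normal.
2. Only normal and high redundancy disk groups are dismounted by this check. An external redundancy disk group is not dismounted by it directly.
3. The default timeout is 15 s. If the disks still have not responded to the ASM instance within 15 s, they are taken offline. Once too few PST disks remain (no read quorum), the disk group is dismounted.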
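The three points above can be modeled as a small decision function. This is a simplified sketch, not Oracle's implementation, and it rests on two assumptions. First, a disk is taken offline once its PST heartbeat write has been pending for the full `_asm_hbeatiowait` seconds. Second, the read quorum is a strict majority of the PST copies, which matches the "required 2" reported for the three PST disks in the log above. The function names are hypothetical.

```python
# Value of _asm_hbeatiowait (seconds) in this 11.2 environment.
HBEAT_IO_WAIT = 15


def read_quorum(n_pst_disks):
    """Assumed rule: a strict majority of PST copies must be readable (3 -> 2, 5 -> 3)."""
    return n_pst_disks // 2 + 1


def check_pst_heartbeat(redundancy, pst_wait_secs, timeout=HBEAT_IO_WAIT):
    """Simplified model of the PST heartbeat check for one disk group.

    redundancy: "external", "normal" or "high".
    pst_wait_secs: {PST disk number: seconds its heartbeat write has been pending}.
    Returns (PST disks taken offline, whether the disk group is dismounted).
    """
    if redundancy == "external":
        # Delays are tolerated: ASM stops issuing heartbeats until PST revalidation
        # succeeds, but does not dismount the disk group directly.
        return [], False
    # A disk whose heartbeat write has been pending for the full timeout is offlined.
    offlined = sorted(d for d, waited in pst_wait_secs.items() if waited >= timeout)
    found = len(pst_wait_secs) - len(offlined)
    # Losing the read quorum of PST disks forces a dismount.
    return offlined, found < read_quorum(len(pst_wait_secs))
```

Consider the incident above: `check_pst_heartbeat("normal", {0: 15, 1: 15, 2: 15})` returns `([0, 1, 2], True)`, which corresponds to "required 2, found 0 disks" followed by the dismount. With `timeout=120`, the same 15 s waits return `([], False)`.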

Problems in the storage network can trigger this error. When ASM issues its periodic check and a disk does not respond within 15 s, ASM treats that disk as inaccessible.

In this case, the 02:10 timestamp above falls exactly in the full database backup window. The heavy write load most likely slowed down IO response times.
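To back up this kind of guess, you can check whether every PST write-wait warning falls inside the backup window. This is a minimal Python sketch. It assumes the alert-log timestamp format shown in the excerpt above and a backup window that you supply yourself. The 02:00-04:00 window used below is hypothetical, because the post does not give the actual backup schedule. The function names are also hypothetical.

```python
from datetime import datetime, time

# Timestamp header format of the alert log excerpt above, e.g. "Thu Jul 30 02:10:46 2015".
# %a and %b assume an English (C) locale.
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def in_window(ts_text, start, end):
    """True if the alert-log timestamp's time of day lies in [start, end).

    Windows that wrap past midnight (e.g. 23:00-01:00) are handled too.
    """
    t = datetime.strptime(ts_text.strip(), ALERT_TS_FORMAT).time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def warnings_outside_window(lines, start, end):
    """Return the timestamps of PST write-wait warnings that are NOT inside the window."""
    outside = []
    current_ts = None
    for line in lines:
        line = line.strip()
        try:
            datetime.strptime(line, ALERT_TS_FORMAT)
            current_ts = line  # this line is a timestamp header
            continue
        except ValueError:
            pass  # not a timestamp header
        if "for write IO to PST disk" in line and current_ts is not None:
            if not in_window(current_ts, start, end) and current_ts not in outside:
                outside.append(current_ts)
    return outside
```

An empty result from `warnings_outside_window(lines, time(2, 0), time(4, 0))` means every warning fell inside the assumed window. That is consistent with, but does not prove, the backup being the cause.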
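
The parameter behind this timeout, _asm_hbeatiowait, only exists from 11.2.0.3.0 onward. In other words, the ASM instance's disk timeout detection was only introduced in 11.2.0.3. Its current value can be checked in the ASM instance as SYS, since the query reads x$ tables:
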
set pages 9999;<br />
<br />
SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ<br />
FROM SYS.x$ksppi x, SYS.x$ksppcv y<br />
WHERE x.inst_id = USERENV ('Instance')<br />
AND y.inst_id = USERENV ('Instance')<br />
AND x.indx = y.indx<br />
AND upper(x.ksppinm) like '%ASM_H%';
The output is:

NAME                   VALUE  DESCRIB
---------------------  -----  ----------------------------------------------------------
_asm_hbeatiowait       15     number of secs to wait for PST Async Hbeat IO return
_asm_hbeatwaitquantum  2      quantum used to compute time-to-wait for a PST Hbeat check
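
If the storage network is not in great shape, you can lengthen the check timeout. In fact, the default is already 120 seconds in 12.1.0.2.

alter system set "_asm_hbeatiowait"=120 scope=spfile;

Because the change is written with scope=spfile, it only takes effect after ASM restarts. Restart ASM and keep monitoring.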
From the "ITPUB Blog", link: http://blog.itpub.net/25462274/viewspace-2149305/. Please credit the source when reposting; otherwise legal liability will be pursued.