ASM instance automatically dismounts a disk group, bringing down one RAC node
ASM alert log (trace directory):
/u01/app/grid/diag/asm/+asm/+ASM1/trace

Thu Jul 30 02:10:46 2015
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
Thu Jul 30 02:10:47 2015
NOTE: process _b000_+asm1 (38695) initiating offline of disk 0.3915941304 (DATA2_0000) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 1.3915941302 (DATA2_0001) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 2.3915941303 (DATA2_0002) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 12 for pid 28, osid 38695
ERROR: no read quorum in group: required 2, found 0 disks
.............
Dirty Detach Reconfiguration complete
Thu Jul 30 02:10:47 2015
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xB368755B (DATA2)   <-- dismounted by itself
SQL> alter diskgroup DATA2 dismount force /* ASM SERVER:3009967451 */
.............
Thu Jul 30 02:11:24 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA2 was mounted   <-- and then mounted again by itself (by the asm agent)
SUCCESS: ALTER DISKGROUP DATA2 MOUNT /* asm agent *//* {0:31:15779} */
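A small parser helps pull the same sequence of events out of a longer alert log: PST heartbeat waits, disk offlines, lost quorum, dismount and remount. The sketch below is in Python. It assumes the messages are worded exactly as in the excerpt above, and other Oracle releases may phrase them differently. The function `parse_alert_log`, the event names such as `pst_wait`, and the script name in the usage comment are made up for illustration.

```python
import re
from datetime import datetime

# Timestamp lines in the ASM alert log above look like "Thu Jul 30 02:10:46 2015".
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"

# Regexes written only against the excerpt above (assumption: same wording elsewhere).
PATTERNS = {
    "pst_wait": re.compile(r"Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)"),
    "offline": re.compile(r"initiating offline of disk (\d+)\.\d+ \((\w+)\)"),
    "no_quorum": re.compile(r"no read quorum in group: required (\d+), found (\d+) disks"),
    "dismount": re.compile(r"cache dismounted group \d+/0x[0-9A-Fa-f]+ \((\w+)\)"),
    "mount": re.compile(r"SUCCESS: diskgroup (\w+) was mounted"),
}


def parse_alert_log(text):
    """Return (timestamp, event, captured fields) for every matching line."""
    events = []
    current_ts = None  # each message inherits the most recent timestamp line
    for raw in text.splitlines():
        line = raw.strip()
        try:
            current_ts = datetime.strptime(line, ALERT_TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, try the message patterns
        for event, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events.append((current_ts, event, match.groups()))
                break
    return events


if __name__ == "__main__":
    import sys

    # Example (hypothetical script name): python parse_asm_alert.py alert_+ASM1.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for ts, event, fields in parse_alert_log(f.read()):
                print(ts, event, fields)
```

Run against the excerpt above, the output is a timeline: `pst_wait` events at 02:10:46, three `offline` events plus `no_quorum` and `dismount` at 02:10:47, and `mount` at 02:11:24.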

Reference document:
ASM diskgroup dismount with "Waited 15 secs for write IO to PST" (Doc ID 1581684.1)

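The alert log shows the disks of DATA2 being taken offline and the disk group being dismounted right after the "Waited 15 secs for write IO to PST" warnings. This is ASM's own heartbeat-timeout detection: the ASM instance periodically checks whether every ASM disk is still responding normally.
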
Generally, this kind of message appears in the ASM alert log in the following situation:
delayed ASM PST heartbeats on ASM disks in a normal or high redundancy disk group, which cause the ASM instance to dismount the disk group. By default, the timeout is 15 seconds.
Heartbeat delays are largely ignored for external redundancy disk groups.
The ASM instance stops issuing further PST heartbeats until PST revalidation succeeds, but the heartbeat delays do not dismount an external redundancy disk group directly.
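~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The description above boils down to the following points:
1. The ASM instance periodically checks the disks of every disk group to make sure they are still communicating normally;
2. A missed heartbeat only leads to a dismount for normal and high redundancy disk groups; an external redundancy disk group is not dismounted directly by this error;
3. The default timeout is 15 seconds: disks whose PST heartbeat write has not completed within 15 seconds are taken offline, and once too few PST disks remain for a read quorum (in the log above: required 2, found 0), the disk group is dismounted.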
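The three points above can be modeled roughly as follows. This is a simplified Python sketch, not Oracle's implementation. The majority rule for the read quorum is an assumption inferred from the log ("required 2" for a group whose PST sits on three disks). `PstDisk` and `evaluate_group` are hypothetical names. Redundancy values use the spelling of the `TYPE` column in `V$ASM_DISKGROUP` (EXTERN, NORMAL, HIGH).

```python
from dataclasses import dataclass

HBEAT_IO_WAIT = 15  # seconds; default _asm_hbeatiowait as described in this post


@dataclass
class PstDisk:
    name: str
    write_wait_secs: float  # how long the pending PST heartbeat write has waited


def evaluate_group(redundancy, disks, timeout=HBEAT_IO_WAIT):
    """Simplified model: return (names of disks taken offline, whether the group dismounts)."""
    if redundancy == "EXTERN":
        # Per the note quoted above, heartbeat delays do not dismount
        # an external redundancy disk group directly.
        return [], False
    offlined = [d.name for d in disks if d.write_wait_secs >= timeout]
    remaining = len(disks) - len(offlined)
    required = len(disks) // 2 + 1  # ASSUMED majority quorum ("required 2" of 3 in the log)
    return offlined, remaining < required


if __name__ == "__main__":
    data2 = [PstDisk("DATA2_0000", 15), PstDisk("DATA2_0001", 15), PstDisk("DATA2_0002", 15)]
    print(evaluate_group("NORMAL", data2))               # default 15 s timeout
    print(evaluate_group("NORMAL", data2, timeout=120))  # after raising the timeout
```

Given three disks that each waited 15 seconds, the model reproduces the log: all three go offline, none of the two required remain, and the group is dismounted. With `timeout=120` the same waits would not offline anything.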

Problems in the storage network can trigger this error. When ASM sends its periodic heartbeat and a disk does not respond within 15 seconds, ASM treats that disk as inaccessible.

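In this case, 02:10 in the morning was exactly when the full database backup was running, so the heavy write load most likely slowed down I/O response.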
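One way to test the backup explanation is to compare the timestamps of the heartbeat warnings with the backup schedule. The Python sketch below does that. The 02:00 to 04:00 window is a hypothetical placeholder for the real schedule. The timestamp format is the one used in the alert log above, and `in_window` and `warnings_in_backup` are names made up for illustration.

```python
from datetime import datetime, time

ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Thu Jul 30 02:10:46 2015"

# Hypothetical backup window; replace with the real full-backup schedule.
BACKUP_START = time(2, 0)
BACKUP_END = time(4, 0)


def in_window(ts, start=BACKUP_START, end=BACKUP_END):
    """True if the time of day of ts lies in [start, end); the window may cross midnight."""
    t = ts.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def warnings_in_backup(stamps, start=BACKUP_START, end=BACKUP_END):
    """Split alert-log timestamp strings into (inside the window, outside the window)."""
    inside, outside = [], []
    for s in stamps:
        ts = datetime.strptime(s, ALERT_TS_FORMAT)
        (inside if in_window(ts, start, end) else outside).append(ts)
    return inside, outside


if __name__ == "__main__":
    inside, outside = warnings_in_backup(["Thu Jul 30 02:10:46 2015", "Thu Jul 30 02:10:47 2015"])
    print(len(inside), "warnings during the backup window,", len(outside), "outside it")
```

If almost every warning over several days lands inside the window, that supports the backup-load explanation more strongly than a single incident does.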
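
The parameter that controls this timeout, _asm_hbeatiowait, only appeared in 11.2.0.3.0. In other words, the ASM instance's disk heartbeat-timeout detection only exists from 11.2.0.3 onward. The related hidden parameters can be listed with this query:
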
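set pages 9999
col name format a25
col value format a10
col describ format a60

-- x$ tables are only visible to SYS (e.g. sqlplus / as sysasm on the ASM instance)
SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
FROM SYS.x$ksppi x, SYS.x$ksppcv y
WHERE x.inst_id = USERENV ('Instance')
AND y.inst_id = USERENV ('Instance')
AND x.indx = y.indx
-- escape the "_" so LIKE matches a literal underscore instead of any character
AND upper(x.ksppinm) LIKE '%ASM\_H%' ESCAPE '\';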
The output:

NAME                   VALUE  DESCRIB
_asm_hbeatiowait       15     number of secs to wait for PST Async Hbeat IO return
_asm_hbeatwaitquantum  2      quantum used to compute time-to-wait for a PST Hbeat check

When the storage network is not very reliable, the timeout can be made longer. In fact, starting with 12.1.0.2 the default is already 120 seconds.
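The version thresholds mentioned in this post can be encoded to decide whether a given release needs an explicit setting. The Python sketch below does that. The figures it uses come from this post: the parameter exists from 11.2.0.3 with a 15 s default, and the default is 120 s from 12.1.0.2. The 15 s value for 12.1.0.1 and the 120 s value for releases after 12.1.0.2 are assumptions. The function names are hypothetical.

```python
def parse_version(version):
    """Turn '11.2.0.4' or '11.2.0.3.0' into a tuple padded to five numeric fields."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (5 - len(parts)))


def default_hbeatiowait(version):
    """Default _asm_hbeatiowait in seconds according to this post, or None if absent.

    ASSUMPTIONS: 12.1.0.1 keeps the 15 s default and releases after 12.1.0.2
    keep 120 s; the post only states the 11.2.0.3 and 12.1.0.2 figures.
    """
    v = parse_version(version)
    if v < parse_version("11.2.0.3"):
        return None  # parameter not present yet, per the post
    if v < parse_version("12.1.0.2"):
        return 15
    return 120


def needs_explicit_setting(version, wanted=120):
    """True if the release's default timeout is shorter than the wanted value."""
    default = default_hbeatiowait(version)
    return default is not None and default < wanted


if __name__ == "__main__":
    for release in ("11.2.0.3.0", "11.2.0.4", "12.1.0.2"):
        print(release, default_hbeatiowait(release), needs_explicit_setting(release))
```

Versions are compared as integer tuples rather than strings, so "11.2.0.10" would correctly sort after "11.2.0.4".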

alter system set "_asm_hbeatiowait"=120 scope=spfile;

Restart ASM (with scope=spfile the new value only takes effect after the restart) and keep monitoring.
From the ITPUB blog, link: http://blog.itpub.net/25462274/viewspace-2149305/. Please credit the source when reposting; otherwise legal action will be taken.