ASM alert log (under /u01/app/grid/diag/asm/+asm/+ASM1/trace):

Thu Jul 30 02:10:46 2015
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
Thu Jul 30 02:10:47 2015
NOTE: process _b000_+asm1 (38695) initiating offline of disk 0.3915941304 (DATA2_0000) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 1.3915941302 (DATA2_0001) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 2.3915941303 (DATA2_0002) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 12 for pid 28, osid 38695
ERROR: no read quorum in group: required 2, found 0 disks
.............
Dirty Detach Reconfiguration complete
Thu Jul 30 02:10:47 2015
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xB368755B (DATA2)   <-- the disk group dismounted on its own
SQL> alter diskgroup DATA2 dismount force /* ASM SERVER:3009967451 */
.............
Thu Jul 30 02:11:24 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA2 was mounted    <--- and then it mounted again on its own
SUCCESS: ALTER DISKGROUP DATA2 MOUNT  /* asm agent *//* {0:31:15779} */

Reference document: ASM diskgroup dismount with "Waited 15 secs for write IO to PST" (Doc ID 1581684.1)

The alert log shows the ASM disk group being dismounted, and the error is "Waited 15 secs for write IO to PST". This is a heartbeat timeout check specific to ASM: the ASM instance periodically checks whether each ASM disk responds normally.

Quoting the note:

Generally this kind messages comes in ASM alertlog file on below situations, Delayed ASM PST heart beats on ASM disks in normal or high redundancy diskgroup, thus the ASM instance dismount the diskgroup. By default, it is 15 seconds. By the way the heart beat delays are sort of ignored for external redundancy diskgroup.
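ASM instance stop issuing more PST heart beat until it succeeds PST revalidation, but the heart beat delays do not dismount external redundancy diskgroup directly.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The description above comes down to the following points:
1. The ASM instance periodically checks the state of the disks in every disk group to confirm that communication is normal;
2. The dismount behavior only applies to normal and high redundancy disk groups; for external redundancy, heartbeat delays do not directly dismount the disk group;
3. The default timeout is 15 seconds, meaning that if the disks have still not responded to the ASM instance after 15 seconds, the disk group is dismounted.

A problem in the storage network can trigger this error. In other words, when ASM issues its periodic check and a disk does not respond within 15 seconds, ASM treats the disk as inaccessible.

In this case, 02:10 in the early morning is exactly when the full database backup runs, so the heavy write load probably slowed IO response.

This parameter only exists from 11.2.0.3.0 onward, which means the ASM instance's disk timeout detection first appeared in 11.2.0.3. The hidden parameters can be queried as follows:

set pages 9999;
SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
FROM SYS.x$ksppi x, SYS.x$ksppcv y
WHERE x.inst_id = USERENV ('Instance')
AND y.inst_id = USERENV ('Instance')
AND x.indx = y.indx
AND upper(x.ksppinm) like '%ASM_H%';

The output is:

_asm_hbeatiowait        15   number of secs to wait for PST Async Hbeat IO return
_asm_hbeatwaitquantum   2    quantum used to compute time-to-wait for a PST Hbeat check

When the storage network is not in good condition, the check timeout can be set longer. In fact, the default in 12.1.0.2 is already 120 seconds.

alter system set "_asm_hbeatiowait"=120 scope=spfile;

Restart ASM and keep observing.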
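To keep observing after the change, it can help to scan the ASM alert log for the same messages automatically. The following is only a minimal sketch in Python, assuming the warning and dismount lines keep exactly the format shown in the log excerpt in this post; the function name `summarize_alert_log` and the embedded `SAMPLE` text are illustrative and are not part of any Oracle tool.

```python
import re
from collections import Counter

# Assumed format of the PST heartbeat warning, copied from the alert log excerpt.
PST_WAIT = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)"
)
# Assumed format of the dismount note, e.g.
# "NOTE: cache dismounted group 1/0xB368755B (DATA2)".
DISMOUNT = re.compile(
    r"NOTE: cache dismounted group (\d+)/0x[0-9A-Fa-f]+ \((\w+)\)"
)


def summarize_alert_log(text):
    """Count PST write-wait warnings per (group, disk) and list dismounted groups."""
    waits = Counter()
    dismounted = []
    for line in text.splitlines():
        match = PST_WAIT.search(line)
        if match:
            _secs, disk, group = (int(g) for g in match.groups())
            waits[(group, disk)] += 1
            continue
        match = DISMOUNT.search(line)
        if match:
            dismounted.append((int(match.group(1)), match.group(2)))
    return waits, dismounted


# Hypothetical sample built from the lines quoted in this post.
SAMPLE = """\
Thu Jul 30 02:10:46 2015
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
Thu Jul 30 02:10:47 2015
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xB368755B (DATA2)
"""

if __name__ == "__main__":
    waits, dismounted = summarize_alert_log(SAMPLE)
    for (group, disk), count in sorted(waits.items()):
        print(f"group {group} disk {disk}: {count} PST write-wait warning(s)")
    for group, name in dismounted:
        print(f"dismounted: group {group} ({name})")
```

In practice the text would be read from the alert log file in the trace directory instead of from `SAMPLE`. If the warnings still cluster around the backup window after the timeout is raised, that points back at IO load or the storage network, not at the parameter.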

ASM instance automatically dismounts a disk group, bringing down one RAC node
