ASM heartbeat timeout detection: Delayed ASM PST heart beats

Posted by n-lauren on 2015-05-14

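Recently I received a series of reports of ASM disk groups being dismounted with the error "Waited 15 secs for write IO to PST". This is ASM's own heartbeat timeout check: the ASM instance periodically checks whether each ASM disk responds normally. I decided to write a short summary of the issue.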

The document ASM diskgroup dismount with "Waited 15 secs for write IO to PST" (Doc ID 1581684.1) contains the following description:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generally this kind messages comes in ASM alertlog file on below situations,

Delayed ASM PST heart beats on ASM disks in normal or high redundancy diskgroup,
thus the ASM instance dismount the diskgroup.By default, it is 15 seconds.

By the way the heart beat delays are sort of ignored for external redundancy diskgroup.
ASM instance stop issuing more PST heart beat until it succeeds PST revalidation,
but the heart beat delays do not dismount external redundancy diskgroup directly.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
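The description above can be read as the following points:
1. The ASM instance periodically checks the disks of every disk group to confirm that they respond normally;
2. A delayed heartbeat leads to a dismount only for normal and high redundancy disk groups; for external redundancy, the delay does not directly dismount the disk group;
3. The default timeout is 15 seconds: if a disk has still not responded to the ASM instance after 15 seconds, ASM treats the heartbeat as failed and may dismount the disk group.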
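Since point 2 depends on the redundancy type, it can help to check which disk groups are actually exposed to this behaviour. The query below is a minimal sketch, assuming it is run in the ASM instance; it only uses the documented V$ASM_DISKGROUP view, and the column aliases are my own choice.

```sql
-- Run in the ASM instance (for example: sqlplus / as sysasm).
-- TYPE is EXTERN, NORMAL or HIGH; only NORMAL and HIGH disk groups
-- are dismounted directly because of delayed PST heartbeats.
select name          as diskgroup_name,
       type          as redundancy,
       state,
       offline_disks
  from v$asm_diskgroup
 order by name;
```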

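The customers who hit this problem were all using Fibre Channel network storage, and the error was triggered when the storage network had problems. In other words, when ASM sends its periodic check and a disk does not answer within 15 seconds, ASM considers that disk inaccessible.
I tried to reproduce the error in a test environment. Because the test environment was a VMware virtual machine, removing a disk at the physical level did not trigger it. The reason is that when a disk on the same host is removed abnormally, the ASM read operation immediately returns an OS-level I/O error, so there is nothing to wait for and the "Waited 15 secs for write IO to PST" timeout is never reached.

My conclusion is that this error only occurs with shared ASM disks that are not local to the physical host but sit in a storage network, where the check that ASM sends out is not answered in time. The cause may be the storage host, the storage network, or even the storage disks themselves. Either way, ASM has not received the acknowledgement it needs, so it assumes the disk has a problem; if enough disks are affected to threaten data integrity, ASM dismounts the disk group.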
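To see which disks ASM currently considers unhealthy after such an event, one option is to look at V$ASM_DISK. This is only a sketch, assuming it is run in the ASM instance on 11g or later (REPAIR_TIMER does not exist in 10g); the filter on MODE_STATUS is my own choice for illustration, not something taken from the Oracle document.

```sql
-- Run in the ASM instance (for example: sqlplus / as sysasm).
-- Lists disks that ASM has taken offline, with the seconds remaining
-- before an offline disk is dropped (REPAIR_TIMER).
select group_number,
       name,
       path,
       mount_status,
       mode_status,
       state,
       repair_timer
  from v$asm_disk
 where mode_status <> 'ONLINE'
 order by group_number, name;
```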

According to Doc ID 1581684.1, the "Waited 15 secs for write IO to PST" message appears starting with 11.2.0.3.0. The document also describes how to change this timeout manually through the parameter _asm_hbeatiowait:

alter system set "_asm_hbeatiowait"=<value> scope=spfile sid='*';
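Before changing anything, it is worth checking the current value and whether it is still the default. The following is a minimal sketch, assuming a SYS connection to the ASM instance on 11.2.0.3 or later. The value 120 in the ALTER statement is only an illustrative choice (it matches the 12.1.0.2 default shown further down), not a recommendation from this post; hidden parameters are normally changed only on Oracle Support's advice.

```sql
-- Run in the ASM instance as SYS (for example: sqlplus / as sysasm).
-- Show the heartbeat-related hidden parameters, their current values,
-- whether each value is still the default, and Oracle's description.
select i.ksppinm  as name,
       v.ksppstvl as value,
       v.ksppstdf as is_default,
       i.ksppdesc as description
  from x$ksppi i
  join x$ksppcv v on v.indx = i.indx
 where i.ksppinm like '\_asm\_hbeat%' escape '\'
 order by i.ksppinm;

-- Illustrative only: raise the timeout to 120 seconds on all instances.
-- With scope=spfile the new value takes effect after the ASM instances restart.
alter system set "_asm_hbeatiowait"=120 scope=spfile sid='*';
```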

To confirm that this parameter first appears in 11.2.0.3, I queried it in every database version I had available; the results are shown below:
======================10.2===================== 
SQL> select * from v$version; 
BANNER 
---------------------------------------------------------------- 
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Prod 
PL/SQL Release 10.2.0.5.0 - Production 
CORE 10.2.0.5.0 Production 
TNS for Linux: Version 10.2.0.5.0 - Production 
NLSRTL Version 10.2.0.5.0 - Production 
  
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm%' order by ksppinm; 
hidden parameter value 
-------------------------------------------------------------------------------- ---------- 
_asm_acd_chunks 1 
_asm_allow_only_raw_disks TRUE 
_asm_allow_resilver_corruption FALSE 
_asm_ausize 1048576 
_asm_blksize 4096 
_asm_direct_con_expire_time 120 
_asm_disk_repair_time 14400 
_asm_droptimeout 60 
_asm_emulmax 10000 
_asm_emultimeout 0 
_asm_fob_tac_frequency 3 
hidden parameter value 
-------------------------------------------------------------------------------- ---------- 
_asm_instlock_quota 0 
_asm_kfdpevent 0 
_asm_libraries ufs 
_asm_maxio 1048576 
_asm_skip_resize_check FALSE 
_asm_stripesize 131072 
_asm_stripewidth 8 
_asm_wait_time 18 
_asmlib_test 0 
_asmsid asm 
21 rows selected. 
  
======================11.2.0.1===================== 
sqlplus / as sysdba 
Connected to: 
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production 
With the Partitioning, OLAP, Data Mining and Real Application Testing options 
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm; 
hidden parameter value 
-------------------------------------------------------------------------------- 
_asm_hbeatwaitquantum 2 
  
======================11.2.0.2===================== 
 $ sqlplus / as sysdba 
Connected to: 
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production 
With the Partitioning, Oracle Label Security, OLAP, Data Mining 
and Real Application Testing options 
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm; 
hidden parameter value 
-------------------------------------------------------------------------------- 
_asm_hbeatwaitquantum 2 
  
The parameter only appears starting with 11.2.0.3.0, which suggests that this ASM disk timeout check was introduced in 11.2.0.3.
======================11.2.0.3===================== 
sys@R11203> select * from v$version; 
BANNER 
-------------------------------------------------------------------------------- 
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production 
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm; 
hidden parameter value 
-------------------------------------------------- -------------------- 
_asm_hbeatiowait 15 
_asm_hbeatwaitquantum 2 
  
======================11.2.0.4===================== 
SQL> select * from v$version; 
BANNER 
-------------------------------------------------------------------------------- 
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - Production 
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm; 
hidden parameter value 
-------------------------------------------------------------------------------- --------- 
_asm_hbeatiowait 15 <<<<<<<<<<<<<<<<<<<
_asm_hbeatwaitquantum 2 
  
 ======================12.1.0.1===================== 
 $ sqlplus / as sysdba 
Connected to: 
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production 
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options 
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm; 
hidden parameter value 
-------------------------------------------------------------------------------- 
_asm_hbeatiowait 15 
_asm_hbeatwaitquantum 2 
  
Starting with 12.1.0.2, the default value of this parameter was changed to 120 seconds.
 ======================12.1.0.2===================== 
 $ sqlplus / as sysdba 
  
Connected to: 
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production 
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options 
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm; 
hidden parameter value 
-------------------------------------------------------------------------------- 
_asm_hbeatiowait 120 
_asm_hbeatwaitquantum 2

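I hope this summary is useful to you. In day-to-day work I often find that a problem looks simple but I am not sure of the answer, so I test it and write down the result for future reference.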

Source: ITPUB blog, link: http://blog.itpub.net/22990797/viewspace-1655015/. If you repost this article, please credit the source; otherwise legal action may be taken.
