[Chongqing Sizhuang Daily Tech Share] Database alert log errors: ORA-00202, ORA-15081, ORA-27072

Posted by xianhua_33 on 2022-04-20

Analysis:

1. The database was found to be down. The alert log shows the following I/O errors against the control file:

Thu Apr 11 06:40:14 2019

WARNING: Read Failed. group:2 disk:1 AU:675 offset:16384 size:16384

WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 260 in group [2.3852408873] from disk DATA_0001 allocation unit 675 reason error; if possible, will try another mirror side

Errors in file /u01/app/oracle/diag/rdbms/jsswgsjk/jsswgsjk1/trace/jsswgsjk1_ckpt_93628.trc:

ORA-00202: control file: '+DATA/jsswgsjk/controlfile/current.260.998936297'

ORA-15081: failed to submit an I/O operation to a disk

ORA-27072: File I/O error

Linux-x86_64 Error: 5: Input/output error

Additional information: 4

Additional information: 1382432

Additional information: -1

Thu Apr 11 06:40:15 2019

WARNING: Read Failed. group:2 disk:1 AU:675 offset:65536 size:16384
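When the alert log contains many of these warnings, it can help to pull out the affected disk group, disk, and allocation unit mechanically. The following is a minimal sketch, assuming the "Read Failed" lines have exactly the format shown in the excerpt above; the function name `parse_read_failures` is hypothetical, and other Oracle versions may format the line differently.

```python
import re

# Pattern for the "Read Failed" warnings as they appear in the excerpt above.
READ_FAILED = re.compile(
    r"WARNING: Read Failed\. group:(\d+) disk:(\d+) AU:(\d+) offset:(\d+) size:(\d+)"
)

def parse_read_failures(log_text):
    """Return one dict per 'Read Failed' warning found in log_text."""
    fields = ("group", "disk", "au", "offset", "size")
    return [
        dict(zip(fields, map(int, match.groups())))
        for match in READ_FAILED.finditer(log_text)
    ]

# Sample lines copied from the alert log excerpt.
sample = """\
WARNING: Read Failed. group:2 disk:1 AU:675 offset:16384 size:16384
ORA-15081: failed to submit an I/O operation to a disk
WARNING: Read Failed. group:2 disk:1 AU:675 offset:65536 size:16384
"""

failures = parse_read_failures(sample)
for failure in failures:
    print(failure)
```

In this excerpt both warnings point at the same disk (group 2, disk 1) and the same allocation unit (675), differing only in offset.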

2. Check the ASM alert log

------- A disk timeout occurred, and ASM began dismounting the OCR disk group

Thu Apr 11 06:39:29 2019

NOTE: process _b000_+asm1 (31654636) initiating offline of disk 0.3671375779 (OCR_0000) with mask 0x7e in group 3

NOTE: process _b000_+asm1 (31654636) initiating offline of disk 1.3671375780 (OCR_0001) with mask 0x7e in group 3

NOTE: process _b000_+asm1 (31654636) initiating offline of disk 2.3671375781 (OCR_0002) with mask 0x7e in group 3

NOTE: checking PST: grp = 3

GMON checking disk modes for group 3 at 13 for pid 67, osid 31654636

ERROR: no read quorum in group: required 2, found 0 disks

NOTE: checking PST for grp 3 done.

NOTE: initiating PST update: grp = 3, dsk = 0/0xdad4bfa3, mask = 0x6a, op = clear

NOTE: initiating PST update: grp = 3, dsk = 1/0xdad4bfa4, mask = 0x6a, op = clear

NOTE: initiating PST update: grp = 3, dsk = 2/0xdad4bfa5, mask = 0x6a, op = clear

GMON updating disk modes for group 3 at 14 for pid 67, osid 31654636

ERROR: no read quorum in group: required 2, found 0 disks  <<<< 0 disks accessible.

Thu Apr 11 06:39:29 2019
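The two signals that matter in the ASM log above are which disks were taken offline and how far the group fell short of read quorum. As a rough sketch, assuming the message formats shown in the excerpt (the helper name `summarize_asm_log` is hypothetical), they can be extracted like this:

```python
import re

# Both patterns are based only on the log lines quoted above.
OFFLINE = re.compile(
    r"initiating offline of disk \d+\.\d+ \((\w+)\) with mask \S+ in group (\d+)"
)
QUORUM = re.compile(
    r"ERROR: no read quorum in group: required (\d+), found (\d+) disks"
)

def summarize_asm_log(log_text):
    """Return (disks offlined per group, list of (required, found) quorum errors)."""
    offlined = {}
    for match in OFFLINE.finditer(log_text):
        disk_name, group = match.groups()
        offlined.setdefault(int(group), []).append(disk_name)
    quorum_errors = [
        (int(required), int(found)) for required, found in QUORUM.findall(log_text)
    ]
    return offlined, quorum_errors

# Sample lines copied from the ASM log excerpt.
sample = """\
NOTE: process _b000_+asm1 (31654636) initiating offline of disk 0.3671375779 (OCR_0000) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (31654636) initiating offline of disk 1.3671375780 (OCR_0001) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (31654636) initiating offline of disk 2.3671375781 (OCR_0002) with mask 0x7e in group 3
ERROR: no read quorum in group: required 2, found 0 disks
"""

offlined, quorum_errors = summarize_asm_log(sample)
print(offlined)
print(quorum_errors)
```

Here all three OCR disks in group 3 go offline at once, so none of the 2 disks required for read quorum can be found.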

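Solution:

1. Putting the information above together, the fault analysis is summarized as follows:

Oracle RAC ASM has a dedicated heartbeat mechanism for monitoring the disks of the disk groups it manages, the 'ASM PST heartbeat'. This monitoring appeared after Oracle 11.2.0.3 with a default timeout of 15 s; as of 12.1.0.2 Oracle changed the default to 120 s.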

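PST heartbeat timeouts typically occur during brief I/O interruptions, heavy I/O load, or heavy CPU load. When the PST detects that the synchronization delay exceeds the value of "_asm_hbeatiowait", it notifies the Oracle ASM instance to dismount the disk group, which takes the disk group offline in that ASM instance. Under Normal Redundancy or High Redundancy, once more than half of a disk group is offline, the result is a RAC split-brain.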

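During any of our upgrades, a path switch typically causes PP to hold I/O for about 15 seconds before resuming, which is very likely to trigger the timeout described above. Before an upgrade, we strongly recommend raising this parameter to 120.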
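The reasoning comes down to simple arithmetic: how much margin is left between the heartbeat wait limit and the I/O hold during a path switch. The sketch below is only an illustration under the figures quoted in this article (a hold of roughly 15 seconds, limits of 15 s and 120 s); the function name `timeout_headroom` is hypothetical and is not part of any Oracle tool.

```python
def timeout_headroom(hbeat_iowait_seconds, io_hold_seconds):
    """Seconds of margin left before a held I/O reaches the heartbeat wait limit."""
    return hbeat_iowait_seconds - io_hold_seconds

IO_HOLD_SECONDS = 15  # approximate hold during a path switch, per the text above

for limit in (15, 120):
    margin = timeout_headroom(limit, IO_HOLD_SECONDS)
    print(f"_asm_hbeatiowait={limit}s leaves {margin}s of headroom")
```

With the 15 s limit there is no margin at all, so any hold slightly longer than expected reaches the timeout; with 120 s the same hold leaves well over a minute of slack.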

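The parameter can be checked as shown below. After changing it to 120 s, the CRS services must be restarted for the setting to take effect.

2. Check the value of the "_asm_hbeatiowait" parameter (the check returned 15):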

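-- Run on the ASM instance as SYS (for example, connected AS SYSASM); x$ tables are visible only to SYS.
select ksppinm as "hidden parameter", ksppstvl as "value"
  from x$ksppi
  join x$ksppcv using (indx)
 where ksppinm like '\_%' escape '\'   -- hidden parameters start with an underscore
   and ksppinm like '%asm_hb%'         -- narrow to the ASM heartbeat parameters
 order by ksppinm;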

3. Fix: adjust the parameter on the ASM instance

alter system set "_asm_hbeatiowait"=120 scope=spfile;

Remember to restart ASM or CRS afterwards, because scope=spfile changes only take effect at the next startup.


Source: "ITPUB Blog", link: http://blog.itpub.net/69950318/viewspace-2887947/. If you repost this article, please credit the source; otherwise legal action may be pursued.
