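Case analysis: a physical path failure on RAC shared disks makes the ASM disk group holding the OCR and voting files inaccessible
The customer's environment consists of two IBM X3850 servers running Oracle Linux 6.x (x86_64) with an Oracle 11.2.0.4.0 RAC database. The shared storage is EMC, mirrored through EMC VPLEX storage virtualization, and the operating system uses EMC's native multipathing software. The symptom: when a switchover occurs inside VPLEX, the disk group holding the OCR and voting files becomes inaccessible on one RAC node. The ora.crsd resource goes offline and the Grid Infrastructure cluster stack on that node goes down. The database instance on that node keeps running but no longer accepts new external connections. The other node is completely unaffected throughout. The relevant log information follows:
1. Operating system log: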
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 4 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 2 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 3 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 1 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 0 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 11 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 12 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 10 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 9 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 8 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 7 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 5 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 6 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Bus 3 to VPLEX CKM00142000957 port CL2-00 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 1 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 12 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 11 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 10 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 7 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 4 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 8 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 9 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 5 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 3 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 6 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 2 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 0 to CKM00142000957 is dead.
Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Bus 3 to VPLEX CKM00142000957 port CL2-04 is dead.
The operating system log shows that at Mar 18 08:25:48 the two links to port CL2-00 and port CL2-04 went dead.
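When the syslog contains dozens of these messages, a short script can condense them. The following is a minimal sketch in Python, assuming the message layout is exactly as in the excerpt above; the regular expressions and the function name `summarize_dead_paths` are illustrative and not part of any EMC tool.

```python
import re
from collections import defaultdict

# Patterns modelled on the multipath (Mpx) kernel messages quoted above.
PATH_DEAD = re.compile(r"Error:Mpx:Path Bus (\d+) Tgt (\d+) Lun (\d+) to (\S+) is dead\.")
PORT_DEAD = re.compile(r"Error:Mpx:Bus (\d+) to VPLEX (\S+) port (\S+) is dead\.")


def summarize_dead_paths(lines):
    """Return (LUNs lost per (bus, target), list of dead (bus, port) pairs)."""
    luns = defaultdict(set)
    ports = []
    for line in lines:
        m = PATH_DEAD.search(line)
        if m:
            bus, tgt, lun, _array = m.groups()
            luns[(int(bus), int(tgt))].add(int(lun))
            continue
        m = PORT_DEAD.search(line)
        if m:
            bus, _array, port = m.groups()
            ports.append((int(bus), port))
    return dict(luns), ports


# A few lines copied from the excerpt above.
sample = [
    "Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 3 Lun 4 to CKM00142000957 is dead.",
    "Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Bus 3 to VPLEX CKM00142000957 port CL2-00 is dead.",
    "Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Path Bus 3 Tgt 2 Lun 1 to CKM00142000957 is dead.",
    "Mar 18 08:25:48 dzqddb01 kernel: Error:Mpx:Bus 3 to VPLEX CKM00142000957 port CL2-04 is dead.",
]

luns, ports = summarize_dead_paths(sample)
for (bus, tgt), lost in sorted(luns.items()):
    print(f"bus {bus} tgt {tgt}: {len(lost)} LUN path(s) dead")
print("dead ports:", ports)
```

Run against the full excerpt, a summary like this makes it easy to see that every dead path is on Bus 3, spread over targets 2 and 3.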
2. ASM alert log:
Fri Mar 18 08:25:59 2016
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3. <<<< At almost the same time as the OS errors, ASM reports delayed PST (Partnership and Status Table) heartbeat writes on its disks; the wait threshold is 15 seconds.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.
Fri Mar 18 08:25:59 2016
NOTE: process _b000_+asm1 (66994) initiating offline of disk 0.3190888900 (OCRVDISK_0000) with mask 0x7e in group 3 <<<< Group 3 is the disk group that holds the OCR and voting files.
NOTE: process _b000_+asm1 (66994) initiating offline of disk 1.3190888899 (OCRVDISK_0001) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (66994) initiating offline of disk 2.3190888898 (OCRVDISK_0002) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 10 for pid 48, osid 66994
ERROR: no read quorum in group: required 2, found 0 disks <<<< The disk group holding the OCR and voting files uses normal redundancy with 3 ASM disks; 2 must be accessible, but 0 are.
NOTE: checking PST for grp 3 done.
NOTE: initiating PST update: grp = 3, dsk = 0/0xbe3119c4, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 1/0xbe3119c3, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 2/0xbe3119c2, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 11 for pid 48, osid 66994
ERROR: no read quorum in group: required 2, found 0 disks <<<< 0 disks are accessible.
Fri Mar 18 08:25:59 2016
NOTE: cache dismounting (not clean) group 3/0x3D81E95D (OCRVDISK)
WARNING: Offline for disk OCRVDISK_0000 in mode 0x7f failed. <<<< The offline attempt fails for every disk in the disk group holding the OCR and voting files.
WARNING: Offline for disk OCRVDISK_0001 in mode 0x7f failed.
WARNING: Offline for disk OCRVDISK_0002 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 66996, image: oracle@dzqddb01 (B001)
Fri Mar 18 08:25:59 2016
NOTE: halting all I/Os to diskgroup 3 (OCRVDISK) <<<< All I/O to the OCRVDISK disk group is halted.
Fri Mar 18 08:25:59 2016
NOTE: LGWR doing non-clean dismount of group 3 (OCRVDISK)
NOTE: LGWR sync ABA=11.69 last written ABA 11.69
Fri Mar 18 08:25:59 2016
kjbdomdet send to inst 2
detach from dom 3, sending detach message to inst 2
Fri Mar 18 08:25:59 2016
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 96)
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 3 invalid = TRUE
Fri Mar 18 08:25:59 2016
NOTE: No asm libraries found in the system
2 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Fri Mar 18 08:25:59 2016
WARNING: dirty detached from domain 3
NOTE: cache dismounted group 3/0x3D81E95D (OCRVDISK)
SQL> alter diskgroup OCRVDISK dismount force /* ASM SERVER:1031924061 */ <<<< The OCRVDISK disk group is force-dismounted.
Fri Mar 18 08:25:59 2016
NOTE: cache deleting context for group OCRVDISK 3/0x3d81e95d
GMON dismounting group 3 at 12 for pid 51, osid 66996
NOTE: Disk OCRVDISK_0000 in mode 0x7f marked for de-assignment
NOTE: Disk OCRVDISK_0001 in mode 0x7f marked for de-assignment
NOTE: Disk OCRVDISK_0002 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 3
ASM Health Checker found 1 new failures
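The PST warnings above are easier to compare when grouped by disk group. The sketch below, in Python, assumes the warning text is exactly as shown in this alert log; the function name `pst_waits_by_group` is illustrative.

```python
import re
from collections import defaultdict

# Pattern modelled on the ASM alert log warning quoted above.
PST_WAIT = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)\."
)


def pst_waits_by_group(lines):
    """Map ASM group number -> sorted PST disk numbers that reported a delayed write."""
    groups = defaultdict(set)
    for line in lines:
        m = PST_WAIT.search(line)
        if m:
            _secs, disk, group = map(int, m.groups())
            groups[group].add(disk)
    return {g: sorted(d) for g, d in groups.items()}


# The distinct warnings from the excerpt above.
sample = [
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.",
    "WARNING: Waited 15 secs for write IO to PST disk 1 in group 3.",
    "WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.",
]

for group, disks in sorted(pst_waits_by_group(sample).items()):
    print(f"group {group}: delayed PST writes on disk(s) {disks}")
```

All three groups saw the delay, but only group 3 (OCRVDISK) reports three PST disks and is dismounted. Groups 1 and 2 each report a single PST disk, which would be consistent with external redundancy, although the logs do not state their redundancy level.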
3. Clusterware alert log:
2016-03-18 11:53:19.394:
[crsd(47973)]CRS-1006:The OCR location +OCRVDISK is inaccessible. Details in /u01/app/11.2.0/grid/log/dzqddb01/crsd/crsd.log. <<<< This is much later than the time at which OCRVDISK was dismounted.
2016-03-18 11:53:38.437:
[/u01/app/11.2.0/grid/bin/oraagent.bin(48283)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:121} in /u01/app/11.2.0/grid/log/dzqddb01/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2016-03-18 11:53:38.437:
[/u01/app/11.2.0/grid/bin/scriptagent.bin(80385)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:7} in /u01/app/11.2.0/grid/log/dzqddb01/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2016-03-18 11:53:38.437:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(48177)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:3303} in /u01/app/11.2.0/grid/log/dzqddb01/agent/crsd/orarootagent_root/orarootagent_root.log.
2016-03-18 11:53:38.437:
[/u01/app/11.2.0/grid/bin/oraagent.bin(48168)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:7} in /u01/app/11.2.0/grid/log/dzqddb01/agent/crsd/oraagent_grid/oraagent_grid.log.
2016-03-18 11:53:38.442:
[ohasd(47343)]CRS-2765:Resource 'ora.crsd' has failed on server 'dzqddb01'. <<<< ora.crsd is now OFFLINE.
2016-03-18 11:53:39.773:
[crsd(45323)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/dzqddb01/crsd/crsd.log.
2016-03-18 11:53:39.779:
[crsd(45323)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/dzqddb01/crsd/crsd.log. <<<< The physical storage is inaccessible.
2016-03-18 11:53:40.470:
[ohasd(47343)]CRS-2765:Resource 'ora.crsd' has failed on server 'dzqddb01'.
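To put a number on how much later crsd reported the problem, the two timestamps can be subtracted. This is a small Python sketch that assumes both logs were written in the same time zone; the helper names are illustrative.

```python
from datetime import datetime


def parse_asm_ts(text):
    """Parse an ASM alert log timestamp such as 'Fri Mar 18 08:25:59 2016'."""
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


def parse_crs_ts(text):
    """Parse a Clusterware alert log timestamp such as '2016-03-18 11:53:19.394:'."""
    return datetime.strptime(text.rstrip(":"), "%Y-%m-%d %H:%M:%S.%f")


# Timestamps copied from the ASM and Clusterware excerpts above.
dismount = parse_asm_ts("Fri Mar 18 08:25:59 2016")
crs_1006 = parse_crs_ts("2016-03-18 11:53:19.394:")

gap = crs_1006 - dismount
print(f"CRS-1006 was logged {gap} after the dismount")
```

The gap is roughly 3 hours 27 minutes. The logs shown here do not explain the delay; one possible reading is that crsd only hit the error the next time it had to access the OCR.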
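This raises a question: why did ora.crsd fail while ora.cssd stayed online? (crsctl stat res -t -init confirms that ora.cssd was still up; the database instance kept running and the node was not evicted.) The reason is that the disks behind OCRVDISK were inaccessible only briefly. The cssd process accesses the 3 ASM disks of OCRVDISK directly and does not depend on the OCRVDISK disk group being mounted, and the default Clusterware disk heartbeat timeout is 200 seconds, so cssd was not affected.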
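This leads to further questions: why did the other RAC node not fail? Why was only the OCRVDISK disk group dismounted while the other disk groups stayed healthy?
After the problem occurred, restarting the HAS stack brought the node back to normal. Since the other disk groups and the other node did not fail, we can conclude that nothing was seriously wrong with the shared storage itself; it was only briefly inaccessible after the links dropped. The key to the problem is this message in the ASM instance alert log: WARNING: Waited 15 secs for write IO to PST disk. Was the 15-second limit so short that it caused OCRVDISK to go offline? MOS explains it as follows: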
Generally this kind messages comes in ASM alertlog file on below situations,
Delayed ASM PST heart beats on ASM disks in normal or high redundancy diskgroup, <<<< ASM PST heartbeats are delayed on ASM disks in a normal or high redundancy disk group.
thus the ASM instance dismount the diskgroup.By default, it is 15 seconds. <<<< ASM then dismounts the disk group; the default timeout is 15 seconds.
By the way the heart beat delays are sort of ignored for external redundancy diskgroup. <<<< PST heartbeat delays are largely ignored for external redundancy disk groups.
ASM instance stop issuing more PST heart beat until it succeeds PST revalidation,
but the heart beat delays do not dismount external redundancy diskgroup directly. <<<< Even a PST heartbeat delay longer than 15 seconds does not directly dismount an external redundancy disk group.
The ASM disk could go into unresponsiveness, normally in the following scenarios: <<<< An ASM disk typically becomes unresponsive in the following scenarios:
+ During path 'failover' in a multipath set up <<<< A physical path under a multipath device fails over.
+ Server load, or any sort of storage/multipath/OS maintenance <<<< Server load, or maintenance operations on the system or devices.
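The description above roughly explains the cause. Two storage links dropped (a failover may have occurred), so the multipath device for the shared storage was briefly inaccessible. OCRVDISK is a normal redundancy disk group, so ASM performs the PST heartbeat check on it. Because the disks of OCRVDISK were inaccessible for more than 15 seconds, ASM dismounted OCRVDISK outright. The OCR then became inaccessible and the crsd service went OFFLINE. Because the cssd disk heartbeat timeout is 200 seconds, and cssd accesses the ASM disks directly rather than through the ASM disk group, the CSS service was unaffected: the ohasd-managed high availability stack kept working, the node was not evicted, and the database instance kept running.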
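The interplay of the two timeouts can be captured in a toy model. The Python sketch below is only an illustration of the reasoning in this article, not Oracle's actual algorithm: it assumes a single continuous storage outage, treats the 15-second PST heartbeat wait and the 200-second CSS disk timeout as simple thresholds, and the 30-second outage is a hypothetical value, since the real duration is not recorded in the logs.

```python
def predict_impact(outage_secs, redundancy, pst_hbeat_wait=15, css_disktimeout=200):
    """Toy model: which failures follow from a storage outage of the given length.

    redundancy is the ASM disk group redundancy: 'external', 'normal' or 'high'.
    """
    # Per the MOS note, only normal/high redundancy groups are dismounted
    # when the PST heartbeat is delayed beyond the wait threshold.
    dismounted = redundancy in ("normal", "high") and outage_secs >= pst_hbeat_wait
    # cssd reads the voting disks directly and tolerates a much longer stall.
    evicted = outage_secs >= css_disktimeout
    return {"diskgroup_dismounted": dismounted, "node_evicted": evicted}


# Hypothetical 30-second outage, chosen only for illustration.
print(predict_impact(30, "normal"))                      # OCRVDISK-like group
print(predict_impact(30, "external"))                    # external redundancy group
print(predict_impact(30, "normal", pst_hbeat_wait=120))  # with a raised heartbeat wait
```

Under these assumptions, an outage between 15 and 200 seconds dismounts a normal redundancy group without evicting the node, which matches what was observed, and raising the heartbeat wait above the outage length avoids the dismount.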
Oracle provides a way to resolve this problem at the database level:
If you can not keep the disk unresponsiveness to below 15 seconds, then the below parameter can be set in the ASM instance ( on all the Nodes of RAC ):
_asm_hbeatiowait <<<< This parameter sets the PST heartbeat timeout.
As per internal bug 17274537 , based on internal testing the value should be increased to 120 secs, which is fixed in 12.1.0.2 <<<< Starting with 12.1.0.2, the default value of this parameter is increased to 120 seconds.
Run below in asm instance to set desired value for _asm_hbeatiowait
alter system set "_asm_hbeatiowait"=<value> scope=spfile sid='*'; <<<< Run this command to change the parameter in the ASM instance, then restart the ASM instance and CRS.
And then restart asm instance / crs, to take new parameter value in effect.
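To avoid similar problems, the OCR can also be mirrored to a different ASM disk group, which further improves the availability of the ora.crsd service.
For more details, see the MOS note: ASM diskgroup dismount with "Waited 15 secs for write IO to PST" (Doc ID 1581684.1)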
--end--
Source: ITPUB blog, link: http://blog.itpub.net/23135684/viewspace-2061397/. If you republish this article, please credit the source; otherwise legal action may be taken.