ORA-15042 Failure: Last-Resort Recovery ---- 惜分飛
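A friend asked for help with a recovery. An ASM disk group built on 19 LUNs had one faulty LUN, so they added a new LUN and dropped the old one in a single operation. The operation hung partway through, because the bad LUN was damaged at the storage layer and the rebalance could not complete. The storage engineer then kept working on the faulty LUN and, luckily, managed to repair it. In the excitement, however, the newly added LUN, which had already received part of the data through the rebalance, was deleted directly on the storage array. At that point the ASM disk group was down for good and could no longer be mounted, and they asked us for recovery support. For certain reasons the LUN itself could not be recovered, so the only option left was for us to provide a database-level recovery.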
Mon Sep 21 19:52:35 2015
SQL> alter diskgroup dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
NOTE: Assigning number (1,20) to disk ( /dev/rhdisk116 )
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DG_XFF_0020
NOTE: requesting all-instance disk validation for group=1
Mon Sep 21 19:52:44 2015
NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
NOTE: initiating PST update: grp = 1
Mon Sep 21 19:52:44 2015
GMON updating group 1 at 25 for pid 27, osid 12124486
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xb94738f1 (DG_XFF)
GMON querying group 1 at 26 for pid 18, osid 10092734
NOTE: cache opening disk 20 of grp 1: DG_XFF_0020 path:/dev/rhdisk116
GMON querying group 1 at 27 for pid 18, osid 10092734
SUCCESS: refreshed membership for 1/0xb94738f1 (DG_XFF)
Mon Sep 21 19:52:47 2015
SUCCESS: alter diskgroup dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
NOTE: starting rebalance of group 1/0xb94738f1 (DG_XFF) at power 1
Starting background process ARB0
Mon Sep 21 19:52:47 2015
ARB0 started with pid=28, OS id=10944804
NOTE: assigning ARB0 to group 1/0xb94738f1 (DG_XFF) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DG_XFF
Mon Sep 21 20:35:06 2015
SQL> ALTER DISKGROUP DG_XFF MOUNT /* asm agent *//* {1:51107:7083} */
NOTE: cache registered group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache began mount (first) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: Assigning number (1,0) to disk ( /dev/rhdisk10 )
NOTE: Assigning number (1,1) to disk ( /dev/rhdisk11 )
NOTE: Assigning number (1,2) to disk ( /dev/rhdisk16 )
NOTE: Assigning number (1,3) to disk ( /dev/rhdisk17 )
NOTE: Assigning number (1,4) to disk ( /dev/rhdisk22 )
NOTE: Assigning number (1,5) to disk ( /dev/rhdisk23 )
NOTE: Assigning number (1,6) to disk ( /dev/rhdisk28 )
NOTE: Assigning number (1,7) to disk ( /dev/rhdisk29 )
NOTE: Assigning number (1,8) to disk ( /dev/rhdisk33 )
NOTE: Assigning number (1,9) to disk ( /dev/rhdisk34 )
NOTE: Assigning number (1,10) to disk ( /dev/rhdisk4 )
NOTE: Assigning number (1,11) to disk ( /dev/rhdisk40 )
NOTE: Assigning number (1,12) to disk ( /dev/rhdisk41 )
NOTE: Assigning number (1,13) to disk ( /dev/rhdisk45 )
NOTE: Assigning number (1,14) to disk ( /dev/rhdisk46 )
NOTE: Assigning number (1,15) to disk ( /dev/rhdisk5 )
NOTE: Assigning number (1,16) to disk ( /dev/rhdisk52 )
NOTE: Assigning number (1,17) to disk ( /dev/rhdisk53 )
NOTE: Assigning number (1,18) to disk ( /dev/rhdisk57 )
NOTE: Assigning number (1,19) to disk ( /dev/rhdisk58 )
Wed Sep 30 11:08:07 2015
NOTE: start heartbeating (grp 1)
GMON querying group 1 at 33 for pid 35, osid 4194488
NOTE: Assigning number (1,20) to disk ()
GMON querying group 1 at 34 for pid 35, osid 4194488
NOTE: cache dismounting (clean) group 1/0xDD6F975A (DG_XFF)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xDD6F975A (DG_XFF)
NOTE: cache ending mount (fail) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache deleting context for group DG_XFF 1/0xdd6f975a
GMON dismounting group 1 at 35 for pid 35, osid 4194488
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup DG_XFF was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "20" is missing from group number "1"
ERROR: ALTER DISKGROUP DG_XFF MOUNT /* asm agent *//* {1:51107:7083} */
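The missing disk can be read straight out of a mount log like the one above. The following is a minimal Python sketch, not the tooling used in this case: the function name `summarize_mount_log` and the two regular expressions are illustrative assumptions, written only against the "Assigning number" and ORA-15042 line formats shown in this log.

```python
import re

# Patterns modeled on the log lines shown above (assumed formats, not an official grammar).
ASSIGN_RE = re.compile(r"NOTE: Assigning number \((\d+),(\d+)\) to disk \(\s*(\S*?)\s*\)")
MISSING_RE = re.compile(r'ORA-15042: ASM disk "(\d+)" is missing from group number "(\d+)"')


def summarize_mount_log(text):
    """Return (found, empty, missing) keyed by (group number, disk number).

    found   : disks that were assigned with a device path
    empty   : disks that were assigned with an empty path, i.e. "disk ()"
    missing : disks named in an ORA-15042 error
    """
    found, empty, missing = {}, set(), set()
    for m in ASSIGN_RE.finditer(text):
        key = (int(m.group(1)), int(m.group(2)))
        path = m.group(3)
        if path:
            found[key] = path
        else:
            empty.add(key)
    for m in MISSING_RE.finditer(text):
        # The error text gives the disk number first, then the group number.
        missing.add((int(m.group(2)), int(m.group(1))))
    return found, empty, missing


# A few lines copied from the log above.
sample = '''NOTE: Assigning number (1,18) to disk ( /dev/rhdisk57 )
NOTE: Assigning number (1,19) to disk ( /dev/rhdisk58 )
NOTE: Assigning number (1,20) to disk ()
ORA-15042: ASM disk "20" is missing from group number "1"
'''

found, empty, missing = summarize_mount_log(sample)
print("disks with a path:", sorted(found.items()))
print("disks without a path:", sorted(empty))
print("disks reported missing:", sorted(missing))
```

On this sample, disk 20 of group 1 shows up both as an assignment with an empty path and as the disk named by ORA-15042, which is the same conclusion the log leads to by eye.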
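The cause is fairly clear here. Because the storage engineer deleted the LUN directly, disk group DG_XFF lost ASM disk 20 and can no longer be mounted. The rebalance had already been running for a long time, so the lost disk held a large amount of data, including metadata. Even if we patched the PST (Partnership and Status Table) to get the disk group mounted, which is not guaranteed to work, a large amount of data would be lost, and the data inside might still not be directly retrievable. Had the disk merely been added without any rebalance taking place for some reason, we could simply have modified the PST to bring the disk group up. In a case like this, the only thing we can do is scan the disks at the low level and rebuild the data files from what is found, then unload the data from them to complete the recovery. Relying on the surviving metadata to copy files out or unload data directly would lose a lot of data, because part of the files' metadata lived on the lost LUN. The disks that need to be recovered are: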
grp# dsk# bsize ausize disksize diskname groupname path
---- ---- ----- ------ -------- --------------- --------------- -------------
1 0 4096 4096K 179200 DG_XFF_0000 DG_XFF /dev/rhdisk10
1 1 4096 4096K 179200 DG_XFF_0001 DG_XFF /dev/rhdisk11
1 2 4096 4096K 179200 DG_XFF_0002 DG_XFF /dev/rhdisk16
1 3 4096 4096K 179200 DG_XFF_0003 DG_XFF /dev/rhdisk17
1 4 4096 4096K 179200 DG_XFF_0004 DG_XFF /dev/rhdisk22
1 5 4096 4096K 179200 DG_XFF_0005 DG_XFF /dev/rhdisk23
1 6 4096 4096K 179200 DG_XFF_0006 DG_XFF /dev/rhdisk28
1 7 4096 4096K 179200 DG_XFF_0007 DG_XFF /dev/rhdisk29
1 8 4096 4096K 179200 DG_XFF_0008 DG_XFF /dev/rhdisk33
1 9 4096 4096K 179200 DG_XFF_0009 DG_XFF /dev/rhdisk34
1 10 4096 4096K 179200 DG_XFF_0010 DG_XFF /dev/rhdisk4
1 11 4096 4096K 179200 DG_XFF_0011 DG_XFF /dev/rhdisk40
1 12 4096 4096K 179200 DG_XFF_0012 DG_XFF /dev/rhdisk41
1 13 4096 4096K 179200 DG_XFF_0013 DG_XFF /dev/rhdisk45
1 14 4096 4096K 179200 DG_XFF_0014 DG_XFF /dev/rhdisk46
1 15 4096 4096K 179200 DG_XFF_0015 DG_XFF /dev/rhdisk5
1 16 4096 4096K 179200 DG_XFF_0016 DG_XFF /dev/rhdisk52
1 17 4096 4096K 179200 DG_XFF_0017 DG_XFF /dev/rhdisk53
1 18 4096 4096K 179200 DG_XFF_0018 DG_XFF /dev/rhdisk57
1 19 4096 4096K 179200 DG_XFF_0019 DG_XFF /dev/rhdisk58
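Since the recovery has to work from raw disk contents, the geometry in this listing matters: it determines how many allocation units (AUs) each disk holds and where each AU starts. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, `disksize` is assumed to be in MB (consistent with the `size 716800M` clause in the add-disk command), and AU n is taken to start at byte offset n * AU size on the disk.

```python
KB = 1024


def aus_per_disk(disk_size_mb, au_size_kb):
    """Number of whole allocation units on a disk (disk size assumed to be in MB)."""
    return (disk_size_mb * 1024) // au_size_kb


def au_byte_offset(au_number, au_size_kb):
    """Byte offset on the disk at which allocation unit `au_number` starts."""
    return au_number * au_size_kb * KB


# Values taken from the listing above: 20 surviving disks, 179200 per disk, 4096K AUs.
surviving_disks = 20
disk_size_mb = 179200
au_size_kb = 4096

per_disk = aus_per_disk(disk_size_mb, au_size_kb)
total_aus = per_disk * surviving_disks
# Size of the deleted disk, taken from the add-disk command in the first log.
lost_disk_aus = aus_per_disk(716800, au_size_kb)

print("AUs per surviving disk:", per_disk)
print("AUs to scan across surviving disks:", total_aus)
print("AUs on the deleted disk:", lost_disk_aus)
print("byte offset of AU 2:", au_byte_offset(2, au_size_kb))
```

A scanner built on this would walk each surviving device one AU at a time, which is why the AU size and disk size need to be known before any low-level scan starts.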
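This time we were fairly lucky: the lost disk group was only a business-data disk group, holding just 19 tablespaces and 10 partitioned tables. With the data dictionary intact, we only had to recover the data of those 10 partitioned tables (6,443 partitions in total). The overall result of the recovery was as follows: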
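By total data volume, the recovery rate is 6003.26953/6027.26935*100% = 99.6018127%. For a case in which a LUN that had already taken in most of its share of the rebalanced data was lost, recovering this much is a very satisfying result overall.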
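The recovery rate above is a plain ratio of recovered volume to total volume. As a small check, the following Python sketch recomputes it; the function name `recovery_ratio` is an illustrative assumption, and the two figures are used in whatever unit the original measurement was taken, since the source does not state it.

```python
def recovery_ratio(recovered, total):
    """Percentage of the total data volume that was recovered."""
    if total <= 0:
        raise ValueError("total volume must be positive")
    return recovered / total * 100.0


# Figures quoted in the text above (unit not stated in the source).
ratio = recovery_ratio(6003.26953, 6027.26935)
lost = 6027.26935 - 6003.26953

print(f"recovered: {ratio:.4f}%")
print(f"unrecovered volume: {lost:.5f}")
```

The difference between the two figures, roughly 24 units out of about 6,027, is the data that could not be brought back.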
If you run into a situation like this and cannot resolve it, please contact us; we provide professional Oracle database recovery support.
Original article: ORA-15042: ASM disk "N" is missing from group number "M" failure recovery