I. Background

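Because RAC ASM is so highly encapsulated, it is hard to see how it works internally. Once ASM runs into a problem, it is usually difficult to handle; even with a complete RMAN backup, recovery can take a long time.

The most fragile part of ASM is the ASM disk header. If a disk header is logically damaged (corrupted), the whole disk group cannot be mounted, and any database that depends on the ASM instance cannot start up.

In RAC, the ASM disk header is particularly prone to problems after nodes are added or removed. Rebuilding the whole diskgroup because of a bad disk header and then restoring with RMAN wastes a lot of time, so the best approach is to back up the ASM disk headers regularly with dd.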

Oracle ASM series summary

http://blog.csdn.net/tianlesoftware/article/details/6364422

Oracle ASM in detail

http://blog.csdn.net/tianlesoftware/article/details/5314541

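From the official documentation:

Introduction to Automatic Storage Management (ASM)

1.1 What is ASM (Automatic Storage Management)

ASM is software that acts as a volume manager and file system for Oracle database files. It manages disks through an ASM instance, which is much like an Oracle database instance: it also consists of an SGA and background processes. Because an ASM instance has relatively few tasks, its SGA is relatively small.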

1.1.1 ASM instance

An ASM instance maintains the following ASM metadata:

（1）The disks that belong to a disk group

（2）The amount of space that is available in a disk group

（3）The filenames of the files in a disk group

（4）The location of disk group datafiles

（5）A redo log that records information about atomically changing data blocks

The ASM instance supports database instances at the file-layout level by maintaining this ASM metadata. One ASM instance can serve multiple database instances.

More precisely, ASM metadata falls into three categories:

（1）diskgroup metadata: files with NUMBER_KFFIL < 256 are ASM metadata and ASM log files. These files have high redundancy (3 copies) and block size = 4 KB.

1）ASM log files are used for ASM instance and crash recovery when a crash happens during metadata operations (see COD and ACD below)

2）at diskgroup creation, 6 files with metadata are visible from x$kffil

（2）disk metadata: disk headers (typically the first 2 AUs of each disk) are not listed in x$kffil (they appear as file number 0 in x$kfdat). They contain disk membership information. This part of the disk has to be 'zeroed out' before the disk can be added to an ASM diskgroup as a new disk.

（3）file metadata: 3 mirrored extents with file metadata, visible from x$kffxp and x$kfdat

1.1.2 ASM Disk groups

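A disk group consists of multiple disks; each disk is one of our partitions. The metadata held in an ASM disk group is exactly the information that the ASM instance manages.

In most cases only a few disk groups are needed: usually two, rarely three.

To protect the data on the disks, Oracle offers three redundancy levels for disk groups:

（1）external redundancy: Oracle does not manage mirroring for you; it is provided by the external storage system, for example through RAID.
（2）normal redundancy (the default): Oracle provides 2-way mirroring to protect the data.

（3）high redundancy: Oracle provides 3-way mirroring to protect the data.

ASM's own redundancy is implemented through ASM failure groups. The ASM mirroring algorithm does not mirror whole disks; it mirrors at the extent level. It is therefore unwise to use disks of different capacities in the various failure groups, because Oracle may run into problems when it allocates the next extent.

In normal redundancy mode, every extent ASM allocates has a primary copy and a secondary copy. The ASM algorithm guarantees that the secondary copy and the primary copy are always in different failure groups; that is the point of failure groups. With this algorithm, ASM ensures that the data stays intact even if every disk in one failure group is lost.

When Oracle allocates extents, the extents that hold the same data across the failure groups are called an extent set. When Oracle writes to a file, the primary copy may be in any failure group and the secondary copy is in a different one. When Oracle reads, it reads from the primary copy unless that copy is unavailable. Because primary copies are placed without a fixed order while reads always go to the primary, reads end up spread across multiple disks.

Disks that share a hardware module are likely to fail together, so when designing failure groups we should usually put the disks from one tray of a large array into the same failure group. We can then pull a tray, let that failure group fail, and swap in a new tray and disks, which is the same idea as RAID.

The redundancy level is specified when a disk group is created and cannot be changed afterwards. To go from normal redundancy to high redundancy, the only way is to create a new disk group and move the files from the old disk group into it with RMAN or DBMS_FILE_TRANSFER.

If no failure groups are specified explicitly when a disk group is created, failure groups are still created. In that case each disk belongs to its own failure group, which is created by default along with the disk group and named after the disk.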

The post on my blog:

Oracle ASM-related views (V$) and data dictionary (X$)

http://blog.csdn.net/tianlesoftware/article/details/6733039

covers some ASM-related views. Disk group information can be queried through v$asm_diskgroup.

SYS@+ASM2(rac2)> desc v$asm_diskgroup

Name Null? Type

------------------------------------------------- ----------------------------

GROUP_NUMBER NUMBER

NAME VARCHAR2(30)

SECTOR_SIZE NUMBER

BLOCK_SIZE NUMBER

ALLOCATION_UNIT_SIZE NUMBER

STATE VARCHAR2(11)

TYPE VARCHAR2(6)

TOTAL_MB NUMBER

FREE_MB NUMBER

REQUIRED_MIRROR_FREE_MB NUMBER

USABLE_FILE_MB NUMBER

OFFLINE_DISKS NUMBER

UNBALANCED VARCHAR2(1)

COMPATIBILITY VARCHAR2(60)

DATABASE_COMPATIBILITY VARCHAR2(60)

SYS@+ASM2(rac2)> select group_number,name,allocation_unit_size,total_mb from v$asm_diskgroup;

GROUP_NUMBER NAME ALLOCATION_UNIT_SIZE TOTAL_MB

------------ -------------------------------------------------- ----------

1 DATA 1048576 11993

2 FRA 1048576 7993

Here we have two disk groups, each with an AU size of 1 MB, which is the default. AUs are covered in the next section.

1.1.3 ASM Disks

ASM disks make up a disk group; at the OS level each disk corresponds to a partition. Each ASM disk is divided into AUs, and each file extent consists of one or more AUs.

Allocation Units

Every ASM disk is divided into allocation units (AU). An AU is the fundamental unit of allocation within a disk group. A file extent consists of one or more AU. An ASM file consists of one or more file extents.

When you create a disk group, you can set the ASM AU size to be between 1 MB and 64 MB in powers of two, such as 1, 2, 4, 8, 16, 32, or 64. Larger AU sizes typically provide performance advantages for data warehouse applications that use large sequential reads.

The default AU size is 1 MB. It can be seen in the v$asm_diskgroup view shown in the previous section, and also through the hidden parameter that sets the AU size: _asm_ausize

For how to query this parameter, see:

Oracle ASM-related views (V$) and data dictionary (X$)

http://blog.csdn.net/tianlesoftware/article/details/6733039

(section 8 of that post).

Disk information can also be viewed through the v$asm_disk view:

SYS@+ASM2(rac2)> desc v$asm_disk

Name Null? Type

------------------------------------------------- ----------------------------

GROUP_NUMBER NUMBER

DISK_NUMBER NUMBER

COMPOUND_INDEX NUMBER

INCARNATION NUMBER

MOUNT_STATUS VARCHAR2(7)

HEADER_STATUS VARCHAR2(12)

MODE_STATUS VARCHAR2(7)

STATE VARCHAR2(8)

REDUNDANCY VARCHAR2(7)

LIBRARY VARCHAR2(64)

TOTAL_MB NUMBER

FREE_MB NUMBER

NAME VARCHAR2(30)

FAILGROUP VARCHAR2(30)

LABEL VARCHAR2(31)

PATH VARCHAR2(256)

UDID VARCHAR2(64)

PRODUCT VARCHAR2(32)

CREATE_DATE DATE

MOUNT_DATE DATE

REPAIR_TIMER NUMBER

READS NUMBER

WRITES NUMBER

READ_ERRS NUMBER

WRITE_ERRS NUMBER

READ_TIME NUMBER

WRITE_TIME NUMBER

BYTES_READ NUMBER

BYTES_WRITTEN NUMBER

SYS@+ASM2(rac2)> select group_number,disk_number,name,path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER NAME PATH

------------ ----------------------------------------- ------------------------

1 0 DATA /dev/mapper/datap1

2 0 FRA_0000 /dev/mapper/frap1

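1.1.4 ASM file naming rules

The format of an ASM file name is fixed: +group/dbname/file type/tag.file.incarnation

The tablespaces created automatically with the database (system, undotbs, sysaux, users) map to real data files that use the default ASM naming format, and those names are written to the control file. Aliases make things much more convenient. To make these automatically created tablespaces use aliases, we must create the aliases manually and also recreate the control file, specifying the alias names in the datafile clause. The database then uses the aliases.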

SYS@anqing2(rac2)> select file_id,file_name,AUTOEXTENSIBLE from dba_data_files order by 1;

FILE_ID FILE_NAME AUTOEXTENS

-------------------------------------------------- ----------

1 +DATA/anqing/datafile/system01.dbf YES

2 +DATA/anqing/datafile/undotbs01.dbf YES

3 +DATA/anqing/datafile/sysaux01.dbf YES

4 +DATA/anqing/datafile/users.273.75154823 YES

5 +DATA/anqing/datafile/undotbs02.dbf YES

6 +DATA/anqing/datafile/system02.dbf YES

7 +DATA/anqing/datafile/dave01.dbf YES

8 +DATA/anqing/datafile/test01.dbf YES

Because I used aliases here, only the users tablespace shows the default ASM name. Let's verify this with ASMCMD:

[oracle@rac2 ~]$ export ORACLE_SID=+ASM2

[oracle@rac2 ~]$ asmcmd

ASMCMD> ls

DATA/

FRA/

ASMCMD> cd DATA

ASMCMD> ls

ANQING/

DAVE/

DB_UNKNOWN/

RAC/

ASMCMD> cd ANQING

ASMCMD> ls

CONTROLFILE/

DATAFILE/

ONLINELOG/

PARAMETERFILE/

TEMPFILE/

ASMCMD> cd DATAFILE

ASMCMD> pwd

+DATA/ANQING/DATAFILE

ASMCMD> ls

DAVE.285.755349075

SYSAUX.275.751548237

SYSTEM.276.751548261

SYSTEM.280.755038499

TEST.286.755567335

UNDOTBS1.274.751548233

UNDOTBS2.281.751559213

USERS.273.751548233

dave01.dbf

sysaux01.dbf

system01.dbf

system02.dbf

test01.dbf

undotbs01.dbf

undotbs02.dbf

ASMCMD>

The output above shows how the aliases correspond to the original names.
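To make the tag.file.incarnation part of the naming rule concrete, here is a minimal shell sketch that splits one of the system-generated names from the listing above. parse_asm_name is a hypothetical helper written for this post, not an Oracle utility, and it only assumes the three-part dotted format described in this section.

```shell
# Split a system-generated ASM file name of the form tag.file.incarnation.
# parse_asm_name is a hypothetical helper, not part of any Oracle tool.
parse_asm_name() {
    name=$1
    incarnation=${name##*.}     # text after the last dot
    rest=${name%.*}             # name without ".incarnation"
    file_number=${rest##*.}     # text after the remaining last dot
    tag=${rest%.*}              # everything before ".file"
    printf 'tag=%s file=%s incarnation=%s\n' "$tag" "$file_number" "$incarnation"
}

# Name taken from the ASMCMD listing above.
parse_asm_name USERS.273.751548233
```

The file number (273 here) is the same number that shows up as FILE_NUMBER in v$asm_alias.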

Connect to the ASM instance and confirm further with a sqlplus query:

[oracle@rac2 ~]$ export ORACLE_SID=+ASM2

[oracle@rac2 ~]$ sqlplus / as sysdba;

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Aug 30 15:31:03 2011

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options

SYS@+ASM2(rac2)> select name,file_number from v$asm_alias order by 2;

NAME FILE_NUMBER

-----------------------------------------------------------

SYSTEM.256.746634087 256

Current.256.746634203 256

SYSAUX.257.746634087 257

group_1.257.746989321 257

UNDOTBS1.258.746634089 258

group_2.258.746989329 258

group_3.259.746989339 259

USERS.259.746634089 259

Current.260.746634201 260

group_4.260.746989347 260

group_1.261.746989315 261

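The above briefly explains what ASM is and how ASM files are allocated. In short:

The ASM instance manages ASM disk groups; a disk group consists of disks; each disk is divided into AUs; and each file extent consists of one or more AUs.

That is the layout at the ASM level. A DB instance only sees the disk group level: when we create a datafile, it is placed in the corresponding ASM disk group, and the balancing and striping inside the disk group are left to the ASM instance.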

1.2 Contents of the ASM disk header

Section 1.1 covered the relationship between ASM disks and ASM. The most fragile part of ASM is the ASM disk header. If a disk header is logically damaged (corrupted), the whole disk group cannot be mounted, and databases that depend on the ASM instance cannot start up.

The contents of an ASM disk header can be inspected with the KFED or BBED command:

Oracle KFED and KFOD tools

http://blog.csdn.net/tianlesoftware/article/details/6729950

Oracle BBED tool

http://blog.csdn.net/tianlesoftware/article/details/5006580

Five practical Oracle bbed examples

http://blog.csdn.net/tianlesoftware/article/details/6684505

Query with the following statement:

SYS@+ASM2(rac2)> select group_kfdat group#, FNUM_KFDAT file#, sum(1) AU_used from x$kfdat where v_kfdat='V' group by group_kfdat, FNUM_KFDAT, v_kfdat;

GROUP# FILE# AU_USED

---------- ---------- ----------

1 0 2 -- this is where our disk header is stored

1 1 2

1 2 1

1 3 85

1 4 2

1 5 1

1 6 1

1 256 522

1 257 602

1 258 337

1 259 8

......

1 286 51

1 1048575 103

2 0 2 -- this is where our disk header is stored

2 1 2

2 2 1

2 3 85

2 4 2

GROUP# FILE# AU_USED

---------- ---------- ----------

2 5 1

2 6 1

2 400 15

2 256 16

2 257 56

......

2 429 8

2 1048575 71

74 rows selected.

SYS@+ASM2(rac2)>

The SQL above shows that every disk group has entries for file# 0 through 6, along with the number of AUs each file occupies. This is the information we care about. These file numbers are described below:

（1）. File#0, AU=0: disk header (disk name, etc), Allocation Table (AT) and Free Space Table (FST)

（2）. File#0, AU=1: Partner Status Table (PST)

（3）. File#1: File Directory (files and their extent pointers)

（4）. File#2: Disk Directory

（5）. File#3: Active Change Directory (ACD). The ACD is analogous to a redo log, where changes to the metadata are logged. Size = 42 MB * number of instances

（6）. File#4: Continuing Operation Directory (COD). The COD is analogous to an undo tablespace. It maintains the state of active ASM operations such as disk or datafile drop/add. The COD log record is either committed or rolled back based on the success of the operation.

（7）. File#5: Template directory

（8）. File#6: Alias directory

（9）. 11g, File#9: Attribute Directory

（10）. 11g, File#12: Staleness registry, created when needed to track offline disks

The related terms are explained below:

（1）. PST - Partner Status Table. Maintains info on disk-to-diskgroup membership.

（2）. COD - Continuing Operation Directory. The COD structure maintains the state of active ASM operations or changes, such as disk or datafile drop/add. The COD log record is either committed or rolled back based on the success of the operation. (source: Oracle whitepaper)

（3）. ACD - Active Change Directory. The ACD is analogous to a redo log, where changes to the metadata are logged. The ACD log record is used to determine the point of recovery in the case of ASM operation failures or instance failures. (source: Oracle whitepaper)

（4）. OSM - Oracle Storage Manager, legacy name, synonymous with ASM

（5）. CSS - Cluster Synchronization Services. Part of Oracle Clusterware, mandatory with ASM even in single instance. CSS is used to heartbeat the health of the ASM instances.

（6）. RBAL - Oracle background process. In an ASM instance, coordinates rebalancing operations. In a DB instance, opens and mounts diskgroups from the local ASM instance.

（7）. ARBx - Oracle background processes. In an ASM instance, slaves for rebalancing operations

（8）. PSPx - Oracle background processes. In an ASM instance, Process Spawners

（9）. GMON - Oracle background process. In an ASM instance, diskgroup monitor.

（10）. ASMB - Oracle background process. In a DB instance, keeps a (bequeath) persistent DB connection to the local ASM instance. Provides heartbeat and ASM statistics. During a diskgroup rebalancing operation ASM communicates AU changes to the DB via this connection.

（11）. O00x - Oracle background processes. Slaves used to connect from the DB to the ASM instance for 'short operations'.

The KFED command can be used to view the disk header in detail. My earlier blog post on KFED has an example:

Oracle KFED and KFOD tools

http://blog.csdn.net/tianlesoftware/article/details/6729950

An excerpt:

[oracle@rac2 ~]$ kfed read /dev/mapper/datap1

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0

kfbh.check: 1508168608 ; 0x00c: 0x59e4d3a0

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr: ORCLDISKDATA ; 0x000: length=12

--> Disk volume name

kfdhdb.driver.reserved[0]: 1096040772 ; 0x008: 0x41544144

kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000

kfdhdb.compat: 168820736 ; 0x020: 0x0a100000

kfdhdb.dsknum: 0 ; 0x024: 0x0000

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

--> This indicates the redundancy of the group. Check TYPE in the query output.

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

--> This indicates the disk header status. Here it indicates the disk is a member of a group.

kfdhdb.dskname: DATA ; 0x028: length=4

--> This indicates the disk name.

kfdhdb.grpname: DATA ; 0x048: length=4

--> This indicates the group name for the disk.

kfdhdb.fgname: DATA ; 0x068: length=4

--> This indicates the failure group name.

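To view the disk header with BBED, see:

Viewing ASM disk header contents with Oracle BBED

http://blog.csdn.net/tianlesoftware/article/details/6739369

II. Backing up and restoring ASM disk headers with dd

2.1 How many bytes does a dd backup need?

With KFED we can see the last field of the header:

kfdhdb.acdb.ub2spare: 0 ; 0x1de: 0x0000

Here 0x1de is 478 in decimal, so the last field of the disk header sits at byte offset 478 and the header data occupies only about 480 bytes. When backing up the disk header with dd, however, we need to copy 4096 bytes (4 KB). Why 4 KB?

This is governed by the hidden parameter _asm_blksize=4096, which is the size of one ASM metadata block.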
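The arithmetic behind these numbers can be checked directly in the shell. This is only a small sketch: the 2-byte width of the last field is an assumption based on the "ub2" in its name, and the 1 MB AU size is the default seen earlier in v$asm_diskgroup.

```shell
# Offset of the last field reported by kfed (kfdhdb.acdb.ub2spare).
last_field_offset=$((0x1de))

# Assumption: "ub2" means an unsigned 2-byte field, so the header data
# ends two bytes after that offset.
header_bytes=$((last_field_offset + 2))

# ASM metadata block size (_asm_blksize) and the default 1 MB AU size.
blksize=4096
ausize=1048576

echo "last field offset : $last_field_offset"
echo "header data bytes : $header_bytes"
echo "fits in one block : $((header_bytes <= blksize))"
echo "blocks per AU     : $((ausize / blksize))"
```

The header data fits comfortably in one 4096-byte block, which is why bs=4096 count=1 is enough for dd.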

The value of this hidden parameter can be checked through the all_parameters view:

Oracle all_parameters view

http://blog.csdn.net/tianlesoftware/article/details/6641281

SYS@anqing2(rac2)> select name,value from all_parameters where name='_asm_blksize';

NAME VALUE

------------------------------------------------

_asm_blksize 4096

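Some status bits and checksum bits in the disk header change over time, but the basic information is fixed, and this fixed information is enough to perform a recovery.

One more important point: a dd backup is best taken with the instances shut down, whereas kfed can be used online.

2.2 How to clear an ASM disk

Sometimes a fault leaves an ASM disk in a state where it can neither be dropped nor added. Typically the disk's header status is wrong, but the disk header still holds some diskgroup information. In that case we need to clear the disk's header and then add the disk to the diskgroup again.

The commands are as follows (back up the header first, then zero it):

dd if= of= bs=4096 count=1
dd if=/dev/zero of= bs=4096 count=1

A word of warning: use this command with great care.

2.3 Taking the dd backup

SYS@anqing2(rac2)> select name,path from v$asm_disk;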

NAME PATH

-------------------------------------------------------------------------------

DATA /dev/mapper/datap1

FRA_0000 /dev/mapper/frap1

[oracle@rac2 ~]$ dd if=/dev/mapper/datap1 of=/u01/datap1header bs=4096 count=1

1+0 records in

1+0 records out

4096 bytes (4.1 kB) copied, 0.000122762 seconds, 33.4 MB/s

[oracle@rac2 ~]$ dd if=/dev/mapper/frap1 of=/u01/fraheader bs=4096 count=1;

1+0 records in

1+0 records out

4096 bytes (4.1 kB) copied, 0.000325073 seconds, 12.6 MB/s
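For a regular backup job it helps to wrap the dd call so that an existing backup is never overwritten and the copy is verified. The following is a minimal sketch under those assumptions; backup_header is a hypothetical helper name, and the paths in the usage comment are the ones from this post.

```shell
# Copy the first 4096-byte block of a disk to a backup file and verify it.
# backup_header is a hypothetical helper, not an Oracle tool.
backup_header() {
    src=$1
    dst=$2
    # Never overwrite an existing backup: it may be the only good copy.
    if [ -e "$dst" ]; then
        echo "backup_header: $dst already exists" >&2
        return 1
    fi
    dd if="$src" of="$dst" bs=4096 count=1 2>/dev/null || return 1
    # Re-read the source block and compare it byte for byte with the backup.
    dd if="$src" bs=4096 count=1 2>/dev/null | cmp -s - "$dst"
}

# Usage with the devices from this post (run as a user who can read them):
#   backup_header /dev/mapper/datap1 /u01/datap1header
#   backup_header /dev/mapper/frap1  /u01/fraheader
```

Including a date in the backup file name would keep several generations around, which is useful because some header fields change over time.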

2.4 Stopping the ASM instances

SYS@anqing2(rac2)> select name,state,type from v$asm_diskgroup;

NAME STATE TYPE

--------------- ----------- ------

DATA CONNECTED EXTERN

FRA CONNECTED EXTERN

[oracle@rac2 u01]$ sh crs_stat.sh

Name Target State Host

------------------------------ ------------------- -------

ora.anqing.anqing1.inst ONLINE ONLINE rac1

ora.anqing.anqing2.inst ONLINE ONLINE rac2

ora.anqing.db ONLINE ONLINE rac1

ora.rac1.ASM1.asm ONLINE ONLINE rac1

ora.rac1.LISTENER_RAC1.lsnr ONLINE ONLINE rac1

ora.rac1.gsd ONLINE ONLINE rac1

ora.rac1.ons ONLINE ONLINE rac1

ora.rac1.vip ONLINE ONLINE rac1

ora.rac2.ASM2.asm ONLINE ONLINE rac2

ora.rac2.LISTENER_RAC2.lsnr ONLINE ONLINE rac2

ora.rac2.gsd ONLINE ONLINE rac2

ora.rac2.ons ONLINE ONLINE rac2

ora.rac2.vip ONLINE ONLINE rac2

[oracle@rac2 u01]$ srvctl stop database -d anqing

[oracle@rac2 u01]$ sh crs_stat.sh

Name Target State Host

------------------------------ ------------------- -------

ora.anqing.anqing1.inst OFFLINE OFFLINE

ora.anqing.anqing2.inst OFFLINE OFFLINE

ora.anqing.db OFFLINE OFFLINE

ora.rac1.ASM1.asm ONLINE ONLINE rac1

ora.rac1.LISTENER_RAC1.lsnr ONLINE ONLINE rac1

ora.rac1.gsd ONLINE ONLINE rac1

ora.rac1.ons ONLINE ONLINE rac1

ora.rac1.vip ONLINE ONLINE rac1

ora.rac2.ASM2.asm ONLINE ONLINE rac2

ora.rac2.LISTENER_RAC2.lsnr ONLINE ONLINE rac2

ora.rac2.gsd ONLINE ONLINE rac2

ora.rac2.ons ONLINE ONLINE rac2

ora.rac2.vip ONLINE ONLINE rac2

[oracle@rac2 u01]$ srvctl stop asm -n rac1

[oracle@rac2 u01]$ srvctl stop asm -n rac2

[oracle@rac2 u01]$ sh crs_stat.sh

Name Target State Host

------------------------------ ------------------- -------

ora.anqing.anqing1.inst OFFLINE OFFLINE

ora.anqing.anqing2.inst OFFLINE OFFLINE

ora.anqing.db OFFLINE OFFLINE

ora.rac1.ASM1.asm OFFLINE OFFLINE

ora.rac1.LISTENER_RAC1.lsnr ONLINE ONLINE rac1

ora.rac1.gsd ONLINE ONLINE rac1

ora.rac1.ons ONLINE ONLINE rac1

ora.rac1.vip ONLINE ONLINE rac1

ora.rac2.ASM2.asm OFFLINE OFFLINE

ora.rac2.LISTENER_RAC2.lsnr ONLINE ONLINE rac2

ora.rac2.gsd ONLINE ONLINE rac2

ora.rac2.ons ONLINE ONLINE rac2

ora.rac2.vip ONLINE ONLINE rac2

[oracle@rac2 u01]$

2.5 Simulating a disk header failure

Use the method from section 2.2.

[oracle@rac2 u01]$ dd if=/dev/zero of=/dev/mapper/datap1 bs=4096 count=1

1+0 records in

1+0 records out

4096 bytes (4.1 kB) copied, 0.00558218 seconds, 734 kB/s

2.6 Inspecting the disk header with KFED

[oracle@rac2 u01]$ kfed read /dev/mapper/datap1

kfbh.endian: 0 ; 0x000: 0x00

kfbh.hard: 0 ; 0x001: 0x00

kfbh.type: 0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt: 0 ; 0x003: 0x00

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0

kfbh.check: 0 ; 0x00c: 0x00000000

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000
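When kfed is not at hand, the same condition (a header block that is entirely zero) can be detected with standard tools. This is a rough sketch; header_is_zeroed is a hypothetical helper, and it only tells you the block is blank, not why.

```shell
# Return 0 if the first 4096 bytes of "$1" are all zero bytes,
# 1 if any byte is non-zero, 2 if a full block could not be read.
# header_is_zeroed is a hypothetical helper, not an Oracle tool.
header_is_zeroed() {
    total=$(( $(dd if="$1" bs=4096 count=1 2>/dev/null | wc -c) ))
    [ "$total" -eq 4096 ] || return 2
    # Delete every NUL byte and count what is left.
    nonzero=$(( $(dd if="$1" bs=4096 count=1 2>/dev/null | tr -d '\000' | wc -c) ))
    [ "$nonzero" -eq 0 ]
}

# Usage with the device from this post:
#   header_is_zeroed /dev/mapper/datap1 && echo "header block is blank"
```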

2.7 Starting the ASM instance

[oracle@rac2 u01]$ export ORACLE_SID=+ASM2

[oracle@rac2 u01]$ sqlplus / as sysdba;

SQL*Plus: Release 10.2.0.4.0 - Production on Thu Sep 1 16:35:06 2011

Connected to an idle instance.

SQL> startup

ASM instance started

Total System Global Area 92274688 bytes

Fixed Size 1265960 bytes

Variable Size 65842904 bytes

ASM Cache 25165824 bytes

ORA-15032: not all alterations performed

ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

The error shows that the DATA diskgroup cannot be mounted, although the ASM instance itself has started.

2.8 Restoring from the earlier backup

[oracle@rac2 u01]$ dd if=/u01/datap1header of=/dev/mapper/datap1 bs=4096 count=1

1+0 records in

1+0 records out

4096 bytes (4.1 kB) copied, 0.00666105 seconds, 615 kB/s
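Writing to the wrong device or from a truncated backup file makes things worse, so a restore is worth guarding. The sketch below is one possible wrapper, not the only way to do it; restore_header is a hypothetical name. conv=notrunc has no effect on a block device but keeps the rest of the target intact if the target is a regular file such as a disk image.

```shell
# Write a 4096-byte header backup over the first block of a target and
# verify the result. restore_header is a hypothetical helper.
restore_header() {
    backup=$1
    target=$2
    [ -f "$backup" ] || { echo "restore_header: no such backup: $backup" >&2; return 1; }
    # Refuse backups that are not exactly one 4096-byte block.
    size=$(( $(wc -c < "$backup") ))
    if [ "$size" -ne 4096 ]; then
        echo "restore_header: $backup is $size bytes, expected 4096" >&2
        return 1
    fi
    dd if="$backup" of="$target" bs=4096 count=1 conv=notrunc 2>/dev/null || return 1
    # Read the block back and compare it with the backup.
    dd if="$target" bs=4096 count=1 2>/dev/null | cmp -s - "$backup"
}

# Usage with the paths from this post:
#   restore_header /u01/datap1header /dev/mapper/datap1
```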

2.9 Verifying the disk header with KFED

[oracle@rac2 u01]$ kfed read /dev/mapper/datap1

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002:KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8NUMB=0x0

kfbh.check: 1508168608 ; 0x00c:0x59e4d3a0

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

....

The header is back to normal.

2.10 Mounting the DATA diskgroup

SYS@+ASM2(rac2)> select name,state,type from v$asm_diskgroup;

NAME STATE TYPE

------------------------------ -----------------

DATA DISMOUNTED

FRA MOUNTED EXTERN

SYS@+ASM2(rac2)> alter diskgroup DATA mount;

Diskgroup altered.

SYS@+ASM2(rac2)> select name,state,type from v$asm_diskgroup;

NAME STATE TYPE

------------------------------ -----------------

DATA MOUNTED EXTERN

FRA MOUNTED EXTERN

The mount succeeded, and RAC can now start normally.

[oracle@rac2 u01]$ sh crs_stat.sh

Name Target State Host

------------------------------ ------------------- -------

ora.anqing.anqing1.inst ONLINE ONLINE rac1

ora.anqing.anqing2.inst ONLINE ONLINE rac2

ora.anqing.db ONLINE ONLINE rac2

ora.rac1.ASM1.asm ONLINE ONLINE rac1

ora.rac1.LISTENER_RAC1.lsnr ONLINE ONLINE rac1

ora.rac1.gsd ONLINE ONLINE rac1

ora.rac1.ons ONLINE ONLINE rac1

ora.rac1.vip ONLINE ONLINE rac1

ora.rac2.ASM2.asm ONLINE ONLINE rac2

ora.rac2.LISTENER_RAC2.lsnr ONLINE ONLINE rac2

ora.rac2.gsd ONLINE ONLINE rac2

ora.rac2.ons ONLINE ONLINE rac2

ora.rac2.vip ONLINE ONLINE rac2

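III. Backup and recovery with KFED

This works like dd: export the ASM disk header first, then import it back. One thing to watch: between the export and the import, the disk header information may have changed, so check the following fields before importing.

For example: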

kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0000 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0000 ; 0x068: length=9
kfdhdb.crestmp.hi: 32937833 ; 0x0a8: HOUR=0x9 DAYS=0x1b MNTH=0x5 YEAR=0x7da
kfdhdb.mntstmp.hi: 32937834 ; 0x0b0: HOUR=0xa DAYS=0x1b MNTH=0x5 YEAR=0x7da
kfdhdb.secsize: 512 ; 0x0b8: 0x0200
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize: 51200 ; 0x0c4: 0x0000c800
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32937833 ; 0x0e4: HOUR=0x9 DAYS=0x1b MNTH=0x5 YEAR=0x7da
kfdhdb.grpstmp.lo: 1704339456 ; 0x0e8: USEC=0x0 MSEC=0x18a SECS=0x19 MINS=0x19

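These fields are explained below:

dsknum: disk number
grptyp: redundancy type of the disk group the disk belongs to (here EXTERNAL REDUNDANCY)
The redundancy types are:
       NORMAL REDUNDANCY - Two-way mirroring, requiring two failure groups.
       HIGH REDUNDANCY - Three-way mirroring, requiring three failure groups.
       EXTERNAL REDUNDANCY - No mirroring for disks that are already protected using hardware mirroring or RAID.
hdrsts: disk header status
dskname: disk name in ASM
grpname: disk group name
fgname: failure group name
crestmp.hi: creation timestamp of the ASM disk (high word)
mntstmp.hi: mount timestamp of the ASM disk (high word)
blksize: disk header block size, 4096
ausize: allocation unit size, 1 MB by default
dsksize: disk size (in allocation units)
f1b1locn: AU number of File Directory block 1 (file 1 block 1)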
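The *.hi and *.lo timestamp fields are packed integers. The bit layout used below is not taken from Oracle documentation; it is inferred from the sample values in this post (it reproduces the HOUR/DAYS/MNTH/YEAR and USEC/MSEC/SECS/MINS breakdown that kfed prints), so treat it as an assumption. decode_kfed_stamp is a hypothetical helper.

```shell
# Decode a kfed timestamp pair (hi, lo) into a readable date.
# Assumed layout, inferred from the kfed output shown in this post:
#   hi: year from bit 14 up, month bits 10-13, day bits 5-9, hour bits 0-4
#   lo: minutes bits 26-31, seconds bits 20-25, milliseconds bits 10-19
decode_kfed_stamp() {
    hi=$1
    lo=$2
    year=$(( hi >> 14 ))
    month=$(( (hi >> 10) & 15 ))
    day=$(( (hi >> 5) & 31 ))
    hour=$(( hi & 31 ))
    min=$(( (lo >> 26) & 63 ))
    sec=$(( (lo >> 20) & 63 ))
    msec=$(( (lo >> 10) & 1023 ))
    printf '%04d-%02d-%02d %02d:%02d:%02d.%03d\n' \
        "$year" "$month" "$day" "$hour" "$min" "$sec" "$msec"
}

# grpstmp.hi / grpstmp.lo from the example above.
decode_kfed_stamp 32937833 1704339456
```

For the grpstmp pair above this prints 2010-05-27 09:25:25.394, matching YEAR=0x7da MNTH=0x5 DAYS=0x1b HOUR=0x9 and MINS=0x19 SECS=0x19 MSEC=0x18a. Comparing decoded timestamps is a quick way to spot a header export that is older than expected.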

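One point to stress: if a disk group has several disks that were all added at the same time, their disk headers are nearly identical. So when one disk header in a disk group is corrupted, it is enough to export the header of another disk in the same group, adjust the disk-specific fields (dsknum, dskname, fgname and so on, as in section 4.1), and import it into the corrupted disk.

3.1 Backing up the ASM disk header with KFED

SYS@anqing2(rac2)> select path from v$asm_disk;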

PATH

--------------------------------------------------------------------------------

/dev/mapper/datap1

/dev/mapper/frap1

[oracle@rac2 u01]$ kfed read /dev/mapper/datap1 text=/u01/datap1disker

[oracle@rac2 u01]$ ll datap1disker

-rw-r--r-- 1 oracle oinstall 6607 Sep 2 10:48 datap1disker

[oracle@rac2 u01]$ cat datap1disker

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002:KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8NUMB=0x0

kfbh.check: 868534624 ; 0x00c:0x33c4c960

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr: ORCLDISKDATA ; 0x000: length=12

kfdhdb.driver.reserved[0]: 1096040772 ; 0x008: 0x41544144

kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000

kfdhdb.compat: 168820736 ; 0x020: 0x0a100000

kfdhdb.dsknum: 0 ; 0x024: 0x0000

kfdhdb.grptyp: 1 ; 0x026:KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027:KFDHDR_MEMBER

kfdhdb.dskname: DATA ; 0x028: length=4

kfdhdb.grpname: DATA ; 0x048: length=4

kfdhdb.fgname: DATA ; 0x068: length=4

kfdhdb.capname: ; 0x088: length=0

kfdhdb.crestmp.hi: 32952076 ; 0x0a8: HOUR=0xc DAYS=0x18 MNTH=0x3 YEAR=0x7db

kfdhdb.crestmp.lo: 3374491648 ; 0x0ac: USEC=0x0 MSEC=0xaa SECS=0x12 MINS=0x32

kfdhdb.mntstmp.hi: 32957488 ; 0x0b0: HOUR=0x10 DAYS=0x1 MNTH=0x9 YEAR=0x7db

kfdhdb.mntstmp.lo: 2804987904 ; 0x0b4: USEC=0x0 MSEC=0x2e SECS=0x33 MINS=0x29

kfdhdb.secsize: 512 ; 0x0b8: 0x0200

kfdhdb.blksize: 4096 ; 0x0ba: 0x1000

kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000

kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80

kfdhdb.dsksize: 11993 ; 0x0c4: 0x00002ed9

kfdhdb.pmcnt: 2 ; 0x0c8: 0x00000002

kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001

kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002

kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002

kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000

kfdhdb.redomirrors[1]: 0 ; 0x0da: 0x0000

kfdhdb.redomirrors[2]: 0 ; 0x0dc: 0x0000

kfdhdb.redomirrors[3]: 0 ; 0x0de: 0x0000

kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000

kfdhdb.grpstmp.hi: 32952076 ; 0x0e4: HOUR=0xc DAYS=0x18 MNTH=0x3 YEAR=0x7db

kfdhdb.grpstmp.lo: 3374396416 ; 0x0e8: USEC=0x0 MSEC=0x4d SECS=0x12 MINS=0x32

.....

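3.2 Zeroing the ASM disk header

The first 4 KB of the disk header is zeroed because leftover garbage bits would cause the check value to be computed incorrectly; once the block is cleared, a subsequent merge yields a correct checksum. Without clearing, the first 4 KB contains not only the merged header information but also other corrupted data, so the merge produces a wrong checksum. Even if the check value is then patched by hand in hex, the diskgroup still cannot be mounted, and v$asm_disk shows header_status as PROVISIONED (a wrong check value shows up as INCOMPATIBLE). The first 4 KB must be zeroed before the merge for the check value to come out right.

[oracle@rac2 u01]$ dd if=/dev/zero of=/dev/mapper/datap1 bs=4096 count=1

1+0 records in

1+0 records out

4096 bytes (4.1 kB) copied, 0.000133662 seconds, 30.6 MB/s

Note:

I ran this dd while the database was open. Let's look at the state at this point.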

SYS@anqing2(rac2)> select name,state,offline_disks from v$asm_diskgroup;

NAME STATE OFFLINE_DISKS

------------------------------ ----------- -------------

DATA CONNECTED 0

FRA CONNECTED 0

SYS@anqing2(rac2)> select mount_status,header_status,state,path from v$asm_disk;

MOUNT_S HEADER_STATU STATE PATH

------- ------------ ----------------------------------------------------------

OPENED UNKNOWN NORMAL /dev/mapper/datap1

OPENED UNKNOWN NORMAL /dev/mapper/frap1

Run a transaction:

SYS@anqing2(rac2)> create table d1 as select * from all_objects;

Table created.

SYS@anqing2(rac2)> select count(*) from d1;

COUNT(*)

----------

49868

The transaction works normally too.

Now restart the ASM instance.

SYS@+ASM2(rac2)> shutdown immediate

ASM diskgroups dismounted

ASM instance shutdown

SYS@+ASM2(rac2)> startup

ASM instance started

Total System Global Area 92274688 bytes

Fixed Size 1265960 bytes

Variable Size 65842904 bytes

ASM Cache 25165824 bytes

ORA-15032: not all alterations performed

ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

After the restart, the damage done by dd takes effect.

3.3 Restoring with KFED merge

As mentioned earlier, before restoring with merge you should check the exported content, because it may have changed.

Here I simply merge it back with KFED and then mount the DATA disk group.

[oracle@rac2 u01]$ kfed merge /dev/mapper/datap1 text=/u01/datap1disker

SYS@+ASM2(rac2)> select name,state from v$asm_diskgroup;

NAME STATE

------------------------------ -----------

DATA DISMOUNTED

FRA MOUNTED

SYS@+ASM2(rac2)> alter diskgroup DATA mount;

Diskgroup altered.

SYS@+ASM2(rac2)> select name,state from v$asm_diskgroup;

NAME STATE

------------------------------ -----------

DATA MOUNTED

FRA MOUNTED

Mounted successfully.

IV. Rebuilding the ASM disk header

Oracle official document:

Creating a New ASM Disk Header After Existing One Is Corrupted

http://blog.csdn.net/tianlesoftware/article/details/6740716

Oracle's ASM is fragile in this area. If we have no backup of the disk header, or kfed merge also fails, there is only one move left: rebuilding the disk header. Note that a rebuild does not succeed in every case. If it fails, the only remaining option is to recreate the diskgroup and restore the whole database from backup.

Oracle 11g introduced the AMDU tool, which can also be used with 10g. See MOS note [ID 553639.1] for details.

       AMDU is a tool introduced in 11g where it is possible to extract all the available metadata from one or more ASM disks, generate formatted block printouts from the dump output, extract one or more files from a diskgroup (mounted/unmounted) and write them to the OS file system.
       This tool is very important when dealing with internal errors related to the ASM metadata.
       Although this tool was released with 11g, it can be used with ASM 10g.

In 11gR2, the asmcmd md_backup and md_restore commands can also be used for backups. For usage of these commands, see eygle's blog.

Querying the x$kfdat dictionary shows the number of AUs for each file#, as follows:

SYS@+ASM2(rac2)> select group_kfdat group#, FNUM_KFDAT file#, sum(1) AU_used from x$kfdat where v_kfdat='V' group by group_kfdat, FNUM_KFDAT, v_kfdat;

GROUP# FILE# AU_USED

---------- ---------- ----------

1 0 2

1 1 2

1 2 1

1 3 85

1 4 2

1 5 1

1 6 1

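When rebuilding a disk header, the items to focus on are the file directory and the disk directory.

（1）. File#0, AU=0: disk header (disk name, etc), Allocation Table (AT) and Free Space Table (FST)

（2）. File#0, AU=1: Partner Status Table (PST)

（3）. File#1: File Directory (files and their extent pointers)

（4）. File#2: Disk Directory

Points to note:

1. The KFED tool must be version 10.2.0.2 or later; otherwise you will hit a bug.

2. The approach to rebuilding a disk header is as follows:

1). Locate the file directory, then use the file directory to locate the disk directory;
2). Use the disk directory to find the disk information, hand-edit the disk header text file, and finally kfed merge it into the target disk to generate the disk header.

3). The file directory is usually at AU 2 of one of the disks in the diskgroup. If disks have been dropped from or added to the diskgroup, the file directory is not necessarily at AU 2 and has to be searched for manually.

4.1 The official example

For this test we have 3 ASM disks in an external redundancy diskgroup. For the test we will wipe out the header for ASM disk 3 (data03):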

/ocfs02/asm/data01

/ocfs02/asm/data02

/ocfs02/asm/data03

The test diskgroup has three disks; the experiment destroys the disk header of data03.

1. Make sure all ASM instances are shut down.

-- Shut down all ASM instances

2. Make a back up of the first 4k of the bad disk with dd:

dd if= of= bs=4096 count=1

Back up the damaged disk header.

3. Check existing disks and see which one has "file 1 block 1":

To find the disk with f1b1 run:

kfed read | grep f1b1

Search for the f1b1 (file 1 block 1) field.

Example:

$ kfed read /ocfs02/asm/data01 | grep f1b1

kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002

$ kfed read /ocfs02/asm/data02 | grep f1b1

kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000

Since data01 has a non-zero value, data01 is the disk with "file 1 block 1".

-- Note the value: a non-zero value means this disk holds file 1 block 1.

Confirm this by checking the following to see if you see "KFBTYP_LISTHEAD" in the 2nd allocation unit:

This can be verified by reading allocation unit 2.

kfed read aunum=2 | grep kfbh.type

Also specify the ausize with AUSZ=# if using a non default allocation unit size.

If a non-default AU size is in use, specify the AU size as well.

Example:

$ kfed read /ocfs02/asm/data01 aunum=2 |grep kfbh.type

kfbh.type: 5 ; 0x002: KFBTYP_LISTHEAD

If the lost disk is the "file 1 block 1" disk then scan every AU of the bad disk till you find a header which claims to be FILE_DIRECTORY (KFBTYP_FILEDIR).

If the damaged disk is the one that held f1b1, every AU of that disk has to be scanned until the file directory is found.

Once you find that you can set f1b1locn to that AU number and continue. If the file directory cannot be found anywhere then we have no choice but to re-create the diskgroup and restore from a backup.

Once the file directory is found, set f1b1locn to that AU number. If the file directory cannot be found, the only option is to recreate the diskgroup and restore from backup.
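With more than a handful of disks, checking f1b1locn by eye gets tedious. The sketch below pulls a single field out of kfed text output so the check can be looped over disks. It assumes the "name: value ; offset: hex" line format shown in this post; kfed_field and holds_f1b1 are hypothetical helper names.

```shell
# Print the decimal value of one field from `kfed read` text on stdin.
# Assumes lines of the form "field.name: value ; 0xNNN: ...".
kfed_field() {
    awk -v f="$1:" '$1 == f { print $2; exit }'
}

# Succeed if the kfed output on stdin reports a non-zero f1b1locn,
# i.e. this disk holds file 1 block 1.
holds_f1b1() {
    loc=$(kfed_field kfdhdb.f1b1locn)
    [ -n "$loc" ] && [ "$loc" -ne 0 ]
}

# Usage with the disks from this example:
#   for d in /ocfs02/asm/data01 /ocfs02/asm/data02; do
#       kfed read "$d" | holds_f1b1 && echo "$d holds f1b1"
#   done
```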

4. Make a copy of a good disk header with kfed that IS NOT the disk that contains f1b1 and is in the SAME diskgroup as the bad disk.

Copy a disk header from a disk that does not contain f1b1. In the test above, f1b1 is on data01.

In our example this is data02:

kfed read > fix.txt

Example:

$ kfed read /ocfs02/asm/data02 > fix.txt

5. Edit the fix.txt and change the following fields to the proper values (use the ASM alert log for reference):

       kfdhdb.dsknum
       kfdhdb.dskname
       kfdhdb.fgname

Modify the relevant field values.

Example:

Check the alert log for proper names:

NOTE: cache opening disk 0 of grp 1:DATA_0000 path:/ocfs02/asm/data01

NOTE: cache opening disk 1 of grp 1:DATA_0001 path:/ocfs02/asm/data02

NOTE: cache opening disk 2 of grp 1:DATA_0002 path:/ocfs02/asm/data03

Old values from fix.txt:

kfdhdb.dsknum: 1 ; 0x024: 0x0001

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: DATA_0001 ; 0x028: length=9

kfdhdb.grpname: DATA ; 0x048: length=4

kfdhdb.fgname: DATA_0001 ; 0x068: length=9

New values from fix.txt:

kfdhdb.dsknum: 2 ; 0x024: 0x0002

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: DATA_0002 ; 0x028: length=9

kfdhdb.grpname: DATA ; 0x048: length=4

kfdhdb.fgname: DATA_0002 ; 0x068: length=9
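The same edit can be scripted instead of done by hand. This is only a sketch: set_header_identity is a hypothetical helper, it assumes the failure group name equals the disk name (as in this example, where no failure groups were named explicitly), it assumes the name contains no characters that are special to sed, and it takes the offsets 0x024, 0x028 and 0x068 from the output shown above. Always review the edited file before merging it.

```shell
# Rewrite the disk number, disk name and failure group name lines in a
# kfed text dump. set_header_identity is a hypothetical helper.
# Assumption: failure group name == disk name.
set_header_identity() {
    file=$1
    num=$2
    name=$3
    hex=$(printf '0x%04x' "$num")   # e.g. 2 -> 0x0002
    len=${#name}                    # value for the length= annotation
    sed -e "s|^kfdhdb\.dsknum:.*|kfdhdb.dsknum: $num ; 0x024: $hex|" \
        -e "s|^kfdhdb\.dskname:.*|kfdhdb.dskname: $name ; 0x028: length=$len|" \
        -e "s|^kfdhdb\.fgname:.*|kfdhdb.fgname: $name ; 0x068: length=$len|" \
        "$file" > "$file.new" && mv "$file.new" "$file"
}

# Usage for the example above (disk 2, DATA_0002):
#   set_header_identity fix.txt 2 DATA_0002
```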

6. Find the disk directory by dumping aunum=2 and blknum=2 for the disk with f1b1:

Use the file directory to locate the disk directory with the following command:

kfed read aunum=2 blknum=2 | more

Example:

$ kfed read /ocfs02/asm/data01 aunum=2 blknum=2 | more

kfffde[0].xptr.au: 2 ; 0x4a0: 0x00000002

kfffde[0].xptr.disk: 2 ; 0x4a4: 0x0002

kfffde[0].xptr.flags: 0 ; 0x4a6: L=0 E=0 D=0 S=0

kfffde[0].xptr.chk: 42 ; 0x4a7: 0x2a

kfffde[1].xptr.au: 4294967295 ; 0x4a8: 0xffffffff

kfffde[1].xptr.disk: 65535 ; 0x4ac: 0xffff

kfffde[1].xptr.flags: 0 ; 0x4ae: L=0 E=0 D=0 S=0

kfffde[1].xptr.chk: 42 ; 0x4af: 0x2a

After the initial file directory header, you will see the extent map. If the diskgroup is external redundancy, then each entry refers to an extent of the file. For normal redundancy, every pair is an extent set; similarly, for high redundancy entries [0], [1] and [2] form the extent set. Here we see the disk directory is at au = 2 in disk number = 2.

In this example, it turned out to be in that location on the second AU, but it is not guaranteed that it will always be there.
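
Since the location is not guaranteed, it helps to read it from the extent map rather than assume it. The sketch below pulls the (disk, au) pairs out of the kfffde lines. Treating an au of 0xffffffff as an unused slot is inferred from the sample output above, not from Oracle documentation, and `extent_pointers` is a hypothetical helper name.

```python
import re

UNUSED_AU = 0xFFFFFFFF  # value shown for the empty slot in the sample output

XPTR = re.compile(r"kfffde\[(\d+)\]\.xptr\.(au|disk):\s*(\d+)")


def extent_pointers(dump):
    """Return (disk, au) pairs for the used kfffde[n].xptr entries, in index order."""
    fields = {}
    for index, key, value in XPTR.findall(dump):
        fields.setdefault(int(index), {})[key] = int(value)
    return [
        (entry["disk"], entry["au"])
        for _, entry in sorted(fields.items())
        if "au" in entry and "disk" in entry and entry["au"] != UNUSED_AU
    ]


sample = """kfffde[0].xptr.au: 2 ; 0x4a0: 0x00000002
kfffde[0].xptr.disk: 2 ; 0x4a4: 0x0002
kfffde[0].xptr.flags: 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk: 42 ; 0x4a7: 0x2a
kfffde[1].xptr.au: 4294967295 ; 0x4a8: 0xffffffff
kfffde[1].xptr.disk: 65535 ; 0x4ac: 0xffff
"""

print(extent_pointers(sample))  # [(2, 2)]
```

With external redundancy each returned pair is one extent of the disk directory. With normal or high redundancy the pairs would have to be grouped into extent sets as described above.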

7. Once the disk directory location is found, find the info for your disk number.

With the location of the disk directory known, the entry for the disk number can be read with the following command:

kfed read DISK_PATH aunum=2 blknum=0 | more

Example:

kfed read /ocfs02/asm/data02 aunum=2 blknum=0 | more

kfbh.type: 6 ; 0x002: KFBTYP_DISKDIR

...

kfddde[0].entry.incarn: 1 ; 0x024: A=1 NUMM=0x0

-- A=1 marks an allocated entry; A=0 means the entry has been deleted.

...

kfddde[2].dsknum: 2 ; 0x3b4: 0x0002

kfddde[2].state: 2 ; 0x3b6: KFDSTA_NORMAL

kfddde[2].ub1spare: 0 ; 0x3b7: 0x00

kfddde[2].dskname: DATA_0002 ; 0x3b8: length=9

kfddde[2].fgname: DATA_0002 ; 0x3d8: length=9

kfddde[2].crestmp.hi: 32885842 ; 0x3f8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7

kfddde[2].crestmp.lo: 3860343808 ; 0x3fc: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39

kfddde[2].failstmp.hi: 0 ; 0x400: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0

kfddde[2].failstmp.lo: 0 ; 0x404: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0

The various kfddde[n] fields refer to the disk directory entries. Only entries whose entry.incarn shows A=1 are allocated entries. You might find entries with dskname populated, but if A=0 then that entry was deleted.
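
To pick the right entry out of a long dump, the kfddde lines can be grouped by index. The following is a rough sketch that assumes every line has the "name: value ; comment" shape shown above. The helper names are hypothetical, and the sample contains only lines that appear in the output above, so the A flag is available for entry 0 only.

```python
import re

LINE = re.compile(r"^\s*kfddde\[(\d+)\]\.([\w.]+):\s*(.*?)\s*;\s*(.*)$")


def directory_entries(dump):
    """Group 'kfddde[n].field: value ; comment' lines by entry index n."""
    entries = {}
    for line in dump.splitlines():
        match = LINE.match(line)
        if match:
            index, field, value, comment = match.groups()
            entries.setdefault(int(index), {})[field] = (value, comment)
    return entries


def is_allocated(entry):
    """True/False from the A= flag of entry.incarn; None when that line is absent."""
    if "entry.incarn" not in entry:
        return None
    _, comment = entry["entry.incarn"]
    return "A=1" in comment.split()


def entry_for_disk(entries, dsknum):
    """Return the entry whose dsknum field equals dsknum, or None."""
    for entry in entries.values():
        if entry.get("dsknum", (None, ""))[0] == str(dsknum):
            return entry
    return None


sample = """kfbh.type: 6 ; 0x002: KFBTYP_DISKDIR
kfddde[0].entry.incarn: 1 ; 0x024: A=1 NUMM=0x0
kfddde[2].dsknum: 2 ; 0x3b4: 0x0002
kfddde[2].state: 2 ; 0x3b6: KFDSTA_NORMAL
kfddde[2].dskname: DATA_0002 ; 0x3b8: length=9
kfddde[2].fgname: DATA_0002 ; 0x3d8: length=9
kfddde[2].crestmp.hi: 32885842 ; 0x3f8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7
kfddde[2].crestmp.lo: 3860343808 ; 0x3fc: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39
"""

entries = directory_entries(sample)
target = entry_for_disk(entries, 2)
print(target["dskname"][0], target["crestmp.hi"][0], target["crestmp.lo"][0])
print(is_allocated(entries[0]))
```

On a full dump, `is_allocated` should be checked for the target entry before its values are trusted, because a deleted entry can still carry a populated dskname.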

8. Now go back to fix.txt and adjust the crestmp.hi and crestmp.lo to match what the disk directory shows. If it is already the same then leave it.

Example:

Before:

kfdhdb.crestmp.hi: 32879468 ; 0x0a8: HOUR=0xc DAYS=0x1b MNTH=0xc YEAR=0x7d6

kfdhdb.crestmp.lo: 296378368 ; 0x0ac: USEC=0x0 MSEC=0x298 SECS=0x1a MINS=0x4

kfdhdb.mntstmp.hi: 32879468 ; 0x0b0: HOUR=0xc DAYS=0x1b MNTH=0xc YEAR=0x7d6

kfdhdb.mntstmp.lo: 309633024 ; 0x0b4: USEC=0x0 MSEC=0x128 SECS=0x27 MINS=0x4

After:

kfdhdb.crestmp.hi: 32885842 ; 0x0a8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7

kfdhdb.crestmp.lo: 3860343808 ; 0x0ac: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39

kfdhdb.mntstmp.hi: 32885842 ; 0x0b0: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7

kfdhdb.mntstmp.lo: 3870944256 ; 0x0b4: USEC=0x0 MSEC=0x27b SECS=0x2b MINS=0x39
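
The hi/lo pairs are packed bit fields, and decoding them makes it easier to confirm that the right timestamp was copied. The layout used below is inferred from the decoded comments in the sample output (it reproduces every HOUR/DAYS/MNTH/YEAR and USEC/MSEC/SECS/MINS value shown above). It is not taken from Oracle documentation, and `decode_stamp` is a hypothetical helper.

```python
from datetime import datetime


def decode_stamp(hi, lo):
    """Decode a kfed hi/lo timestamp pair using the layout inferred from the samples."""
    year = hi >> 14            # remaining high bits
    month = (hi >> 10) & 0xF   # 4 bits
    day = (hi >> 5) & 0x1F     # 5 bits
    hour = hi & 0x1F           # 5 bits
    minute = (lo >> 26) & 0x3F  # 6 bits
    second = (lo >> 20) & 0x3F  # 6 bits
    msec = (lo >> 10) & 0x3FF   # 10 bits
    usec = lo & 0x3FF           # 10 bits
    return datetime(year, month, day, hour, minute, second, msec * 1000 + usec)


# crestmp of the copied header (before) and of the disk directory entry (after)
print(decode_stamp(32879468, 296378368))   # 2006-12-27 12:04:26.664000
print(decode_stamp(32885842, 3860343808))  # 2007-03-02 18:57:33.523000
```

The two results show that the copied header carried the creation time of a different disk, which is why step 8 replaces it with the value from the disk directory.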

9. Do a kfed merge to put the new header into the disk using fix.txt. This merges the edited header over the damaged one:

kfed merge DISK_PATH text=fix.txt

Example:

kfed merge /ocfs02/asm/data03 text=fix.txt

If you are using ASMLIB, at this point you will need to run the following to fix the ASMLIB portion of the header:

       /etc/init.d/oracleasm force-renamedisk /dev/sdbg1
       /etc/init.d/oracleasm scandisks
       /etc/init.d/oracleasm listdisks

10. Start the ASM instance in NOMOUNT mode:

SQL> startup nomount;

11. Check v$asm_disk.header_status to verify that the disk header is in a "MEMBER" state.

Example:

SQL> select path, header_status from v$asm_disk where path like '%data03%';

PATH

--------------------------------------------------------------------------------

HEADER_STATU

------------

/ocfs02/asm/data03

MEMBER

12. Mount the diskgroup:

alter diskgroup DISKGROUP_NAME mount;

If the diskgroup fails to mount at this point, you may want to either consider re-creating the diskgroup and restoring, or engaging BDE to assist.

You may also want to try clearing the first 4k of the disk with dd and then doing a kfed merge again, in case there are any extra characters causing problems (MAKE SURE YOU HAVE A BACKUP OF THE FIRST 4K FIRST).

In other words, if the mount fails, first try clearing the first 4 KB and merging again. If that still fails, the only option left is to re-create the diskgroup and restore the database.

Example:

dd if=DISK_PATH of=BACKUP_FILE bs=4096 count=1
dd if=/dev/zero of=DISK_PATH bs=4096 count=1
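
The backup-first rule can be enforced in code so that the header is never zeroed unless a verified copy exists. This is a minimal sketch, not a replacement for dd: `backup_then_zero` is a hypothetical helper, and the demonstration runs against a scratch file in a temporary directory rather than a real ASM disk.

```python
import os
import tempfile

HEADER_BYTES = 4096  # the first 4 KB, as in the dd commands above


def backup_then_zero(disk_path, backup_path):
    """Save the first 4 KB of disk_path, verify the copy, then zero it in place."""
    with open(disk_path, "rb") as disk:
        header = disk.read(HEADER_BYTES)
    if len(header) != HEADER_BYTES:
        raise ValueError("could not read a full 4 KB header")
    with open(backup_path, "wb") as backup:
        backup.write(header)
        backup.flush()
        os.fsync(backup.fileno())
    with open(backup_path, "rb") as backup:
        if backup.read() != header:
            raise IOError("backup verification failed")
    # "r+b" overwrites in place and does not truncate the file.
    with open(disk_path, "r+b") as disk:
        disk.write(b"\0" * HEADER_BYTES)
        disk.flush()
        os.fsync(disk.fileno())


with tempfile.TemporaryDirectory() as tmp:
    disk = os.path.join(tmp, "scratch_disk")
    backup = os.path.join(tmp, "header.bak")
    original = bytes(range(256)) * 32  # 8 KB of recognisable data
    with open(disk, "wb") as f:
        f.write(original)
    backup_then_zero(disk, backup)
    with open(disk, "rb") as f:
        after = f.read()
    with open(backup, "rb") as f:
        saved = f.read()

print(saved == original[:4096], after[:4096] == bytes(4096), after[4096:] == original[4096:])
```

One caution for the dd form: when the ASM disk is a regular file, as with the /ocfs02/asm paths in this example, dd truncates its output file unless conv=notrunc is given, so that option should be added to the second command. On a block device this does not arise.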

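4.2 Notes

Every diskgroup in my test environment has only one disk, so I could not test this procedure. With a single disk, the header can only be recovered from a backup; it cannot be rebuilt.

To rebuild a header, take the following fields from the file directory (the official example above uses the ASM alert log for reference):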

kfdhdb.dsknum: 0 ; 0x024: 0x0000

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: DATA ; 0x028: length=4

kfdhdb.grpname: DATA ; 0x048: length=4

kfdhdb.fgname: DATA ; 0x068: length=4

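Take the following fields from the disk directory:

kfdhdb.crestmp.hi: 32885842 ; 0x0a8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7

kfdhdb.crestmp.lo: 3860343808 ; 0x0ac: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39

After regenerating the disk header, restore it with kfed merge. For the detailed procedure, follow the steps of the official example above. In short, backups matter more than anything else.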

Reposted from: http://blog.csdn.net/tianlesoftware/article/details/6743677