一次壞塊的處理過程 [轉]

zhouxianwang發表於2013-08-10

最近在一個資料庫遇到了壞塊,以下是處理過程。

一、壞塊的發現及處理


首先是在做RMAN備份的時候,看到alert檔案有壞塊報錯:
.......
Corrupt block relative dba: 0x00925026 (file 2, block 1200166)
Fractured block found during backing up datafile
Data in bad block:
type: 6 format: 2 rdba: 0x00925026
last change scn: 0x0578.15b7ebf7 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x33da0602
check value in block header: 0xf8d0
computed block checksum: 0xd82e
Reread of blocknum=1200166, file=+DATA/dwrac/datafile/dwdata_1m01.dbf. found same corrupt data
Reread of blocknum=1200166, file=+DATA/dwrac/datafile/dwdata_1m01.dbf. found same corrupt data
Reread of blocknum=1200166, file=+DATA/dwrac/datafile/dwdata_1m01.dbf. found same corrupt data
Reread of blocknum=1200166, file=+DATA/dwrac/datafile/dwdata_1m01.dbf. found same corrupt data
Reread of blocknum=1200166, file=+DATA/dwrac/datafile/dwdata_1m01.dbf. found same corrupt data
......

用dbv檢查也證實了這一點。

[oracle@dwdb02 admin]$ dbv file="+DATA/dwrac/datafile/dwdata_1m01.dbf" start=502554 end=1217134 userid=admin/sdoadmin123 blocksize=16384

DBVERIFY: Release 10.2.0.5.0 - Production on Thu Feb 17 14:50:13 2011

Copyright (c) 1982, 2007, Oracle. All rights reserved.

DBVERIFY - Verification starting : FILE = +DATA/dwrac/datafile/dwdata_1m01.dbf
Page 502554 is marked corrupt
Corrupt block relative dba: 0x0087ab1a (file 2, block 502554)
Bad header found during dbv: 
Data in bad block:
type: 11 format: 2 rdba: 0x00800001
last change scn: 0x0000.00000000 seq: 0x1 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x00000b01
check value in block header: 0x85d8
computed block checksum: 0x0
......

DBVERIFY - Verification complete

Total Pages Examined : 714581
Total Pages Processed (Data) : 567369
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 143342
Total Pages Failing (Index): 0
Total Pages Processed (Other): 3866
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 0
Total Pages Marked Corrupt : 4
Total Pages Influx : 0
Highest block SCN : 0 (0.0)

10g以後rman備份發現的壞塊資訊可以在v$database_block_corruption查到。

SQL> select * from v$database_block_corruption;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTION_TYPE
---------- ---------- ---------- ------------------ ---------------
2 1217134 1 0 FRACTURED
508 2386314 1 0 FRACTURED
517 1115970 1 0 FRACTURED
2 1200166 1 0 FRACTURED
51 401698 1 0 FRACTURED
61 306642 1 0 FRACTURED
2 502554 1 0 FRACTURED
2 1213658 1 0 FRACTURED
34 701050 1 0 FRACTURED
60 347166 1 0 FRACTURED

FRACTURED意味著資料塊是物理損壞,遇到這種情況,如果有備份的話最好是從備份恢復,甚至可以用rman做塊級恢復,如
blockrecover datafile 2 block 1217134;
或者:
blockrecover corruption list;

不幸的是,這個資料庫是一個新庫,沒有正式上線,也還沒有上備份。沒辦法,只能看看損壞的物件是什麼了。

--由於庫太大,直接查dba_extents太慢,因此備份dba_extents到表dbextnet再查詢。

SQL> select * from dbextent where file_id=2 and 1213658 between block_id and block_id+blocks-1;

OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE TABLESPACE_NAME EXTENT_ID FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO
------------------------------ -------------------------------------------------------------------------------- ------------------------------ ------------------ ------------------------------ ---------- ---------- ---------- ---------- ---------- ------------
CREATER_USER WIDGET_NEWUSE_IX2 INDEX DWDATA_1M 995 2 1213637 1048576 64 2

SQL> select * from dbextent where file_id=2 and 502554 between block_id and block_id+blocks-1;

OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE TABLESPACE_NAME EXTENT_ID FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO
------------------------------ -------------------------------------------------------------------------------- ------------------------------ ------------------ ------------------------------ ---------- ---------- ---------- ---------- ---------- ------------
CREATER_USER WIDGET_NEWUSE_IX2 INDEX DWDATA_1M 1787 2 502533 1048576 64 2


萬幸的是,這裡損壞的只是索引,把索引rebuild就可以。如果損壞的是表,則可能需要用到其他手段,如dbms_repare包來處理了。

SQL> alter index creater_user.WIDGET_NEWUSE_IX2 rebuild online parallel 16 tablespace dwdata_10m_1 nologging;

Index altered

......

注意:這裡要用rebuild online,否則rebuild可能會讀取原索引作為源來重建新索引,這樣的話新建的索引也是損壞的。

全部處理完畢後,確認是否還有物件損壞。

SQL> select * from dba_extents E,v$database_block_corruption c where e.file_id=c.file# and c.block# between e.block_id and e.block_id+e.blocks-1;

OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE TABLESPACE_NAME EXTENT_ID FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTION_TYPE
------------------------------ -------------------------------------------------------------------------------- ------------------------------ ------------------ ------------------------------ ---------- ---------- ---------- ---------- ---------- ------------ ---------- ---------- ---------- ------------------ ---------------

以上查詢返回結果為空,表示沒有資料庫物件損壞,第一階段處理完畢。

二、後續處理

上面的步驟解決了壞塊對資料庫的潛在影響,但是此時v$database_block_corruption的資訊是還沒有清除的,需要清除,需要再執行RMAN,如果沒有壞塊已經修復,才會從v$database_block_corruption刪除對應記錄。
但是在大部分情況下,即使刪除了原來的物件,資料塊的狀態讓人是標識損壞的,但是不影響一般的資料庫操作,可以忽略。如果你看不順眼或者擔心它影響RMAN的備份,可以用以下方法格式化這個資料塊,只有被正常格式化後,資料塊狀態才會恢復正常。

1. 首先建立一個新表

create table s (n number,c varchar2(4000)) nologging tablespace dwdata_1m;

注意表空間要指定在壞塊所在的表空間上(v$database_block_corruption可查詢到)

2. 確保目標資料塊在free list上,並查詢其大小
SQL> Select BYTES from dba_free_space where file_id=2 and 502554 between block_id and block_id + blocks -1;

BYTES
----------
1048576


3. 手工擴充套件

BEGIN
FOR I IN 1 .. 100000000
LOOP
EXECUTE IMMEDIATE 'alter table s allocate extent(size 1048576 datafile ''+DATA/dwrac/datafile/dwdata_1m01.dbf'')';
END LOOP;
END;

在這一步執行的同時,不斷地查新dba_free_space,直至確認該資料塊已經不在dba_free_space中了,之後就可以把上一步操作的迴圈中斷。

SQL> Select * from dba_free_space where file_id= 2 and 502554 between block_id and block_id + blocks -1;

TABLESPACE_NAME FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO
-------------------------------- ---------- ---------- ---------- ---------- ------------

4. 下一步就是要寫裡寫入資料了

BEGIN
FOR i IN 1..1000000000 LOOP
INSERT /*+ APPEND */ INTO s select i, lpad('REFORMAT',3092, 'R') from dual connect by rownum<=10000;
commit ;
END LOOP;
END;
/

注意:必須要寫入資料,資料塊才能被重新格式化,才能修復塊的狀態。

5. 在寫入資料的同時,不斷用dbv檢測壞塊情況

--指定start和end引數,提高效率
[oracle@dwdb02 admin]$ dbv file="+DATA/dwrac/datafile/dwdata_1m01.dbf" start=502554 end=502554 userid=admin/sdoadmin123 blocksize=16384

DBVERIFY: Release 10.2.0.5.0 - Production on Thu Feb 17 15:56:53 2011

Copyright (c) 1982, 2007, Oracle. All rights reserved.

DBVERIFY - Verification starting : FILE = +DATA/dwrac/datafile/dwdata_1m01.dbf


DBVERIFY - Verification complete

Total Pages Examined : 714581
Total Pages Processed (Data) : 610675
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 100202
Total Pages Failing (Index): 0
Total Pages Processed (Other): 3704
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 0
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 0 (0.0)

如果Total Pages Marked Corrupt=0,則表示資料塊已經修復了。此時就可以取消上一步的資料insert操作。

6. 用rman重新validate一次

RMAN> backup check logical validate datafile 2;

Starting backup at 17-FEB-11
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00508 name=+DATA/dwrac/datafile/dwdata_1m01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:20:45
Finished backup at 17-FEB-11

正常的話,這一步完了以後,v$database_block_corruption裡對應的壞塊記錄就應該被刪除了。

SQL> select * from v$database_block_corruption t where file#=2 and block#=502554;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTION_TYPE
---------- ---------- ---------- ------------------ ---------------


重複上面的步驟直至v$database_block_corruption不包括任何資料。

以上只是壞塊的處理過程,但是壞塊產生的原因仍然不明朗,需進一步挖掘,但是從這些例子也可以看出備份的重要性!

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/27036311/viewspace-768174/,如需轉載,請註明出處,否則將追究法律責任。

相關文章