Oracle資料庫壞塊典型案例分析

龍山游龍發表於2022-11-07

一、 資料庫壞塊背景說明

某次,使用者反饋資料庫後臺日誌出現資料庫壞塊,部分表查詢時報錯。在業務恢復之後,透過梳理處理記錄,做一次總結分析,以下將對該案例的診斷過程進行說明。

二、問題詳細診斷過程

2 .1 資料庫 alter 日誌

Thu Jun 10 08:36:33 2021

OS Pid: 12648460 executed alter system set events '10231 trace name context forever,level 10'

Thu Jun 10 08:37:38 2021

ALTER SYSTEM: Flushing buffer cache

Thu Jun 10 08:37:55 2021

Hex dump of (file 62, block 5521) in trace file /oracle/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_12648460.trc

Corrupt block relative dba: 0x0f801591  (file 62, block 5521)

Bad header found during buffer read

Data in bad block:

 type: 6 format: 2 rdba: 0x2acf8b99

 last change scn: 0x0006.edc07b41 seq: 0x1 flg: 0x06

 spare1: 0x0 spare2: 0x0 spare3: 0x0

 consistency value in tail: 0x7b410601

 check value in block header: 0xdb02

 computed block checksum: 0x0

Reading datafile '+DATA/orcl/datafile/tbs_sys.ora' for corruption at rdba: 0x0f801591 (file 62, block 5521)

Read datafile mirror 'DATA_0000' (file 62, block 5521) found same corrupt data (no logical check)

WARNING: requested mirror side 2 of virtual extent 43 logical extent 1 offset 139264 is not allocated; I/O request failed

Errors in file /oracle/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_12648460.trc  (incident=912662):

ORA-01578: ORACLE data block corrupted (file # 62, block # 5521)

ORA-01110: data file 62: '+DATA/orcl/datafile/tbs_sys.ora'

Incident details in: /oracle/oracle/diag/rdbms/orcl/orcl1/incident/incdir_912662/orcl1_ora_12648460_i912662.trc

Thu Jun 10 08:37:55 2021

Corrupt Block Found

         TSN = 62, TSNAME = TBS_SYS

         RFN = 62, BLK = 5521, RDBA = 260052369

         OBJN = 82103, OBJD = 82103, OBJECT = XXXX100D, SUBOBJECT =

         SEGMENT OWNER = XXXXXX_ST, SEGMENT TYPE = Table Segment

Thu Jun 10 08:37:57 2021

Dumping diagnostic data in directory=[cdmp_20210610083757], requested by (instance=1, osid=12648460), summary=[incident=912662].

Thu Jun 10 08:37:58 2021

Sweep [inc][912662]: completed

Sweep [inc2][912774]: completed

Sweep [inc2][912662]: completed

Thu Jun 10 08:38:14 2021

Dumping diagnostic data in directory=[cdmp_20210610083814], requested by (instance=2, osid=9175400), summary=[incident=353303].

Thu Jun 10 08:38:29 2021

OS Pid: 12648460 executed alter system set events '10231 trace name context off'

 

從資料庫後臺日誌可以看出,資料檔案 '+DATA/orcl/datafile/tbs_sys.ora' 發生損壞,出現了部分損壞的 block 。損壞的塊屬於 XXXXXX_ST 使用者下的 XXXX100 表。

2 .2 作業系統錯誤日誌

 

檢查 A IX 主機的作業系統日誌,發現存在多塊磁碟存在 DISK OPERATION ERROR 錯誤。

 

2 .3 R MN 校驗資料庫壞塊

RMAN> run {

 allocate channel d1 type disk;

 allocate channel d2 type disk;

 allocate channel d3 type disk;

 allocate channel d4 type disk;

 backup validate check logical database;

 release channel d1;

 release channel d2;

 release channel d3;

 release channel d4;

 }

 

 

 

當使用 R MAN 校驗到 6 2 號資料檔案 t bs _ sys.ora 時報錯,自動退出。

S QL> select   *   from   g v$database_block_corrutption

 

no rows selected

查詢資料庫壞塊檢視,沒有返回資料庫中壞塊的資訊。

2 .4 使用 exp 嘗試匯出表資料

 

使用 exp 匯出 XXXX100 表,在查詢到 6 2 號檔案的 5 248 號塊時報錯,匯出失敗。

2 .5 RMAN 備份修復壞塊

RMAN> blockrecover datafile 6 2   block 5248 ;  -- 單塊修復

RMAN> blockrecover corruption list;  -- 對檢視中所有壞塊修復

 

sys@ORCL> select count(*) from XXXXXX_ST.XXXX100;

select count(*) from XXXXXX_ST.XXXX100

                                *

ERROR at line 1:

ORA-01115: IO error reading block from file  (block # )

ORA-01115: IO error reading block from file 62 (block # 5248)

ORA-15081: failed to submit an I/O operation to a disk

ORA-15081: failed to submit an I/O operation to a disk

再次查詢 XXXX100 表在訪問 5 248 號塊時,報 I O error

 

2 .6 設定 1 0231 事件跳過壞塊

SQL>alter system  SET EVENTS '10231 trace name context forever,level 10';

設定 1 0231 事件後,查詢 XXXX100 表,仍然報資料塊損壞,無法查詢。

2 .7 使用 dbms 包標記壞塊

sys@ORCL> begin

dbms_repair.skip_corrupt_blocks(

schema_name=>'XXXXXX_ST',

object_name=>'XXXX100',

object_type=>dbms_repair.table_object,

flags=>dbms_repair.skip_flag);

end;

/  2    3    4    5    6    7    8

 

PL/SQL procedure successfully completed.

 

sys@ORCL> select count(*) from XXXXXX_ST.XXXX100;

select count(*) from XXXXXX_ST.XXXX100

                                *

ERROR at line 1:

ORA-01115: IO error reading block from file  (block # )

ORA-01115: IO error reading block from file 62 (block # 5248)

ORA-15081: failed to submit an I/O operation to a disk

ORA-15081: failed to submit an I/O operation to a disk

 

sys@ORCL> exec dbms_repair.skip_corrupt_blocks(schema_name => 'XXXXXX_ST',object_name => 'XXXX100',flags => 1);

 

PL/SQL procedure successfully completed.

 

sys@ORCL>

sys@ORCL> select count(*) from XXXXXX_ST.XXXX100;

select count(*) from XXXXXX_ST.XXXX100

                                *

ERROR at line 1:

ORA-01115: IO error reading block from file  (block # )

ORA-01115: IO error reading block from file 62 (block # 5248)

ORA-15081: failed to submit an I/O operation to a disk

ORA-15081: failed to submit an I/O operation to a disk

從結果上來看,使用 dbms 包標記的方法也無法正常跳過壞塊,查詢出來資料。

2 .8 基於 rowid 抽取資料

2 .8.1 檢視錶屬性

 

2 .8.2 基於塊 rowid 生成插入語句

-- 基於 rowid 抽取壞塊資料

select 'insert into test select * from XXXXXX_ST.XXXX100 t where rowid between ' || '''' ||

       dbms_rowid.rowid_create(1,

                               o.data_object_id,

                               e.RELATIVE_FNO,

                               e.BLOCK_ID,

                               0) || '''' || ' and ' || '''' ||

       DBMS_ROWID.ROWID_CREATE(1,

                               o.data_object_id,

                               e.RELATIVE_FNO,

                               e.BLOCK_ID + e.BLOCKS - 1,

                               10000) || '''' || ';' || CHR(10) ||

       'commit;'

  from dba_extents e, dba_objects o

 where e.segment_name = 'XXXX100'

   and e.owner = 'XXXXXX_ST'

   AND o.object_name = 'XXXX100'

   AND o.owner = 'XXXXXX_ST';

 

2 .8.3 建立空表

SQL> create table test as select * from XXXXXX_ST.XXXX100 where 1=2;

2 .8.4 插入資料

 

從圖中可以看出就只有第一個 insert 語句因為資料塊損壞無法正常執行,其他都執行成功。

2 .8.5 查詢表資料

 

 

可以看出新建的表,能夠正常訪問和查詢。

三、解決辦法和建議

3.1  解決辦法

從問題詳細診斷過程的基於 rowid 抽取資料的方法可以看出,此方法能夠跳過壞塊,讀取出表中可用的資料。因此,該問題的處理思路大概如下:

1 )將原表 rename 掉,例如: XXXX100 rename XXXX100 _BAK

2 )複製 XXXX100_ BAK 的表結構建立一個名為 XXXX100 的空表。

3 )使用 rowid 的抽取方式,把沒有損壞的塊的資料抽取到 XXXX100 表中。

4 )檢視 XXXX100 _BAK 的表屬性,重新建立 XXXX100 的索引及約束等。

3.2  分析建議

1 )因使用 R MAN 校驗失敗,無法從資料庫檢視中直接查詢有多少表受到了影響,可以使用 select 批次生成查詢指令碼或使用 exp 匯出該使用者的方式,當訪問或匯出問題表時,會出現報錯,透過此方法來確定有多少表受到了影響,進而統一進行修復。

2 )對重要的資料庫做異地容災,可以避免如機房斷電、硬體故障等原因造成的業務中斷或資料丟失。

 


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/30310891/viewspace-2922214/,如需轉載,請註明出處,否則將追究法律責任。

相關文章