Oracle資料壞塊簡介及其恢復(dbv、BMR)
Oracle資料塊恢復(dbv、BMR)
物理壞塊和邏輯壞塊
Oracle資料檔案的壞塊可以分為物理壞塊和邏輯壞塊。物理壞塊指的是塊格式本身已經損壞,塊內的資料沒有任何意義。而邏輯壞塊,指的是塊內的資料在邏輯上存在問題,比如說索引塊的索引值沒有按從小到大排列導致的邏輯壞塊。物理壞塊一般是由於記憶體問題、OS問題、I/O子系統問題或硬體引起的,邏輯壞塊一般是有Oracle bug等原因引起的。
各種各樣的塊損壞通常是透過Oracle的ORA-1578錯誤報告出來的,詳細的損壞描述會在告警日誌中列印出來。
l 物理塊損壞
塊的物理損壞有很多種情況,例如塊頭(Cache Header)被一個不正確的值替換、塊被破壞或變得不完整、塊的頭和尾不匹配、塊的校驗和(CHECKSUM)不正確、塊錯位、塊被歸零。
n 破裂塊
一個破裂塊的意思即塊是不完整的,塊頭的資訊不能匹配塊尾的資訊。在告警日誌中可能出現如下的日誌資訊:
Corrupt block relative dba:0x0380e573(file 14,block 58739)
Fractured block found during buffer read
……
n 不正確的校驗和
塊的校驗和也是資料塊的一致性檢查的依據。塊的一致性檢查由DB_BLOCK_CHECKSUM和DB_BLOCK_CHECKING兩個初始化引數控制。DB_BLOCK_CHECKSUM是一種物理檢查,DB_CHECK_CHECKING是一種邏輯檢查。
引數1 DB_CHECK_CHECKSUM
DB_BLOCK_CHECKSUM只有在寫入(DBWn常規寫或使用者程式直接路徑寫入)時,根據一個CHECKSUM演算法計算資料塊的校驗和,然後寫入資料塊的一個特定位置(CACHE HEADER,具體是以0位元組算起,塊的第16~17位元組),在讀取塊時再進行檢驗。主要是防止I/O硬體和I/O子系統的錯誤。
CHECKSUM的演算法只是根據塊的位元組值計算一個校驗和,演算法比較簡單。該引數預設在所有表空間上都起作用。
DB_BLOCK_CHECKSUM引數屬性
屬性 |
描述 |
語法 |
DB_BLOCK_CHECKSUM={OFF|FALSE|TYPICAL|TURE|FULL} |
預設值 |
TYPICAL |
修改範圍 |
ALTER SESSION,ALTER SYSTEM |
只有當引數值是TYPICAL或者FULL,並且塊的最後一次寫是儲存了一個校驗和時,讀取這個塊,校驗和才會被驗證。在FULL模式,Oracle用update/delete語句改變資料之前會驗證校驗和,改變被應用之後還會重新計算校驗和。
從Oracle Database 11g開始,大多數日誌校驗和都是透過前臺程式產生的,同時LGWR執行其餘的工作,這是為了更好地發揮CPU和快取的效率。當這個引數設定為FULL,寫日誌塊到磁碟之前,LGWR驗證透過前臺程式生成的每個日誌塊的校驗和。在Oracle Database 11g之前的版本中,LGWR獨自執行日誌塊校驗和。資料檔案塊的校驗和是由DBWR程式負責計算和管理的。
這個引數設定為OFF時,DBWn只為SYSTEM表空間計算校驗和,不為使用者表空間計算校驗和。另外,此時資料庫也不會執行日誌的校驗工作。
校驗和可以使Oracle資料庫察覺到磁碟、儲存系統或者I/O系統引起的損壞。如果設定為FULL,DB_BLOCK_CHECKSUM也會捕捉在記憶體中的損壞,並停止它們對磁碟的操作。設定這個引數為TYPICAL值只會引起系統額外的1%~2%的負載,設定為FULL會引起4%~5%的負載。Oracle推薦設定DB_BLOCK_CHECKSUM為TYPICAL。為了保持向後相容性,TRUE和FALSE值被保留,TRUE等同於TYPICAL,FALSE等同於OFF。
如果DB_BLOCK_CHECKSUM不等於FALSE值,每次讀取塊,Oracle計算校驗和,都與儲存在塊頭中的校驗和進行對比。如下例子:
Corrupt block relative dba: 0x0380a58f (file 14,block 42383)
Bad check value found during buffer read
……
引數2 DB_BLOCK_CHECKING
DB_BLOCK_CHECKING引數主要是用於資料塊的邏輯一致檢查,但只是在塊內,不包括塊間的邏輯檢查。主要用於防止在記憶體中損壞或資料損壞。
無論該引數如何設定,對SYSTEM表空間來說,邏輯一致檢查始終處於“開啟”狀態,在其他表空間該引數預設是關閉的。
DB_BLOCK_CHECKING引數的屬性
引數值 |
含義 |
OFF或者FALSE |
對於使用者表空間沒有任何邏輯一致性檢查工作 |
LOW |
塊的內容在記憶體中改變之後,執行基本的塊頭檢查,如UPDATE語句、INSERT語句、磁碟讀或者在RAC中內部例項之間的塊傳遞之後發生檢查工作 |
MEDIUM |
除了索引以外的所有物件執行LOW檢查和完全語義檢查,由於索引能在遭遇損壞的情況下的重建,所以可以不考慮對它檢查 |
FULL或者TRUE |
所有物件執行MEDIUM檢查和完全語義檢查 |
Oracle透過遍歷在塊中的資料來檢查一個塊,確保它在邏輯上手尾一致。根據系統負載和引數值,塊檢查通常一起1%~10%的負載。開啟塊檢查,大量的UPDATE或者INSERT將造成更大負載,對於一個繁忙的系統,特別有大量插入或者更新操作的系統來說,效能影響是比較明顯的。如果效能負載可以被接受,應該考慮設定DB_BLOCK_CHECKING為FULL。為了保持向後的相容性,TURE和FALSE引數值同樣可以使用,FALSE等同於OFF,TRUE等同於FULL。
如果啟用DB_BLOCK_CHECKING引數,在磁碟的塊發生邏輯損壞,下一次塊更新將作為軟損壞標記這個塊,之後讀取這個塊產生ORA-1578的錯誤。
n 塊錯位
當Oracle察覺讀取塊的內容屬於不同塊但是校驗和又是正確的時,會產生錯誤。
l 邏輯塊損壞
若塊包含一個正確的校驗和,塊頭以下的結構是損壞的(塊內容損壞),這可能引起不同的ORA-600錯誤。邏輯塊損壞的詳細損壞描述通常不會列印到告警日誌。DBV將報告塊具體的邏輯錯誤。
3. 壞塊的檢測工具
以下為損壞塊的檢測工具和使用方法:
l DBVERIRY壞塊驗證工具
DBVERIRY不能驗證聯機Redo日誌、歸檔Redo日誌、控制檔案和RMAN備份集,只能用於資料檔案的塊驗證。
n DBV驗證傳統資料檔案
下面是使用DBV工具驗證資料檔案塊的例子:
$ dbv file=/testdb/test01.dbf blocksize=8192
注意:DBV工具除了用於檢測資料檔案是否有壞塊外,也用於獲得壞塊的詳細資訊。
n DBV驗證裸裝置資料檔案
DBV要求file後面跟的必須是一個包含副檔名的檔案,所以如果資料庫使用裸裝置作為儲存方式,就必須使用ln命令連線裸裝置一個帶副檔名的檔案,然後使用DBV工具透過對連結檔案的驗證實現對裸裝置資料檔案的驗證。
n DBV驗證ASM儲存的資料檔案
如果是驗證儲存在ASM中的資料檔案則需要指定使用者名稱和密碼,如果不指定使用者名稱和密碼,將收到DBV-00008:USERID must bu specified for OSM files的報錯。下面是使用DBV工具驗證儲存在ASM中的資料檔案的塊的例子:
$ dbv file=+DATAFILE/testdb/datafile/test.234.648839 userid=sys/oracle
l ANALYZE命令
Analyze命令的主要目的是透過分析資料庫物件,為最佳化器收集資料庫物件的統計量資訊,以便最佳化器生成準確的執行計劃。同時,它也能檢查某個表或索引是否存在損壞的情況。Analyze執行壞塊檢查,但是不會標記壞塊為corrupt,檢測結果儲存在USER_DUMP_DEST目錄下的使用者trace檔案中。Analyze語法:
Analyze table/index / validate structure ;
Analyze命令會驗證每個資料塊、每條記錄和索引的完整性。CASCADE關鍵字表示驗證表及其相關的所有索引。與DBVERIFY不同的是,analyze只驗證高水位線以下的資料塊,analyze不會對未使用的空間進行驗證。
SQL> analyze table fengpin.test validate structure;
l RMAN工具
RMAN是一塊備份工具,就像一個過濾器,RMAN需要透過快取過濾每一個塊,其中一個特點就是檢查塊是否被損壞。如果備份的資料庫中包含有壞塊,將會收到錯誤
l EXP工具
對於包含壞塊的表執行匯出操作,會收到相關的錯誤資訊。對於這種情況,在非歸檔模式無法透過塊恢復修復塊的情況下,有如下兩種處理方法:
方法1:啟用10231事件
透過設定10231診斷事件可以在匯出的時候讓Oracle忽略表損壞的塊,10231是Oracle的內部診斷事件,設定在全表掃描時跳過壞塊的資料塊,只匯出包含正確塊的資料,之後把表刪除,再把匯出的表資料匯入新表,從而修復該表。
1) 啟用10231診斷事件
SQL> alter system set events=’10231 trace name context forever,level 10’;
2) 禁用10231診斷事件:
SQL> alter system set events=’10231 trace name context forever,level 0’;
方法2:使用DBMS_REPAIR包標記損壞的塊。
可以使用DBMS_REPAIR包標記損壞的資料庫物件,這樣在對損壞的物件執行全表掃描的時候會跳過損壞的塊。語法如下:
SQL> exec dbms_repair.skip_corrupt_blocks(‘’,’tablename’);
EXP壞塊檢查有一定的侷限性,不會發現如下型別的壞塊:
ü HWM(高水位線)以上的壞塊
ü 索引中存在的壞塊
ü 資料字典中存在的壞塊
l Expdp工具
使用expdp工具不會給出壞塊的提示,只會將物件正確的資料匯出。
4. 塊的損壞與恢復
塊已經不是Oracle的格式,或者其內部是不一致的,那麼這個塊就被認為已損壞。塊介質恢復是當資料檔案是聯機時,還原和恢復資料塊的技術。如果只有一些塊被破壞,那麼塊介質恢復是較好的恢復選擇。
BBED(Block Brower and EDitor)是Oracle的一款內部工具,可以用來直接檢視和修改Oracle資料檔案塊的內容。BBED是一個針對Oracle的二進位制編譯工具。該工具不受Oracle支援,預設是不生成可執行檔案的,在使用錢需要重新編譯。
1) 編譯BBED
直接在Oracle 11gR2 的環境中編譯BBED,將收到以下錯誤資訊:
$ cd $ORACLE_HOME/rdbms/lib
$ make –f ins_rdbms.mk $ORACLE_HOME/rdbms/lib/bbed
……
gcc: /u01/app/oracle/11.2.0/db_1/rdbms/lib/ssbbded.o: No such file or directory
gcc: /u01/app/oracle/11.2.0/db_1/rdbms/lib/sbbdpt.o: No such file or directory
Oracle 11gR2 環境中編譯BBED可執行檔案所需要的ssbbded.o和sbbdpt.o物件檔案被移除,不過可以從Oracle 10g環境中將這兩個檔案複製到Oracle 11g環境中進行編譯。
除了將上面的ssbbded.o和sbbdpt.o檔案複製到Oracle 11g環境外,BBED還需要用到$ORACLE_HOME/rdbms/mesg目錄下的bbedus.msg和bbedus.msb兩個資訊檔案,這幾個檔案都需要從Oracle 10g中複製到Oracle 11g中對應的目錄中。下面是將以上4個檔案從Oracle 10g中複製到Oracle 11g對應目錄之後的編譯過程:
$ cd $ORACLE_HOME/rdbms/lib
$ make –f ins_rdbms.mk $ORACLE_HOME/rdbms/lib/bbed
$ file bbed
$ size bbed
$ ldd bbed
$ cp bbed $ORACLE_HOME/bin/
$ cd /
$ which bbed
/u01/app/oracle/product/11.2.0/db_1/bin/bbed
編譯成功後登入BBED,登入時需要密碼(預設密碼是:blockedit)
$ bbed
2) BBED模擬表資料塊的損壞
a. 建立測試表
SQL> create table test.testbbed as select * from dba_tables;
b. 建立BBED引數檔案
由於BBED無法對ASM進行操作,所以這裡將表建立到ACFS檔案系統的儲存裝置上。這裡建立兩個BBED引數檔案,filelist.txt儲存要操作的資料檔案的ID和路徑,bbed.par儲存資料檔案的塊大小、filelist.txt的位置和操作模式:
$ more filelist.txt
6 /testbbed/tbtbs01.dbf
$ more bbed.par
blocksize=8192
listfile=filelist.txt
mode=edit
filelist.txt的內容可透過select file_id,file_name from dba_data_filesSQL查詢得到。
c. BBED基本操作
ü 使用指定的引數檔案登入BBED:
$ bbed parfile=bbed.par
ü 顯示BBED配置檔案中指定的資料檔案資訊:
BBED> info
ü 設定要操作的資料檔案:
BBED> set file 6
ü 顯示要操作的資料檔案的詳細資訊:
BBED> show
d. 模擬壞塊
修改檔案號為6的第136號塊:
BBED> modify 1000 file 6 block 136
如果修改錯誤,可以執行revert命令回滾。
e. 驗證壞塊
在BBED執行以下命令驗證資料塊,發現block 136已經損壞
BBED> verify
f. 使用DBV工具驗證
使用DBV工具驗證發現file 6 block 136已經損壞
$ dbv file=/testbbed/tbtbs01.dbf blocksize=8192
g. 執行塊讀取操作
執行一個test.testbbed的全表掃描,收到ORA-01578錯誤
SQL> alter system flush buffer_cache;
SQL> select /*+FULL(T)*/ COUNT(1) FROM TEST.TESTBBED T;
3) RMAN的塊恢復
塊介質恢復用來恢復一個單獨的塊或者資料檔案中資料塊的集合,如果是小資料量的資料丟失或損壞,而不是整個資料檔案,這種型別的恢復是很有用的。通常,塊損壞會在跟蹤檔案中報告錯誤資訊。
塊級別的資料丟失通常是由以下兩個原因造成的:
n I/O錯誤引起的映象資料丟失。
n 記憶體損壞,重新整理到磁碟。
a. 使用RMAN BLOCKRECOVER命令的注意事項
n 目標資料庫必須在MOUNT或者OPEN狀態,如果執行某個資料檔案的塊介質恢復,那麼該資料檔案不能是離線狀態。
n 塊介質恢復不支援基於時間點的塊恢復。
n 只能在損壞的塊上執行塊介質恢復。
n 塊被標記為介質損壞之後是不能訪問的,直達恢復完成。
n 當使用備份的控制檔案載入資料庫時,不能執行塊的介質恢復。
n 必須有一個包含損壞塊檔案的全備份,塊介質恢復不能使用增量備份。
n 如果RMAN訪問塊介質恢復需要特定歸檔Redo日誌檔案失敗,那麼將執行還原FAILOVER,嘗試使用RMAN資料庫中列出的適合這個操作的所有其它備份,如果沒有合適的備份存在執行才會失敗。
n 資料檔案頭不能被恢復
n 不能在非歸檔模式下執行塊介質恢復。
b. RMAN BLOCKRECOVER命令的使用方式
RMAN BLOCKRECOVER命令有以下三種使用方式:
方式1 使用BLOCKRECOVER CORRUPTION LIST命令恢復在V$DATABASE_BLOCK_CORRUPTION檢視中報告的所有塊:
RMAN> blockrecover corruption list;
方式2 使用BLOCKRECOVER 命令的時候指定檔案號和塊號:
RMAN> blockrecover datafile block ;
方式3 執行blockrecover命令的時候指定表空間和資料塊地址(DBA):
RMAN> blockrecover tablespace DBA ;
c. RMAN BLOCKRECOVER使用的例子
例子1 恢復3個資料檔案的損壞塊:
RMAN> BLOCKRECOVER DATAFILE 2 BLOCK 12,13 DATAFILE 3 BLOCK 5,98,99 DATAFILE 4 BLOCK 19;
例子2:從資料檔案複製中還原、恢復一系列塊:
RMAN> RUN
{
BLOCKRECOVER DATAFILE 3 BLOCK 2,3,4,5 TABLESPACE sales DBA 4194405,4194409,4194412 from DATAFILECOPY;
}
例子3:從指定的tag備份總還原、恢復塊
RMAN> BLOCKRECOVER TABLESPACE SYSTEM DBA 4194404,4194405 FROM TAG “weekly_backup”;
例子4:從用於恢復資料到兩天以前的備份中還原、恢復SYSTEM表空間中的兩個塊:
RMAN> BLOCKRECOVER TALBESPACE SYSTEM DBA 4194404,4194405 RESOTRE UNTILL TIME ‘sysdate-2’;
例子5:執行備份驗證資料庫,修復在V$DATABASE_BLOCK_CORRUPTION中記錄的所有損壞塊:
RMAN> BACKUP VALIDATE DATABASE;
RMAN> BLOCKRECOVER CORRUPTION LIST;
4) 確定損壞塊對應的物件
要確定一個損壞的物件需要知道AFN(Absolute File Numbe,絕對檔案號)和BL(Block Number,塊號)。AFN和RFN(Relative File Number,相對檔案號)通常是相同的,但是也可能不同(特別是如果資料庫從Oracle7遷移或者如果使用的是可傳輸、可插拔的表空間),獲得正確的AFN和RFN就顯得非常重要,如果指定了錯誤的AFN將導致找不到物件或錯誤識別物件。
a. 確定AFN和BL
方法1:從ORA-1578得到AFN
ORA-1578之後產生的ORA-1110錯誤提供AFN號碼。
方法2:從DBVERIFY輸出獲得AFN。
透過使用DBV工具會報告損壞的塊,DBV工具透過提供與相關的RDBA、RFN和BL資訊。
方法3:從RMAN獲得AFN
RMAN在V$DATABASE_BLOCK_CORRUPTION檢視報告損壞的塊。該檢視的欄位FILE#表示AFN,欄位BLOCK#表示BL。
b. 定位損壞的物件
一旦AFN被識別,執行以下SQL語句定位損壞的物件:
SQL> select *
From DBA_EXTENTS
Where file_id=&AFN
And &BL BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS-1;
基於RMAN實現壞塊介質恢復(blockrecover)
對於物理損壞的資料塊,我們可以透過RMAN塊介質恢復(BLOCK MEDIA RECOVERY)功能來完成受損塊的恢復,而不需要恢復整個資料庫或所有檔案來修復這些少量受損的資料塊。恢復整個資料庫或資料檔案那不是大炮用來打蚊子,有點不值得!但前提條件是你得有一個可用的RMAN備份存在,因此,無論何時備份就是一切。本文演示了產生壞塊即使用RMAN實現壞塊恢復的全過程。
1、建立演示環境
- SQL> select * from v$version where rownum<2;
- BANNER
- --------------------------------------------------------------------------------
- Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
- --建立用於演示的data file
- SQL> create tablespace tbs_tmp datafile '/u02/database/usbo/oradata/tbs_tmp.dbf' size 10m autoextend on;
- SQL> conn scott/tiger;
- --基於新的資料檔案建立物件tb_tmp
- SQL> create table tb_tmp tablespace tbs_tmp as select * from dba_objects;
- SQL> col file_name format a60
- SQL> select file_id,file_name from dba_data_files where tablespace_name='TBS_TMP';
- FILE_ID FILE_NAME
- ---------- ------------------------------------------------------------
- 6 /u02/database/usbo/oradata/tbs_tmp.dbf
- --表物件tb_tmp上的資訊,包含對應的檔案資訊,頭部塊,總塊數
- SQL> select segment_name , header_file , header_block,blocks
- 2 from dba_segments
- 3 where segment_name = 'TB_TMP' and owner='SCOTT';
- SEGMENT_NAME HEADER_FILE HEADER_BLOCK BLOCKS
- ------------------------------ ----------- ------------ ----------
- TB_TMP 6 130 1152
- --首先使用rman備份對應的資料檔案
- $ $ORACLE_HOME/bin/rman target /
- RMAN> backup datafile 6 tag=health;
- Starting backup at 2013/08/28 17:03:15
- allocated channel: ORA_DISK_1
- channel ORA_DISK_1: SID=24 device type=DISK
- channel ORA_DISK_1: starting full datafile backup set
- channel ORA_DISK_1: specifying datafile(s) in backup set
- input datafile file number=00006 name=/u02/database/usbo/oradata/tbs_tmp.dbf
- channel ORA_DISK_1: starting piece 1 at 2013/08/28 17:03:16
- channel ORA_DISK_1: finished piece 1 at 2013/08/28 17:03:17
- piece handle=/u02/database/usbo/fr_area/USBO/backupset/2013_08_28/o1_mf_nnndf_HEALTH_91vh6ntb_.bkp tag=HEALTH comment=NONE
- channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
- Finished backup at 2013/08/28 17:03:17
- RMAN> exit
2、單塊資料塊損壞的恢復處理
- --下面使用了linux自帶的dd命令來損壞單塊資料塊
- [oracle@linux1 ~]$ dd of=/u02/database/usbo/oradata/tbs_tmp.dbf bs=8192 conv=notrunc seek=130 <<eof </eof <>
- > Corrupted block!
- > EOF
- 0+1 records in
- 0+1 records out
- 17 bytes (17 B) copied, 0.000184519 seconds, 92.1 kB/s
- --清空buffer cache
- SQL> alter system flush buffer_cache;
- --查詢表對相 tb_tmp,收到ORA-01578
- SQL> select count(*) from tb_tmp;
- select count(*) from tb_tmp
- *
- ERROR at line 1:
- ORA-01578: ORACLE data block corrupted (file # 6, block # 130)
- ORA-01110: data file 6: '/u02/database/usbo/oradata/tbs_tmp.dbf'
- --查詢檢視v$database_block_corruption,提示有壞塊,注意該檢視可能不會返回任何資料,如無返回,先執行backup validate
- SQL> select * from v$database_block_corruption;
- FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
- ---------- ---------- ---------- ------------------ ---------
- 6 129 1 0 CORRUPT
- --也可以使用dbv工具來校驗壞塊,參考: http://blog.csdn.net/robinson_0612/article/details/6530890
- --下面使用blockrecover來恢復壞塊
- RMAN> blockrecover datafile 6 block 130;
- Starting recover at 2013/08/28 17:22:25
- using target database control file instead of recovery catalog
- allocated channel: ORA_DISK_1
- channel ORA_DISK_1: SID=24 device type=DISK
- channel ORA_DISK_1: restoring block(s)
- channel ORA_DISK_1: specifying block(s) to restore from backup set
- restoring blocks of datafile 00006
- channel ORA_DISK_1: reading from backup piece /u02/database/usbo/fr_area/USBO/backupset/2013_08_28/o1_mf_nnndf_HEALTH_91vh6ntb_.bkp
- channel ORA_DISK_1: piece handle=/u02/database/usbo/fr_area/USBO/backupset/2013_08_28/o1_mf_nnndf_HEALTH_91vh6ntb_.bkp tag=HEALTH
- channel ORA_DISK_1: restored block(s) from backup piece 1
- channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01
- starting media recovery
- media recovery complete, elapsed time: 00:00:03
- Finished recover at 2013/08/28 17:22:31
- --再次查詢表tb_emp正常
- SQL> select count(*) from tb_tmp;
- COUNT(*)
- ----------
- 72449
3、多塊資料塊損壞的恢復處理
- --下面使用linux dd命令對不連續塊損壞
- [oracle@linux1 ~]$ dd of=/u02/database/usbo/oradata/tbs_tmp.dbf bs=8192 conv=notrunc seek=133 <<eof </eof <>
- > New corrupted block!
- > EOF
- 0+1 records in
- 0+1 records out
- 21 bytes (21 B) copied, 0.000182835 seconds, 115 kB/s
- [oracle@linux1 ~]$ dd of=/u02/database/usbo/oradata/tbs_tmp.dbf bs=8192 conv=notrunc seek=143 <<eof </eof <>
- > New corrupted block!
- > EOF
- 0+1 records in
- 0+1 records out
- 21 bytes (21 B) copied, 0.000115527 seconds, 182 kB/s
- [oracle@linux1 ~]$ dd of=/u02/database/usbo/oradata/tbs_tmp.dbf bs=8192 conv=notrunc seek=153 <<eof </eof <>
- > New corrupted block!
- > EOF
- 0+1 records in
- 0+1 records out
- 21 bytes (21 B) copied, 0.000335781 seconds, 62.5 kB/s
- SQL> alter system flush buffer_cache;
- --下面提示塊133被損壞,注意我們損壞了多塊資料塊,但查詢時,從塊號最小的開始提示,如133被修復後還有壞塊則繼續提示133之後的壞塊
- SQL> select count(*) from scott.tb_tmp;
- select count(*) from scott.tb_tmp
- *
- ERROR at line 1:
- ORA-01578: ORACLE data block corrupted (file # 6, block # 133)
- ORA-01110: data file 6: '/u02/database/usbo/oradata/tbs_tmp.dbf'
- --查詢檢視v$database_block_corruption無任何記錄
- SQL> select * from v$database_block_corruption;
- no rows selected
- --下面使用backup validate來校驗資料檔案
- RMAN> backup validate datafile 6;
- Starting backup at 2013/08/29 09:42:04
- using target database control file instead of recovery catalog
- allocated channel: ORA_DISK_1
- channel ORA_DISK_1: SID=22 device type=DISK
- channel ORA_DISK_1: starting full datafile backup set
- channel ORA_DISK_1: specifying datafile(s) in backup set
- input datafile file number=00006 name=/u02/database/usbo/oradata/tbs_tmp.dbf
- channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
- List of Datafiles
- =================
- File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
- ---- ------ -------------- ------------ --------------- ----------
- 6 FAILED 0 223 1408 838489 --欄位Status為FAILED
- File Name: /u02/database/usbo/oradata/tbs_tmp.dbf
- Block Type Blocks Failing Blocks Processed
- ---------- -------------- ----------------
- Data 0 1029
- Index 0 0
- Other 3 156 --有3個Blocks Failing
- validate found one or more corrupt blocks
- See trace file /u02/database/usbo/diag/rdbms/usbo/usbo/trace/usbo_ora_27874.trc for details
- Finished backup at 2013/08/29 09:42:06
- --再次查詢v$database_block_corruption,表明有3個損壞的塊
- SQL> select * from v$database_block_corruption;
- FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
- ---------- ---------- ---------- ------------------ ---------
- 6 153 1 0 CORRUPT
- 6 143 1 0 CORRUPT
- 6 133 1 0 CORRUPT
- --下面直接使用blockrecover corruption list來恢復,如下所有剛剛被校驗的壞塊都會被恢復
- RMAN> blockrecover corruption list;
- Starting recover at 2013/08/29 10:05:24
- using channel ORA_DISK_1
- channel ORA_DISK_1: restoring block(s)
- channel ORA_DISK_1: specifying block(s) to restore from backup set
- restoring blocks of datafile 00006
- channel ORA_DISK_1: reading from backup piece /u02/database/usbo/fr_area/USBO/backupset/2013_08_28/o1_mf_nnndf_HEALTH_91vh6ntb_.bkp
- channel ORA_DISK_1: piece handle=/u02/database/usbo/fr_area/USBO/backupset/2013_08_28/o1_mf_nnndf_HEALTH_91vh6ntb_.bkp tag=HEALTH
- channel ORA_DISK_1: restored block(s) from backup piece 1
- channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01
- starting media recovery
- media recovery complete, elapsed time: 00:00:03
- Finished recover at 2013/08/29 10:05:28
- --校驗結果
- SQL> select count(*) from scott.tb_tmp;
- COUNT(*)
- ----------
- 72449
4、壞塊的物件定位與影響
- --下面我們查詢塊號為163上的物件
- SQL> select dbms_rowid.rowid_object(rowid) object_id,dbms_rowid.rowid_relative_fno(rowid) file_id,
- 2 dbms_rowid.rowid_block_number(rowid) block_id,owner,object_name,object_id
- 3 from scott.tb_tmp where dbms_rowid.rowid_block_number(rowid)=163 and rownum<=2;
- OBJECT_ID FILE_ID BLOCK_ID OWNER OBJECT_NAME OBJECT_ID
- ---------- ---------- ---------- ------------ ------------------------------ ----------
- 74555 6 163 SYS GV_$QUEUEING_MTH 2439
- 74555 6 163 PUBLIC GV$QUEUEING_MTH 2440
- --使用上面的方法,我們損塊塊163,173,此處不再列出
- a、對於壞塊物件無法進行聚合彙總等操作
- SQL> select count(*) from scott.tb_tmp;
- select count(*) from scott.tb_tmp
- *
- ERROR at line 1:
- ORA-01578: ORACLE data block corrupted (file # 6, block # 163)
- ORA-01110: data file 6: '/u02/database/usbo/oradata/tbs_tmp.dbf'
- b、對於壞塊上的記錄無法被查詢
- --我們使用基於之前查詢到的OBJECT_ID來查詢
- SQL> select owner,object_name,object_id from scott.tb_tmp where object_id in(2439,2440);
- select owner,object_name,object_id from scott.tb_tmp where object_id in(2439,2440)
- *
- ERROR at line 1:
- ORA-01578: ORACLE data block corrupted (file # 6, block # 163)
- ORA-01110: data file 6: '/u02/database/usbo/oradata/tbs_tmp.dbf'
- --如下面的查詢,位於損壞塊上的資料無法被查詢到,但對於未損壞的依舊可以查詢。下面的查詢時塊161上的物件
- SQL> select owner,object_name,object_id from scott.tb_tmp
- 2 where dbms_rowid.rowid_block_number(rowid)=161 and rownum<3;
- OWNER OBJECT_NAME OBJECT_ID
- ------------------------------ ------------------------------ ----------
- PUBLIC GV$RECOVERY_LOG 2285
- SYS GV_$ARCHIVE_GAP 2286
- --Author : Robinson Cheng
- --Blog : http://blog.csdn.net/robinson_0612
- c、定位受損塊所對應的物件
- SQL> run get_obj_name_from_corrupt_block
- 1 SELECT tablespace_name,
- 2 segment_type,
- 3 owner,
- 4 segment_name,
- 5 partition_name
- 6 FROM dba_extents
- 7* WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1
- Enter value for file_id: 6
- Enter value for block_id: 133
- old 7: WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1
- new 7: WHERE file_id = 6 AND 133 BETWEEN block_id AND block_id + blocks - 1
- TABLESPACE_NAME SEGMENT_TYPE OWNER SEGMENT_NAME PARTITION_NAME
- ------------------------------ ------------------ -------------- ----------------- -----------------
- TBS_TMP TABLE SCOTT TB_TMP
- d、對於損壞的資料檔案,預設情況下,不能對其進行備份,如下
- RMAN> backup datafile 6 tag='corruption';
- Starting backup at 2013/08/29 10:37:32
- using channel ORA_DISK_1
- channel ORA_DISK_1: starting full datafile backup set
- channel ORA_DISK_1: specifying datafile(s) in backup set
- input datafile file number=00006 name=/u02/database/usbo/oradata/tbs_tmp.dbf
- channel ORA_DISK_1: starting piece 1 at 2013/08/29 10:37:32
- RMAN-00571: ===========================================================
- RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
- RMAN-00571: ===========================================================
- RMAN-03009: failure of backup command on ORA_DISK_1 channel at 08/29/2013 10:37:33
- ORA-19566: exceeded limit of 0 corrupt blocks for file /u02/database/usbo/oradata/tbs_tmp.dbf
- --需要設定允許損壞塊的數量之後才能進行備份
- RMAN> run{
- 2> set maxcorrupt for datafile 6 to 2;
- 3> backup datafile 6 tag='corruption';
- 4> }
- executing command: SET MAX CORRUPT
- Starting backup at 2013/08/29 10:41:24
- using channel ORA_DISK_1
- channel ORA_DISK_1: starting full datafile backup set
- channel ORA_DISK_1: specifying datafile(s) in backup set
- input datafile file number=00006 name=/u02/database/usbo/oradata/tbs_tmp.dbf
- channel ORA_DISK_1: starting piece 1 at 2013/08/29 10:41:25
- channel ORA_DISK_1: finished piece 1 at 2013/08/29 10:41:26
- piece handle=/u02/database/usbo/fr_area/USBO/backupset/2013_08_29/o1_mf_nnndf_CORRUPTION_91xf6o18_.bkp tag=CORRUPTION comment=NONE
- channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
- Finished backup at 2013/08/29 10:41:26
- --檢視備份資訊如下,應在修復壞塊後重新備份以避免由於保留策略導致先前可用的備份被aged out
- RMAN> list backup summary;
- List of Backups
- ===============
- Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
- ------- -- -- - ----------- ------------------- ------- ------- ---------- ---
- 1 B F A DISK 2013/08/28 17:03:17 1 1 NO HEALTH
- 3 B F A DISK 2013/08/29 10:41:25 1 1 NO CORRUPTION
5、後記
a、對於受損的資料塊,僅僅壞塊上的資料無法被查詢或讀取,其餘正常塊的資料依舊可以使用。
b、對於受損的表物件進行聚合等相關運算時收到錯誤提示,因為壞塊上的資料無法被統計。如果你聚合的是索引列,索引未損壞的情形則可正常返回。
c、可以基於RMAN可用的備份檔案實現塊介質恢復,其資料檔案無需offline,開銷最小,影響最小。
d、對於多個資料塊的損壞,先執行backup validate校驗資料庫或相應的資料檔案以便標記受損的壞塊後,填充v$database_block_corruption以及後續恢復。
e、對於使用backup validate 校驗後的情形,壞塊恢復時可以直接使用blockrecover corruption list一次性恢復所有的壞塊。
f、預設情況下,存在壞塊的資料檔案無法成功備份,也會導致自動備份指令碼失敗。
DBVERIFY 工具的使用
--**********************
-- DBVERIFY 工具的使用
--**********************
Oracle 數據庫運行過程中由於硬體故障或操作系統故障導致導致Oracle無法以Oracle格式來識別或所包含的內容即為出現數據塊損壞
故障,這個壞塊可以分為介質損壞以及邏輯損壞。下面給出了塊的檢查,以及使用DBVERIFY 工具實施塊檢查。
一、塊檢查
1.何時檢查塊
當一個數據塊被讀或寫的時候,將對塊的進行一致性檢查,檢查的內容包括
塊的版本
比較塊在cache與block buffer中的數據塊地址
根據要求進行校驗(checksum)
2.損壞的數據塊的錯誤提示
可以從告警日誌檔案中找到該錯誤提示,以及在會話中發現損壞的數據塊時也會給出類似的提示
ORA-01578: ORACLE data block corrupted (file # 6, block # 11)
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcl/tbs01.dbf'
3.與塊損壞的相關特性(幾種檢查工具)
------------------------------------------------------------------------------------------------
特性 壞塊偵測類型 能否修復損壞塊
------------------------------------------------------------------------------------------------
DBVERIFY 物理 否
ANALYZE 邏輯 否
DB_BLOCK_CHECKING 邏輯 否
DB_BLOCK_CHECKSUM 物理 否
exp 物理 否
FlashBack 邏輯 是
DBMS_REPAIR 邏輯 是
Block media recovery 未知 是
二、DBVERIFY工具介紹
特性
是一個運行於操作系統提示符下的外部程式,用於驗證數據檔案,檢查塊的一致性錯誤
僅僅針對數據檔案,能夠校驗open階段的數據檔案以及shutdown狀態下的數據檔案
可以驗證復制的數據檔案,也可以驗證備份的鏡像副本
不支援聯機日誌檔案,控制檔案,歸檔日誌,RMAN備份集驗證
被驗證的檔案可以位於檔案系統,ASM磁盤或原始裝置
在Unix系統中位於:$ORACLE_HOME/bin/dbv
在Windows系統中位於:%ORACLE_HOME%/bin/dbv.exe
對於DBVERIFY工具,高版本可以自動識別低版本數據庫,比如11g的dbv訪問9i的數據庫,但是低版本的dbv訪問高版本會報錯
三、DBVERIFY工具用法
1.獲取dbv的幫助資訊,直接在提示符下輸入dbv即可 或者輸入dbv help=y
[oracle@oradb orcl]$ dbv
DBVERIFY: Release 10.2.0.4.0 - Production on Tue Oct 26 18:21:09 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Keyword Description (Default)
----------------------------------------------------
FILE File to Verify (NONE)
START Start Block (First Block of File)
END End Block (Last Block of File)
BLOCKSIZE Logical Block Size (8192)--指定數據檔案的尺寸,預設值為8192,對於非8192塊將收到DBV-00103錯誤
LOGFILE Output Log (NONE) --用於顯示驗證進度
FEEDBACK Display Progress (0)
PARFILE Parameter File (NONE) --可以指定引數檔案
USERID Username/Password (NONE) --校驗段、ASM檔案需要使用
SEGMENT_ID Segment ID (tsn.relfile.block) (NONE) --校驗段,需要表空間ID,數據檔案ID,段的頭部ID
HIGH_SCN Highest Block SCN To Verify (NONE)
(scn_wrap.scn_base OR scn)
2.校驗online,offline數據檔案,使用下面的方法
dbv file=
[oracle@oradb orcl]$ dbv file=$ORACLE_BASE/oradata/orcl/tbs01.dbf
DBVERIFY: Release 10.2.0.4.0 - Production on Tue Oct 26 18:29:39 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl/tbs01.dbf
DBVERIFY - Verification complete
Total Pages Examined : 128 --校驗的總頁面數,一個頁面即是一個數據塊
Total Pages Processed (Data) : 96 --已處理的數據頁面數
Total Pages Failing (Data) : 0 --已處理數據頁面的失敗數
Total Pages Processed (Index): 1 --已處理的索引頁面數
Total Pages Failing (Index): 0 --已處理索引頁面失敗數
Total Pages Processed (Other): 31 --已處理的其它頁面數
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 0
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 1152518 (0.1152518)
注意:如果Total Pages Influx的值大於零,且未存在壞塊的情況下,是由於針對open狀態的檔案運行dbv
程式遇到了一個當前正在被DBWn進程寫入的數據塊
[oracle@oradb orcl]$ dbv file=$ORACLE_BASE/oradata/orcl/tbs01.dbf feedback=1000
上面這句在執行時每驗證1000個塊將顯示一個"."號
--下面的校驗發現了I/O錯誤
[oracle@oradb orcl]$ dbv file=/u01/app/oracle/oradata/orcl/tbs01.dbf
DBVERIFY: Release 10.2.0.4.0 - Production on Tue Oct 26 18:26:21 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBV-00102: File I/O error on FILE (/u01/app/oracle/oradata/orcl/tbs01.dbf)
during end read operation (-1)
3.驗證指定段
該方法需要獲得段所在表空間的ID,段所在數據檔案的ID,段的頭部ID
如下面的查詢表空間的ID為7,檔案ID為6,段的頭部ID為35
sys@ORCL> select tablespace_id,tablespace_name,header_file,header_block
2 from sys_dba_segs
3 where segment_name='TB3';
TABLESPACE_ID TABLESPACE_NAME HEADER_FILE HEADER_BLOCK
------------- --------------- ----------- ------------
7 TBS1 6 35
注意:sys用戶的段可以查詢sys_user_segs,而普通用戶的段資訊,需要查詢sys_dba_segs
[oracle@oradb orcl]$ dbv userid=scott/tiger segment_id=7.6.35
DBVERIFY: Release 10.2.0.4.0 - Production on Tue Oct 26 18:50:01 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : SEGMENT_ID = 7.6.35
DBVERIFY - Verification complete
Total Pages Examined : 8
Total Pages Processed (Data) : 5
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 2
Total Pages Processed (Seg) : 1
Total Pages Failing (Seg) : 0
Total Pages Empty : 0
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 1152518 (0.1152518)
4.驗證復制的數據檔案或驗證備份的鏡像副本
RMAN> backup as copy datafile 6 --使用RMAN備份鏡像副本
2> format='/u01/app/oracle/bk/rmbk/cp_dfile6'
3> tag='Copy_datafile6';
[oracle@oradb orcl]$ dbv file=/u01/app/oracle/bk/rmbk/cp_dfile6
DBVERIFY: Release 10.2.0.4.0 - Production on Tue Oct 26 18:59:17 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/app/oracle/bk/rmbk/cp_dfile6
DBVERIFY - Verification complete
Total Pages Examined : 128
Total Pages Processed (Data) : 96
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 1
Total Pages Failing (Index): 0
Total Pages Processed (Other): 31
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 0
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 1152518 (0.1152518)
RMAN命令中的BACKUP VALIDATE DATABASE命令通常用於檢查全庫,該命令不產生任何備份集,可以通過
Validate命令來檢查是否能備份,如數據檔案是否存在,是否存在壞塊不能被備份,查詢檢視
v$database_block_corruption,此檢視將檢查過程中存在的壞塊
如使用下面的查詢
RMAN> backup validate database;
RMAN> backup validate database archivelog all;
sys@ORCL> select * from v$database_block_corruption;
no rows selected
檢視v$database_block_corruption將列出損壞的壞塊所在的檔案位置,損壞塊的起始位置,損壞快的大
小以及損壞類型如果上述檢視中發現了壞塊,則可以通過SQL查詢獲得壞塊所影響的範圍,以及確定壞塊
所影響的是索引段還是UNDO段
select owner,segment_name,segment_type from dba_extents where file_id= and
between block_id and block_id+blocks-1;
(和分別是ORA-01578報出的壞塊出現的檔案號和塊號)
下面使用rman 來修復受損的數據塊
RMAN> run{
2> allocate channel ch1 device type disk;
3> blockrecover datafile 6 block 37;
4> release channel ch1;}
released channel: ORA_DISK_1
allocated channel: ch1
channel ch1: sid=139 devtype=DISK
Starting blockrecover at 26-OCT-10
starting media recovery
media recovery complete, elapsed time: 00:00:00
Finished blockrecover at 26-OCT-10
released channel: ch1
flasher@ORCL> alter table tb3 add constraint empno_tb3_pk
2 primary key(empno);
四、其它
1.事實上使用dbv工具可以對控制檔案進行驗證(數據庫處於OPEN狀態),參見下面的例子
[oracle10g:oracle:orcl]$ dbv file=control01.ctl
DBVERIFY: Release 10.2.0.4.0 - Production on Sun Jun 5 14:56:31 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBV-00103: Specified BLOCKSIZE (8192) differs from actual (16384) --根據提示知道blocksize的值為16384
[oracle@oradb orcl]$ dbv file=control01.ctl blocksize=16384
DBVERIFY: Release 10.2.0.4.0 - Production on Sun Jun 5 14:53:01 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = control01.ctl
DBVERIFY - Verification complete
Total Pages Examined : 404
Total Pages Processed (Data) : 0
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 40
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 364
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 775 (65535.775)
2.對聯機日誌檔案的驗證(數據庫處於OPEN狀態),不支援
[oracle10g:oracle:orcl]$ dbv file=redo01.log
DBVERIFY: Release 10.2.0.4.0 - Production on Sun Jun 5 15:01:09 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBV-00103: Specified BLOCKSIZE (8192) differs from actual (512)
[oracle10g:oracle:orcl]$ dbv file=redo01.log blocksize=512 --可以執行,但出現下面很多的提示(介質錯誤)
Page 88441 is influx - most likely media corrupt
Corrupt block relative dba: 0x00015979 (file 0, block 88441)
Fractured block found during dbv:
Data in bad block:
type: 1 format: 2 rdba: 0x00015979
last change scn: 0x8010.00000010 seq: 0x51 flg: 0x55
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xc2037800
check value in block header: 0x108
computed block checksum: 0x0
Physical and Logical Block Corruptions. All you wanted to know about it. (文件 ID 840978.1)
In this Document
Purpose |
Scope |
Details |
Physical Block Corruptions |
Fractured Block |
Bad Checksum |
Block Misplaced |
Logical Block Corruptions |
References |
APPLIES TO:
Oracle Database - Enterprise Edition - Version 9.2.0.1 to 11.2.0.3 [Release 9.2 to 11.2]Oracle Database - Enterprise Edition - Version 11.2.0.4 to 11.2.0.4 [Release 11.2]
Oracle Database - Enterprise Edition - Version 12.1.0.2 to 12.1.0.2 [Release 12.1]
Information in this document applies to any platform.
***Checked for relevance on 21-May-2012***
PURPOSE
Oracle classifies the Data File Block corruptions as Physical and Logical. This is also referred as intra block corruptions. This document is intended to provide detailed information and errors example about it.
SCOPE
This document is intended for Database Administrators.
DETAILS
Physical Block Corruptions
This kind of block corruptions are normally reported by Oracle with error ORA-1578 and the detailed corruption description is printed in the alert log.
Corruption Examples are:
- Bad header - the beginning of the block (cache header) is corrupt with invalid values
- The block is Fractured/Incomplete - header and footer of the block do not match
- The block checksum is invalid
- The block is misplaced
- Zeroed out blocks Note 1545366.1
Detailed Corruption Description:
Fractured Block
A Fractured block means that the block is incomplete. Information from the block header does not match the block tail.
Fractured block found during buffer read
Data in bad block -
type: 6 format: 2 rdba: 0x0380e573
last change scn: 0x0288.8e5a2f78 seq: 0x1 flg: 0x04
consistency value in tail: 0x00780601
check value in block header: 0x8739, computed block checksum: 0x2f00
spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Reread of rdba: 0x0380e573 (file 14, block 58739) found same corrupted data
Bad Checksum
Block Checksums are used to identify if the block was changed by something external to Oracle and after the block was last written by Oracle.
Checksum is calculated by DBWR or direct loader before writing the block to disk and stored in the block header. Every time that the block is read and if db_block_checksum is different than false, Oracle calculates a checksum and compares it to the one stored in the block header. Reference Note 30706.1
Example of a corrupt block due to invalid checksum:
Bad check value found during buffer read
Data in bad block -
type: 6 format: 2 rdba: 0x0380a58f
last change scn: 0x0288.7784c5ee seq: 0x1 flg: 0x06
consistency value in tail: 0xc5ee0601
check value in block header: 0x68a7, computed block checksum: 0x2f00
spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Reread of rdba: 0x0380a58f (file 14, block 42383) found same corrupted data
A value different than zero (0x0) in "computed block checksum" means that the checksum differs and the result of this comparison is printed.
Block Misplaced
This is when Oracle detected that the content of the block being read belongs to a different block and the checksum is valid:
Bad header found during buffer read
Data in bad block -
type: 6 format: 2 rdba: 0x0d805b08 ----> Block is different than expected 0x0d805a89
last change scn: 0x0692.86dc08e3 seq: 0x1 flg: 0x04
consistency value in tail: 0x08e30601
check value in block header: 0x2a6e, computed block checksum: 0x0
spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Logical Block Corruptions
This is when block contains a valid checksum and the structure below the beginning of the block is corrupt (Block content is corrupt). It may cause different ORA-600 errors.
The detailed corruption description for Logical Corruptions are not normally printed in the alert.log. DBVerify will report what is logically corrupted in the block.
Corruption Examples are:
- row locked by non-existent transaction - ORA-600 [4512], etc
- the amount of space used is not equal to block size
- avsp bad
- etc.
When db_block_checking is enabled, it may produce the internal errors ORA-600 [kddummy_blkchk] or ORA-600 [kdBlkCheckError].
If db_block_checking is enabled and the block is already logically corrupt on disk, the next block update will mark the block as Soft Corrupt and future reads of this block will produce the error ORA-1578. In that case DBVerify reports this corruption with error "DBV-200: Block, dba <rdba>, already marked corrupted". Reference Note 1496934.1
REFERENCES
NOTE:1088018.1 - Master Note for Handling Oracle Database Corruption IssuesNOTE:1578.1 - OERR: ORA-1578 "ORACLE data block corrupted (file # %s, block # %s)" Master Note
NOTE:28814.1 - Handling Oracle Block Corruptions
NOTE:794505.1 - ORA-1578 / ORA-26040 Corrupt blocks by NOLOGGING - Error explanation and solution
NOTE:819533.1 - How to identify the corrupt Object reported by ORA-1578 / RMAN / DBVERIFY
NOTE:836658.1 - Identify the Corruption Extension for Block Corruption, Table/Index Inconsistency, Data Dictionary and Lost Writes
NOTE:1545366.1 - Physical Corrupted Blocks consisting of all Zeroes indicate a problem with OS, HW or Storage
NOTE:1496934.1 - Soft Corrupt Blocks - Definition and information
Physical Corrupted Blocks consisting of all Zeroes indicate a problem with OS, HW or Storage (文件 ID 1545366.1)
In this Document
Purpose |
Details |
Oracle by design does not write blocks of all zeroes. |
Storage vendors have implemented checks based on the Oracle design of not writing zeros. |
Conclusion |
References |
APPLIES TO:
Oracle Database - Enterprise Edition - Version 8.0.6.0 to 12.2.0.1 [Release 8.0.6 to 12.2]Information in this document applies to any platform.
** Reviewed for Relevancy Oct 2014 **
PURPOSE
The purpose of this article is to describe a certain type of physical block corruption consisting of an Oracle block containing all zeros.
DBVerify reports this block corruption with message:
RMAN reports it as:
Completely zero block found during validation
V$DATABASE_BLOCK_CORRRUPTION.DESCRIPTION has:
ALL ZERO
SQL statements reading the corrupt block produce error ORA-1578:
and alert log is updated with:
Corrupt block relative dba: <rdba> (file <file#>, block <block#>)
Completely zero block found during buffer read
Reading datafile 'datafile' for corruption at rdba: <rdba> (file <file#>, block <block#>)
Reread (file <file#>, block <block#>) found same corrupt data (no logical check)
DETAILS
Oracle detects, prevents and attempts to repair different types of block corruptions.
A corrupt block is a block that has been changed so that it differs from what Oracle Database expects to find. This note covers three data block corruption types:
- In a , which is also called a media corruption, the database does not recognize the block at all: the is invalid, the block contains all zeros, the header and footer of the block do not match or one of the key data block data structure is incorrect such as the data block address (DBA).
- In a , the contents of the block are physically sound and pass the physical block checks; however the block can be logically inconsistent. Examples of logical corruption include corruption of a row piece or index entry.
- Block corruptions caused by stray writes, lost writes or misdirected writes can also cause havoc to your database availability. The data block may be physically or logically correct but in this case the block’s content is older or stale or in the wrong location.
Block corruptions can also be divided into interblock corruption and intrablock corruption:
- In intrablock corruption, the corruption occurs in the block itself and can be either a physical or a logical corruption.
- In an interblock corruption, the corruption occurs between blocks and can only be a logical corruption.
Refer to "Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration" Note 1302539.1 and "Physical and Logical Block Corruptions. All you wanted to know about it"Note 840978.1 for more information.
This note focuses on physical block corruptions specifically a corrupted block containing all zeros.
Oracle by design does not write blocks of all zeroes.
This is done to easily identify problems in the underlying operating system, hardware or storage. This is enforced by Oracle checks that are enabled by default before writing any data or redo block. No special parameters are needed to enable these checks nor can anyone disable them. Every Oracle process that writes changes to the disk abides by these rules. Oracle has seen bugs in hardware, operating system, firmware and storage that result in zero out blocks. Oracle deliberately avoids writing complete zero blocks so it easily detects when external forces caused these data block corruptions. Every Oracle data block, controlfile, redo and tempblocks are stamped with minimally set of meta data. For data blocks, each block is formatted to include the block offset address (rdba), block format, a flag, tail and checksum within the datafile. When a new datafile is created, Oracle stamps every block with these fields. Redo logs are similarly pre-formatted or initialized before being used in the database.
Ensuring Oracle never writes zeroed out blocks is also checked at the lowest levels of the database, right before Oracle issues the write() call to the Operating System. If the Oracle RDBMS detects an all-zero block at the point of the write is issued to the operating system, an error ORA-600 is produced or the block is reported as corrupt in the alert log; the block then is not written to disk.
Storage vendors have implemented checks based on the Oracle design of not writing zeros.
This Oracle property (not writing zero blocks) is now checked by some storage vendors to avoid some of these corruptions caused by operating system or storage.
EMC Double Checksum checks for a non-zero DBA field in Oracle blocks. An all-zero block by definition fails checks. Hitachi Database Validator performs the same check. Oracle Exadata performs these, and additional, more comprehensive (HARD) checks.
If Oracle wrote all-zero blocks, it would never have allowed these checks to be implemented into third-party hardware.
Conclusion
If the database is corrupt with zero-out blocks, then there is definitely some issue with the OS or storage subsystem below Oracle. You may notify Oracle Support but work with your OS or storage vendor to identify the issue.
To repair the corruption use RMAN block media recovery as described in Note 144911.1 or if a backup is not available reference section "Database in NOARCHIVELOG mode or there is not a valid backup" in Note 1578.1
REFERENCES
NOTE:144911.1 - RMAN : Block-Level Media Recovery - Concept & ExampleNOTE:1302539.1 - Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration
NOTE:840978.1 - Physical and Logical Block Corruptions. All you wanted to know about it.
NOTE:1578.1 - OERR: ORA-1578 "ORACLE data block corrupted (file # %s, block # %s)" Master Note
Soft Corrupt Blocks - Definition and information (文件 ID 1496934.1)
In this Document
Purpose |
Scope |
Details |
What is a Soft Corrupt Block? |
When a block is marked as Soft Corrupt? |
Why is a block marked as Soft Corrupt? |
What is the Oracle behavior for a Soft Corrupt Block? |
References |
APPLIES TO:
Oracle Database - Enterprise Edition - Version 9.2.0.1 and laterInformation in this document applies to any platform.
PURPOSE
Oracle may mark a block as Soft Corrupt. This document explains the definition of Soft Corrupt Block and the situations and behaviour when a block is marked as Soft Corrupt.
DETAILS
What is a Soft Corrupt Block?
A Software Corrupt Block is:
1) Former LOGICAL corrupt block marked as formally corrupt or
2) A block that was attempted to be recovered in the buffer cache and could not be recovered. The next message is indication of marking a block as soft corrupt in memory because the automatic block recover could not recover the block:
A block that is PHYSICAL corrupt is also called Media Corrupt Block. A Media Corrupt block is not a Soft Corrupt block.
When a block is marked as Soft Corrupt?
- When db_block_checking is enabled, Oracle soft corrupts an already corrupt block when modifying it (the block before image is already corrupt meaning that the block may be already corrupt on DISK). This only applies for LOGICAL corruptions as with PHYSICAL corruptions the block is detected as corrupt during read and does not arrive to the checking code where a modification takes place.
- When automatic block recover could not recover the block after a process failed during block modification. "Block recovery logically corrupted file .. block .." message appears in the alert.log
- dbms_repair.FIX_CORRUPT_BLOCKS can also be used to mark a LOGICAL corrupt block as Soft Corrupt.
- Media recovery can also soft corrupt a block if the block is already PHYSICAL corrupt
Subsequent block reads then produce ORA-1578 instead of internal errors.
Why is a block marked as Soft Corrupt?
- A block is marked as soft corrupt to prevent other internal errors and to prevent further corruptions in the database. When a block is logically corrupt it may cause ORA-600 / ORA-7445 errors and may produce further corruptions for other blocks in the Oracle buffer cache memory when the corrupt block is modified. Marking the block as soft corrupt prevents DML SQL statements to modify the block. When a block is marked as soft corrupt, SQL statements fail with error ORA-1578 when the block is read.
- A block is also marked as Soft Corrupt so it can be skipped when using event 10231 or when using procedure dbms_repair.SKIP_CORRUPT_BLOCKS.
- When using TDE (Transparent Data Encryption - Tablespace encryption), a block can be marked as soft corrupt if an incorrect or invalid wallet is used. Note that for this case the block is only soft corrupt in Memory as long as the invalid wallet is used (dbverify does not produce an error as the block is not soft corrupt on disk).
What is the Oracle behavior for a Soft Corrupt Block?
- When a block is marked as soft corrupt, SQL statements fail with error ORA-1578 when reading the block unless dbms_repair.SKIP_CORRUPT_BLOCKS or event 10231/10233 are used.
- RMAN backups do not fail with Soft Corrupt blocks. RMAN ignores 'soft corrupt' blocks during backup without setting the MAXCORRUPT clause. Soft Corrupt blocks do not count in the MAXCORRUPT clause.
- Media Recovery (rollforward) ignores Soft Corrupt blocks. In general media RECOVERY ignores Physical corrupt blocks and Soft Corrupt corrupt blocks. When the block is logically corrupt Media RECOVERY may fail with unexpected errors. When the block is PHYSICAL corrupt, media recovery marks the block as soft corrupt and recovery continues with no errors (Corrupt block is skipped).
- RMAN validate does not report a soft corrupt block in the trace file but in v$database_block_corruption. Example:
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
1 59744 1 0 CORRUPT
- DBVerify reports the soft corrupt block with DBV-200 error:
REFERENCES
NOTE:840978.1 - Physical and Logical Block Corruptions. All you wanted to know about it.How to identify the corrupt Object reported by ORA-1578 / RMAN / DBVERIFY (文件 ID 819533.1)
In this Document
Goal |
Solution |
Identify the Absolute File Number (AFN) and Block Number (BL) |
Identify the Corrupt Object |
Fixing Corruptions |
References |
APPLIES TO:
Oracle Database - Enterprise Edition - Version 8.1.5.0 to 12.1.0.2 [Release 8.1.5 to 12.1]Information in this document applies to any platform.
***Checked for relevance on 21-Oct-2013***
***Checked for relevance on 16-JUN-2017***
GOAL
The purpose of this note is to provide the instructions to identify the corrupt Object reported by the error ORA-1578 or tools like RMAN / DBVERIFY
SOLUTION
Identify the Absolute File Number (AFN) and Block Number (BL)
The absolute file number (AFN) and relative file numbers (RFN) are often the same but can differ, especially if the database has been migrated from Oracle7 or if Transportable/Plugged Tablespaces are used or if Multitenant in 12c is configured or for Big File Tablespace (BFT) which always has a RFN of 1024.
The RFN is relative to the tablespace, it is unique within a tablespace and non-unique in the database, whereas the AFN is unique in the database.
It is important to get the correct numbers for the AFN and RFN or you may end up salvaging the wrong object.
Getting the AFN from ORA-1578
The AFN is provided by the error ORA-1110 produced right after the ORA-1578. In the next example the AFN is 5 and BL is 34.
select * from scott.dept_view
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 11, block # 34)
ORA-01110: data file 5: '/home/oracle/oradata/users.dbf'
Getting the AFN from DBVERIFY output
A corrupt block might be reported by dbverify in different ways. DBVERIFY normally provides the RDBA associated to the affected block. Then the RFN is used to get the AFN in the query from dba_data_files below. Here are some examples:
RFN=11 BL=34:
Corrupt block relative dba: 0x02c00022 (file 11, block 34)
Bad check value found during dbv:
Data in bad block:
type: 6 format: 2 rdba: 0x02c00022
last change scn: 0x0771.4eebe71c seq: 0x2 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xe71c0602
check value in block header: 0xd3ce
computed block checksum: 0x2
Dbverify always reports the relative data block address (rdba/dba) in its output. In the above case the Relative dba is the hexadecimal value 0x02c00022 taken from Message "Corrupt block relative dba: 0x02c00022 (file 11, block 34)". The rdba/dba provides the RFN. The RFN is then 11. Use the query from dba_data_files below to get the AFN.
Another example from dbverify is:
RFN=11 BL=35:
Dbv output:
To get the RFN and Block# use the next query:
dbms_utility.data_block_address_block(&&rdba) BL
from dual;
Example:
2 dbms_utility.data_block_address_block(&&rdba) BL
3 from dual;
Enter value for rdba: 46137379
RFN BL
---------- ----------
11 35
Get the AFN from the RFN using dba_data_files:
from dba_data_files
where relative_fno=&RFN;
Example:
2 from dba_data_files
3 where relative_fno=&RFN;
Enter value for rfn: 11
AFN RELATIVE_FNO TABLESPACE_NAME
---------- ------------ ------------------------------
5 11 USERS
The AFN is 5
Getting the AFN from RMAN
RMAN report corruptions in the v$database_block_corruption view.
The column FILE# in that view is the AFN. Column BLOCK# is BL.
Identify the Corrupt Object
Once that the AFN is identified, run the next query to identify the corrupt Object:
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
Example:
2 from dba_extents
3 where file_id = &AFN
4 and &BL between block_id AND block_id + blocks - 1;
Enter value for afn: 5
Enter value for bl: 34
OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE TABLESPACE_NAME EXTENT_ID FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO
----- ------------ -------------- ------------ --------------- --------- ------- -------- ---------- ------ ------------
SCOTT DEPT TABLE USERS 0 5 33 65536 8 11
If the above query does not return rows, it can also be that the corrupted block is a segment header in a Locally Managed Tablespace (LMT). When the corrupted block is a segment header block in a LMT, the above query produces a corruption message in the alert.log but the query does not not fail. In that case run this query:
from dba_segments
where header_file = &AFN
and header_block = &BL;
If the block belongs to a free EXTENT (not associated to an object) or if the block is in a TEMPFILE the above queries will return no data. For TEMPFILES the "Segment Type" will be "TEMPORARY".
If the block belongs to a free extent it should appear in DBA_FREE_SPACE:
from dba_free_space
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
Note that in Oracle 10g and above when an ORA-1578 is produced, the alert log is also updated with the information of the corrupt object. Example:
TSN = 5, TSNAME = USERS
RFN = 11, BLK = 34, RDBA = 46137378
OBJN = 46107, OBJD = 36440, OBJECT = DEPT, SUBOBJECT =
SEGMENT OWNER = SCOTT, SEGMENT TYPE = Table Segment
Fixing Corruptions
Once that the Corrupt Object has been identified solve the corruption by following the next articles:
If getting corruption error ORA-1578 follow the instructions in Doc ID 1578.1 to fix the corruption.
To fix other corruptions follow Doc ID 28814.1
REFERENCES
NOTE:472231.1 - How to identify all the Corrupted Objects in the Database with RMANNOTE:28814.1 - Handling Oracle Block Corruptions
NOTE:836658.1 - Identify the Corruption Extension for Block Corruption, Table/Index Inconsistency, Data Dictionary and Lost Writes
NOTE:1578.1 - OERR: ORA-1578 "ORACLE data block corrupted (file # %s, block # %s)" Master Note
About Me
.............................................................................................................................................
● 本文作者:小麥苗,部分內容整理自網路,若有侵權請聯絡小麥苗刪除
● 本文在itpub(http://blog.itpub.net/26736162/abstract/1/)、部落格園(http://www.cnblogs.com/lhrbest)和個人微信公眾號(xiaomaimiaolhr)上有同步更新
● 本文itpub地址:http://blog.itpub.net/26736162/abstract/1/
● 本文部落格園地址:http://www.cnblogs.com/lhrbest
● 本文pdf版、個人簡介及小麥苗雲盤地址:http://blog.itpub.net/26736162/viewspace-1624453/
● 資料庫筆試面試題庫及解答:http://blog.itpub.net/26736162/viewspace-2134706/
● DBA寶典今日頭條號地址:
.............................................................................................................................................
● QQ群號:230161599(滿)、618766405
● 微信群:可加我微信,我拉大家進群,非誠勿擾
● 聯絡我請加QQ好友(646634621),註明新增緣由
● 於 2018-04-01 06:00 ~ 2018-04-31 24:00 在魔都完成
● 最新修改時間:2018-04-01 06:00 ~ 2018-04-31 24:00
● 文章內容來源於小麥苗的學習筆記,部分整理自網路,若有侵權或不當之處還請諒解
● 版權所有,歡迎分享本文,轉載請保留出處
.............................................................................................................................................
● 小麥苗的微店:
● 小麥苗出版的資料庫類叢書:http://blog.itpub.net/26736162/viewspace-2142121/
● 小麥苗OCP、OCM、高可用網路班:http://blog.itpub.net/26736162/viewspace-2148098/
.............................................................................................................................................
使用微信客戶端掃描下面的二維碼來關注小麥苗的微信公眾號(xiaomaimiaolhr)及QQ群(DBA寶典),學習最實用的資料庫技術。
小麥苗的微信公眾號 小麥苗的DBA寶典QQ群2 《DBA筆試面試寶典》讀者群 小麥苗的微店
.............................................................................................................................................
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/26736162/viewspace-2139709/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 【dbv】使用dbv工具檢驗資料檔案是否有壞塊
- 轉載:Oracle資料塊損壞恢復總結Oracle
- 怎樣檢查資料庫壞塊(DBV)資料庫
- Oracle資料庫壞塊修復Oracle資料庫
- 【資料庫資料恢復】Oracle資料庫檔案出現壞塊報錯的資料恢復案例資料庫資料恢復Oracle
- ORACLE中修復資料塊損壞Oracle
- 恢復資料,資料塊恢復
- oracle實驗記錄 (恢復,備份-含壞塊資料檔案)Oracle
- Oracle塊損壞恢復(有rman備份)Oracle
- 資料檔案壞塊的製造和恢復
- 使用dbv和RMAN檢查資料檔案中的壞塊
- oracle asm 資料塊重構恢復OracleASM
- 基於RMAN實現壞塊介質恢復(blockrecover)BloC
- Oracle如何進行塊介質的恢復?(有邏輯壞塊是如何處理)Oracle
- Oracle資料庫壞塊(corruption)-物理壞塊Oracle資料庫
- Oracle資料庫UNDO損壞後的恢復Oracle資料庫
- Oracle中匯出修復資料塊損壞Oracle
- Oracle中模擬修復資料塊損壞Oracle
- PostgreSQL 恢復大法 - 恢復部分資料庫、跳過壞塊、修復無法啟動的資料庫SQL資料庫
- 恢復SDS中壞掉的一塊硬碟的資料(轉)硬碟
- 【北亞資料恢復】硬碟壞道故障如何恢復資料?資料恢復硬碟
- rman 恢復資料塊
- rman恢復資料塊
- 修復損壞的資料塊
- 資料底層損壞的恢復方法—拼碎片恢復資料
- SQLite資料庫損壞及其修復探究SQLite資料庫
- 使用blockrecover 對有壞塊的資料檔案進行恢復BloC
- 【伺服器資料恢復】IBM儲存伺服器硬碟壞道離線、oracle資料庫損壞的資料恢復伺服器資料恢復IBM硬碟Oracle資料庫
- Oracle DBV 工具 介紹Oracle
- Oracle資料檔案損壞恢復例項二則Oracle
- 資料庫檔案壞塊損壞導致開啟時報錯的恢復方法資料庫
- [ORACLE] 系統故障資料庫恢復--資料檔案無損壞Oracle資料庫
- oracle備份和恢復策略簡介Oracle
- 資料塊恢復例項
- 資料庫資料恢復—NTFS分割槽損壞如何恢復SqlServer資料庫資料資料庫資料恢復SQLServer
- 【資料庫資料恢復】MongoDB資料庫檔案損壞的資料恢復案例資料庫資料恢復MongoDB
- 修復資料庫壞塊之五資料庫
- 修復資料庫壞塊之四資料庫