模擬data block lost write的簡單方法

oliseh發表於2015-08-17


data block lost write是指資料庫下發一個磁碟寫操作後收到I/O系統關於寫入已經完成的通知,但實際卻由於磁碟或者磁碟控制器異常等原因資料塊最終沒有記入到磁碟,導致對於資料塊的修改丟失的情況,data block lost write雖不會影響資料庫的高可用性,但若不及時發現會引起資料庫的邏輯錯誤進一步的擴散,如果test1表裡有一行記錄

ID
--
1

test2表裡有一行記錄
ID  salary
--  ------
2   5000

session 1裡執行第一條update語句後發生了寫丟失
update test1 set id=2 where id=1;
commit;
<發生了data block lost write>

由於修改的結果沒有永久儲存到磁碟,磁碟上test1表裡的記錄還是
ID
--
1

session 1緊接著又執行了第二條update,這次update的是test2表,關聯了test1表
update test2 set salary=salary+1000 where id in (select id from test1)

由於lost write的發生,最終update了0行記錄

session 1裡的最後一條記錄還是update,這次update的是test3表,關聯了test2表

update test3 set job='manager' where id in (select id from test2 where salary > 5000);

最終還是update了0行記錄

可以看出data block lost write會引起一連串的邏輯錯誤,其後果是非常嚴重的

oracle提供了db_lost_write_protect引數用來檢測是否發生了lost write,一般我們在Dataguard環境裡會設定主庫以及備庫的db_lost_write_protect=typical,這樣主庫若發生data block lost write備庫就能檢測出來。當然如果沒有DG環境使用下面的方法也能夠模擬出data block lost write,我們利用RMAN 備份和恢復的過程來模擬data block lost write


###設定db_lost_write_protect=typical
alter system set db_lost_write_protect=typical scope=both;

SYS@tstdb1-SQL> show parameter db_lost_write_protect

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_lost_write_protect                string      NONE

###建立測試用表空間和表
drop tablespace ts0815_1 including contents and datafiles;

create tablespace ts0815_1 datafile '/oradata06/testaaaaa/ts0815_1.dbf' size 128m;

create table t0815_1 (c1 varchar2(2),c2 varchar2(2)) tablespace ts0815_1 ;

insert into t0815_1 values('a','aa');
commit;

select * from t0815_1;
C1 C2
-- --
a  aa

alter system flush buffer_cache;
alter system flush buffer_cache;

select dbms_rowid.rowid_relative_fno(rowid) rfno,dbms_rowid.rowid_block_number(rowid) blkno from t0815_1;
      RFNO      BLKNO
---------- ----------
         8        131

###儲存block 8/131的快照
dd if=/oradata06/testaaaaa/ts0815_1.dbf of=/oradata06/testaaaaa/blk131.dmp bs=8192 count=1 skip=131
1+0 records in.
1+0 records out.

ls -l /oradata06/testaaaaa/blk131.dmp
-rw-r--r--    1 oracle   oinstall       8192 Aug 15 13:41 /oradata06/testaaaaa/blk131.dmp


###修改記錄

update t0815_1 set c1='b',c2='bb';
commit;

alter system flush buffer_cache;
alter system flush buffer_cache;

select * from t0815_1;
C1 C2
-- --
b  bb

rman target /


###備份datafile

RMAN> backup datafile 8;

Starting backup at 20150815 13:42:56
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=663 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00008 name=/oradata06/testaaaaa/ts0815_1.dbf
channel ORA_DISK_1: starting piece 1 at 20150815 13:42:57
channel ORA_DISK_1: finished piece 1 at 20150815 13:42:58
piece handle=/oradata06/fra/TSTDB1/backupset/2015_08_15/o1_mf_nnndf_TAG20150815T134257_1lU8o7H7G_.bkp tag=TAG20150815T134257 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 20150815 13:42:58

alter system switch logfile;

alter system switch logfile;

alter system flush buffer_cache;


###將old block image覆蓋現有的block,模擬data block lost write

dd if=/oradata06/testaaaaa/blk131.dmp of=/oradata06/testaaaaa/ts0815_1.dbf conv=notrunc bs=8192 count=1 seek=131
1+0 records in.
1+0 records out

###變回了修改前的記錄
alter system flush buffer_cache;


select * from t0815_1;

C1 C2
-- --
a  aa

alter system switch logfile;

###redo size for lost write detection、redo entries for lost write detection的統計資訊有增長,因為read block from disk的操作會寫redo
SYS@tstdb1-SQL> select * from v$mystat m,v$statname s where s.statistic#=m.statistic# and s.name like '%lost write%';

       SID STATISTIC#      VALUE STATISTIC# NAME                                                    CLASS    STAT_ID
---------- ---------- ---------- ---------- -------------------------------------------------- ---------- ----------
       598         87          0         87 recovery blocks read for lost write detection               8 2677324291
       598         88          0         88 recovery blocks skipped lost write checks                   8  680578166
       598        179          8        179 redo entries for lost write detection                       2 2194106845
       598        180       1648        180 redo size for lost write detection                          2 3835726666


###資料檔案丟失      

rm /oradata06/testaaaaa/ts0815_1.dbf

startup force
ORACLE instance started.

Total System Global Area 6413680640 bytes
Fixed Size                  2233480 bytes
Variable Size            3623881592 bytes
Database Buffers         2768240640 bytes
Redo Buffers               19324928 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 8 - see DBWR trace file
ORA-01110: data file 8: '/oradata06/testaaaaa/ts0815_1.dbf'

###recover datafile時出現了ORA-00752
rman target /          

RMAN> restore datafile 8;

Starting restore at 20150815 13:48:46
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=463 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00008 to /oradata06/testaaaaa/ts0815_1.dbf
channel ORA_DISK_1: reading from backup piece /oradata06/fra/TSTDB1/backupset/2015_08_15/o1_mf_nnndf_TAG20150815T134257_1lU8o7H7G_.bkp
channel ORA_DISK_1: piece handle=/oradata06/fra/TSTDB1/backupset/2015_08_15/o1_mf_nnndf_TAG20150815T134257_1lU8o7H7G_.bkp tag=TAG20150815T134257
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 20150815 13:48:49

RMAN> recover datafile 8;

Starting recover at 20150815 13:49:00
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 201 is already on disk as file /oradata06/arch/1_201_884446916.dbf
archived log for thread 1 with sequence 202 is already on disk as file /oradata06/arch/1_202_884446916.dbf
archived log for thread 1 with sequence 203 is already on disk as file /oradata06/arch/1_203_884446916.dbf
archived log for thread 1 with sequence 204 is already on disk as file /oradata06/arch/1_204_884446916.dbf
archived log file name=/oradata06/arch/1_201_884446916.dbf thread=1 sequence=201
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 08/15/2015 13:49:02
ORA-00283: recovery session canceled due to errors
RMAN-11003: failure during parse/execution of SQL statement: alter database recover logfile '/oradata06/arch/1_201_884446916.dbf'
ORA-00283: recovery session canceled due to errors
ORA-00752: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 8, block# 131, file offset is 1073152 bytes)
ORA-10564: tablespace TS0815_1
ORA-01110: data file 8: '/oradata06/testaaaaa/ts0815_1.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 7856057

附:recover過程中的alert.log日誌
Full restore complete of datafile 8 /oradata06/testaaaaa/ts0815_1.dbf.  Elapsed time: 0:00:00
  checkpoint is 12793185443139
  last deallocation scn is 12723364365344
Sat Aug 15 13:49:00 2015
alter database recover datafile list clear
Completed: alter database recover datafile list clear
alter database recover if needed
 datafile 8
Media Recovery Start
Serial Media Recovery started
ORA-279 signalled during: alter database recover if needed
 datafile 8
...
alter database recover logfile '/oradata06/arch/1_201_884446916.dbf'
Media Recovery Log /oradata06/arch/1_201_884446916.dbf
Recovery of Online Redo Log: Thread 1 Group 1 Seq 202 Reading mem 0
  Mem# 0: /oradata06/testaaaaa/redo01a.log
  Mem# 1: /oradata06/testaaaaa/redo01b.log
Recovery of Online Redo Log: Thread 1 Group 2 Seq 203 Reading mem 0
  Mem# 0: /oradata06/testaaaaa/redo02a.log
  Mem# 1: /oradata06/testaaaaa/redo02b.log
Recovery of Online Redo Log: Thread 1 Group 4 Seq 204 Reading mem 0
  Mem# 0: /oradata06/testaaaaa/redo04a.log
  Mem# 1: /oradata06/testaaaaa/redo04b.log
REDO APPLICATION DURING RECOVERY HAS DETECTED THAT THE DATABASE
LOST A DISK WRITE OF BLOCK 131, FILE 8                               <---定位到了發生lost write的資料塊
DURING NORMAL DATABASE OPERATION.
NO REDO AT OR AFTER SCN 12793185443253 CAN BE USED FOR RECOVERY.
Recovery of Online Redo Log: Thread 1 Group 4 Seq 204 Reading mem 0
  Mem# 0: /oradata06/testaaaaa/redo04a.log
  Mem# 1: /oradata06/testaaaaa/redo04b.log
Recovery of Online Redo Log: Thread 1 Group 2 Seq 203 Reading mem 0
  Mem# 0: /oradata06/testaaaaa/redo02a.log
  Mem# 1: /oradata06/testaaaaa/redo02b.log
Media Recovery failed with error 752
ORA-283 signalled during: alter database recover logfile '/oradata06/arch/1_201_884446916.dbf'...

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/53956/viewspace-1773778/,如需轉載,請註明出處,否則將追究法律責任。

相關文章