1128PAGETABLE SEGMENT HEADER損壞恢復

lfree發表於2016-11-28

[20161128]PAGETABLE SEGMENT HEADER損壞恢復2.txt

--昨天在做段頭損壞測試時,發現一個奇特現象,我破壞了段頭,但是我select依舊能正常訪問,檢查發現我在資料快取相應的塊是OK的,
--雖然我破壞了資料檔案中相應表的段頭,但是資料快取的資訊是好的,所以能正常訪問,而我執行alter system checkpoint並沒有將
--快取資訊寫盤(也許前面已經發出過checkpoint命令),這樣才出現問題。

--所以這樣提出一種恢復思路,如果我這個時候,我能備份段頭塊的資訊(記憶體的資訊)到某個檔案,一樣可以很好的恢復段頭,這些步驟
--的關鍵是如何備份記憶體中的段頭資訊,透過測試來說明問題。

--另外我寫一篇[20161128]關於Little Enddian.txt,裡面提到記憶體儲存的資訊正好4個自己倒一下。

1.環境:
SCOTT@book> @  &r/ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

CREATE TABLESPACE SUGAR DATAFILE
  '/mnt/ramdisk/book/sugar01.dbf' SIZE 40M AUTOEXTEND ON NEXT 16M MAXSIZE UNLIMITED
LOGGING
ONLINE
EXTENT MANAGEMENT LOCAL AUTOALLOCATE
BLOCKSIZE 8K
SEGMENT SPACE MANAGEMENT AUTO
FLASHBACK ON;

create table t4 tablespace sugar   as select rownum id ,lpad('A',32,'A') name from dual connect by level<=1e5;
alter system checkpoint ;

SCOTT@book> select rowid,t4.id from t4 where id=1;
ROWID                      ID
------------------ ----------
AAAVwoAAGAAAACDAAA          1
--//依舊正常訪問。

SCOTT@book> @ &r/rowid AAAVwoAAGAAAACDAAA
    OBJECT       FILE      BLOCK        ROW ROWID_DBA            DBA                  TEXT
---------- ---------- ---------- ---------- -------------------- -------------------- ----------------------------------------
     89128          6        131          0  0x1800083           6,131                alter system dump datafile 6 block 131 ;

SYS@book> @ &r/bh 6 130
HLADDR              DBARFIL     DBABLK      CLASS CLASS_TYPE         STATE             TCH CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL CR_UBA_BLK CR_UBA_SEQ BA               OBJECT_NAME
---------------- ---------- ---------- ---------- ------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- -----------
0000000084A184E8          6        130          4 segment header     xcur                2          0          0          0          0          0 0000000073190000 T4

--我沒有建立索引,我執行select rowid,t4.id from t4 where id=1;,全表掃描,這樣一定要訪問段頭。這樣段頭儲存在資料快取中。

2.開始破壞段頭。
--首先做一個備份:
RMAN> backup as copy datafile 6 format '/u01/backup/sugar.dbf_20161128';
Starting backup at 2016-11-28 10:32:37
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=46 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00006 name=/mnt/ramdisk/book/sugar01.dbf
output file name=/u01/backup/sugar.dbf_20161128 tag=TAG20161128T103238 RECID=5 STAMP=929097158
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
Finished backup at 2016-11-28 10:32:39

$ dd if=/dev/zero of=/mnt/ramdisk/book/sugar01.dbf bs=8192 seek=130 count=1 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 4.3488e-05 seconds, 188 MB/s
--//執行時一定小心,方法,引數,使用conv=notrunc(我經常忘記加這個引數),不然會截斷。

SCOTT@book> select count(*) from t4 ;
  COUNT(*)
----------
    100000

--可以發現這個時候快取的塊是ok的,所以能夠訪問,而資料塊的檔案中的資訊是壞的,而這個時候如何知道呢(先放一邊,也許一些磁
--盤故障在寫盤時會報錯)

$ dbv file=/mnt/ramdisk/book/sugar01.dbf
DBVERIFY: Release 11.2.0.4.0 - Production on Mon Nov 28 10:34:07 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = /mnt/ramdisk/book/sugar01.dbf
Page 130 is marked corrupt
Corrupt block relative dba: 0x01800082 (file 6, block 130)
Completely zero block found during dbv:

3.如何取出這個塊的在快取的資訊呢?
--//參考 http://blog.itpub.net/267265/viewspace-1659981/
alter session set events 'immediate trace name set_tsn_p1 level m';
ALTER SESSION SET EVENTS 'immediate trace name buffer level rdba';
--//注 m要ts#+1.

SCOTT@book> select ts# from sys.ts$ where name='SUGAR';
       TS#
----------
         7
SCOTT@book> @ &r/convrdba 6 130
RDBA16               RDBA
-------------- ----------
       1800082   25165954

alter session set events 'immediate trace name set_tsn_p1 level 8';
ALTER SESSION SET EVENTS 'immediate trace name buffer level 0x1800082';

--轉儲的內容。
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0000000073190000 to 0x0000000073192000
073190000 0000A223 01800082 006E2A4D 04010000  [#.......M*n.....]
073190010 0000F64C 00000000 00000000 00000000  [L...............]
073190020 00000000 00000014 00000280 00000A9C  [................]
073190030 00000013 00000064 00000080 018002E4  [....d...........]
073190040 00000000 00000013 00000000 00000264  [............d...]
073190050 00000000 00000000 00000000 00000013  [................]
073190060 00000064 00000080 018002E4 00000000  [d...............]
073190070 00000013 00000000 00000264 01800281  [........d.......]
073190080 01800281 00000000 00000000 00000000  [................]
073190090 00000000 00000000 00000000 00000000  [................]
        Repeat 3 times
0731900D0 00000001 00002000 00000000 00001434  [..... ......4...]
0731900E0 00000000 01800081 00000001 01800281  [................]
0731900F0 01800081 00000000 00000000 00000000  [................]
073190100 00000000 00000000 00000014 00000000  [................]
073190110 00015C28 10000000 01800080 00000008  [(\..............]
073190120 01800088 00000008 01800090 00000008  [................]
073190130 01800098 00000008 018000A0 00000008  [................]
073190140 018000A8 00000008 018000B0 00000008  [................]
073190150 018000B8 00000008 018000C0 00000008  [................]
073190160 018000C8 00000008 018000D0 00000008  [................]
073190170 018000D8 00000008 018000E0 00000008  [................]
073190180 018000E8 00000008 018000F0 00000008  [................]
073190190 018000F8 00000008 01800100 00000080  [................]
0731901A0 01800180 00000080 01800200 00000080  [................]
0731901B0 01800280 00000080 00000000 00000000  [................]
0731901C0 00000000 00000000 00000000 00000000  [................]
        Repeat 142 times
073190AB0 01800080 01800083 01800080 01800088  [................]
073190AC0 01800090 01800091 01800090 01800098  [................]
073190AD0 018000A0 018000A1 018000A0 018000A8  [................]
073190AE0 018000B0 018000B1 018000B0 018000B8  [................]
073190AF0 018000C0 018000C1 018000C0 018000C8  [................]
073190B00 018000D0 018000D1 018000D0 018000D8  [................]
073190B10 018000E0 018000E1 018000E0 018000E8  [................]
073190B20 018000F0 018000F1 018000F0 018000F8  [................]
073190B30 01800100 01800102 01800180 01800182  [................]
073190B40 01800200 01800202 01800280 01800282  [................]
073190B50 00000000 00000000 00000000 00000000  [................]
        Repeat 142 times
073191440 00000000 00000000 01800081 00000000  [................]
073191450 00000000 00000000 00000000 00000000  [................]
        Repeat 185 times
073191FF0 00000000 00000000 00000000 2A4D2301  [.............#M*]

--還有什麼好方法。也許程式設計直接轉儲這個區域的資訊,不過目前超過我的能力,先放棄。

--如果使用如下方式:
SCOTT@book> alter system dump datafile 6 block 130;
System altered.

Block dump from disk:
buffer tsn: 7 rdba: 0x00000000 (0/0)
scn: 0x0000.00000000 seq: 0x00 flg: 0x00 tail: 0x00000000
frmt: 0x00 chkval: 0x0000 type: 0x00=unknown
Hex dump of corrupt header 4 = CORRUPT
Dump of memory from 0x00007FD3316A8200 to 0x00007FD3316A8214
7FD3316A8200 00000000 00000000 00000000 00000000  [................]
7FD3316A8210 00000000                             [....]
Hex dump of block: st=4, typ_found=0
Dump of memory from 0x00007FD3316A8200 to 0x00007FD3316AA200
7FD3316A8200 00000000 00000000 00000000 00000000  [................]
  Repeat 511 times
End dump data blocks tsn: 7 file#: 6 minblk 130 maxblk 130

--全是0。

4.轉儲為hex模式看相關資料

$ oerr ora 10289
10289, 00000, "Do block dumps to trace file in hex rather than fromatted"
// *Cause:
// *Action: If set, don't do formatted block dumps.  This is a work-around
//          for block dump routines that accvio when given badly corrupted
//          blocks. It may also be useful for cases where the type is wrong.

--看看這種模式。

alter session set events '10289 trace name context forever, level 12';
alter session set events 'immediate trace name set_tsn_p1 level 8';
ALTER SESSION SET EVENTS 'immediate trace name buffer level 0x1800082';

--//放棄一樣的,僅僅缺少後面轉儲的相關詳細說明。

5.繼續測試。
SCOTT@book> alter system checkpoint ;
System altered.

SCOTT@book> alter tablespace sugar offline ;
Tablespace altered.

SCOTT@book> alter tablespace sugar online ;
Tablespace altered.

SCOTT@book> select count(*) from t4 ;
select count(*) from t4
                     *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 130)
ORA-01110: data file 6: '/mnt/ramdisk/book/sugar01.dbf'

--//在建立表後發過一次alter system checkpoint,這樣第2次不會寫盤,這樣online後,讀取的都是0,再訪問時報錯。

SCOTT@book> alter tablespace sugar offline ;
Tablespace altered.

6.現在探究如下將前面dump記憶體的資訊轉化成資料塊的資訊。

--自己程式設計應該可以,不過有點超出我的能力,測試別的方法。
--//首先取出轉儲資訊。
$ cat  130a.bin
073190000 0000A223 01800082 006E2A4D 04010000  [#.......M*n.....]
073190010 0000F64C 00000000 00000000 00000000  [L...............]
073190020 00000000 00000014 00000280 00000A9C  [................]
073190030 00000013 00000064 00000080 018002E4  [....d...........]
073190040 00000000 00000013 00000000 00000264  [............d...]
073190050 00000000 00000000 00000000 00000013  [................]
073190060 00000064 00000080 018002E4 00000000  [d...............]
073190070 00000013 00000000 00000264 01800281  [........d.......]
073190080 01800281 00000000 00000000 00000000  [................]
073190090 00000000 00000000 00000000 00000000  [................]
        Repeat 3 times

--前面073190000 以及  [#.......M*n.....]刪除。

$ cut -c11-45 130a.bin > 130b.bin

--編輯130b.bin,Repeat 3 times這行刪除,實際上會vi很簡單,先在前面一行按Y,然後按3P複製一樣的3行。後面的操作類似(比如
--Repeat 142 times),不再講解。格式如下:
$ head 130b.bin
0000A223 01800082 006E2A4D 04010000
0000F64C 00000000 00000000 00000000
00000000 00000014 00000280 00000A9C
00000013 00000064 00000080 018002E4
00000000 00000013 00000000 00000264
00000000 00000000 00000000 00000013
00000064 00000080 018002E4 00000000
00000013 00000000 00000264 01800281
01800281 00000000 00000000 00000000
00000000 00000000 00000000 00000000

$ cut -c1-4,5-8,10-13,14-17,19-22,23-26,28-31,32-35 --output-delimiter=' ' 130b.bin >| 130c.bin
$ head -5 130c.bin
0000 A223 0180 0082 006E 2A4D 0401 0000
0000 F64C 0000 0000 0000 0000 0000 0000
0000 0000 0000 0014 0000 0280 0000 0A9C
0000 0013 0000 0064 0000 0080 0180 02E4
0000 0000 0000 0013 0000 0000 0000 0264

--//製作xxd 的開頭。
$ dd if=/dev/zero  bs=8192 count=1 | xxd -c 16 | cut -f1 -d" " > 130d.bin
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 3.8499e-05 seconds, 213 MB/s

$ head 130d.bin
0000000:
0000010:
0000020:
0000030:
...

--//拼接在一起。
$ paste 130d.bin 130c.bin -d" " | xxd -r >| 130e.raw
$ ls -l 130e.raw
-rw-r--r-- 1 oracle oinstall 8192 2016-11-28 10:53:00 130e.raw

--大小正好8k,但是這樣是不能使用的,因為位元組順序是反了。

$ od -t x4 -v 130e.raw | cut -c9- | paste 130d.bin - -d" " | xxd -r > 130f.raw

$ xxd -c 16 130f.raw | head -5
0000000: 23a2 0000 8200 8001 4d2a 6e00 0000 0104  #.......M*n.....
0000010: 4cf6 0000 0000 0000 0000 0000 0000 0000  L?.............
0000020: 0000 0000 1400 0000 8002 0000 9c0a 0000  ................
0000030: 1300 0000 6400 0000 8000 0000 e402 8001  ....d.......?..
0000040: 0000 0000 1300 0000 0000 0000 6402 0000  ............d...

--OK,現在已經轉化完成。對比備份的md5結果看看.
$ md5sum 130f.raw
50e7809b1585323d175b3fac3b064568  130f.raw

$ dd if=/u01/backup/sugar.dbf_20161128 bs=8192 count=1 skip=130 2>/dev/null |   md5sum
50e7809b1585323d175b3fac3b064568  -

--兩者一致。

7.現在將轉儲的資訊塊寫入檔案:

$ dd if=130f.raw of=/mnt/ramdisk/book/sugar01.dbf seek=130 bs=8192 count=1 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 4.3381e-05 seconds, 189 MB/s
--//使用dd一定小心,方向,輸入,輸出,引數conv=notrunc ,seek 對應輸出的,skip對應輸入的情況。

SCOTT@book> alter tablespace sugar online ;
Tablespace altered.

SCOTT@book> select count(*) from t4 ;
  COUNT(*)
----------
    100000

--OK,現在可以訪問了。

總結:
1.轉化太複雜,也許shell程式設計沒學好。
2.僅僅是驗證我的理解是否正確,沒有什麼意思^_^。
3.也許還有更好的shell程式設計方式。我測試過 cut -c7-8,5-6,3-4,1-2 --output-delimiter=' ' 130b.bin 並不能達到我的目的。
4.不知道linux是否存在什麼命令將2進位制轉化成原來的格式。我僅僅知道xxd。那位知道,望告知..............
5.另外我也嘗試使用oradebug poke 的方式寫入原來記憶體區域,發現不可取,太危險!! 放棄這種方式。

--補充更好的轉換指令碼:

$ xxd -r -p  130b.bin | od -t x4 -v | cut -c9- | xxd -r -p  | md5sum
50e7809b1585323d175b3fac3b064568  -

$ xxd -r -p  130c.bin | od -t x4 -v | cut -c9- | xxd -r -p | md5sum
50e7809b1585323d175b3fac3b064568  -

$ md5sum 130f.raw
50e7809b1585323d175b3fac3b064568  130f.raw

--最簡潔的寫法:
$ xxd -r -p  130b.bin | od -t x4 -A x | xxd -r | md5sum
50e7809b1585323d175b3fac3b064568  -

--關於引數具體看文件,簡要說明:
xxd -r -p :
xxd -r never generates parse errors. Garbage is silently skipped.

When editing hexdumps, please note that xxd -r skips everything on the input line after reading enough columns of
hexadecimal data (see option -c). This also means, that changes to the printable ascii  (or  ebcdic)  columns  are
always ignored. Reverting a plain (or postscript) style hexdump with xxd -r -p does not depend on the correct number of
columns. Here anything that looks like a pair of hex-digits is interpreted.

od -t x4 -A x:
-t x4 就是相當於4位元組反轉
-A x  就是使用16進位制,而不是預設8進位制。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2129195/,如需轉載,請註明出處,否則將追究法律責任。

相關文章