oracle資料檔案內部BLOCK結構詳解

likaiabc發表於2008-07-06
轉載別人的文章,作者不詳[@more@]
datafile block block size :8192


Offset 0 1 2 3 4 5 6 7 8 9 a b c d e f

00014000 06 A2 00 00 0A 00 40 01 0E 89 43 00 00 00 05 02
type frmt spare1/2_kcbh rdba scn seq flg
1 : 20 bytes
type: 0x06=trans data defined in kcb.h
frmt: 8i~9i 都是0x02 10.1.0 2k: 0x62 4k:0x82 8k:0xa2 16k:0xc2 (logfile 0x22 512 bytes)
spare1/2_kcbh: ub1 spare1_kcbh this field is no longer used (old inc#, now always 0)
ub1 spare2_kcbh this field is no longer used (old ts#, now always 0)
rdba: 0x0140000a 轉換成2進位制後它的前10 bit 表示file id 後22 bit 表示的block id
可以看出一個tablespace 可以有1023 個datafile ,每個datafile可以有4M 的block
10G 出現的 big datafile 這裡表示的就是block id了 沒有file id
9.2.0試驗過一個tablespace可以有1023個datafile 一個object可以存放在1023個datafile中
scn: scn: 0x0000.0043890e
seq: A sequence number incremented for each change to a block at the same SCN
A new SCN is allocated if the sequence number wraps.
同一個SCN影響這個block中的行數大於 254 行就會為這個事務分配一個新的SCN
如下面的操作就可能引起同一個SCN但影響的同一個block 中的行超過254行
"delete from table_name"
影響的行數(最大254) 是用從 0x01 到 0xfe 表示的
當這個byte 的資料為 0xff 的時候標誌這個 block 壞調了---&gt ora-01578
Sequence number:
SEQ -> 0 /* non-logged changes - do not advance seq# */
SEQ -> (UB1MAXVAL-1)/* maximum possible sequence number */
SEQ -> (UB1MAXVAL) /* seq# to indicate a block is corrupt,equal to FF. soft corrupt*/
0xff : When present it indicates that the block has been marked as corrupt by Oracle. either by the db_block_checking functionality or the equivalent events (10210 for data blocks, 10211 for index blocks, and 10212 for cluster blocks) when making a database change, or by the DBMS_REPAIR.FIX_CORRUPT_BLOCKS procedure, or by PMON after an unsuccessful online block recovery attempt while recovering a failed process, or by RMAN during a BACKUP, COPY or VALIDATE command with the CHECK LOGICAL option. Logical corruptions are normally due to either recovery through a NOLOGGING operation, or an Oracle software bug.
flg: as defined in kcbh.h
#define KCBHFNEW 0x01 /* new block - zeroed data area */
#define KCBHFDLC 0x02 /* Delayed Logging Change advance SCN/seq */
#define KCBHFCKV 0x04 /* ChecK Value saved-block xor's to zero */
#define KCBHFTMP 0x08 /* Temporary block */
這是一個可以組合的值 也就是說有為 6 的時候是 2,4 兩種情況的組合
Block structure as defined in kcbh.h:
struct kcbh
{ub1 type_kcbh; /* Block type* /
ub1 frmt_kcbh; /* #define KCBH_FRMT8 2 */
ub1 spare1_kcbh;
ub1 spare2_kcbh;
krdba rdba_kcbh; /* relative DBA /
ub4 bas_kcbh; /* base of SCN */
ub2 wrp_kcbh; /* wrap of SCN */
ub1 seq_kcbh; /* sequence # of changes at same scn */
ub1 flg_kcbh;
ub2 chkval_kcbh;
};
00014010 00 00 00 00 01 00 17 00 54 D2 00 00 0A 89 43 00
chkval spare3_kcbh typ ? seg/obj csc
spare3_kcbh : ub2 spare3_kcbh

2 : 24 bytes (總計44bytes)
typ : 1 - DATA 2 index
改成3了在10.1.0 上引起了ora-600[2032]然後ORA-27101: shared memory realm does not exist
oracle進行查詢的時候是根據 obj$表中的情況來判斷物件的型別的,不是根據這個typ
也就是說如果有一個表但改變表中block的這個標誌位,一樣可以查詢出資料來,
但dump block 時會出錯,ORA-00600: 內部錯誤程式碼,自變數: [4555], [0], [], [], [], [], [], []
錯誤中的 [0] 就是typ對應的資料
在10G中改變它後update這個block的資料commit可以但rollback的報錯
? 見過有其他值 但用編輯器改這個值 在 dump 檔案中顯示不出來變化
seg/obj: 0xd254
csc : 0x00.43890a The SCN at which the last full cleanout was performed on the block
00014020 00 00 E8 1F 02 00 03 00 00 00 00 00 04 00 0C 00
csc ? itc ? flg fsl fnx xid
3 : 24 bytes * itl (2個itl總計92bytes)
? 見過有其他值 但用編輯器改這個值 在 dump 檔案中顯示不出來變化
itc ITL 條目的個數 max 255超過會報ORA-02207
ORA-00060 ORA-00054 可能是沒空間分配itl條目了或它的爭用引起的
在8i中 INITRANS default為1 , 9.2.0中 INITRANS default為2
flg indicates that the block is on a freelist. Otherwise the flag is -
9i 的ASSM 的情況下這個值為 E
ixora 上說他佔用 2 bytes 但我下面的試驗和他的結果有一定的出入
我觀察到的情況是 : Object id on Block? Y flg: O ver: 0x01
上面的3項是用同一個 byte 來表示的

flg: O ver: 0x01 Object id on Block? Y
從我的觀察中 dump 出來的檔案中 flg ver Object id on Block
他們共同佔用的這個一個位元組 他的規律可以從下面的情況看出
2進位制資料 flg ver Object id on Block?
0x00 - 0x00 N
0x01 0 0x00 N
0x02 - 0x01 Y
0x03 0 0x01 Y
0x04 - 0x02 Y
0x05 0 0x02 Y
0x06 - 0x03 Y
0x07 0 0x03 Y
0x08 - 0x04 N
0x09 0 0x04 N
0x0a - 0x05 Y
0x0b 0 0x05 Y
0x0c - 0x06 Y
0x0d 0 0x06 Y
0x0e - 0x07 Y
0x0f 0 0x07 Y
0x10 ... 類似上面的迴圈了 這種情況在9i上已經改變因為ASSM的出現
fsl : Index to the first slot on the ITL freelist. ITL TX freelist slot
fnx : 自由列表中下一塊的地址 Null if this block is not on a freelist 有資料例如: fnx: 0x1000029
00014030 50 18 00 00 96 14 80 00 B9 07 01 00 01 20 00 00
xid uba Lck Flag Scn/Fsc
xid : Transaction ID (UndoSeg.Slot.Wrap)
值可以用select XIDUSN, XIDSLOT,XIDSQN from v$transaction;查到
This is comprised of the rollback segment number (2 bytes), the slot number
in the transaction table of that rollback segment (2 bytes), and the number
of times use of that transaction table has wrapped (4 bytes).
uba : Undo address (UndoDBA.SeqNo.RecordNo)
The location of the undo for the most recent change to this block by this transaction. This is comprised of the DBA of the rollback segment block (4 bytes), the sequence number (2 bytes), and the record number for the change in that undo block (1 byte), plus 1 unused byte.
Lck Flag: Lck 鎖定的row數 這裡還用到了下一個 byte 的資料
2 對應的二進位制表示為 0010 正好和dump檔案中的 --U- 吻合
flag 1 nibble
C = Committed; U = Commit Upper Bound; T = Active at CSC; B = Rollback of this UBA gives before image of the ITL.
---- = transaction is active, or committed pending cleanout
C--- = transaction has been committed and locks cleaned out
-B-- = this undo record contains the undo for this ITL entry
--U- = transaction committed (maybe long ago); SCN is an upper bound
---T = transaction was still active at block cleanout SCN
Lck 3 nibbles
The number of row-level locks held in the block by this transaction.
Scn/Fsc : If the transaction has been cleaned out, this is the commit SCN or an upper bound thereof. Otherwise the leading two bytes contain the free space credit for the transaction - that is, the number of bytes freed in the block by the transaction
Scn = SCN of commited TX; Fsc = Free space credit (bytes)
00014040 0E 89 43 00 00 00 00 00 00 00 00 00 00 00 00 00
Scn/Fsc 第2條itl 這裡沒使用
00014050 00 00 00 00 00 00 00 00 00 00 00 00 00 01 01 00
第2條itl 這裡沒使用 flag ntab nrow
4 : 14 bytes 從這個flag位置開始是data區 也是下面的行的offset的起始地址
flag : N=pctfree hit(clusters), F=don't put on free list
K=flushable cluster keys. 當然還有別的標記: A ...
ntab : 這block中有幾個table的資料 cluster這個就可能大於1
nrow : block 有多少行資料
00014060 FF FF 14 00 9B 1F 83 1F 83 1F 00 00 01 00 9B 1F
frre fsbo fseo avsp tosp offs nrow row offs
frre : First free row index entry. -1=you have to add one.
fsbo : Free Space Begin offset 出去row dict 後面的可以放資料的空間的起始位置
也可以看成是從這個區域的開始"flag"到最後一個 "row offs"佔用的空間
fseo : Free Space End offset ( 9.2.0 )參與db_block_checking的計算剩餘空間
select 的時候oracle不是簡單的根據offset定位row.這個值也是參與了定位row的
avsp : Available space in the block (pctfree and pctused) ORA-01578
tosp : Total available space when all TXs commit ( 9.2.0 )參與db_block_checking
offs : 偏移量 用 cluster 的時候可以看出值
nrow : 這個table有多少行資料
row offs : 這行資料相對的起始位置 after delete & commit is 0xffff
00015FF0 00 00 00 00 00 00 00 2C 01 01 01 61 05 06 0E 89
fb lb cc length data block tail
5 : 使用者資料
6 : 4 bytes block tail
fb : K = Cluster Key (Flags may change meaning if this is set to show HASH cluster)
C = Cluster table member
H = Head piece of row
D = Deleted row
F = First data piece
L = Last data piece
P = First column continues from previous piece
N = Last column continues in next piece
lb : 和上面的 ITL 的lck相對應 表示這行是否被 lock 了
cc : 有幾列資料 這裡只能表示255列 超過了就會有連結行
length : 這列的資料的長度是多少
0xfa ( 250 bytes ) 其實0xfb,0xfc,0xfd 也同樣是250bytes
0xfe fb 00 ( 0xfb 00 表示的251 bytes 0xfe表示row的長度超過了250 bytes)
0xff 表示number 的 null 這也是oracle中null的表現形式排序的時候null最大了
欄位的資料超過250位元組是就用3bytes來表示欄位的長度,因為如果是long型別它的欄位再長
它在這個block中的資料的長度不會超過64K 所以最長用3bytes來表示行的長度已經夠了.再長就連結行了
data : 'a'
block tail : 改這 block 最後的4 bytes 資料中的任意肯定ora-1578
第 1 byte : 對應開始的 seq
第 2 byte : 對應開始的 type
第3,4byte : 對應開始的scn的末2為 control file 這裡是control seq
10.1.0~lgone@ONE.LG.OK> create table a(v varchar2(4000)) TABLESPACE t;

Table created.

10.1.0~lgone@ONE.LG.OK> insert into a values('a');

1 row created.


Start dump data blocks tsn: 17 file#: 5 minblk 10 maxblk 10
buffer tsn: 17 rdba: 0x0140000a (5/10)   
   //// buffer tsn:        
         資料檔案對應的 tablespace 的 number   這只是dump檔案中記錄的資料而已
         block 中是沒有記錄 tablespace 的 number 的  

scn: 0x0000.0043890e seq: 0x05 flg: 0x02 tail: 0x890e0605
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x0140000a
 Object id on Block? Y
 seg/obj: 0xd254  csc: 0x00.43890a  itc: 2  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0004.00c.00001850  0x00801496.07b9.01  --U-    1  fsc 0x0000.0043890e
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
 
data_block_dump,data header at 0x87e125c
   ////  data_block_dump,data header at 0x87e125c
         其實這個block不是直接從 data buffer 中 dump 出來的這個表示真正dump時 block 的資料區的起始位置
         也就是下面這部分開始的位置
         
===============        ////  tsiz:    hsiz:   pbl:   bdba: 在資料檔案都是沒有儲存的 
tsiz: 0x1fa0           //// Total data area size  
                     8k的block: 8192-20(block head)-24(Transaction Header)-24*2(一個事務條)-4(block tail)=8096(0x1fa0)
hsiz: 0x14             //// Data header size  資料塊頭20個位元組+資料塊尾4個位元組=24位元組(0x14)
pbl: 0x087e125c        //// Pointer to buffer holding the block
bdba: 0x0140000a
     76543210
   
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f9b
avsp=0x1f83
tosp=0x1f83
0xe:pti[0]  nrow=1  offs=0
0x12:pri[0] offs=0x1f9b
block_row_dump:
tab 0, row 0, @0x1f9b
tl: 5 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 1]  61
end_of_block_dump
End dump data blocks tsn: 17 file#: 5 minblk 10 maxblk 10

block 壞掉了還可以報:
    ORA-600 (4519) Cache layer block type is incorrect
    ORA-600 (4393) Check for Type for Segment header with free list
    ORA-600 (4136) Check Rollback segment block
    ORA-600 (4154) Check Rollback segment block 

    Ora-600[kcbzpb_1],[d],[kind],[chk] gets signaled when the block got corrupted in memory.
    The only way it should be bad is if a stray store into memory destroyed the header or tail.
    d = blocknumber, kind= kind of corruption detected,chk = checksum flag

    ora-600[3398] and ora-600[3339]  
    ora-600[3398] is not in oracle 8.
    ora-600[3398] means it failed a verification check before writing back to disk,  so it must 
        be an in-memory corruption.
    ora-600[3339] comes with ora-1578 and means either disk corruption or in memory corruption after read.
    ora-600 [3339] has been removed from 7.2+
    From 7.2+  ora-600 [3398] has become ora-600 [3374] with some checks added.

2進位制儲存格式
               ALTER SESSION SET EVENTS '10289 trace name context forever, level 1';
               ALTER SESSION SET EVENTS '10289 trace name context off';

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/8193225/viewspace-1006846/,如需轉載,請註明出處,否則將追究法律責任。

相關文章