Data block Structure and Row Piece Structure

目標：搞清Data Block 與 Row Piece的物理結構，理解Oracle 如何儲存行資料

參考文件

P118<>

P260<>

1 Data Block Format

2 Data Block Free Space Management

3 釋放的空間可以被重用的條件

4 Oracle coalesces the free space of a data block only when(Coalesces空間的條件)

5 Row Chaining and Migrating

6 PCTFREE, PCTUSED, and Row Chaining

7 How table data is stored

8 Row Piece Format

9 Rowids of Row Pieces

10 Column Order

11 測試

11.1 正常情況下，只佔了一個row piece

11.2 當columns 個數大於等於256時，出現的分成兩個row pieces的情形.

11.3 測試column 小於 256時單行資料跨塊的情形

12 Notes

1 Data Block Format

The Oracle data block format is similar regardless of whether the data block contains table, index, or clustered data.

Block Header: The header contains general block information. Such as the block address and the type of segment (for example, data or index).

Table Directory: contains information about the table who have rows in this block.

Row Directory: contains information about the actual rows in the block (including address of each row piece in the data area). Once the space was allocated in the row directory, this space is not reclaim(回收)even the row is deleted. Oracle reuses(重用) the space only when new rows are inserted in the block.

Overhead: The data block header, table directory and row directory are referred to collectively as overhead. Some block overhead is fixed size; the total block overhead size is variable. On average, the fixed and variable portions of data block overhead total 84 to 107 bytes.

Row data: This portion of the data block contains table or index data. Rows can span blocks

Free space: Free space is allocated for insertion of new rows and for updates to rows that require additional space. Whether issued insertions actually occur in a given data block is a function of current free space in that data block and the value of the space management parameter PCTFREE.

In data blocks allocated for the data segment of a table or cluster, or for the index segment of an index, free space can also hold transaction entries. A transaction entry is required in a block for each INSERT, UPDATE, DELETE, and SELECT...FOR UPDATE statement accessing one or more rows in the block. The space required for transaction entries is operating system dependent; however, transaction entries in most operating systems require approximately 23 bytes. (Transaction entries 的固定開銷大約23 bytes in Free Space portion)

2 Data Block Free Space Management

Free space can be managed automatically or manually

Auto: The in-segment free/used space is tracked using bitmaps

Manual: The in-segment free/used space is tracked using free lists.

Two types statement can increase the free space of one or more data blocks: DELETE statements, and UPDATE statements that update existing values to smaller values. (兩種釋放datablock空間的語句:Delete and update to smaller value. 但釋放的空間並不能被立即被重用)

3 釋放的空間可以被重用的條件

1) If the INSERT statement is in the same transaction and subsequent to the statement that frees space, then the INSERT statement can use the space made available. (在同個事務，且insert緊跟delete操作)

2) If the INSERT statement is in a separate transaction from the statement that frees space (perhaps being run by another user), then the INSERT statement can use the space made available only after the other transaction commits and only if the space is needed. (不在同個事務，只有在釋放空間的事務提交後，且釋放的空間正是申請空間的事務所需的大小)

4 Oracle coalesces the free space of a data block only when(Coalesces空間的條件)

(1) An INSERT or UPDATE statement attempts to use a block that contains enough free space to contain a new row piece.

(2) The free space is fragmented so the row piece cannot be inserted in a contiguous section of the block.

Oracle does this compression only in such situations, because otherwise the performance of a database system decreases due to the continuous compression of the free space in data blocks.

5 Row Chaining and Migrating

In two circumstances, the data for a row in a table may be too large to fit into a single data block. In the first case, the row is too large to fit into one data block then it is first inserted. In this case, Oracle stores the data for the row in a chain of data blocks (one or more) reserved for that segment. Row chaining most often occurs with large rows, such as rows that contain a column of datatype LONG or LONG RAW. Row chaining in these cases is unavoidable.

However, in the second case, a row that originally fit into one data block is updated so that the overall row length increases, and the block’s free space is already completely filled. In this case, Oracle migrates the data for the entire row to a new data block, assuming the entire row can fit in a new block. Oracle preserves the original row piece of a migrated row to point to the new block containing the migrated row. The rowid of a migrated row does not change. When a row is chained or migrated, I/O performance associated with this row decreases because Oracle must scan more than one data block to retrieve the information for the row.

6 PCTFREE, PCTUSED, and Row Chaining

For manually managed tablespaces, two space management parameters, PCTFREE and PCTUSED, enable you to control the use of free space for inserts and updates to the rows in all the data blocks of a particular segment. Specify these parameters when you create or alter a table or cluster (which has its own data segment). You can also specify the storage parameter

7 How table data is stored

When you create a table, Oracle automatically allocates a data segment in a tablespace to hold the table’s future data. You can control the allocation and use of space for a table’s data segment in the following two ways:

Setting the storage parameters for the data segment.設定segment 引數

Setting the storage parameters for the data block. 設定block引數

（具體的引數細節要參考相應的文件?未整理）

Oracle stores each row of a database table containing data for less than 256 columns as one or more row pieces. If an entire row can be inserted into a single data block, then Oracle stores the row as one row piece. However, if all of a row’s data cannot be inserted into a single data block or an update to an existing row causes the row to outgrow its data block, Oracle stores the row using multiple row pieces. A data block usually contains only one row piece for each row. When Oracle must store a row in more than one row piece, it is chained across multiple blocks.

When a table has more than 255 columns, rows that have data after the 255 column are likely to be chained within the same block. This is called intra-block chaining. A chained row’s pieces are chained together using the rowids of the pieces. (如果一個表小於256列，Oracle將表中的單行資料儲存在單個資料塊中作為一個Row Piece，如果由於初始太大或者由於更新導致太大，導致不能在單個資料中存放時，Oracle會將多餘的資料存放在另一個Block中，這樣就生多個Row Pieces，從而形成Chain,但仍有一個Rowid. A data block usually contains only one row piece for each table row data. Chain spans multiple blocks. 如果一個表大於256列，則第255列之後的資料可能被chained within the same block.這就是 intra-block chaining，而不會span multiple blocks。）

8 Row Piece Format

The row header precedes the data and contains information about:

1) Row pieces

2) Chaining(for chained row pieces only)

3) Columns in the row piece

4) Cluster keys(for clustered data only)

A row fully contained in one block has at least 3 bytes of row header. After the row header information, each row contains column length and data. The column length requires 1 byte for columns that store 250 bytes or less, or 3 bytes for columns that store more than 250 bytes, and precedes the column data. Space required for column data depends on the datatype. If the datatype of a column is variable length, then the space required to hold a value can grow and shrink with updates to the data.

To conserve space, a null in a column only stores the column length (zero). Oracle does not store data for the null column. Also, for trailing null columns, Oracle does not even store the column length. (Each row also uses 2 bytes in the data block header’s row directory,見第一個圖)

Clustered rows contain the same information as nonclustered rows. In addition, they contain information that references the cluster key to which they belong. (Cluster rows包含cluster key資訊)

9 Rowids of Row Pieces

The rowed identifies each piece by its location or address. After they are assigned, a given row piece retains its rowed until the corresponding row is deleted or exported and imported using the Export and Import Utilities. Because rowids are constant for the lifetime of a row piece, it is useful to refer rowid in SQL statements such as SELECT, UPDATE, and DELETE.(rowid 是一個常量)

10 Column Order

The column order is the same as they were listed in the CREATE TABLE statement, but this is not guaranteed. Long type column is always stored in the last. To conserve space, a null in a column only stores the column length (zero). Oracle does not store data for the null column. Also, for trailing null columns, Oracle does not even store the column length. So try to place columns that frequently contain nulls last. (Column Order 順序沒保證)

11 測試

11.1 正常情況下，只佔了一個row piece

SQL> desc t2;

Name Null? Type

----------------------------------------- -------- ----------------------------

ID NUMBER

NAME VARCHAR2(20)

select dbms_rowid.rowid_relative_fno(rowid),dbms_rowid.rowid_block_number(rowid) from t2;

alter system dump datafile 4 block 14362;

Start dump data blocks tsn: 3 file#: 4 minblk 14362 maxblk 14362

buffer tsn: 3 rdba: 0x0100381a (4/14362)

scn: 0x0000.00070ba3 seq: 0x01 flg: 0x06 tail: 0x0ba30601

frmt: 0x02 chkval: 0xef35 type: 0x06=trans data

Block header dump: 0x0100381a

Object id on Block? Y

seg/obj: 0x1796 csc: 0x00.70a55 itc: 2 flg: O typ: 1 - DATA

fsl: 0 fnx: 0x0 ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0005.00b.00000029 0x0080004c.0227.2c C--- 0 scn 0x0000.00070a2f

0x02 0x0006.01b.0000002f 0x0080005f.0017.24 --U- 1 fsc 0x0000.00070ba3

data_block_dump,data header at 0x3fb985c

===============

tsiz: 0x1fa0

hsiz: 0x14

pbl: 0x03fb985c

bdba: 0x0100381a

76543210

flag=--------

ntab=1

nrow=1

frre=-1

fsbo=0x14

fseo=0x1f89

avsp=0x1f80

tosp=0x1f80

0xe:pti[0] nrow=1 offs=0

0x12:pri[0] offs=0x1f89

block_row_dump:

tab 0, row 0, @0x1f89

tl: 12 fb: --H-FL-- lb: 0x2 cc: 2

col 0: [ 2] c1 02

col 1: [ 5] 6d 6f 75 73 65

end_of_block_dump

End dump data blocks tsn: 3 file#: 4 minblk 14362 maxblk 14362

11.2 當columns 個數大於等於256時，出現的分成兩個row pieces的情形.即intra-block chaining

Create table t_column(a number);

declare

v_string varchar(100);

begin

for i in 2 .. 260 loop

v_string := 'alter table t_column add col_' || i || ' number';

execute immediate v_string;

end loop;

end ;

/

insert into t_column values(

1 ,

2 ,

3 ,

4 ,

5 ,

6 ,

7 ,

8 ,

9 ,

10 ,

...省略

250,

251,

252,

253,

254,

255,

256,

257,

258,

259,

260

);

1 row created.

SQL> commit;

Commit complete.

SQL> select SEGMENT_NAME, BLOCK_ID start_block_id, BLOCK_ID+BLOCKS-1 end_block_id from dba_extents where segment_name='T_COLUMN';

SEGMENT_NAME START_BLOCK_ID END_BLOCK_ID

-------------------------------------------------------------------------------- -------------- ------------

T_COLUMN 14369 14376

SQL> select SEGMENT_NAME, BLOCK_ID start_block_id, BLOCK_ID+BLOCKS-1 end_block_id

2 from dba_extents

3 where BLOCK_ID between 14369 and 14376;

SEGMENT_NAME START_BLOCK_ID END_BLOCK_ID

-------------------------------------------------------------------------------- -------------- ------------

T_COLUMN 14369 14376

可知在 block id in (14369-14376)上只有一個 Segment

SQL> select dbms_rowid.rowid_relative_fno(rowid),dbms_rowid.rowid_block_number(rowid) from t_column;

DBMS_ROWID.ROWID_RELATIVE_FNO( DBMS_ROWID.ROWID_BLOCK_NUMBER(

------------------------------ ------------------------------

4 14370

Dump block

SQL> alter system dump datafile 4 block 14370;

System altered.

檢視DUMP檔案

*** 2009-12-10 11:07:46.097

*** SESSION ID:(11.7) 2009-12-10 11:07:46.096

Start dump data blocks tsn: 3 file#: 4 minblk 14370 maxblk 14370

buffer tsn: 3 rdba: 0x01003822 (4/14370)

scn: 0x0000.0007e745 seq: 0x01 flg: 0x02 tail: 0xe7450601

frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

Block header dump: 0x01003822

Object id on Block? Y

seg/obj: 0x17a2 csc: 0x00.7e743 itc: 3 flg: O typ: 1 - DATA

fsl: 0 fnx: 0x0 ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0007.00c.00000039 0x00800169.001e.2b --U- 2 fsc 0x0000.0007e745

0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000

0x03 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000

data_block_dump,data header at 0x3fb9874

===============

tsiz: 0x1f88

hsiz: 0x16

pbl: 0x03fb9874

bdba: 0x01003822

76543210

flag=--------

ntab=1

nrow=2

frre=-1

fsbo=0x16

fseo=0x1bd1

avsp=0x1bbb

tosp=0x1bbb

0xe:pti[0] nrow=2 offs=0

0x12:pri[0] offs=0x1be9

0x14:pri[1] offs=0x1bd1

block_row_dump:

tab 0, row 0, @0x1be9

tl: 927 fb: -----L-- lb: 0x1 cc: 255

col 0: [ 2] c1 07

col 1: [ 2] c1 08

col 2: [ 2] c1 09

col 3: [ 2] c1 0a

col 4: [ 2] c1 0b

col 5: [ 2] c1 0c

col 6: [ 2] c1 0d

col 7: [ 2] c1 0e

col 8: [ 2] c1 0f

col 9: [ 2] c1 10

col 10: [ 2] c1 11

.....省略

col 250: [ 3] c2 03 39

col 251: [ 3] c2 03 3a

col 252: [ 3] c2 03 3b

col 253: [ 3] c2 03 3c

col 254: [ 3] c2 03 3d

tab 0, row 1, @0x1bd1

tl: 24 fb: --H-F--- lb: 0x1 cc: 5

nrid: 0x01003822.0

col 0: [ 2] c1 02

col 1: [ 2] c1 03

col 2: [ 2] c1 04

col 3: [ 2] c1 05

col 4: [ 2] c1 06

end_of_block_dump

End dump data blocks tsn: 3 file#: 4 minblk 14370 maxblk 14370

從上面的DUMP檔案可以看出，一個row分成了兩個row piece

tab 0, row 1, @0x1bc9

tl: 24 fb: --H-F--- lb: 0x1 cc: 5

nrid: 0x04000196.0

看上面的fb，標示為H和F，H代表Head piece of row ，F代表First data piece

nrid: 0x04000196.0 下一個row piece的地址

tab 0, row 0, @0x1be1

tl: 927 fb: -----L-- lb: 0x1 cc: 255

看上面的fb，標示為L，L代表Last data piece

11.3 測試column Number小於 256列時單行資料跨塊的情形

如何測試？參考<中行遷移和行連結的清除及檢測_20091210.doc>>

如何檢視行資料有沒有跨塊，在dump檔案中查詢nrid標記

idle> create table t2(col1 char(2000),col2 char(2000),col3 char(2000),col4 char(2000),col5 char(2000));

Table created.

idle> insert into t2 values('1','2','3','4','5');

1 row created.

idle> commit;

Commit complete.

idle> select dbms_rowid.rowid_relative_fno(rowid),dbms_rowid.rowid_block_number(rowid) from t2;

DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)

------------------------------------ ------------------------------------

6 15

idle> alter system dump datafile 6 block 15;

System altered.

檢視dump 檔案 nrid: 0x0180000e.0 表明有行遷移

*** 2004-07-21 00:40:22.970

Start dump data blocks tsn: 6 file#: 6 minblk 15 maxblk 15

buffer tsn: 6 rdba: 0x0180000f (6/15)

scn: 0x0000.000f6ad7 seq: 0x01 flg: 0x02 tail: 0x6ad70601

frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

Block header dump: 0x0180000f

Object id on Block? Y

seg/obj: 0x7184 csc: 0x00.f6ad6 itc: 2 flg: E typ: 1 - DATA

brn: 0 bdba: 0x1800009 ver: 0x01

inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0008.018.0000009e 0x008000bf.0033.09 --U- 1 fsc 0x0000.000f6ad7

0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000

data_block_dump,data header at 0xad7d464

===============

tsiz: 0x1f98

hsiz: 0x14

pbl: 0x0ad7d464

bdba: 0x0180000f

76543210

flag=--------

ntab=1

nrow=1

frre=-1

fsbo=0x14

fseo=0x179f

avsp=0x178b

tosp=0x178b

0xe:pti[0] nrow=1 offs=0

0x12:pri[0] offs=0x179f

block_row_dump:

tab 0, row 0, @0x179f

tl: 2041 fb: --H-F--N lb: 0x1 cc: 2

nrid: 0x0180000e.0

col 0: [2000]

31 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

……… 省略…

20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

col 1: [28]

32 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

20 20 20

end_of_block_dump

End dump data blocks tsn: 6 file#: 6 minblk 15 maxblk 15

12 Notes

1) Table can be define up to 1000 columns the same as views

2）假定一個8K的data block，有多少個row piece呢，一個row piece多大，開銷多大？row piece並沒有固定的大小，只是指儲存的一行資料。參考Data block的結構，有哪些檢視可用來查詢，什麼時候才跨塊，是不能在single datablock 存放下時，做測試

3）Default Values for Columns： If a default values is not explicitly defined for a column, then the default for the column is implicitly set to NULL.

4) 一個block中可以存放多個table的資料 ?

不能吧

Block=>extent=>Segment=table

5) 關於Block的詳細解析以後再研究

參考<物理結構詳解_20091210.doc>>

Data block Structure and Row Piece Structure_20091209

相關文章