[分享] 複合索引branch block上儲存幾個列的資訊 zt

asword發表於2009-09-21
[@more@]
這是很早之前研究的問題,已經忘記了是什麼案例導致去研究這個問題。

不過這個問題本省就很有意思,當在兩個或者多個列上建立索引時,究竟branch block中會紀錄幾個列的資訊?

首先我們知道branch block中儲存結構大致是這樣的 pointer - value - pointer - value - pointer.其中pointer指向leaf block或者下一層branch block.

那麼這個value的值就是為了區分左右指標所指向的兩個leaf block中(或者下一層branch block中)的值,這樣從root -> branch -> leaf的時候才能準確定位到leaf block.
在提供答案前,首先給出幾種不同的論點

1.不管建立在多少列上,branch block中只儲存leading column資訊

2. 如果建立在n個列上,branch block中儲存n-1列的資訊

3. branch block上儲存所有列的資訊.

這三種觀點都有問題,首先觀點一

如果這個索引的leading column的可選擇性很差,例如status或者flag這樣的列,distinct value很少

這時候如果索引建立在 (status, id)上,對於

select * from table where status=0 and id=12345這樣的查詢,range scan並不能準確定位到 leaf block

對於觀點二,在n=2的時候顯然也可以用上面的例子來推翻

對於觀點三,是可以正確定位到leaf block,但是如果對於像

index(id,name) 其中name 假定name欄位型別為varchar(1000),這樣的索引效率也不好

事實上因為id列可選擇性很好,一般不需要再儲存name值到branch block中就可以區分左右的block

在branch block中儲存的name列的值浪費了branch block的空間,

而我們知道單個branch block中能夠儲存值的多少直接影響了樹的高度,而樹的高度顯然直接影響了index的效率

透過上面的分析,以上的三種觀點都是錯誤的,那麼oracle究竟如何來儲存列資訊才能既做到準確定位leaf block又不儲存多餘的值呢?

這裡首先給出試驗的結果,我沒有找到官方文件或者其他的文章來解釋這個結論,不過透過試驗來看結論是很清楚的,而且也符合上面提到的兩種考慮因素。

其實剛剛我們被限制住的原因是一直認為branch block中儲存的列的個數是個常數,但事實上儲存幾個列是變化的。

這個值只需要區分其左右指標所指向的兩個block的值就可以了,

更簡單的說,當oracle從上到下走到這個branch block的時候,

透過這個值可以準確的知道向左走還是向右走,

舉個例子,我們上面一開始提到的索引例子,

第一個是 index(status, id) , 如果左邊block的最大值為(0,33),右邊block的最小值為(0,35),

因為此時單純從第一列不能區分兩個block,這時就需要儲存兩列的資訊

如果左邊block最大值為(0,100),右邊block最小值為(1,1)

因為此時從第一列就可以區分出兩個block,所以只需要儲存第一列的資訊

這樣對於

select * from table where status=0 and id=12345

查詢,透過branch block中的值既可以知道“向左走,向右走”,準確定位leaf block

又可以避免不必要的浪費branch block空間,避免樹高度不必要的增高

下面是試驗的過程,直接copy以前自己發的郵件,懶得翻譯了[分享] 複合索引branch block上儲存幾個列的資訊 zt790)this.width=790;" src="http://www.itpub.net/images/smilies/14.gif" />

We discussed the internal storage rule for composite index last time, but we didn’t get conclusion yet.

So I did a test for this.

SQL> create table test(n1 int,n2 int ,n3 int,v1 varchar2(2000),v2 varchar2(2000),v3 varchar2(2000));

Table created.

SQL> begin
2 for i in 1..1000 loop
3 insert into test values(0,i,i,rpad(’a',2000),rpad(’a',2000),rpad(’a',2000));
4 end loop;
5 end;
6 /

PL/SQL procedure successfully completed.

insert 1000 rows, the format is like this
0,1,1,xxxxxx,xxxxxxxxxx
0,2,2,xxxxxx,xxxxxxxxxx
0,3,3, xxxxxx,xxxxxxxxxx
……..
0,1000,1000, xxxxxx,xxxxxxxxxx

SQL> commit;

Commit complete.

create 7 indexes on this table, the naming rule is index_

_N (number) V (varchar)

SQL> create index index_2_nv on test(n1,v1);

Index created.

SQL> create index index_2_vn on test(v1,n1);

Index created.

SQL> create index index_3_nnv on test(n1,n2,v1);

Index created.

SQL> create index index_3_nvn on test(n1,v1,n2);

Index created.

SQL> create index index_4_nnnv on test(n1,n2,n3,v1);

Index created.

SQL> create index index_4_nnvn on test(n1,n2,v1,n3);

Index created.

SQL> create index index_5_nnnvv on test (n1,n2,n3,v1,v2);

Index created.

SQL> analyze table test compute statistics;

Table analyzed.

SQL> select index_name,PREFIX_LENGTH,BLEVEL,LEAF_BLOCKS from user_indexes
2 where lower(index_name) in (’index_2_nv’,'index_2_vn’,'index_3_nnv’,'index_3_nvn’,
3 ‘index_4_nnnv’,'index_4_nnvn’,'index_5_nnnvv’);

INDEX_NAME PREFIX_LENGTH BLEVEL LEAF_BLOCKS
—————————— ————- ———- ———–
INDEX_2_NV 5 334
INDEX_2_VN 5 334
INDEX_3_NNV 1 334
INDEX_3_NVN 5 334
INDEX_4_NNNV 1 334
INDEX_4_NNVN 1 334
INDEX_5_NNNVV 2 1000

7 rows selected.

You see, three indexes (INDEX_2_NV, INDEX_2_VN, INDEX_3_NVN) have level5. This means these indexes’ branch block contains varchar2000.

Then I dump these indexes’ root block to confirm this.

SQL> select segment_name,file_id,block_id from dba_extents where
2 lower(segment_name) in (’index_2_nv’,'index_2_vn’,'index_3_nnv’,'index_3_nvn’,
‘index_4_nnnv’,'index_4_nnvn’,'index_5_nnnvv’) and extent_id=0 order by segment_name 3 ;

SEGMENT_NAME FILE_ID BLOCK_ID
—————————— ———- ———-
INDEX_2_NV 9 16082
INDEX_2_VN 9 16097
INDEX_3_NNV 9 13587
INDEX_3_NVN 1 22053
INDEX_4_NNNV 1 32702
INDEX_4_NNVN 1 62230
INDEX_5_NNNVV 1 61697

7 rows selected.
SQL> @temp.sql
SQL> alter system dump datafile 9 block 16083;
System altered.

SQL> alter system dump datafile 9 block 16098;
System altered.

SQL> alter system dump datafile 9 block 13588;
System altered.

SQL> alter system dump datafile 1 block 22054;
System altered.

SQL> alter system dump datafile 1 block 32703;
System altered.

SQL> alter system dump datafile 1 block 62231;
System altered.

SQL> alter system dump datafile 1 block 61698;
System altered.

.[分享] 複合索引branch block上儲存幾個列的資訊 zt790)this.width=790;" src="" />

It’s weird. The index INDEX_2_NV and INDEX_2_VN only have 2 columns.But in the root block dump file, we can find col0,col1 and col2(maybesomething related to rowid, I guess).

I am not sure about what the third col is, but it’s clear that the number of columns in branch block is related about the data.

0,1,1,xxxxxx,xxxxxxxxxx
0,2,2,xxxxxx,xxxxxxxxxx
0,3,3, xxxxxx,xxxxxxxxxx
……..
0,1000,1000, xxxxxx,xxxxxxxxxx

The index INDEX_2_NV, INDEX_2_VN can’t distinguish the leaf blocksusing the prefix two columns, so it added the third column (unknown,maybe something related to rowid, I guess).

The indexes INDEX_3_NNV, INDEX_4_NNNV, INDEX_4_NNVN and INDEX_5_NNNVVcan distinguish the leaf blocks using the prefix two columns, so thebranch blocks only contain 2 columns.

The index INDEX_3_NVN can should use three columns to distinguish theleft/right leaf blocks, so it contains 3 columns (because the prefixtwo columns’ values are the same).

You can check the dump file xfan_ora_25936.trc for details.

So I guess that the number of columns stored in branch block is related to linked leaf blocks’ data.
The columns which stored in branch blocks need to distinguish the left/right linked leaf blocks.
If the rule for composite index is like I guessed, then the columnsstored in branch blocks maybe different in one index branch block!!

So I did another test to prove this:

SQL> create table test2(n1 int,n2 int,n3 int,v1 varchar2(2000),v2 varchar2(2000));

Table created.

SQL> create index test_idx on test2(n1,n2,n3,v1,v2);

Index created.

SQL> begin
2 for i in 1..1000 loop
insert into test2 values(0,0,i,rpad(’a',2000),rpad(’a',2000));
end loop;
end;
/ 3 4 5 6

PL/SQL procedure successfully completed.

Insert first 1000 rows like this:

0, 0, 1, xxxxx,xxxxxx
0, 0, 2, xxxxx,xxxxxx
0, 0, 3, xxxxx,xxxxxx
……
0, 0, 1000, xxxxx,xxxxxx

SQL> SQL>
SQL> begin
2 for i in 1001..2000 loop
insert into test2 values(0,i,i,rpad(’a',2000),rpad(’a',2000));
end loop;
end;
/ 3 4 5 6

Then insert other 1000 rows like this:
0,1,1,xxxx,xxxx
0,2,2,xxxx,xxxx
0,3,3,xxxx,xxxx
…….
0,1000,1000,xxxx,xxxx

Then I did treedump and branch block dump.

Here is a snippet of the branch block dump:

row#454[2145] dba: 4218395=0×405e1b
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 3; (3): c2 0a 64
col 3; TERM
row#455[2133] dba: 4218396=0×405e1c
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c2 0b
col 3; TERM
row#456[2124] dba: 4218397=0×405e1d
col 0; len 1; (1): 80
col 1; len 1; (1): c2
col 2; TERM
row#457[2113] dba: 4218398=0×405e1e
col 0; len 1; (1): 80
col 1; len 3; (3): c2 0b 03
col 2; TERM
row#458[2102] dba: 4218399=0×405e1f
col 0; len 1; (1): 80
col 1; len 3; (3): c2 0b 04
col 2; TERM

You see, the numbers of columns changed in one branch block!!

The attachment index.trc is the dump file

So the conclusion is that the number of columns stored in branch block is related to linked leaf blocks’ data.

The branch block will store as less columns as possible to distinguish the left/right leaf blocks it linked

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/82387/viewspace-1027205/,如需轉載,請註明出處,否則將追究法律責任。

相關文章