[分享] 複合索引branch block上儲存幾個列的資訊 zt
[@more@]
這是很早之前研究的問題,已經忘記了是什麼案例導致去研究這個問題。
不過這個問題本省就很有意思,當在兩個或者多個列上建立索引時,究竟branch block中會紀錄幾個列的資訊?
首先我們知道branch block中儲存結構大致是這樣的 pointer - value - pointer - value - pointer.其中pointer指向leaf block或者下一層branch block.
那麼這個value的值就是為了區分左右指標所指向的兩個leaf block中(或者下一層branch block中)的值,這樣從root -> branch -> leaf的時候才能準確定位到leaf block.
在提供答案前,首先給出幾種不同的論點
1.不管建立在多少列上,branch block中只儲存leading column資訊
2. 如果建立在n個列上,branch block中儲存n-1列的資訊
3. branch block上儲存所有列的資訊.
這三種觀點都有問題,首先觀點一
如果這個索引的leading column的可選擇性很差,例如status或者flag這樣的列,distinct value很少
這時候如果索引建立在 (status, id)上,對於
select * from table where status=0 and id=12345這樣的查詢,range scan並不能準確定位到 leaf block
對於觀點二,在n=2的時候顯然也可以用上面的例子來推翻
對於觀點三,是可以正確定位到leaf block,但是如果對於像
index(id,name) 其中name 假定name欄位型別為varchar(1000),這樣的索引效率也不好
事實上因為id列可選擇性很好,一般不需要再儲存name值到branch block中就可以區分左右的block
在branch block中儲存的name列的值浪費了branch block的空間,
而我們知道單個branch block中能夠儲存值的多少直接影響了樹的高度,而樹的高度顯然直接影響了index的效率
透過上面的分析,以上的三種觀點都是錯誤的,那麼oracle究竟如何來儲存列資訊才能既做到準確定位leaf block又不儲存多餘的值呢?
這裡首先給出試驗的結果,我沒有找到官方文件或者其他的文章來解釋這個結論,不過透過試驗來看結論是很清楚的,而且也符合上面提到的兩種考慮因素。
其實剛剛我們被限制住的原因是一直認為branch block中儲存的列的個數是個常數,但事實上儲存幾個列是變化的。
這個值只需要區分其左右指標所指向的兩個block的值就可以了,
更簡單的說,當oracle從上到下走到這個branch block的時候,
透過這個值可以準確的知道向左走還是向右走,
舉個例子,我們上面一開始提到的索引例子,
第一個是 index(status, id) , 如果左邊block的最大值為(0,33),右邊block的最小值為(0,35),
因為此時單純從第一列不能區分兩個block,這時就需要儲存兩列的資訊
如果左邊block最大值為(0,100),右邊block最小值為(1,1)
因為此時從第一列就可以區分出兩個block,所以只需要儲存第一列的資訊
這樣對於
select * from table where status=0 and id=12345
查詢,透過branch block中的值既可以知道“向左走,向右走”,準確定位leaf block
又可以避免不必要的浪費branch block空間,避免樹高度不必要的增高
下面是試驗的過程,直接copy以前自己發的郵件,懶得翻譯了790)this.width=790;" src="http://www.itpub.net/images/smilies/14.gif" />
We discussed the internal storage rule for composite index last time, but we didn’t get conclusion yet.
So I did a test for this.
SQL> create table test(n1 int,n2 int ,n3 int,v1 varchar2(2000),v2 varchar2(2000),v3 varchar2(2000));
Table created.
SQL> begin
2 for i in 1..1000 loop
3 insert into test values(0,i,i,rpad(’a',2000),rpad(’a',2000),rpad(’a',2000));
4 end loop;
5 end;
6 /
PL/SQL procedure successfully completed.
insert 1000 rows, the format is like this
0,1,1,xxxxxx,xxxxxxxxxx
0,2,2,xxxxxx,xxxxxxxxxx
0,3,3, xxxxxx,xxxxxxxxxx
……..
0,1000,1000, xxxxxx,xxxxxxxxxx
SQL> commit;
Commit complete.
create 7 indexes on this table, the naming rule is index_
_N (number) V (varchar)
SQL> create index index_2_nv on test(n1,v1);
Index created.
SQL> create index index_2_vn on test(v1,n1);
Index created.
SQL> create index index_3_nnv on test(n1,n2,v1);
Index created.
SQL> create index index_3_nvn on test(n1,v1,n2);
Index created.
SQL> create index index_4_nnnv on test(n1,n2,n3,v1);
Index created.
SQL> create index index_4_nnvn on test(n1,n2,v1,n3);
Index created.
SQL> create index index_5_nnnvv on test (n1,n2,n3,v1,v2);
Index created.
SQL> analyze table test compute statistics;
Table analyzed.
SQL> select index_name,PREFIX_LENGTH,BLEVEL,LEAF_BLOCKS from user_indexes
2 where lower(index_name) in (’index_2_nv’,'index_2_vn’,'index_3_nnv’,'index_3_nvn’,
3 ‘index_4_nnnv’,'index_4_nnvn’,'index_5_nnnvv’);
INDEX_NAME PREFIX_LENGTH BLEVEL LEAF_BLOCKS
—————————— ————- ———- ———–
INDEX_2_NV 5 334
INDEX_2_VN 5 334
INDEX_3_NNV 1 334
INDEX_3_NVN 5 334
INDEX_4_NNNV 1 334
INDEX_4_NNVN 1 334
INDEX_5_NNNVV 2 1000
7 rows selected.
You see, three indexes (INDEX_2_NV, INDEX_2_VN, INDEX_3_NVN) have level5. This means these indexes’ branch block contains varchar2000.
Then I dump these indexes’ root block to confirm this.
SQL> select segment_name,file_id,block_id from dba_extents where
2 lower(segment_name) in (’index_2_nv’,'index_2_vn’,'index_3_nnv’,'index_3_nvn’,
‘index_4_nnnv’,'index_4_nnvn’,'index_5_nnnvv’) and extent_id=0 order by segment_name 3 ;
SEGMENT_NAME FILE_ID BLOCK_ID
—————————— ———- ———-
INDEX_2_NV 9 16082
INDEX_2_VN 9 16097
INDEX_3_NNV 9 13587
INDEX_3_NVN 1 22053
INDEX_4_NNNV 1 32702
INDEX_4_NNVN 1 62230
INDEX_5_NNNVV 1 61697
7 rows selected.
SQL> @temp.sql
SQL> alter system dump datafile 9 block 16083;
System altered.
SQL> alter system dump datafile 9 block 16098;
System altered.
SQL> alter system dump datafile 9 block 13588;
System altered.
SQL> alter system dump datafile 1 block 22054;
System altered.
SQL> alter system dump datafile 1 block 32703;
System altered.
SQL> alter system dump datafile 1 block 62231;
System altered.
SQL> alter system dump datafile 1 block 61698;
System altered.
.790)this.width=790;" src="" />
It’s weird. The index INDEX_2_NV and INDEX_2_VN only have 2 columns.But in the root block dump file, we can find col0,col1 and col2(maybesomething related to rowid, I guess).
I am not sure about what the third col is, but it’s clear that the number of columns in branch block is related about the data.
0,1,1,xxxxxx,xxxxxxxxxx
0,2,2,xxxxxx,xxxxxxxxxx
0,3,3, xxxxxx,xxxxxxxxxx
……..
0,1000,1000, xxxxxx,xxxxxxxxxx
The index INDEX_2_NV, INDEX_2_VN can’t distinguish the leaf blocksusing the prefix two columns, so it added the third column (unknown,maybe something related to rowid, I guess).
The indexes INDEX_3_NNV, INDEX_4_NNNV, INDEX_4_NNVN and INDEX_5_NNNVVcan distinguish the leaf blocks using the prefix two columns, so thebranch blocks only contain 2 columns.
The index INDEX_3_NVN can should use three columns to distinguish theleft/right leaf blocks, so it contains 3 columns (because the prefixtwo columns’ values are the same).
You can check the dump file xfan_ora_25936.trc for details.
So I guess that the number of columns stored in branch block is related to linked leaf blocks’ data.
The columns which stored in branch blocks need to distinguish the left/right linked leaf blocks.
If the rule for composite index is like I guessed, then the columnsstored in branch blocks maybe different in one index branch block!!
So I did another test to prove this:
SQL> create table test2(n1 int,n2 int,n3 int,v1 varchar2(2000),v2 varchar2(2000));
Table created.
SQL> create index test_idx on test2(n1,n2,n3,v1,v2);
Index created.
SQL> begin
2 for i in 1..1000 loop
insert into test2 values(0,0,i,rpad(’a',2000),rpad(’a',2000));
end loop;
end;
/ 3 4 5 6
PL/SQL procedure successfully completed.
Insert first 1000 rows like this:
0, 0, 1, xxxxx,xxxxxx
0, 0, 2, xxxxx,xxxxxx
0, 0, 3, xxxxx,xxxxxx
……
0, 0, 1000, xxxxx,xxxxxx
SQL> SQL>
SQL> begin
2 for i in 1001..2000 loop
insert into test2 values(0,i,i,rpad(’a',2000),rpad(’a',2000));
end loop;
end;
/ 3 4 5 6
Then insert other 1000 rows like this:
0,1,1,xxxx,xxxx
0,2,2,xxxx,xxxx
0,3,3,xxxx,xxxx
…….
0,1000,1000,xxxx,xxxx
Then I did treedump and branch block dump.
Here is a snippet of the branch block dump:
row#454[2145] dba: 4218395=0×405e1b
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 3; (3): c2 0a 64
col 3; TERM
row#455[2133] dba: 4218396=0×405e1c
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c2 0b
col 3; TERM
row#456[2124] dba: 4218397=0×405e1d
col 0; len 1; (1): 80
col 1; len 1; (1): c2
col 2; TERM
row#457[2113] dba: 4218398=0×405e1e
col 0; len 1; (1): 80
col 1; len 3; (3): c2 0b 03
col 2; TERM
row#458[2102] dba: 4218399=0×405e1f
col 0; len 1; (1): 80
col 1; len 3; (3): c2 0b 04
col 2; TERM
You see, the numbers of columns changed in one branch block!!
The attachment index.trc is the dump file
So the conclusion is that the number of columns stored in branch block is related to linked leaf blocks’ data.
The branch block will store as less columns as possible to distinguish the left/right leaf blocks it linked
不過這個問題本省就很有意思,當在兩個或者多個列上建立索引時,究竟branch block中會紀錄幾個列的資訊?
首先我們知道branch block中儲存結構大致是這樣的 pointer - value - pointer - value - pointer.其中pointer指向leaf block或者下一層branch block.
那麼這個value的值就是為了區分左右指標所指向的兩個leaf block中(或者下一層branch block中)的值,這樣從root -> branch -> leaf的時候才能準確定位到leaf block.
在提供答案前,首先給出幾種不同的論點
1.不管建立在多少列上,branch block中只儲存leading column資訊
2. 如果建立在n個列上,branch block中儲存n-1列的資訊
3. branch block上儲存所有列的資訊.
這三種觀點都有問題,首先觀點一
如果這個索引的leading column的可選擇性很差,例如status或者flag這樣的列,distinct value很少
這時候如果索引建立在 (status, id)上,對於
select * from table where status=0 and id=12345這樣的查詢,range scan並不能準確定位到 leaf block
對於觀點二,在n=2的時候顯然也可以用上面的例子來推翻
對於觀點三,是可以正確定位到leaf block,但是如果對於像
index(id,name) 其中name 假定name欄位型別為varchar(1000),這樣的索引效率也不好
事實上因為id列可選擇性很好,一般不需要再儲存name值到branch block中就可以區分左右的block
在branch block中儲存的name列的值浪費了branch block的空間,
而我們知道單個branch block中能夠儲存值的多少直接影響了樹的高度,而樹的高度顯然直接影響了index的效率
透過上面的分析,以上的三種觀點都是錯誤的,那麼oracle究竟如何來儲存列資訊才能既做到準確定位leaf block又不儲存多餘的值呢?
這裡首先給出試驗的結果,我沒有找到官方文件或者其他的文章來解釋這個結論,不過透過試驗來看結論是很清楚的,而且也符合上面提到的兩種考慮因素。
其實剛剛我們被限制住的原因是一直認為branch block中儲存的列的個數是個常數,但事實上儲存幾個列是變化的。
這個值只需要區分其左右指標所指向的兩個block的值就可以了,
更簡單的說,當oracle從上到下走到這個branch block的時候,
透過這個值可以準確的知道向左走還是向右走,
舉個例子,我們上面一開始提到的索引例子,
第一個是 index(status, id) , 如果左邊block的最大值為(0,33),右邊block的最小值為(0,35),
因為此時單純從第一列不能區分兩個block,這時就需要儲存兩列的資訊
如果左邊block最大值為(0,100),右邊block最小值為(1,1)
因為此時從第一列就可以區分出兩個block,所以只需要儲存第一列的資訊
這樣對於
select * from table where status=0 and id=12345
查詢,透過branch block中的值既可以知道“向左走,向右走”,準確定位leaf block
又可以避免不必要的浪費branch block空間,避免樹高度不必要的增高
下面是試驗的過程,直接copy以前自己發的郵件,懶得翻譯了790)this.width=790;" src="http://www.itpub.net/images/smilies/14.gif" />
We discussed the internal storage rule for composite index last time, but we didn’t get conclusion yet.
So I did a test for this.
SQL> create table test(n1 int,n2 int ,n3 int,v1 varchar2(2000),v2 varchar2(2000),v3 varchar2(2000));
Table created.
SQL> begin
2 for i in 1..1000 loop
3 insert into test values(0,i,i,rpad(’a',2000),rpad(’a',2000),rpad(’a',2000));
4 end loop;
5 end;
6 /
PL/SQL procedure successfully completed.
insert 1000 rows, the format is like this
0,1,1,xxxxxx,xxxxxxxxxx
0,2,2,xxxxxx,xxxxxxxxxx
0,3,3, xxxxxx,xxxxxxxxxx
……..
0,1000,1000, xxxxxx,xxxxxxxxxx
SQL> commit;
Commit complete.
create 7 indexes on this table, the naming rule is index_
_N (number) V (varchar)
SQL> create index index_2_nv on test(n1,v1);
Index created.
SQL> create index index_2_vn on test(v1,n1);
Index created.
SQL> create index index_3_nnv on test(n1,n2,v1);
Index created.
SQL> create index index_3_nvn on test(n1,v1,n2);
Index created.
SQL> create index index_4_nnnv on test(n1,n2,n3,v1);
Index created.
SQL> create index index_4_nnvn on test(n1,n2,v1,n3);
Index created.
SQL> create index index_5_nnnvv on test (n1,n2,n3,v1,v2);
Index created.
SQL> analyze table test compute statistics;
Table analyzed.
SQL> select index_name,PREFIX_LENGTH,BLEVEL,LEAF_BLOCKS from user_indexes
2 where lower(index_name) in (’index_2_nv’,'index_2_vn’,'index_3_nnv’,'index_3_nvn’,
3 ‘index_4_nnnv’,'index_4_nnvn’,'index_5_nnnvv’);
INDEX_NAME PREFIX_LENGTH BLEVEL LEAF_BLOCKS
—————————— ————- ———- ———–
INDEX_2_NV 5 334
INDEX_2_VN 5 334
INDEX_3_NNV 1 334
INDEX_3_NVN 5 334
INDEX_4_NNNV 1 334
INDEX_4_NNVN 1 334
INDEX_5_NNNVV 2 1000
7 rows selected.
You see, three indexes (INDEX_2_NV, INDEX_2_VN, INDEX_3_NVN) have level5. This means these indexes’ branch block contains varchar2000.
Then I dump these indexes’ root block to confirm this.
SQL> select segment_name,file_id,block_id from dba_extents where
2 lower(segment_name) in (’index_2_nv’,'index_2_vn’,'index_3_nnv’,'index_3_nvn’,
‘index_4_nnnv’,'index_4_nnvn’,'index_5_nnnvv’) and extent_id=0 order by segment_name 3 ;
SEGMENT_NAME FILE_ID BLOCK_ID
—————————— ———- ———-
INDEX_2_NV 9 16082
INDEX_2_VN 9 16097
INDEX_3_NNV 9 13587
INDEX_3_NVN 1 22053
INDEX_4_NNNV 1 32702
INDEX_4_NNVN 1 62230
INDEX_5_NNNVV 1 61697
7 rows selected.
SQL> @temp.sql
SQL> alter system dump datafile 9 block 16083;
System altered.
SQL> alter system dump datafile 9 block 16098;
System altered.
SQL> alter system dump datafile 9 block 13588;
System altered.
SQL> alter system dump datafile 1 block 22054;
System altered.
SQL> alter system dump datafile 1 block 32703;
System altered.
SQL> alter system dump datafile 1 block 62231;
System altered.
SQL> alter system dump datafile 1 block 61698;
System altered.
.790)this.width=790;" src="" />
It’s weird. The index INDEX_2_NV and INDEX_2_VN only have 2 columns.But in the root block dump file, we can find col0,col1 and col2(maybesomething related to rowid, I guess).
I am not sure about what the third col is, but it’s clear that the number of columns in branch block is related about the data.
0,1,1,xxxxxx,xxxxxxxxxx
0,2,2,xxxxxx,xxxxxxxxxx
0,3,3, xxxxxx,xxxxxxxxxx
……..
0,1000,1000, xxxxxx,xxxxxxxxxx
The index INDEX_2_NV, INDEX_2_VN can’t distinguish the leaf blocksusing the prefix two columns, so it added the third column (unknown,maybe something related to rowid, I guess).
The indexes INDEX_3_NNV, INDEX_4_NNNV, INDEX_4_NNVN and INDEX_5_NNNVVcan distinguish the leaf blocks using the prefix two columns, so thebranch blocks only contain 2 columns.
The index INDEX_3_NVN can should use three columns to distinguish theleft/right leaf blocks, so it contains 3 columns (because the prefixtwo columns’ values are the same).
You can check the dump file xfan_ora_25936.trc for details.
So I guess that the number of columns stored in branch block is related to linked leaf blocks’ data.
The columns which stored in branch blocks need to distinguish the left/right linked leaf blocks.
If the rule for composite index is like I guessed, then the columnsstored in branch blocks maybe different in one index branch block!!
So I did another test to prove this:
SQL> create table test2(n1 int,n2 int,n3 int,v1 varchar2(2000),v2 varchar2(2000));
Table created.
SQL> create index test_idx on test2(n1,n2,n3,v1,v2);
Index created.
SQL> begin
2 for i in 1..1000 loop
insert into test2 values(0,0,i,rpad(’a',2000),rpad(’a',2000));
end loop;
end;
/ 3 4 5 6
PL/SQL procedure successfully completed.
Insert first 1000 rows like this:
0, 0, 1, xxxxx,xxxxxx
0, 0, 2, xxxxx,xxxxxx
0, 0, 3, xxxxx,xxxxxx
……
0, 0, 1000, xxxxx,xxxxxx
SQL> SQL>
SQL> begin
2 for i in 1001..2000 loop
insert into test2 values(0,i,i,rpad(’a',2000),rpad(’a',2000));
end loop;
end;
/ 3 4 5 6
Then insert other 1000 rows like this:
0,1,1,xxxx,xxxx
0,2,2,xxxx,xxxx
0,3,3,xxxx,xxxx
…….
0,1000,1000,xxxx,xxxx
Then I did treedump and branch block dump.
Here is a snippet of the branch block dump:
row#454[2145] dba: 4218395=0×405e1b
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 3; (3): c2 0a 64
col 3; TERM
row#455[2133] dba: 4218396=0×405e1c
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c2 0b
col 3; TERM
row#456[2124] dba: 4218397=0×405e1d
col 0; len 1; (1): 80
col 1; len 1; (1): c2
col 2; TERM
row#457[2113] dba: 4218398=0×405e1e
col 0; len 1; (1): 80
col 1; len 3; (3): c2 0b 03
col 2; TERM
row#458[2102] dba: 4218399=0×405e1f
col 0; len 1; (1): 80
col 1; len 3; (3): c2 0b 04
col 2; TERM
You see, the numbers of columns changed in one branch block!!
The attachment index.trc is the dump file
So the conclusion is that the number of columns stored in branch block is related to linked leaf blocks’ data.
The branch block will store as less columns as possible to distinguish the left/right leaf blocks it linked
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/82387/viewspace-1027205/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- innodb索引的統計資訊不存在物理儲存上索引
- SQLServeronLinux列儲存索引SQLServerLinux索引
- 關於聯合索引,範圍查詢,時間列索引的幾個問題索引
- SQL Server 深入解析索引儲存(上)SQLServer索引
- 一個複合索引的優化案例索引優化
- index branch and leaf block structure(轉)IndexBloCStruct
- 什麼是行儲存和列儲存?正排索引和倒排索引?MySQL既不是倒排索引,也索引MySql
- SQLT 最佳化SQL 用複合索引代替單列索引的案例SQL索引
- 又一個複合索引的SQL調優索引SQL
- 索引特性之儲存列值及ROWID索引
- block底層儲存方式BloC
- 陣列去除重複的幾個方法陣列
- MySQL複合索引MySql索引
- 兩列複合索引中(交換索引中列的順序),選索引的原則是?因為索引名字ascii碼小?索引ASCII
- 在單個站點中恢復 exchange server上的資訊儲存Server
- 函式索引的儲存函式索引
- 分享幾種清理“其他儲存”並在Mac上獲得更多可用空間的方法Mac
- [zt] Oracle不使用索引的幾種情況Oracle索引
- SQLPLUS命令列儲存輸出資訊SQL命令列
- MySQL複合索引探究MySql索引
- MySQL建立複合索引MySql索引
- 關於ORACLE組合索引內部儲存淺談Oracle索引
- 複合索引中前導列對sql查詢的影響索引SQL
- 分享幾個有意思的陣列方法陣列
- SQL Server 列儲存索引 第二篇:設計SQLServer索引
- SQL Server 列儲存索引 第三篇:維護SQLServer索引
- SQL Server 列儲存索引 第一篇:概述SQLServer索引
- SQLServer2012列儲存索引技術SQLServer索引
- 索引儲存小記索引
- 幾個非常經典的對“資料倉儲”的解釋(ZT)
- Block型別及儲存區域BloC型別
- Oracle LOB儲存知識(zt)Oracle
- MongoDB複合索引詳解MongoDB索引
- SUM優化(複合索引)優化索引
- DataFrame刪除複合索引索引
- 資料儲存--面向列的儲存設計
- oracle block 格式 (zt)OracleBloC
- 聯合索引和多個單列索引使用中的索引命中情況及索引建立原則索引