【筆記】Oracle B-tree、點陣圖、全文索引三大索引效能比較及優缺點彙總
引言:大家都知道“效率”是資料庫中非常重要的一個指標,如何提高效率大家可能都會想起索引,但索引又這麼多種,什麼場合應該使用什麼索引呢?哪種索引可以提高我們的效率,哪種索引可以讓我們的效率大大降低(有時還不如全表掃描效能好)下面要講的“索引”如何成為我們的利器而不是災難!多說一點,由於不同索引的儲存結構不同,所以應用在不同組織結構的資料上,本篇文章重點就是:理解不同的技術都適合在什麼地方應用!
B-Tree索引
場合:非常適合資料重複度低的欄位 例如 身份證號碼 手機號碼 QQ號等欄位,常用於主鍵 唯一約束,一般在線上交易的專案中用到的多些。
場合:非常適合資料重複度低的欄位 例如 身份證號碼 手機號碼 QQ號等欄位,常用於主鍵 唯一約束,一般在線上交易的專案中用到的多些。
原理:一個鍵值對應一行(rowid) 格式: 【索引頭|鍵值|rowid】
優點:當沒有索引的時候,oracle只能全表掃描where qq=40354446 這個條件那麼這樣是灰常灰常耗時的,當資料量很大的時候簡直會讓人崩潰,那麼有個B-tree索引我們就像翻書目錄一樣,直接定位rowid立刻就找到了我們想要的資料,實質減少了I/O操作就提高速度,它有一個顯著特點查詢效能與表中資料量無關,例如 查2萬行的資料用了3 consistent get,當查詢1200萬行的資料時才用了4 consistent gets。
當我們的欄位中使用了主鍵or唯一約束時,不用想直接可以用B-tree索引
缺點:不適合鍵值重複率較高的欄位上使用,例如 第一章 1-500page 第二章 501-1000page
實驗:
alter system flush shared_pool; 清空共享池
alter system flush buffer_cache; 清空資料庫緩衝區,都是為了實驗需要
建立leo_t1 leo_t2 表
leo_t1 表的object_id列的資料是沒有重複值的,我們抽取了10行資料就可以看出來了。
> create table leo_t1 as select object_id,object_name from dba_objects;
> select count(*) from leo_t1;
優點:當沒有索引的時候,oracle只能全表掃描where qq=40354446 這個條件那麼這樣是灰常灰常耗時的,當資料量很大的時候簡直會讓人崩潰,那麼有個B-tree索引我們就像翻書目錄一樣,直接定位rowid立刻就找到了我們想要的資料,實質減少了I/O操作就提高速度,它有一個顯著特點查詢效能與表中資料量無關,例如 查2萬行的資料用了3 consistent get,當查詢1200萬行的資料時才用了4 consistent gets。
當我們的欄位中使用了主鍵or唯一約束時,不用想直接可以用B-tree索引
缺點:不適合鍵值重複率較高的欄位上使用,例如 第一章 1-500page 第二章 501-1000page
實驗:
alter system flush shared_pool; 清空共享池
alter system flush buffer_cache; 清空資料庫緩衝區,都是為了實驗需要
建立leo_t1 leo_t2 表
leo_t1 表的object_id列的資料是沒有重複值的,我們抽取了10行資料就可以看出來了。
> create table leo_t1 as select object_id,object_name from dba_objects;
> select count(*) from leo_t1;
COUNT(*)
----------
9872
> select * from leo_t1 where rownum <= 10;
----------
9872
> select * from leo_t1 where rownum <= 10;
OBJECT_ID OBJECT_NAME
---------- -----------
20 ICOL$
44 I_USER1
28 CON$
15 UNDO$
29 C_COBJ#
3 I_OBJ#
25 PROXY_ROLE_DATA$
39 I_IND1
51 I_CDEF2
26 I_PROXY_ROLE_DATA$_1
---------- -----------
20 ICOL$
44 I_USER1
28 CON$
15 UNDO$
29 C_COBJ#
3 I_OBJ#
25 PROXY_ROLE_DATA$
39 I_IND1
51 I_CDEF2
26 I_PROXY_ROLE_DATA$_1
leo_t2 表的object_id列我們是做了取餘操作,值就只有0,1兩種,因此重複率較高,如此設定為了說明重複率對B樹索引的影響
> create table leo_t2 as select mod(object_id,2) object_ID ,object_name from dba_objects;
> select count(*) from leo_t2;
> create table leo_t2 as select mod(object_id,2) object_ID ,object_name from dba_objects;
> select count(*) from leo_t2;
COUNT(*)
----------
9873
> select * from leo_t2 where rownum <= 10;
----------
9873
> select * from leo_t2 where rownum <= 10;
OBJECT_ID OBJECT_NAME
---------- -----------
0 ICOL$
0 I_USER1
0 CON$
1 UNDO$
1 C_COBJ#
1 I_OBJ#
1 PROXY_ROLE_DATA$
1 I_IND1
1 I_CDEF2
0 I_PROXY_ROLE_DATA$_1
> create index leo_t1_index on leo_t1(object_id); 建立B-tree索引,說明 預設建立的都是B-tree索引
Index created.
> create index leo_t2_index on leo_t2(object_ID); 建立B-tree索引
Index created.
讓我們看一下leo_t1與leo_t2的重複情況
> select count(distinct(object_id)) from leo_t1; 讓我們看一下leo_t1與leo_t2的重複情況,leo_t1沒有重複值,leo_t2有很多
COUNT(DISTINCT(OBJECT_ID))
--------------------------
9872
> select count(distinct(object_ID)) from leo_t2;
COUNT(DISTINCT(OBJECT_ID))
--------------------------
2
收集2個表統計資訊
> execute dbms_stats.gather_table_stats(ownname=>'LS',tabname=>'LEO_T1',method_opt=>'for all indexed columns size 2',cascade=>TRUE);
> execute dbms_stats.gather_table_stats(ownname=>'LS',tabname=>'LEO_T2',method_opt=>'for all indexed columns size 2',cascade=>TRUE);
引數詳解:
method_opt=>'for all indexed columns size 2' size_clause=integer 整型 ,範圍 1~254 ,使用柱狀圖[ histogram analyze ]分析列資料的分佈情況
cascade=>TRUE 收集表的統計資訊的同時收集B-tree索引的統計資訊
顯示執行計劃和統計資訊+設定autotrace簡介
序號 命令 解釋
1 SET AUTOTRACE OFF 此為預設值,即關閉Autotrace
2 SET AUTOTRACE ON EXPLAIN 只顯示執行計劃
3 SET AUTOTRACE ON STATISTICS 只顯示執行的統計資訊
4 SET AUTOTRACE ON 包含2,3兩項內容
5 SET AUTOTRACE TRACEONLY 與ON相似,但不顯示語句的執行結果
---------- -----------
0 ICOL$
0 I_USER1
0 CON$
1 UNDO$
1 C_COBJ#
1 I_OBJ#
1 PROXY_ROLE_DATA$
1 I_IND1
1 I_CDEF2
0 I_PROXY_ROLE_DATA$_1
> create index leo_t1_index on leo_t1(object_id); 建立B-tree索引,說明 預設建立的都是B-tree索引
Index created.
> create index leo_t2_index on leo_t2(object_ID); 建立B-tree索引
Index created.
讓我們看一下leo_t1與leo_t2的重複情況
> select count(distinct(object_id)) from leo_t1; 讓我們看一下leo_t1與leo_t2的重複情況,leo_t1沒有重複值,leo_t2有很多
COUNT(DISTINCT(OBJECT_ID))
--------------------------
9872
> select count(distinct(object_ID)) from leo_t2;
COUNT(DISTINCT(OBJECT_ID))
--------------------------
2
收集2個表統計資訊
> execute dbms_stats.gather_table_stats(ownname=>'LS',tabname=>'LEO_T1',method_opt=>'for all indexed columns size 2',cascade=>TRUE);
> execute dbms_stats.gather_table_stats(ownname=>'LS',tabname=>'LEO_T2',method_opt=>'for all indexed columns size 2',cascade=>TRUE);
引數詳解:
method_opt=>'for all indexed columns size 2' size_clause=integer 整型 ,範圍 1~254 ,使用柱狀圖[ histogram analyze ]分析列資料的分佈情況
cascade=>TRUE 收集表的統計資訊的同時收集B-tree索引的統計資訊
顯示執行計劃和統計資訊+設定autotrace簡介
序號 命令 解釋
1 SET AUTOTRACE OFF 此為預設值,即關閉Autotrace
2 SET AUTOTRACE ON EXPLAIN 只顯示執行計劃
3 SET AUTOTRACE ON STATISTICS 只顯示執行的統計資訊
4 SET AUTOTRACE ON 包含2,3兩項內容
5 SET AUTOTRACE TRACEONLY 與ON相似,但不顯示語句的執行結果
結果鍵值少的情況
set autotrace trace exp stat; (SET AUTOTRACE OFF 關閉執行計劃和統計資訊)
> select * from leo_t1 where object_id=1;
no rows selected
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 3712193284
set autotrace trace exp stat; (SET AUTOTRACE OFF 關閉執行計劃和統計資訊)
> select * from leo_t1 where object_id=1;
no rows selected
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 3712193284
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| LEO_T1 | 1 | 21 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN索引掃描 | LEO_T1_INDEX | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| LEO_T1 | 1 | 21 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN索引掃描 | LEO_T1_INDEX | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=1)
---------------------------------------------------
2 - access("OBJECT_ID"=1)
Statistics 統計資訊
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets 我們知道leo_t1表的object_id沒有重複值,因此使用B-tree索引掃描只有2次一致性讀
0 physical reads
0 redo size
339 bytes sent via SQL*Net to client
370 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
結果鍵值多的情況
> select * from leo_t2 where object_ID=1; (select /*+full(leo_t2) */ * from leo_t2 where object_ID=1;hint方式強制全表掃描)
4943 rows selected.
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 3657048469
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4943 | 98860 | 12 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| LEO_T2 | 4943 | 98860 | 12 (0)| 00:00:01 | sql結果是4943row,那麼全表掃描也是4943row
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_ID"=1)
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets 我們知道leo_t1表的object_id沒有重複值,因此使用B-tree索引掃描只有2次一致性讀
0 physical reads
0 redo size
339 bytes sent via SQL*Net to client
370 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
結果鍵值多的情況
> select * from leo_t2 where object_ID=1; (select /*+full(leo_t2) */ * from leo_t2 where object_ID=1;hint方式強制全表掃描)
4943 rows selected.
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 3657048469
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4943 | 98860 | 12 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| LEO_T2 | 4943 | 98860 | 12 (0)| 00:00:01 | sql結果是4943row,那麼全表掃描也是4943row
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_ID"=1)
Statistics 統計資訊
----------------------------------------------------------
1 recursive calls
0 db block gets
366 consistent gets 導致有366次一致性讀
0 physical reads
0 redo size
154465 bytes sent via SQL*Net to client
4000 bytes received via SQL*Net from client
331 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4943 rows processed
大家肯定會疑惑,為什麼要用全表掃描而不用B-tree索引呢,這是因為oracle基於成本最佳化器CBO認為使用全表掃描要比使用B-tree索引效能更好更快,由於我們結果重複率很高,導致有366次一致性讀,從cup使用率12%上看也說明了B-tree索引不適合鍵值重複率較高的列
我們在看一下強制使用B-tree索引時,效率是不是沒有全表掃描高呢?
> select /*+index(leo_t2 leo_t2_index) */ * from leo_t2 where object_ID=1; hint方式強制索引掃描
4943 rows selected.
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 321706586
----------------------------------------------------------
1 recursive calls
0 db block gets
366 consistent gets 導致有366次一致性讀
0 physical reads
0 redo size
154465 bytes sent via SQL*Net to client
4000 bytes received via SQL*Net from client
331 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4943 rows processed
大家肯定會疑惑,為什麼要用全表掃描而不用B-tree索引呢,這是因為oracle基於成本最佳化器CBO認為使用全表掃描要比使用B-tree索引效能更好更快,由於我們結果重複率很高,導致有366次一致性讀,從cup使用率12%上看也說明了B-tree索引不適合鍵值重複率較高的列
我們在看一下強制使用B-tree索引時,效率是不是沒有全表掃描高呢?
> select /*+index(leo_t2 leo_t2_index) */ * from leo_t2 where object_ID=1; hint方式強制索引掃描
4943 rows selected.
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 321706586
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4943 | 98860 | 46 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| LEO_T2 | 4943 | 98860 | 46 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | LEO_T2_INDEX | 4943 | | 10 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4943 | 98860 | 46 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| LEO_T2 | 4943 | 98860 | 46 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | LEO_T2_INDEX | 4943 | | 10 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=1)
---------------------------------------------------
2 - access("OBJECT_ID"=1)
Statistics 統計資訊
----------------------------------------------------------
1 recursive calls
0 db block gets
704 consistent gets 使用B-tree索引704次一致性讀 > 全表掃描366次一致性讀,而且cpu使用率也非常高,顯然效果沒有全表掃描高
0 physical reads
0 redo size
171858 bytes sent via SQL*Net to client
4000 bytes received via SQL*Net from client
331 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4943 rows processed
小結:從以上的測試我們可以瞭解到,B-tree索引在什麼情況下使用跟鍵值重複率高低有很大關係的,之間沒有一個明確的分水嶺,只能多測試分析執行計劃後來決定。
----------------------------------------------------------
1 recursive calls
0 db block gets
704 consistent gets 使用B-tree索引704次一致性讀 > 全表掃描366次一致性讀,而且cpu使用率也非常高,顯然效果沒有全表掃描高
0 physical reads
0 redo size
171858 bytes sent via SQL*Net to client
4000 bytes received via SQL*Net from client
331 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4943 rows processed
小結:從以上的測試我們可以瞭解到,B-tree索引在什麼情況下使用跟鍵值重複率高低有很大關係的,之間沒有一個明確的分水嶺,只能多測試分析執行計劃後來決定。
點陣圖索引 Bitmap index
場合:列的基數很少,可列舉,重複值很多,資料不會被經常更新
原理:一個鍵值對應很多行(rowid), 格式:鍵值 start_rowid end_rowid 點陣圖
優點:OLAP 例如報表類資料庫 重複率高的資料 特定型別的查詢例如count、or、and等邏輯操作因為只需要進行位運算即可得到我們需要的結果
缺點:不適合重複率低的欄位,還有經常DML操作(insert,update,delete),因為點陣圖索引的鎖代價極高,修改一個點陣圖索引段影響整個點陣圖段,例如修改
場合:列的基數很少,可列舉,重複值很多,資料不會被經常更新
原理:一個鍵值對應很多行(rowid), 格式:鍵值 start_rowid end_rowid 點陣圖
優點:OLAP 例如報表類資料庫 重複率高的資料 特定型別的查詢例如count、or、and等邏輯操作因為只需要進行位運算即可得到我們需要的結果
缺點:不適合重複率低的欄位,還有經常DML操作(insert,update,delete),因為點陣圖索引的鎖代價極高,修改一個點陣圖索引段影響整個點陣圖段,例如修改
一個鍵值,會影響同鍵值的多行,所以對於OLTP 系統點陣圖索引基本上是不適用的
實驗:點陣圖索引和B-tree索引的效能比較
set pagesize 100; 設定頁大小
利用dba_objects資料字典建立一個15萬行的表
> create table leo_bm_t1 as select * from dba_objects;
Table created.
> insert into leo_bm_t1 select * from leo_bm_t1; 翻倍插入
9876 rows created.
> /
19752 rows created.
> /
39504 rows created.
> /
79008 rows created.
> /
158016 rows created.
因object_type欄位重複值較高,顧在此欄位上建立bitmap索引
> create bitmap index leo_bm_t1_index on leo_bm_t1(object_type);
Index created.
建立一個和leo_bm_t1表結構一模一樣的表leo_bm_t2,並在object_type列上建立一個B-tree索引(15萬行記錄)
> create table leo_bm_t2 as select * from leo_bm_t1;
Table created.
> create index leo_bm_t2_bt_index on leo_bm_t2(object_type);
Index created.
對比點陣圖索引和B-tree索引所佔空間大小,很明顯點陣圖要遠遠小於B-tree索引所佔用的空間,節約空間特性也是我們選擇點陣圖的理由之一
> select segment_name,bytes from user_segments where segment_type='INDEX';
SEGMENT_NAME BYTES
--------------------------------------------------------------------------------- ----------
LEO_BM_T1_INDEX 327680(327K)
LEO_BM_T2_BT_INDEX 7340032(7M)
顯示執行計劃和統計資訊
set autotrace trace exp stat;
在建立有點陣圖索引的表上做count操作對比執行計劃
> select count(*) from leo_bm_t1 where object_type='TABLE';
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 3251686305
實驗:點陣圖索引和B-tree索引的效能比較
set pagesize 100; 設定頁大小
利用dba_objects資料字典建立一個15萬行的表
> create table leo_bm_t1 as select * from dba_objects;
Table created.
> insert into leo_bm_t1 select * from leo_bm_t1; 翻倍插入
9876 rows created.
> /
19752 rows created.
> /
39504 rows created.
> /
79008 rows created.
> /
158016 rows created.
因object_type欄位重複值較高,顧在此欄位上建立bitmap索引
> create bitmap index leo_bm_t1_index on leo_bm_t1(object_type);
Index created.
建立一個和leo_bm_t1表結構一模一樣的表leo_bm_t2,並在object_type列上建立一個B-tree索引(15萬行記錄)
> create table leo_bm_t2 as select * from leo_bm_t1;
Table created.
> create index leo_bm_t2_bt_index on leo_bm_t2(object_type);
Index created.
對比點陣圖索引和B-tree索引所佔空間大小,很明顯點陣圖要遠遠小於B-tree索引所佔用的空間,節約空間特性也是我們選擇點陣圖的理由之一
> select segment_name,bytes from user_segments where segment_type='INDEX';
SEGMENT_NAME BYTES
--------------------------------------------------------------------------------- ----------
LEO_BM_T1_INDEX 327680(327K)
LEO_BM_T2_BT_INDEX 7340032(7M)
顯示執行計劃和統計資訊
set autotrace trace exp stat;
在建立有點陣圖索引的表上做count操作對比執行計劃
> select count(*) from leo_bm_t1 where object_type='TABLE';
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 3251686305
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
| 2 | BITMAP CONVERSION COUNT | | 36315 | 390K| 4 (0)| 00:00:01 |
|* 3 | BITMAP INDEX SINGLE VALUE| LEO_BM_T1_INDEX | | | | |
-----------------------------------------------------------------------------------------------
點陣圖索引上只掃描了一個值
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OBJECT_TYPE"='TABLE')
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
| 2 | BITMAP CONVERSION COUNT | | 36315 | 390K| 4 (0)| 00:00:01 |
|* 3 | BITMAP INDEX SINGLE VALUE| LEO_BM_T1_INDEX | | | | |
-----------------------------------------------------------------------------------------------
點陣圖索引上只掃描了一個值
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OBJECT_TYPE"='TABLE')
Note
-----
- dynamic sampling used for this statement 動態取樣
-----
- dynamic sampling used for this statement 動態取樣
Statistics 統計資訊
----------------------------------------------------------
9 recursive calls
0 db block gets
93 consistent gets oracle選擇使用點陣圖索引訪問資料,導致93次一致性讀
7 physical reads
0 redo size
413 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
----------------------------------------------------------
9 recursive calls
0 db block gets
93 consistent gets oracle選擇使用點陣圖索引訪問資料,導致93次一致性讀
7 physical reads
0 redo size
413 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
在建立有B-tree索引的表上做count操作對比執行計劃
> select count(*) from leo_bm_t2 where object_type='TABLE';
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 613433245
> select count(*) from leo_bm_t2 where object_type='TABLE';
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 613433245
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 59 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | INDEX RANGE SCAN| LEO_BM_T2_BT_INDEX | 25040 | 268K| 59 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
B-tree索引上全部掃描,cpu使用率達到了59%,比點陣圖索引cpu使用率4%高出許多
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"='TABLE')
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 59 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | INDEX RANGE SCAN| LEO_BM_T2_BT_INDEX | 25040 | 268K| 59 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
B-tree索引上全部掃描,cpu使用率達到了59%,比點陣圖索引cpu使用率4%高出許多
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"='TABLE')
Note
-----
- dynamic sampling used for this statement 動態取樣
Statistics 統計資訊
----------------------------------------------------------
32 recursive calls
0 db block gets
161 consistent gets B-tree索引表上發生了161次一致性讀要遠遠高於點陣圖索引表上93次一致性讀,因此還是點陣圖索引效率高
74 physical reads
0 redo size
413 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
我們再看看等值查詢where object_type='TABLE'情況下點陣圖索引和B-tree索引的效能對比
> select * from leo_bm_t1 where object_type='TABLE' ;
28512 rows selected.
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 4228542614
-----
- dynamic sampling used for this statement 動態取樣
Statistics 統計資訊
----------------------------------------------------------
32 recursive calls
0 db block gets
161 consistent gets B-tree索引表上發生了161次一致性讀要遠遠高於點陣圖索引表上93次一致性讀,因此還是點陣圖索引效率高
74 physical reads
0 redo size
413 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
我們再看看等值查詢where object_type='TABLE'情況下點陣圖索引和B-tree索引的效能對比
> select * from leo_bm_t1 where object_type='TABLE' ;
28512 rows selected.
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 4228542614
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36315 | 6277K| 562 (0)| 00:00:07 |
| 1 | TABLE ACCESS BY INDEX ROWID | LEO_BM_T1 | 36315 | 6277K| 562 (0)| 00:00:07 |
| 2 | BITMAP CONVERSION TO ROWIDS| 點陣圖映像->rowids | | | | |
|* 3 | BITMAP INDEX SINGLE VALUE | LEO_BM_T1_INDEX | | | | |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OBJECT_TYPE"='TABLE')
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36315 | 6277K| 562 (0)| 00:00:07 |
| 1 | TABLE ACCESS BY INDEX ROWID | LEO_BM_T1 | 36315 | 6277K| 562 (0)| 00:00:07 |
| 2 | BITMAP CONVERSION TO ROWIDS| 點陣圖映像->rowids | | | | |
|* 3 | BITMAP INDEX SINGLE VALUE | LEO_BM_T1_INDEX | | | | |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OBJECT_TYPE"='TABLE')
Note
-----
- dynamic sampling used for this statement動態取樣
-----
- dynamic sampling used for this statement動態取樣
Statistics 統計資訊
----------------------------------------------------------
7 recursive calls
0 db block gets
4407 consistent gets 使用點陣圖索引發生了4407次一致性讀
0 physical reads
0 redo size
2776927 bytes sent via SQL*Net to client
21281 bytes received via SQL*Net from client
1902 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
28512 rows processed
leo_bm_t2表上使用B-tree索引得到執行計劃
> select /*+index(leo_bm_t2 leo_bm_t2_bt_index) */ * from leo_bm_t2 where object_type='TABLE' ;
28512 rows selected. 我們強制使用B-tree索引掃描等值條件
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 1334503202
----------------------------------------------------------
7 recursive calls
0 db block gets
4407 consistent gets 使用點陣圖索引發生了4407次一致性讀
0 physical reads
0 redo size
2776927 bytes sent via SQL*Net to client
21281 bytes received via SQL*Net from client
1902 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
28512 rows processed
leo_bm_t2表上使用B-tree索引得到執行計劃
> select /*+index(leo_bm_t2 leo_bm_t2_bt_index) */ * from leo_bm_t2 where object_type='TABLE' ;
28512 rows selected. 我們強制使用B-tree索引掃描等值條件
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 1334503202
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25040 | 4328K| 2063 (1)| 00:00:25 |
| 1 | TABLE ACCESS BY INDEX ROWID| LEO_BM_T2 | 25040 | 4328K| 2063 (1)| 00:00:25 |
|* 2 | INDEX RANGE SCAN | LEO_BM_T2_BT_INDEX | 25040 | | 59 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
B-tree索引上全部掃描,cpu使用率達到了2063%,比點陣圖索引cpu使用率562%高出許多
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"='TABLE')
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25040 | 4328K| 2063 (1)| 00:00:25 |
| 1 | TABLE ACCESS BY INDEX ROWID| LEO_BM_T2 | 25040 | 4328K| 2063 (1)| 00:00:25 |
|* 2 | INDEX RANGE SCAN | LEO_BM_T2_BT_INDEX | 25040 | | 59 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
B-tree索引上全部掃描,cpu使用率達到了2063%,比點陣圖索引cpu使用率562%高出許多
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"='TABLE')
Note
-----
- dynamic sampling used for this statement
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
6621 consistent gets
0 physical reads
0 redo size
2776927 bytes sent via SQL*Net to client
21281 bytes received via SQL*Net from client
1902 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
28512 rows processed
小結:在等值查詢中我們可以看出點陣圖索引的效率依言高於B-tree索引
全文索引 Text index
定義:全文索引就是透過將文字按照某種語言進行詞彙拆分,重新將資料組合儲存,來達到快速檢索的目的
場合:當欄位裡儲存的都是文字時適合用全文索引,常用於搜尋文字
優點:全文索引不是按照鍵值儲存的,而是按照分詞重組資料,常用於模糊查詢Where name like '%leonarding%'效率比全表掃描高很多,適用OLAP系統,
定義:全文索引就是透過將文字按照某種語言進行詞彙拆分,重新將資料組合儲存,來達到快速檢索的目的
場合:當欄位裡儲存的都是文字時適合用全文索引,常用於搜尋文字
優點:全文索引不是按照鍵值儲存的,而是按照分詞重組資料,常用於模糊查詢Where name like '%leonarding%'效率比全表掃描高很多,適用OLAP系統,
OLTP系統裡面用到的並不多。
缺點:全文索引會佔用大量空間有時比原表本身佔的空間還多,bug較多,維護困難。
實驗:全文索引效能優勢
建立一個表包含2個欄位
> create table leo_text_t1 (id int,name varchar(10));
Table created.
在name欄位上建立B-tree索引,但檢索的時候並沒有用,還是全表掃描
> create index leo_text_t1_bt_index on leo_text_t1(name);
Index created.
插入4條記錄
insert into leo_text_t1 values(1,'Tom');
insert into leo_text_t1 values(2,'Tom Tom');
insert into leo_text_t1 values(1,'Tom');
insert into leo_text_t1 values(2,'Tom Tom');
commit;
> select * from leo_text_t1;
缺點:全文索引會佔用大量空間有時比原表本身佔的空間還多,bug較多,維護困難。
實驗:全文索引效能優勢
建立一個表包含2個欄位
> create table leo_text_t1 (id int,name varchar(10));
Table created.
在name欄位上建立B-tree索引,但檢索的時候並沒有用,還是全表掃描
> create index leo_text_t1_bt_index on leo_text_t1(name);
Index created.
插入4條記錄
insert into leo_text_t1 values(1,'Tom');
insert into leo_text_t1 values(2,'Tom Tom');
insert into leo_text_t1 values(1,'Tom');
insert into leo_text_t1 values(2,'Tom Tom');
commit;
> select * from leo_text_t1;
ID NAME
---------- ----------
1 Tom
2 Tom Tom
1 Tom
2 Tom Tom
我們在建立一個表,並在name欄位上建立全文索引
create table leo_text_t2 as select * from leo_text_t1;
建立全文索引的前提
ORACLE10g 建立全文索引過程:
1,首先檢視ORACLE是否已安裝“全文檢索工具”
透過檢視是否存在 CTXSYS 使用者,CTXAPP角色即可判斷。
> select username from dba_users;
USERNAME
------------------------------
LS
CTXSYS 預設是沒有的,需要安裝2個指令碼catctx.sql,drdefus.sql
2,如果ORACLE沒有安裝“全文檢索工具”,則使用以下步驟手工安裝。
a)進入ORACLE安裝目錄
cd $ORACLE_HOME
b)使用 DBA 角色登陸資料庫
sqlplus sys/sys as sysdba
c)檢視錶空間檔案存放路徑
select name from v$datafile;
d)為 CTXSYS 使用者建立表空間
CREATE TABLESPACE ctxsys
LOGGING
DATAFILE '/u01/app/oracle/oradata/LEO/file1/ctxsys01.dbf'
SIZE 32m
AUTOEXTEND ON
NEXT 32m MAXSIZE 2048m
EXTENT MANAGEMENT LOCAL ;
e)建立 CTXSYS 使用者,建立 CTXAPP 角色
@?/ctx/admin/catctx.sql ctxsys ctxsys temp1 nolock
--(密碼、表空間、臨時表空間、使用者狀態)
--如果當前sql指令碼無執行許可權,請手工新增。
f)為 CTXSYS 執行初始化工作,如果沒有此操作,後續操作會失敗。
connect ctxsys/ctxsys;
@?/ctx/admin/defaults/drdefus.sql
3,建立全文索引
a)建立詞法分析器及相關表
--詞法分析器
execute ctx_ddl.create_preference('offerProdAddrLexer','CHINESE_LEXER');
--詞法
execute ctx_ddl.create_preference('offerProdAddrList', 'BASIC_WORDLIST');
execute ctx_ddl.set_attribute('offerProdAddrList','PREFIX_INDEX','TRUE');
execute ctx_ddl.set_attribute('offerProdAddrList','PREFIX_MIN_LENGTH',1);
execute ctx_ddl.set_attribute('offerProdAddrList','PREFIX_MAX_LENGTH', 5);
execute ctx_ddl.set_attribute('offerProdAddrList','SUBSTRING_INDEX', 'YES');
b)建立全文索引
> conn ctxsys/ctxsys
Connected.
> create index ls.leo_text_t2_text_index on ls.leo_text_t2(name) indextype is ctxsys.context;
> conn ls/ls
Connected.
> set autotrace on;
> select * from leo_text_t1 where name like '%Tom%';
---------- ----------
1 Tom
2 Tom Tom
1 Tom
2 Tom Tom
我們在建立一個表,並在name欄位上建立全文索引
create table leo_text_t2 as select * from leo_text_t1;
建立全文索引的前提
ORACLE10g 建立全文索引過程:
1,首先檢視ORACLE是否已安裝“全文檢索工具”
透過檢視是否存在 CTXSYS 使用者,CTXAPP角色即可判斷。
> select username from dba_users;
USERNAME
------------------------------
LS
CTXSYS 預設是沒有的,需要安裝2個指令碼catctx.sql,drdefus.sql
2,如果ORACLE沒有安裝“全文檢索工具”,則使用以下步驟手工安裝。
a)進入ORACLE安裝目錄
cd $ORACLE_HOME
b)使用 DBA 角色登陸資料庫
sqlplus sys/sys as sysdba
c)檢視錶空間檔案存放路徑
select name from v$datafile;
d)為 CTXSYS 使用者建立表空間
CREATE TABLESPACE ctxsys
LOGGING
DATAFILE '/u01/app/oracle/oradata/LEO/file1/ctxsys01.dbf'
SIZE 32m
AUTOEXTEND ON
NEXT 32m MAXSIZE 2048m
EXTENT MANAGEMENT LOCAL ;
e)建立 CTXSYS 使用者,建立 CTXAPP 角色
@?/ctx/admin/catctx.sql ctxsys ctxsys temp1 nolock
--(密碼、表空間、臨時表空間、使用者狀態)
--如果當前sql指令碼無執行許可權,請手工新增。
f)為 CTXSYS 執行初始化工作,如果沒有此操作,後續操作會失敗。
connect ctxsys/ctxsys;
@?/ctx/admin/defaults/drdefus.sql
3,建立全文索引
a)建立詞法分析器及相關表
--詞法分析器
execute ctx_ddl.create_preference('offerProdAddrLexer','CHINESE_LEXER');
--詞法
execute ctx_ddl.create_preference('offerProdAddrList', 'BASIC_WORDLIST');
execute ctx_ddl.set_attribute('offerProdAddrList','PREFIX_INDEX','TRUE');
execute ctx_ddl.set_attribute('offerProdAddrList','PREFIX_MIN_LENGTH',1);
execute ctx_ddl.set_attribute('offerProdAddrList','PREFIX_MAX_LENGTH', 5);
execute ctx_ddl.set_attribute('offerProdAddrList','SUBSTRING_INDEX', 'YES');
b)建立全文索引
> conn ctxsys/ctxsys
Connected.
> create index ls.leo_text_t2_text_index on ls.leo_text_t2(name) indextype is ctxsys.context;
> conn ls/ls
Connected.
> set autotrace on;
> select * from leo_text_t1 where name like '%Tom%';
ID NAME
---------- ----------
1 Tom
2 Tom Tom
1 Tom
2 Tom Tom
---------- ----------
1 Tom
2 Tom Tom
1 Tom
2 Tom Tom
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 3687902158
----------------------------------------------------------
Plan hash value: 3687902158
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 80 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| LEO_TEXT_T1 | 4 | 80 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
全表掃描,cpu使用率3%
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME" LIKE '%Tom%')
Note
-----
- dynamic sampling used for this statement動態取樣
Statistics 統計資訊
----------------------------------------------------------
56 recursive calls
0 db block gets
23 consistent gets 全表掃描沒有使用B-tree索引,導致23次一致性讀
0 physical reads
0 redo size
538 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
4 rows processed
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 80 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| LEO_TEXT_T1 | 4 | 80 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
全表掃描,cpu使用率3%
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME" LIKE '%Tom%')
Note
-----
- dynamic sampling used for this statement動態取樣
Statistics 統計資訊
----------------------------------------------------------
56 recursive calls
0 db block gets
23 consistent gets 全表掃描沒有使用B-tree索引,導致23次一致性讀
0 physical reads
0 redo size
538 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
4 rows processed
> select * from leo_text_t2 where contains(name,'Tom')>0;
ID NAME
---------- ----------
1 Tom
2 Tom Tom
1 Tom
2 Tom Tom
---------- ----------
1 Tom
2 Tom Tom
1 Tom
2 Tom Tom
Execution Plan 執行計劃
----------------------------------------------------------
Plan hash value: 2789465217
----------------------------------------------------------
Plan hash value: 2789465217
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| LEO_TEXT_T2 | 1 | 32 | 4 (0)| 00:00:01 |
|* 2 | DOMAIN INDEX | LEO_TEXT_T2_TEXT_INDEX | | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| LEO_TEXT_T2 | 1 | 32 | 4 (0)| 00:00:01 |
|* 2 | DOMAIN INDEX | LEO_TEXT_T2_TEXT_INDEX | | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CTXSYS"."CONTAINS"("NAME",'Tom')>0)
---------------------------------------------------
2 - access("CTXSYS"."CONTAINS"("NAME",'Tom')>0)
Note
-----
- dynamic sampling used for this statement動態取樣
-----
- dynamic sampling used for this statement動態取樣
Statistics 統計資訊
----------------------------------------------------------
11 recursive calls
0 db block gets
19 consistent gets
0 physical reads
0 redo size
545 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed
小結:從如上實驗來看,當我們檢索大量文字的時候使用全文索引要比全表掃描快很多了,有弊就有利,由於全文索引會佔用大量空間提前預估全文索引大小保留出足夠的空間,context型別全文索引不是基於事務的,無法保證索引和資料實時同步,DML完成後,如果在全文索引中查不到鍵值時,可以透過手工or定時任務來重新整理同步,而B-tree、點陣圖都是實時的。
總結:本次實驗了B-tree 點陣圖 全文三大索引的效能,同時比較了各自適合場合和用途,還總結了各自的優缺點,由於水平有限有不足之處還請大家指點。
Leonarding
2012.7.31
天津&summer
分享~收穫快樂
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/26686207/viewspace-738991/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 【索引】Bitmap點陣圖索引與普通的B-Tree索引鎖的比較索引
- B-Tree索引與Bitmap點陣圖索引的鎖代價比較研究索引
- 【Bitmap Index】B-Tree索引與Bitmap點陣圖索引的鎖代價比較研究Index索引
- LVS三種模式配置及優點缺點比較模式
- Apache與Nginx優缺點比較ApacheNginx
- Oracle索引——點陣圖索引Oracle索引
- Apache與Nginx的優缺點比較ApacheNginx
- css與 js動畫 優缺點比較CSSJS動畫
- Apache與Nginx的優缺點、效能比較,到底選擇哪個比較好?ApacheNginx
- 【實驗】【索引壓縮】索引壓縮演示及優缺點總結索引
- Oracle-點陣圖索引Oracle索引
- SQL、NoSQL和NewSQL的優缺點比較SQL
- Solr與Elasticsearch的優缺點比較總結和歸納SolrElasticsearch
- MySQL索引的優缺點MySql索引
- oracle 點陣圖索引(bitmap index)Oracle索引Index
- 開源系統監控工具Nagios、Zabbix和Open-Falcon的功能特性彙總及優缺點比較iOS
- AWS RDS MySQL和MariaDB Galera Cluster的優缺點比較總結MySql
- 【原創】Oracle之range,hash,list分割槽現實應用及優缺點彙總Oracle
- 點陣圖索引(Bitmap Index)——從B*樹索引到點陣圖索引索引Index
- oracle 海量資料之利器“資料壓縮”實際應用及優缺點彙總Oracle
- 點陣圖索引.sql索引SQL
- TokuDB優缺點總結
- 全文索引的效能優化索引優化
- 資料庫索引的作用和優點缺點資料庫索引
- Oracle幾種表的優缺點總結Oracle
- memcached 和 redis 使用場景及優缺點對比Redis
- kookeey、Luminati 和 Smartproxy 海外代理的特點和優缺點分析比較
- 點陣圖索引(Bitmap Index)——點陣圖索引與資料DML鎖定索引Index
- mysql和Oracle的特點,優缺點MySqlOracle
- 點陣圖索引(Bitmap Index)——索引共用索引Index
- 機器學習演算法優缺點對比及選擇(彙總篇)機器學習演算法
- Android系統編譯指令make 、mmm、mm優缺點比較Android編譯
- DDD CQRS架構和傳統架構的優缺點比較架構
- 【基礎知識】索引--點陣圖索引索引
- oracle點陣圖索引對DML操作的影響Oracle索引
- 關於ORACLE點陣圖索引內部淺論Oracle索引
- MVP 與 MVVM 優缺點總結MVPMVVM
- 點陣圖索引:原理(BitMap index)索引Index