clustering factor索引聚簇因子和執行計劃

dotaddjj發表於2011-11-08

今天在pub上看見網友提到了關於索引的順序讀,談談個人自己對索引的理解吧!走索引(這裡的索引不包括全索引掃描和全索引快速掃描)和全表掃描一直是我們這些初學者對cbo執行計劃的迷糊地方,何時走索引,為什麼不走索引,群裡的朋友經常問到這些問題,其實自己之前的blog也多多少少提到了索引的一些知識點,自己多索引的理解也是從最開始只知道索引效率高到現在慢慢知道clustering factor,結構,儲存資料的原理,如何查詢資料。

簡要摘要一下clustering factor的理解。

Clutering factor

索引聚簇因子,也就是表中row儲存的順序,clustering factor越低,相應的rows儲存越集中,相反則rows儲存越分散。

全表掃描oracle採取的是多塊讀,而索引掃描採取的是單塊讀取,當clustering factor過大時,oracle會重複讀取多個資料塊,將導致I/O消耗,而sqlcost最重要的也就是I/Ocpunetwork,所以cbo很有可能會選擇它認為cost最小的執行計劃,從而影響sql執行效率。

Clustering factor也可以認為是透過索引掃描一張表需要訪問的表的資料塊的數量,I/O影響。

Clustering factore的取值是如何計算出來的:

其實也是透過index,比較row的當前rowid和前一行的rowid,如果相鄰兩個rowid不屬於同一資料塊(在index中的rowid是受限rowid,由檔案號block_id和行號組成)則cluster factor增加1.然後求和clutering factor的值,而計算的索引的cost則是由clustering factor乘以某個選擇性引數即時訪問索引開銷。

SQL> select * from v$version;

BANNER

----------------------------------------------------------------

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod

PL/SQL Release 10.2.0.1.0 - Production

CORE10.2.0.1.0 Production

TNS for 32-bit Windows: Version 10.2.0.1.0 - Production

NLSRTL Version 10.2.0.1.0 - Production

SQL> show parameter optimizer_index_cost_adj;

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

optimizer_index_cost_adj integer 50

SQL> declare

2 begin

3 for i in 1..5 loop

4 insert into test31 select * from dba_objects order by owner;

5 commit;

6 end loop;

7 end;

8 /

PL/SQL procedure successfully completed

SQL> create index index_test31 on test31(object_id);

Index created

SQL> execute dbms_stats.gather_table_stats('ashuang','test31',cascade=>true);

PL/SQL procedure successfully completed

SQL> select blocks from user_tables where table_name=upper('test31');

BLOCKS

----------

3520

SQL> select clustering_factor from user_indexes where index_name=upper('index_test31');

CLUSTERING_FACTOR

-----------------

255260

SQL> select blocks,num_rows from user_tables where table_name=upper('test31');

BLOCKS NUM_ROWS

---------- ----------

3520 255260

--此時num_rowsclustering_factor接近,資料已經完全分散

SQL> explain plan for select * from test31 where object_id in ('60998','8789','7889');

Explained

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 3334622187

--------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 14 | 1302 | 9 (0

| 1 | INLIST ITERATOR | | | |

| 2 | TABLE ACCESS BY INDEX ROWID| TEST31 | 14 | 1302 | 9 (0

|* 3 | INDEX RANGE SCAN | INDEX_TEST31 | 14 | | 1 (0

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

3 - access("OBJECT_ID"=7889 OR "OBJECT_ID"=8789 OR "OBJECT_ID"=60998)

15 rows selected

此時走索引,這個cost相對來說較小。

SQL> alter system flush buffer_cache;

系統已更改。

SQL> select /*+index(test31 index_test31)*/ * from test31 where object_id>5000 and object_id<6500;

已選擇6950行。

執行計劃

----------------------------------------------------------

Plan hash value: 2819306605

--------------------------------------------------------------------------------

------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)

| Time |

--------------------------------------------------------------------------------

------------

| 0 | SELECT STATEMENT | | 6957 | 631K| 3492 (1)

| 00:00:42 |

| 1 | TABLE ACCESS BY INDEX ROWID| TEST31 | 6957 | 631K| 3492 (1)

| 00:00:42 |

|* 2 | INDEX RANGE SCAN | INDEX_TEST31 | 6957 | | 9 (0)

| 00:00:01 |

--------------------------------------------------------------------------------

------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("OBJECT_ID">5000 AND "OBJECT_ID"<6500)

統計資訊

----------------------------------------------------------

1 recursive calls

0 db block gets

7429 consistent gets

183 physical reads

0 redo size

690491 bytes sent via SQL*Net to client

5478 bytes received via SQL*Net from client

465 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

6950 rows processed

--強制走索引cost 3492,物理讀183

SQL> alter system flush buffer_cache;

系統已更改。

SQL> select * from test31 where object_id>5000 and object_id<6500;

已選擇6950行。

執行計劃

----------------------------------------------------------

Plan hash value: 1490571929

----------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

----------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 6957 | 631K| 696 (3)| 00:00:09 |

|* 1 | TABLE ACCESS FULL| TEST31 | 6957 | 631K| 696 (3)| 00:00:09 |

----------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("OBJECT_ID"<6500 AND "OBJECT_ID">5000)

統計資訊

----------------------------------------------------------

0 recursive calls

0 db block gets

3990 consistent gets

3521 physical reads

0 redo size

337756 bytes sent via SQL*Net to client

5478 bytes received via SQL*Net from client

465 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

6950 rows processed

--全表掃描cost只有696,但是物理讀卻是3521

雖然cbo按照cost預設選擇了全表掃描,但是實際物理讀卻是index比全表掃描小很多。

雖然在sql最佳化中我們一直是強調降低邏輯讀,也就是即時讀db block gets和consistent reads一致性讀,因為降低了邏輯度,物理讀自然而然的會降下來

不過對於此種情況具體是強制走索引更適合了還是cbo預設的全表掃描了,雖然邏輯讀上全表掃描會好點,但是實際的物理讀我們差異巨大,由於全表掃描後buffer的儲存在lru的lru末端,優先會被覆蓋,如果下次再次查詢還是會引起較大的物理讀,此時強制索引可能會是更好地選擇!畢竟I/O是很消耗資源的操作!

SQL> create table test32 as select * from test31;

Table created

SQL> truncate table test31;

表被截斷。

SQL> insert into test31 select * from test32 order by object_id;

--讓其test31有序排序。

SQL> alter index index_test31 rebuild;

Index altered

SQL> execute dbms_stats.gather_table_stats('ashuang','test31',cascade=>true);

PL/SQL procedure successfully completed

SQL> select blocks,num_rows from user_tables where table_name=upper('test31');

BLOCKS NUM_ROWS

---------- ----------

3520 255270

SQL> select clustering_factor from user_indexes where index_name=upper('index_test31');

CLUSTERING_FACTOR

-----------------

4137

--此時clustering factorblocks大致接近

SQL> explain plan for select * from test31 where object_id>5000 and object_id<6500;

Explained

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 2819306605

--------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 6957 | 631K| 66 (0)

| 1 | TABLE ACCESS BY INDEX ROWID| TEST31 | 6957 | 631K| 66 (0)

|* 2 | INDEX RANGE SCAN | INDEX_TEST31 | 6957 | | 9 (0)

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("OBJECT_ID">5000 AND "OBJECT_ID"<6500)

14 rows selected

--cbo此時預設選擇了index range scan

透過以上測試可以看出clutering factor是索引是否能在cbo中執行的一個標誌,其值越低越讓cbo高效利用索引,不過不要以為rebuild重建索引可以降低clustering factorclustering factor只跟rows的順序相關。還有一個distinct_keys也就是表中的不同記錄的值,distinct_keys和表中的num_rows值越接近,則索引的選擇性越高,可以收集列資訊的直方圖histogram,讓cbo可以獲得更多的統計分析資訊,從而讓cbo選擇最正確的執行計劃!

[@more@]

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/25362835/viewspace-1056211/,如需轉載,請註明出處,否則將追究法律責任。

相關文章