淺談索引系列之聚簇因子(clustering_factor)

13811135929發表於2017-02-25

初次聽說聚簇因子,我相信大部分人都是丈二和尚摸不著頭腦,不知所云。然而當我們瞭解其真正含義後,也許會覺得這個高大上的名詞也沒有那麼神秘。廢話少說,上官方文件:
Index Clustering Factor
    The  measures row order in relation to an indexed value such as employee last name. The more order that exists in row storage for this value, the lower the clustering factor.

The clustering factor is useful as a rough measure of the number of I/Os required to read an entire table by means of an index

  • If the clustering factor is high, then Oracle Database performs a relatively high number of I/Os during a large index range scan. The index entries point to random table blocks, so the database may have to read and reread the same blocks over and over again to retrieve the data pointed to by the index.
  • If the clustering factor is low, then Oracle Database performs a relatively low number of I/Os during a large index range scan. The index keys in a range tend to point to the same data block, so the database does not have to read and reread the same blocks over and over.
     簡單來講聚簇因子用來反映索引鍵值欄位存放雜亂排序程度的一個度量。那麼聚簇因子是怎麼計算得到的呢?Oracle在計算其值的時候,會對每個索引鍵值查詢對應表的資料,在查詢過程中,會跟蹤資料塊間跳轉的次數(當然,資料庫不可能真的這麼做,原始碼只是簡單的掃描索引,得到rowid,然後根據rowid就可以獲得資料塊的地址了),每一次跳轉,計數器都會增加,計算完全部的索引鍵值得到的結果就是聚簇因子。因此表的某個欄位越有序,若在此欄位建立索引,索引對應的聚簇因子就越小。
    下面透過實驗作一個直觀的展現:

點選(此處)摺疊或開啟

  1. drop table colocated;
  2. create table colocated ( x int, y varchar2(2000));
  3. begin
  4.   for i in 1..100000 loop
  5.      insert into colocated values(i,dbms_random.string('a',5));
  6.   end loop;
  7.   commit;
  8. end;
  9. /
  10. alter table colocated add constraint colocated_pk primary key(x);

  11. drop table disorganized;
  12. create table disorganized as select x,y from colocated order by y;
  13. alter table disorganized add constraint disorganized_pk primary key(x);
  14. exec dbms_stats.gather_table_stats(USER,'COLOCATED',CASCADE=>TRUE);
  15. exec dbms_stats.gather_table_stats(USER,'DISORGANIZED',CASCADE=>TRUE);
  16. SELECT a.index_name,b.num_rows,b.blocks,a.clustering_factor FROM user_indexes a,user_tables b WHERE a.table_name=b.table_name AND a.table_name='COLOCATED';
  17. INDEX_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
  18. ------------------------------ ---------- ---------- -----------------
  19. COLOCATED_PK 100000 244 219

  20. SELECT a.index_name,b.num_rows,b.blocks,a.clustering_factor FROM user_indexes a,user_tables b WHERE a.table_name=b.table_name AND a.table_name='DISORGANIZED';

  21. INDEX_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
  22. ------------------------------ ---------- ---------- -----------------
  23. DISORGANIZED_PK 100000 232 99572
       ID按照有序依次插入到COLOCATED表中,而DISORGANIZED表中的ID列則是隨機分佈的。因此索引COLOCATED_PK的聚簇因子較小,基本等於表的BLOCKS數,索引DISORGANIZED_PK的聚簇因子較大,基本等於表NUM_ROWS
       瞭解了聚簇因子的概念,具體聚簇因子有什麼用途呢?下面進一步說明。
     

點選(此處)摺疊或開啟

  1. set autotrace traceonly;
  2. SQL> select * from COLOCATED where x between 20000 and 20050;
  3. 51 rows selected.

  4. Execution Plan
  5. ----------------------------------------------------------
  6. Plan hash value: 1550765370

  7. --------------------------------------------------------------------------------------------
  8. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  9. --------------------------------------------------------------------------------------------
  10. | 0 | SELECT STATEMENT | | 52 | 572 | 3 (0)| 00:00:01 |
  11. | 1 | TABLE ACCESS BY INDEX ROWID| COLOCATED | 52 | 572 | 3 (0)| 00:00:01 |
  12. |* 2 | INDEX RANGE SCAN | COLOCATED_PK | 52 | | 2 (0)| 00:00:01 |
  13. --------------------------------------------------------------------------------------------

  14. Predicate Information (identified by operation id):
  15. ---------------------------------------------------

  16.    2 - access("X">=20000 AND "X"<=20050)


  17. Statistics
  18. ----------------------------------------------------------
  19.           0 recursive calls
  20.           0 db block gets
  21.          11 consistent gets
  22.           0 physical reads
  23.           0 redo size
  24.        2000 bytes sent via SQL*Net to client
  25.         556 bytes received via SQL*Net from client
  26.           5 SQL*Net roundtrips to/from client
  27.           0 sorts (memory)
  28.           0 sorts (disk)
  29.          51 rows processed

  30. SQL> select * from DISORGANIZED where x between 20000 and 20050;
  31. 51 rows selected.
  32. Execution Plan
  33. ----------------------------------------------------------
  34. Plan hash value: 2594580634

  35. -----------------------------------------------------------------------------------------------
  36. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  37. -----------------------------------------------------------------------------------------------
  38. | 0 | SELECT STATEMENT | | 52 | 572 | 54 (0)| 00:00:01 |
  39. | 1 | TABLE ACCESS BY INDEX ROWID| DISORGANIZED | 52 | 572 | 54 (0)| 00:00:01 |
  40. |* 2 | INDEX RANGE SCAN | DISORGANIZED_PK | 52 | | 2 (0)| 00:00:01 |
  41. -----------------------------------------------------------------------------------------------

  42. Predicate Information (identified by operation id):
  43. ---------------------------------------------------

  44.    2 - access("X">=20000 AND "X"<=20050)


  45. Statistics
  46. ----------------------------------------------------------
  47.           1 recursive calls
  48.           0 db block gets
  49.          56 consistent gets
  50.           0 physical reads
  51.           0 redo size
  52.        2000 bytes sent via SQL*Net to client
  53.         556 bytes received via SQL*Net from client
  54.           5 SQL*Net roundtrips to/from client
  55.           0 sorts (memory)
  56.           0 sorts (disk)
  57.          51 rows processed

      COLOCATEDDISORGANIZED兩張表儲存了同樣的語句,相同的SQL語句執行,聚簇因子小的表邏輯讀為11,聚簇因子大的邏輯讀卻為56,大家都知道邏輯讀和鎖機制息息相關,鎖會影響到了資料庫的併發性,也會影響效能。
     小結:
   1.聚簇因子的高低會影響執行計劃的選擇。
  2.聚簇因子與表的儲存有關,無法透過重建索引來改變聚簇因子的大小。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29827284/viewspace-2134262/,如需轉載,請註明出處,否則將追究法律責任。

相關文章