在平時的工作中，可能會碰到一種很奇怪的問題，本來在生產環境中有些sql語句執行沒有問題，一個很普通的查詢預期走了索引掃面，但是複製資料到其它環境之後，就發現卻走了全表掃描。
或者情況相反，本來出現問題的查詢走了全表掃描，我們嘗試在測試環境中浮現，但是測試環境中在相同的資料量的情況下，查詢卻又走了索引掃描，問題無法復現了。
出現這種情況的原因比較複雜，涉及很多的原因，其中一個很重要的原因就是聚簇因子的導致的。
聚簇因子是一個與索引相關的統計資訊，它透過檢視錶中的資料塊來進行計算得到。
對於這個問題，可能直接說理論會有些枯燥。可以透過如下的問題來進行說明。

create table t1 as select trunc(rownum/100) id ,object_name from all_objects where rownum<1000;
create table t2 as select mod(rownum,100) id ,object_name from all_objects where rownum<1000;

create index inx_t1 on t1(id);
create index inx_t2 on t2(id);

exec dbms_stats.gather_table_stats(null,'T1',CASCADE=>true);
exec dbms_stats.gather_table_stats(null,'T2',CASCADE=>true);

檢視錶t1的資料類似下面的格式。
SQL> select *from t1 where rownum<20;

        ID OBJECT_NAME
---------- ------------------------------
         0 ICOL$
         0 I_USER1
         0 CON$
         0 UNDO$
         0 C_COBJ#
         0 I_OBJ#
         0 PROXY_ROLE_DATA$
         0 I_IND1
         0 I_CDEF2
         0 I_OBJ5
         0 I_PROXY_ROLE_DATA$_1
         0 FILE$
         0 UET$
         0 I_FILE#_BLOCK#
         0 I_FILE1
         0 I_CON1
         0 I_OBJ3
         0 I_TS#
         0 I_CDEF4

19 rows selected.

檢視錶t2的資料類似下面的格式。
SQL> select *from t2 where rownum<20;

        ID OBJECT_NAME
---------- ------------------------------
         1 ICOL$
         2 I_USER1
         3 CON$
         4 UNDO$
         5 C_COBJ#
         6 I_OBJ#
         7 PROXY_ROLE_DATA$
         8 I_IND1
         9 I_CDEF2
        10 I_OBJ5
        11 I_PROXY_ROLE_DATA$_1
        12 FILE$
        13 UET$
        14 I_FILE#_BLOCK#
        15 I_FILE1
        16 I_CON1
        17 I_OBJ3
        18 I_TS#
        19 I_CDEF4

19 rows selected.
下面的表格能夠簡要的說明資料的分佈。
T1中資料的分佈。

0	0	0	0	0
0	0	0	0	0
.	.	.	.	.
1	1	1	1	1
1	1	1	1	1
.	.	.	.	.

T2中資料的分佈。

1	2	3	4	5
6	7	8	9	10
11	12	13	14	15
.	.	.	.	.
1	2	3	4	5
6	7	8	9	10

我們來看看同樣的查詢對應的執行計劃。

SQL>select *from t1 where id=2;
Execution Plan
----------------------------------------------------------
Plan hash value: 2808986199

--------------------------------------------------------------------------------------
| Id | Operation                   | Name   | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   100 | 1800 |     2   (0)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID| T1     |   100 | 1800 |     2   (0)| 00:00:01 |
|* 2 |   INDEX RANGE SCAN          | INX_T1 |   100 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=2)
Statistics
----------------------------------------------------------
          1 recursive calls
          0 db block gets
         17 consistent gets
          0 physical reads
          0 redo size
       4130 bytes sent via SQL*Net to client
        586 bytes received via SQL*Net from client
          8 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
        100 rows processed

SQL>select *from t1 where id=2;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id | Operation         | Name | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |    10 |   180 |     4   (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T2   |    10 |   180 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"=2)
Statistics
----------------------------------------------------------
          1 recursive calls
          0 db block gets
          6 consistent gets
          0 physical reads
          0 redo size
        820 bytes sent via SQL*Net to client
        520 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed
可以看到一個走了索引掃描，一個走了全表掃描。這個時候我們再來看看聚簇因子。
SQL>select i.table_name,i.index_name,i.CLUSTERING_FACTOR,t.blocks,i.NUM_ROWS from user_tables t,user_indexes i where t.table_name=i.table_name and t.table_name in ('T1','T2');
TAB INDEX_NAME           CLUSTERING_FACTOR     BLOCKS   NUM_ROWS
--- -------------------- ----------------- ---------- ----------
T1 INX_T1                               4          7        999
T2 INX_T2                             365          7        999

表t2的資料分佈比較散，表的聚簇度高，接近於表中的資料，對於id=2,因為資料分佈得很開，掃描的資料塊就很很多，就很可能走全表掃描。而表中t1的資料聚簇度低，比如要查詢id=2的資料，因為這些資料分佈比較集中，掃描的資料塊就要很少，索引就很可能走索引掃描。

對於聚簇因子，可以透過重建索引，重建表，或者重新組織索引來改進，但是從實現的角度來說很困難，畢竟資料的分佈情況很難模擬，如果要進行問題的復現和排查還是需要掌握不少的細節,透過備份庫來複現問題也是一種思路。
tom對於聚簇因子的解釋如下。

Note that typically only 1 index per table will be heavily clustered (if any).  It would 
be extremely unlikely for 2 indexes to be very clustered.

If you want an index to be very clustered -- consider using index organized tables.  They 
force the rows into a specific physical location based on their index entry.

Otherwise, a rebuild of the table is the only way to get it clustered (but you really 
don't want to get into that habit for what will typically be of marginal overall 
improvement).

聚簇因子和執行計劃的聯絡

相關文章