高效的SQL( clustering factor減少COST)

lovehewenyu發表於2012-12-11

高效的SQL( clustering factor減少COST)

 

1、建立樣表cluster_factor(x有序列,y隨意列);實驗表cluster01(low)cluster02(high)

doudou@TEST> create table cluster_factor (x int, y int);

Table created.

doudou@TEST> begin

  2  for i in 1..1000000 loop

  3  insert into cluster_factor values (i,to_char(dbms_random.random,'9999999999999999'));

  4  end loop;

  5  commit;

  6  end;

  7  /

PL/SQL procedure successfully completed.

cluster01

doudou@TEST> create table cluster01 as select * from cluster_factor;

Table created.

cluster02

doudou@TEST> create table cluster02 as select * from cluster_factor order by y;

Table created.

index_cluster01索引(索引列有序)

doudou@TEST> create index index_cluster01 on cluster01(x);

Index created.

index_cluster02索引(索引列無序)

doudou@TEST> create index index_cluster02 on cluster02(x);

Index created.

利用dbms_stats收集表的索引資訊  (cascade是否收集索引資訊選項)

doudou@TEST> begin

  2  dbms_stats.gather_table_stats(user,'cluster01',cascade=>true);

  3   dbms_stats.gather_table_stats(user,'cluster02',cascade=>true);

  4  end;

  5  /

PL/SQL procedure successfully completed.

doudou@TEST> exec dbms_stats.gather_table_stats('DOUDOU','cluster01',cascade=>true);

PL/SQL procedure successfully completed.

doudou@TEST> exec dbms_stats.gather_table_stats('DOUDOU','cluster02',cascade=>true);

PL/SQL procedure successfully completed.

 

2、開啟執行計劃並查詢SQL

doudou@TEST> set autot on

doudou@TEST> select avg(y/(x+1)) from cluster01 where x between 10000 and 30000;

 

AVG(Y/(X+1))

------------

  -369.65884

Execution Plan

----------------------------------------------------------

Plan hash value: 3265002277

 

------------------------------------------------------------------------------------------------

| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |                 |     1 |    12 |   143   (0)| 00:00:02 |

|   1 |  SORT AGGREGATE              |                 |     1 |    12 |            |          |

|   2 |   TABLE ACCESS BY INDEX ROWID| CLUSTER01       | 20144 |   236K|   143   (0)| 00:00:02 |

|*  3 |    INDEX RANGE SCAN          | INDEX_CLUSTER01 | 20144 |       |    47   (0)| 00:00:01 |

------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

   3 - access("X">=10000 AND "X"<=30000)

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

        116  consistent gets

         60  physical reads

          0  redo size

        434  bytes sent via SQL*Net to client

        400  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed

 

doudou@TEST> select avg(y/(x+1)) from cluster02 where x between 10000 and 30000;

AVG(Y/(X+1))

------------

  -369.65884

Execution Plan

----------------------------------------------------------

Plan hash value: 2721670139

 

--------------------------------------------------------------------------------

| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |           |     1 |    12 |   542   (1)| 00:00:07 |

|   1 |  SORT AGGREGATE    |           |     1 |    12 |            |          |

|*  2 |   TABLE ACCESS FULL| CLUSTER02 | 20130 |   235K|   542   (1)| 00:00:07 |

--------------------------------------------------------------------------------

問題12個表的資料、索引都是一樣的。為什麼cluster01走索引,而cluster02全表掃描呢?】

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - filter("X"<=30000 AND "X">=10000)

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

       2417  consistent gets

       2411  physical reads

          0  redo size

        434  bytes sent via SQL*Net to client

        400  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

1    rows processed     

 

doudou@TEST> SELECT /*+ INDEX(CLUSTER02 INDEX_CLUSTER02)*/AVG(Y/(X+1)) FROM CLUSTER02 WHERE X BETWEEN 10000 AND 30000;

 

AVG(Y/(X+1))

------------

  -369.65884

Execution Plan

----------------------------------------------------------

Plan hash value: 924486639

 

------------------------------------------------------------------------------------------------

| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |                 |     1 |    12 | 20171   (1)| 00:04:03 |

|   1 |  SORT AGGREGATE              |                 |     1 |    12 |            |          |

|   2 |   TABLE ACCESS BY INDEX ROWID| CLUSTER02       | 20130 |   235K| 20171   (1)| 00:04:03 |

|*  3 |    INDEX RANGE SCAN          | INDEX_CLUSTER02 | 20130 |       |    47   (0)| 00:00:01 |

------------------------------------------------------------------------------------------------

問題2強制cluster02走索引,但是最後cost消耗還是很大】

Predicate Information (identified by operation id):

---------------------------------------------------

   3 - access("X">=10000 AND "X"<=30000)

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

      20040  consistent gets

          0  physical reads

          0  redo size

        434  bytes sent via SQL*Net to client

        400  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

1    rows processed 

 

doudou@TEST> select

  2      idx.index_name,

  3      tab.table_name,

  4      tab.num_rows,

  5      tab.blocks,

  6      idx.clustering_factor

  7  from

  8      user_indexes idx inner join user_tables tab

  9  on idx.table_name = tab.table_name

 10  order by table_name;

INDEX_NAME                     TABLE_NAME                       NUM_ROWS     BLOCKS CLUSTERING_FACTOR

------------------------------ ------------------------------ ---------- ---------- -----------------

INDEX_CLUSTER01                CLUSTER01                         1006949       2459              4726

INDEX_CLUSTER02                CLUSTER02                         1006232       2459            999628

cluster01cluster02clustering factor不同,發現了問題1的答案;

問題1的答案:物理分佈的不同導致了索引的選擇

問題2的答案:強制走索引不是適合所有的操作,有時也會造成更大的cost消耗

clustering factor 高,相鄰索引值指向更多不同的塊,本來一個塊可以返回的資訊卻需要資料庫去讀更多的塊,而這裡又強制使用rowid全表掃描所以造成了更多的cost)】  

 

3、總結:

lowclustering factor減少了對相同塊的重複讀,從而減少cost的消耗,

highclustering factor增加了對相同塊的重複讀,從而增加cost的消耗。

後語:建立表按照一定的順序是有必要的,這樣可以減低clustering factor(或者可以說索引建在有序的列上效能會好一些),從而優化sql

 

4、附表:指令碼

1、檢視clustering_factoer

select

    idx.index_name,

    tab.table_name,

    tab.num_rows,

    tab.blocks,

    idx.clustering_factor

from

    user_indexes idx inner join user_tables tab

on idx.table_name = tab.table_name

order by table_name;

 

2、收集資訊

 dbms_stats analyze

 

dbms_stats注意:1不可以收集叢集資訊,但可以收集單獨表來代替收集整個叢集。

              2、收集優化器統計優先考慮

analyze注意:1use the VALIDATE or LIST CHAINED ROWS clausescollect information on free list blocks

              2、收集優化器統計不優先考慮

總結:Oracle推薦收集優化器統計使用dbms_stats

 

Note:

Do not use the COMPUTE and ESTIMATE clauses of ANALYZE statement to collect optimizer statistics. These clauses are supported solely for backward compatibility and may be removed in a future release. The DBMS_STATS package collects a broader, more accurate set of statistics, and gathers statistics more efficiently.

You may continue to use ANALYZE statement to for other purposes not related to optimizer statistics collection:

To use the VALIDATE or LIST CHAINED ROWS clauses

To collect information on free list blocks

 

Statistics Gathering Procedures in the DBMS_STATS Package

Procedure

Collects

GATHER_INDEX_STATS

Index statistics

GATHER_TABLE_STATS

Table, column, and index statistics

GATHER_SCHEMA_STATS

Statistics for all objects in a schema

GATHER_DICTIONARY_STATS

Statistics for all dictionary objects

GATHER_DATABASE_STATS

Statistics for all objects in a database

 

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/26442936/viewspace-750901/,如需轉載,請註明出處,否則將追究法律責任。

相關文章