SIZE AUTO和SIZE SKEWONLY在gather_table_stats時的區別

viadeazhu發表於2009-04-25

dbms_stats.gather_table_stats的引數method_opt有很多種用法,其中SIZE語法中有兩個關鍵字是AUTO和SKEWONLY。透過查閱10g官方文件:

method_opt

Accepts:

  • FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]

  • FOR COLUMNS [size clause] column|attribute [size_clause] [,column|attribute [size_clause]...]

size_clause is defined as size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}


- integer : Number of histogram buckets. Must be in the range [1,254].
- REPEAT : Collects histograms only on the columns that already have histograms.
- AUTO : Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.
- SKEWONLY : Oracle determines the columns to collect histograms based on the data distribution of the columns.

The default is FOR ALL COLUMNS SIZE AUTO.The default value can be changed using the SET_PARAM Procedure.

從字面上解讀,AUTO和SKEWONLY的區別就在於收集histograms時ORACLE的選擇條件不同。

AUTO會根據column資料分佈情況以及column的負載情況進行判斷是不是收集這個column的histogram.

而SKEWONLY會根據column的資料分佈情況決定是否收集histogram。

這裡,我會強調這個“只”字,因為在SKEWONLY時,只要Oracle覺得資料分佈可能不均,就會收集列的柱狀圖。

而AUTO時,即使某一個column它的資料分佈不均,但只要這個column的workload很低,或者說沒有workload,那麼Oracle就不會收集這個column的histogram。

 

為什麼我會突然想起討論這個問題呢?

因為今天在tunning一個走full table scan的SQL的時候,這個SQL本可以走一個不錯的index,但由於這個index上的第一列資料分佈很不均,導致了CBO計算出來的走full table scan的cost小於走這個index的cost(這是透過10053 trace出來的。)。於是我在好幾個有同樣這個SQL的databases上用dbms_stats.gather_table_stats(method_opt=>for columns size auto )收集這個column的histogram。

但是奇怪的是,在某幾個用RBO(optimizer_mode=rule)的databases上,透過這樣收集統計資訊後並沒有在那一列上生成histogram。

後來,我就用了method_opt=>for columns size skewonly 進行收集,結果就成功生成histogram。

於是就產生了我在本文開頭的論述和疑問。

 

這兩種收集資訊的方式到底有什麼不同呢?

於是我在測試壞境中進行了10046 trace event,針對這AUTO和SKEWONLY這兩種收集統計資訊的方式。

從生成的兩份trace檔案進行比較,終於被我找到了可疑之處。

AUTO模式的trace檔案中清楚地記錄了Oracle的一個可疑的行為的一個SQL:

SELECT /*+ ordered use_nl(o c cu h) index(u i_user1) index(o i_obj2)
               index(ci_obj#) index(cu i_col_usage$)
               index(h i_hh_obj#_intcol#) */
               C.NAME COL_NAME, C.TYPE# COL_TYPE, C.CHARSETFORM. COL_CSF,
               C.DEFAULT$ COL_DEF, C.NULL$ COL_NULL, C.PROPERTY COL_PROP,
               C.COL# COL_UNUM, C.INTCOL# COL_INUM, C.OBJ# COL_OBJ, C.SCALE COL_SCALE,
               H.BUCKET_CNT H_BCNT,
               (T.ROWCNT-H.NULL_CNT)/GREATEST(H.DISTCNT,1) H_PFREQ, C.LENGTH COL_LEN, CU.TIMESTAMP CU_TIME,
               CU.EQUALITY_PREDS CU_EP, CU.EQUIJOIN_PREDS CU_EJP, CU.RANGE_PREDS CU_RP,
               CU.LIKE_PREDS CU_LP, CU.NONEQUIJOIN_PREDS CU_NEJP, CU.NULL_PREDS NP FROM
               SYS.USER$ U, SYS.OBJ$ O, SYS.TAB$ T, SYS.COL$ C, SYS.COL_USAGE$ CU, SYS.HIST_HEAD$
               H WHERE :B3 = '0' AND U.NAME = :B2 AND O.OWNER# = U.USER#
               AND O.TYPE# = 2 AND O.NAME = :B1 AND O.OBJ# = T.OBJ#
               AND O.OBJ# = C.OBJ# AND C.OBJ# = CU.OBJ#(+) AND C.INTCOL# = CU.INTCOL#(+)
               AND C.OBJ# = H.OBJ#(+)
               AND C.INTCOL# = H.INTCOL#(+) UNION ALL SELECT /*+ ordered use_nl(c) */
               C.KQFCONAM COL_NAME, C.KQFCODTY COL_TYPE, DECODE(C.KQFCODTY, 1, 1, 0)
               COL_CSF, NULL COL_DEF, 0 COL_NULL, 0 COL_PROP, C.KQFCOCNO COL_UNUM, C.KQFCOCNO COL_INUM,
               O.KQFTAOBJ COL_OBJ, DECODE(C.KQFCODTY, 2, -127, 0) COL_SCALE, H.BUCKET_CNT H_BCNT,
               (ST.ROWCNT-NULL_CNT)/GREATEST(H.DISTCNT,1) H_PFREQ, DECODE(C.KQFCODTY, 2, 22, C.KQFCOSIZ) COL_LEN,
               CU.TIMESTAMP CU_TIME, CU.EQUALITY_PREDS CU_EP, CU.EQUIJOIN_PREDS CU_EJP,
               CU.RANGE_PREDS CU_RP, CU.LIKE_PREDS CU_LP, CU.NONEQUIJOIN_PREDS CU_NEJP,
               CU.NULL_PREDS NP FROM SYS.X$KQFTA O, SYS.TAB_STATS$ ST, SYS.X$KQFCO C,
               SYS.COL_USAGE$ CU, SYS.HIST_HEAD$ H
               WHERE :B3 != '0' AND :B2 = 'SYS' AND O.KQFTANAM = :B1 AND O.KQFTAOBJ = ST.OBJ#(+)
               AND O.KQFTAOBJ = C.KQFCOTOB AND C.KQFCOTOB = CU.OBJ#(+) AND C.KQFCOCNO = CU.INTCOL#(+)
               AND C.KQFCOTOB = H.OBJ#(+) AND C.KQFCOCNO = H.INTCOL#(+);

而這個SQL並不在SKEWONLY模式的trace檔案裡。這難道就是傳說中的AUTO模式會多去考慮這個列的workload?

答案是肯定的!(其實這個結論在我看到IT-pub上另一篇文章之前,我也不敢如此肯定。。。)

這篇文章論證得其實很清楚,於是post上我論證時參考的文章:

http://www.itpub.net/thread-1096181-1-1.html

當看完此篇文章,頓時心生英雄所見略同的感慨。

我也在此之後做了相似的實驗,先把實驗結論奉上:

這個SQL果然就是AUTO模式下用來檢測the workload of column的,它查詢的最重要的一個系統檢視就是COL_USAGE$.

當這個檢視裡有屬於這個表這個列當前的monitoring資訊時,使用AUTO模式進行收集資料,並且在這列分佈情況很skew的情況下,會收集此列的histogram。

反之,當這個檢視裡沒有對此表此列當前monitoring資訊時,使用AUTO模式是不會產生此列histogram的,不管此列是否資料極其skew。

而,使用SKEWONLY模式時,則不會去檢測COL_USAGE$,只要此列資料skew,則會收集此列histogram。

 

具體實驗如下:

-----------TEST1-----------

SQL> create table test (id number,name varchar2(100));

Table created. 

SQL> insert into test  select 1,object_name from dba_objects;

23078 rows created. 

SQL> insert into test values(2,'xxx');

1 row created. 

SQL> commit;

Commit complete. 

SQL> create index testidx on test(id);

Index created. 

SQL> select TABLE_NAME,COLUMN_NAME,LOW_VALUE,HIGH_VALUE,DENSITY,NUM_BUCKETS,HISTOGRAM from dba_tab_columns

  2  where TABLE_NAME='TEST' and COLUMN_NAME='ID';  

TABLE_NAME                     COLUMN_NAME

------------------------------ ------------------------------------------------------------------------------------------

LOW_VALUE                                                        HIGH_VALUE                                                          DENSITY NUM_BUCKETS

---------------------------------------------------------------- ---------------------------------------------------------------- ---------- -----------

HISTOGRAM

---------------------------------------------

TEST                           ID 

NONE

SQL> set lines 180 pages 999

SQL> select ENDPOINT_NUMBER,ENDPOINT_VALUE from dba_histograms where TABLE_NAME='TEST' and COLUMN_NAME='ID'; 

no rows selected

SQL> select OWNER, OBJECT_ID,OBJECT_NAME from dba_objects where OBJECT_NAME='TEST' and wner='LONGRAW_USER'; 

OWNER                           OBJECT_ID OBJECT_NAME

------------------------------ ---------- ------------------------------

LONGRAW_USER                        44990 TEST

SQL> select * from sys.COL_USAGE$ where OBJ#=44990;            à when there is no rows in COL_USAGE$, the AUTO will not collect histograms. 

no rows selected

SQL> exec dbms_stats.gather_table_stats('LONGRAW_USER','TEST',estimate_percent=>100,cascade=>true,no_invalidate=>true,method_opt=>'for columns size auto ID'); 

PL/SQL procedure successfully completed.

SQL> set lines 180 pages 999

SQL> select ENDPOINT_NUMBER,ENDPOINT_VALUE from dba_histograms where TABLE_NAME='TEST' and COLUMN_NAME='ID'; 

ENDPOINT_NUMBER ENDPOINT_VALUE

--------------- --------------

              0              1

              1              2

SQL> select TABLE_NAME,COLUMN_NAME,LOW_VALUE,HIGH_VALUE,DENSITY,NUM_BUCKETS,HISTOGRAM from dba_tab_columns

  2  where TABLE_NAME='TEST' and COLUMN_NAME='ID';     

TABLE_NAME                     COLUMN_NAME

------------------------------ ------------------------------------------------------------------------------------------

LOW_VALUE                                                        HIGH_VALUE                                                          DENSITY NUM_BUCKETS

---------------------------------------------------------------- ---------------------------------------------------------------- ---------- -----------

HISTOGRAM

---------------------------------------------

TEST                           ID

C102                                                             C103                                                                     .5           1

NONE 

------------TEST2---------------

SQL> select * from test where id=2;       à it is to generate some statistics in COL_USAGE$

        ID

----------

NAME

-------------------------------------------------------------------------------------------------------------------------------------------------------------

         2

xxx 

(waiting for some while) 

SQL> select * from sys.COL_USAGE$ where OBJ#=44990;   à We can see that there is more row with EQUALITY_PREDS/RANGE_PREDS

      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP

---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------

     44990          1              1              0                 0           1          0          0 24-APR-09 

SQL> exec dbms_stats.gather_table_stats('LONGRAW_USER','TEST',estimate_percent=>100,cascade=>true,no_invalidate=>true,method_opt=>'for columns size auto ID');

PL/SQL procedure successfully completed. 

SQL> set lines 180 pages 999

SQL> select ENDPOINT_NUMBER,ENDPOINT_VALUE from dba_histograms where TABLE_NAME='TEST' and COLUMN_NAME='ID';

ENDPOINT_NUMBER ENDPOINT_VALUE

--------------- --------------

          23078              1

          23079              2 

SQL> select TABLE_NAME,COLUMN_NAME,LOW_VALUE,HIGH_VALUE,DENSITY,NUM_BUCKETS,HISTOGRAM from dba_tab_columns

  2  where TABLE_NAME='TEST' and COLUMN_NAME='ID'; 

TABLE_NAME                     COLUMN_NAME

------------------------------ ------------------------------------------------------------------------------------------

LOW_VALUE                                                        HIGH_VALUE                                                          DENSITY NUM_BUCKETS

---------------------------------------------------------------- ---------------------------------------------------------------- ---------- -----------

HISTOGRAM

---------------------------------------------

TEST                           ID

C102                                                             C103                                                             .000021665           2

FREQUENCY 

----------------TEST finishes--------------------

COL_USAGE$是關鍵的一個檢視,在metalink上也會找到此檢視的很多Bug。

既然說到它了,就再介紹下其中有個bug是說這個檢視會經常無限增長。直到不能再增長。。。

而workaround是:
exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

如果某些表沒用了,你也可以drop table XXX purge來刪除此表相對應的monitoring stats.

 

搞清楚了這些問題之後,接著回到我先前遇到的一個問題:

為什麼在RBO的資料庫下用AUTO模式收集histogram失敗了呢

於是又引出了一個我自己透過實驗得出的猜測性結論,還沒有經過全方位的確認,僅僅是經驗上的結論

RBO一般情況下不會自動收集monitoring stats在COL_USAGE$中

舉個例子,其中一個RBO database:

SQL> select * from COL_USAGE$ where OBJ#=7840;

 

      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP

---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------

      7840          1             10             10                 0           0          0          0 15-APR-09

而拿一個CBO database作為比較:

SQL> select * from COL_USAGE$ where OBJ#=7840;

      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP

---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------

      7840          1             50             15                 0           0          0          0 24-APR-09

      7840          2             14              0                 0           0          0          0 24-APR-09

      7840          4             18              0                 0           0          0          0 24-APR-09

      7840         15             20              0                 0           0          0         15 24-APR-09

      7840         12              6              0                 0           0          0          1 24-APR-09

      7840          3              4              0                 0           0          0          0 24-APR-09

      7840         16              7              0                 0           0          0         19 24-APR-09

      7840          5              7              0                 0           0          0          0 23-APR-09

      7840         10             42              0                 0           2          0          0 24-APR-09

      7840          6             11              0                 0           0          0          0 23-APR-09

      7840          7              6              0                 0           0          0          0 23-APR-09

      7840         13              7              0                 0           0          0          3 23-APR-09

      7840         14              1              0                 0           0          0          5 23-APR-09

      7840          8              2              0                 0           0          0          0 15-APR-09

      7840         17              2              0                 0           0          0          3 23-APR-09

 

ps,今天是2009-4-24,從上可以看出RBO以前那row很明顯已經過期了,但不知道什麼時候記錄下來對的了。

而CBO的monitoring資訊非常current,而這個SQL在這兩臺資料庫上都是每分鐘都在跑的,於是得出了前面對RBO的猜測性的結論。

這個結論也歡迎其他同學幫我佐證一下。

此文到此。。。下次再見。

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/15415488/viewspace-591765/,如需轉載,請註明出處,否則將追究法律責任。

相關文章