oracle的dbms_stats包詳細解說

小小黃-812發表於2014-08-20

dbms_stats包問世以後,Oracle專家可通過一種簡單的方式來為CBO收集統計資料。目前,已經不再推薦你使用老式的分析表和dbms_utility方法來生成CBO統計資料。那些古老的方式甚至有可能危及SQL的效能,因為它們並非總是能夠捕捉到有關表和索引的高質量資訊。CBO使用物件統計,為所有SQL語句選擇最佳的執行計劃。

dbms_stats能良好地估計統計資料(尤其是針對較大的分割槽表),並能獲得更好的統計結果,最終制定出速度更快的SQL執行計劃。


下邊給出了dbms_stats的一次示範執行情況,其中使用了options子句。
execdbms_stats.gather_schema_stats( -
ownname => 'SCOTT', -
options => 'GATHER AUTO', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt => 'for all columns size repeat', -
degree => 15 -
)

為了充分認識dbms_stats的好處,你需要仔細體會每一條主要的預編譯指令(directive)。下面讓我們研究每一條指令,並體會如何用它為基於代價的SQL優化器收集最高質量的統計資料。
options引數
使用4個預設的方法之一,這個選項能控制Oracle統計的重新整理方式:
gather——重新分析整個架構(Schema)。
gather empty——只分析目前還沒有統計的表。
gather stale——只重新分析修改量超過10%的表(這些修改包括插入、更新和刪除)。
gather auto——重新分析當前沒有統計的物件,以及統計資料過期(變髒)的物件。注意,使用gather auto類似於組合使用gather stale和gather empty。
注意,無論gather stale還是gather auto,都要求進行監視。如果你執行一個alter table xxx monitoring命令,Oracle會用dba_tab_modifications檢視來跟蹤發生變動的表。這樣一來,你就確切地知道,自從上一次分析統計資料以來,發生了多少次插入、更新和刪除操作。
 
estimate_percent選項
以下estimate_percent引數是一種比較新的設計,它允許Oracle的dbms_stats在收集統計資料時,自動估計要取樣的一個segment的最佳百分比:
estimate_percent => dbms_stats.auto_sample_size

要驗證自動統計取樣的準確性,你可檢視dba_tables sample_size列。一個有趣的地方是,在使用自動取樣時,Oracle會為一個樣本尺寸選擇5到20的百分比。記住,統計資料質量越好,CBO做出的決定越好。
method_opt選項
dbms_stats的method_opt引數尤其適合在表和索引資料發生變化時重新整理統計資料。method_opt引數也適合用於判斷哪些列需要直方圖(histograms)。
某些情況下,索引內的各個值的分佈會影響CBO是使用一個索引還是執行一次全表掃描的決策。例如,假如在where子句中指定的值的數量不對稱,全表掃描就顯得比索引訪問更經濟。
如果你有一個高度傾斜的索引(某些值的行數不對稱),就可建立Oracle直方圖統計。但在現實世界中,出現這種情況的機率相當小。使用CBO時,最常見的錯誤之一就是在CBO統計中不必要地引入直方圖。根據經驗,只有在列值要求必須修改執行計劃時,才應使用直方圖。

為了智慧地生成直方圖,Oracle為dbms_stats準備了method_opt引數。在method_opt子句中,還有一些重要的新選項,包括skewonly,repeat和auto:method_opt=>'for all columns size skewonly'
method_opt=>'for all columns size repeat'
method_opt=>'for all columns size auto'

skewonly選項會耗費大量處理時間,因為它要檢查每個索引中的每個列的值的分佈情況。
假如dbms_stat發現一個索引的各個列分佈得不均勻,就會為那個索引建立直方圖,幫助基於代價的SQL優化器決定是進行索引訪問,還是進行全表掃描訪問。例如,在一個索引中,假定有一個列在50%的行中,如清單B所示,那麼為了檢索這些行,全表掃描的速度會快於索引掃描。--*************************************************************
-- SKEWONLY option—Detailed analysis
--
-- Use this method for a first-time analysis for skewed indexes
-- This runs a long time because all indexes are examined
--*************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size skewonly',
degree => 7
);
end;

重新分析統計資料時,使用repeat選項,重新分析任務所消耗的資源就會少一些。使用repeat選項(清單C)時,只會為現有的直方圖重新分析索引,不再搜尋其他直方圖機會。定期重新分析統計資料時,你應該採取這種方式。--**************************************************************
-- REPEAT OPTION - Only reanalyze histograms for indexes
-- that have histograms
--
-- Following the initial analysis, the weekly analysis
-- job will use the “repeat” option. The repeat option
-- tells dbms_stats that no indexes have changed, and
-- it will only reanalyze histograms for
-- indexes that have histograms.
--**************************************************************begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size repeat',
degree => 7
);
end;
使用alter table xxx monitoring;命令來實現Oracle表監視時,需要使用dbms_stats中的auto選項。如清單D所示,auto選項根據資料分佈以及應用程式訪問列的方式(例如通過監視而確定的一個列的工作量)來建立直方圖。使用method_opt=>’auto’類似於在dbms_stats的option引數中使用gather auto。begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
degree => 7
);
end;
並行收集
Oracle允許以並行方式來收集CBO統計資料,這就顯著提高了收集統計資料的速度。但是,要想並行收集統計資料,你需要一臺安裝了多個CPU的SMP伺服器。
更快的執行速度
dbms_stats是提高SQL執行速度的一種出色機制。通過使用dbms_stats來收集最高質量的統計資料,CBO能夠正確判斷執行任何SQL查詢時的最快途徑。dbms_stats還在不斷地改進。目前,它的一些令人激動的新特性(自動樣本大小和自動直方圖生成)已經顯著簡化了Oracle專家的工作。
 
 
 
One exciting feature of dbms_stats is the ability to automatically look for columns that should have histograms, and create the histograms. Multi-bucket histograms add a huge parsing overhead to SQL statements, and histograms should ONLY be used when the SQL will choose a different execution plan based upon the column value.

To aid in intelligent histogram generation, Oracle uses the method_opt parameter of dbms_stats. There are also important new options within the method_opt clause, namely skewonly, repeat and auto.

method_opt=>'for all columns size skewonly'
method_opt=>'for all columns size repeat'
method_opt=>'for all columns size auto'
In practice, there is a specific order to use the different options of dbms_stats. See this article for details. Let’s take a close look at each method option.

 

The method_opt=’SKEWONLY’ dbms_stats Option

The first is the “skewonly” option which very time-intensive because it examines the distribution of values for every column within every index. If dbms_stats discovers an index whose columns are unevenly distributed, it will create histograms for that index to aid the cost-based SQL optimizer in making a decision about index vs. full-table scan access. For example, if an index has one column that is in 50% of the rows, a full-table scan is faster than and index scan to retrieve these rows.

Histograms are also used with SQL that has bind variables and SQL with cursor_sharing enabled. In these cases, the CBO determines if the column value could affect the execution plan, and if so, replaced the bind variable with a literal and performs a hard parse.

 

--*************************************************************
-- SKEWONLY option – Detailed analysis
--
-- Use this method for a first-time analysis for skewed indexes
-- This runs a long time because all indexes are examined
--*************************************************************

begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size skewonly',
degree => 7
);
end;
/


The method_opt=’REPEAT’ dbms_stats Option

Following the one-time detailed analysis, the re-analyze task will be less resource intensive with the REPEAT option. Using the repeat option will only re-analyze indexes with existing histograms, and will not search for other histograms opportunities. This is the way that you will re-analyze you statistics on a regular basis.

 

--**************************************************************
-- REPEAT OPTION - Only re-analyze histograms for indexes
-- that have histograms
--
-- Following the initial analysis, the weekly analysis
-- job will use the “repeat” option. The repeat option
-- tells dbms_stats that no indexes have changed, and
-- it will only re-analyze histograms for
-- indexes that have histograms.
--**************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size repeat',
degree => 7
);
end;
/


The method_opt=’AUTO’ dbms_stats Option

The auto option is used when monitoring is implemented and creates histograms based upon data distribution and the manner in which the column is accessed by the application (e.g. the workload on the column as determined by monitoring, especially foreign keys to determine the cardinality of table join result sets). Using method_opt=>’auto’ is similar to using the gather auto in the option parameter of dbms_stats.

begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
degree => 7
);
end;
/

Remember, analyzing for histograms is time-consuming, and histograms are used under two conditions:
Table join order – The CBO must know the size of the intermediate result sets (cardinality) to properly determine the correct join order the multi-table joins.

Table access method – The CBO needs to know about columns in SQL where clauses, where the column value is skewed such that a full-table scan might be faster than an index range scan. Oracle uses this skew information in conjunction with the clustering_factor columns of the dba_indexes view.

Hence, this is the proper order for using the dbms_stats package to locate proper columns for histograms:

1. Skewonly option - You want to use skewonly to do histograms for skewed columns, for cases where the value will make a difference between a full-table scan and an index scan.

2. Monitor - Next, turn-on monitoring. Issue an “alter table xx monitoring” and “alter index yyy monitoring” command for all segments in your schema. This will monitor workload against

3. Auto option - Once monitoring is in-place, you need to re-analyze with the "auto" option to create histograms for join columns within tables. This is critical for the CBO to determine the proper join order for finding the driving table in multi-table joins.

4. Repeat option - Finally, use the "repeat" option to re-analyze only the existing histograms.

Periodically you will want to re-run the skewonly and auto option to identify any new columns that require histograms. Once located, the repeat option will ensure that they are refreshed with current values.


--------------------------------------------------------------------------------

If you like DBA internal tricks, check-out my new book Creating a Self-tuning Oracle Database by Rampant TechPress. This book is now available at this link:

http://www.rampant-books.com/book_2...racle9i_sga.htm

Regards,
 
在使用DBMS_STATS分析表的時候,我們經常要儲存之前的分析,以防分析後導致系統效能低下然後進行快速恢復。
首先建立一個分析表,該表是用來儲存之前的分析值。
SQL> begin
2 dbms_stats.create_stat_table(ownname => 'TEST',stattab => 'STAT_TABLE');
3 end;
4 /
PL/SQL 過程已成功完成。
分析表資訊
SQL> BEGIN
2 --DBMS_STATS.delete_table_stats(ownname => 'TEST',tabname => 'A');
3 DBMS_STATS.gather_table_stats(ownname => 'TEST',tabname => 'A');
4 END;
5 /
PL/SQL 過程已成功完成。
匯出表分析資訊到stat_table中。
SQL> BEGIN
2 dbms_stats.export_table_stats(ownname => 'TEST',tabname => 'A',stattab => 'STAT_TABLE');
3 END;
4 /
PL/SQL 過程已成功完成。
SQL>
同理也有
EXPORT_COLUMN_STATS:匯出列的分析資訊
EXPORT_INDEX_STATS:匯出索引分析資訊
EXPORT_SYSTEM_STATS:匯出系統分析資訊
EXPORT_TABLE_STATS:匯出表分析資訊
EXPORT_SCHEMA_STATS:匯出方案分析資訊
EXPORT_DATABASE_STATS:匯出資料庫分析資訊
IMPORT_COLUMN_STATS:匯入列分析資訊
IMPORT_INDEX_STATS:匯入索引分析資訊
IMPORT_SYSTEM_STATS:匯入系統分析資訊
IMPORT_TABLE_STATS:匯入表分析資訊
IMPORT_SCHEMA_STATS:匯入方案分析資訊
IMPORT_DATABASE_STATS:匯入資料庫分析資訊
GATHER_INDEX_STATS:分析索引資訊
GATHER_TABLE_STATS:分析表資訊,當cascade為true時,分析表、列(索引)資訊
GATHER_SCHEMA_STATS:分析方案資訊
GATHER_DATABASE_STATS:分析資料庫資訊
GATHER_SYSTEM_STATS:分析系統資訊
SQL> select count(*) from stat_table;
COUNT(*)
----------
1
刪除分析資訊
SQL> BEGIN
2 DBMS_STATS.delete_table_stats(ownname => 'TEST',tabname => 'A');
3 END;
4 /
PL/SQL 過程已成功完成。
匯入分析資訊
SQL> BEGIN
2 DBMS_STATS.import_table_stats(ownname => 'TEST',tabname => 'A',stattab => 'STAT_TABLE');
3 END;
4 /
PL/SQL 過程已成功完成。
DBMS_STATS AND LEAF_BLOCKS
dbms_stats是oracle用來代替原有的analyze功能的一個包,與analyze相比dbms_stats具有很多優勢,比如並行,比如分割槽資訊統計等,但是dbms_stats再分析index的時候處理方式並不是太理想,dbms_stats分析index時將會只統計leaf_blocks為當前有資料的leaf block,而analyze則會統計為所有曾經被使用過的leaf block number,很顯然dbms_stats的統計結果會使index fast full scan的成本被嚴重低估,在某些情況下會錯誤得選擇index fast full scan做為執行路徑。下面來看一個例子:
 
先清空原來的表
SQL 10G>truncate table t1;
Table truncated.
 
插入資料
SQL 10G>insert into t1 select
2 rownum id,
3 trunc(100 * dbms_random.normal) val,
4 rpad(’x',100) padding
5 from
6 all_objects
7 where
8 rownum < = 10000
9 ;
10000 rows created.
 
SQL 10G>commit;
Commit complete.
 
建立索引,並限制pctfree為99,模擬大索引的產生
SQL 10G>create index ind_t1 on t1(id) pctfree 99;
Index created.
 
使用analyze分析索引
SQL 10G>ANALYZE INDEX IND_T1 COMPUTE STATISTICS;
Index analyzed.
 
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name=’T1′;
INDEX_NAME BLEVEL LEAF_BLOCKS
—————————— ———- ———–
IND_T1 2 10000
 
再使用dbms_stats分析,可以看到在這個時候基本統計資料是相同的
SQL 10G>EXEC DBMS_STATS.GATHER_INDEX_STATS(’TEST’,'IND_T1′);
PL/SQL procedure successfully completed.
 
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name=’T1′;
INDEX_NAME BLEVEL LEAF_BLOCKS
—————————— ———- ———–
IND_T1 2 10010
 
刪除資料使表中只保留一條記錄
SQL 10G>delete from t1 where rownum<10000;
9999 rows deleted.
 
SQL 10G>commit;
Commit complete.
 
再用analyze分析索引,可以發現leaf_blocks依然是10000
SQL 10G>ANALYZE INDEX IND_T1 COMPUTE STATISTICS;
Index analyzed.
 
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name=’T1′;
INDEX_NAME BLEVEL LEAF_BLOCKS
—————————— ———- ———–
IND_T1 2 10000
 
看cost這一項顯示index fast full scan的成本為2679,這是正確的
SQL 10G>set autotrace trace exp;
SQL 10G>select count(*) from t1;
Execution Plan
————————————————————-
| Id | Operation | Name | Rows | Cost (%CPU)|
————————————————————-
| 0 | SELECT STATEMENT | | 1 | 2679 (19)|
| 1 | SORT AGGREGATE | | 1 | |
| 2 | INDEX FAST FULL SCAN| IND_T1 | 1 | 2679 (19)|
————————————————————-
 
使用dbms_stats分析索引,leaf_blocks被統計為1,只統計了當前在用的leaf block
SQL 10G>EXEC DBMS_STATS.GATHER_INDEX_STATS(’TEST’,'IND_T1′);
PL/SQL procedure successfully completed.
 
SQL 10G>set autotrace off;
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name=’T1′;
INDEX_NAME BLEVEL LEAF_BLOCKS
—————————— ———- ———–
IND_T1 2 1
 
看cost這一項顯示index fast full scan的成本為1,這顯然是出現了錯誤
SQL 10G>set autotrace trace;
SQL 10G>select count(*) from t1;
Execution Plan
————————————————————-
| Id | Operation | Name | Rows | Cost (%CPU)|
————————————————————-
| 0 | SELECT STATEMENT | | 1 | 1 (0)|
| 1 | SORT AGGREGATE | | 1 | |
| 2 | INDEX FAST FULL SCAN| IND_T1 | 1 | 1 (0)|
————————————————————-
 
再來看看它究竟需要讀取多少個塊,是不是cost=1就夠了
 
SQL 10G> ALTER SESSION SET EVENTS ‘immediate trace name flush_cache’;
Session altered.
SQL 10G>select count(*) from t1;
Execution Plan
———————————————————-
————————————————————-
| Id | Operation | Name | Rows | Cost (%CPU)|
————————————————————-
| 0 | SELECT STATEMENT | | 1 | 1 (0)|
| 1 | SORT AGGREGATE | | 1 | |
| 2 | INDEX FAST FULL SCAN| IND_T1 | 1 | 1 (0)|
————————————————————-
Statistics
———————————————————-
0 recursive calls
0 db block gets
10035 consistent gets
10016 physical reads
0 redo size
411 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
顯然,這裡發生了10016個physical reads,cost=1是遠遠不夠的。不知道oracle會不會就這個問題有改進方案,大家拭目以待。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29144194/viewspace-1254709/,如需轉載,請註明出處,否則將追究法律責任。

相關文章