Oracle動態取樣分析

kunlunzhiying發表於2017-05-31

Oracle動態取樣分析



動態取樣概念

 

動態取樣(Dynamic Sampling)是在ORACLE 9i Release 2中開始引入的一個技術,引入它的目的是為了應對資料庫物件沒有分析(統計資訊缺失)的情況下,最佳化器生成更好的執行計劃。簡單的說,在資料庫段(表、索引、分割槽)物件沒有分析的情況下,為了使CBO最佳化器得到足夠多的資訊以保證最佳化器做出正確執行計劃而發明的一種技術。它會分析一定數量段物件上的資料塊獲取CBO需要的統計資訊。動態取樣技術僅僅是統計資訊的一種補充,它不能完全替代統計資訊分析。

Dynamic sampling first became available in Oracle9i Database Release 2. It is the ability of the cost-based optimizer (CBO) to sample the tables a query references during a hard parse, to determine better default statistics for unanalyzed segments, and to verify its “guesses.” This sampling takes place only at hard parse time and is used to dynamically generate better statistics for the optimizer to use, hence the name dynamic sampling.
 
The purpose of dynamic sampling is to improve server performance by determining more accurate estimates for predicate selectivity and statistics for tables and indexes. The statistics for tables and indexes include table block counts, applicable index block counts, table cardinalities, and relevant join column statistics. These more accurate estimates allow the optimizer to produce better performing plans.
 

動態取樣在Oracle 11g之前稱為 Dynamic Sampling, ORACLE 12c之後改名為Dynamic Statistic.

 

動態取樣介紹

如果要理解動態取樣,最好從鮮活的例子開始,向來理論都是枯燥乏味的。建立一個test表,總共有50319行資料。如下所示

SQL> create table test
  2 as 
  3 select owner, object_type
  4 from dba_objects;
 
Table created.
 
SQL> select count(1) from test;
 
  COUNT(1)
----------
     50319

我們使用dynamic_sampling(test 0)提示(hints)來禁用動態取樣(稍後動態取樣級別中介紹),從下面的執行計劃可以看出,在表物件沒有做分析情況下,如果禁用了動態取樣,CBO最佳化器唯一可以使用的資訊為該表儲存在資料字典的一些資訊,比如多少個extent,多少個block等,這些資訊往往不夠。此時最佳化器估計表test的行數為11027(如下所示), 跟實際的表記錄行數50319還是有蠻大的偏差。在複雜環境下,就很有可能導致CBO最佳化器做出錯誤的執行計劃。

SQL> set autotrace traceonly explain;
SQL> select /*+ dynamic_sampling(test 0) */ * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 11027 |   301K|    31   (0)| 00:00:01 |
|   1 | TABLE ACCESS FULL| TEST | 11027 |   301K|    31   (0)| 00:00:01 |
--------------------------------------------------------------------------
SQL> set autotrace off;

clip_image001

如果啟用動態取樣(預設情況下,動態取樣級別為2),最佳化器根據動態取樣得到一些資料資訊猜測、估計表TEST的記錄行數為48054,已經接近實際記錄行數50319了。比不做動態取樣分析要好很多了。當然你不能指望動態取樣獲取完全準確的資訊,因為它只是取樣了一些資料塊。

SQL> set autotrace traceonly explain;
SQL> select * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 48054 |  1313K|    32   (4)| 00:00:01 |
|   1 | TABLE ACCESS FULL| TEST | 48054 |  1313K|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
 - dynamic sampling used for this statement
 
SQL> set autotrace off;

clip_image002

如果我們將動態取樣的級別提高為3,如下所示,發現最佳化器根據動態取樣得到的資訊比預設(預設情況下,動態取樣級別為2)情況獲得的資訊更準確。最佳化器估計表TEST的行數為51463,比48054又接近實際情況一步了。

 
SQL> set autotrace traceonly explain;
SQL> select /*+ dynamic_sampling(test 3) */ * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 51463 |   703K|    32   (4)| 00:00:01 |
|   1 | TABLE ACCESS FULL| TEST | 51463 |   703K|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------
SQL> set autotrace off;

clip_image003

在Tom的這篇文章中提到,在沒有動態取樣的情況下,如果刪除了該表資料,CBO最佳化器估算的結果集和沒有刪除之前是一樣的。這是因為當一個表的資料被刪除後,這個表所分配的extent和block是不會自動回收的(高水位線不變),所以CBO如果沒有采樣資料塊做分析,只是從資料字典中獲取extend等資訊,就會誤認為任然還有那麼多資料。

SQL> delete from test;
 
50319 rows deleted.
 
SQL> commit;
 
Commit complete.
 
SQL> set autotrace traceonly explain;
SQL> select /*+ dynamic_sampling(test 0) */ * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 11027 |   301K|    31   (0)| 00:00:01 |
|   1 | TABLE ACCESS FULL| TEST | 11027 |   301K|    31   (0)| 00:00:01 |
--------------------------------------------------------------------------
SQL> select * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    31   (0)| 00:00:01 |
|   1 | TABLE ACCESS FULL| TEST |     1 |    28 |    31   (0)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
 - dynamic sampling used for this statement
 
SQL> 

clip_image004

 

什麼時候使用動態取樣?

如下所示,我們使用包dbms_stats.gather_table_stats收集表Test的統計資訊過後,你會發現“dynamic sampling used for this statement”不見了,其實也就是說最佳化器發現有表TEST有分析過,它就不會使用動態取樣技術。其實開篇的時候已經敘說過“應對資料庫物件沒有分析(統計資訊缺失)的情況下,才會用到動態取樣技術“

SQL> set autotrace trace exp;
SQL> select * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    31   (0)| 00:00:01 |
|   1 | TABLE ACCESS FULL| TEST |     1 |    28 |    31   (0)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
 - dynamic sampling used for this statement
 
SQL> exec dbms_stats.gather_table_stats(user, 'test');
 
PL/SQL procedure successfully completed.
 
SQL> select * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    31   (0)| 00:00:01 |
|   1 | TABLE ACCESS FULL| TEST |     1 |    28 |    31   (0)| 00:00:01 |
--------------------------------------------------------------------------

clip_image005

第二種情況:當表TEST即使被分析過,如果查詢指令碼里麵包含臨時表,就會使用動態取樣技術。因為臨時表是不會被分析,它是沒有統計資訊的。如下所示

SQL> drop table test;
SQL> create table test
  2 as 
  3 select owner, object_type
  4 from dba_objects;
 
Table created.
 
SQL> exec dbms_stats.gather_table_stats(user, 'test');
 
PL/SQL procedure successfully completed.
 
 
SQL> create global temporary table tmp
  2  (object_type varchar2(19));
 
Table created.
 
SQL> insert into tmp
  2 select distinct object_type from dba_objects;
 
41 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> set autotrace traceonly explain;
 
SQL> select t.owner, l.object_type
  2 from test t inner join tmp l on t.object_type =l.object_type;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 19574435
 
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    25 |    35   (6)| 00:00:01 |
|*  1 |  HASH JOIN |      |     1 |    25 |    35   (6)| 00:00:01 |
|   2 | TABLE ACCESS FULL| TMP  |     1 |    11 |     2   (0)| 00:00:01 |
|   3 | TABLE ACCESS FULL| TEST | 49422 |   675K|    32   (4)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T"."OBJECT_TYPE"="L"."OBJECT_TYPE")
 
Note
-----
 - dynamic sampling used for this statement
 
SQL> 

clip_image006

動態取樣還有一個獨特能力,可以對不同列之間的相關性做統計。表統計資訊都是相對獨立的。當查詢涉及列之間的相關性時,統計資訊就顯得有些不足了,請看Tom的例子

建立一個特殊的表t,然後對欄位flag1、flag2建立索引t_idx,然後分析收集統計資訊

SQL> create table t
  2 as select decode(mod(rownum,2),0,'N', 'Y') flag1,
  3               decode(mod(rownum,2),0,'Y', 'N') flag2, a.*
  4 from all_objects a;
 
Table created.
 
SQL> create index t_idx on t(flag1, flag2);
 
Index created.
 
SQL> begin
  2   dbms_stats.gather_table_stats(user, 'T',      
  3        method_opt =>'for all indexed columns size 254');
  4 end;
  5  /
 
PL/SQL procedure successfully completed.

關於表t的行數情況如下所示,大家先不要糾結為什麼查詢獲取NUM_ROWS資料

SQL> select num_rows, num_rows/2, num_rows/2/2
  2 from user_tables
  3 where table_name='T';
 
  NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2
---------- ---------- ------------
     49875    24937.5     12468.75

首先看看對flag1過濾條件的SQL語句,CBO最佳化器猜測、估計的行數24757, 相當接近24937.5記錄數了。

SQL> set autotrace traceonly explain;
SQL> select * from t where flag1='N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 24757 |  2345K|   161   (2)| 00:00:02 |
|*  1 | TABLE ACCESS FULL| T    | 24757 |  2345K|   161   (2)| 00:00:02 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLAG1"='N')

首先看看對flag2過濾條件的SQL語句,CBO最佳化器猜測、估計的行數25118, 相當接近24937.5記錄數了。

SQL> select * from t where flag2='N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 25118 |  2379K|   161   (2)| 00:00:02 |
|*  1 | TABLE ACCESS FULL| T    | 25118 |  2379K|   161   (2)| 00:00:02 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLAG2"='N')

如果條件flag1 = 'N' and flag2 = 'N',我們根據邏輯推理判斷這樣的記錄肯定是不存在的,這也是苦心構造這個特例的初衷。下面看看CBO最佳化器怎麼探測、預測的

SQL> select * from t where flag1 = 'N' and flag2 = 'N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 12468 |  1181K|   160   (2)| 00:00:02 |
|*  1 | TABLE ACCESS FULL| T    | 12468 |  1181K|   160   (2)| 00:00:02 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLAG1"='N' AND "FLAG2"='N')

CBO估計的記錄數為12468,和實際情況相差非常遠。其實是CBO最佳化器這樣估算來的:

flag1=‘N' 的記錄數佔總數的1/2

flag2= 'N' 的記錄數佔總數的1/2

根據NUM_ROWS/2/2 =12468.這樣顯然是不合理的。下面我們透過提升動態取樣級別,來看看動態取樣是否能避免CBO的錯誤

SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 470836197
 
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     4 |   388 |     2   (0)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID| T     |     4 |   388 |     2   (0)| 00:00:01 |
|*  2 | INDEX RANGE SCAN          | T_IDX |     4 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FLAG1"='N' AND "FLAG2"='N')
 
Note
-----
 - dynamic sampling used for this statement
 
SQL> 

clip_image007

 

動態取樣級別

ORACLE為動態取樣劃分了11個級別,詳情請見ORACLE 11g官方文件

Table 13-10 Dynamic Statistics Levels

Level

When the Optimizer Uses Dynamic Statistics

Sample Size (Blocks)

0

Do not use dynamic statistics

不做動態取樣分析

n/a

1

Use dynamic statistics for all tables that do not have statistics, but only if the following criteria are met:

· There is at least 1 nonpartitioned table in the query that does not have statistics.

· This table has no indexes.

· This table has more blocks than the number of blocks that would be used for dynamic statistics of this table.

Oracle 對沒有分析的表進行動態取樣,但需要同時滿足以下3個條件。

(1) SQL中至少有一個未分析的表(非分割槽表)

(2) 未分析的表沒有索引

(3) 未分析的表佔用的資料塊要大於動態取樣的資料塊(32個)

32

2

Use dynamic statistics if at least one table in the statement has no statistics. This is the default setting.

對所有的未分析表做分析,動態取樣的資料塊是預設資料塊數為64

64

3

Use dynamic statistics if any of the following conditions is true:

·

· The statement meets level 2 criteria.

·

· The statement has one or more expressions used in the WHERE clause predicates, for example, WHERE SUBSTR(cust_last_name,1,3).

取樣的表包含滿足Level 2定義的所有表,同時包括,那些謂詞有可能潛在地需要動態取樣的表,這些動態取樣的資料塊為預設資料塊,對沒有分析的表,動態取樣的預設塊為預設資料塊數量。

64

4

Use dynamic statistics if any of the following conditions is true:

·

· The statement meets level 3 criteria.

·

· The statement uses complex predicates (an OR or AND operator between multiple predicates on the same table).

取樣的表包含滿足Level 3定義的表,同時還包括一些表,他們包含一個單表的謂詞會引用另外的2個列或者更多的列;取樣的塊數是動態取樣預設資料塊數;對沒有分析的表,動態取樣的資料塊為預設資料塊的1倍。

64

5

Use dynamic statistics if the statement meets level 4 criteria.

取樣的表包含滿足Level 4定義的表,同時分別使用動態取樣預設資料塊的2倍的數量來做動態分析。

128

6

Use dynamic statistics if the statement meets level 4 criteria.

取樣的表包含滿足Level 4定義的表,同時分別使用動態取樣預設資料塊的4倍的數量來做動態分析。

256

7

Use dynamic statistics if the statement meets level 4 criteria.

取樣的表包含滿足Level 4定義的表,同時分別使用動態取樣預設資料塊的8倍的數量來做動態分析。

512

8

Use dynamic statistics if the statement meets level 4 criteria.

取樣的表包含滿足Level 4定義的表,同時分別使用動態取樣預設資料塊的32 倍的數量來做動態分析。

1024

9

Use dynamic statistics if the statement meets level 4 criteria.

取樣的表包含滿足Level 4定義的表,同時分別使用動態取樣預設資料塊的128倍的數量來做動態分析。

4086

10

Use dynamic statistics if the statement meets level 4 criteria.

取樣的表包含滿足Level 4定義的表,同時分別使用動態取樣對所有資料塊做動態分析。

All blocks

11

Use dynamic statistics automatically whenever the optimizer deems it necessary.

當最佳化器探測到需要的取樣時,對段段物件自動取樣

Automatically determined

取樣級別越高,取樣的資料塊越多,得到的分析資料就越接近於真實,但同時伴隨著資源消耗的開銷也增加了。這時一個需要權衡考慮的東西。ORACLE 10 g & 11g的預設取樣級別都為2,如下所示,一般使用在會話中使用dynamic_sampling提示來修改動態取樣級別。

SQL> show parameter optimizer_dynamic_sampling
 
NAME                               TYPE VALUE
------------------------------ ----------- -----------
optimizer_dynamic_sampling integer 2
SQL> 

另外一個方式就是透過提示hints裡修改動態取樣的級別。這個非常靈活、有用。

 

動態取樣注意事項

凡事有利必有弊,動態取樣也不是神器。它取樣的資料塊越多,系統開銷就越大,這樣會增加SQL硬解析的時間,如果是資料庫倉庫(DW、OLAP)環境,SQL執行時間相當長,硬解析時間只佔整個SQL執行時間的一小部分,那麼可以適當的提高動態取樣級別,這樣是有利於最佳化器獲取更加正確的資訊。一般設定為3或4比較合適。

但是在併發比較嚴重的OLTP系統中,每秒中有成千上萬的SQL語句執行,它要求SQL語句短小、執行時間短,所以在OLTP系統中應該減低動態取樣級別或不用動態取樣。可以參考下面:


When should I use dynamic sampling?” is a tricky question. As with any other feature, there are times to use it and times to avoid it. So far I’ve concentrated on the “goodness” of dynamic sampling, and based on that, it seems that you should set the level to 3 or 4 and just let the optimizer always use dynamic sampling to validate its guesses.

That makes sense in an environment in which you spend most of your time executing SQL and very little of your overall time hard-parsing the SQL. That is, the SQL you are executing runs for a long time and the parse time is a small portion of the overall execution time, such as in a data warehousing environment. There, dynamic sampling at levels above the default makes complete sense. You are willing to give the optimizer a little more time during a hard parse (when sampling takes place) to arrive at the optimal plan for these complex queries.

That leaves the other classic type of environment: the online transaction processing (OLTP) system. Here, in general, you are executing queries thousands of times per second and spend very little time executing a given query—the queries are typically small and fast. Increasing the parse time in an OLTP system might well cause you to spend more time parsing than executing SQL. You do not want to increase the parse times here, so higher levels of dynamic sampling would not be advisable


參考資料

https://blogs.oracle.com/optimizer/entry/dynamic_sampling_and_its_impact_on_the_optimizer

[讓Oracle跑得更快]---譚懷遠




http://blog.csdn.net/tianlesoftware/article/details/5845028

之前在說Oracle Optimizer中的CBO時講到,當表沒有做分析的時候,Oracle 會使用動態取樣來收集統計資訊。 獲取準確的段物件(表,表分割槽,索引等)的分析資料,是CBO存在的基石,CBO的機制就是收集儘可能多的物件資訊和系統資訊,透過對這些資訊進行計算,分析,評估,最終得出一個成本最低的執行計劃。 所以對於CBO,資料段的分析就非常重要。

 

Oracle Optimizer CBO RBO

http://blog.csdn.net/tianlesoftware/archive/2010/08/19/5824886.aspx

 

一.         先演示一個示例,來理解分析的作用

 

1.1建立表

SQL> create table t as select object_id,object_name from dba_objects where 1=2;

表已建立。

SQL> create index index_t on t(object_id);

索引已建立。

SQL> insert into t select object_id,object_name from dba_objects;

已建立72926行。

SQL> commit;

提交完成。

 

1.2檢視分的分析及執行計劃

SQL> select num_rows,avg_row_len,blocks,last_analyzed from user_tables where table_name='T';

  NUM_ROWS AVG_ROW_LEN     BLOCKS LAST_ANALYZED

---------- ----------- ---------- --------------

 

SQL> select blevel,leaf_blocks,distinct_keys,last_analyzed from user_indexes where table_name='T';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED

---------- ----------- ------------- --------------

         0           0             0 25-8 -10

 

從查詢結果看出,表的行數,行長,佔用的資料塊數及最後的分析時間都是空。 索引的相關資訊也沒有,說明這個表和說因都沒有被分析,如果此時有一條SQL 對錶做查詢,CBO 由於無法獲取這些資訊,很可能生成錯誤的執行計劃,如:

 

SQL> set linesize 200

SQL> set autot trace exp;

SQL> select /*+dynamic_sampling(t 0) */ * from t where object_id>30;

執行計劃

----------------------------------------------------------

Plan hash value: 80339723

 

---------------------------------------------------------------------------------------

| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |         |     4 |   316 |    0   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T       |     4 |   316 |     0   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | INDEX_T |     1 |       |     0  (0)| 00:00:01 |

---------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("OBJECT_ID">30)

SQL>

 

Oracle 10g以後,如果一個表沒有做分析,資料庫將自動對它做動態取樣分析,所以這裡採用hint的方式將動態取樣的級別設定為0,即不使用動態取樣。

 

         從這個執行計劃,看書CBO 估計出表中滿足條件的記錄為4條,索引使用了索引。 我們對錶做一下分析,用結果比較一下。

 

1.3 分析表及檢視分析之後的執行計劃

分析可以透過兩中方式:

一種是analyze 命令,如:

analyze table tablename compute statistics for all indexes;

         還有一種就是透過DBMS_STATS包來分析,從9i 開始,Oracle 推薦使用DBMS_STATS包對錶進行分析操作,因為DBMS_STATS 提供了更多的功能,以及靈活的操作方式。

        

SQL> exec dbms_stats.gather_table_stats('SYS','T');

PL/SQL 過程已成功完成。

SQL> select blevel,leaf_blocks,distinct_keys,last_analyzed from user_indexes where table_name='T';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED

---------- ----------- ------------- --------------

         1         263         72926 25-8 -10

SQL> select num_rows,avg_row_len,blocks,last_analyzed from user_tables where table_name='T';

  NUM_ROWS AVG_ROW_LEN     BLOCKS LAST_ANALYZED

---------- ----------- ---------- --------------

     72926          29        345 25-8 -10

 

從上面的結果,可以看出DBMS_STATS.gather_table_stats已經對錶和索引都做了分析。 現在我們在來看一下執行計劃。

 

SQL> set autot trace exp;

SQL> select * from t where object_id>30;

執行計劃

----------------------------------------------------------

Plan hash value: 1601196873

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      | 72899 |  2064K|    96   (2)| 00:00:02 |

|*  1 |  TABLE ACCESS FULL| T    | 72899 |  2064K|    96   (2)| 00:00:02 |

--------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - filter("OBJECT_ID">30)

 

從這個計劃,我們看出CBO 估算出的結果是72899 條記錄,與實際的72926很近。 此時選擇全表掃描更優。 透過這個例子,我們也看出了分析對執行計劃的重要性。

 

 




 

 

四.         動態取樣

 

4.1 什麼是動態取樣

         動態取樣(Dynamic Sampling)技術的最初提出是在Oracle 9i R2,在段(表,索引,分割槽)沒有分析的情況下,為了使CBO 最佳化器得到足夠的資訊以保證做出正確的執行計劃而發明的一種技術,可以把它看做分析手段的一種補充。

         當段物件沒有統計資訊時(即沒有做分析),動態取樣技術可以透過直接從需要分析的物件上收集資料塊(取樣)來獲得CBO需要的統計資訊。

 

一個簡單的例子:

 

建立表:

SQL> create table t

  2  as

  3  select owner,object_type from all_objects;

表已建立。

 

檢視錶的記錄數:

SQL> select count(*) from t;

COUNT(*)

----------

72236  -- 記錄數

 

這裡建立了一張普通表,沒有做分析,我們在hint中用0級來限制動態取樣,此時CBO 唯一可以使用的資訊就是表儲存在資料字典中的一些資訊,如有多少個extent,有多少個block,但是這些資訊是不夠的。

 

SQL> set autot traceonly explain

SQL> select /*+dynamic_sampling(t 0) */ * from t;

 

執行計劃

----------------------------------------------------------

Plan hash value: 1601196873

 

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      | 15928 |   435K|    55   (0)| 00:00:01 |

|   1 |  TABLE ACCESS FULL| T    | 15928 |   435K|    55   (0)| 00:00:01 |
--------------------------------------------------------------------------


在沒有做動態分析的情況下,CBO 估計的記錄數是15928條,與真實的72236 相差甚遠。

 

我們用動態分析來檢視一下:

SQL> select * from t;

執行計劃

----------------------------------------------------------

Plan hash value: 1601196873

 

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      | 80232 |  2193K|    56   (2)| 00:00:01 |

|   1 |  TABLE ACCESS FULL| T    | 80232 |  2193K|    56   (2)| 00:00:01 |

--------------------------------------------------------------------------

 

Note

-----

   - dynamic sampling used for this statement (level=2)

 

Oracle 10g中預設對沒有分析的段做動態取樣,上面的查詢結果顯示使用了Level 2級的動態取樣,CBO 估計的結果是80232 72236 很接近了。

 

注意一點:

         在沒有動態取樣的情況下,對於沒有分析過的段,CBO也可能錯誤地將結果判斷的程度擴大話。 如:

SQL> delete from t;

已刪除72236行。

SQL> commit;

提交完成。

SQL> select /*+dynamic_sampling(t 0) */ * from t;

執行計劃

----------------------------------------------------------

Plan hash value: 1601196873

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      | 15928 |   435K|    55   (0)| 00:00:01 |

|   1 |  TABLE ACCESS FULL| T    | 15928 |   435K|    55   (0)| 00:00:01 |

--------------------------------------------------------------------------

 

SQL> select * from t;

執行計劃

----------------------------------------------------------

Plan hash value: 1601196873

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      |     1 |    28 |    55   (0)| 00:00:01 |

|   1 |  TABLE ACCESS FULL| T    |     1 |    28 |    55   (0)| 00:00:01 |

--------------------------------------------------------------------------

Note

-----

   - dynamic sampling used for this statement (level=2)

 

如果細心一點,可能看出2個執行計劃的差別。 在沒有采用動態分析的情況下,CBO t表估計的還是15928行記錄,但是用動態分析就顯示1條記錄。 而表中的資料在查詢之前已經刪除掉了。  出現這種情況的原因是因為高水位。 雖然表的資料已經刪除,但是表分配的extent block 沒有被回收,所以在這種情況下CBO 依然認為有那麼多的資料在那。

        

         透過這一點,我們可以看出,此時CBO能夠使用的資訊非常有限,也就是這個表有幾個extent,有幾個block。 但動態取樣之後,Oracle 立即發現,原來資料塊中都是空的。

 

關於Oracle 高水位,參考我的blogOracle 高水位(HWM)

http://blog.csdn.net/tianlesoftware/archive/2009/10/22/4707900.aspx

 

動態取樣有兩方面的作用:

(1)       CBO 依賴的是充分的統計分析資訊,但是並不是每個使用者都會非常認真,及時地去對每個表做分析。 為了保證執行計劃都儘可能地正確,Oracle 需要使用動態取樣技術來幫助CBO 獲取儘可能多的資訊。

(2)       全域性臨時表。 通常來講,臨時表的資料是不做分析的,因為它存放的資料是臨時性的,可能很快就釋放了,但是當一個查詢關聯到這樣的臨時表時,CBO要想獲得臨時表上的統計資訊分析資料,就只能依賴於動態取樣了。

 

動態取樣除了可以在段物件沒有分析時,給CBO提供分析資料之外,還有一個獨特的能力,它可以對不同列之間的相關性做統計。

 

相對的,表分析的資訊是獨立的。 如:

(1)       表的行數,平均行長。

(2)       表的每個列的最大值,最小值,重複率,也可能包含直方圖。

(3)       索引的聚合因子,索引葉的塊數目,索引的高度等。

 

儘管看到動態取樣的優點,但是它的缺點也是顯而易見,否則Oracle 一定會一直使用動態取樣來取代資料分析:

(1)       取樣的資料塊有限,對於海量資料的表,結果難免有偏差。

(2)       取樣會消耗系統資源,特別是OLTP資料庫,尤其不推薦使用動態取樣。

 

 

4.2 動態取樣的級別

         Oracle 為動態取樣劃分了11個級別,在Oracle 的官網上詳細的介紹。

                   13.5.7.4 Dynamic Sampling Levels

             

 

The sampling levels are as follows if the dynamic sampling level used is from a cursor hint or from theOPTIMIZER_DYNAMIC_SAMPLING initialization parameter:

Level 0: Do not use dynamic sampling.

Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).

Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.

Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for a predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is twice the default number of dynamic sampling blocks.

Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.

Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of dynamic sampling blocks respectively.

Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.

 

The sampling levels are as follows if the dynamic sampling level for a table is set using the optimizer hint:

Level 0: Do not use dynamic sampling.

Level 1: The number of blocks sampled is the default number of dynamic sampling blocks (32).

Levels 2, 3, 4, 5, 6, 7, 8, and 9: The number of blocks sampled is 2, 4, 8, 16, 32, 64, 128, or 256 times the default number of dynamic sampling blocks respectively.

Level 10: Read all blocks in the table.

4.2.1 Level 0

         不做動態分析

 

4.2.2 Level 1

         Oracle 對沒有分析的表進行動態取樣,但需要同時滿足以下4個條件。

(1)       SQL中至少有一個未分析的表

(2)       未分析的表出現在關聯查詢或者子查詢中

(3)       未分析的表沒有索引

(4)       未分析的表佔用的資料塊要大於動態取樣預設的資料塊(32個)

 

4.2.3 Level 2

         對所有的未分析表做分析,動態取樣的資料塊是預設資料塊的2倍。

 

4.2.4 Level 3

         取樣的表包含滿足Level 2定義的所有表,同時包括,那些謂詞有可能潛在地需要動態取樣的表,這些動態取樣的資料塊為預設資料塊,對沒有分析的表,動態取樣的預設塊為預設資料塊的2倍。

 

4.2.5 Level 4

         取樣的表包含滿足Level 3定義的表,同時還包括一些表,他們包含一個單表的謂詞會引用另外的2個列或者更多的列;取樣的塊數是動態取樣預設資料塊數;對沒有分析的表,動態取樣的資料塊為預設資料塊的2倍。

 

4.2.6 Level 56789

         取樣的表包含滿足Level 4定義的表,同時分別使用動態取樣預設資料塊的24832128 倍的數量來做動態分析。

 

4.2.7 Level 10

         取樣的表包含滿足Level 9定義的所有表,同時對錶的所有資料進行動態取樣。

 

 

取樣的資料塊越多,得到的分析資料就越接近與真實,但同時伴隨著資源消耗的也越大。

 

 

4.3 什麼時候使用動態取樣

         動態取樣也需要額外的消耗資料庫資源,所以,如果 SQL 被反覆執行,變數被繫結,硬分析很少,在這樣一個環境中,是不宜使用動態取樣的,就像OLTP系統。 動態取樣發生在硬分析時,如果很少有硬分析發生,動態取樣的意義就不大。

 

         而在OLAP或者資料倉儲環境下,SQL執行消耗的資源要遠遠大於SQL解析,那麼讓解析在消耗多一點資源做一些動態取樣分析,從而做出一個最優的執行計劃是非常值得的。 實際上在這樣的環境中,硬分析消耗的資源幾乎是可以忽略的。

 

         所以,一般在OLAP 或者資料倉儲環境中,將動態取樣的level 設定為3或者比較好。 相反,在OLTP系統下,不應該使用動態取樣。







 動態取樣的增強特性

Different Level for DynamicStatistics (Dynamic Sampling) used than the Level Specified (文件 ID 1102413.1)

High Waits for 'cursor: pin S waiton X' due to Dynamic Sampling Against Parallel Queries (文件 ID 2006145.1)

Bug 17760686 - Bad Cardinalityestimation with dynamic sampling (文件 ID 17760686.8)


其中,文件1102413.1有如下論述:


Changes

The change in dynamic statistics is a result of anenhancement introduced in 11.2.
 --11.2
之後,Oracle對動態取樣進行了增強。

In previous versions the default value lead to certain situationswhere the result from using dynamic statistics resulted in bad plans, forexample, when the tables referenced in the query are
very large.

With this enhancement the optimizer itself changes the dynamic value undercertain conditions such as with  large tables and parallelism. Thelevel is automatically adjusted based on the size of the tables in the query.This is done only for queries that will run in parallel which is typical forData Warehouses and large databases. 
--
增強的功能是:在某些條件下,比如大表,開啟並行和複雜謂詞條件,最佳化器會自動啟動動態取樣,並且自行定義動態取樣的級別。
The automatic setting of the Dynamic Statistic level is designed such thatoverhead from using Dynamic Statistics is kept to a very minimum yet it willimprove its effectiveness.

While the fix for this unpublished enhancement request 
bug7452863  ER: USE DYNAMIC SAMPLING FOR KEY CHOICES WHEN ROWESTIMATES ARE DUBIOUS,  allows the auto-adjustment of the dynamicstatistic level for some queries, the tracing in 10053 may not correctlyreflect the adjustment made, and you may see a message of the sort:

--此特性是Oracle未公開的增強特性,相關FIX號是7452863


很顯然,從我們的執行計劃中可以很清晰的看到SQL啟用了並行功能,而在11.2之後oracle提供的增強特性中,對於並行/大表的複雜條件,即使表上存在統計資訊,它也會開啟動態取樣的功能,試圖來更精準的評估返回結果集的記錄數;然而,這裡很不幸,遇上了動態取樣本身的缺陷,導致動態取樣後評估的行數反而大大偏離了實際情況,導致SQL語句執行時間超長;

對於該增強特性,MOS也提供了相應的解決方案:


Solution

1. Use workaround of turning off the following “_fix_control”. To do this,  login assysdba and run the following:

alter system set “_fix_control”=’7452863:OFF';

--使用_fix_control引數關閉此特性

This turns off the automatic decisions about parallelism anddynamic sampling level.

2. Turn off parallelism from the table and indexes or set it to 1:

ALTER TABLE... NOPARALLEL;
ALTER TABLE ... PARALLEL 1;

ALTER INDEX... NOPARALLEL; 
ALTER INDEX ... PARALLEL 1;

--或者不啟動並行機制



High Waits for 'cursor: pin S wait on X' due to Dynamic Sampling Against Parallel Queries (文件 ID 2006145.1)

APPLIES TO:

Oracle Database - Enterprise Edition - Version 12.1.0.1 to 12.1.0.1 [Release 12.1]
Information in this document applies to any platform.

SYMPTOMS

  • The top wait event from AWR report shows 'cursor:pin s wait on X' as the top event:
     Event                       Waits  TotalWaitTime(sec)  WaitAvg(ms)  %DBtime 
     cursor: pin S wait on X       346                 18K        52050     64.0 
     library cache lock             64              8978.4       140287     31.9 
     
  • Problem queries are run in parallel
  • Stack from the process shows following functions:
    ksedsts<-ksdxfstk<-ksdxcb<-sspuser<-__sighandler<-__select<-skgpwwait<-kgxWait
    <-kgxSharedExamine<-kxsGetRuntimeLock<-kkscsCheckCursor<-kkscsSearchChildList<-kksfbc
    <-kkspsc0<-kksParseCursor<-opiosq0<-kpoal8




  • This issue may also occur when running Real Application Testing and replaying data. The AWR Compare Period report after the replay may show higher waits for 'cursor:pin s wait on X' compared to the original in the top wait events section

CAUSE

This is due to an unpublished bug that is fixed in 12.2 or later and also included in the 12.1.0.2.0 patchset. 

From Oracle Database 11g Release 2 onwards, the optimizer will automatically decide if dynamic sampling will be useful and what dynamic sampling level will be used for SQL statements executed in parallel. This decision is based on size of the tables in the statement and the complexity of the predicates. Executing the dynamic sampling query for the corresponding parallel query causes multiple slaves to wait to get an exclusive lock, with corresponding performance degradation. If the OPTIMIZER_DYNAMIC_SAMPLING parameter is explicitly set to a non-default value, then that specified value will be honored.

SOLUTION


1. Use workaround of turning off the following “_fix_control”. To do this,  login as sysdba and run the following:

alter system set “_fix_control”=’7452863:OFF';

This turns off the automatic decisions about parallelism and dynamic sampling level.

2. Turn off parallelism from the table and indexes or set it to 1:

ALTER TABLE ... NOPARALLEL;
ALTER TABLE ... PARALLEL 1;

ALTER INDEX ... NOPARALLEL; 
ALTER INDEX ... PARALLEL 1;




Different Level for Dynamic Statistics (Dynamic Sampling) used than the Level Specified (文件 ID 1102413.1)

In this Document

Symptoms
Changes
Cause
Solution
References


APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.
When performing an explain on a query it can be seen that the level used for Dynamic Statistics is higher than the level specified
for the parameter OPTIMIZER_DYNAMIC_SAMPLING

NOTE: Prior to 11.2.0.4, Dynamic Statistics was referred to as Dynamic Sampling

SYMPTOMS

Even though dynamic statistics is set through the parameter OPTIMIZER_DYNAMIC_SAMPLING or  a DYNAMIC_SAMPLING  hint , as shown below, the level shown for the plan may be higher.

SQL>  select /*+ dynamic_sampling (my_table 2) */ * from my_table;

Execution Plan
----------------------------------------------------------
Plan hash value: 3006137970

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |  9408K|  1704M|  4000   (2)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |  9408K|  1704M|  4000   (2)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |  9408K|  1704M|  4000   (2)| 00:00:01 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| MY_TABLE |  9408K|  1704M|  4000   (2)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=5)



In 10053 trace this can be identified by the following statement in the trace file:

Dynamic sampling level auto-adjusted from 2 to 5

 

CHANGES

The change in dynamic statistics is a result of an enhancement introduced in 11.2.
 

In previous versions the default value lead to certain situations where the result from using dynamic statistics resulted in bad plans, for example, when the tables referenced in the query are
very large.
 
With this enhancement the optimizer itself changes the dynamic value under certain conditions such as with  large tables and parallelism. The level is automatically adjusted based on the size of the tables in the query. This is done only for queries that will run in parallel which is typical for Data Warehouses and large databases. 

The automatic setting of the Dynamic Statistic level is designed such that overhead from using Dynamic Statistics is kept to a very minimum yet it will improve its effectiveness.

While the fix for this unpublished enhancement request   ER: USE DYNAMIC SAMPLING FOR KEY CHOICES WHEN ROW ESTIMATES ARE DUBIOUS,  allows the auto-adjustment of the dynamic statistic level for some queries, the tracing in 10053 may not correctly reflect the adjustment made, and you may see a message of the sort:

Dynamic sampling level auto-adjusted from 7 to 7

 This is documented in unpublished  CONFUSING DYNAMIC SAMPLING AUTO-ADJUST TRACE

CAUSE

This is a result of unpublished Enhancement Request filed under unpublished .

SOLUTION

This is expected behaviour

REFERENCES




Bug 17760686 - Bad Cardinality estimation with dynamic sampling (文件 ID 17760686.8)

Bug 17760686  Bad Cardinality estimation with dynamic sampling

 This note gives a brief overview of bug 17760686. 
 The content was last updated on: 16-MAY-2017
 Click here for details of each of the sections below.

Affects:

Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions BELOW 12.2
Versions confirmed as being affected
Platforms affected Generic (all / most platforms affected)

Fixed:

The fix for 17760686 is first included in

Interim patches may be available for earlier versions - click  to check.

Symptoms:

Related To:

Description

A dynamic sampling query may return a little or zero cardinality estimate for a query involving a partitioned table stored on an ASSM tablespace. Workaround Raise the level of dynamic sampling.
 


Please note: The above is a summary description only. Actual symptoms can vary. Matching to any symptoms here does not confirm that you are encountering this problem. For questions about this bug please consult Oracle Support.


References

 (This link will only work for PUBLISHED bugs)
Note:245840.1 Information on the sections in this article




About Me

...............................................................................................................................

● 本文整理自網路,http://www.cnblogs.com/kerrycode/p/3854971.html,http://blog.csdn.net/tianlesoftware/article/details/5845028

● 本文在itpub(http://blog.itpub.net/26736162)、部落格園(http://www.cnblogs.com/lhrbest)和個人微信公眾號(xiaomaimiaolhr)上有同步更新

● 本文itpub地址:http://blog.itpub.net/26736162/abstract/1/

● 本文部落格園地址:http://www.cnblogs.com/lhrbest

● 本文pdf版及小麥苗雲盤地址:http://blog.itpub.net/26736162/viewspace-1624453/

● 資料庫筆試面試題庫及解答:http://blog.itpub.net/26736162/viewspace-2134706/

● QQ群:230161599     微信群:私聊

● 聯絡我請加QQ好友(646634621),註明新增緣由

● 於 2017-05-09 09:00 ~ 2017-05-30 22:00 在魔都完成

● 文章內容來源於小麥苗的學習筆記,部分整理自網路,若有侵權或不當之處還請諒解

● 版權所有,歡迎分享本文,轉載請保留出處

...............................................................................................................................

拿起手機使用微信客戶端掃描下邊的左邊圖片來關注小麥苗的微信公眾號:xiaomaimiaolhr,掃描右邊的二維碼加入小麥苗的QQ群,學習最實用的資料庫技術。

Oracle動態取樣分析
DBA筆試面試講解
歡迎與我聯絡

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/28211342/viewspace-2140104/,如需轉載,請註明出處,否則將追究法律責任。

相關文章