示例說明動態取樣的作用,並演示動態取樣對有內在關係的多列查詢的影響

buptdream發表於2014-08-06
 動態取樣技術是為了彌補CBO 優化器下,在段物件(表,索引,分割槽)沒有分析的情況下,為了使優化器得到得到足夠的資訊以保證做出正確的執行計劃而發明的一種技術,它可以把它看作是分析手段的一種補充。當段物件沒有統計資訊時(即沒有做分析),動態取樣技術可以通過直接從需要分析的物件上收集一些資料塊(取樣)來獲得CBO 需要的統計資訊。
   下面我們來演示一下動態採用的效果:
   Last login: Fri Aug 31 21:34:50 2012 from 10.8.0.248
[root@localhost ~]# su - oracle
[oracle@localhost ~]$ sqlplus /nolog

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 8月 31 22:09:48 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

SQL> conn change/change
Connected.
SQL> drop table t purge;

Table dropped.

SQL> create table t  as  select owner, object_type   from dba_objects;  -----------建立一個表

Table created.

SQL> select count(*) from t; ------------ 表的記錄數

  COUNT(*)
----------
     72340

SQL> set autotrace traceonly        
SQL> set linesize 1000
通過在hint 中加入dynamic_sampling 的方式來使用動態取樣,如果我們把它的級別設為0,那麼此時CBO
將不會做動態採用操作,此時CBO 唯一可以使用的資訊就是表儲存在資料字典中的一些信
息,比如有多少個extents,有多少個blocks 等等。
SQL> select /*+ dynamic_sampling(t 0) */ * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 16990 |   464K|    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 16990 |   464K|    58   (0)| 00:00:01 |
--------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         44  recursive calls
          0  db block gets
       5012  consistent gets
          0  physical reads
          0  redo size
    2306268  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed
T 表裡面實際上是有72340 條資料,而CBO 告訴我們只有16990 條和實際數量相差非常多。這種偏離的猜測,往往會導致CBO 做出錯
誤的判斷,如果一個表沒有分析,又沒有動態取樣,那麼CBO 做出錯誤的執行計劃的可能行非常高
SQL> select * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 84402 |  2307K|    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 84402 |  2307K|    58   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2) -------------採用了level=2的動態採用


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
       5083  consistent gets
          0  physical reads
          0  redo size
    2306268  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed
CBO估計出的記錄為84402 和實際的記錄數有點相近。因為CBO 只是收集了有限的一些資料塊作分析,來對
整個表的資料做出估算,但它比沒有動態取樣已經好了很多,能夠幫助CBO 做出更為正確
的執行計劃。
下面我們將表中的記錄全部刪除,看看沒有動態採用和採用動態採用的差別。
SQL> delete from t;

72340 rows deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 3335594643

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | DELETE STATEMENT   |      |     1 |    58   (0)| 00:00:01 |
|   1 |  DELETE            | T    |       |            |          |
|   2 |   TABLE ACCESS FULL| T    |     1 |    58   (0)| 00:00:01 |
-------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
         87  recursive calls
      75432  db block gets
        324  consistent gets
          0  physical reads
   18876988  redo size
        848  bytes sent via SQL*Net to client
        770  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      72340  rows processed

SQL> commit;

Commit complete.

SQL> select /*+ dynamic_sampling(t 0) */ * from t;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 16990 |   464K|    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 16990 |   464K|    58   (0)| 00:00:01 |
--------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        200  consistent gets
          0  physical reads
          0  redo size
        414  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
CBO 估算出結果集和沒有刪除資料時是一樣的,當一個表的資料被delete 掉以後,這個表所分配的extents
和blocks 是不會被回收的(高水位值不變),所以在CBO 看起來,它依然是有那麼多的數
據在那裡16990 條。採用level=2的動態採用,CBO認為只有一條記錄,和實際非常相近。
SQL> select * from t;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |     1 |    28 |    58   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        200  consistent gets
          0  physical reads
          0  redo size
        414  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed


SQL> drop table t purge;

Table dropped.

動態採用除了可以在段物件沒有分析時,給CBO 提供分析資料之外,還有一個獨特的
能力,它可以對不同列之間的相關性做統計;
相對的,表分析的資訊通常是獨立的,比如:
1 表的行數,平均行長的。
2 表的每個列的最大值,最小值,重複率,也可能包含直方圖。
3 索引的聚合因子,索引葉的塊數目,索引的高度等。
這些資訊相互之間都是獨立的,當查詢涉及到列之間的相關性時,這些資訊就顯得不夠了。


下面建立一張表T,它包含2 個列:flag1 和flag2,這兩列總組存放了兩個值,‘Y’和
‘N’,但是對於每一行記錄,flag1 和flag2 的值永遠保持一個是‘Y’而另一個是‘N’或
者相反,絕不會相同。
兩個值相同的情況沒有,下面我們看一下這個例子:

[oracle@localhost ~]$ sqlplus /nolog

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 8月 31 22:33:50 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

SQL> conn change/change
Connected.
SQL> drop table t purge;

Table dropped.

SQL> create table t as select decode( mod(rownum,2), 0, 'N', 'Y' ) flag1, decode( mod(rownum,2), 0, 'Y', 'N' ) flag2, a.* from dba_objects a;

Table created.

SQL> select count(*) from t; ----------表中有記錄72340條。

  COUNT(*)
----------
     72340

SQL> create index t_idx on t(flag1,flag2);----------建立聯合索引

Index created.

SQL> exec dbms_stats.gather_table_stats( user, 'T',method_opt=>'for all indexed columns size 254' );---------收集統計資訊

PL/SQL procedure successfully completed.
這是表裡的資料總量,總量的1/2,和總量的1/4,我們獲得這個值是要和CBO 估算出的
結果集的數量做對比,以判斷CBO 估算的正確性:

SQL> select num_rows, num_rows/2,num_rows/2/2 from user_tables where table_name = 'T';

  NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2
---------- ---------- ------------
     72340      36170        18085

SQL> set autotrace traceonly
SQL> select * from t where flag1='N';

36170 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 36229 |  3573K|   300   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 36229 |  3573K|   300   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG1"='N')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3455  consistent gets
          0  physical reads
          0  redo size
    1887389  bytes sent via SQL*Net to client
      27045  bytes received via SQL*Net from client
       2413  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      36170  rows processed
 此時CBO獲取t表中,flag1為N的情況為36229條記錄,和總表的1/2非常相近。
SQL> select * from t where flag2='N';

36170 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 36111 |  3561K|   300   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 36111 |  3561K|   300   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG2"='N')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3452  consistent gets
          0  physical reads
          0  redo size
    1885652  bytes sent via SQL*Net to client
      27045  bytes received via SQL*Net from client
       2413  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      36170  rows processed
同樣,對於flag=2 的記錄數是總數的一半,此時CBO 表現完美,得出的執行計劃也是最優
的。
看下面這個sql:
實際上,flag1 和flag2 是不可能一樣的,也就是表中不可能存在一條記
錄,它的值同時為‘N’或者同時為’Y’,這意味著,這條SQL 的返回值為0 條。

SQL> select * from t where flag1 = 'N' and flag2 = 'N';

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 18085 |  1783K|   300   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 18085 |  1783K|   300   (1)| 00:00:04 |-----------CBO估算為18085條記錄,其實演算法就是NUM_ROWS/2/2,這和實際偏差比較大。
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG2"='N' AND "FLAG1"='N')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1077  consistent gets
          0  physical reads
          0  redo size
       1485  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed


SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';  -------下面採用level=3的動態取樣

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

--------------------------------------------------------------------------------
-----

| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time
    |

--------------------------------------------------------------------------------
-----

|   0 | SELECT STATEMENT            |       |     6 |   606 |     2   (0)| 00:00
:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     6 |   606 |     2   (0)| 00:00
:01 |

|*  2 |   INDEX RANGE SCAN          | T_IDX |     6 |       |     1   (0)| 00:00
:01 |

--------------------------------------------------------------------------------
-----


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FLAG1"='N' AND "FLAG2"='N')

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
        140  consistent gets
          0  physical reads
          0  redo size
       1485  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
上面的執行計劃中,CBO估算記錄為6條,和實際非常接近。由於結果集很小,選擇了索引。

Oracle 為動態取樣化分了11 個級別,取樣的資料塊越多,得到的分析資料就越接近於真實,但同時伴隨著資源消耗的也越大:
SQL> exec dbms_stats.delete_table_stats(user,'t');

PL/SQL procedure successfully completed.

SQL> select /*+dynamic_sampling(t 1) */ * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 81606 |    16M|   300   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T    | 81606 |    16M|   300   (1)| 00:00:04 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
         48  recursive calls
          0  db block gets
       5871  consistent gets
          0  physical reads
          0  redo size
    8592752  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed

SQL> select /*+dynamic_sampling(t 5) */ * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  4951 |  1020K|   299   (0)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T    |  4951 |  1020K|   299   (0)| 00:00:04 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
       5867  consistent gets
          0  physical reads
          0  redo size
    8592752  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed

SQL> select /*+dynamic_sampling(t 10) */ * from t; --------可見level 10的資訊是最準確的。

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 72340 |    14M|   300   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T    | 72340 |    14M|   300   (1)| 00:00:04 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
       6905  consistent gets
          0  physical reads
          0  redo size
    8592752  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed

SQL> select /*+dynamic_sampling(t 8) */ * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 72340 |    14M|   300   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T    | 72340 |    14M|   300   (1)| 00:00:04 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
       6905  consistent gets
          0  physical reads
          0  redo size
    8592752  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed

SQL> select /*+dynamic_sampling(t 6) */ * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2473 |   509K|   299   (0)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T    |  2473 |   509K|   299   (0)| 00:00:04 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
       5867  consistent gets
          0  physical reads
          0  redo size
    8592752  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed
  動態取樣也需要額外的消耗資料庫資源,所以,在一個這樣的環境中,SQL 被反覆的
執行,變數被繫結,硬分析很少----是不宜使用動態取樣的,就像OLTP 系統。動態取樣發
生在硬分析時候,如果很少有硬分析發上,動態取樣的意義就不大。
而OLAP 或者資料倉儲環境下,SQL 的執行消耗的資源要遠遠大於SQL 解析,那麼讓
解析再消耗多一點資源做一些動態取樣分析,從而做出一個最優的執行計劃是非常值得的,
實際上在這樣的環境下,硬分析消耗的資源幾乎是可以忽略的。一般來說 在OLAP 或者資料倉儲環境中,將動態取樣的level 設定為3 或者4 是
比較好;相反,在一個OLTP 系統下,不應該使用動態取樣。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/10701850/viewspace-1246585/,如需轉載,請註明出處,否則將追究法律責任。

相關文章