直方圖

531968912發表於2017-12-20

直方圖概念:
 直方圖,當某個列資料分佈不均衡時,為了讓cbo生成的執行計劃更準確,可以收集直方圖。我們可能需要對錶收集直方圖。直方圖最大的桶數(Bucket)是254,預設桶數是75,桶數可以取值範圍是1-254,。收集時直方圖的時候很耗費cpu,oracle在收集直方圖的時候會對錶進行全表掃描,對於所有的列值放入不同的桶中。一般情況下,要理性的收集,如果收集不當可能會造成繫結變數窺探。
直方圖用在什麼情況下?
  列的值分佈非常不均衡的時候,並且where條件中經常用到這個列。

繫結變數窺探概念:
 使用繫結變數,可以使sql共享,從來不產生那麼多的硬解析,可以避免4031錯誤。但有時候繫結變數也會產生一些不好的事情,由於繫結變數而導致oracle沒有正常的走應該走的執行計劃。繫結變數窺探就是oracle在第一次硬解析一個帶有繫結變數的sql時,會窺探(檢視)變數的賦值,將真實值帶入並生成一個執行計劃,而以後同型別的sql都使用改執行計劃,如果此時的真實值恰好是選擇性小的值,生成了走索引的執行計劃,那萬一下次的真實值的選擇性很大,其實走全表掃描更好,但由於之前的繫結變數窺探導致走了索引,那效能會查很多。
直方圖實驗:
實驗:
SQL> create table test as select * from dba_objects;

Table created.
SQL> BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SYS',
                                    tabname          => 'TEST',
                                    estimate_percent => 100,---取樣比率
                                    method_opt       => 'for all columns size skewonly',
                                    degree           => 1,---指定並行度
                                    cascade          => TRUE);--指定也收集相關表的索引的統計資訊
    END;
   /
PL/SQL procedure successfully completed.

 --這裡method_opt(與列的統計相關)中size後面的值(size後面就是和直方圖有關)有幾種:
auto:Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.它基於資料的分佈以及應用程式訪問列的方式來建立直方圖。
repeat:只會為現有的直方圖重新分析索引,不再生成新的直方圖。
1-254桶數:size後面如果跟著10,那就代表用10個桶
skewonly:Oracle determines the columns to collect histograms based on the data distribution of the columns。只會根據column的資料分佈情況決定是否收集histogram。
這裡auto和skewonly類似,但區別是skewonly一定會收集這個列的直方圖,但auto是會根據列的資料和workload。

SQL> select a.column_name,
           b.num_rows,
           a.num_distinct Cardinality,
            round(a.num_distinct / b.num_rows * 100, 2) selectivity,
          a.histogram,
          a.num_buckets
      from dba_tab_col_statistics a, dba_tables b
     where a.owner = b.owner
       and a.table_name = b.table_name
      and a.table_name = 'TEST';
COLUMN_NAME                      NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
OWNER                               13202           8         .06 FREQUENCY                 8
OBJECT_NAME                         13202       10219        77.4 HEIGHT BALANCED         254
SUBOBJECT_NAME                      13202          83         .63 FREQUENCY                83
OBJECT_ID                           13202       13202         100 NONE                      1
DATA_OBJECT_ID                      13202        2689       20.37 HEIGHT BALANCED         254
OBJECT_TYPE                         13202          37         .28 FREQUENCY                37
CREATED                             13202         389        2.95 HEIGHT BALANCED         254
LAST_DDL_TIME                       13202         407        3.08 HEIGHT BALANCED         254
TIMESTAMP                           13202         402        3.04 HEIGHT BALANCED         254
STATUS                              13202           1         .01 FREQUENCY                 1
TEMPORARY                           13202           2         .02 FREQUENCY                 2
GENERATED                           13202           2         .02 FREQUENCY                 2
SECONDARY                           13202           1         .01 FREQUENCY                 1
NAMESPACE                           13202          15         .11 FREQUENCY                15
EDITION_NAME                        13202           0           0 NONE                      0

15 rows selected.
解釋一下:
CARDINALITY是基數,
SELECTIVITY是選擇性,
HISTOGRAM是直方圖,
頻率直方圖(FREQUENCY HISTOGRAM):num_buckets<254時,oracle會收集頻率直方圖。也可以說頻率直方圖的話,num_buckets就等於distinct的數量,(如果size 後面跟著的值不是桶數的話,比如是auto的話,那麼如果一個列的基數小於254,那麼它的桶數就等於基數。

高度平衡直方圖(HEIGHT BALANCED):num_buckets=254時,oracle會收集高度平衡直方圖。

對於object_name,不應該收集直方圖,可能是收集統計資訊的時候,不小心收集的直方圖(method_opt       => 'for all columns size skewonly',),導致對錶收集統計資訊的時候對所有的列都收集了直方圖。也可以說如果當一個列的值得選擇性很高的話,或者說這個列的distinct值很大的話大到已經要接近與主鍵的話,那就沒必要收集直方圖了。


在owner有直方圖的情況下,測試執行計劃:

SQL> create index ind_mao on test(owner);

Index created.
SQL> set autotrace traceonly
SQL> select * from test where owner='SYS';

9280 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9280 |   806K|    48   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST |  9280 |   806K|    48   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"='SYS')


Statistics
----------------------------------------------------------
        112  recursive calls
          0  db block gets
       1016  consistent gets
          0  physical reads
          0  redo size
     404254  bytes sent via SQL*Net to client
       7214  bytes received via SQL*Net from client
        620  SQL*Net roundtrips to/from client
         76  sorts (memory)
          0  sorts (disk)
       9280  rows processed
SQL> select * from test where owner='SH';
Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     2 |   178 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST    |     2 |   178 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_MAO |     2 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OWNER"='SH')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       1519  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed


刪除直方圖資訊:
SQL> BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SYS',
                                    tabname          => 'TEST',
                                    estimate_percent => 100,
                                    method_opt       => 'for all columns size 1',
                                    degree           => 1,
                                    cascade          => TRUE);
    END;
   /
SQL> select a.column_name,
  2         b.num_rows,
  3         a.num_distinct Cardinality,
  4          round(a.num_distinct / b.num_rows * 100, 2) selectivity,
  5         a.histogram,
  6         a.num_buckets
  7    from dba_tab_col_statistics a, dba_tables b
  8   where a.owner = b.owner
  9     and a.table_name = b.table_name
 10     and a.table_name = 'TEST';
COLUMN_NAME                      NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
OWNER                               13202           8         .06 NONE                      1
OBJECT_NAME                         13202       10219        77.4 NONE                      1
SUBOBJECT_NAME                      13202          83         .63 NONE                      1
OBJECT_ID                           13202       13202         100 NONE                      1
DATA_OBJECT_ID                      13202        2689       20.37 NONE                      1
OBJECT_TYPE                         13202          37         .28 NONE                      1
CREATED                             13202         389        2.95 NONE                      1
LAST_DDL_TIME                       13202         407        3.08 NONE                      1
TIMESTAMP                           13202         402        3.04 NONE                      1
STATUS                              13202           1         .01 NONE                      1
TEMPORARY                           13202           2         .02 NONE                      1
GENERATED                           13202           2         .02 NONE                      1
SECONDARY                           13202           1         .01 NONE                      1
NAMESPACE                           13202          15         .11 NONE                      1
EDITION_NAME                        13202           0           0 NONE                      0

15 rows selected.

SQL>  select * from test where owner='SH';


Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |  1650 |   143K|    46   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST    |  1650 |   143K|    46   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_MAO |  1650 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OWNER"='SH')


Statistics
----------------------------------------------------------
         37  recursive calls
          0  db block gets
         64  consistent gets
          0  physical reads
          0  redo size
       1519  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
          2  rows processed
這裡雖然只返回2行結果,但oracle認為返回了1650行結果。這裡的1650其實是13202/8。
我們沒有對owner列收集直方圖,那麼oracle就會認為owner這個列是均衡的,就會認為行數(13202)/基數(8個值) 就是oracle認為的行數。得到了1650行,oracle覺得1650行對於1w多行中是應該走索引的。所以就走了索引。
select * from test where owner='SYS';
SQL> select * from test where owner='SYS';

9280 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |  1650 |   143K|    46   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST    |  1650 |   143K|    46   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_MAO |  1650 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OWNER"='SYS')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1421  consistent gets
          0  physical reads
          0  redo size
     938251  bytes sent via SQL*Net to client
       7214  bytes received via SQL*Net from client
        620  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9280  rows processed
如果是sys的話,同樣計算,oracle也是因為行數為1650,也還是會走索引。
select /*+ full(test) */ * from test where owner='SYS';
SQL> select /*+ full(test) */ * from test where owner='SYS';

9280 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1650 |   143K|    48   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST |  1650 |   143K|    48   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"='SYS')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        789  consistent gets
          0  physical reads
          0  redo size
     404254  bytes sent via SQL*Net to client
       7214  bytes received via SQL*Net from client
        620  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9280  rows processed
邏輯讀比走索引要低。

為啥全表掃描要比索引的邏輯讀要高呢?

SQL> select blocks from dba_segments where segment_name='TEST'
  2  ;

    BLOCKS
----------
       256

SQL> show parameter db_f

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_file_multiblock_read_count        integer     120

多塊讀的個數是120。那消耗io的次數就是789/16次。所以49次io就可以全表掃描完畢了。

索引掃描的io次數:
索引掃描是單詞io單塊讀(除了INDEX FAST FULL SCAN)。我們這裡的索引是INDEX RANGE SCAN ,是單塊讀,假如索引的高度是2,那如果掃描索引塊的話,至少要掃描2個塊,那就需要2次io,剛才索引返回9280行資料,如果一個索引塊能存100條記錄,那需要90個索引塊才可以得到這些資料。所以需要進行90次的io,然後得到rowid後需要回表再得到資料。得到9k多個rowid後,回到表裡查這些查rowid所對應的表中資料,至少還需要至少1百個左右的io。

如果有一個1000w行的表,那麼如果查詢返回結果是500w,那索引的高度為3的話,那一個索引塊存100行記錄,那就需要掃描5w次,再回表的話,也需要幾萬個io。那差不多就是10w次io。但如是全表掃描的話,那就只有1w個io。索引是單塊讀,所以涉及的等待事件也是db file sequential read 。如果看到這個等待事件,可以判斷出可能是需要全表的但走了索引。

一次io單塊讀和一次io多快讀的時間差別並不太大,對於現在的儲存,多快讀其實大多是在多個儲存裡讀的,因為儲存有條帶化,多塊讀的話是在多個磁碟讀,單塊讀的話是在一個儲存裡讀,所以差別並不大。所以速度基本一樣的。其實也都是毫秒裡計算的,只有io掃描了上千萬次,那可能給我們的感覺才是慢,如果只是io幾萬次,那我們可能是感覺不出來慢的。

如果我們沒計算直方圖,那oracle就簡單的相除,如果我們計算了直方圖,那oracle就知道了,不能單純的相除了,要根據桶裡的值來計算。



來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/25462274/viewspace-2148952/,如需轉載,請註明出處,否則將追究法律責任。

相關文章