直方圖概念：
直方圖，當某個列資料分佈不均衡時，為了讓cbo生成的執行計劃更準確，可以收集直方圖。我們可能需要對錶收集直方圖。直方圖最大的桶數(Bucket)是254，預設桶數是75，桶數可以取值範圍是1-254,。收集時直方圖的時候很耗費cpu，oracle在收集直方圖的時候會對錶進行全表掃描，對於所有的列值放入不同的桶中。一般情況下，要理性的收集，如果收集不當可能會造成繫結變數窺探。
直方圖用在什麼情況下？
列的值分佈非常不均衡的時候，並且where條件中經常用到這個列。

繫結變數窺探概念：
使用繫結變數，可以使sql共享，從來不產生那麼多的硬解析，可以避免4031錯誤。但有時候繫結變數也會產生一些不好的事情，由於繫結變數而導致oracle沒有正常的走應該走的執行計劃。繫結變數窺探就是oracle在第一次硬解析一個帶有繫結變數的sql時，會窺探（檢視）變數的賦值，將真實值帶入並生成一個執行計劃，而以後同型別的sql都使用改執行計劃，如果此時的真實值恰好是選擇性小的值，生成了走索引的執行計劃，那萬一下次的真實值的選擇性很大，其實走全表掃描更好，但由於之前的繫結變數窺探導致走了索引，那效能會查很多。
直方圖實驗：
實驗：
SQL> create table test as select * from dba_objects;

Table created.
SQL> BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SYS',
                                    tabname          => 'TEST',
                                    estimate_percent => 100,---取樣比率
                                    method_opt       => 'for all columns size skewonly',
                                    degree           => 1,---指定並行度
                                    cascade          => TRUE);--指定也收集相關表的索引的統計資訊
    END;
   /
PL/SQL procedure successfully completed.

--這裡method_opt（與列的統計相關）中size後面的值（size後面就是和直方圖有關）有幾種：
auto：Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.它基於資料的分佈以及應用程式訪問列的方式來建立直方圖。
repeat：只會為現有的直方圖重新分析索引，不再生成新的直方圖。
1-254桶數：size後面如果跟著10，那就代表用10個桶
skewonly：Oracle determines the columns to collect histograms based on the data distribution of the columns。只會根據column的資料分佈情況決定是否收集histogram。
這裡auto和skewonly類似，但區別是skewonly一定會收集這個列的直方圖，但auto是會根據列的資料和workload。

SQL> select a.column_name,
           b.num_rows,
           a.num_distinct Cardinality,
            round(a.num_distinct / b.num_rows * 100, 2) selectivity,
          a.histogram,
          a.num_buckets
      from dba_tab_col_statistics a, dba_tables b
     where a.owner = b.owner
       and a.table_name = b.table_name
      and a.table_name = 'TEST';
COLUMN_NAME                      NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
OWNER                               13202           8         .06 FREQUENCY                 8
OBJECT_NAME                         13202       10219        77.4 HEIGHT BALANCED         254
SUBOBJECT_NAME                      13202          83         .63 FREQUENCY                83
OBJECT_ID                           13202       13202         100 NONE                      1
DATA_OBJECT_ID                      13202        2689       20.37 HEIGHT BALANCED         254
OBJECT_TYPE                         13202          37         .28 FREQUENCY                37
CREATED                             13202         389        2.95 HEIGHT BALANCED         254
LAST_DDL_TIME                       13202         407        3.08 HEIGHT BALANCED         254
TIMESTAMP                           13202         402        3.04 HEIGHT BALANCED         254
STATUS                              13202           1         .01 FREQUENCY                 1
TEMPORARY                           13202           2         .02 FREQUENCY                 2
GENERATED                           13202           2         .02 FREQUENCY                 2
SECONDARY                           13202           1         .01 FREQUENCY                 1
NAMESPACE                           13202          15         .11 FREQUENCY                15
EDITION_NAME                        13202           0           0 NONE                      0

15 rows selected.
解釋一下:
CARDINALITY是基數，
SELECTIVITY是選擇性，
HISTOGRAM是直方圖，
頻率直方圖(FREQUENCY HISTOGRAM)：num_buckets<254時，oracle會收集頻率直方圖。也可以說頻率直方圖的話，num_buckets就等於distinct的數量，（如果size 後面跟著的值不是桶數的話，比如是auto的話，那麼如果一個列的基數小於254，那麼它的桶數就等於基數。
）
高度平衡直方圖(HEIGHT BALANCED)：num_buckets=254時，oracle會收集高度平衡直方圖。

對於object_name，不應該收集直方圖，可能是收集統計資訊的時候，不小心收集的直方圖（method_opt => 'for all columns size skewonly',），導致對錶收集統計資訊的時候對所有的列都收集了直方圖。也可以說如果當一個列的值得選擇性很高的話，或者說這個列的distinct值很大的話大到已經要接近與主鍵的話，那就沒必要收集直方圖了。

在owner有直方圖的情況下，測試執行計劃：

SQL> create index ind_mao on test(owner);

Index created.
SQL> set autotrace traceonly
SQL> select * from test where owner='SYS';

9280 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id | Operation         | Name | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      | 9280 |   806K|    48   (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 9280 |   806K|    48   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OWNER"='SYS')

Statistics
----------------------------------------------------------
        112 recursive calls
          0 db block gets
       1016 consistent gets
          0 physical reads
          0 redo size
     404254 bytes sent via SQL*Net to client
       7214 bytes received via SQL*Net from client
        620 SQL*Net roundtrips to/from client
         76 sorts (memory)
          0 sorts (disk)
       9280 rows processed
SQL> select * from test where owner='SH';
Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699

---------------------------------------------------------------------------------------
| Id | Operation                   | Name    | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     2 |   178 |     2   (0)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID| TEST    |     2 |   178 |     2   (0)| 00:00:01 |
|* 2 |   INDEX RANGE SCAN          | IND_MAO |     2 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OWNER"='SH')

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          5 consistent gets
          0 physical reads
          0 redo size
       1519 bytes sent via SQL*Net to client
        416 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          2 rows processed

刪除直方圖資訊：
SQL> BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SYS',
                                    tabname          => 'TEST',
                                    estimate_percent => 100,
                                    method_opt       => 'for all columns size 1',
                                    degree           => 1,
                                    cascade          => TRUE);
    END;
   /
SQL> select a.column_name,
2         b.num_rows,
3         a.num_distinct Cardinality,
4          round(a.num_distinct / b.num_rows * 100, 2) selectivity,
5         a.histogram,
6         a.num_buckets
7    from dba_tab_col_statistics a, dba_tables b
8   where a.owner = b.owner
9     and a.table_name = b.table_name
10     and a.table_name = 'TEST';
COLUMN_NAME                      NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
OWNER                               13202           8         .06 NONE                      1
OBJECT_NAME                         13202       10219        77.4 NONE                      1
SUBOBJECT_NAME                      13202          83         .63 NONE                      1
OBJECT_ID                           13202       13202         100 NONE                      1
DATA_OBJECT_ID                      13202        2689       20.37 NONE                      1
OBJECT_TYPE                         13202          37         .28 NONE                      1
CREATED                             13202         389        2.95 NONE                      1
LAST_DDL_TIME                       13202         407        3.08 NONE                      1
TIMESTAMP                           13202         402        3.04 NONE                      1
STATUS                              13202           1         .01 NONE                      1
TEMPORARY                           13202           2         .02 NONE                      1
GENERATED                           13202           2         .02 NONE                      1
SECONDARY                           13202           1         .01 NONE                      1
NAMESPACE                           13202          15         .11 NONE                      1
EDITION_NAME                        13202           0           0 NONE                      0

15 rows selected.

SQL> select * from test where owner='SH';

Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699

---------------------------------------------------------------------------------------
| Id | Operation                   | Name    | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         | 1650 |   143K|    46   (0)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID| TEST    | 1650 |   143K|    46   (0)| 00:00:01 |
|* 2 |   INDEX RANGE SCAN          | IND_MAO | 1650 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OWNER"='SH')

Statistics
----------------------------------------------------------
         37 recursive calls
          0 db block gets
         64 consistent gets
          0 physical reads
          0 redo size
       1519 bytes sent via SQL*Net to client
        416 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          4 sorts (memory)
          0 sorts (disk)
          2 rows processed
這裡雖然只返回2行結果，但oracle認為返回了1650行結果。這裡的1650其實是13202/8。
我們沒有對owner列收集直方圖，那麼oracle就會認為owner這個列是均衡的，就會認為行數（13202）/基數（8個值）就是oracle認為的行數。得到了1650行，oracle覺得1650行對於1w多行中是應該走索引的。所以就走了索引。
select * from test where owner='SYS';
SQL> select * from test where owner='SYS';

9280 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699

---------------------------------------------------------------------------------------
| Id | Operation                   | Name    | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         | 1650 |   143K|    46   (0)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID| TEST    | 1650 |   143K|    46   (0)| 00:00:01 |
|* 2 |   INDEX RANGE SCAN          | IND_MAO | 1650 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OWNER"='SYS')

Statistics
----------------------------------------------------------
          1 recursive calls
          0 db block gets
       1421 consistent gets
          0 physical reads
          0 redo size
     938251 bytes sent via SQL*Net to client
       7214 bytes received via SQL*Net from client
        620 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
       9280 rows processed
如果是sys的話，同樣計算，oracle也是因為行數為1650，也還是會走索引。
select /*+ full(test) */ * from test where owner='SYS';
SQL> select /*+ full(test) */ * from test where owner='SYS';

9280 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id | Operation         | Name | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      | 1650 |   143K|    48   (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 1650 |   143K|    48   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OWNER"='SYS')

Statistics
----------------------------------------------------------
          1 recursive calls
          0 db block gets
        789 consistent gets
          0 physical reads
          0 redo size
     404254 bytes sent via SQL*Net to client
       7214 bytes received via SQL*Net from client
        620 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
       9280 rows processed
邏輯讀比走索引要低。

為啥全表掃描要比索引的邏輯讀要高呢？

SQL> select blocks from dba_segments where segment_name='TEST'
2 ;

BLOCKS
----------
256

SQL> show parameter db_f

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_file_multiblock_read_count integer 120

多塊讀的個數是120。那消耗io的次數就是789/16次。所以49次io就可以全表掃描完畢了。

索引掃描的io次數：
索引掃描是單詞io單塊讀（除了INDEX FAST FULL SCAN）。我們這裡的索引是INDEX RANGE SCAN ，是單塊讀，假如索引的高度是2，那如果掃描索引塊的話，至少要掃描2個塊，那就需要2次io，剛才索引返回9280行資料，如果一個索引塊能存100條記錄，那需要90個索引塊才可以得到這些資料。所以需要進行90次的io，然後得到rowid後需要回表再得到資料。得到9k多個rowid後，回到表裡查這些查rowid所對應的表中資料，至少還需要至少1百個左右的io。

如果有一個1000w行的表，那麼如果查詢返回結果是500w，那索引的高度為3的話，那一個索引塊存100行記錄，那就需要掃描5w次，再回表的話，也需要幾萬個io。那差不多就是10w次io。但如是全表掃描的話，那就只有1w個io。索引是單塊讀，所以涉及的等待事件也是db file sequential read 。如果看到這個等待事件，可以判斷出可能是需要全表的但走了索引。

一次io單塊讀和一次io多快讀的時間差別並不太大，對於現在的儲存，多快讀其實大多是在多個儲存裡讀的，因為儲存有條帶化，多塊讀的話是在多個磁碟讀，單塊讀的話是在一個儲存裡讀，所以差別並不大。所以速度基本一樣的。其實也都是毫秒裡計算的，只有io掃描了上千萬次，那可能給我們的感覺才是慢，如果只是io幾萬次，那我們可能是感覺不出來慢的。

如果我們沒計算直方圖，那oracle就簡單的相除，如果我們計算了直方圖，那oracle就知道了，不能單純的相除了，要根據桶裡的值來計算。

直方圖

相關文章