oracle 11g 柱狀

zhengbao_jun發表於2014-04-11

oracle 柱狀圖(histogram)

oracle中的柱狀圖是用於記錄表中的資料分佈質量情況的描述,當每次使用analyze或者dbms_stat包分析資料表及列後,該表的分佈情況會唄儲存在統計表

(user_tab_columns/user_histograms)裡面,當多表連線時,CBO優化器會根據柱狀圖提供的資訊評估多表連線時將產生的成本(cost)或技術(cardinality),決定是否使用該列的索引,當然,導致CBO不能選擇最優執行計劃的因素有多種情況,而柱狀圖只是協助CBO優化器選擇最優的執行計劃,在一個資料分佈不均勻的表列上建立柱狀圖將有力地保證優化器做出正確合理的選擇。

其他因素後面在進行探討。 (直方圖的使用不受索引的限制,可以在表的任何列上構建直方圖)

1. 蒐集柱狀圖

SQL> conn scott/tiger
Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0
Connected as scott
 
SQL> select table_name from user_tables;
 
TABLE_NAME
------------------------------
DEPT
EMP
BONUS
SALGRADE
 
SQL> exec dbms_stats.gather_table_stats(ownname=>'scott',tabname => 'dept',estimate_percent => null,method_opt => 'for all indexed columns',cascade => true);
 
PL/SQL procedure successfully completed
 
SQL> select column_name,density,num_buckets,histogram from user_tab_col_statistics where table_name='DEPT';
 
COLUMN_NAME                       DENSITY NUM_BUCKETS HISTOGRAM
------------------------------ ---------- ----------- ---------------
DEPTNO                              0.125           4 HEIGHT BALANCED
DNAME                                0.25           1 NONE
LOC                                  0.25           1 NONE

SQL> select * from dept;
 
DEPTNO DNAME          LOC
------ -------------- -------------
    10 ACCOUNTING     NEW YORK
    20 RESEARCH       DALLAS
    30 SALES          CHICAGO
    40 OPERATIONS     BOSTON

SQL> select * from user_tab_histograms where table_name='DEPT';
 
TABLE_NAME                     COLUMN_NAME                 ENDPOINT_NUMBER     ENDPOINT_VALUE   ENDPOINT_ACTUAL_VALUE
------------------------------ --------------------------------------------  ---------------   -----------------------------
DEPT                           DEPTNO                             1             10
DEPT                           DEPTNO                             2             20
DEPT                           DEPTNO                             3             30
DEPT                           DEPTNO                             4             40
DEPT                           DNAME                              0    3.388635500875
DEPT                           LOC                                0   3.443005050520
DEPT                           DNAME                              1   4.322850386777
DEPT                           LOC                                1   4.064055440899
  
8 rows selected

柱狀圖的蒐集有三個引數,for columns SIZE | REPEAT | AUTO | SKEWONLY:統計指定列的histograms.N的取值範圍[1,254]; REPEAT上次統計過的histograms;AUTO由oracle決定

N的大小;SKEWONLY

在上面柱狀圖蒐集中,histogram欄位有三個值,NONE,FREQUENCY或者HEIGHT BALANCED
  a. NONE:就是沒有直方圖
  b. FREQUENCY: 當該列的distinct值數量<=bucket數量時,為此型別。在user_tab_histograms表中記錄有相關的值
  c. HEIGHT BALANCED:當該列的distinct值數量>bucket數量時,為此型別。
  d. density欄位值的含義 Density的含義是“密度”。DENSITY值是會影響CBO判斷執行計劃的


2. 並不是所有柱狀圖資訊都有存在的必要,產生直方圖的成本是很高的,頻繁分析一個表,若該表資料量非常大,做一次分析可能導致嚴重的效能問題,但是那些列的柱狀圖應該存在呢,建

議如下:

  a. 第一次收集統計資訊時,設定method_opt=>FOR ALL COLUMNS SIZE 1,這意味刪除所有列上的直方圖。
  b. 在測試階段或者在真實生產環境中,在調優SQL的過程中,DBA將會逐漸得知每個需要直方圖資訊的欄位,在這些欄位上人工收集統計資訊,method_opt=>FOR COLUMNS SIZE AUTO

[COLUMN_NAME]
  c. 在每次資料分佈有所變化的時候,更新統計資訊,使用method_opt=>FOR ALL COLUMNS SIZE REPEAT,這樣只會收集已經存在了直方圖資訊的欄位。
重複2,3步驟,直到系統穩定。

3. 柱狀圖是如何影響執行計劃的,下面通過示例來檢視

SQL> show user
User is "colin"
 
SQL> drop table tmp_liuhc_1;
 
Table dropped

SQL> create table tmp_liuhc_1 as select * from dba_objects;
 
Table created
 
SQL> desc tmp_liuhc_1;
Name           Type          Nullable Default Comments
-------------- ------------- -------- ------- --------
OWNER          VARCHAR2(30)  Y                        
OBJECT_NAME    VARCHAR2(128) Y                        
SUBOBJECT_NAME VARCHAR2(30)  Y                        
OBJECT_ID      NUMBER        Y                        
DATA_OBJECT_ID NUMBER        Y                        
OBJECT_TYPE    VARCHAR2(19)  Y                        
CREATED        DATE          Y                        
LAST_DDL_TIME  DATE          Y                        
TIMESTAMP      VARCHAR2(19)  Y                        
STATUS         VARCHAR2(7)   Y                        
TEMPORARY      VARCHAR2(1)   Y                        
GENERATED      VARCHAR2(1)   Y                        
SECONDARY      VARCHAR2(1)   Y                        
 
SQL> select owner,count(*) from tmp_liuhc_1 group by owner;
 
OWNER                            COUNT(*)
------------------------------ ----------
MDSYS                                 885
TSMSYS                                  3
DMSYS                                 189
LINK                                    3
PUBLIC                              19987
OUTLN                                   8
CTXSYS                                339
OLAPSYS                               720
HR                                     34
SYSTEM                                454
EXFSYS                                281
SCOTT                                   6
DBSNMP                                 46
ORDSYS                               1669
ORDPLUGINS                             10
SYSMAN                               1321
OE                                    127
PM                                     26
SH                                    306
XDB                                   682
 
OWNER                            COUNT(*)
------------------------------ ----------
IX                                     53
BI                                      8
SYS                                 22912
WMSYS                                 242
SI_INFORMTN_SCHEMA                      8
COLIN                                   6
 
26 rows selected

SQL>  create index idx_tmp_liuhc on tmp_liuhc_1(owner);
 
Index created
 
SQL> select sysdate from dual;
 
SYSDATE
-----------
2011-10-30

刪除柱狀圖資訊,bucket為1時,相當於一個普通分析,沒有柱狀圖資訊,執行計劃按絕大多數ID欄位內容來選擇走索引,刪除之後重新整理shared_pool
SQL> exec dbms_stats.gather_table_stats(ownname => 'colin',tabname => 'tmp_liuhc_1',estimate_percent => null ,method_opt => 'for all columns size 1' ,cascade => true);
 
PL/SQL procedure successfully completed
 
SQL> alter system flush shared_pool;
 
System altered
 
SQL>  select * from user_tab_col_statistics where column_name='OWNER';
 
TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT LOW_VALUE                                                        HIGH_VALUE                 

                                        DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLOBAL_STATS USER_STATS AVG_COL_LEN HISTOGRAM
------------------------------ ------------------------------ ------------ ---------------------------------------------------------------- --------------------------

-------------------------------------- ---------- ---------- ----------- ------------- ----------- ------------ ---------- ----------- ---------------
TMP_LIUHC_1                    OWNER                                    26 4249                                                             584442                     

                                     0.03846153          0           1 2011-10-30 9:       50325 YES          NO                   6 NONE
 
SQL> explain plan for select * from tmp_liuhc_1 where owner='COLIN';
 
Explained
 
SQL> select * From table(Dbms_Xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3774022813
--------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |  1936 |   175K|    57   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1   |  1936 |   175K|    57   (0
|*  2 |   INDEX RANGE SCAN          | IDX_TMP_LIUHC |  1936 |       |     5   (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='COLIN')
 
14 rows selected
 
SQL> explain plan for select * from tmp_liuhc_1 where owner='SYS';
 
Explained
 
SQL> select * From table(Dbms_Xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3774022813
--------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |  1936 |   175K|    57   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1   |  1936 |   175K|    57   (0
|*  2 |   INDEX RANGE SCAN          | IDX_TMP_LIUHC |  1936 |       |     5   (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='SYS')
 
14 rows selected
 
SQL> exec dbms_stats.gather_table_stats(ownname => 'colin',tabname => 'tmp_liuhc_1',estimate_percent => null ,method_opt => 'for all columns size auto' ,cascade =>

true);
 
PL/SQL procedure successfully completed

SQL> select * from user_tab_col_statistics where column_name='OWNER';
 
TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT LOW_VALUE                                                        HIGH_VALUE                 

                                        DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLOBAL_STATS USER_STATS AVG_COL_LEN HISTOGRAM
------------------------------ ------------------------------ ------------ ---------------------------------------------------------------- --------------------------

-------------------------------------- ---------- ---------- ----------- ------------- ----------- ------------ ---------- ----------- ---------------
TMP_LIUHC_1                    OWNER                                    26 4249                                                             584442                     

                                     9.93541977          0          26 2011-10-30 9:       50325 YES          NO                   6 FREQUENCY
 
SQL>
SQL> explain plan for select * from tmp_liuhc_1 where owner='COLIN';
 
Explained
 
SQL> select * From table(Dbms_Xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3774022813
--------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     6 |   558 |     2   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1   |     6 |   558 |     2   (0
|*  2 |   INDEX RANGE SCAN          | IDX_TMP_LIUHC |     6 |       |     1   (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='COLIN')
 
14 rows selected
 
SQL> explain plan for select * from tmp_liuhc_1 where owner='SYS';
 
Explained
 
SQL> select * From table(Dbms_Xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1961695573
--------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             | 22912 |  2080K|   160   (2)| 00:00:02
|*  1 |  TABLE ACCESS FULL| TMP_LIUHC_1 | 22912 |  2080K|   160   (2)| 00:00:02
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OWNER"='SYS')
 
13 rows selected

從以上可以看出,當刪除柱狀圖時,查詢SYS使用者時,CBO按大多數ID欄位的內容,選擇走索引;當蒐集柱狀圖後,CBO選擇了正確的執行計劃,走全表掃描,因為前面已經查詢了,SYS使用者下的表佔用了決大部分。

 

附錄:附帶兩張表的解釋資訊,此處的表DBA_TAB_COLUMNS和表user_tab_col_statistics是同樣效果

DBA_TAB_HISTOGRAMS
該檢視列出了所有表上列的柱狀圖資訊。
 
欄位名稱              描述
---------------------------------------------------------
OWNER                 表的所有者
TABLE_NAME            表名
COLUMN_NAME           列名(欄位名)
ENDPOINT_NUMBER       終點數值,也就是bucket的編號值
ENDPOINT_VALUE        根據該bucket的編號值,規格化的終點欄位值,
                      如果欄位型別是數字,則顯示該欄位值,
                      如果是其他型別的,則是被規格化的數值。
 
DBA_TAB_COLUMNS
該檢視包含了所有表上列的描述資訊。
(注意:雖然檢視和簇的資訊也包含在該檢視中,但沒有相關的柱狀圖資訊)
 
欄位名稱              描述
----------------------------------------------------------
OWNER                 表的所有者
TABLE_NAME            表名
COLUMN_NAME           列名(欄位名)
DATA_TYPE             欄位的資料型別
DATA_LENGTH           該列定義的長度
DATA_PRECISION        NUMBER或FLOAT數值型別的精度
DATA_SCALE            定義的小數位數
NULLABLE              是否允許NULL值
COLUMN_ID             該列在表中的順序的位置,表中第幾個欄位
DEFAULT_LENGTH        欄位預設值的長度
DATA_DEFAULT          欄位預設值
NUM_DISTINCT          該欄位不同值的數量
LOW_VALUE             該欄位的最小值,
                      該值是內部按16進位制儲存的該值的前32 bytes內容
HIGH_VALUE            該欄位的最大值,
                      該值是內部按16進位制儲存的該值的前32 bytes內容
DENSITY               該欄位的密度(不同值的比例值,
                      比如一個欄位只有2個不同值,那麼該欄位密度為0.5)
NUM_NULLS             該欄位NULL值的數量
NUM_BUCKETS           該欄位柱狀圖中bucket的數量
LAST_ANALYZED         最近一次分析表的時間
SAMPLE_SIZE           取樣資料的數量(根據分析時的百分比得出的數量,
                      例如100%分析,該取樣值就是行的數量)

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/235507/viewspace-1140039/,如需轉載,請註明出處,否則將追究法律責任。

相關文章