柱狀圖（Histograms）：

Height-Balanced Histograms：列中不同值的個數小於直方圖的桶數，每個桶的存放值的數量不同。
Frequency Histograms：列中不同值的個數大於直方圖的桶數，每個桶存放值的數量相同，最後一個桶除外。

The database automatically creates frequency histograms instead of height-balanced histograms under the following conditions:

The number of distinct values is less than or equal to the number of histogram buckets specified (up to 254).
It is not true that each column value repeats only once.

收集方法：GATHER_TABLE_STATS==》method_opt 透過size引數指定直方圖的桶數：
SIZE {integer | REPEAT | AUTO | SKEWONLY}
auto:基於列的使用資訊(sys.col_usage$)和是否存在資料傾斜來收集直方圖
integer:人為的指定建立直方圖的桶數範圍是1到254,如果size 1意味著不建立直方圖
repeat:只會對已經存在直方圖的列重新生成直方圖.如果是一個分割槽表,repeat會確保對在全域性級別存在直方圖的列重新生成直方圖.這是不被推薦的設定的.當前直方圖的桶數將會作為重新生成直方圖所使用的桶數的最大值.比如,當前直方圖的桶數是5,那麼生成的直方圖最大桶數就是5,說的直白點就是重新整理現有直方圖的列上的統計資訊.
skewonly:對任何資料分佈出現傾斜列的自動建立直方圖

建立測試表：
Begin
Create table histest as select rownum id,10000 sal from dual connect by level <10000;
update histest set sal=id where rownum< =10;
update histest set sal=9 where id between '100' and '200';
update histest set sal=8 where id between '1000' and '5000';
commit;
select sal,count(*) from histest group by sal;

不收集柱狀圖：
exec dbms_stats.gather_table_stats('SCOTT','HISTEST',method_opt=>'for all columns size 1');
select column_name,num_distinct,density,HISTOGRAM from user_tab_col_statistics where table_name='HISTEST';
//density=1/num_distinct

COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM
-------------------------------------------------- ------------ ---------- ---------------
ID 9999 .00010001 NONE
SAL 11 .090909091 NONE

收集柱狀圖：
exec dbms_stats.gather_table_stats('SCOTT','HISTEST',method_opt=>'for all columns size 11');
select column_name,num_distinct,density,HISTOGRAM from user_tab_col_statistics where table_name='HISTEST' and COLUMN_NAME='SAL';

COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM
-------------------------------------------------- ------------ ---------- ---------------
SAL 11 .000050005 FREQUENCY

--檢視直方圖資訊：
select column_name,endpoint_number,endpoint_value from user_tab_histograms where
table_name='HISTEST' and COLUMN_NAME='SAL';
ENDPOINT_NUMBER:累計的行數
ENDPOINT_VALUE：真實列值

COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
-------------------------------------------------- --------------- --------------
SAL 1 1
SAL 2 2
SAL 3 3
SAL 4 4
SAL 5 5
SAL 6 6
SAL 7 7
SAL 4009 8
SAL 4111 9
SAL 4112 10 <=== 計算列值為10的個數4112-4111
SAL 9999 10000 <=== 計算列值為10000的個數9999-4112

高度平衡直方圖：
exec dbms_stats.gather_table_stats('SCOTT','HISTEST',method_opt=>'for all columns size 5');

COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM
-------------------------------------------------- ------------ ---------- ---------------
SAL 11 .009466401 HEIGHT BALANCED

COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
-------------------------------------------------- --------------- --------------
SAL 0 1
SAL 2 8
SAL 5 10000
----這裡buckets 1,2都是用8作為它的endpoint所以bucket 1為了節省空間沒有被儲存.下面的查詢能用來顯示桶數和它的endpoinit值
----這裡buckets 3到5都是用10000作為它的endpoint所以bucket 3到4為了節省空間沒有被儲存.下面的查詢能用來顯示桶數和它的endpoinit值
BUCKET_NUMBER：桶的編號。
ENDPOINT_VALUE：桶中的值的最大值。

為了節省空間進行了合併：

BUCKET_NUMBER	ENDPOINT_VALUE	BUCKET_NUMBER	ENDPOINT_VALUE
0	1	2	8
1	8	1	8
2	8	5	10000
3	10000
4	10000
5	10000

BUCKET_NUMBER ENDPOINT_VALUE
------------- --------------
1 8
2 8
3 10000
4 10000
5 10000
---
SELECT bucket_number, max(sal) AS endpoint_value
FROM (
SELECT sal, ntile(5) OVER (ORDER BY sal) AS bucket_number
FROM histest)
GROUP BY bucket_number
ORDER BY bucket_number;

oracle 柱狀圖（Histograms）

相關文章