table column histogram .親自實驗檢驗得出的學習筆記。

itpremier發表於2010-10-21

-- 看來,學習還是要時常牢記那句話:

紙上得來終覺淺,絕知此事要躬行。

表欄位柱狀圖的理解:
-- dictionary :
dba_tab_col_statistics/user_tab_col_statistics
dba_tab_histogram/user_tab_histogram

-- definition:
the definition name is base the meaning of column endpoint_number.
1, 當欄位列DISTINCT VALUE>255 時,只能建立HBH(HEIGHT-BALANCED)HISTOGRAM. 即:平衡樹。
here endpoint_number is stand for card-weighted(記錄行數加權值),
and endpoint_value is stand for the statistics-top-limit value(統計截止列值)。
calculate the full table scan result cardinality between valus of a specific column col1 is :
return card = ((endpoint-number1- endpoint-number2 ) PER (endpoint_value1 - endpoint_value2) )/ (MAX(endpoint-number)in the column bucket)
return card is (0,1), the closer to 1 and the larger of card .
成本值是個百分比,大於零,小於1,越接近1,越接近返回所有行記錄。

2, 當欄位列DISTINCT VALUE<=255 時,建立VBH(VALUE-BASED) HISTOGRAM 或者 FH(FREQUENCY-HISTOGRAM). 即:頻率分部圖。

--
endpoint-number:
1,HBH : cumulative card weight , used for (endpoint-number1 - endpoint-number2)/(MAX(endpoint-number)in the column bucket) get the return card percent of full table . 簡記為:成本加權
2,FH : the frequency of the actual column value that is the same row of endpoint_value. 簡記為:值為endpoint_value在表中的記錄行數
endpoint-value:
1,HBH : the specific column value , used for (endpoint-value1 - endpoint-value2) 簡記為:表的當前列值,只有區間使用才有意義(colvalue >endpoint_value1 and colvalue 2,FH : the actual column value that is stored in the database table. 簡記為:表的當前列值,單獨引用(=)或區間使用(>

---UNDERSTANDING:
in use HBH:
card weight = (endpoint-number1- endpoint-number2 ) PER (endpoint_value1 - endpoint_value2)
if the value is even distribute, then the difference of neighbor endpoint-number is around 1, and start with 0.
if the value is skew distribute, then the difference of neighbor endpoint-number is very large than others , maybe not start with 0.

in use FH:
endpoint-number IS standing for the cumulative value of rows AT every endpoint_value, so here the endpoint_value is actually equal to the database column value.
and we can conclude :
for a individual value search , such as where col1='val1' . then the return cardinality/rows is:
value (endpoint_number1- endpoint_number2) in user_tab_histogram(where table_name='tab1' and column_name='COL1' and endpoint_value=NUM_VAL1.
for a RANGE value search , such as where col1 > num1 and col1 value ( max(endpoint_number)- min(endpoint_number)) in user_tab_histogram(where table_name='tab1' and column_name='COL1'
and endpoint_value>NUM_VAL1 and endpoint_value

-- sample 例子:
exec dbms_stats.gather_table_stats(user,'TB1',estimate_percent => NULL, method_opt => 'FOR COLUMNS id SIZE AUTO' )
if you use the specified column , then the other unspecified column will not be statisticed.
estimate_percent => NULL means using compute statiscs ,that's 100 percent or full table statistics.
by default , parameter 'SIZE AUTO' will not generate histogram when the column value is distribute even. But you can enforce oracle generate
bucket by using parameter 'SIZE 10' (1..254).

[@more@]

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/350519/viewspace-1040168/,如需轉載,請註明出處,否則將追究法律責任。

相關文章