[20161208]11g直方圖與char資料型別.txt
[20161208]11g直方圖與char資料型別.txt
--以前看tom大師的書提到過不要使用char資料型別,哪怕是char(1)也不要使用,最近看了幾篇blob裡面都提到了11g升級後會出現char數
--據型別直方圖統計發生了變化,我重複別人的例子來說明.再次強調不要生產環境使用char型別.
--參考連結:http://blog.dbi-services.com/histograms-on-character-strings-between-11-2-0-3-and-11-2-0-4/
1.環境:
SCOTT@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------- ---------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
SCOTT@book> create table DEMO ( flag char);
Table created.
SCOTT@book> @ &r/desc demo
Name Null? Type
----- -------- --------
FLAG CHAR(1)
--不指定長度,長度為1.
SCOTT@book> insert into DEMO select 'Y' from xmltable('1 to 100000');
100000 rows created.
SCOTT@book> insert into DEMO select 'N' from xmltable('1 to 1000');
1000 rows created.
SCOTT@book> commit ;
Commit complete.
SCOTT@book> select flag,to_char(ascii(flag),'XX'),count(*) from DEMO group by flag;
F TO_ COUNT(*)
- --- ----------
Y 59 100000
N 4E 1000
SCOTT@book> exec dbms_stats.gather_table_stats(user,'DEMO',no_invalidate=>false,Estimate_Percent => NULL,Method_Opt => 'FOR ALL COLUMNS SIZE 5');
PL/SQL procedure successfully completed.
select endpoint_number,endpoint_value
,to_char(endpoint_value,rpad('FM',65,'X'))
,utl_raw.cast_to_varchar2(hextoraw(to_char(endpoint_value,rpad('FM',65,'X')))) c20
from user_histograms h where table_name='DEMO';
ENDPOINT_NUMBER ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,RPAD('FM',65,'X')) C20
--------------- -------------- ---------------------------------------------------------------- --------------------
101000 4.6211E+35 590000000000127D2B51B120000000 Y }+Q?
1000 4.0500E+35 4E0000000000181F436C7BBB200000 N Cl{?
--你可以注意實際參加運算的Y/N後面的字元是0x00.}
--而如果你設定引數optimizer_features_enable.
SCOTT@book> alter session set optimizer_features_enable='11.2.0.3';
Session altered.
SCOTT@book> exec dbms_stats.gather_table_stats(user,'DEMO',no_invalidate=>false,Estimate_Percent => NULL,Method_Opt => 'FOR ALL COLUMNS SIZE 5');
PL/SQL procedure successfully completed.
select endpoint_number,endpoint_value
,to_char(endpoint_value,rpad('FM',65,'X'))
,utl_raw.cast_to_varchar2(hextoraw(to_char(endpoint_value,rpad('FM',65,'X')))) c20
4 from user_histograms h where table_name='DEMO';
ENDPOINT_NUMBER ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,RPAD('FM',65,'X')) C20
--------------- -------------- ---------------------------------------------------------------- --------------------
1000 4.0565E+35 4E20202020203A7BB119D5F6000000 N :{閉?
101000 4.6277E+35 59202020202034D998FF0B5AE00000 Y 4贅? Z?
--注意實際參加運算的Y/N後面的字元是0x20.} 也就是使用空格.
2.測試:
--這個時候退出,在登入執行如下:
SCOTT@book> alter session set statistics_level=all;
Session altered.
SCOTT@book> show parameter optimizer_features_enable
NAME TYPE VALUE
------------------------- ------ ---------
optimizer_features_enable string 11.2.0.4
--恢復原來設定.
SCOTT@book> select count(*) from demo where flag='Y';
COUNT(*)
----------
100000
SCOTT@book> @ &r/dpc '' ''
PLAN_TABLE_OUTPUT
--------------------------------------
SQL_ID 8var563j3tg8g, child number 0
-------------------------------------
select count(*) from demo where flag='Y'
Plan hash value: 2180342005
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 69 (100)| | 1 |00:00:00.03 | 219 |
| 1 | SORT AGGREGATE | | 1 | 1 | 2 | | | 1 |00:00:00.03 | 219 |
|* 2 | TABLE ACCESS FULL| DEMO | 1 | 500 | 1000 | 69 (2)| 00:00:01 | 100K|00:00:00.02 | 219 |
---------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / DEMO@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("FLAG"='Y')
--你可以發現這時E-Rows=500.
SCOTT@book> select * from DBA_TAB_HISTOGRAMS where owner=user and table_name='DEMO';
OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_A
------ ---------- -------------------- --------------- -------------- ----------
SCOTT DEMO FLAG 1000 4.0565E+35
SCOTT DEMO FLAG 101000 4.6277E+35
select * from DBA_TAB_cols where owner=user and table_name='DEMO';
Record View
As of: 2016/12/8 9:24:56
OWNER: SCOTT
TABLE_NAME: DEMO
COLUMN_NAME: FLAG
DATA_TYPE: CHAR
DATA_TYPE_MOD:
DATA_TYPE_OWNER:
DATA_LENGTH: 1
DATA_PRECISION:
DATA_SCALE:
NULLABLE: Y
COLUMN_ID: 1
DEFAULT_LENGTH:
DATA_DEFAULT:
NUM_DISTINCT: 2
LOW_VALUE: 4E
HIGH_VALUE: 59
DENSITY: 0.00000495049504950495
NUM_NULLS: 0
NUM_BUCKETS: 2
LAST_ANALYZED: 2016/12/8 9:19:24
SAMPLE_SIZE: 101000
CHARACTER_SET_NAME: CHAR_CS
CHAR_COL_DECL_LENGTH: 1
GLOBAL_STATS: YES
USER_STATS: NO
AVG_COL_LEN: 2
CHAR_LENGTH: 1
CHAR_USED: B
V80_FMT_IMAGE: NO
DATA_UPGRADED: YES
HIDDEN_COLUMN: NO
VIRTUAL_COLUMN: NO
SEGMENT_COLUMN_ID: 1
INTERNAL_COLUMN_ID: 1
HISTOGRAM: FREQUENCY
QUALIFIED_COL_NAME: FLAG
-- 實際上0x5920 與 0x5900 存在很大差異,運算並沒有按照直方圖來計算,而使用DENSITY來計算.0.00000495049504950495 * 1000* 101000 =499.99999999999995.
-- 作者做10053測試
Column (#1):
NewDensity:0.004950, OldDensity:0.000005 BktCnt:101000.000000, PopBktCnt:101000.000000, PopValCnt:2, NDV:2
--這個新舊Density,為什麼相差1000,我不知道?
--如果你重新分析就ok了.
SCOTT@book> exec dbms_stats.gather_table_stats(user,'DEMO',no_invalidate=>false,Estimate_Percent => NULL,Method_Opt => 'FOR ALL COLUMNS SIZE 5');
PL/SQL procedure successfully completed.
SCOTT@book> select count(*) from demo where flag='Y';
COUNT(*)
----------
100000
SCOTT@book> @ &r/dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 8var563j3tg8g, child number 0
-------------------------------------
select count(*) from demo where flag='Y'
Plan hash value: 2180342005
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 69 (100)| | 1 |00:00:00.03 | 219 |
| 1 | SORT AGGREGATE | | 1 | 1 | 2 | | | 1 |00:00:00.03 | 219 |
|* 2 | TABLE ACCESS FULL| DEMO | 1 | 100K| 195K| 69 (2)| 00:00:01 | 100K|00:00:00.02 | 219 |
---------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / DEMO@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("FLAG"='Y')
--這樣E-Rows正確了.所以作者建議升級後要重新分析.
3.另外我上面的測試使用資料型別char(1),改變長度呢?重複測試:
--drop table demo purge ;
create table DEMO ( flag char(20));
insert into DEMO select 'Y' from xmltable('1 to 100000');
insert into DEMO select 'N' from xmltable('1 to 1000');
commit ;
--實際上這個儲存在資料庫的資訊是
SCOTT@book> select dump(flag,16) from demo where rownum=1;
DUMP(FLAG,16)
------------------------------------------------------------------------------
Typ=96 Len=20: 59,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20
--有19個空格.
SCOTT@book> select flag,to_char(ascii(flag),'XX'),count(*) from DEMO group by flag;
F TO_ COUNT(*)
- --- ----------
Y 59 100000
N 4E 1000
SCOTT@book> exec dbms_stats.gather_table_stats(user,'DEMO',no_invalidate=>false,Estimate_Percent => NULL,Method_Opt => 'FOR ALL COLUMNS SIZE 5');
PL/SQL procedure successfully completed.
select endpoint_number,endpoint_value
,to_char(endpoint_value,rpad('FM',65,'X'))
,utl_raw.cast_to_varchar2(hextoraw(to_char(endpoint_value,rpad('FM',65,'X')))) c40
from user_histograms h where table_name='DEMO';
ENDPOINT_NUMBER ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,RPAD('FM',65,'X')) C40
--------------- -------------- ------------------------------------------- ----------------
1000 4.0500E+35 4E0000000000181F436C7BBB200000 N Cl{?
101000 4.6211E+35 590000000000127D2B51B120000000 Y }+Q?
--你可以發現oracle 11.2.0.4 改變了演算法,取消了後面的空格,而使用0x00 參加運算.
--從我個人的建議不要在生產系統使用任何char型別,哪怕是char(1),而統一使用varchar2型別,即使是varchar2(1).
--jonathanlewis大師也寫了一篇blog,連結
--另外在連結裡面還有一個連結指向直方圖如何運算,連結:
--我摘要主要部分:
In an earlier note on interpreting the content of frequency histograms I made a throwaway comment about the extra
complexity of interpreting frequency histograms on character-based columns. This note starts to examine some of the
complications.
The driving problem behind character columns is that they can get quite large – up to 4,000 bytes – so the content of
an "accurate histogram" could become quite large, and Oracle seems to have taken a strategic decision (at some point in
history) to minimise this storage. As a result we can see an algorithm that works roughly as follows:
Take the first 15 bytes of the string (after padding the string with nulls (for varchar2) or spaces (for char))
Treat the result as a hexadecimal number, and convert to decimal
Round to 15 significant digits and store as the endpoint_value
If duplicate rows appear, store the first 32 characters (increased to 64 for 12c) of each string as the endpoint_actual_value
Given this algorithm, we can do an approximate reversal (which will only be needed when the endpoint_actual_value is not
available) by formatting the endpoint_value into a hex string, extracting the first six pairs of digits, converting to
numeric and applying the chr() function to get a character value. (You'll have to fiddle with this bit of code to handle
multibyte character sets, of course).
With a nice friendly single-byte character code, the first 5 characters will be extracted correctly (assuming there are
at least 5 characters in the string), and the sixth will be pretty close to the original. Here's an example (which also
includes the logic to convert the endpoint_number into a frequency):
--哎,許多還是不懂,放棄..
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2130022/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- [20170615]直方圖-高度直方圖(11g).txt直方圖
- MySQL資料型別操作(char與varchar)MySql資料型別
- [20190630]如何確定直方圖型別.txt直方圖型別
- MySQL字元資料型別char與varchar的區別MySql字元資料型別
- [20170114]12c varchar2型別直方圖.txt型別直方圖
- ABAP中Char型別資料轉換成Decimal型別型別Decimal
- CBO計算與資料型別的選擇(兼談日期、字元、數字三種型別的直方圖生成原理和使用)資料型別字元直方圖
- [20140301]直方圖與優化.txt直方圖優化
- MySQL CHAR和VARCHAR資料型別介紹MySql資料型別
- Oracle的資料型別:char/varchar2Oracle資料型別
- 聊聊Oracle 11g中的char型別使用Oracle型別
- std::cout 輸出 unsigned char型別資料型別
- 11G,可以只刪除直方圖資訊直方圖
- 直方圖直方圖
- [20180122]列統計與直方圖.txt直方圖
- 收集直方圖及檢視直方圖資訊直方圖
- [20120905]刪除直方圖.txt直方圖
- opencv——影像直方圖與反向投影OpenCV直方圖
- 直方圖(histograms)直方圖Histogram
- 7.3 直方圖直方圖
- oracle 直方圖Oracle直方圖
- PLSQL Language Reference-PL/SQL資料型別-SQL資料型別-CHAR和VARCHAR2變數SQL資料型別變數
- 直方圖均衡化原理與實現直方圖
- col_usage$與直方圖的收集直方圖
- cursor_sharing=similar 與 直方圖MILA直方圖
- 直方圖Histograms與CRUSOR_SHARING直方圖Histogram
- Oracle 資料型別CHAR, NCHAR, VARCHAR2, NVARCHAR2Oracle資料型別
- 基本資料型別與字串型別資料型別字串
- const char* 型別不能用於初始化char* 型別實體型別
- 直方圖均衡化直方圖
- 04:垂直直方圖直方圖
- oracle直方圖使用Oracle直方圖
- 直方圖學習直方圖
- Oracle直方圖解析Oracle直方圖圖解
- [zt] Histograms - 直方圖Histogram直方圖
- 收集資料庫統計資訊需要收集直方圖資訊.資料庫直方圖
- Java中的基本資料型別與引用資料型別Java資料型別
- dba_histograms等高直方圖和等頻直方圖的理解Histogram直方圖