Oracle當number型別超過一定長度直方圖限制導致SQL執行計劃錯誤

你好我是李白發表於2021-01-05

Oracle當number型別超過一定長度直方圖限制

背景

生產系統,監控巡檢發現某個SQL邏輯讀非常高,透過檢視執行計劃,存在三個執行計劃,最高cost 9w多,較低的兩個位100左右。

經過分析,把具體分析以及發現雖然查詢where條件列擁有直方圖,且存在資料傾斜,但是CBO依然無法判斷資料分佈情況透過

模擬測試如下。

1. 11.2.0.4構造測試環境

1.1 建立測試表,包含兩列,均為number

create table test_hist (id1 number(30) primary key,id2 number(20) not null);

1.2 插入10000條資料,均為15位加1~10000數字構成,也就是兩列的distinct均與行數相同

begin
for i in 1..10000
loop
insert into test_hist values(111021111112345||i,111021111112345||i);
end loop;
commit;
end;
/

1.3 再次插入1w條資料

第一列在上面1w的基礎上遞增,第二列與上面中第一行資料相同,也就是第二列的分佈已經傾斜,1110211111123451值在2w總資料中有10001條。

begin
for i in 1..10000
loop
insert into test_hist values(1110211111123451000||i,1110211111123451);
end loop;
commit;
end;
/
SCOTT@honor1 > select count(*) from test_hist;
                                COUNT(*)
----------------------------------------
                                   20000
SCOTT@honor1 > select count(*) from test_hist where id2=1110211111123451;
                                COUNT(*)
----------------------------------------
                                   10001

1.4 在第二列建立普通索引

create index idx_test_hist_id2 on test_hist(id2);

1.5 收集統計資訊,不收集直方圖

exec dbms_stats.gather_table_stats(user,'TEST_HIST',method_opt=>'for all columns size 1',estimate_percent=>100,cascade=>true,no_invalidate=>false);

1.6 檢視資料分佈

set lines 200 pages 200
col table_name for a15
col num_distinct for 9999999
col density for 999.9999999
col num_nulls for 9999
col num_buckets for 9999
col low_value for a25
col high_value for a25
select table_name,column_name,NUM_DISTINCT,LOW_VALUE,HIGH_VALUE,DENSITY,NUM_NULLS,NUM_BUCKETS,HISTOGRAM 
from user_tab_col_statistics where table_name='TEST_HIST';
TABLE_NAME COLUMN_NAME          NUM_DISTINCT LOW_VALUE                      HIGH_VALUE                     DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM
---------- -------------------- ------------ ------------------------------ ------------------------- ------------ --------- ----------- ---------------
TEST_HIST  ID1                         20000 C80C0B160C0C0D2334             CC0C0B160C0C0D23340102        .0000500         0           1 NONE
TEST_HIST  ID2                         10000 C80C0B160C0C0D2334             CA0C0B160C0C0D2334            .0001000         0           1 NONE
col ENDPOINT_ACTUAL_VALUE for a25
col ENDPOINT_ACTUAL_VALUE_RAW for a25
select * from user_tab_histograms where table_name='TEST_HIST';
TABLE_NAME  COLUMN_NAME  ENDPOINT_NUMBER            ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
----------- ---------------------------- ------------------------- --------------------
TEST_HIST   ID1                        0          1110211111123450
TEST_HIST   ID2                        0          1110211111123450
TEST_HIST   ID1                        1  111021111112345000000000
TEST_HIST   ID2                        1      11102111111234500000

2. 測試SQL語

2.1 檢視執行計劃

SCOTT@honor1 > set autotrace traceonly
SCOTT@honor1 > select /*+ gather_plan_statistics */ id2 from test_hist where id2=1110211111123451;
10001 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1411376830
--------------------------------------------------------------------------------------
| Id  | Operation        | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                   |     2 |    22 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IDX_TEST_HIST_ID2 |     2 |    22 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

2.2 檢視真實執行計劃     

SYS@honor1 > select sql_id,sql_text,child_number,plan_hash_value from v$sql where sql_text like 'select /*+ gather_plan_statistics */ id2 from test_hist%';
SYS@honor1 > select * from table(dbms_xplan.display_cursor('02v238rmgtfg8',0,'allstats last'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
SQL_ID  02v238rmgtfg8, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ id2 from test_hist where
id2=1110211111123451
Plan hash value: 1411376830
---------------------------------------------------------------------------------------------------------
| Id  | Operation        | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                   |      1 |        |  10001 |00:00:00.01 |     696 |     31 |
|*  1 |  INDEX RANGE SCAN| IDX_TEST_HIST_ID2 |      1 |      2 |  10001 |00:00:00.01 |     696 |     31 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("ID2"=1110211111123451)
19 rows selected.

# 可以看到由於CBO最佳化器不知道ID2存在傾斜,導致estimate-rows與actual-rows存在非常大偏差,導致選擇了錯誤執行計劃,走了索引。

2.3 收集直方圖

# 11g最大直方圖buckets為254,所以按照最大收集直方圖。

exec dbms_stats.gather_table_stats(user,'TEST_HIST',method_opt=>'for all columns size 254',
estimate_percent=>100,cascade=>true,no_invalidate=>false);
set lines 200 pages 200
col table_name for a15
col num_distinct for 9999999
col density for 999.9999999
col num_nulls for 9999
col num_buckets for 9999
col low_value for a25
col high_value for a25
select table_name,column_name,NUM_DISTINCT,LOW_VALUE,HIGH_VALUE,DENSITY,NUM_NULLS,NUM_BUCKETS,HISTOGRAM from user_tab_col_statistics where table_name='TEST_HIST'
TABLE_NAME      COLUMN_NAME          NUM_DISTINCT LOW_VALUE                 HIGH_VALUE                     DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM
--------------- -------------------- ------------ ------------------------- ------------------------- ------------ --------- ----------- ---------------
TEST_HIST       ID1                         20000 C80C0B160C0C0D2334        CC0C0B160C0C0D23340102        .0000500         0         254 HEIGHT BALANCED
TEST_HIST       ID2                         10000 C80C0B160C0C0D2334        CA0C0B160C0C0D2334            .0000500         0         254 HEIGHT BALANCED
SCOTT@honor1 > select table_name,column_name,endpoint_number,to_char(ENDPOINT_VALUE),ENDPOINT_ACTUAL_VALUE from user_tab_histograms where table_name='TEST_HIST' order by column_name,endpoint_number;
TABLE_NAME      COLUMN_NAME          ENDPOINT_NUMBER TO_CHAR(ENDPOINT_VALUE)                  ENDPOINT_ACTUAL_VALU
--------------- -------------------- --------------- ---------------------------------------- --------------------
TEST_HIST       ID2                              126 1110211111123450
TEST_HIST       ID2                              127 11102111111234500
...

2.4 透過上述列直方圖統計資訊以及測試可以得出如下情況:

# 取15位後,第十五位四捨五入,其餘位補0後,那麼實際查詢時,傳入值如果為具體值,不會參考具體值,驗證如下:

# 驗證之前ID2列直方圖資訊如下:

TABLE_NAME                     COLUMN_NAME  ENDPOINT_NUMBER TO_CHAR(ENDPOINT_VALUE) 
------------------------------ ---------------------------- ------------------------
TEST_HIST                      ID2                        0 1110211111123450
TEST_HIST                      ID2                      112 1110211111123450
TEST_HIST                      ID2                      124 1110211111123460
TEST_HIST                      ID2                      125 11102111111234500
TEST_HIST                      ID2                      126 11102111111234500
TEST_HIST                      ID2                      127 11102111111234500
TEST_HIST                      ID2                      128 11102111111234600
TEST_HIST                      ID2                      129 11102111111234600
TEST_HIST                      ID2                      130 11102111111234600
TEST_HIST                      ID2                      131 11102111111234600
TEST_HIST                      ID2                      132 111021111112345000
TEST_HIST                      ID2                      133 111021111112345000
TEST_HIST                      ID2                      134 111021111112345000
TEST_HIST                      ID2                      135 111021111112345000
TEST_HIST                      ID2                      136 111021111112345000
TEST_HIST                      ID2                      137 111021111112345000
TEST_HIST                      ID2                      138 111021111112345000
TEST_HIST                      ID2                      139 111021111112345000
TEST_HIST                      ID2                      140 111021111112345000
TEST_HIST                      ID2                      141 111021111112345000
TEST_HIST                      ID2                      142 111021111112345000
TEST_HIST                      ID2                      143 111021111112346000
TEST_HIST                      ID2                      144 111021111112346000
TEST_HIST                      ID2                      145 111021111112346000
TEST_HIST                      ID2                      146 111021111112346000
TEST_HIST                      ID2                      147 111021111112346000
TEST_HIST                      ID2                      148 111021111112346000
TEST_HIST                      ID2                      149 111021111112346000
TEST_HIST                      ID2                      150 111021111112346000
TEST_HIST                      ID2                      151 111021111112346000
TEST_HIST                      ID2                      152 1110211111123450000
TEST_HIST                      ID2                      153 1110211111123450000
TEST_HIST                      ID2                      154 1110211111123450000
TEST_HIST                      ID2                      155 1110211111123450000
TEST_HIST                      ID2                      156 1110211111123450000
TEST_HIST                      ID2                      157 1110211111123450000
TEST_HIST                      ID2                      158 1110211111123450000
TEST_HIST                      ID2                      159 1110211111123450000
TEST_HIST                      ID2                      160 1110211111123450000
TEST_HIST                      ID2                      161 1110211111123450000
TEST_HIST                      ID2                      162 1110211111123450000
TEST_HIST                      ID2                      163 1110211111123450000
TEST_HIST                      ID2                      164 1110211111123450000
TEST_HIST                      ID2                      165 1110211111123450000
TEST_HIST                      ID2                      166 1110211111123450000
TEST_HIST                      ID2                      167 1110211111123450000
TEST_HIST                      ID2                      168 1110211111123450000
TEST_HIST                      ID2                      169 1110211111123450000
TEST_HIST                      ID2                      170 1110211111123450000
TEST_HIST                      ID2                      171 1110211111123450000
TEST_HIST                      ID2                      172 1110211111123450000
TEST_HIST                      ID2                      173 1110211111123450000
TEST_HIST                      ID2                      174 1110211111123450000
TEST_HIST                      ID2                      175 1110211111123450000
TEST_HIST                      ID2                      176 1110211111123450000
TEST_HIST                      ID2                      177 1110211111123450000
TEST_HIST                      ID2                      178 1110211111123450000
TEST_HIST                      ID2                      179 1110211111123450000
TEST_HIST                      ID2                      180 1110211111123450000
TEST_HIST                      ID2                      181 1110211111123450000
TEST_HIST                      ID2                      182 1110211111123450000
TEST_HIST                      ID2                      183 1110211111123450000
TEST_HIST                      ID2                      184 1110211111123450000
TEST_HIST                      ID2                      185 1110211111123450000
TEST_HIST                      ID2                      186 1110211111123450000
TEST_HIST                      ID2                      187 1110211111123450000
TEST_HIST                      ID2                      188 1110211111123450000
TEST_HIST                      ID2                      189 1110211111123450000
TEST_HIST                      ID2                      190 1110211111123450000
TEST_HIST                      ID2                      191 1110211111123450000
TEST_HIST                      ID2                      192 1110211111123450000
TEST_HIST                      ID2                      193 1110211111123450000
TEST_HIST                      ID2                      194 1110211111123450000
TEST_HIST                      ID2                      195 1110211111123450000
TEST_HIST                      ID2                      196 1110211111123450000
TEST_HIST                      ID2                      197 1110211111123450000
TEST_HIST                      ID2                      198 1110211111123460000
TEST_HIST                      ID2                      199 1110211111123460000
TEST_HIST                      ID2                      200 1110211111123460000
TEST_HIST                      ID2                      201 1110211111123460000
TEST_HIST                      ID2                      202 1110211111123460000
TEST_HIST                      ID2                      203 1110211111123460000
TEST_HIST                      ID2                      204 1110211111123460000
TEST_HIST                      ID2                      205 1110211111123460000
TEST_HIST                      ID2                      206 1110211111123460000
TEST_HIST                      ID2                      207 1110211111123460000
TEST_HIST                      ID2                      208 1110211111123460000
TEST_HIST                      ID2                      209 1110211111123460000
TEST_HIST                      ID2                      210 1110211111123460000
TEST_HIST                      ID2                      211 1110211111123460000
TEST_HIST                      ID2                      212 1110211111123460000
TEST_HIST                      ID2                      213 1110211111123460000
TEST_HIST                      ID2                      214 1110211111123460000
TEST_HIST                      ID2                      215 1110211111123460000
TEST_HIST                      ID2                      216 1110211111123460000
TEST_HIST                      ID2                      217 1110211111123460000
TEST_HIST                      ID2                      218 1110211111123460000
TEST_HIST                      ID2                      219 1110211111123460000
TEST_HIST                      ID2                      220 1110211111123460000
TEST_HIST                      ID2                      221 1110211111123460000
TEST_HIST                      ID2                      222 1110211111123460000
TEST_HIST                      ID2                      223 1110211111123460000
TEST_HIST                      ID2                      224 1110211111123460000
TEST_HIST                      ID2                      225 1110211111123460000
TEST_HIST                      ID2                      226 1110211111123460000
TEST_HIST                      ID2                      227 1110211111123460000
TEST_HIST                      ID2                      228 1110211111123460000
TEST_HIST                      ID2                      229 1110211111123460000
TEST_HIST                      ID2                      230 1110211111123460000
TEST_HIST                      ID2                      231 1110211111123460000
TEST_HIST                      ID2                      232 1110211111123460000
TEST_HIST                      ID2                      233 1110211111123460000
TEST_HIST                      ID2                      234 1110211111123460000
TEST_HIST                      ID2                      235 1110211111123460000
TEST_HIST                      ID2                      236 1110211111123460000
TEST_HIST                      ID2                      237 1110211111123460000
TEST_HIST                      ID2                      238 1110211111123460000
TEST_HIST                      ID2                      239 1110211111123460000
TEST_HIST                      ID2                      240 1110211111123460000
TEST_HIST                      ID2                      241 1110211111123460000
TEST_HIST                      ID2                      242 1110211111123460000
TEST_HIST                      ID2                      243 1110211111123460000
TEST_HIST                      ID2                      244 1110211111123460000
TEST_HIST                      ID2                      245 1110211111123460000
TEST_HIST                      ID2                      246 1110211111123460000
TEST_HIST                      ID2                      247 1110211111123460000
TEST_HIST                      ID2                      248 1110211111123460000
TEST_HIST                      ID2                      249 1110211111123460000
TEST_HIST                      ID2                      250 1110211111123460000
TEST_HIST                      ID2                      251 1110211111123460000
TEST_HIST                      ID2                      252 1110211111123460000
TEST_HIST                      ID2                      253 1110211111123460000
TEST_HIST                      ID2                      254 11102111111234500000

# 使用直方圖顯示的為popular value實際只有6行的1110211111123450值驗證:

# 可以看到執行計劃依然根據popular value公式,走了全表掃描。

# cardinality計算公式如下:

cardinality = NUM_ROWS * selectivity
selectivity = (Buckets_this_popular_value) / Buckets_total) * Null_Adjust
Null_Adjust = (NUM_ROWS - NUM_NULLS) / NUM_ROWS

註釋:這裡Buckets_this_popular_value為popular value所佔buckets總數,Buckets_Total為buckets總數。

將上述統計資訊代入上述公式

cardinality = 22449 * (112 / 254) * 1 ≈ 9921 符合下面計算公式:

SCOTT@honor1 > select id1,id2 from test_hist where id2=1110211111123450;
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3529380458
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  9921 |   232K|    27   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_HIST |  9921 |   232K|    27   (0)| 00:00:01 |
-------------------------------------------------------------------------------

# 使用真正的popular value但是在直方圖中為non-popular value反而走了不應該走的索引掃描:

SCOTT@honor1 > select id1,id2 from test_hist where id2=1110211111123451;
10007 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2004230999
-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                   |     1 |    24 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST_HIST         |     1 |    24 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_TEST_HIST_ID2 |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

2.5 根據上述直方圖資料分佈以及測試得出如下情況

(1)直方圖收集number型別時,字符集為AL32UTF8時,取number前15位,其餘位有多少位補多少0,第十五位是否出現四捨五入情況,取決於資料分佈:

hight balanced直方圖含義為每個bucket中平均分佈所有not null值,當第十六位為5、6、7、8、9的總和無法放在取前十五位第十六位補0的值所在bucket中時,

那麼會再出現下一個bucket,這個bucket的endpoint_value將會為第十五位加一,這個情況可以透過測試得到驗證。


(2)ID2值1110211111123450從126 bucket開始表示該值為popular value,0~127個bucket的endpoint_value相同。

endpoint_number為0的user_tab_col_statistics行,表示該endpoint_value行數不足一個bucket,endpoint_number為累計值bucket number。


(3)多個endpoint_value相同的為popular value,可以看出當1110211111123451為最小值且行數大於1個bucket後,則bucket分佈如下,不會包含endpoint_number為0的資料分佈:

SCOTT@honor1 > select table_name,column_name,endpoint_number,to_char(ENDPOINT_VALUE),ENDPOINT_ACTUAL_VALUE from user_tab_histograms where table_name='TEST_HIST' order by column_name,endpoint_number;
TABLE_NAME      COLUMN_NAME          ENDPOINT_NUMBER TO_CHAR(ENDPOINT_VALUE)                  ENDPOINT_ACTUAL_VALU
--------------- -------------------- --------------- ---------------------------------------- --------------------
TEST_HIST       ID2                              126 1110211111123450
TEST_HIST       ID2                              127 11102111111234500


當最小值不滿足一個bucket時,則會有直方圖中endpoint_number為0的行數,但是並不是一個bucket,只是代表該endpoint_value並不滿足一個bucket,如下:

# 下面資料為再次插入了1110211111123450值與1110211111123450~1110211111123459部分值重新收集的統計資訊

TABLE_NAME                     COLUMN_NAME                                   ENDPOINT_NUMBER                           ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
------------------------------ -------------------- ---------------------------------------- ---------------------------------------- --------------------
TEST_HIST                      ID2                                                         0                         1110211111123450
TEST_HIST                      ID2                                                       112                         1110211111123450
...

(4)取15位後,第十五位四捨五入,其餘位補0後,那麼實際查詢時,傳入值如果為具體值,不會參考具體值,上面測試

已經驗證。

3.19c情況會有所改善嗎?

3.1 構造如下資料分佈的測試表

CZH@czhpdb > select count(*) from test_hist;
                                COUNT(*)
----------------------------------------
                                   27200

CZH@czhpdb > select id2,count(*) from test_hist group by id2 having count(*) > 2 order by 1;
                                     ID2                                 COUNT(*)
---------------------------------------- ----------------------------------------
                        1110211111123450                                      900
                        1110211111123451                                    10001
                        1110211111123452                                      901
                        1110211111123453                                      901
                        1110211111123454                                      901
                        1110211111123455                                      901
                        1110211111123456                                      901
                        1110211111123457                                      901
                        1110211111123458                                      901
其餘值均為distinct值,均只有1個。

3.2 收集直方圖

CZH@czhpdb > select * from user_tab_histograms where table_name='TEST_HIST' and column_name='ID2' order by endpoint_number;
TABLE_NAME      COLUM ENDPOINT_NUMBER                           ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE     ENDPOINT_ACTUAL_VALUE_RAW ENDPOINT_REPEAT_COUNT SCOPE
--------------- ----- --------------- ---------------------------------------- ------------------------- ------------------------- --------------------- -------
TEST_HIST       ID2                69                         1110211111123450 1110211111123450          C80C0B160C0C0D2333                            0 SHARED
TEST_HIST       ID2               838                         1110211111123450 1110211111123451          C80C0B160C0C0D2334                            0 SHARED
TEST_HIST       ID2               907                         1110211111123450 1110211111123452          C80C0B160C0C0D2335                            0 SHARED
TEST_HIST       ID2               974                         1110211111123450 1110211111123453          C80C0B160C0C0D2336                            0 SHARED
TEST_HIST       ID2              1040                         1110211111123450 1110211111123454          C80C0B160C0C0D2337                            0 SHARED
TEST_HIST       ID2              1107                         1110211111123450 1110211111123455          C80C0B160C0C0D2338                            0 SHARED
TEST_HIST       ID2              1174                         1110211111123460 1110211111123456          C80C0B160C0C0D2339                            0 SHARED
TEST_HIST       ID2              1241                         1110211111123460 1110211111123457          C80C0B160C0C0D233A                            0 SHARED
TEST_HIST       ID2              1307                         1110211111123460 1110211111123458          C80C0B160C0C0D233B                            0 SHARED
TEST_HIST       ID2              1308                        11102111111234500 11102111111234510         C9020C030C0C0C182E0B                          0 SHARED
TEST_HIST       ID2              1309                        11102111111234500 11102111111234524         C9020C030C0C0C182E19                          0 SHARED
TEST_HIST       ID2              1310                        11102111111234500 11102111111234537         C9020C030C0C0C182E26                          0 SHARED
TEST_HIST       ID2              1311                        11102111111234600 11102111111234551         C9020C030C0C0C182E34                          0 SHARED
TEST_HIST       ID2              1312                        11102111111234600 11102111111234564         C9020C030C0C0C182E41                          0 SHARED
TEST_HIST       ID2              1313                        11102111111234600 11102111111234578         C9020C030C0C0C182E4F                          0 SHARED
TEST_HIST       ID2              1314                        11102111111234600 11102111111234591         C9020C030C0C0C182E5C                          0 SHARED
TEST_HIST       ID2              1315                       111021111112345000 111021111112345105        C90C0B160C0C0D233406                          0 SHARED
TEST_HIST       ID2              1316                       111021111112345000 111021111112345118        C90C0B160C0C0D233413                          0 SHARED
...


# 省略其餘bucket直到2048

可以看到Oracle 19c對直方圖完善非常明顯,可以執行與11.2.0.4中無法正確選擇執行計劃的SQL,測試如下:

# 分別測試兩個non-popular與 一個真正popular值

CZH@czhpdb > set autot traceonly
CZH@czhpdb > select id1,id2 from test_hist where id2=1110211111123451;
10001 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3529380458
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           | 10213 |   229K|    33   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_HIST | 10213 |   229K|    33   (4)| 00:00:01 |
-------------------------------------------------------------------------------


CZH@czhpdb > select id1,id2 from test_hist where id2=1110211111123450;
900 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2521703107
---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |   916 | 21068 |     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TEST_HIST         |   916 | 21068 |     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_TEST_HIST_ID2 |   916 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------


CZH@czhpdb > select id1,id2 from test_hist where id2=1110211111123453;
901 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2521703107
---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |   890 | 20470 |     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TEST_HIST         |   890 | 20470 |     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_TEST_HIST_ID2 |   890 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

# 可以看到19c雖然endpoint_value在AL32UTF8字符集下也會將16位變成0,但是會正確顯示ENDPOINT_ACTUAL_VALUE,這就會給CBO提供非常充足資訊

# 可以看到CBO有了充足資訊,可以正確選擇執行計劃,19c相比11.2.0.4進步非常明顯。

4. 在11g中這種情況有辦法改善嗎?

有辦法改善,但是不同情況 有不同辦法,針對本文上面資料情況,可以採用函式索引辦法,如下:

# 可透過substr函式索引改善上文中資料傾斜,直方圖無法解決問題,測試如下:

# s ubstr函式會隱式使用to_char轉換number建立索引,查詢時,需要用單引號括起來,否則會發生隱式轉換,將不走函式索引。

SCOTT@honor1 > create index idx_func_id2 on test_hist(substr(id2,16,5));
SCOTT@honor1 > exec dbms_stats.gather_table_stats(user,'TEST_HIST',method_opt=>'for all columns size 254',estimate_percent=>100,cascade=>true,no_invalidate=>false);
SCOTT@honor1 > select id1,id2 from test_hist where substr(id2,16,5)='1';    # 需要單引號括起來,否則不走索引
10001 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3529380458
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  9921 |   261K|    23   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_HIST |  9921 |   261K|    23   (0)| 00:00:01 |
-------------------------------------------------------------------------------

SCOTT@honor1 > select id1,id2 from test_hist where substr(id2,16,5)='2';
Execution Plan
----------------------------------------------------------
Plan hash value: 3835702174
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |    27 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST_HIST    |     1 |    27 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_FUNC_ID2 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

# 可以看到已經可以正確選擇執行計劃。

# 這種方法僅適用於資料前多少位完全一致,如果資料所有位均有可能發生變化,這種情況比較難以穩定執行計劃。


綜上:

(1)11.2.0.4 number取前15,其餘位根據情況補0.

(2)19c中bucket數量擴充套件至2048,已經可以正確處理上文中資料傾斜情況。

(3)所以直方圖並不是因為傾斜才需要收集,而是直方圖能真實反應列統計資訊才可以收集,才真正有幫助,上面例子就是列存在傾斜,但是由於直方圖11.2.0.4直方圖限制

導致無法反映真實統計資訊,導致傾斜值存在大量行數,但是Oracle會認為是non popular值,cardinality會計算很低,導致走索引掃描或者由於計算cardinality較低,會導致一旦跟其他表關聯時

走錯誤nest loop,將造成SQL執行效率低下。



直方圖其他情況(11.2.0.4為例):

詳情參見文章:

1、如果目標列的distinct值的數量和目標表的資料量相同,即使該目標列在SYS.COL_USAGE$中有使用記錄,

Oracle在自動收集直方圖統計資訊的時候也不會對該列收集直方圖統計資訊;

2、在手工收集直方圖統計資訊的時候,如果我手工指定的bucket的數量等於目標列的distinct值的數量,

且這個值是小於等於254的話,那麼Oracle此時收集的直方圖統計資訊的型別應該是FREQUENCY——這個結論成立的前提條件是該列的資料分佈是傾斜的。


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/31439444/viewspace-2747700/,如需轉載,請註明出處,否則將追究法律責任。

相關文章