CBO中基數(cardinality)、可選擇率(selectivity)的計算公式
CBO中基數(cardinality)、可選擇率(selectivity)的計算公式
Cardinality(基數)是Oracle預估的返回行數,即對目標SQL的某個具體執行步驟的執行結果所包含記錄數的估算值。如果是針對整個目標SQL,那麼此時的Cardinality就表示該SQL最終執行結果所包含記錄數的估算值。例如,一張表T有1000行資料,列COL1上沒有直方圖,沒有空值,並且不重複的值(Distinct Value)有500個。那麼,在使用條件“WHERE COL1=”去訪問表的時候,優化器會假設資料均勻分佈,它估計出會有1000/500=2行被選出來,2就是這步操作的Cardinality。通常情況下,Cardinality越準確,生成的執行計劃就會越高效。
可選擇率(Selectivity)是指施加指定謂詞條件後返回結果集的記錄數佔未施加任何謂詞條件的原始結果集的記錄數的比率。可選擇率的取值範圍顯然是0~1,它的值越小,就表明可選擇性越好。當可選擇率為1時的可選擇性是最差的。CBO就是用可選擇率來估算對應結果集的Cardinality的,可選擇率和Cardinality之間的關係如下所示:
cardinality=NUM_ROWS*selectivity
其中,NUM_ROWS表示表的總行數。
在Oracle資料庫中,Oracle會預設認為SQL語句的WHERE條件中出現的各列彼此之間是獨立的,是沒有關聯關係的。所以,如果目標SQL語句各列之間是以AND來組合的話,那麼該SQL語句整個WHERE條件的組合可選擇率就等於各個列各自施加查詢條件後可選擇率的乘積。在得到了SQL語句整個WHERE條件的組合可選擇率後,Oracle會用它來估算整個SQL語句返回結果集的Cardinality,估算的方法就是用目標表的總記錄數(NUM_ROWS)乘組合可選擇率。但Oracle預設認為的各列之間是獨立的、沒有關聯關係的前提條件並不總是正確的,在實際的應用中各列之間有關聯關係的情況實際上並不罕見。在這種情況下如果還用上述計算方法來計算目標SQL語句整個WHERE條件的組合可選擇率並用它來估算返回結果集的Cardinality的話,那麼估算結果可能就會與實際結果有較大的偏差,進而可能導致CBO選錯執行計劃,所以Oracle又引入了動態取樣和多列統計資訊。
下表給出了一些常見的可選擇率計算公式:
下面給出一個示例:
DROP TABLE T_ROWS_20170605_LHR;
CREATE TABLE T_ROWS_20170605_LHR AS SELECT ROWNUM ID,'NAME1' SAL FROM DUAL CONNECT BY LEVEL<=10000;
UPDATE T_ROWS_20170605_LHR T SET T.ID='' WHERE T.ID<=100;
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ROWS_20170605_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',estimate_percent => 100);
LHR@orclasm > COL LOW_VALUE FORMAT A20
LHR@orclasm > COL HIGH_VALUE FORMAT A20
LHR@orclasm > SELECT D.LOW_VALUE,D.HIGH_VALUE,UTL_RAW.CAST_TO_NUMBER(D.LOW_VALUE) LOW_VALUE2,UTL_RAW.CAST_TO_NUMBER(D.HIGH_VALUE) HIGH_VALUE2, D.NUM_DISTINCT,D.NUM_NULLS FROM DBA_TAB_COL_STATISTICS D WHERE D.TABLE_NAME = 'T_ROWS_20170605_LHR' AND D.OWNER = 'LHR' AND D.COLUMN_NAME='ID';
LOW_VALUE HIGH_VALUE LOW_VALUE2 HIGH_VALUE2 NUM_DISTINCT NUM_NULLS
-------------------- -------------------- ---------- ----------- ------------ ----------
C20202 C302 101 10000 9900 100
LHR@orclasm > SELECT MIN(T.ID),DUMP(MIN(T.ID),16) LOW_VALUE,MAX(T.ID),DUMP(MAX(T.ID),16) HIGH_VALUE FROM T_ROWS_20170605_LHR T;
MIN(T.ID) LOW_VALUE MAX(T.ID) HIGH_VALUE
---------- -------------------- ---------- --------------------
101 Typ=2 Len=3: c2,2,2 10000 Typ=2 Len=2: c3,2
下面分別執行如下4條SQL語句並獲取執行計劃:
SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=1000;--1
SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>1000; --9000
SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>=1000; --9001
SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID BETWEEN 1000 AND 1100; --101
LHR@orclasm > set autot on exp
LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=1000;
COUNT(1)
----------
1
Execution Plan
----------------------------------------------------------
Plan hash value: 612708570
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 1 | 4 | 9 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"."ID"=1000)
-- ROUND(NUM_ROWS*(1/NUM_DISTINCT)*((NUM_ROWS-NUM_NULLS)/NUM_ROWS))
LHR@orclasm > SELECT ROUND(10000*1/9900*((10000-100)/10000)) VALUE FROM DUAL;
VALUE
----------
1
LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>1000;
COUNT(1)
----------
9000
Execution Plan
----------------------------------------------------------
Plan hash value: 612708570
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 9001 | 36004 | 9 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"."ID">1000)
--ROUND(NUM_ROWS*((HIGH_VALUE-VAL)/(HIGH_VALUE-LOW_VALUE))*((NUM_ROWS-NUM_NULLS)/NUM_ROWS))
LHR@orclasm > SELECT ROUND(10000*((10000-1000)/(10000-101))*((10000-100)/10000)) VALUE FROM DUAL;
VALUE
----------
9001
LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>=1000;
COUNT(1)
----------
9001
Execution Plan
----------------------------------------------------------
Plan hash value: 612708570
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 9002 | 36008 | 9 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"."ID">=1000)
--ROUND(NUM_ROWS*((HIGH_VALUE-VAL)/(HIGH_VALUE-LOW_VALUE)+1/NUM_DISTINCT)*((NUM_ROWS-NUM_NULLS)/NUM_ROWS))
LHR@orclasm > SELECT ROUND(10000*((10000-1000)/(10000-101)+1/9900)*((10000-100)/10000)) VALUE FROM DUAL;
VALUE
----------
9002
LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID BETWEEN 1000 AND 1100;
COUNT(1)
----------
101
Execution Plan
----------------------------------------------------------
Plan hash value: 612708570
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 102 | 408 | 9 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"."ID"<=1100 AND "T"."ID">=1000)
LHR@orclasm >
--ROUND(NUM_ROWS*(((VAL2-VAL1)/(HIGH_VALUE-LOW_VALUE)+2/NUM_DISTINCT)*((NUM_ROWS-NUM_NULLS)/NUM_ROWS)))
LHR@orclasm > SELECT ROUND(10000*(((1100-1000)/(10000-101)+2/9900)*((10000-100)/10000))) VALUE FROM DUAL;
VALUE
----------
102
可見預估行數和用公式計算出來的結果相吻合。
下面再檢視有頻率直方圖的時候基數的計算。
DROP TABLE T_ROWS_20170605_LHR;
CREATE TABLE T_ROWS_20170605_LHR AS SELECT ROWNUM ID,'NAME1' SAL FROM DUAL CONNECT BY LEVEL<=10000;
UPDATE T_ROWS_20170605_LHR T SET T.ID='' WHERE T.ID<=100;
UPDATE T_ROWS_20170605_LHR SET ID=2 WHERE ID BETWEEN 101 AND 200;
UPDATE T_ROWS_20170605_LHR SET ID=3 WHERE ID BETWEEN 200 AND 3000;
UPDATE T_ROWS_20170605_LHR SET ID=9 WHERE ID BETWEEN 3000 AND 9999;
SELECT T.ID,COUNT(*) FROM T_ROWS_20170605_LHR T GROUP BY T.ID;
檢視資料分佈:
LHR@orclasm > SELECT T.ID,COUNT(*) FROM T_ROWS_20170605_LHR T GROUP BY T.ID;
ID COUNT(*)
---------- ----------
100
10000 1
2 100
3 2800
9 6999
收集頻率直方圖:
LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ROWS_20170605_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR COLUMNS ID SIZE 6',estimate_percent => 100);
PL/SQL procedure successfully completed.
LHR@orclasm > SELECT D.COLUMN_NAME,D.NUM_DISTINCT,D.NUM_NULLS,D.NUM_BUCKETS,D.HISTOGRAM,D.DENSITY FROM Dba_Tab_Columns D WHERE D.TABLE_NAME = 'T_ROWS_20170605_LHR' AND D.COLUMN_NAME='ID';
COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM DENSITY
------------------------------ ------------ ---------- ----------- --------------- ----------
ID 4 100 4 FREQUENCY .000050505
LHR@orclasm > COL COLUMN_NAME FORMAT A6
LHR@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE,NVL((ENDPOINT_NUMBER-(LAG(ENDPOINT_NUMBER) OVER (ORDER BY ENDPOINT_VALUE))),ENDPOINT_NUMBER) COUNTS FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ROWS_20170605_LHR' AND COLUMN_NAME='ID';
TABLE_NAME COLUMN ENDPOINT_NUMBER ENDPOINT_VALUE COUNTS
------------------------------ ------ --------------- -------------- ----------
T_ROWS_20170605_LHR ID 100 2 100
T_ROWS_20170605_LHR ID 2900 3 2800
T_ROWS_20170605_LHR ID 9899 9 6999
T_ROWS_20170605_LHR ID 9900 10000 1
當目標列有頻率直方圖並且對目標列施加等值查詢條件時,如果查詢條件的輸入值等於目標列的某個Bucket的ENDPOINT_VALUE,那麼cardinality=Current_ENDPOINT_NUMBER-Previous_ENDPOINT_NUMBER:
LHR@orclasm > SET AUTOT ON
LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=3;
COUNT(1)
----------
2800
Execution Plan
----------------------------------------------------------
Plan hash value: 612708570
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 2800 | 8400 | 9 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"."ID"=3)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
23 consistent gets
0 physical reads
0 redo size
526 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
可見,預估行數為2800,和直方圖中儲存的值吻合(2900-100)。
當目標列有頻率直方圖並且對目標列施加等值查詢條件時,如果查詢條件的輸入值不等於目標列的任意一個Bucket的ENDPOINT_VALUE,那麼cardinality=MIN(Current_ENDPOINT_NUMBER-Previous_ENDPOINT_NUMBER)/2:
LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=4;
COUNT(1)
----------
0
Execution Plan
----------------------------------------------------------
Plan hash value: 612708570
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 1 | 3 | 9 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"."ID"=4)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
23 consistent gets
0 physical reads
0 redo size
525 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
LHR@orclasm > select round(1/2) from dual;
ROUND(1/2)
----------
1
在直方圖中,由於MIN(Current_ENDPOINT_NUMBER-Previous_ENDPOINT_NUMBER)=1,所以,ROUND(1/2)=1,和執行計劃中的預估行數相吻合。
CBO優化器是基於對當前經過特定測試的資料集中預期的行比率估計來計算基數的。此處的行數之比是一個數值,稱為選擇率(selectivity)。得到選擇率之後,將其與輸入行數進行簡單相乘既可得到基數。
在理解選擇性之前,必須得對user_tab_col_statistics檢視有一定了解:
- SQL> desc user_tab_col_statistics
- 名稱 是否為空? 型別
- ----------------------------------------- -------- ----------------------------
- TABLE_NAME VARCHAR2(30) 表名
- COLUMN_NAME VARCHAR2(30) 列名
- NUM_DISTINCT NUMBER 列中distinct值的數目
- LOW_VALUE RAW(32) 列的最小值
- HIGH_VALUE RAW(32) 列的最大值
- DENSITY NUMBER 當對列建立了直方圖,則值不再等於1/NUM_DISTINCT。
- NUM_NULLS NUMBER 列中的NULL值數目。
- NUM_BUCKETS NUMBER Number of buckets in histogram for the column
- LAST_ANALYZED DATE 最近分析時間。
- SAMPLE_SIZE NUMBER 分析樣本大小。
- GLOBAL_STATS VARCHAR2(3) 對分割槽取樣,則-NO,否則-YES。
- USER_STATS VARCHAR2(3) 統計資訊由使用者匯入,則YES,否則-NO。
- AVG_COL_LEN NUMBER 列的平均長度(以位元組為單位)
- HISTOGRAM VARCHAR2(15) Indicates existence/type of histogram: NONE、FREQUENCY、HEIGHT BALANCED
下面建立一張測試表,並收集統計資訊:
- SQL> create table audience as
- 2 select
- 3 trunc(dbms_random.value(1,13)) month_no
- 4 from
- 5 all_objects
- 6 where
- 7 rownum <= 1200
- 8 ;
- 表已建立。
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 user,
- 4 'audience',
- 5 cascade => true,
- 6 estimate_percent => null,
- 7 );ethod_opt => 'for all columns size 1'
- method_opt => 'for all columns size 1'
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
先檢視一下上面表和列的統計資訊:
- SQL> select t.TABLE_NAME, t.NUM_ROWS, t.BLOCKS, t.SAMPLE_SIZE
- 2 from user_tables t;
- TABLE_NAME NUM_ROWS BLOCKS SAMPLE_SIZE
- ---------- ---------- ---------- -----------
- AUDIENCE 1200 5 1200
- SQL> select s.table_name,
- s.column_name,
- s.num_distinct,
- 4 s.low_value,
- s.high_value,
- s.density,
- 7 s.num_nulls,
- 8 s.sample_size,
- 9 s.avg_col_len
- 10 from user_tab_col_statistics s;
- TABLE_NAME COLUMN_NAM NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS SAMPLE_SIZE AVG_COL_LEN
- ---------- ---------- ------------ ---------- ---------- ---------- ---------- ----------- -----------
- AUDIENCE MONTH_NO 12 C102 C10D .083333333 0 1200 3
- SQL> select rawtohex(1), rawtohex(12) from dual;
- RAWT RAWT
- ---- ----
- C102 C10D
- SQL> select dump(1,16),dump(12,16) from dual;
- DUMP(1,16) DUMP(12,16)
- ----------------- -----------------
- Typ=2 Len=2: c1,2 Typ=2 Len=2: c1,d
- SQL> select utl_raw.cast_to_number('c102'),utl_raw.cast_to_number('c10d') from dual;
- UTL_RAW.CAST_TO_NUMBER('C102') UTL_RAW.CAST_TO_NUMBER('C10D')
- ------------------------------ ------------------------------
- 1 12 --可以看見上面的LOW_VALUE和HIGH_VALUE的值分別就是1和12.
- SQL> select count(a.month_no) from AUDIENCE a;
- COUNT(A.MONTH_NO)
- -----------------
- 1200 --可以看見,這裡的值和NUM_ROWS是一樣的。
- SQL> select count(distinct a.month_no) from AUDIENCE a;
- COUNT(DISTINCTA.MONTH_NO)
- -------------------------
- 12 --可以看見,這裡的值也和NUM_DISTINCT的值是一樣的。
- SQL> select 1/12 from dual;
- 1/12
- ----------
- .083333333 --這裡的值和DENSITY一樣的,計算公式為1/NUM_DISTINCT。
1、假如在上面建立了一張表,裡面包含1200個人,如何才能確定其中有多少人的生日是在12月份。
- SQL> select count(*) from AUDIENCE where month_no=12;
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3337892515
- -------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| AUDIENCE | 100 | 300 | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("MONTH_NO"=12)
可以看見CBO計算出1200裡面,12月份生日的人是100人(在ID=2行的rows)。這和我們通常所理解的是一樣的,我們知道月份只有12個,在1200人中在某一個月出生的人,算概率也是100人(CBO也是這樣做得)。
計算方法為:DENSITY * NUM_ROWS = 1 / 12 * 1200 = 100。
2、現在假設有10%的人不記得自己的生日了,那麼CBO會怎麼計算吶?
- SQL> drop table audience purge;
- 表已刪除。
- SQL> create table audience as
- 2 select
- 3 rownum id,
- 4 trunc(dbms_random.value(1,13)) month_no
- 5 from
- 6 all_objects
- 7 where
- 8 rownum <= 1200;
- 表已建立。
- SQL> update
- 2 audience
- 3 set month_no = null
- 4 where mod(id,10) = 0; --10%的使用者不記得自己的生日。
- 已更新120行。
- SQL> commit;
- 提交完成。
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 user,
- 4 'audience',
- 5 cascade => true,
- 6 estimate_percent => null,
- 7 method_opt => 'for all columns size 1'
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
- SQL> select t.TABLE_NAME, t.NUM_ROWS, t.BLOCKS, t.SAMPLE_SIZE from user_tables t;
- TABLE_NAME NUM_ROWS BLOCKS SAMPLE_SIZE
- ---------- ---------- ---------- -----------
- AUDIENCE 1200 5 1200
- SQL> select s.table_name,
- 2 s.column_name,
- 3 s.num_distinct,
- 4 s.low_value,
- 5 s.high_value,
- 6 s.density,
- 7 s.num_nulls,
- 8 s.sample_size,
- 9 s.avg_col_len
- 10 from user_tab_col_statistics s;
- TABLE_NAME COLUMN_NAM NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS SAMPLE_SIZE AVG_COL_LEN
- ---------- ---------- ------------ ---------- ---------- ---------- ---------- ----------- -----------
- AUDIENCE MONTH_NO 12 C102 C10D .083333333 120 1080 3 --這裡可以看見,NUM_NULLS的值確實為120。
- AUDIENCE ID 1200 C102 C20D .000833333 0 1200 4
- SQL> select count(*) from AUDIENCE where month_no=12;
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3337892515
- -------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| AUDIENCE | 90 | 270 | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("MONTH_NO"=12)
調整後的選擇率:DENSITY * ((NUM_ROWS-NUM_NULLS)/NUM_ROWS) = 1 / 12 * ((1200 - 120) / 1200) = 0.075。
返回的記錄數(ROWS):調整後的選擇率 * NUM_ROWS = 0.075 * 1200 = 90行。
或者我們可以換一種方法思考,通過前面可以很容易的知道12分月有100人生日(其中這裡就包含了不記得生日的人)。然後1200人中有10%的人不記得自己的生日,也就是120,那麼12月份不記得自己生日的人就平攤到10個人,100-10=90。
3、現在假如我們想知道在6、7、8月份生日的人有多少吶?
- SQL> select count(*) from AUDIENCE where month_no in(6,7,8);
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3337892515
- -------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| AUDIENCE | 270 | 810 | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("MONTH_NO"=6 OR "MONTH_NO"=7 OR "MONTH_NO"=8)
6、7、8月份的選擇率:6月份選擇率 + 7月份選擇率 + 8月份選擇率 = 0.075 * 3 = 0.225
返回的記錄數(ROWS):6、7、8月份的選擇率 * NUM_ROWS = 0.225 * 1200 = 270行。
4、下面來一個更復雜一點的,我們想知道不在6、7、8月份生日的人有多少吶?
- SQL> select count(*) from AUDIENCE where month_no not in(6,7,8);
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3337892515
- -------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| AUDIENCE | 674 | 2022 | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("MONTH_NO"<>6 AND "MONTH_NO"<>7 AND "MONTH_NO"<>8)
選擇率:1 - 6、7、8月份的選擇率 = 1 - 0.075 * 3
返回記錄數:(1-0.075*3)*1200 = 930。
month_no in{specific list} 的基數 + month_no not in{specific list} 的基數 = NUM_ROWS,這裡計算出來是相等的,但是在資料庫中看見的卻不想等,需要注意!
5、現在我們求8月份以後出生的人,不包含8月份。
- SQL> select count(*) from AUDIENCE where month_no>8;
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3337892515
- -------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| AUDIENCE | 393 | 1179 | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("MONTH_NO">8)
選擇率:((HIGH_VALUE - LIMIT) / (HIGH_VALUE - LOW_VALUE)) * ((NUM_ROWS - NUM_NULLS) / NUM_ROWS)
返回的記錄數:選擇率 * NUM_ROWS = ((HIGH_VALUE - LIMIT) / (HIGH_VALUE - LOW_VALUE)) * ((NUM_ROWS - NUM_NULLS) / NUM_ROWS) * NUM_ROWS = round(((12-8)/(12-1))*((1200-120)/1200)*1200) = 393。
如果是求8月份以後出生的人,包含8月份。
- SQL> select count(*) from AUDIENCE where month_no>=8;
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3337892515
- -------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| AUDIENCE | 483 | 1449 | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("MONTH_NO">=8)
選擇率:((HIGH_VALUE - LIMIT) / (HIGH_VALUE - LOW_VALUE) + 1 / DENSITY) * ((NUM_ROWS - NUM_NULLS) / NUM_ROWS)
返回記錄數:選擇率 * NUM_ROWS = ((HIGH_VALUE - LIMIT) / (HIGH_VALUE - LOW_VALUE) + 1 / DENSITY) * ((NUM_ROWS - NUM_NULLS) / NUM_ROWS) * NUM_ROWS = round(((12-8)/(12-1)+1/12)*((1200-120)/1200)*1200) = 483。
如果是<8,選擇率:((LIMIT - LOW_VALUE) / (HIGH_VALUE - LOW_VALUE)) * ((NUM_ROWS - NUM_NULLS) / NUM_ROWS)
如果是<=8,選擇率:((LIMIT - LOW_VALUE) / (HIGH_VALUE - LOW_VALUE) + 1 / DENSITY) * ((NUM_ROWS - NUM_NULLS) / NUM_ROWS)
6、現在我們想知道6月份到8月份出生的人的數量?
- SQL> select count(*) from AUDIENCE where month_no>=6 and month_no<=8;
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3337892515
- -------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| AUDIENCE | 376 | 1128 | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("MONTH_NO">=6 AND "MONTH_NO"<=8)
選擇率:((HIGH_LIMIT - LOW_LIMIT) / (HIGH_VALUE - LOW_VALUE) + 1 / DENSITY + 1 / DENSITY) * ((NUM_ROWS - NUM_NULLS) / NUM_ROWS)
返回記錄數:round(((8-6)/(12-1)+1/12+1/12)*((1200-120)/1200)*1200) = 376。
7、下面看兩個謂詞的情況下,CBO是怎麼計算選擇率的。
- SQL> drop table audience purge;
- 表已刪除。
- SQL> create table audience as
- 2 select
- 3 rownum id,
- 4 trunc(dbms_random.value(1,13))month_no,
- 5 trunc(dbms_random.value(1,16))eu_country
- 6 from
- 7 all_objects
- 8 where
- 9 rownum <= 1200;
- 表已建立。
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 user,
- 4 'audience',
- 5 cascade => true,
- 6 estimate_percent => null,
- 7 method_opt => 'for all columns size 1'
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
- SQL> select t.TABLE_NAME, t.NUM_ROWS, t.BLOCKS, t.SAMPLE_SIZE from user_tables t;
- TABLE_NAME NUM_ROWS BLOCKS SAMPLE_SIZE
- ---------- ---------- ---------- -----------
- AUDIENCE 1200 6 1200
- SQL> select s.table_name,
- 2 s.column_name,
- 3 s.num_distinct,
- 4 s.low_value,
- 5 s.high_value,
- 6 s.density,
- 7 s.num_nulls,
- 8 s.sample_size,
- 9 s.avg_col_len
- 10 from user_tab_col_statistics s;
- TABLE_NAME COLUMN_NAM NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS SAMPLE_SIZE AVG_COL_LEN
- ---------- ---------- ------------ ---------- ---------- ---------- ---------- ----------- -----------
- AUDIENCE EU_COUNTRY 15 C102 C110 .066666667 0 1200 3
- AUDIENCE MONTH_NO 12 C102 C10D .083333333 0 1200 3
- AUDIENCE ID 1200 C102 C20D .000833333 0 1200 4
- SQL> select count(*) from audience where month_no=12 and eu_country=8;
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3337892515
- -------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 6 | | |
- |* 2 | TABLE ACCESS FULL| AUDIENCE | 7 | 42 | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("EU_COUNTRY"=8 AND "MONTH_NO"=12)
選擇率:month_no選擇率 * eu_contry選擇率 = 1/12 * 1/15
返回記錄:round(1/12*1/15*1200) = 7。
- SQL> select count(*) from audience where month_no=12 or eu_country=8;
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3337892515
- -------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 6 | | |
- |* 2 | TABLE ACCESS FULL| AUDIENCE | 173 | 1038 | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("MONTH_NO"=12 OR "EU_COUNTRY"=8)
選擇率:month_no選擇率 + eu_contry選擇率 - month_no選擇率 * eu_contry選擇率 = 1/12+1/15-1/12*1/15
返回記錄:round((1/12+1/15-1/12*1/15)*1200) = 173。
- SQL> select count(*) from audience where month_no<>12;
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3337892515
- -------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| AUDIENCE | 1100 | 3300 | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("MONTH_NO"<>12)
選擇率:1- month_no選擇率 = 1- 1/12
返回記錄:(1-1/12)*1200 = 1100。
8、總結:
- 單個謂詞過濾:
- = 基數計算公式 :1/num_distinct*(num_rows-num_nulls),如果有直方圖,基數計算公式=(num_rows-num_nulls)*density
- > 基數計算公式:(high_value-limit)/(high_value-low_value)*(num_rows-num_nulls)
- >= 基數計算公式:((high_value-limit)/(high_value-low_value)+1/num_distinct)*(num_rows-num_nulls) 因為有=,所以要加上=的選擇率,=的選擇率為1/num_distinct
- < 基數計算公式:(limit-low_value)/(high_value-low_value)*(num_rows-num_nulls)
- <= 基數計算公式:((limit-low_value)/(high_value-low_value)+1/num_distinct)*(num_rows-num_nulls)
- between ... and ... 的基數計算公式等價於 xxx<= high_limit ,xxxx>=low_limit
- 基數計算公式:((high_limit-low_limit)/(high_value-low_value)+2/num_distinct)*(num_rows-num_nulls)
- low_limit<xxx and</xxx xxx<high_limit 基數計算公式:(high_limit-low_limit) (high_value-low_value)*(num_rows-num_nulls)
- low_limit<=xxx and xxx<high_limit 基數計算公式:(high_limit-low_limit) (high_value-low_value)+1="" num_distinct)*(num_rows-num_nulls)
- 雙謂詞,多謂詞:
- A AND B 選擇率計算公式=A選擇率*B選擇率
- A OR B 選擇率計算公式=A+B-(A AND B)
- NOT A 選擇率計算公式=1-A選擇率
oracle包含的2種代價模型:
oracle通過隱含引數“_optimizer_cost_model"控制使用哪種代價模型
1io代價模型
cost=iocost
2cpu代價模型
cost=iocost+cpucost
在代價轉換的過程中,所有的代價都轉換為單資料塊讀的單次代價
存在四種型別的io,單塊讀,多塊讀,直接資料塊讀和直接資料塊寫
iocost=上述四種io的消耗
選擇率與密度,空值,唯一值,以及柱狀圖資料有關。過濾條件的選擇率
a and b a的選擇率*b的選擇率
a or b a的選擇率+b的選擇率-a*b
not a 1-a的選擇率
在未引入系統統計資訊之前,oracle會假設下面的2中情況是成立的:
1 單塊讀的平均消耗時間和多塊讀的平均消耗時間相等
2全表掃的時候,根據下面的情況計算全表掃描的成本
mbdivisor=1.675*power(db_file_multiblock_read_count,0.6581)
tsc=blocks/mbdivisor
10.2中的mbdivisor大約是4.4,11.2中的改值大約是3.6
cpu cost model在開啟的情況下,全表掃描的計算公式如下:
tsc cost=i/o cost+cpu cost
i/o cost =1+ceil(blocks/mbrc)*(mreadtim/sreadtim)
cpu cost=round(#cpucycles/cpuspeed/1000/sreadtim)
系統的統計資訊放在了aux_stats$中
檢視統計資訊收集的歷史
wri$_optstat_tab_history
執行計劃中的記錄數是資料記錄數*選擇率,執行時間的估算,總的單資料塊讀的io代價*單次資料塊讀時間
全表掃描的大概估計:
索引範圍掃描:分支節點讀取的資料塊數和葉子節點資料塊數決定。分支節點資料塊的讀取數等於索引數的分支節點層數。葉子節點的計算是葉子節點數*條件選擇率
所以索引的代價大概是層數+葉結點數*選擇率*optimizer_index_cost_adj
索引rowid訪問表的計算:這邊涉及到一個聚簇因子,它是代表了資料塊的聚集性。
iocost=索引樹高度+葉子資料塊數量*access的選擇率+聚簇因子*filter的選擇率*optica/100
索引範圍掃描的計算公式:
i/o cost= index access i/o cost + table access i/o cost
index access i/o cost= blevel+ceil(#leaf_block*ix_sel)
table acess i/o cost=ceil(clustering_factor*ix_sel_with_filters)
列上沒有直方圖,且沒有null值得情況下,列的選擇率=(1/num_distinct)
列上沒有直方圖,且有null值的情況下,列的選擇率=(1/num_distinct)*((num_rows-num_nulls)/num_rows)
列上的low_value和high_value是用來評估列做範圍查詢時的可選擇率
目標列大於指定值val,且val處於low_value與high_value之間
selectivity=((high_value-val)/(high_value-low_value)*null_adjust;
null_adjust=(num_rows-num_nulls)/num_rows
目標列小於指定val,且val處於low和high之間
selectivity=(val-low_value)/(high_value-low_value)*null_adjust;
null_adjust=(num_rows-num_nulls)/num_rows
目標列大於或等於指定值val,且val處於low和high之間
selectivity=((high-val)/(high-low)+1/num_distinct)*null_adjust
目標列小於或等於指定值val,且val處於low和high之間
selectivity=((val-low)/high-low)+1/num_distinct)*null_adjust
目標列在指定值val1和val2之間,且val1和val2都在low和high之間
selectivity=((val2-val1)/high-low)+2/num_distinct)*null_adjust
對於使用了繫結變數的sql,啟動繫結變數窺探,則對於>的範圍查詢
Selectivity = (high_value - limit) / (high_value - low_value) limit是繫結變數的值
Cardinality = rows*selectivity
對於使用了繫結變數的sql,如果沒有啟用繫結變數窺探,則對於>的範圍查詢
selectivity=0.05
沒有啟用繫結變數窺探,對於between and的查詢
selectivity=0.0025
--測試選擇率
SQL> select count(*) from tt where object_id is null;
COUNT(*)
----------
99
SQL> select count(*) from tt;
COUNT(*)
----------
49859
exec dbms_stats.gather_table_stats(ownname => 'baixyu',tabname => 'TT',method_opt => 'for all columns size 1');
SQL> select count(*) from tt where object_id is null;
COUNT(*)
----------
99
SQL> select count(*) from tt;
COUNT(*)
----------
49859
select num_distinct,low_value,high_value,num_nulls from user_tab_col_statistics s where s.table_name='TT' and s.column_name='OBJECT_ID';
49760 C2020BC3061C2299
SQL> select * from tt where object_id=120;
執行計劃
----------------------------------------------------------
Plan hash value: 264906180
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 93 | 159 (2)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| TT | 1 | 93 | 159 (2)| 00:00:02 |
--------------------------------------------------------------------------
SQL> select 1/49760*((49859-99)/49859)*49859 from dual;
1/49760*((49859-99)/49859)*49859
--------------------------------
1
SQL> select * from tt where object_id>120 and object_id<200;
已選擇78行。
執行計劃
----------------------------------------------------------
Plan hash value: 264906180
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 76 | 7068 | 159 (2)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| TT | 76 | 7068 | 159 (2)| 00:00:02 |
--------------------------------------------------------------------------
select (((200-120)/(52744-101)+2/49760)*49760/49859)*49859 from dual;
(((200-120)/(52744-101)+2/49760)*49760/49859)*49859
---------------------------------------------------
77.6187907
在列上沒有直方圖的情況下,統計資訊中的density可以認為是1/num_distinct,在列上有等頻直方圖時,density的計算公式為
density=1/(2*num_rows*null_adjust)
null_adjust-(num_rows-num_nulls)/num_rows
如果查詢條件的輸入值等於目標列的某個bucket的endpoint_value那麼 列的cardinality的計算公式是
cardinality=num_rows*selectivity
selectivity=bucketsize/num_rows
bucketsize=current_endpoint_number-previous_endpoint_number
如果查詢條件的輸入值不等於目標列的任意一個bucket的endpoint_value,計算公式如下
cardinality=num_rows*selectivity
selectivity=min(bucketsize)/(2*num_rows)
bucketsize=current_endpoint_number-previous_endpoint_number;
---怎麼不直接設定成0?
當列上的直方圖資訊是等高直方圖的時候,等值查詢的估算是
1如果查詢條件輸入值是popular value
cardinality=num_rows*selectivity
selectivity=(buckets_this_popular_value/buckets_totoal)*null_adjust
null_adjust=(num_rows-num_nulls)/num_rows
2如果是unpopular value
分成三種情況。。。。
1、前導零
建立一個包含2000000行資料的表,它有一個id列,採用序號並利用0來進行填充其他位。因此典型的儲存值將是A00000000000000001,系統中絕大部分使用這種策略的查詢,類似於where id={string constant}的形式;但是,如果它們使用了基於區間的謂詞,可能將出現一些奇怪的效能問題。
- SQL> create table t1
- 2 nologging
- 3 pctfree 0
- 4 as
- 5 with generator as (
- 6 select
- 7 rownum id
- 8 from all_objects
- 9 where rownum <= 2000
- 10 )
- 11 select
- 12 /*+ ordered use_nl(v2) */
- 13 trunc((rownum-1)/10000) grp_id,
- 14 'A' || lpad(rownum, 17, '0') id
- 15 from
- 16 generator v1,
- 17 generator v2
- 18 where
- 19 rownum <= 2000000
- 20 ;
- 表已建立。
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 ownname=> user,
- 4 tabname=> 't1',
- 5 cascade=> true,
- 6 estimate_percent=> null,
- 7 method_opt=>'for all columns size 1'
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
- SQL> set autotrace traceonly;
- SQL> select * from t1 where id between 'A00000000000060000' and 'A00000000000070000';
- 已選擇10001行。
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3617692013
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 17 | 374 | 1190 (3)| 00:00:15 |
- |* 1 | TABLE ACCESS FULL| T1 | 17 | 374 | 1190 (3)| 00:00:15 |
- --------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 1 - filter("ID"<='A00000000000070000' AND "ID">='A00000000000060000')
- 統計資訊
- ----------------------------------------------------------
- 1 recursive calls
- 0 db block gets
- 7520 consistent gets
- 6849 physical reads
- 0 redo size
- 325111 bytes sent via SQL*Net to client
- 7741 bytes received via SQL*Net from client
- 668 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 10001 rows processed
針對該查詢得到的基數是非常低的(17),但是很明顯卻返回了10001行。下面重新建立直方圖,直方圖的預設值為75個桶。
- SQL> set autotrace off;
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 ownname=> user,
- 4 tabname=> 't1',
- 5 cascade=> true,
- 6 estimate_percent=> null,
- 7 method_opt=>'for all columns size 75'
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
- SQL> set autotrace traceonly;
- SQL> select * from t1 where id between 'A00000000000060000' and 'A00000000000070000';
- 已選擇10001行。
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3617692013
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 8924 | 191K| 1190 (3)| 00:00:15 |
- |* 1 | TABLE ACCESS FULL| T1 | 8924 | 191K| 1190 (3)| 00:00:15 |
- --------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 1 - filter("ID"<='A00000000000070000' AND "ID">='A00000000000060000')
- 統計資訊
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 7520 consistent gets
- 6849 physical reads
- 0 redo size
- 325111 bytes sent via SQL*Net to client
- 7741 bytes received via SQL*Net from client
- 668 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 10001 rows processed
直方圖的預設值為75個桶,執行計劃顯示估計的基數為8924——這個結果至少大體上是正確的。建立直方圖是一種方法,如果id列的值沒有字元,是可以轉換為數字的字串(00000000000000001),那麼在id列上面建立一個函式索引也是一種辦法:create index t1_i1 on t1(grp_id, to_number(id));
2、致命的預設值
即時是資料庫應用程式正確的使用了日期型別,也仍然需要避免null值。為了不讓任何列為null,每個可空的列都有一個預設值。因此,大部分獨立於資料庫的開發人員會選擇一個什麼樣的值來表示null日期呢?如果表示很久以後的日期呢?比如4000年12月31日。
- SQL> create table t1
- 2 as
- 3 with generator as (
- 4 select
- 5 rownum id
- 6 from all_objects
- 7 where rownum <= 2000
- 8 )
- 9 select
- 10 /*+ ordered use_nl(v2) */
- 11 decode(
- 12 mod(rownum - 1,1000),
- 13 0,to_date('4000-12-31','yyyy-mm-dd'),
- 14 to_date('2000-01-01','yyyy-mm-dd') + trunc((rownum - 1)/100)
- 15 ) date_closed
- 16 from
- 17 generator v1,
- 18 generator v2
- 19 where rownum <= 1827 * 100;
- 表已建立。
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 ownname=> user,
- 4 tabname=> 't1',
- 5 cascade=> true,
- 6 estimate_percent=> null,
- 7 method_opt=>'for all columns size 1' --直方圖的預設值為1桶。
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
- SQL> set autotrace traceonly;
- SQL> select *
- 2 from t1
- 3 where date_closed between to_date('2003-01-01', 'yyyy-mm-dd') and
- 4 to_date('2003-12-31', 'yyyy-mm-dd');
- 已選擇36463行。
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3617692013
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 291 | 2328 | 61 (4)| 00:00:01 |
- |* 1 | TABLE ACCESS FULL| T1 | 291 | 2328 | 61 (4)| 00:00:01 |
- --------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 1 - filter("DATE_CLOSED"<=TO_DATE(' 2003-12-31 00:00:00',
- 'syyyy-mm-dd hh24:mi:ss') AND "DATE_CLOSED">=TO_DATE(' 2003-01-01
- 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
- 統計資訊
- ----------------------------------------------------------
- 1 recursive calls
- 0 db block gets
- 2759 consistent gets
- 0 physical reads
- 0 redo size
- 494301 bytes sent via SQL*Net to client
- 27145 bytes received via SQL*Net from client
- 2432 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 36463 rows processed --這裡實際放回了36463行記錄,但是oracle卻計算錯誤了(291)。
- SQL> set autotrace off;
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 ownname=> user,
- 4 tabname=> 't1',
- 5 cascade=> true,
- 6 estimate_percent=> null,
- 7 method_opt=>'for all columns size 11' --直方圖的預設值為11桶。
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
- SQL> set autotrace traceonly;
- SQL> select *
- 2 from t1
- 3 where date_closed between to_date('2003-01-01', 'yyyy-mm-dd') and
- 4 to_date('2003-12-31', 'yyyy-mm-dd');
- 已選擇36463行。
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3617692013
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 36320 | 283K| 61 (4)| 00:00:01 |
- |* 1 | TABLE ACCESS FULL| T1 | 36320 | 283K| 61 (4)| 00:00:01 |
- --------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 1 - filter("DATE_CLOSED">=TO_DATE(' 2003-01-01 00:00:00',
- 'syyyy-mm-dd hh24:mi:ss') AND "DATE_CLOSED"<=TO_DATE(' 2003-12-31
- 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
- 統計資訊
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 2759 consistent gets
- 0 physical reads
- 0 redo size
- 494301 bytes sent via SQL*Net to client
- 27145 bytes received via SQL*Net from client
- 2432 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 36463 rows processed --36463:36320這次oracle計算得比較準確了。
可以看見加大了直方圖的桶數之後,CBO估算的行數就比較接近真實值了,那我們再加大直方圖的桶數試一下吶!
- SQL> set autotrace off;
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 ownname=> user,
- 4 tabname=> 't1',
- 5 cascade=> true,
- 6 estimate_percent=> null,
- 7 method_opt=>'for all columns size 75'
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
- SQL> set autotrace trace;
- SQL> select *
- 2 from t1
- 3 where date_closed between to_date('2003-01-01', 'yyyy-mm-dd') and
- 4 to_date('2003-12-31', 'yyyy-mm-dd');
- 已選擇36463行。
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3617692013
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 36345 | 283K| 61 (4)| 00:00:01 |
- |* 1 | TABLE ACCESS FULL| T1 | 36345 | 283K| 61 (4)| 00:00:01 |
- --------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 1 - filter("DATE_CLOSED">=TO_DATE(' 2003-01-01 00:00:00',
- 'syyyy-mm-dd hh24:mi:ss') AND "DATE_CLOSED"<=TO_DATE(' 2003-12-31
- 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
- 統計資訊
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 2759 consistent gets
- 0 physical reads
- 0 redo size
- 494301 bytes sent via SQL*Net to client
- 27145 bytes received via SQL*Net from client
- 2432 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 36463 rows processed --36463:36345 加大了直方圖的桶數之後,CBO估算返回的行數誤差更小了。
3、離散資料的風險
考慮一個包含period列的計數系統——儲存1~12月的資料,並額外再加一個月,其對應的數值為99(同時包含了第二種選擇,即這個特殊的月給定的值為13)。
- SQL> create table t1
- 2 as
- 3 with generator as (
- 4 select
- 5 rownum id
- 6 from all_objects
- 7 where rownum <= 1000
- 8 )
- 9 select
- 10 /*+ ordered use_nl(v2) */
- 11 mod(rownum-1,13) period_01,
- 12 mod(rownum-1,13) period_02
- 13 from
- 14 generator v1,
- 15 generator v2
- 16 where
- 17 rownum <= 13000
- 18 ;
- 表已建立。
- SQL> update t1 set
- 2 period_01 = 99,
- 3 period_02 = 13
- 4 where
- 5 period_01 = 0;
- 已更新1000行。
- SQL> commit;
- 提交完成。
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 user,
- 4 't1',
- 5 cascade => true,
- 6 estimate_percent => null,
- 7 method_opt => 'for all columns size 1'
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
- SQL> set autotrace on exp;
- SQL> select count(*) from t1 where period_01 between 4 and 6;
- COUNT(*)
- ----------
- 3000 --這裡實際有3000行,但是oracle估算的是1663行,不準確!
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3724264953
- ---------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| T1 | 1663 | 4989 | 6 (0)| 00:00:01 |
- ---------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("PERIOD_01"<=6 AND "PERIOD_01">=4)
- SQL> select count(*) from t1 where period_02 between 4 and 6;
- COUNT(*)
- ----------
- 3000 --這裡實際有3000行,但是oracle估算的是4167行,不準確!
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3724264953
- ---------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| T1 | 4167 | 12501 | 6 (0)| 00:00:01 |
- ---------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("PERIOD_02"<=6 AND "PERIOD_02">=4)
- SQL> set autotrace off;
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 user,
- 4 't1',
- 5 cascade => true,
- 6 estimate_percent => null,
- 7 method_opt => 'for all columns size 254' --重新收集直方圖。
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
- SQL> set autotrace on exp;
- SQL> select count(*) from t1 where period_01 between 4 and 6;
- COUNT(*)
- ----------
- 3000
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3724264953
- ---------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| T1 | 3000 | 9000 | 6 (0)| 00:00:01 |
- ---------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("PERIOD_01"<=6 AND "PERIOD_01">=4)
- SQL> select count(*) from t1 where period_02 between 4 and 6;
- COUNT(*)
- ----------
- 3000
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3724264953
- ---------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 3 | | |
- |* 2 | TABLE ACCESS FULL| T1 | 3000 | 9000 | 6 (0)| 00:00:01 |
- ---------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("PERIOD_02"<=6 AND "PERIOD_02">=4)
4、函式索引
- SQL> create index idx_t1_PERIOD_01 on t1(upper(PERIOD_01));
- 索引已建立。
- SQL> select i.index_name,i.index_type from user_indexes i;
- INDEX_NAME INDEX_TYPE
- ------------------------------ ---------------------------
- IDX_T1_PERIOD_01 FUNCTION-BASED NORMAL
- SQL> select c.TABLE_NAME,c.COLUMN_NAME from user_tab_cols c;
- TABLE_NAME COLUMN_NAME
- ------------------------------ ------------------------------
- T1 PERIOD_01
- T1 PERIOD_02
- T1 SYS_NC00003$
需要牢記的是,如果建立了一個基於函式的索引,那麼實際上是在虛擬列上建立了索引,當收集關於該表及其索引的統計資訊時,同時也就收集了虛擬列上的統計資訊。這在類似情況下,諸如upper(PERIOD_01)='xxx'的謂詞將被優化為:SYS_NC00003$='xxx'。
5、相互關聯的列
如果在謂詞中使用相互依賴(相關)的列容易將問題複雜化,前提是這些相關聯的列同時出現在where子句中。
- SQL> create table t1
- 2 nologging
- 3 as
- 4 select
- 5 trunc(dbms_random.value(0,25))n1,
- 6 rpad('x',40)ind_pad,
- 7 trunc(dbms_random.value(0,20))n2,
- 8 lpad(rownum,10,'0')small_vc,
- 9 rpad('x',200)padding
- 10 from
- 11 all_objects
- 12 where
- 13 rownum <= 10000
- 14 ;
- 表已建立。
- SQL> update t1 set n2 = n1;
- 已更新10000行。
- SQL> commit;
- 提交完成。
- SQL> begin
- 2 dbms_stats.gather_table_stats(
- 3 ownname => user,
- 4 tabname=> 'T1',
- 5 cascade=> true,
- 6 estimate_percent => null,
- 7 method_opt => 'for all columns size 1'
- 8 );
- 9 end;
- 10 /
- PL/SQL 過程已成功完成。
- SQL> set autotrace traceonly;
- SQL> select small_vc
- 2 from t1
- 3 where n1 = 2
- 4 and n2 = 2;
- 已選擇420行。
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3617692013
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 16 | 272 | 66 (0)| 00:00:01 |
- |* 1 | TABLE ACCESS FULL| T1 | 16 | 272 | 66 (0)| 00:00:01 |
- --------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 1 - filter("N1"=2 AND "N2"=2)
- 統計資訊
- ----------------------------------------------------------
- 1 recursive calls
- 0 db block gets
- 401 consistent gets
- 0 physical reads
- 0 redo size
- 9059 bytes sent via SQL*Net to client
- 712 bytes received via SQL*Net from client
- 29 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 420 rows processed
根據以上執行計劃,CBO估算將會返回16行資料,實際確實返回了420行,如果這個表要與多表關聯,基數一旦算錯,必然導致整個SQL的執行計劃全部出錯,從而導致SQL效能下降。
接下來再次在相關列上執行查詢,不過sql語句中包含了dynamic_sampling提示。
- SQL> select /*+ dynamic_sampling(t1 1) */
- 2 small_vc
- 3 from t1
- 4 where n1 = 2
- 5 and n2 = 2;
- 已選擇420行。
- 執行計劃
- ----------------------------------------------------------
- Plan hash value: 3617692013
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 370 | 6290 | 66 (0)| 00:00:01 |
- |* 1 | TABLE ACCESS FULL| T1 | 370 | 6290 | 66 (0)| 00:00:01 |
- --------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 1 - filter("N1"=2 AND "N2"=2)
- Note
- -----
- - dynamic sampling used for this statement (level=2)
- 統計資訊
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 401 consistent gets
- 0 physical reads
- 0 redo size
- 9059 bytes sent via SQL*Net to client
- 712 bytes received via SQL*Net from client
- 29 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
-
420 rows processed
沒有直方圖的執行計劃預估結果集行數計算公式
公式摘抄於《催華-基於ORACLE的SQL優化》
轉載請註明連結地址:http://blog.csdn.net/launch_225/article/details/25432661
- SQL> select * from t1;
- N1 V1
- ---------- --------
- 1 a
- 2 b
- 2 c
- 4 c
- 5 d
- 6 e
- 6 f
- 6 f
- 9 g
- 10 h
- 11 i
- N1 V1
- ---------- --------
- 12 i
- 12 i
- 14 j
- 15 k
- 16 l
- 16 m
- 18 n
- 18 rows selected.
create index idx_t1_n1 on t1(n1);
- 1. ====>沒有直方圖的預估結果集行數
- SQL> select count(1) from t1;
- COUNT(1)
- ----------
- 18
- SQL> select count(distinct n1) from t1;
- COUNT(DISTINCTN1)
- -----------------
- 13
- SQL> select min(n1),max(n1) from t1;
- MIN(N1) MAX(N1)
- ---------- ----------
- 1 18
- TABLE_NAME COLUMN_NAME DENSITY NUM_BUCKETS HISTOGRAM
- 1 T1 N1 0.0769230769230769 1 NONE
- 2 T1 V1 0.0714285714285714 1 NONE
- 1.1=>範圍查詢
- ------------------------------------------------------------------------------
- (1)目標列大於指定的val,且位於low_value and high_value之間
- selectivity=((HIGH_VALUE-VAL)/(HIGH_VALUE-LOW_VALUE))* NULL_ADJUST
- NULL_ADJUST=(NUM_ROWS-NUM_NULLS)/NUM_ROWS
- SQL> select ((18-5)/(18-1))*1*18 from dual;
- ((18-5)/(18-1))*1*18
- --------------------
- 13.7647059
- 1* select * from t1 where n1>5
- SQL> /
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1577308413
- -----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 14 | 70 | 2 (0)| 00:00:01 |
- | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 14 | 70 | 2 (0)| 00:00:01 |
- |* 2 | INDEX RANGE SCAN | IDX_T1_N1 | 14 | | 1 (0)| 00:00:01 |
- -----------------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - access("N1">5)
- (2)目標列小於指定的val,且位於low_value and high_value之間
- selectivity=((VAL-LOW_VALUE)/(HIGH_VALUE-LOW_VALUE))* NULL_ADJUST
- NULL_ADJUST=(NUM_ROWS-NUM_NULLS)/NUM_ROWS
- SQL> select ((5-1)/(18-1))*1*18 from dual;
- ((5-1)/(18-1))*1*18
- -------------------
- 4.23529412
- SQL> select * from t1 where n1<5
- 2 /
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1577308413
- -----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 4 | 20 | 2 (0)| 00:00:01 |
- | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 4 | 20 | 2 (0)| 00:00:01 |
- |* 2 | INDEX RANGE SCAN | IDX_T1_N1 | 4 | | 1 (0)| 00:00:01 |
- -----------------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - access("N1"<5)
- (3)目標列>=指定的val,且位於low_value and high_value之間
- selectivity=((HIGH_VALUE-VAL)/(HIGH_VALUE-LOW_VALUE)+1/NUM_DISTINCT)* NULL_ADJUST
- NULL_ADJUST=(NUM_ROWS-NUM_NULLS)/NUM_ROWS
- SQL> select ((18-5)/(18-1)+1/13)*1*18 from dual;
- ((18-5)/(18-1)+1/13)*1*18
- -------------------------
- 15.1493213
- SQL> select * from t1 where n1>=5;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1577308413
- -----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 15 | 75 | 2 (0)| 00:00:01 |
- | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 15 | 75 | 2 (0)| 00:00:01 |
- |* 2 | INDEX RANGE SCAN | IDX_T1_N1 | 15 | | 1 (0)| 00:00:01 |
- -----------------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - access("N1">=5)
- (4)目標列<=指定的val,且位於low_value and high_value之間
- selectivity=((VAL-LOW_VALUE)/(HIGH_VALUE-LOW_VALUE)+1/NUM_DISTINCT)* NULL_ADJUST
- NULL_ADJUST=(NUM_ROWS-NUM_NULLS)/NUM_ROWS
- SQL> select ((5-1)/(18-1)+1/13)*1*18 from dual;
- ((5-1)/(18-1)+1/13)*1*18
- ------------------------
- 5.6199095
- SQL> select * from t1 where n1<=5;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1577308413
- -----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 6 | 30 | 2 (0)| 00:00:01 |
- | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 6 | 30 | 2 (0)| 00:00:01 |
- |* 2 | INDEX RANGE SCAN | IDX_T1_N1 | 6 | | 1 (0)| 00:00:01 |
- -----------------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - access("N1"<=5)
- (5)列位於指定值VAL1和VAL2之間,且VAL1和VAL2在LOW_VALUE和HIGH_VALUE範圍
- selectivity=((val2-val1)/(HIGH_VALUE-LOW_VALUE)+2/NUM_DISTINCT)* NULL_ADJUST
- NULL_ADJUST=(NUM_ROWS-NUM_NULLS)/NUM_ROWS
- SQL> select ((10-5)/(18-1)+2/13)*1*18 from dual;
- ((10-5)/(18-1)+2/13)*1*18
- -------------------------
- 8.06334842
- SQL> select * from t1 where n1 between 5 and 10;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1577308413
- -----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 8 | 40 | 2 (0)| 00:00:01 |
- | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 8 | 40 | 2 (0)| 00:00:01 |
- |* 2 | INDEX RANGE SCAN | IDX_T1_N1 | 8 | | 1 (0)| 00:00:01 |
- -----------------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - access("N1">=5 AND "N1"<=10)
- ------------------------------------------------------------------------------
- 1.2 =》等值查詢
- (1) 等值查詢且列上沒有NULL值沒有直方圖的計算公式
- selectivity_without_null=(1/num_distinct)
- SQL> select 1/13*18 from dual;
- 1/13*18
- ----------
- 1.38461538
- SQL> select * from t1 where n1=8;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1577308413
- -----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 5 | 2 (0)| 00:00:01 |
- | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 5 | 2 (0)| 00:00:01 |
- |* 2 | INDEX RANGE SCAN | IDX_T1_N1 | 1 | | 1 (0)| 00:00:01 |
- -----------------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - access("N1"=8)
- (2)目標列沒有直方圖且有NULL值(通用於沒有直方圖時列的選擇率的計算)
- selectivity_with_null=(1/num_distinct) * ((num_rows-num_nulls)/num_rows)
- SQL> select 1/13*((18-0)/18)*18 from dual;
- 1/13*((18-0)/18)*18
- -------------------
- 1.38461538
- --不存在空值也通用
- SQL> select * from t1 where n1=8;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1577308413
- -----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 5 | 2 (0)| 00:00:01 |
- | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 5 | 2 (0)| 00:00:01 |
- |* 2 | INDEX RANGE SCAN | IDX_T1_N1 | 1 | | 1 (0)| 00:00:01 |
- -----------------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - access("N1"=8)
等頻直方圖預估結果集行計算公式
公式:摘自《催華-基於ORACLE的SQL優化》
連結地址:http://blog.csdn.net/launch_225/article/details/25472129
等頻:唯一值《=桶數
- =>等頻(頻率)直方圖
- -->density計算
- density=1/(2*num_rows*null_adjust)
- null_adjust=(num_rows-num_nulls)/num_rows
- -->等值查詢,val在low_value and high_value之間,值等於某個endpoint_value
- cardinality=num_rows * selectivity
- selectivity=bucketsize/num_rows
- bucketsize=current_endpoint_number-previous_endpoint_number
- TABLE_NAME COLUMN_NAME NUM_BUCKETS LOW_VALUE HIGH_VALUE HISTOGRAM
- 1 T1 N1 13 C102 C113 FREQUENCY
- -->density計算
- density=1/(2*num_rows*null_adjust)
- SQL> select 1/(2*18*1) from dual;
- 1/(2*18*1)
- ----------
- .027777778
- OWNER TABLE_NAME NUM_DISTINCT DENSITY COLUMN_NAME
- 1 AIKI T1 13 0.0277777777777778 N1
- -->等值查詢,val在low_value and high_value之間,值等於某個endpoint_value
- cardinality=num_rows * selectivity
- selectivity=bucketsize/num_rows
- bucketsize=current_endpoint_number-previous_endpoint_number
- 1* select * from t1 where n1=6
- SQL> /
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1577308413
- -----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 3 | 15 | 2 (0)| 00:00:01 |
- | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 3 | 15 | 2 (0)| 00:00:01 |
- |* 2 | INDEX RANGE SCAN | IDX_T1_N1 | 3 | | 1 (0)| 00:00:01 |
- -----------------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - access("N1"=6)
- OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
- AIKI T1 N1 1 1
- AIKI T1 N1 3 2
- AIKI T1 N1 4 4
- AIKI T1 N1 5 5 --previous_endpoint_number
- AIKI T1 N1 8 6 --current_endpoint_number
- AIKI T1 N1 9 9
- AIKI T1 N1 10 10
- AIKI T1 N1 11 11
- AIKI T1 N1 13 12
- AIKI T1 N1 14 14
- AIKI T1 N1 15 15
- AIKI T1 N1 17 16
- AIKI T1 N1 18 18
- bucketsize=8-5=3;
- selectivity=3/18
- cardinality=18*(3/18)=3;
等高直方圖預估結果集行計算公式
摘自《催華-基於ORACLE的SQL優化》
連結地址:http://blog.csdn.net/launch_225/article/details/25474427
- SQL> desc t1
- Name Null? Type
- ----------------------------------------- -------- ----------------------------
- N1 NUMBER
- V1 VARCHAR2(8)
- SQL> select * from t1;
- N1 V1
- ---------- --------
- 1 a
- 2 b
- 2 c
- 4 c
- 5 d
- 6 e
- 6 f
- 6 f
- 9 g
- 10 h
- 11 i
- 12 i
- 12 i
- 14 j
- 15 k
- 16 l
- 16 m
- 18 n
- 18 rows selected.
- TABLE_NAME COLUMN_NAME NUM_BUCKETS LOW_VALUE HIGH_VALUE HISTOGRAM
- 1 T1 N1 10 C102 C113 HEIGHT BALANCED
- OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
- 1 AIKI T1 N1 0 1
- 2 AIKI T1 N1 1 2
- 3 AIKI T1 N1 2 4
- 4 AIKI T1 N1 4 6
- 5 AIKI T1 N1 5 10
- 6 AIKI T1 N1 6 12
- 7 AIKI T1 N1 7 14
- 8 AIKI T1 N1 8 15
- 9 AIKI T1 N1 9 16
- 10 AIKI T1 N1 10 18
- 1.查詢值為popular value:
- cardinality=num_rows * SELECTIVITY
- SELECTIVITY=(BUCKETS_THIS_POPULAR_VALUE/BUCKETS_TOTAL)* NULL_ADJUST
- NULL_ADJUST=(NUM_ROWS-NUM_NULLS)/NUM_ROWS
- BUCKETS_THIS_POPULAR_VALUE:POPULAR VALUE所點的bucket的數量,buckets_total:bucket的總數
- select * from t1 where n1=6;
- SELECTIVITY=(2/10)*1
- SQL> select (2/10)*1*18 from dual;
- (2/10)*1*18
- -----------
- 3.6
- SQL> select * from t1 where n1=6;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1577308413
- -----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 4 | 20 | 2 (0)| 00:00:01 |
- | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 4 | 20 | 2 (0)| 00:00:01 |
- |* 2 | INDEX RANGE SCAN | IDX_T1_N1 | 4 | | 1 (0)| 00:00:01 |
- -----------------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - access("N1"=6)
- --2.非popular value
- --(>=10.2.0.4版本)
- selectivity=newdensity * null_adjust
- null_adjust=(num_rows-num_nulls)/num_rows
- newdensity=(buckets_total-buckets_all_popular_values)/buckets_total/(ndv-popular_value.count)
- olddensity=sum(np.count(i) * np.count(i))/((num_rows-num_nulls)* sum(np.count(i)))
- ndv=num_distinct
- olddensity儲存在dba_***_col_statistics的density中
- buckets_all_popular_values:所有的pupular value所佔的bucket的數量,buckets_total:bucket的總數
- popular_values.count:pupular value的個數,NP.count(i):每個nopopular value在目標表中的記錄數
- newdensity=(10-(4-2))/10/(13-1)
- selectivity=(10-(4-2))/10/(13-1)
- cardinality=(10-(4-2))/10/(13-1)*18
- SQL> select (10-(4-2))/10/(13-1)*18 from dual;
- (10-(4-2))/10/(13-1)*18
- -----------------------
- 1.2
- SQL> select * from t1 where n1=12; --12為非popular value
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1577308413
- -----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 5 | 2 (0)| 00:00:01 |
- | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 5 | 2 (0)| 00:00:01 |
- |* 2 | INDEX RANGE SCAN | IDX_T1_N1 | 1 | | 1 (0)| 00:00:01 |
- -----------------------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - access("N1"=12)
- --2.1.版本10.2.0.4(不含10.2.0.4,10.2.0.1)以上版本
- selectivity=olddensity * null_adjust
- null_adjust=(num_rows-num_nulls)/num_rows
- olddensity=sum(np.count(i)* np.count(i))/((num_rows-num_nulls)*sum(np.count(i)))
- olddensity儲存在dba_***_col_statistics的density中
- NP.count(i):每個nopopular value在目標表中的記錄數
- --2.1版本10.2.0.1
- selectivity=olddensity * null_adjust
- null_adjust=(num_rows-num_nulls)/num_rows
- olddensity=sum(dv.count(i) * dv.count(i))/((num_rows-num_nulls)* sum(dv.count(i)))
- olddensity用於實際計算可選擇率和結果集的Cardinality
- olddensity也會儲存在資料字典DBA_TAB_COL_STATISTICS,DBA_PART_COL_STATISTICS和DBA_SUBPART_COL_STATISTICS中的DENSITY中
- DV.count(i)表示的是目標列的每個DISTINCT值在目標表中的記錄數
About Me
...............................................................................................................................
● 本文作者:小麥苗,只專注於資料庫的技術,更注重技術的運用
● 本文在itpub(http://blog.itpub.net/26736162)、部落格園(http://www.cnblogs.com/lhrbest)和個人微信公眾號(xiaomaimiaolhr)上有同步更新
● 本文itpub地址:http://blog.itpub.net/26736162/abstract/1/
● 本文部落格園地址:http://www.cnblogs.com/lhrbest
● 本文pdf版及小麥苗雲盤地址:http://blog.itpub.net/26736162/viewspace-1624453/
● 資料庫筆試面試題庫及解答:http://blog.itpub.net/26736162/viewspace-2134706/
● QQ群:230161599 微信群:私聊
● 聯絡我請加QQ好友(646634621),註明新增緣由
● 於 2017-06-02 09:00 ~ 2017-06-30 22:00 在魔都完成
● 文章內容來源於小麥苗的學習筆記,部分整理自網路,若有侵權或不當之處還請諒解
● 版權所有,歡迎分享本文,轉載請保留出處
...............................................................................................................................
拿起手機使用微信客戶端掃描下邊的左邊圖片來關注小麥苗的微信公眾號:xiaomaimiaolhr,掃描右邊的二維碼加入小麥苗的QQ群,學習最實用的資料庫技術。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/26736162/viewspace-2140358/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 【cbo計算公式】Join 選擇率(六)公式
- 【cbo計算公式】單表選擇率(二)公式
- 【cbo計算公式】Frequenccy 直方圖選擇率(三)公式直方圖
- 【cbo計算公式】Height-balance直方圖選擇率(四)公式直方圖
- PostgreSQL多值列的選擇性-Statistics,Cardinality,Selectivity,EstimateSQL
- 【cbo計算公式】CBO基本概念(一)公式
- 【cbo計算公式】No Bind Peeking(五)公式
- 新股中籤率怎麼算?新股中籤率計算公式公式
- 【數值框架】戰鬥公式:傷害公式選擇框架公式
- CBO計算與資料型別的選擇(兼談日期、字元、數字三種型別的直方圖生成原理和使用)資料型別字元直方圖
- word公式怎麼計算 word公式計算的方法公式
- 【公式】殺號公式準確率可達98公式
- PostgreSQL DBA(12) - 統計資訊在計算選擇率上的應用#2SQL
- PostgreSQL DBA(11) - 統計資訊在計算選擇率上的應用#1SQL
- 邊緣計算軟體的選擇
- 求問高併發下容錯率有計算公式麼公式
- 數值計算:高斯-勒朗德積分公式公式
- iOS倒數計時的探究與選擇iOS
- 市銷率計算公式是什麼 股票市銷率高好還是低好公式
- 絕密公式算單雙準確率99最強計劃方法公式
- 不悔此生選擇計算機行業計算機行業
- 降水強度計算公式公式
- 2021個稅年度彙算:三種辦理方式可選擇
- Excel公式大全詳解 excel表格計算公式大全Excel公式
- 基於位運算的許可權設計
- SAP中匯率取值選擇邏輯分析測試
- RTX3070/RTX3080/RTX3090顯示卡電源功率怎麼選擇?附功率計算公式公式
- React計算antd的TreeSelect元件所選中的子節點數量React元件
- 11選5殺號精準公式99%勝率!公式
- 遊戲基礎知識——“選擇”的設計方式遊戲
- 雲端計算要選擇什麼模式的呢?雲端計算又有幾種模式?模式
- excel利用rank排名次公式 excel利用公式計算排名Excel公式
- 雲端計算要選擇什麼模式的?Linux學習模式Linux
- matchTemplate函式各個方法的計算公式函式公式
- 袋鼠雲數棧基於CBO在Spark SQL優化上的探索SparkSQL優化
- switch-case選擇結構中的變數宣告變數
- 從一個資料庫連線數計算公式談起資料庫公式
- 工齡怎麼計算excel公式精確到月 如何設定工齡計算公式Excel公式
- jquery中的選擇器jQuery