oracle實驗記錄 (oracle單表選擇率與基數計算(1))

單表選擇率與基數(cardinality)有關係，也就是執行計劃中的CARD 資訊
CARD表示oracle 認為此操作將返回多少rows

基本的CARD 計算就等於 num_rows*選擇率
CARD 對於otimizer 選初始的連線順序，選INDEX 都有很大影響,所以能夠看明白計算過程方法，對優化是很有幫助的

實驗單表中選擇率 (本次實驗學習參考cost-based oracle)
SQL> create table t1(a int);

表已建立。

SQL> ed
已寫入 file afiedt.buf

1 declare
2 begin
3 for i in 1..10000 loop
4 insert into t1 values(i);
5 end loop;
6 commit;
7* end;
SQL> /

PL/SQL 過程已成功完成。

SQL> execute dbms_stats.gather_table_stats('SYS','T1');

PL/SQL 過程已成功完成。

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T1';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T1 .0001 0 10000

SQL> set autotrace traceonly explain
SQL> select count(*) from t1;

執行計劃
----------------------------------------------------------
Plan hash value: 3724264953

-------------------------------------------------------------------
| Id | Operation          | Name | Rows | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |     6   (0)| 00:00:01 |
-------------------------------------------------------------------

Note
-----
- dynamic sampling used for this statement

~沒有謂詞沒有過濾掃描所有行所以card=10000

SQL> select count(*) from t1 where a=2;

執行計劃
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T1   |     1 |     3 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=2)

card=num_row*1/num_distinct=1
SQL> select 10000/10000 from dual;

10000/10000
-----------
1
如果要是這個表存在HISTOGRAM 那麼，OPTIMIZER將使用DENSITY 列，CARD=NUM_rows*density

以上的如果謂詞就是COLUMN=字面值那麼 card=num_row/num_distinct (=num_row*1/num_distinct)如果有histogram card=NUM_rows*density

~~~~~~~~~~~~~~~~~~~~~~~~~~沒有統計資訊，使用動態採集
SQL> create table t2 (a int);

表已建立。

SQL> declare
2 begin
3 for i in 1..10000 loop
4 insert into t2 values(i);
5 end loop;
6 commmit;
7 end;
8 /
commmit;

SQL> ed
已寫入 file afiedt.buf

1 declare
2 begin
3 for i in 1..10000 loop
4 insert into t2 values(i);
5 end loop;
6 commit;
7* end;
8 /

PL/SQL 過程已成功完成。

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T2';~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~沒有統計資訊

未選定行

SQL> show parameter dyna

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling integer 2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~oracle將使用動態採集

SQL> select count(*) from t2;

執行計劃
----------------------------------------------------------
Plan hash value: 3321871023

-------------------------------------------------------------------
| Id | Operation          | Name | Rows | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T2   | 10000 |     6   (0)| 00:00:01 |
-------------------------------------------------------------------

Note
-----
- dynamic sampling used for this statement

SQL> select count(*) from t2 where a=2;

執行計劃
----------------------------------------------------------
Plan hash value: 3321871023

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |    13 |            |          |
|* 2 |   TABLE ACCESS FULL| T2   |     1 |    13 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=2)~~~~~~~~

Note
-----
- dynamic sampling used for this statement~~~~~~~~~~~~~~~~~~~~~可以看到 ORACLE 利用動態採集的資訊也是很準確的

Dynamic Sampling Levels(以下是動態採集各個LEVEL 說明)
The sampling levels are as follows if the dynamic sampling level used is from a cursor hint or from the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter:

Level 0: Do not use dynamic sampling.

Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this

unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed

table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of

dynamic sampling blocks (32).

Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.

Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some

predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed

tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.

Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns.

The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default

number of dynamic sampling blocks.

Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of

dynamic sampling blocks respectively.

Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.

The sampling levels are as follows if the dynamic sampling level for a table is set using the DYNAMIC_SAMPLING optimizer hint:

Level 0: Do not use dynamic sampling.

Level 1: The number of blocks sampled is the default number of dynamic sampling blocks (32).

Levels 2, 3, 4, 5, 6, 7, 8, and 9: The number of blocks sampled is 2, 4, 8, 16, 32, 64, 128, or 256 times the default number of dynamic sampling blocks

respectively.

Level 10: Read all blocks in the table.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SQL> alter system set optimizer_dynamic_sampling=1;
系統已更改。
SQL> alter system set optimizer_dynamic_sampling=1; ~~~~關閉動態採集

系統已更改。
SSQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';

會話已更改。

SQL> select count(*) from t2;

執行計劃
----------------------------------------------------------
Plan hash value: 3321871023

-------------------------------------------------------------------
| Id | Operation          | Name | Rows | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T2   |    82 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------

SQL> select count(*) from t2 where a=2;

執行計劃
----------------------------------------------------------
Plan hash value: 3321871023

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |     2   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |    13 |            |          |
|* 2 |   TABLE ACCESS FULL| T2   |     1 |    13 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=2)

~~從trace中可以看到資訊十分不準確
Table Stats::
Table: T2 Alias: T2 (NOT ANALYZED)
    #Rows: 82 #Blks: 1 AvgRowLen: 100.00
***************************************
SINGLE TABLE ACCESS PATH
Table: T2 Alias: T2
    Card: Original: 82 Rounded: 82 Computed: 82.00 Non Adjusted: 82.00
Access Path: TableScan
    Cost: 2.00 Resp: 2.00 Degree: 0
      Cost_io: 2.00 Cost_cpu: 19421
      Resp_io: 2.00 Resp_cpu: 19421
Best:: AccessPath: TableScan
         Cost: 2.00 Degree: 1 Resp: 2.00 Card: 82.00 Bytes: 0

Table Stats::
Table: T2 Alias: T2 (NOT ANALYZED) ~~~~~~~~~~~沒統計資訊(按公式算的話oracle 自己決定num_distinct 為100)
    #Rows: 82 #Blks: 1 AvgRowLen: 100.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER) NO STATISTICS (using defaults)
    AvgLen: 22.00 NDV: 3 Nulls: 0 Density: 0.39024
Table: T2 Alias: T2
    Card: Original: 82 Rounded: 1 Computed: 0.82 Non Adjusted: 0.82
Access Path: TableScan
    Cost: 2.00 Resp: 2.00 Degree: 0
      Cost_io: 2.00 Cost_cpu: 23521
      Resp_io: 2.00 Resp_cpu: 23521
Best:: AccessPath: TableScan
         Cost: 2.00 Degree: 1 Resp: 2.00 Card: 0.82 Bytes: 0
````````````````````````````````````````````````````````````````````
一個問題
關於列中值嚴重不均勻

create table t3(a int)
SQL> ed
已寫入 file afiedt.buf

1 declare
2 begin
3 for i in 1..1000 loop
4 insert into t3 values(1);
5 end loop;
6 for i in 2..9001 loop
7 insert into t3 values(i);
8 end loop;
9 commit;
10* end;
11 /

PL/SQL 過程已成功完成。

SQL> execute dbms_stats.gather_table_stats('SYS','T3');

PL/SQL 過程已成功完成。

SQL> select count(*) from t3;

COUNT(*)
----------
10000

SQL> select num_rows from user_tables where table_name='T3';

NUM_ROWS
----------
10000

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T3';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T3 .000111099 0 9001

SQL> select count(*) from t3 where a=1;

COUNT(*)
----------
1000

從上面資訊可以看到 ~~列A 有10000ROWS 其中A=1的 1000 ROWS， A=2 到9001 的有9000 ROW
這樣NUM_DISTINCT=9001

SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';

會話已更改。
SQL> select count(*) from t3 where a=1;

執行計劃
----------------------------------------------------------
Plan hash value: 463314188

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     7   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T3   |     1 |     3 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1)

可以看到CARD 非常不準確還是按公式 num_rows*1/num_distinct算的
SQL> select 10000*1/9001 from dual;

10000*1/9001
------------
1.11098767

***********************
Table Stats::
Table: T3 Alias: T3
    #Rows: 10000 #Blks: 23 AvgRowLen: 3.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
    AvgLen: 4.00 NDV: 9001 Nulls: 0 Density: 1.1110e-004 Min: 1 Max: 9001
Table: T3 Alias: T3
    Card: Original: 10000 Rounded: 1 Computed: 1.11 Non Adjusted: 1.11
Access Path: TableScan
    Cost: 7.14 Resp: 7.14 Degree: 0
      Cost_io: 7.00 Cost_cpu: 2163793
      Resp_io: 7.00 Resp_cpu: 2163793
Best:: AccessPath: TableScan
         Cost: 7.14 Degree: 1 Resp: 7.14 Card: 1.11 Bytes: 0~~~~~~~~~~~~~~~~~~看到trace 中card =1.11 ，實際應該為1000
我們可以用HISTOGREAM來解決

1 begin
2     dbms_stats.gather_table_stats(
3             'SYS',
4             't3',
5             cascade => true,
6             estimate_percent => null,
7             method_opt => 'for all columns size 120'
8     );
9* end;
SQL> /

PL/SQL 過程已成功完成。

使用HISTOGRAM收集統計資訊後 ORACLE 將使用DENSITY 來計算

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T3';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T3 .0001 0 9001

SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';

會話已更改。

SQL> select count(*) from t3 where a=1;

執行計劃
----------------------------------------------------------
Plan hash value: 463314188

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     7   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T3   |   917 | 2751 |     7   (0)| 00:00:01 |~~~~~~~~~~可以看到CARD接近了=917
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1)

SQL> select count(*) from t3 where a=2;

執行計劃
----------------------------------------------------------
Plan hash value: 463314188

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     7   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T3   |     1 |     3 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=2)

SQL>
上面可以看出 A=1時候CARD=917 ORACLE 並沒有使用density來計算， A=2時候用的DENSITY計算的 card=10000*0.0001=1
針對a=1 oracle 使用的 histogram bucket來計算的

SQL> select endpoint_number,endpoint_value from dba_histograms where table_name=
'T3' and rownum<5;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
             11              1
             12              9
             13             93
             14            177

可以看出histogram是一個高度均衡的（HISTOGRAM分2種第一中頻率histogram 每個bucket存一個值當列的histogram bucket >=列distinct時候為頻率，另一種高度均衡

HISTOGRAM
當列DISTINCT比 histogram bucket多時為高度均衡每個bucket存相同數量的值）
此例為高度均衡可以看出來 1-11 BUCKET 存值1，其實第12個BUCKET中也存值1，因為12-13中可以看出每個BUCKET 存 84個值(93-9)（此例中除值1外其他都唯一，所以很好看

出來），而第12個 BUCKET 最高存的值為9 這樣表示第12個BUCKET還存了值1（應該是76 =84-8 ，8是2到9的個數） ,oracle 發現值1 是一個高頻率出現的值（跨越的多個

桶），oracle將採用BUCKET計算11/120（跨越bucket數/總bucket數）=.091666667 從這裡看到了雖然BUCKET12中也有值1 但ORACLE沒算進來（因為該bucket中還有其它值2-8）用

11/120 這也就造成了CARD只是接近而不是準確的1000
關於histogram 後面將會有與histogram詳細的實驗~~
card=10000*.091666667= 916.66667
看下trace

SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
    AvgLen: 4.00 NDV: 9001 Nulls: 0 Density: 1.0000e-004 Min: 1 Max: 9001
    Histogram: HtBal #Bkts: 120 UncompBkts: 120 EndPtVals: 110
Table: T3 Alias: T3
    Card: Original: 10000 Rounded: 917 Computed: 916.67 Non Adjusted: 916.67
Access Path: TableScan
    Cost: 7.14 Resp: 7.14 Degree: 0
      Cost_io: 7.00 Cost_cpu: 2163793
      Resp_io: 7.00 Resp_cpu: 2163793
Best:: AccessPath: TableScan
         Cost: 7.14 Degree: 1 Resp: 7.14 Card: 916.67 Bytes: 0 ************ Card: 916.67
***************************************

~列中有NULL的情況

SQL> create table t6 (a int);

表已建立。

1 declare
2    begin
3     for i in 1..1000 loop
4      insert into t6 values(1);
5    end loop;
6     for i in 2..9001 loop
7     insert into t6 values(i);
8     end loop;
9     commit;
10* end;
11 /

PL/SQL 過程已成功完成。

SQL> update t6 set a=null where a>8001;

已更新1000行。

SQL> commit;

提交完成。

SQL> execute dbms_stats.gather_table_stats('SYS','T6');

PL/SQL 過程已成功完成。

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T6';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T6 .000124984 1000 8001

SQL> select num_rows from user_tables where table_name='T6';

NUM_ROWS
----------
10000
可以看出工10000ROWS 空值為1000

SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';

會話已更改。

SQL> select count(*) from t6 where a=2;

執行計劃
----------------------------------------------------------
Plan hash value: 4096694858

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T6   |     1 |     3 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=2)

包含NULL的選擇率按以下公式算
Adjusted selectivity = Base selectivity * (num_rows - num_nulls) / num_rows

=1/8001*((10000-1000)/10000)

card = NUM_ROWS*Adjusted selectivity=10000*(/8001*((10000-1000)/10000))

SQL> select 1/8001*((10000-1000)/10000) from dual;

1/8001*((10000-1000)/10000)
---------------------------
.000112486

SQL> select 10000*0.000112486 from dual;

10000*0.000112486
-----------------
1.12486

SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
    AvgLen: 4.00 NDV: 8001 Nulls: 1000 Density: 1.2498e-004 Min: 1 Max: 8001
Table: T6 Alias: T6
    Card: Original: 10000 Rounded: 1 Computed: 1.12 Non Adjusted: 1.12
Access Path: TableScan
    Cost: 6.14 Resp: 6.14 Degree: 0
      Cost_io: 6.00 Cost_cpu: 2142429
      Resp_io: 6.00 Resp_cpu: 2142429
Best:: AccessPath: TableScan
         Cost: 6.14 Degree: 1 Resp: 6.14 Card: 1.12 Bytes: 0====================card 1.12

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

關於使用BIND
SQL> set autotrace traceonly explain
SQL> set autotrace off
SQL> variable a number
SQL> set autotrace traceonly explain
SQL> select count(*) from t6 where a=:a;*********************

執行計劃
----------------------------------------------------------
Plan hash value: 4096694858

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T6   |     1 |     3 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=TO_NUMBER(:A))

對於BIND 所使用的選擇率= COLUMN=字面值時候的選擇率（字面值情況也要分是否列中有NULL來計算，此例中列中有NULL）
SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T6';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T6 .000124984 1000 8001~~~~~~~~~~~~~~~~~包含NULL

SQL> select num_rows from user_tables where table_name='T6';

NUM_ROWS
----------
10000

Adjusted selectivity = Base selectivity * (num_rows - num_nulls) / num_rows~~~~~~~~~~~~~~~調整選擇率

=1/8001*((10000-1000)/10000)
BIN選擇率=Adjusted selectivity（是字面值時候情況的選擇率）
card = NUM_ROWS*Adjusted selectivity=10000*(/8001*((10000-1000)/10000))

SQL> select 1/8001*((10000-1000)/10000) from dual;

1/8001*((10000-1000)/10000)
---------------------------
.000112486

SQL> select 10000*0.000112486 from dual;

10000*0.000112486
-----------------
          1.12486
trace
SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
    AvgLen: 4.00 NDV: 8001 Nulls: 1000 Density: 1.2498e-004 Min: 1 Max: 8001
Table: T6 Alias: T6
    Card: Original: 10000 Rounded: 1 Computed: 1.12 Non Adjusted: 1.12
Access Path: TableScan
    Cost: 6.21 Resp: 6.21 Degree: 0
      Cost_io: 6.00 Cost_cpu: 3142429
      Resp_io: 6.00 Resp_cpu: 3142429
Best:: AccessPath: TableScan
         Cost: 6.21 Degree: 1 Resp: 6.21 Card: 1.12 Bytes: 0
***************************************

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
使用INLIST
SQL> create table t4 (a int);

表已建立。

SQL> declare
2 begin
3 for i in 1..10000 loop
4 insert into t4 values(i);
5 end loop;
6 commit;
7 end;
8 /

PL/SQL 過程已成功完成。

SQL> execute dbms_stats.gather_table_stats('SYS','T4');

PL/SQL 過程已成功完成。

SQL> select num_rows from user_tables where table_name='T4';

NUM_ROWS
----------
10000

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T4';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T4 .0001 0 10000
SQL> col table_name format a10
SQL> select table_name,density,num_nulls,num_distinct,HIGH_VALUE from user_tab_c
ol_statistics where table_name='T4';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT HIGH_VALUE
---------- ---------- ---------- ------------ --------------------
T4 .0001 0 10000 C302

SQL> variable a number
SQL> variable b number
SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';

會話已更改。

SQL> select count(*) from t4 where a in (1,2);~~~~~~~~~2個

執行計劃
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     2 |     6 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=2)

SQL> select count(*) from t4 where a in (1,2,2);~~~~~~~~~~~~含重複值得

執行計劃
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     2 |     6 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=2)

SQL> select count(*) from t4 where a in (1,2,3);~~~~~~~~~~3個

執行計劃
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     3 |     9 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=2 OR "A"=3)

SQL> select count(*) from t4 where a in (1,2,2,null);~~~~~~~包含空值

執行計劃
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     3 |     9 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=2 OR "A"=TO_NUMBER(NULL))

SQL> select count(*) from t4 where a in (1,100052);~~~~~~~~~超過該列的最大值（HIGHT_VALUE）

執行計劃
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     2 |     6 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=100052)

SQL> select count(*) from t4 where a in (:a,:b);~~~~~~~~~~使用變數

執行計劃
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     2 |     6 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=TO_NUMBER(:A) OR "A"=TO_NUMBER(:B))

Moreover, when you check the values for 9i and 10g, you see
that the cardinalities always come out as N * number of entries in the list until the number of
entries in the list exceeds the number of distinct values
結論：關於inlist card= n*(num_rows*1/num_distinct) (N 為中 inlist (數目 ))
oracle可以判斷出重複值，~~檢查不出是否超出最大值，是否含NULL

另外INLIST （N，N）中N的數目超過列的Num_distinct資料那麼 card=num_distinct*(num_rows*1/num_distinct)=num_rows
早期8I的 IN LIST 計算是按OR算的
in (1,2,3)= 1 or 2 or 3 8I 是這麼算的選擇率
sel(A or B or C) =
sel(A) + sel(B) + sel(C) – Sel(A)sel(B) – Sel(B)sel(C) – sel(C)sel(A) + Sel(A)Sel(B)Sel(C)

8I INLIST CARD=NUM_ROWS*1/(sel(A) + sel(B) + sel(C) – Sel(A)sel(B) – Sel(B)sel(C) – sel(C)sel(A) + Sel(A)Sel(B)Sel(C))****

~~~~~~~~~~~~~~~~~~~~~~~NOT IN
SQL> select count(*) from t4 where a in (1,2,2);用HINTS USE_CONCAT那麼可以將 OR操作轉換為 UNION ALL 操作(3個SELECT FROM X WHERE =1 union all=2 union all=3)

執行計劃
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     2 |     6 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=2)

SQL> select count(*) from t4 where a not in (1,2);

執行計劃
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   | 9998 | 29994 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"<>1 AND "A"<>2)

SQL> select count(*) from t4 where a not in (1,2,2);

執行計劃
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   | 9998 | 29994 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"<>1 AND "A"<>2)

SQL> select count(*) from t4 where a not in (1,2,3);

執行計劃
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   | 9997 | 29991 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"<>1 AND "A"<>2 AND "A"<>3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~可以看出是自相容的 card=NUM_ROWS-n*(num_rows*1/num_distinct)

關於like*************

SQL> set autotrace traceonly explain
SQL> select count(*) from t6 where a like '%1';

執行計劃
----------------------------------------------------------
Plan hash value: 4096694858

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T6   |   500 | 1500 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_CHAR("A") LIKE '%1')

SQL> select count(*) from t6 where a like '1%';

執行計劃
----------------------------------------------------------
Plan hash value: 4096694858

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T6   |   500 | 1500 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_CHAR("A") LIKE '1%')

SQL> select count(*) from t6 where a like '1%';

執行計劃
----------------------------------------------------------
Plan hash value: 4096694858

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
-----------ss----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T6   |   500 | 1500 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_CHAR("A") LIKE '1%')

SQL> set autotrace off
SQL> select count(*) from t6 where a like '1%';

COUNT(*)
----------
2110

SQL>

選擇率應該是5% CARD=NUM_ROW*5%=500

另外 LIKE %a 是不走INDEX 的，而 like a% 是走index的針對 LIKE%A 不走 , 可以建一個reverse(X) 的函式index 查詢時候使用reverse(XX) like '%a' 可以走這個函式index

來自 “ ITPUB部落格 ” ，連結：http://blog.itpub.net/12020513/viewspace-614301/，如需轉載，請註明出處，否則將追究法律責任。

oracle實驗記錄 (oracle單表選擇率與基數計算(1))

相關文章