oracle 謂詞表示式對基數的影響(及11G改進)

fufuh2o發表於2010-01-02


 

簡單的選擇率與函式問題,下面只用col>字面值的 選擇率,計算 比較方便

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production


SQL> conn xh/a831115
Connected.
SQL> show user
USER is "XH"
SQL> create table t1 (a int);

Table created.

SQL> declare
  2  begin
  3  for i in 1..10000 loop
  4  insert into t1 values(i);
  5  end loop;
  6  commit;
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('XH','T1');

PL/SQL procedure successfully completed.

SQL> select owner,num_rows from dba_tables where table_name='T1'
  2  ;

OWNER                            NUM_ROWS
------------------------------ ----------
XH                                  10000


SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statistics where table_name='T1'
  2  ;

TABLE_NAME                        DENSITY  NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T1                                  .0001          0        10000

 

SQL> SQL> select count(*) from t1 where a>2;

Execution Plan
----------------------------------------------------------

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  9999 | 29997 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A">2)
  
   這個選擇率  計算公式為:
   分析a>2 是一個無邊 開區間 (所謂無邊界 就是一側沒有邊界)
   a>2選擇率=(high_value – limit) / (high_value – low_value )
  
   SQL> select (10000-2)/(10000-1) from dual;

(10000-2)/(10000-1)
-------------------
          .99989999
card=num_rows*選擇率
SQL> select  10000*.99989999 from dual;

10000*.99989999
---------------
      9998.9999~~~~~~~~~~~~~~~可以看到10gr2是向上舍入(ceil)
     
      最後的card(rows)= num_rows*選擇率=9999 rows
     
     
      現在使用函式
     
     
  
SQL> select count(*) from t1 where to_char(a,'xx')>'2';

Execution Plan
----------------------------------------------------------

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     7  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   500 |  1500 |     7  (15)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_CHAR("A",'xx')>'2')
  
選擇率 是5%(跟使用bind(!=條件) or sql中用like 時選擇率一樣)   CARD=NUM_ROW*5%=500  

 


這樣資訊是不準確的  11g對此進行了改善

看下11g情況

11g 提供擴充套件列統計資訊收集(基於表示式or組合列),原理就是為表示式子或一組列建立一個隱藏列,在這個隱藏列上收集資訊.
這個虛擬列並不實際儲存資料,只在執行時根據一個基於其他欄位的表示式生成虛擬列的資訊.(感覺有點象view的意思,無實際內容,節省空間)

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE    11.1.0.6.0      Production
TNS for Linux: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production


SQL> conn xh/a831115
Connected.
SQL> create table t1 (a int);

Table created.

SQL> declare
  2  begin
  3  for i in 1..10000 loop
  4  insert into t1 values(i);
  5  end loop;
  6  commit;
  7  end;
  8  /

PL/SQL procedure successfully completed.


SQL> set autotrace trace exp
SQL> select count(*) from t1 where a>2;

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     7   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  9999 | 29997 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A">2)


SQL> select count(*) from t1 where to_char(a)>'2';

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     7   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   500 |  1500 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_CHAR("A")>'2')

 

SQL> select dbms_stats.create_extended_stats(ownname=>'XH',tabname=>'T1',extension=>'(to_char(a))') from dual;

DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'XH',TABNAME=>'T1',EXTENSION=>'(TO_CHA
--------------------------------------------------------------------------------
SYS_STUQ8Q3O$I37YENJXKXE0CSTFF  ~~~~~~~~~~~~~~~~~~~~~~~~~~返回這個擴充套件列名(sys_stu開頭)

另外drop_extended_stats可以刪除擴充套件統計資訊(一個procedure)

SQL> desc user_stat_extensions;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 TABLE_NAME                                NOT NULL VARCHAR2(30)
 EXTENSION_NAME                            NOT NULL VARCHAR2(30)
 EXTENSION                                          CLOB
 CREATOR                                            VARCHAR2(6)
 DROPPABLE                                          VARCHAR2(3)


SQL> col extension for a40
SQL> select extension_name,extension from user_stat_extensions where table_name='T1';

EXTENSION_NAME                 EXTENSION
------------------------------ ----------------------------------------
SYS_STUQ8Q3O$I37YENJXKXE0CSTFF (TO_CHAR("A"))

 


SQL> select column_name,data_type,hidden_column,data_default,NUM_DISTINCT from user_tab_cols where table_name='T1';

COLUMN_NAME                    DATA_TYPE           HID DATA_DEFAU NUM_DISTINCT
------------------------------ ------------------- --- ---------- ------------
SYS_STUQ8Q3O$I37YENJXKXE0CSTFF VARCHAR2            YES TO_CHAR("A
                                                       ")

A                              NUMBER              NO                    10000

]]
SQL> set autotrace trace exp
SQL> select count(*) from t1 where to_char(a)>'2';

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     7   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   500 |  1500 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T1"."SYS_STUQ8Q3O$I37YENJXKXE0CSTFF">'2')


SQL> exec dbms_stats.gather_table_stats('XH','T1');

PL/SQL procedure successfully completed.

SQL> set autotrace off
SQL> select column_name,data_type,hidden_column,data_default,NUM_DISTINCT from user_tab_cols where table_name='T1';

COLUMN_NAME                    DATA_TYPE           HID DATA_DEFAU NUM_DISTINCT
------------------------------ ------------------- --- ---------- ------------
SYS_STUQ8Q3O$I37YENJXKXE0CSTFF VARCHAR2            YES TO_CHAR("A        10000 ~~~~~~~~~~~~~~~~~~~~~要重新收集下 統計資訊
                                                       ")

A                              NUMBER              NO                    10000

 

SQL> set autotrace trace exp
SQL> select count(*) from t1 where to_char(a)>'2';

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |     7   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  8902 | 71216 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T1"."SYS_STUQ8Q3O$I37YENJXKXE0CSTFF">'2')


有改進 但資訊還是不正確,查了下擴充套件列統計資訊基於hash函式(sys_op_combined_hash),所以僅適用與=,(!=)的都不適用無法利用擴充套件統計資訊,不過測試看到還是影響了card且card接近實際值,(也許沒有做足夠做詳細的測試)

 

看下 = 
SQL> show user
USER is "XH"
SQL> create table t2( a varchar2(10));

Table created.

SQL> declare
  2  begin
  3  for i in 1..1000 loop
  4  insert into t2 values('a');
  5  end loop;
  6  commit;
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('XH','T2');

PL/SQL procedure successfully completed.


SQL> set autotrace trace exp
SQL> select * from t2 where a='a';

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000 |  2000 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |  1000 |  2000 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"='a')

SQL> select * from t2 where upper(a)='A';

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |    20 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |    10 |    20 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(UPPER("A")='A')

 

SQL> select dbms_stats.create_extended_stats(ownname=>'XH',tabname=>'T2',extension=>'(upper(a))') from dual;

DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'XH',TABNAME=>'T2',EXTENSION=>'(UPPER(
--------------------------------------------------------------------------------
SYS_STUWSDP8IKNLLR9QA4XVKCO0BY

SQL> select column_name,data_type,hidden_column,data_default,NUM_DISTINCT from user_tab_cols where table_name='T2';

COLUMN_NAME                    DATA_TYPE           HID DATA_DEFAU NUM_DISTINCT
------------------------------ ------------------- --- ---------- ------------
SYS_STUWSDP8IKNLLR9QA4XVKCO0BY VARCHAR2            YES UPPER("A")
A                              VARCHAR2            NO                        1

 

SQL> exec dbms_stats.gather_table_stats('XH','T2');

PL/SQL procedure successfully completed.

SQL> select column_name,data_type,hidden_column,data_default,NUM_DISTINCT from user_tab_cols where table_name='T2';

COLUMN_NAME                    DATA_TYPE           HID DATA_DEFAU NUM_DISTINCT
------------------------------ ------------------- --- ---------- ------------
SYS_STUWSDP8IKNLLR9QA4XVKCO0BY VARCHAR2            YES UPPER("A")            1
A                              VARCHAR2            NO                        1

SQL>

 

SQL> set autotrace trace exp
SQL> select * from t2 where upper(a)='A';

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000 |  4000 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |  1000 |  4000 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T2"."SYS_STUWSDP8IKNLLR9QA4XVKCO0BY"='A')


解決問題了
如果頻繁使用表示式,也可以考慮在定義table時 定義虛擬列 (頻繁使用的前提)

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> show user
USER is "XH"
SQL> create table t2 (a int ,a_to_char as (to_char(a)));
create table t2 (a int ,a_to_char as (to_char(a)))
                                  *
ERROR at line 1:
ORA-00902: invalid datatype


11g才支援

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE    11.1.0.6.0      Production
TNS for Linux: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

SQL> show user
USER is "XH"


SQL> create table t3 (a int ,a_to_char as (to_char(a)));

Table created.

建立 表時候定義虛擬列

 

SQL> col data_type for a10
SQL> col data_default for a10
SQL> select column_name,data_type,hidden_column,data_default,NUM_DISTINCT from user_tab_cols where table_name='T3';

COLUMN_NAME                    DATA_TYPE  HID DATA_DEFAU NUM_DISTINCT
------------------------------ ---------- --- ---------- ------------
A_TO_CHAR                      VARCHAR2   NO  TO_CHAR("A
                                              ")

A                              NUMBER     NO

\
SQL> execute dbms_stats.gather_table_stats('XH','T3');

PL/SQL procedure successfully completed.

SQL> select column_name,data_type,hidden_column,data_default,NUM_DISTINCT from user_tab_cols where table_name='T3';

COLUMN_NAME                    DATA_TYPE  HID DATA_DEFAU NUM_DISTINCT
------------------------------ ---------- --- ---------- ------------
A_TO_CHAR                      VARCHAR2   NO  TO_CHAR("A            1
                                              ")

A                              NUMBER     NO                        1

 

SQL> set autotrace trace exp
SQL> select a from t3 where a_to_char='1';

Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     5 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |     1 |     5 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A_TO_CHAR"='1')

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/12020513/viewspace-624110/,如需轉載,請註明出處,否則將追究法律責任。

相關文章