【STAT】函式索引和使用表示式統計資訊有什麼不同
How Does Using Expression Statistics Differ from Using a Function-Based Index? (Doc ID 1489505.1) | To Bottom |
|
其實不管是表示式統計資訊還是 函式索引,都會建立一個虛擬列。 表示式統計資訊 還有合併列統計資訊 都是11g之後的特性,如果一張表中兩個列關聯性比較強,可以建立合併列統計資訊,並收集,提高效率。
In this Document
|
Goal |
|
Solution |
APPLIES TO:
Oracle Database Backup Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Service - Version N/A and later
Oracle Database - Enterprise Edition - Version 11.1.0.6 and later
Oracle Database Cloud Schema Service - Version N/A and later
Information in this document applies to any platform.
GOAL
Function-based indexes have existed for several releases of Oracle Database. They're very useful when a column appears frequently as either an argument to a function or within an arithmetic expression within the predicate (or WHERE clause). However, it was not until Release 11g that Oracle introduced extended statistics, of which there are two types: multicolumn statistics and expression statistics. Expression statistics allow you to collect statistics on a column involved in a function call or arithmetic expression, even in the absence of a function-based index. You may be asking yourself, "If I already have a function-based index, then do I really need expression statistics, too? Isn't that overkill?" The answer is "Yes, provided that you know how to collect the appropriate statistics on the function-based index."
Walking through an example will pull all of this information into focus. (Note: for purposes of these examples, we'll run with dynamic sampling disabled.)
SOLUTION
Let's create a table T with sample data.
It will have slightly over 75,000 rows.
In a copy of the DBA_OBJECTS view, you'll typically find many more objects of type TABLE than of type DIRECTORY. Therefore, type TABLE could be considered a popular entry for column OBJECT_TYPE. We'll create a function-based index on the function LOWER(object_type). Next, we'll collect statistics on the table.
SQL> CREATE TABLE t 2 AS SELECT owner, object_name, object_type 3 FROM dba_objects; Table created. SQL> SELECT object_type, COUNT(*) HOW_MANY 2 FROM t 3 WHERE object_type IN ('TABLE','DIRECTORY') 4 GROUP BY object_type 5 ORDER BY 1; OBJECT_TYPE HOW_MANY ------------------- ---------- DIRECTORY 9 TABLE 74827 SQL> CREATE INDEX i_obj_type_lower ON t (lower(object_type)); Index created. SQL> EXEC dbms_stats.gather_table_stats(USER,'T'); PL/SQL procedure successfully completed.
When you search the table for objects of type "directory" (in lowercase), you would expect that the optimizer would choose an index range scan on index i_obj_type_lower, and you're correct. However, why do you see the same execution plan when you search on the lowercase object type of "table"? Wouldn't a full table scan be a better choice? Furthermore, why does the optimizer make the same cardinality estimate (2,519 rows) in each case?
SQL> set autotrace traceonly explain SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'directory' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 1885952798 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 29 (4) | 00:00:01 | | 1 | HASH GROUP BY | | 36 | 468 | 29 (4) | 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| T | 2519 | 32747 | 28 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN | I_OBJ_TYPE_LOWER | 2519 | | 6 (0) | 00:00:01 | Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(LOWER("OBJECT_TYPE")='directory') SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'table' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 1885952798 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 29 (4) | 00:00:01 | | 1 | HASH GROUP BY | | 36 | 468 | 29 (4) | 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| T | 2519 | 32747 | 28 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN | I_OBJ_TYPE_LOWER | 2519 | | 6 (0) | 00:00:01 | Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(LOWER("OBJECT_TYPE")='table')
The problem is that you still haven't given the cost-based optimizer (CBO) all the information that it needs. As it turns out, when you created the function-based index, Oracle Database added a virtual column named "SYS_NC00004$" to the table, but given that the column contains skewed data, you have to provide a histogram on the data distribution. Oracle Database will "learn," after observing a variety of queries against the table, that the histogram is necessary. However, let's create it right away, so that all of our execution plans are as accurate as possible.
SQL> SELECT column_name, num_distinct, histogram, virtual_column, 2 data_default 3 FROM user_tab_cols 4 WHERE table_name='T' 5 ORDER BY column_id; COLUMN_NAME NUM_DISTINCT HISTOGRAM VIR DATA_DEFAULT --------------- ------------ --------------- --- -------------------- OWNER 36 NONE NO OBJECT_NAME 46424 NONE NO OBJECT_TYPE 30 FREQUENCY NO SYS_NC00004$ 30 NONE YES LOWER("OBJECT_TYPE") SQL> EXEC dbms_stats.gather_table_stats(USER,'T', - method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS SYS_NC00004$ SIZE 254'); PL/SQL procedure successfully completed.
Now, the CBO has all the information it needs to choose an index range scan for type "directory" and a full table scan for type "table".
SQL> set autotrace traceonly explain SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'directory' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 1885952798 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11 | 143 | 3 (34)| 00:00:01 | | 1 | HASH GROUP BY | | 11 | 143 | 3 (34)| 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| T | 14 | 182 | 2 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | I_OBJ_TYPE_LOWER | 7 | | 1 (0)| 00:00:01 | Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(LOWER("OBJECT_TYPE")='directory') SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'table' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 47235625 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 136 (5)| 00:00:02 | | 1 | HASH GROUP BY | | 36 | 468 | 136 (5)| 00:00:02 | |* 2 | TABLE ACCESS FULL| T | 74979 | 951K| 132 (2)| 00:00:02 | Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(LOWER("OBJECT_TYPE")='table')
Note that the cardinality estimates are much more accurate now, too (fourteen for type "directory and 74,979 for type "table").
So far, these steps are nothing new. You have been able to collect statistics on a function-based index since Release 9i. Now, let's discuss expression statistics. When are they necessary? Well, there may be occasions in which you don't need a function-based index as an access path (such as when the table contains an alternative access path -- perhaps an index on another column), but you still would like the CBO to understand the distribution of the column data. Let's assume that we create our table "T" again, but this time we won't create the function-based index.
SQL> CREATE TABLE t 2 AS SELECT owner, object_name, object_type 3 FROM dba_objects; Table created. SQL> EXEC dbms_stats.gather_table_stats(USER,'T', - method_opt=>'for all columns size skewonly'); PL/SQL procedure successfully completed. SQL> set autotrace traceonly explain SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'directory' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 47235625 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 133 (3)| 00:00:02 | | 1 | HASH GROUP BY | | 36 | 468 | 133 (3)| 00:00:02 | |* 2 | TABLE ACCESS FULL| T | 756 | 9828 | 132 (2)| 00:00:02 | Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(LOWER("OBJECT_TYPE")='directory') SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'table' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 47235625 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 133 (3)| 00:00:02 | | 1 | HASH GROUP BY | | 36 | 468 | 133 (3)| 00:00:02 | |* 2 | TABLE ACCESS FULL| T | 756 | 9828 | 132 (2)| 00:00:02 | Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(LOWER("OBJECT_TYPE")='table')
Now, you're in a situation similar to that you encountered before having gathered statistics on the virtual column, which represented the function-based index in the previous example. That is, both cardinality estimates are the same -- this time, the estimate is 756 rows, or approximately one percent of the total row count. Naturally, in this case the execution plan relies upon a full table scan of "T", since there is no function-based index.
The same, inaccurate guess of one percent occurs, even when you involve table "T" in a join -- for example, with the view DBA_OBJECTS.
SQL> set autotrace traceonly explain SQL> SELECT T1.owner, COUNT(*) HOW_MANY 2 FROM t T1, dba_objects T2 3 WHERE T1.owner = T2.owner 4 AND T1.object_name = T2.object_name 5 AND T1.object_type = T2. object_type 6 AND lower(T1.object_type) = 'directory' 7 GROUP BY T1.owner; Execution Plan ---------------------------------------------------------- Plan hash value: 1537169927 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 170 | 405 (4)| 00:00:05 | | 1 | HASH GROUP BY | | 2 | 170 | 405 (4)| 00:00:05 | |* 2 | HASH JOIN | | 75913 | 6301K| 401 (3)| 00:00:05 | |* 3 | TABLE ACCESS FULL | T | 756 | 27216 | 132 (2)| 00:00:02 | | 4 | VIEW | DBA_OBJECTS | 75940 | 3633K| 268 (3)| 00:00:04 | | 5 | UNION-ALL | | | | | | |* 6 | TABLE ACCESS BY INDEX ROWID| SUM$ | 1 | 9 | 1 (0)| 00:00:01 | |* 7 | INDEX UNIQUE SCAN | I_SUM$_1 | 1 | | 0 (0)| 00:00:01 | |* 8 | FILTER | | | | | | |* 9 | HASH JOIN | | 75939 | 6229K| 266 (3)| 00:00:04 | | 10 | TABLE ACCESS FULL | USER$ | 104 | 1768 | 3 (0)| 00:00:01 | |* 11 | HASH JOIN | | 75939 | 4968K| 262 (2)| 00:00:04 | | 12 | INDEX FULL SCAN | I_USER2 | 104 | 2288 | 1 (0)| 00:00:01 | |* 13 | TABLE ACCESS FULL | OBJ$ | 75939 | 3337K| 260 (2)| 00:00:04 | | 14 | NESTED LOOPS | | 1 | 29 | 2 (0)| 00:00:01 | |* 15 | INDEX SKIP SCAN | I_USER2 | 1 | 20 | 1 (0)| 00:00:01 | |* 16 | INDEX RANGE SCAN | I_OBJ4 | 1 | 9 | 1 (0)| 00:00:01 | | 17 | NESTED LOOPS | | 1 | 43 | 2 (0)| 00:00:01 | | 18 | INDEX FULL SCAN | I_LINK1 | 1 | 26 | 1 (0)| 00:00:01 | | 19 | TABLE ACCESS CLUSTER | USER$ | 1 | 17 | 1 (0)| 00:00:01 | |* 20 | INDEX UNIQUE SCAN | I_USER# | 1 | | 0 (0)| 00:00:01 |
Now, let's create expression statistics on the OBJECT_TYPE column, using the new syntax introduced in 11g.
First, you provide the definition of the expression statistics, and then you collect them.
SQL> SELECT dbms_stats.create_extended_stats(null,'T', '(LOWER(OBJECT_TYPE))') NEW_VIRT_COL 2 FROM dual; NEW_VIRT_COL ---------------------------------------- SYS_STUVJ5LMUFWDLVURW#_ROXEK1U SQL> EXEC dbms_stats.gather_table_stats(USER,'T', method_opt=>'for all columns size skewonly'); PL/SQL procedure successfully completed.
Oracle Database also creates a virtual column in this case, but it begins with "SYS_STU" instead of "SYS_NC", as was the case with the function-based index.
Note that Oracle Database already has identified that a histogram is in order in this case.
SQL> SELECT column_name, num_distinct, histogram, virtual_column 2 FROM user_tab_cols 3 WHERE table_name='T' 4 ORDER BY column_id; COLUMN_NAME NUM_DISTINCT HISTOGRAM VIR ------------------------------ ------------ --------------- --- OWNER 36 FREQUENCY NO OBJECT_NAME 46420 HEIGHT BALANCED NO OBJECT_TYPE 30 FREQUENCY NO SYS_STUVJ5LMUFWDLVURW#_ROXEK1U 30 FREQUENCY YES
The new, virtual column name has also been entered into USER_STAT_EXTENSIONS.
SQL> SELECT extension_name, extension 2 FROM user_stat_extensions 3 WHERE table_name='T'; EXTENSION_NAME EXTENSION ------------------------------ ---------------------- SYS_STUVJ5LMUFWDLVURW#_ROXEK1U (LOWER("OBJECT_TYPE"))
In this simplified example, even though the CBO still has to choose a full table scan on table "T", you'll find that the cardinality estimates are much more accurate: an estimate of fourteen rows of type "directory", and 74,734 of type "table".
SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE lower(object_type) = 'directory' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 47235625 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11 | 143 | 133 (3)| 00:00:02 | | 1 | HASH GROUP BY | | 11 | 143 | 133 (3)| 00:00:02 | |* 2 | TABLE ACCESS FULL| T | 14 | 182 | 132 (2)| 00:00:02 | SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE lower(object_type) = 'table' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 47235625 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 136 (5)| 00:00:02 | | 1 | HASH GROUP BY | | 36 | 468 | 136 (5)| 00:00:02 | |* 2 | TABLE ACCESS FULL| T | 74734 | 948K| 132 (2)| 00:00:02 |
Finally, you'll notice that the CBO devised a different execution plan for the table join with DBA_OBJECTS than the one seen earlier, using the same estimate of fourteen rows of type "directory":
SQL> SELECT T1.owner, COUNT(*) HOW_MANY 2 FROM t T1, dba_objects T2 3 WHERE T1.owner = T2.owner 4 AND T1.object_name = T2.object_name 5 AND T1.object_type = T2. object_type 6 AND lower(T1.object_type) = 'directory' 7 GROUP BY T1.owner; Execution Plan ---------------------------------------------------------- Plan hash value: 4292311508 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 182 | 220 (3)| 00:00:03 | | 1 | HASH GROUP BY | | 2 | 182 | 220 (3)| 00:00:03 | | 2 | NESTED LOOPS | | 73585 | 6539K| 216 (1)| 00:00:03 | |* 3 | TABLE ACCESS FULL | T | 14 | 588 | 132 (2)| 00:00:02 | |* 4 | VIEW | DBA_OBJECTS | 1 | 49 | 6 (0)| 00:00:01 | | 5 | UNION ALL PUSHED PREDICATE | | | | | | |* 6 | TABLE ACCESS BY INDEX ROWID | SUM$ | 1 | 9 | 1 (0)| 00:00:01 | |* 7 | INDEX UNIQUE SCAN | I_SUM$_1 | 1 | | 0 (0)| 00:00:01 | |* 8 | FILTER | | | | | | |* 9 | FILTER | | | | | | | 10 | NESTED LOOPS | | 1 | 84 | 5 (0)| 00:00:01 | | 11 | NESTED LOOPS | | 1 | 62 | 4 (0)| 00:00:01 | | 12 | TABLE ACCESS BY INDEX ROWID| USER$ | 1 | 17 | 1 (0)| 00:00:01 | |* 13 | INDEX UNIQUE SCAN | I_USER1 | 1 | | 0 (0)| 00:00:01 | |* 14 | TABLE ACCESS BY INDEX ROWID| OBJ$ | 1 | 45 | 3 (0)| 00:00:01 | |* 15 | INDEX RANGE SCAN | I_OBJ5 | 1 | | 2 (0)| 00:00:01 | |* 16 | INDEX RANGE SCAN | I_USER2 | 1 | 22 | 1 (0)| 00:00:01 | | 17 | NESTED LOOPS | | 1 | 29 | 2 (0)| 00:00:01 | |* 18 | INDEX SKIP SCAN | I_USER2 | 1 | 20 | 1 (0)| 00:00:01 | |* 19 | INDEX RANGE SCAN | I_OBJ4 | 1 | 9 | 1 (0)| 00:00:01 | | 20 | NESTED LOOPS | | 1 | 43 | 1 (0)| 00:00:01 | | 21 | TABLE ACCESS BY INDEX ROWID | USER$ | 1 | 17 | 1 (0)| 00:00:01 | |* 22 | INDEX UNIQUE SCAN | I_USER1 | 1 | | 0 (0)| 00:00:01 | |* 23 | INDEX RANGE SCAN | I_LINK1 | 1 | 26 | 0 (0)| 00:00:01 |
In summary you have seen that if a function-based index exists, and if (in the case of skewed data) you collect statistics on the virtual column representing this index, then expression statistics are unnecessary. However, in the situation in which a function-based index is not necessary, but you still need to provide the CBO with a clear picture of the column's data distribution in the context of a join (or, perhaps, when a choice of other access paths exists on a table -- e.g., indexes on other columns), then expression statistics play a vital role.
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29487349/viewspace-2920638/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 語句和表示式有什麼不同
- 函式宣告與函式表示式有什麼區別?函式
- 什麼是IIFE(立即呼叫函式表示式)?函式
- 【STAT】Oracle 表統計資訊被鎖,如何建立索引Oracle索引
- kotlin 函式和 Lambda 表示式Kotlin函式
- 何時使用函式表示式與函式宣告函式
- JavaScript函式宣告和函式表示式區別JavaScript函式
- Python range()函式有什麼作用?如何使用?Python函式
- 5.函式和lambda表示式函式
- 什麼是Lambda表示式?有什麼優點?
- 【Python】python map()函式和lambda表示式Python函式
- class和普通建構函式有什麼區別?函式
- ES6類和ES5函式建構函式有什麼區別?函式
- Python中eval函式的表示式如何使用Python函式
- JS 中的函式表示式和函式宣告你混淆了嗎?JS函式
- Python函式與lambda 表示式(匿名函式)Python函式
- javascript-函式表示式JavaScript函式
- 函式表示式–遞迴函式遞迴
- Oracle OCP(06):通用函式和條件表示式Oracle函式
- JS函式表示式——函式遞迴、閉包JS函式遞迴
- vue2專案中 箭頭函式和普通函式里面 this的指向有何不同?Vue函式
- matlab表示函式Matlab函式
- Lambda表示式入門--函數語言程式設計與函式式介面函數程式設計函式
- 你不知道的JavaScript--Item7 函式和(命名)函式表示式JavaScript函式
- 【碼上開學】Kotlin 的高階函式、匿名函式和 Lambda 表示式Kotlin函式
- 函式: 函式是怎麼使用的?函式
- SCRM系統和CRM系統有什麼不同?
- Oracle 正規表示式函式-REGEXP_REPLACE 使用例子Oracle函式
- Qt訊號 lamda 表示式使用QT
- 什麼是正規表示式?
- 什麼是正規表示式
- 什麼是Python函式?如何定義函式?Python函式
- 互斥鎖和訊號量有什麼不同?(譯)
- 雲組態和傳統組態有什麼不同?有什麼功能?
- jdk1.8-Lambda函式表示式JDK函式
- 課時21:函式:lambda表示式函式
- jdk1.8Lambda函式表示式JDK函式
- 初識Lambda表示式(匿名函式)函式