【STAT】函式索引和使用表示式統計資訊有什麼不同
How Does Using Expression Statistics Differ from Using a Function-Based Index? (Doc ID 1489505.1) | To Bottom |
|
其實不管是表示式統計資訊還是 函式索引,都會建立一個虛擬列。 表示式統計資訊 還有合併列統計資訊 都是11g之後的特性,如果一張表中兩個列關聯性比較強,可以建立合併列統計資訊,並收集,提高效率。
In this Document
|
Goal |
|
Solution |
APPLIES TO:
Oracle Database Backup Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Service - Version N/A and later
Oracle Database - Enterprise Edition - Version 11.1.0.6 and later
Oracle Database Cloud Schema Service - Version N/A and later
Information in this document applies to any platform.
GOAL
Function-based indexes have existed for several releases of Oracle Database. They're very useful when a column appears frequently as either an argument to a function or within an arithmetic expression within the predicate (or WHERE clause). However, it was not until Release 11g that Oracle introduced extended statistics, of which there are two types: multicolumn statistics and expression statistics. Expression statistics allow you to collect statistics on a column involved in a function call or arithmetic expression, even in the absence of a function-based index. You may be asking yourself, "If I already have a function-based index, then do I really need expression statistics, too? Isn't that overkill?" The answer is "Yes, provided that you know how to collect the appropriate statistics on the function-based index."
Walking through an example will pull all of this information into focus. (Note: for purposes of these examples, we'll run with dynamic sampling disabled.)
SOLUTION
Let's create a table T with sample data.
It will have slightly over 75,000 rows.
In a copy of the DBA_OBJECTS view, you'll typically find many more objects of type TABLE than of type DIRECTORY. Therefore, type TABLE could be considered a popular entry for column OBJECT_TYPE. We'll create a function-based index on the function LOWER(object_type). Next, we'll collect statistics on the table.
SQL> CREATE TABLE t 2 AS SELECT owner, object_name, object_type 3 FROM dba_objects; Table created. SQL> SELECT object_type, COUNT(*) HOW_MANY 2 FROM t 3 WHERE object_type IN ('TABLE','DIRECTORY') 4 GROUP BY object_type 5 ORDER BY 1; OBJECT_TYPE HOW_MANY ------------------- ---------- DIRECTORY 9 TABLE 74827 SQL> CREATE INDEX i_obj_type_lower ON t (lower(object_type)); Index created. SQL> EXEC dbms_stats.gather_table_stats(USER,'T'); PL/SQL procedure successfully completed.
When you search the table for objects of type "directory" (in lowercase), you would expect that the optimizer would choose an index range scan on index i_obj_type_lower, and you're correct. However, why do you see the same execution plan when you search on the lowercase object type of "table"? Wouldn't a full table scan be a better choice? Furthermore, why does the optimizer make the same cardinality estimate (2,519 rows) in each case?
SQL> set autotrace traceonly explain SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'directory' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 1885952798 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 29 (4) | 00:00:01 | | 1 | HASH GROUP BY | | 36 | 468 | 29 (4) | 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| T | 2519 | 32747 | 28 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN | I_OBJ_TYPE_LOWER | 2519 | | 6 (0) | 00:00:01 | Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(LOWER("OBJECT_TYPE")='directory') SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'table' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 1885952798 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 29 (4) | 00:00:01 | | 1 | HASH GROUP BY | | 36 | 468 | 29 (4) | 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| T | 2519 | 32747 | 28 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN | I_OBJ_TYPE_LOWER | 2519 | | 6 (0) | 00:00:01 | Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(LOWER("OBJECT_TYPE")='table')
The problem is that you still haven't given the cost-based optimizer (CBO) all the information that it needs. As it turns out, when you created the function-based index, Oracle Database added a virtual column named "SYS_NC00004$" to the table, but given that the column contains skewed data, you have to provide a histogram on the data distribution. Oracle Database will "learn," after observing a variety of queries against the table, that the histogram is necessary. However, let's create it right away, so that all of our execution plans are as accurate as possible.
SQL> SELECT column_name, num_distinct, histogram, virtual_column, 2 data_default 3 FROM user_tab_cols 4 WHERE table_name='T' 5 ORDER BY column_id; COLUMN_NAME NUM_DISTINCT HISTOGRAM VIR DATA_DEFAULT --------------- ------------ --------------- --- -------------------- OWNER 36 NONE NO OBJECT_NAME 46424 NONE NO OBJECT_TYPE 30 FREQUENCY NO SYS_NC00004$ 30 NONE YES LOWER("OBJECT_TYPE") SQL> EXEC dbms_stats.gather_table_stats(USER,'T', - method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS SYS_NC00004$ SIZE 254'); PL/SQL procedure successfully completed.
Now, the CBO has all the information it needs to choose an index range scan for type "directory" and a full table scan for type "table".
SQL> set autotrace traceonly explain SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'directory' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 1885952798 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11 | 143 | 3 (34)| 00:00:01 | | 1 | HASH GROUP BY | | 11 | 143 | 3 (34)| 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| T | 14 | 182 | 2 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | I_OBJ_TYPE_LOWER | 7 | | 1 (0)| 00:00:01 | Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(LOWER("OBJECT_TYPE")='directory') SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'table' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 47235625 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 136 (5)| 00:00:02 | | 1 | HASH GROUP BY | | 36 | 468 | 136 (5)| 00:00:02 | |* 2 | TABLE ACCESS FULL| T | 74979 | 951K| 132 (2)| 00:00:02 | Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(LOWER("OBJECT_TYPE")='table')
Note that the cardinality estimates are much more accurate now, too (fourteen for type "directory and 74,979 for type "table").
So far, these steps are nothing new. You have been able to collect statistics on a function-based index since Release 9i. Now, let's discuss expression statistics. When are they necessary? Well, there may be occasions in which you don't need a function-based index as an access path (such as when the table contains an alternative access path -- perhaps an index on another column), but you still would like the CBO to understand the distribution of the column data. Let's assume that we create our table "T" again, but this time we won't create the function-based index.
SQL> CREATE TABLE t 2 AS SELECT owner, object_name, object_type 3 FROM dba_objects; Table created. SQL> EXEC dbms_stats.gather_table_stats(USER,'T', - method_opt=>'for all columns size skewonly'); PL/SQL procedure successfully completed. SQL> set autotrace traceonly explain SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'directory' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 47235625 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 133 (3)| 00:00:02 | | 1 | HASH GROUP BY | | 36 | 468 | 133 (3)| 00:00:02 | |* 2 | TABLE ACCESS FULL| T | 756 | 9828 | 132 (2)| 00:00:02 | Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(LOWER("OBJECT_TYPE")='directory') SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE LOWER(object_type) = 'table' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 47235625 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 133 (3)| 00:00:02 | | 1 | HASH GROUP BY | | 36 | 468 | 133 (3)| 00:00:02 | |* 2 | TABLE ACCESS FULL| T | 756 | 9828 | 132 (2)| 00:00:02 | Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(LOWER("OBJECT_TYPE")='table')
Now, you're in a situation similar to that you encountered before having gathered statistics on the virtual column, which represented the function-based index in the previous example. That is, both cardinality estimates are the same -- this time, the estimate is 756 rows, or approximately one percent of the total row count. Naturally, in this case the execution plan relies upon a full table scan of "T", since there is no function-based index.
The same, inaccurate guess of one percent occurs, even when you involve table "T" in a join -- for example, with the view DBA_OBJECTS.
SQL> set autotrace traceonly explain SQL> SELECT T1.owner, COUNT(*) HOW_MANY 2 FROM t T1, dba_objects T2 3 WHERE T1.owner = T2.owner 4 AND T1.object_name = T2.object_name 5 AND T1.object_type = T2. object_type 6 AND lower(T1.object_type) = 'directory' 7 GROUP BY T1.owner; Execution Plan ---------------------------------------------------------- Plan hash value: 1537169927 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 170 | 405 (4)| 00:00:05 | | 1 | HASH GROUP BY | | 2 | 170 | 405 (4)| 00:00:05 | |* 2 | HASH JOIN | | 75913 | 6301K| 401 (3)| 00:00:05 | |* 3 | TABLE ACCESS FULL | T | 756 | 27216 | 132 (2)| 00:00:02 | | 4 | VIEW | DBA_OBJECTS | 75940 | 3633K| 268 (3)| 00:00:04 | | 5 | UNION-ALL | | | | | | |* 6 | TABLE ACCESS BY INDEX ROWID| SUM$ | 1 | 9 | 1 (0)| 00:00:01 | |* 7 | INDEX UNIQUE SCAN | I_SUM$_1 | 1 | | 0 (0)| 00:00:01 | |* 8 | FILTER | | | | | | |* 9 | HASH JOIN | | 75939 | 6229K| 266 (3)| 00:00:04 | | 10 | TABLE ACCESS FULL | USER$ | 104 | 1768 | 3 (0)| 00:00:01 | |* 11 | HASH JOIN | | 75939 | 4968K| 262 (2)| 00:00:04 | | 12 | INDEX FULL SCAN | I_USER2 | 104 | 2288 | 1 (0)| 00:00:01 | |* 13 | TABLE ACCESS FULL | OBJ$ | 75939 | 3337K| 260 (2)| 00:00:04 | | 14 | NESTED LOOPS | | 1 | 29 | 2 (0)| 00:00:01 | |* 15 | INDEX SKIP SCAN | I_USER2 | 1 | 20 | 1 (0)| 00:00:01 | |* 16 | INDEX RANGE SCAN | I_OBJ4 | 1 | 9 | 1 (0)| 00:00:01 | | 17 | NESTED LOOPS | | 1 | 43 | 2 (0)| 00:00:01 | | 18 | INDEX FULL SCAN | I_LINK1 | 1 | 26 | 1 (0)| 00:00:01 | | 19 | TABLE ACCESS CLUSTER | USER$ | 1 | 17 | 1 (0)| 00:00:01 | |* 20 | INDEX UNIQUE SCAN | I_USER# | 1 | | 0 (0)| 00:00:01 |
Now, let's create expression statistics on the OBJECT_TYPE column, using the new syntax introduced in 11g.
First, you provide the definition of the expression statistics, and then you collect them.
SQL> SELECT dbms_stats.create_extended_stats(null,'T', '(LOWER(OBJECT_TYPE))') NEW_VIRT_COL 2 FROM dual; NEW_VIRT_COL ---------------------------------------- SYS_STUVJ5LMUFWDLVURW#_ROXEK1U SQL> EXEC dbms_stats.gather_table_stats(USER,'T', method_opt=>'for all columns size skewonly'); PL/SQL procedure successfully completed.
Oracle Database also creates a virtual column in this case, but it begins with "SYS_STU" instead of "SYS_NC", as was the case with the function-based index.
Note that Oracle Database already has identified that a histogram is in order in this case.
SQL> SELECT column_name, num_distinct, histogram, virtual_column 2 FROM user_tab_cols 3 WHERE table_name='T' 4 ORDER BY column_id; COLUMN_NAME NUM_DISTINCT HISTOGRAM VIR ------------------------------ ------------ --------------- --- OWNER 36 FREQUENCY NO OBJECT_NAME 46420 HEIGHT BALANCED NO OBJECT_TYPE 30 FREQUENCY NO SYS_STUVJ5LMUFWDLVURW#_ROXEK1U 30 FREQUENCY YES
The new, virtual column name has also been entered into USER_STAT_EXTENSIONS.
SQL> SELECT extension_name, extension 2 FROM user_stat_extensions 3 WHERE table_name='T'; EXTENSION_NAME EXTENSION ------------------------------ ---------------------- SYS_STUVJ5LMUFWDLVURW#_ROXEK1U (LOWER("OBJECT_TYPE"))
In this simplified example, even though the CBO still has to choose a full table scan on table "T", you'll find that the cardinality estimates are much more accurate: an estimate of fourteen rows of type "directory", and 74,734 of type "table".
SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE lower(object_type) = 'directory' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 47235625 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11 | 143 | 133 (3)| 00:00:02 | | 1 | HASH GROUP BY | | 11 | 143 | 133 (3)| 00:00:02 | |* 2 | TABLE ACCESS FULL| T | 14 | 182 | 132 (2)| 00:00:02 | SQL> SELECT owner, COUNT(*) HOW_MANY 2 FROM t 3 WHERE lower(object_type) = 'table' 4 GROUP BY owner; Execution Plan ---------------------------------------------------------- Plan hash value: 47235625 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 36 | 468 | 136 (5)| 00:00:02 | | 1 | HASH GROUP BY | | 36 | 468 | 136 (5)| 00:00:02 | |* 2 | TABLE ACCESS FULL| T | 74734 | 948K| 132 (2)| 00:00:02 |
Finally, you'll notice that the CBO devised a different execution plan for the table join with DBA_OBJECTS than the one seen earlier, using the same estimate of fourteen rows of type "directory":
SQL> SELECT T1.owner, COUNT(*) HOW_MANY 2 FROM t T1, dba_objects T2 3 WHERE T1.owner = T2.owner 4 AND T1.object_name = T2.object_name 5 AND T1.object_type = T2. object_type 6 AND lower(T1.object_type) = 'directory' 7 GROUP BY T1.owner; Execution Plan ---------------------------------------------------------- Plan hash value: 4292311508 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 182 | 220 (3)| 00:00:03 | | 1 | HASH GROUP BY | | 2 | 182 | 220 (3)| 00:00:03 | | 2 | NESTED LOOPS | | 73585 | 6539K| 216 (1)| 00:00:03 | |* 3 | TABLE ACCESS FULL | T | 14 | 588 | 132 (2)| 00:00:02 | |* 4 | VIEW | DBA_OBJECTS | 1 | 49 | 6 (0)| 00:00:01 | | 5 | UNION ALL PUSHED PREDICATE | | | | | | |* 6 | TABLE ACCESS BY INDEX ROWID | SUM$ | 1 | 9 | 1 (0)| 00:00:01 | |* 7 | INDEX UNIQUE SCAN | I_SUM$_1 | 1 | | 0 (0)| 00:00:01 | |* 8 | FILTER | | | | | | |* 9 | FILTER | | | | | | | 10 | NESTED LOOPS | | 1 | 84 | 5 (0)| 00:00:01 | | 11 | NESTED LOOPS | | 1 | 62 | 4 (0)| 00:00:01 | | 12 | TABLE ACCESS BY INDEX ROWID| USER$ | 1 | 17 | 1 (0)| 00:00:01 | |* 13 | INDEX UNIQUE SCAN | I_USER1 | 1 | | 0 (0)| 00:00:01 | |* 14 | TABLE ACCESS BY INDEX ROWID| OBJ$ | 1 | 45 | 3 (0)| 00:00:01 | |* 15 | INDEX RANGE SCAN | I_OBJ5 | 1 | | 2 (0)| 00:00:01 | |* 16 | INDEX RANGE SCAN | I_USER2 | 1 | 22 | 1 (0)| 00:00:01 | | 17 | NESTED LOOPS | | 1 | 29 | 2 (0)| 00:00:01 | |* 18 | INDEX SKIP SCAN | I_USER2 | 1 | 20 | 1 (0)| 00:00:01 | |* 19 | INDEX RANGE SCAN | I_OBJ4 | 1 | 9 | 1 (0)| 00:00:01 | | 20 | NESTED LOOPS | | 1 | 43 | 1 (0)| 00:00:01 | | 21 | TABLE ACCESS BY INDEX ROWID | USER$ | 1 | 17 | 1 (0)| 00:00:01 | |* 22 | INDEX UNIQUE SCAN | I_USER1 | 1 | | 0 (0)| 00:00:01 | |* 23 | INDEX RANGE SCAN | I_LINK1 | 1 | 26 | 0 (0)| 00:00:01 |
In summary you have seen that if a function-based index exists, and if (in the case of skewed data) you collect statistics on the virtual column representing this index, then expression statistics are unnecessary. However, in the situation in which a function-based index is not necessary, but you still need to provide the CBO with a clear picture of the column's data distribution in the context of a join (or, perhaps, when a choice of other access paths exists on a table -- e.g., indexes on other columns), then expression statistics play a vital role.
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29487349/viewspace-2920638/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 函式宣告與函式表示式有什麼區別?函式
- 函式索引的使用細節——常數表示式函式索引
- 什麼是IIFE(立即呼叫函式表示式)?函式
- JavaScript函式宣告和函式表示式區別JavaScript函式
- 函式宣告和函式表示式的區別函式
- 何時使用函式表示式與函式宣告函式
- 【STAT】Oracle 表統計資訊被鎖,如何建立索引Oracle索引
- 方法(method)和函式(function)有什麼區別?函式Function
- Python range()函式有什麼作用?如何使用?Python函式
- 索引中使用函式索引函式
- 函式表示式和函式宣告簡單介紹函式
- 函式索引使用細節——自定義函式的索引化函式索引
- kotlin 函式和 Lambda 表示式Kotlin函式
- 通用函式和條件表示式函式
- 5.函式和lambda表示式函式
- 使用LAG和LEAD函式統計函式
- sequence 和索引函式呼叫索引函式
- ES6類和ES5函式建構函式有什麼區別?函式
- 為什麼正規表示式的test()函式總是返回true函式
- 正規表示式match()函式和exec()函式的區別函式
- synchronized函式和synchronized程式碼塊有什麼區別?synchronized函式
- 什麼時候使用 Lambda 函式?函式
- SCRM系統和CRM系統有什麼不同?
- JavaFX教程-函式、陣列、表示式和操作Java函式陣列
- 閉包函式到底是什麼?有什麼用?函式
- JS 中的函式表示式和函式宣告你混淆了嗎?JS函式
- query rewrite和基於函式的索引有關係?函式索引
- fstat函式及struct stat結構函式Struct
- javascript-函式表示式JavaScript函式
- 函式表示式–遞迴函式遞迴
- js表示式方式和函式語句方式宣告函式的區別JS函式
- vue2專案中 箭頭函式和普通函式里面 this的指向有何不同?Vue函式
- 【Python】python map()函式和lambda表示式Python函式
- Oracle OCP(06):通用函式和條件表示式Oracle函式
- Python中eval函式的表示式如何使用Python函式
- Python函式與lambda 表示式(匿名函式)Python函式
- JS函式表示式——函式遞迴、閉包JS函式遞迴
- 函式: 函式是怎麼使用的?函式