Pipelined table function statistics and dynamic sampling

Posted by chncaesar on 2013-09-16
This is an extract of Adrian Billington's article.

At some point, you might need to join a pipelined function to another rowsource (such as a table, a view, or the intermediate output of other joins within a SQL execution plan). Rowsource statistics (such as cardinality, data distribution, nulls, etc) are critical to achieving efficient execution plans, but in the case of pipelined functions (or indeed any table function), the cost-based optimizer doesn't have much information to work with.

Cardinality heuristics for pipelined table functions

Up to and including Oracle Database 11g Release 1, the CBO applies a heuristic cardinality to pipelined and table functions in SQL statements, and this can sometimes lead to inefficient execution plans. The default cardinality appears to depend on the value of the DB_BLOCK_SIZE initialization parameter; on a database with a standard 8KB block size, Oracle uses a heuristic of 8,168 rows. I can demonstrate this quite easily with a pipelined function that pipes a subset of columns from the employees table. Using Autotrace in SQL*Plus to generate an execution plan, I see the following.

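For context, a pipelined function of this kind is built from an object type, a nested table type of that object, and a function declared PIPELINED that emits rows with PIPE ROW. The actual definition lives in cbo_setup.sql (on the web); the sketch below is a plausible reconstruction, and the type names employee_ot and employee_ntt and the column list are my assumptions, not necessarily those in the file.

/* Hypothetical sketch only -- see cbo_setup.sql for the real definition */
SQL> CREATE TYPE employee_ot AS OBJECT
  2  ( employee_id   NUMBER(6)
  3  , department_id NUMBER(4)
  4  , last_name     VARCHAR2(25)
  5  );
  6  /

SQL> CREATE TYPE employee_ntt AS TABLE OF employee_ot;
  2  /

SQL> CREATE FUNCTION pipe_employees
  2     RETURN employee_ntt PIPELINED AS
  3  BEGIN
  4     FOR r IN (SELECT employee_id, department_id, last_name
  5               FROM   employees) LOOP
  6        -- emit each row to the consumer as it is produced
  7        PIPE ROW (employee_ot(r.employee_id, r.department_id, r.last_name));
  8     END LOOP;
  9     RETURN;
 10  END pipe_employees;
 11  /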
/* Files on web: cbo_setup.sql and cbo_test.sql */
SQL> SELECT *
  2  FROM   TABLE(pipe_employees) e;

Execution Plan
----------------------------------------------------------
Plan hash value: 1802204150

--------------------------------------------------------------------
| Id  | Operation                         | Name           | Rows  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                |  8168 |
|   1 |  COLLECTION ITERATOR PICKLER FETCH| PIPE_EMPLOYEES |       |
--------------------------------------------------------------------
This pipelined function actually returns 50,000 rows, so if I join this pipelined function to the departments table, I run the risk of getting a suboptimal plan.

/* File on web: cbo_test.sql */
SQL> SELECT *
  2  FROM   departments           d
  3  ,      TABLE(pipe_employees) e
  4  WHERE  d.department_id = e.department_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 4098497386

----------------------------------------------------------------------
| Id  | Operation                           | Name           | Rows  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |  8168 |
|   1 |  MERGE JOIN                         |                |  8168 |
|   2 |   TABLE ACCESS BY INDEX ROWID       | DEPARTMENTS    |    27 |
|   3 |    INDEX FULL SCAN                  | DEPT_ID_PK     |    27 |
|*  4 |   SORT JOIN                         |                |  8168 |
|   5 |    COLLECTION ITERATOR PICKLER FETCH| PIPE_EMPLOYEES |       |
----------------------------------------------------------------------
As predicted, this appears to be a suboptimal plan; it is unlikely that a sort-merge join will be more efficient than a hash join in this scenario. So how do I influence the CBO? For this example I could use simple access hints such as LEADING and USE_HASH to effectively override the CBO’s cost-based decision and secure a hash join between the table and pipelined function. However, for more complex SQL statements, it is quite difficult to provide all the hints necessary to “lock down” an execution plan. It is often far better to provide the CBO with better statistics with which to make its decisions. 
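To illustrate the hinting approach just described, a hinted version of the query might look like the following. This is a sketch: the exact hint placement and aliases are my choice, and the source does not show this variant.

/* Hypothetical hinted variant -- not from the original files */
SQL> SELECT /*+ LEADING(d) USE_HASH(e) */
  2         *
  3  FROM   departments           d
  4  ,      TABLE(pipe_employees) e
  5  WHERE  d.department_id = e.department_id;

With LEADING(d) the CBO is told to start the join from departments, and USE_HASH(e) asks for a hash join when the pipelined function's rowsource is joined in. As noted above, though, hand-hinting becomes fragile as statements grow more complex, which is why better statistics are usually preferable.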

Optimizer dynamic sampling
This feature was enhanced in Oracle Database 11g (specifically 11.1.0.7) to include sampling for table and pipelined functions.

Dynamic sampling is an extremely useful feature that enables the optimizer to take a small statistics sample of one or more objects in a query during the parse phase. You might use dynamic sampling when you haven’t gathered statistics on all of your tables in a query or when you are using transient objects such as global temporary tables. Starting with version 11.1.0.7, the Oracle database is able to use dynamic sampling for table or pipelined functions.

To see what difference this feature can make, I’ll repeat my previous query but include a DYNAMIC_SAMPLING hint for the pipe_employees function.

/* File on web: cbo_test.sql */
SQL> SELECT /*+ DYNAMIC_SAMPLING(5) */
  2         *
  3  FROM   departments           d
  4  ,      TABLE(pipe_employees) e
  5  WHERE  d.department_id = e.department_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 815920909

---------------------------------------------------------------------
| Id  | Operation                          | Name           | Rows  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                | 50000 |
|*  1 |  HASH JOIN                         |                | 50000 |
|   2 |   TABLE ACCESS FULL                | DEPARTMENTS    |    27 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| PIPE_EMPLOYEES |       |
---------------------------------------------------------------------
This time, the CBO has correctly computed the 50,000 rows that my function returns and has generated a more suitable plan. Note that I used the word “computed” and not “estimated” because in version 11.1.0.7 and later, the optimizer takes a 100% sample of the table or pipelined function, regardless of the dynamic sampling level being used (this is also the case in Oracle Database 11g Release 2). I used level 5, but I could have used anything between level 2 and level 10 to get exactly the same result. This means, of course, that dynamic sampling can be potentially costly or time-consuming if it is being used for queries involving high-volume or long-running pipelined functions.

From the "ITPUB blog"; link: http://blog.itpub.net/638844/viewspace-772753/. If reprinting, please cite the source; otherwise legal liability may be pursued.
