理解CBO

myhuaer發表於2005-06-01
1.2.2 Understanding the Cost-Based Optimizer
 
The cost-based optimizer is a more sophisticated facility than the rule-based optimizer. To determine the best execution path for a statement, it uses database information such as table size, number of rows, key spread, and so forth, rather than rigid rules.
 
CBO 是比RBO 更智慧的工具.
使用最優的執行路徑由以下database 資訊決定,如 table Size,行數量,鍵值的分佈,因此比嚴格的RBO 要靈活.
 
The information required by the cost-based optimizer is available once a table has been analyzed via the ANALYZE command, or via the DBMS_STATS facility. If a table has not been analyzed, the cost-based optimizer can use only rule-based logic to select the best access path. It is possible to run a schema with a combination of cost-based and rule-based behavior by having some tables analyzed and others not analyzed.
 
CBO 需要table 至少一次被 Analyze,或DBMS_STATS 命令分析過.
如果沒有分析過,則CBO 只能依照RBO 的邏輯選擇最優的訪問路徑.
如果Schema 中一些table 被分析而另一些table沒有分析,
     執行是以CBO 和RBO 的合併結果來選擇訪問路徑.
 
[ The ANALYZE command and the DBMS_STATS functions collect statistics about tables, clusters, and indexes, and store those statistics in the data dictionary.]
  Analyze 命令和DBMS_STATS 函式收集 table,Clusters,indexes 的統計資訊,並且將這些統計資訊儲存的資料字典中.
 
A SQL statement will default to the cost-based optimizer if any one of the tables involved in the statement has been analyzed. The cost-based optimizer then makes an educated guess as to the best access path for the other tables based on information in the data dictionary.
 
-- Optimizer_mode<>Rule
 如果SQL 語句中包含的任一table 有被分析過則此SQL 會預設的選擇CBO.CBO 將根據資料字典中資訊猜測對其他table訪問的訪問路徑.
 
 
The RDBMS kernel defaults to using the cost-based optimizer under a number of situations, including the following:
 
滿足下列條件 RDBMS 將預設使用CBO.
 
• OPTIMIZER_MODE = CHOOSE has been specified in the INIT.ORA file, and statistics exist for at least one table involved in the statement
• An ALTER SESSION SET OPTIMIZER_MODE = CHOOSE command has been executed, and statistics exist for at least one table involved in the statement
 
• An ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS (or ALL_ROWS) command has been executed, and statistics exist for at least one table involved in the statement
• A statement uses the FIRST_ROWS or ALL_ROWS hint (e.g., SELECT /*+ FIRST_ROWS */. . .)
 
  • 在Init.ora 引數檔案中設定 optimizer_mode=choose,並且SQL 語句中至少有一個table統計資訊存在
  • 執行 alter session set optimizer_mode=choose 命令,並且SQL 語句中至少有一個table統計資訊存在
  • 執行 alter session set optimizer_mode=first_rows(或者 all_rows) 命令,並且SQL 語句中至少有一個table統計資訊存在
  • SQL 語句使用 First_rows 或者 all_rows 提示(如: select /*+ first_rows*+/)
--optimizer_mode=rule
 
SELECT /*+ FIRST_ROWS +*/ emp_no,DEPT_NO,COST_CENTER
 FROM emp
 WHERE  emp_no = 2 
  AND emp_name = 'PJWANG'
      AND dept_no = '12'
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=2 Card=1 B
          ytes=22)
 
   1    0   INDEX (RANGE SCAN) OF 'IDX_3' (NON-UNIQUE) (Cost=2 Card=1
          Bytes=22)

 
When you analyze, you can have Oracle look at all rows in a table (ANALYZE COMPUTE) or at a sampling of rows (ANALYZE ESTIMATE).
Typically, I use ANALYZE ESTIMATE for very large tables (1,000,000 rows or more), and ANALYZE COMPUTE for small to medium tables.
 
當你對Table 進行分析 對Table 中所有行進行分析(Analyze compute) 或者 取樣分析(Analyze estimate)
 
比較有代表性的是,用Analyze estimate 對大table(大於1百萬行),用analyze compute 對小,或中等table.
 
 
I strongly recommend that you analyze FOR ALL INDEXED COLUMNS for any table that can have severe data skewness. For example, if a large percentage of rows in a table has the same value in a given column, that represents skewness. The FOR ALL INDEXED COLUMNS option makes the cost-based optimizer aware of the skewness of a column's data in addition to the cardinality (number-distinct values) of that data.
 
我強烈推薦對比較歪斜的資料(很多column 值為 A,很多為B部分散) 用 for all indexed columns 對這些table 做analyze.For all indexed columns 選項能對歪斜的Columns 資料做協調.
 
 
When a table is analyzed using ANALYZE, all associated indexes are analyzed as well. If an index is subsequently dropped and recreated, it must be re-analyzed. Be aware that the procedures DBMS_STATS.GATHER_SCHEMA_STATS and GATHER_TABLE_STATS analyze only tables by default, not their indexes. When using those procedures, you must specify the CASCADE=>TRUE option for indexes to be analyzed as well.
 
 
當某一table 用analyze 進行分析後, 所有和此table有關係的Index 也會被分析.
如果某一index 在後來被Drop 和重建,則此index 必須重新分析.
我們知道 procedure DBMS_STATS.GATHER_SCHEMA_STATS 和Gather_TABLE_stats analyze
   預設對table 進行分析,不包括indexes.
當使用前兩個procedure,必須設定 cascade=>true 時對index 也會分析.
 
 
 
Following are some sample ANALYZE statements:
ANALYZE TABLE EMP ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;
ANALYZE INDEX EMP_NDX1 ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;
ANALYZE TABLE EMP COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;
If you analyze a table by mistake, you can delete the statistics.
For example: ANALYZE TABLE EMP DELETE STATISTICS;
Analyzing can take an excessive amount of time if you use the COMPUTE option on large objects.
We find that on almost every occasion, ANALYZE ESTIMATE 5 PERCENT on a large table forces the optimizer make the same decision as ANALYZE COMPUTE.
 
下面是一些 Analyze 的語句:
    ANALYZE TABLE EMP ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;
    ANALYZE INDEX EMP_NDX1 ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;
    ANALYZE TABLE EMP COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;
 
  如果分析table 出現錯誤,可以將統計資訊刪除.
  
 
1.2.2.3 Inner workings of the cost-based optimizer
  Unlike the rule-based optimizer, the cost-based optimizer does not have hard and fast path evaluation rules. The cost-based optimizer is flexible and can adapt to its environment. This adaptation is possible only once the necessary underlying object statistics have been refreshed (re-analyzed). What is constant is the method by which the cost-based optimizer calculates each possible execution plan and evaluates its cost (efficiency).
 
 
CBO 的內部工作
 不想RBO ,CBO 不必評估最快或是最慢的執行路徑.CBO 比較靈活 並且能適應環境.這種適應可能是將Object 的統計資訊重新計算一邊.
1.2.2.3 Inner workings of the cost-based optimizer
Unlike the rule-based optimizer, the cost-based optimizer does not have hard and fast path evaluation rules. The cost-based optimizer is flexible and can adapt to its environment.
This adaptation is possible only once the necessary underlying object statistics have been refreshed (re-analyzed). What is constant is the method by which the cost-based optimizer calculates each possible execution plan and evaluates its cost (efficiency).
 
 
CBO 內部工作
  不像 RBO,CBO 不需要評估執行的最快和最慢路徑.CBO 是很靈活並且能適應環境.
這種適應可能是將此object 的統計資訊重新重新整理(分析).
不變的是 CBO 可能的執行計劃和評估他的cost(效果)
 
The cost-based optimizer's functionality can be (loosely) broken into the following steps:
1. Parse the SQL (check syntax, object privileges, etc.).
2. Generate a list of all potential execution plans.
3. Calculate (estimate) the cost of each execution plan using all available object statistics.
4. Select the execution plan with the lowest cost.
 
CBO功能如果遇到下列步驟會被中斷(釋放) :
  1: 分析SQL (檢查語法,object privileges ,等等)
  2: 產生所有可能的執行計劃列表
  3: 計算每一個執行計劃所用到的物件的統計花費
  4: 選擇花費最少的執行計劃.
 
The cost-based optimizer will be used only if at least one table within a SQL statement has statistics (table statistics for unanalyzed tables are estimated).
If no statistics are available for any table involved in the SQL, the RDBMS will resort to the rule-based optimizer, unless the cost-based optimizer is forced via statement-level HINTS or by an optimizer goal of ALL_ROWS or FIRST_ROWS.
 
 
如果SQL中至少有一個TABLE被統計則可以使用CBO .
如果沒有一個被統計過,RDBMS 將重新選擇RBO,除非SQL 語句級別強制使用 HITS 或最佳化器目標是all_rows或first_rows.
 
 
To understand how the cost-based optimizer works and, ultimately, how to exploit it, we need to understand how it thinks.
 
為了瞭解CBO 如何工作, 最終讓我們如何使用他.我們需要了解CBO是如何思考的.
 

Primary key and/or UNIQUE index equality

A UNIQUE index's selectivity is recognized as 100%. No other indexed access method is more precise. For this reason, a unique index is always used when available.

 

主鍵和/或 Unique index 的等同性

Unique index的選擇性被認定是100%.其他的index 都沒有Unique index 精確.

基於這個原因,unique index 是合法時,他能一直被用到.

 

Non-UNIQUE index equality

For non-UNIQUE indexes, index selectivity is calculated. The cost-based optimizer makes the assumption that the table (and subsequent indexes) have uniform data spread unless you use the FOR ALL INDEXED COLUMNS option of the ANALYZE. That option will make the cost-based optimizer aware of how the data in the indexed columns is skewed.

-Unique index 等同性

    -unique index ,index 的選擇是計算出來的.

 

 

Range evaluation

For index range execution plans, selectivity is evaluated. This evaluation is based on a column's most recent high-value and low-value statistics. Again, the cost-based optimizer makes the assumption that the table (and subsequent indexes) have uniform data spread unless you use the FOR ALL INDEXED COLUMNS option when analyzing the table.

 

範圍評估

   對於Index  範圍的執行計劃,選擇性是可評估的. 此評估基於欄位最近的最高和最低值的統計資訊.

再次重申,當分析table 時,CBO假定table (和併發的Index) 都有相同的資料分佈,除非你使用 for all indexed columns 選項.

 

 

Range evaluation over bind variables

For index range execution plans, selectivity is guessed. Prior to Oracle9i, because bind variable values are not available at parse time (values are passed to the cursor after the execution plan has been decided), the optimizer cannot make decisions based on bind variable values. The optimizer assumes a rule of thumb of 25% selectivity for unbounded bind variable ranges (e.g., WHERE dept_no = :b1) and 50% selectivity for bounded ranges (WHERE dept_no > :b1 AND dept_no < :b2). Beginning with Oracle9i, the cost-based optimizer obtains bind variable values prior to determining an execution plan.

 

範圍評估優於繫結變數

     對於Index 範圍執行計劃,選擇是猜測的.在Oracle9i 以前版本,因為繫結變數的值不能在分析時間提供(

在值傳入遊標之前執行計劃已經決定),所以最佳化器不能以繫結變數的值來決定執行計劃. 最佳化器的經驗法則是 對於無範圍限制的選擇機率是 25%(如where dept_no=:b1) ,對於有範圍限制的選擇機率是50%

(where dept_no>:b1 and dept_no<:b2>

 

 

 

 

Histograms

Prior to the introduction of histograms in Oracle 7.3, The cost-based optimizer could not distinguish grossly uneven key data spreads.

 

柱狀圖

    在Oracle7.3 以前的柱狀圖.CBO 不能辨別粗略的辨別主要資料資料的分佈.

 

 

 

 

System resource usage

By default, the cost-based optimizer assumes that you are the only person accessing the database. Oracle9i gives you the ability to store information about system resource usage, and can make much better informed decisions based on workload (read up on the DBMS_STATS.GATHER_SYSTEM_STATS package).

 

系統資源使用率:

     在預設情況下.CBO假定只有一人訪問資料庫.  Oracle9i 提供保留System resource Usage 的資源資訊,並能對資源進行基於負載的最優分配.(參考 DBMS_STATS.GATHER_SYSTEM_STATS 包)

 

Current statistics are important

The cost-based optimizer can make poor execution plan choices when a table has been analyzed but its indexes have not been, or when indexes have been analyzed but not the tables.

You should not force the database to use the cost-based optimizer via inline hints when no statistics are available for any table involved in the SQL.

Using old (obsolete) statistics can be more dangerous than estimating the statistics at runtime, but keep in mind that changing statistics frequently can also blow up in your face, particularly on a mission-critical system with lots of online users. Always back up your statistics before you re-analyze by using DBMS_STATS.EXPORT_SCHEMA_STATS.

Analyzing large tables and their associated indexes with the COMPUTE option will take a long, long time, requiring lots of CPU, I/O, and temporary tablespace resources. It is often overkill. Analyzing with a consistent value, for example, estimate 3%, will usually allow the cost-based optimizer to make optimal decisions

 

當前統計資訊很重要:

 CBO 當在Table 進行分析,但所涉及的Index沒有被分析或是Index被分析但table 沒有分析的情況下所能選擇的執行計劃很少.

 你不能強制對於那些沒有統計資訊的Table 所組成的SQL語句採用inhint 方法使用CBO.

 

使用Old(廢棄的) 統計資訊比在執行是估計統計資訊更危險.但要緊記頻繁改動的統計資訊,也能使你很生氣.

顯然一個大點的系統有很多線上使用者,.在你重新分析時要經常用DBMS_STATS.EXPORT_SCHEMA-STATS來備份你的統計資訊.

 

  用Compute 選項分析大Table 和相關聯的index 將會花費很長時間,需要大量的CPU,I/O 和臨時表空間等資源.

經常太花資源. 分析一致性,例如使用estimate 3% 選項來做CBO將是一個明智選擇.

 

 

 

1.2.2.4 EXPLAIN PLAN for the cost-based optimizer

    Oracle provides information on the cost of query execution via the EXPLAIN PLAN facility. EXPLAIN PLAN can be used to display the calculated execution cost(s) via some extensions to the utility. In particular, the plan table's COST column returns a value that increases or decreases to show the relative cost of a query. For example:

 

   oracle 是透過Explain Plan 工具提供基於Cost上的查詢資訊.

   Explain Plan 能透過擴充套件功能顯示計算執行的Cost.詳細的說,Plan table的Cost 欄位返回此查詢的花費是增加還是減少. 例如:

 

  EXPLAIN PLAN FOR

  SELECT count(*) FROM winners, horses

  WHERE winners.owner=horses.owner AND winners.horse_name LIKE 'Mr %'
COLUMN "SQL" FORMAT a56
SELECT lpad(' ',2*level)||operation||'' ||options ||' '||object_name|| decode(OBJECT_TYPE, '', '', '('||object_type||')') "SQL", cost "Cost", cardinality "Num Rows"
FROM plan_table
CONNECT BY prior id = parent_id START WITH id = 0;
SQL                  Cost     Num Rows
-----------------------------------------------
SELECT STATEMENT       44   1
   SORT AGGREGATE

      HASH JOIN  44  100469
       INDEX RANGE SCAN MG1(NON-UNIQUE)
                       2     1471
          INDEX FAST FULL SCAN OWNER_PK(UNIQUE)

                        4    683

 

By manipulating the cost-based optimizer (i.e., via inline hints, by creating/removing indexes, or by adjusting the way that indexes or tables are analyzed), we can see the differences in the execution cost as calculated by the optimizer. Use EXPLAIN PLAN to look at different variations on a query, and choose the variation with the lowest relative cost. For absolute optimal performance, many sites have the majority of the tables and indexes analyzed but a small number of tables that are used in isolation are not analyzed. This is usually to force rule-based behavior on the tables that are not analyzed. However, it is important that tables that have not been analyzed are not joined with tables that have been analyzed.

 

[@more@]

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/34596/viewspace-799584/,如需轉載,請註明出處,否則將追究法律責任。

相關文章