關於ORDERED-HASH中錯誤選擇連線欄位對效能的影響[@more@]

在BOSS專案的最佳化過程中,遇到了這麼一個情況:

/*+ ordered use_hash(c t cr con ) */ count( distinct c.customer_id)
FROM crm_customer c,
(select t.customer_id,max(t.gmt_train) gmt_train from crm_train_record t
where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
and t.train_type = 'y'
group by t.customer_id) t
,crm_contract con
,crm_contact_record cr
WHERE
con.prod_id = 1
and t.customer_id = con.customer_id
and con.customer_id = cr.customer_id
and cr.gmt_start >= t.gmt_train
and cr.gmt_start < trunc(t.gmt_train + 6)
and c.customer_id = con.customer_id
and c.service_id = '82';

Elapsed: 00:00:20.21

#這個SQL足足花了20秒才跑完. 而且這中間的資料量並不是很大. 那為什麼需要花費這麼長時間呢?

#我們先來分析一下資料:

count(*) from crm_customer c where c.service_id = '82';

COUNT(*)
----------
28494

count(*) from crm_contract ;

COUNT(*)
----------
20302

count(*) from crm_contact_record ;

COUNT(*)
----------
139774

count(distinct t.customer_id) from crm_train_record t
2 where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
3 and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
4 and t.train_type = 'y' ;

COUNT(DISTINCTT.CUSTOMER_ID)
----------------------------
2262

#執行計劃如下:

SELECT /*+ ordered use_hash(c t cr con ) */ count( distinct c.customer_id)
FROM crm_customer c,
(select t.customer_id,max(t.gmt_train) gmt_train from crm_train_record t
where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
and t.train_type = 'y'
group by t.customer_id) t
,crm_contract con
,crm_contact_record cr
WHERE
con.prod_id = 1
and t.customer_id = con.customer_id
and con.customer_id = cr.customer_id
and cr.gmt_start >= t.gmt_train
and cr.gmt_start < trunc(t.gmt_train + 6)
and c.customer_id = con.customer_id
and c.service_id = '82'

Elapsed: 00:00:20.21

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 7570 |
| 1 | SORT GROUP BY | | 1 | 14 | |
| 2 | VIEW | | 50 | 700 | 7570 |
|* 3 | FILTER | | | | |
| 4 | SORT GROUP BY | | 50 | 4650 | 7570 |
|* 5 | HASH JOIN | | 998 | 92814 | 7559 |
|* 6 | HASH JOIN | | 65 | 4745 | 7128 |
|* 7 | TABLE ACCESS FULL | CRM_CONTRACT | 64 | 1664 | 19 |
| 8 | MERGE JOIN CARTESIAN| | 10621 | 487K| 7108 | --注意這裡的 MERGE JOIN CARTESIAN
|* 9 | TABLE ACCESS FULL | CRM_CUSTOMER | 160 | 4960 | 68 |
| 10 | BUFFER SORT | | 67 | 1072 | 7040 | -- T
|* 11 | TABLE ACCESS FULL | CRM_TRAIN_RECORD | 67 | 1072 | 44 |
| 12 | TABLE ACCESS FULL | CRM_CONTACT_RECORD | 137K| 2689K| 430 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("CR"."GMT_START">=MAX("T"."GMT_TRAIN") AND
"CR"."GMT_START" 5 - access("CON"."CUSTOMER_ID"="CR"."CUSTOMER_ID")
6 - access("T"."CUSTOMER_ID"="CON"."CUSTOMER_ID" AND
"C"."CUSTOMER_ID"="CON"."CUSTOMER_ID")
7 - filter("CON"."PROD_ID"=1)
9 - filter("C"."SERVICE_ID"='82')
11 - filter("T"."GMT_TRAIN">=TO_DATE(' 2008-01-27 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "T"."GMT_TRAIN" hh24:mi:ss') AND "T"."TRAIN_TYPE"='y')

Note: cpu costing is off

疑問:
為什麼慢?
為什麼執行計劃沒有按我指定的來走? (秩序不對)
為什麼會是:MERGE JOIN CARTESIAN (CRM_CUSTOMER , T) 而不是HASH?

仔細看了查詢條件:
t.customer_id = con.customer_id
and con.customer_id = cr.customer_id
and c.customer_id = con.customer_id

由於我們在HINT裡指定了ORDERED ,也就是讓表根據: C T CON CR 的順序走. 而我們的條件中沒有C與T的條件.
所以執行計劃認為需要走MERGE JOIN CARTESIAN,然後再與CON表進行HASH.

所以我們把條件稍微調整一下,使C和T表有關聯:
t.customer_id = c.customer_id
(當然這樣的邏輯是一樣的)

SELECT /*+ ordered use_hash(c t cr con ) */ count( distinct c.customer_id)
FROM crm_customer c,
(select t.customer_id,max(t.gmt_train) gmt_train from crm_train_record t
where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
and t.train_type = 'y'
group by t.customer_id) t
,crm_contract con
,crm_contact_record cr
WHERE
con.prod_id = 1
and t.customer_id = c.customer_id
and con.customer_id = cr.customer_id
and cr.gmt_start >= t.gmt_train
and cr.gmt_start < trunc(t.gmt_train + 6)
and c.customer_id = con.customer_id
and c.service_id = '82'

Elapsed: 00:00:00.31

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 571 |
| 1 | SORT GROUP BY | | 1 | 14 | |
| 2 | VIEW | | 22 | 308 | 571 |
|* 3 | FILTER | | | | |
| 4 | SORT GROUP BY | | 22 | 2046 | 571 |
|* 5 | HASH JOIN | | 423 | 39339 | 564 |
|* 6 | HASH JOIN | | 27 | 1971 | 133 |
|* 7 | HASH JOIN | | 67 | 3149 | 113 |
|* 8 | TABLE ACCESS FULL| CRM_CUSTOMER | 160 | 4960 | 68 |
|* 9 | TABLE ACCESS FULL| CRM_TRAIN_RECORD | 67 | 1072 | 44 |
|* 10 | TABLE ACCESS FULL | CRM_CONTRACT | 64 | 1664 | 19 |
| 11 | TABLE ACCESS FULL | CRM_CONTACT_RECORD | 137K| 2689K| 430 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("CR"."GMT_START">=MAX("T"."GMT_TRAIN") AND
"CR"."GMT_START" 5 - access("C"."CUSTOMER_ID"="CR"."CUSTOMER_ID")
6 - access("C"."CUSTOMER_ID"="CON"."CUSTOMER_ID")
7 - access("T"."CUSTOMER_ID"="C"."CUSTOMER_ID")
8 - filter("C"."SERVICE_ID"='82')
9 - filter("T"."GMT_TRAIN">=TO_DATE(' 2008-01-27 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "T"."GMT_TRAIN" hh24:mi:ss') AND "T"."TRAIN_TYPE"='y')
10 - filter("CON"."PROD_ID"=1)

Note: cpu costing is off

-- 當然在這裡你調整表的順序,一樣能最佳化這個SQL.
-- 但是在這裡希望大家注意到的是:
-- 當你使用ORDERED的時候,千萬要注意表的順序以及條件中連線欄位的使用.
-- 不然會導致執行計劃錯誤,SQL緩慢

一個重複的例子:

create table t1 as select object_id from dba_objects ;
create table t2 as select * from t1 ;
create table t3 as select * from t1 ;

plan for
2 select /*+ ordered */count(*) from
3 t1,t3,t2
4 where t1.object_id=t2.object_id
5 and t2.object_id=t3.object_id;

Explained.

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 3960 |
| 1 | SORT AGGREGATE | | 1 | 39 | |
|* 2 | HASH JOIN | | 1307 | 50973 | 3960 |
| 3 | TABLE ACCESS FULL | T2 | 1307 | 16991 | 3 |
| 4 | MERGE JOIN CARTESIAN| | 1708K| 42M| 3924 |
| 5 | TABLE ACCESS FULL | T1 | 1307 | 16991 | 3 |
| 6 | BUFFER SORT | | 1307 | 16991 | 3921 |
| 7 | TABLE ACCESS FULL | T3 | 1307 | 16991 | 3 |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID" AND
"T2"."OBJECT_ID"="T3"."OBJECT_ID")

Note: cpu costing is off

21 rows selected.

plan for
2 select /*+ ordered */count(*) from
3 t1,t3,t2
4 where t1.object_id=t3.object_id
5 and t2.object_id=t3.object_id;

Explained.

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 11 |
| 1 | SORT AGGREGATE | | 1 | 39 | |
|* 2 | HASH JOIN | | 1307 | 50973 | 11 |
| 3 | TABLE ACCESS FULL | T2 | 1307 | 16991 | 3 |
|* 4 | HASH JOIN | | 1307 | 33982 | 7 |
| 5 | TABLE ACCESS FULL| T1 | 1307 | 16991 | 3 |
| 6 | TABLE ACCESS FULL| T3 | 1307 | 16991 | 3 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T2"."OBJECT_ID"="T3"."OBJECT_ID")
4 - access("T1"."OBJECT_ID"="T3"."OBJECT_ID")

Note: cpu costing is off

20 rows selected.

SQL最佳化-關於ORDERED-HASH中錯誤選擇連線欄位對效能的影響

相關文章