SQL最佳化-關於ORDERED-HASH中錯誤選擇連線欄位對效能的影響
在BOSS專案的最佳化過程中,遇到了這麼一個情況:
/*+ ordered use_hash(c t cr con ) */ count( distinct c.customer_id)
FROM crm_customer c,
(select t.customer_id,max(t.gmt_train) gmt_train from crm_train_record t
where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
and t.train_type = 'y'
group by t.customer_id) t
,crm_contract con
,crm_contact_record cr
WHERE
con.prod_id = 1
and t.customer_id = con.customer_id
and con.customer_id = cr.customer_id
and cr.gmt_start >= t.gmt_train
and cr.gmt_start < trunc(t.gmt_train + 6)
and c.customer_id = con.customer_id
and c.service_id = '82';
Elapsed: 00:00:20.21
#這個SQL足足花了20秒才跑完. 而且這中間的資料量並不是很大. 那為什麼需要花費這麼長時間呢?
#我們先來分析一下資料:
count(*) from crm_customer c where c.service_id = '82';
COUNT(*)
----------
28494
count(*) from crm_contract ;
COUNT(*)
----------
20302
count(*) from crm_contact_record ;
COUNT(*)
----------
139774
count(distinct t.customer_id) from crm_train_record t
2 where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
3 and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
4 and t.train_type = 'y' ;
COUNT(DISTINCTT.CUSTOMER_ID)
----------------------------
2262
#執行計劃如下:
SELECT /*+ ordered use_hash(c t cr con ) */ count( distinct c.customer_id)
FROM crm_customer c,
(select t.customer_id,max(t.gmt_train) gmt_train from crm_train_record t
where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
and t.train_type = 'y'
group by t.customer_id) t
,crm_contract con
,crm_contact_record cr
WHERE
con.prod_id = 1
and t.customer_id = con.customer_id
and con.customer_id = cr.customer_id
and cr.gmt_start >= t.gmt_train
and cr.gmt_start < trunc(t.gmt_train + 6)
and c.customer_id = con.customer_id
and c.service_id = '82'
Elapsed: 00:00:20.21
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 7570 |
| 1 | SORT GROUP BY | | 1 | 14 | |
| 2 | VIEW | | 50 | 700 | 7570 |
|* 3 | FILTER | | | | |
| 4 | SORT GROUP BY | | 50 | 4650 | 7570 |
|* 5 | HASH JOIN | | 998 | 92814 | 7559 |
|* 6 | HASH JOIN | | 65 | 4745 | 7128 |
|* 7 | TABLE ACCESS FULL | CRM_CONTRACT | 64 | 1664 | 19 |
| 8 | MERGE JOIN CARTESIAN| | 10621 | 487K| 7108 | --注意這裡的 MERGE JOIN CARTESIAN
|* 9 | TABLE ACCESS FULL | CRM_CUSTOMER | 160 | 4960 | 68 |
| 10 | BUFFER SORT | | 67 | 1072 | 7040 | -- T
|* 11 | TABLE ACCESS FULL | CRM_TRAIN_RECORD | 67 | 1072 | 44 |
| 12 | TABLE ACCESS FULL | CRM_CONTACT_RECORD | 137K| 2689K| 430 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("CR"."GMT_START">=MAX("T"."GMT_TRAIN") AND
"CR"."GMT_START"
6 - access("T"."CUSTOMER_ID"="CON"."CUSTOMER_ID" AND
"C"."CUSTOMER_ID"="CON"."CUSTOMER_ID")
7 - filter("CON"."PROD_ID"=1)
9 - filter("C"."SERVICE_ID"='82')
11 - filter("T"."GMT_TRAIN">=TO_DATE(' 2008-01-27 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "T"."GMT_TRAIN"
Note: cpu costing is off
疑問:
為什麼慢?
為什麼執行計劃沒有按我指定的來走? (秩序不對)
為什麼會是:MERGE JOIN CARTESIAN (CRM_CUSTOMER , T) 而不是HASH?
仔細看了查詢條件:
t.customer_id = con.customer_id
and con.customer_id = cr.customer_id
and c.customer_id = con.customer_id
由於我們在HINT裡指定了ORDERED ,也就是讓表根據: C T CON CR 的順序走. 而我們的條件中沒有C與T的條件.
所以執行計劃認為需要走MERGE JOIN CARTESIAN,然後再與CON表進行HASH.
所以我們把條件稍微調整一下,使C和T表有關聯:
t.customer_id = c.customer_id
(當然這樣的邏輯是一樣的)
SELECT /*+ ordered use_hash(c t cr con ) */ count( distinct c.customer_id)
FROM crm_customer c,
(select t.customer_id,max(t.gmt_train) gmt_train from crm_train_record t
where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
and t.train_type = 'y'
group by t.customer_id) t
,crm_contract con
,crm_contact_record cr
WHERE
con.prod_id = 1
and t.customer_id = c.customer_id
and con.customer_id = cr.customer_id
and cr.gmt_start >= t.gmt_train
and cr.gmt_start < trunc(t.gmt_train + 6)
and c.customer_id = con.customer_id
and c.service_id = '82'
Elapsed: 00:00:00.31
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 571 |
| 1 | SORT GROUP BY | | 1 | 14 | |
| 2 | VIEW | | 22 | 308 | 571 |
|* 3 | FILTER | | | | |
| 4 | SORT GROUP BY | | 22 | 2046 | 571 |
|* 5 | HASH JOIN | | 423 | 39339 | 564 |
|* 6 | HASH JOIN | | 27 | 1971 | 133 |
|* 7 | HASH JOIN | | 67 | 3149 | 113 |
|* 8 | TABLE ACCESS FULL| CRM_CUSTOMER | 160 | 4960 | 68 |
|* 9 | TABLE ACCESS FULL| CRM_TRAIN_RECORD | 67 | 1072 | 44 |
|* 10 | TABLE ACCESS FULL | CRM_CONTRACT | 64 | 1664 | 19 |
| 11 | TABLE ACCESS FULL | CRM_CONTACT_RECORD | 137K| 2689K| 430 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("CR"."GMT_START">=MAX("T"."GMT_TRAIN") AND
"CR"."GMT_START"
6 - access("C"."CUSTOMER_ID"="CON"."CUSTOMER_ID")
7 - access("T"."CUSTOMER_ID"="C"."CUSTOMER_ID")
8 - filter("C"."SERVICE_ID"='82')
9 - filter("T"."GMT_TRAIN">=TO_DATE(' 2008-01-27 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "T"."GMT_TRAIN"
10 - filter("CON"."PROD_ID"=1)
Note: cpu costing is off
-- 當然在這裡你調整表的順序,一樣能最佳化這個SQL.
-- 但是在這裡希望大家注意到的是:
-- 當你使用ORDERED的時候,千萬要注意表的順序以及 條件中連線欄位的使用.
-- 不然會導致執行計劃錯誤,SQL緩慢
一個重複的例子:
create table t1 as select object_id from dba_objects ;
create table t2 as select * from t1 ;
create table t3 as select * from t1 ;
plan for
2 select /*+ ordered */count(*) from
3 t1,t3,t2
4 where t1.object_id=t2.object_id
5 and t2.object_id=t3.object_id;
Explained.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 3960 |
| 1 | SORT AGGREGATE | | 1 | 39 | |
|* 2 | HASH JOIN | | 1307 | 50973 | 3960 |
| 3 | TABLE ACCESS FULL | T2 | 1307 | 16991 | 3 |
| 4 | MERGE JOIN CARTESIAN| | 1708K| 42M| 3924 |
| 5 | TABLE ACCESS FULL | T1 | 1307 | 16991 | 3 |
| 6 | BUFFER SORT | | 1307 | 16991 | 3921 |
| 7 | TABLE ACCESS FULL | T3 | 1307 | 16991 | 3 |
----------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID" AND
"T2"."OBJECT_ID"="T3"."OBJECT_ID")
Note: cpu costing is off
21 rows selected.
plan for
2 select /*+ ordered */count(*) from
3 t1,t3,t2
4 where t1.object_id=t3.object_id
5 and t2.object_id=t3.object_id;
Explained.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 11 |
| 1 | SORT AGGREGATE | | 1 | 39 | |
|* 2 | HASH JOIN | | 1307 | 50973 | 11 |
| 3 | TABLE ACCESS FULL | T2 | 1307 | 16991 | 3 |
|* 4 | HASH JOIN | | 1307 | 33982 | 7 |
| 5 | TABLE ACCESS FULL| T1 | 1307 | 16991 | 3 |
| 6 | TABLE ACCESS FULL| T3 | 1307 | 16991 | 3 |
--------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T2"."OBJECT_ID"="T3"."OBJECT_ID")
4 - access("T1"."OBJECT_ID"="T3"."OBJECT_ID")
Note: cpu costing is off
20 rows selected.
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/703656/viewspace-1017900/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 新增欄位對SQL的影響SQL
- css屬性的選擇對動畫效能的影響CSS動畫
- MySQL中join語句的基本使用教程及其欄位對效能的影響MySql
- 行連結與行遷移, LOB欄位的儲存及效能影響
- Oracle主鍵選擇對插入的影響Oracle
- RAID的概念和RAID對於SQL效能的影響AISQL
- JAVA 異常對於效能的影響Java
- 轉:RAID的概念及RAID對於SQL效能的影響AISQL
- 關於資料庫開啟大頁對效能的影響資料庫
- 連線條件是兩個欄位“or”關係的SQL優化SQL優化
- 關於OPcache對Swoole影響的理解opcache
- 關於drop操作對role的影響
- C# 關於Try/Catch對系統效能影響的總結C#
- 淺談SQL Server中統計對於查詢的影響SQLServer
- SQL中欄位比較型別不匹配錯誤:‘cannot be cast to’SQL型別AST
- [zt] 影響SQL效能的原因SQL
- 三個影響SQL Server效能關鍵點SQLServer
- 修改欄位長度應用會影響到生產效能
- 【SQL 效能最佳化】表的三種連線方式SQL
- MySQL:查詢欄位數量多少對查詢效率的影響MySql
- 關於jQuery中的選擇器jQuery
- 影響rest api版本選擇的因素RESTAPI
- 微課sql最佳化(15)、表的連線方法(4)-關於Hash Join(雜湊連線)SQL
- 關於go中拷貝次數進而影響效能的問題Go
- 關於SQL Server通過OLEDB訪問ORACLE資料表涉及CLOB或BLOB欄位的錯誤提示SQLServerOracle
- sql server中對日期欄位值的比較SQLServer
- 【OPTIMIZATION】Oracle影響優化器選擇的相關技術Oracle優化
- 對列進行連線操作會影響索引的使用索引
- 「性別選擇」在遊戲中產生了哪些影響?遊戲
- SQL Server中事務日誌自動增長對效能的影響(下)PGSQLServer
- SQL Server中事務日誌自動增長對效能的影響(上)OSSQLServer
- 微課sql最佳化(16)、表的連線方法(5)-關於Merge Join(排序合連線)SQL排序
- 關於系統效能的10大錯誤
- Java中的Exception拋異常對效能的影響 - BaeldungJavaException
- SQL Server效能影響的重要結論SQLServer
- tnsname service listener 對qlsql develop連線遠端影響SQLdev
- -206 錯誤. 在表中找不到對應的資料欄位txt
- 關於shell中的pl/sql指令碼錯誤排查與分析SQL指令碼