使用exists(Semi-Join)優化distinct語句
在Oracle 官方文件,semi-join是這麼解釋的:
A semijoin returns rows that match an EXISTS subquery without duplicating rows
from the left side of the predicate when multiple rows on the right side satisfy the
criteria of the subquery.
Semijoin and antijoin transformation cannot be done if the subquery is on an OR
branch of the WHERE clause.
由上可見,semi-join特別適用於如下場景:
表A 和B有1:N關係,需要根據B表某個條件,通過semi-join查詢A表上某個列(屬性)。因為semi-join已經包含一個去重過程,不需要在A表上加distinct。比內連線然後去重方式效能更好。
假設有兩個表Customer , orders。當然,每個客戶可以有多個訂單。現在需要找出訂單金額超過10000的客戶數量。本文例子在12cR1上實現,11g上應該是同樣結果。
show rel;
release 1201000100
desc customer
Name Null Type
----------- -------- -------------
CUSTOMER_ID NOT NULL NUMBER(12)
NAME VARCHAR2(100)
STATUS VARCHAR2(10)
desc orders
Name Null Type
----------- ---- ------------
ORDER_ID NUMBER(12)
CUSTOMER_ID NUMBER(12)
CREATE_DATE DATE
ORDER_PRICE NUMBER(38,6)
select count(*) from orders;
count(*)
--------------
1000002
select count(*) from customer;
count(*)
--------------
100000
orders表有100萬行,customer表有10萬行。
下面來比較下inner join和semi-join兩種SQL寫法的效能。
Select /*HE7*/ count(distinct cus.name)
from customer cus, orders ord
where cus.customer_id=ord.customer_id
and ord.order_price > 10000;
由於做了內連線(inner join),最終結果集裡將會出現所有匹配行,包含一部分重複customer(使用者)。著名Oracle佈道者Tom Kyte在2006年一篇文章裡就寫到:
“In general, you should phrase the queries in the manner that says it best. If one set of joins were particularly efficient to use in all cases, Oracle would not have implemented the rest of them!
In general, you use a join when you need data from more than one table in the ultimate SELECT list.”
文章出處:
http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html
在這裡,我們只需要customer表的資訊,orders表是用來做檢索(filtering)的。
只將customer表放在from處,更加自然。用exists(semi-join)改寫後:
Select /*HE8*/ count(name)
from customer
where exists(
select 1 from orders
where customer.customer_id=orders.customer_id
and orders.order_price > 10000
);
由於semi-join,這裡我們不需要distinct,從customer能直接得到去重後的結果。檢視下執行計劃和最終的cost
SQL_ID 3kfvh06bqrn5h, child number 0
-------------------------------------
select /*HE7*/ count(distinct cus.name) from customer cus, orders ord
where cus.customer_id=ord.customer_id and ord.order_price > 10000
Plan hash value: 3082118893
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 3212 (100)| |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
| 2 | VIEW | VW_DAG_0 | 52570 | 2669K| | 3212 (1)| 00:00:01 |
| 3 | HASH GROUP BY | | 52570 | 2104K| 2688K| 3212 (1)| 00:00:01 |
|* 4 | HASH JOIN SEMI | | 52570 | 2104K| 2640K| 2651 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL| CUSTOMER | 100K| 1464K| | 102 (0)| 00:00:01 |
|* 6 | TABLE ACCESS FULL| ORDERS | 753K| 18M| | 1063 (1)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("CUS"."CUSTOMER_ID"="ORD"."CUSTOMER_ID")
6 - filter("ORD"."ORDER_PRICE">10000)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SQL_ID ggs8p2c27nzmu, child number 0
-------------------------------------
select /*HE8*/ count(name) from customer where exists( select 1 from
orders where customer.customer_id=orders.customer_id and
orders.order_price > 10000 )
Plan hash value: 2841769433
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 2651 (100)| |
| 1 | SORT AGGREGATE | | 1 | 41 | | | |
|* 2 | HASH JOIN SEMI | | 52570 | 2104K| 2640K| 2651 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| CUSTOMER | 100K| 1464K| | 102 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| ORDERS | 753K| 18M| | 1063 (1)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CUSTOMER"."CUSTOMER_ID"="ORDERS"."CUSTOMER_ID")
4 - filter("ORDERS"."ORDER_PRICE">10000)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
很明顯,第二個執行計劃優於第一個。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/638844/viewspace-772370/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- oracle之優化一用group by或exists優化distinctOracle優化
- 對IN & EXISTS NOT IN & NOT EXISTS的優化優化
- [20151212優化sql語句要注意關鍵字DISTINCT優化SQL
- SQL語句優化SQL優化
- MYSQL SQL語句優化MySql優化
- MySQL——優化ORDER BY語句MySql優化
- sql語句效能優化SQL優化
- SQL語句的優化SQL優化
- mysql limit語句優化MySqlMIT優化
- 求助:SQL語句優化SQL優化
- mysql 優化常用語句MySql優化
- SQL Server優化之SQL語句優化SQLServer優化
- MySQL exists 優化 in 效率MySql優化
- MySQL之SQL語句優化MySql優化
- update語句的優化方式優化
- SQL語句優化(轉載)SQL優化
- 常用SQL語句優化技巧SQL優化
- Oracle之sql語句優化OracleSQL優化
- 效能優化查詢語句優化
- SQL 語句的優化方法SQL優化
- 蘊含式(包含EXISTS語句的分析)
- 使用SQL調整顧問進行語句優化SQL優化
- 資料庫效能優化之SQL語句優化資料庫優化SQL
- 淺談mysql配置優化和sql語句優化MySql優化
- 優化 SQL 語句的步驟優化SQL
- mysql 語句的索引和優化MySql索引優化
- 一個SQL語句的優化SQL優化
- Oracle SQL語句優化之UNIONOracleSQL優化
- SQL語句操作符優化SQL優化
- 關於sql語句的優化SQL優化
- SQL語句優化技術分析SQL優化
- SQL語句優化方法30例SQL優化
- 一條sql語句的優化SQL優化
- sql語句的優化案例分析SQL優化
- 【PostgreSQL 】PostgreSQL 15對distinct的優化SQL優化
- MYSQL 中 exists 語句執行效率變低MySql
- exists與in子查詢優化優化
- 使用hint來調優sql語句SQL