hash join\nest loop join\sort merge join的實驗
前幾天看了join相關的知識,做了個實驗,今天整理看到了,貼出來與大家探討。分析部分是自己看文件後想的,不知道對不對。
以下結果都是執行多次,消除物理讀後的結果
SQL> select ename from emp,dept where dept.dname='SALES' and dept.deptno=emp.deptno;
ENAME
----------
……
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 844388907
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 110 | 6 (17)| 00:00:01 |
| 1 | MERGE JOIN | | 5 | 110 | 6 (17)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 13 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | PK_DEPT | 4 | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 15 | 135 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 15 | 135 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("DEPT"."DNAME"='SALES')
4 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
filter("DEPT"."DEPTNO"="EMP"."DEPTNO")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
508 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed
預設情況下選擇了sort merge join。
SQL> select /*+use_nl(emp,dept)*/ename from emp,dept where dept.dname='SALES' and dept.deptno=emp.deptno;
ENAME
----------
……
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4192419542
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 110 | 6 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 5 | 110 | 6 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| DEPT | 1 | 13 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| EMP | 5 | 45 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("DEPT"."DNAME"='SALES')
3 - filter("DEPT"."DEPTNO"="EMP"."DEPTNO")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12 consistent gets
0 physical reads
0 redo size
508 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6 rows processed
來看看hash join的效果:
SQL> select /*+use_hash(emp,dept)*/ename from emp,dept where dept.dname='SALES' and dept.deptno=emp.deptno;
ENAME
----------
BLAKE
JAMES
ALLEN
WARD
MARTIN
TURNER
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 615168685
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 110 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN | | 5 | 110 | 7 (15)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| DEPT | 1 | 13 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 15 | 135 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
2 - filter("DEPT"."DNAME"='SALES')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11 consistent gets
0 physical reads
0 redo size
508 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6 rows processed
可以看到,第一種方式的consistent gets值是最小的。可以這樣理解:
我們的目的是以部門名為條件,根據部門編號到職員表中得到某一個部門的職員名。
有三種方法可以實現:
(1)sort merge join:先將兩個表分別排序,然後按照外來鍵值deptno來對應。先對emp表按照deptno進行排序,然後根據‘SALES’通過索引找到對應的部門編號30,再將已經排序的emp中相應資料輸出。由於emp、dept表存在確定的對應關係,即外來鍵約束,因此使用這種方式是比較高效的。
(2)nest loop join:相當於對兩個表分別做了一次迴圈,對兩個表做全表掃描。由於dept表中對30以外的部門編號不需要查詢,這時候對deptno全表掃描就會多付出一定的成本。
(3)hash join:oracle根據對行的評估,認為dept表中資料較少,將dept表滿足條件的資料(即deptno=30的資料)讀入記憶體,稱為hash表,然後與磁碟上的emp表匹配,得到結果。
再來看看查詢emp全表資料的情況:
select emp.* from emp,dept where dept.dname='SALES' and dept.deptno=emp.deptno;
……
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 844388907
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 250 | 6 (17)| 00:00:01 |
| 1 | MERGE JOIN | | 5 | 250 | 6 (17)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 13 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | PK_DEPT | 4 | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 15 | 555 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 15 | 555 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("DEPT"."DNAME"='SALES')
4 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
filter("DEPT"."DEPTNO"="EMP"."DEPTNO")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
1089 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed
執行計劃和只查詢職員名基本一樣。
下面換一種思路,如果是查詢工資最高員工的部門名稱,會使用哪種方式呢?
按照最普通的想法,應該是先對emp表全表掃描得到工資最高的員工部門號,再根據部門編號通過索引找到部門名稱。應該還是sort merge join吧。
SQL> select dname from dept where deptno=(select deptno from emp where sal=(select max(sal) from emp));
DNAME
--------------
ACCOUNTING
Execution Plan
----------------------------------------------------------
Plan hash value: 603094018
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 4 (0)|00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | DEPT | 1 | 13 | 1 (0)|00:00:01 |
|* 2 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)|00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID | EMP | 1 | 7 | 2 (0)|00:00:01 |
|* 4 | INDEX RANGE SCAN | IDX_SAL | 1 | | 1 (0)|00:00:01 |
| 5 | SORT AGGREGATE | | 1 | 4 | | |
| 6 | INDEX FULL SCAN (MIN/MAX)| IDX_SAL | 1 | 4 | 1 (0)|00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DEPTNO"= (SELECT "DEPTNO" FROM "EMP" "EMP" WHERE "SAL"= (SELECT
MAX("SAL") FROM "EMP" "EMP")))
4 - access("SAL"= (SELECT MAX("SAL") FROM "EMP" "EMP"))
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
427 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
結果與想法差異很大,這個語句確實不好表示。根據執行計劃,應該是逐層處理的。oracle根據索引idx_sal找到最大值,然後sort aggregate操作,再對idx_sal進行範圍查詢;然後根據索引查詢rowid,這時應該得到了sal最大的職員資訊,在通過主鍵唯一索引找到對應的部門編號,最後通過dept表中deptno列的索引找到rowid。
Execution Plan
----------------------------------------------------------
Plan hash value: 2360289771
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 7 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 13 | 1 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 7 | 3 (0)| 00:00:01 |
| 4 | SORT AGGREGATE | | 1 | 4 | | |
| 5 | TABLE ACCESS FULL | EMP | 15 | 60 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
刪除sal列上的索引後,執行計劃如上。對sort aggregate很是不理解,論壇中給出的答案是在:如果返回值是一行,就會有此操作。
============================================================================================
SQL> create table dbt as select * from dba_tables;
Table created.
SQL> create table dbi as select * from dba_indexes;
Table created.
SQL> select dbi.* from dbi,dbt where dbi.table_name=dbt.table_name and dbt.num_rows>1000;
Execution Plan
----------------------------------------------------------
Plan hash value: 3672305388
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4612 | 5003K| 80 (2)| 00:00:01 |
|* 1 | HASH JOIN | | 4612 | 5003K| 80 (2)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| DBT | 2689 | 61847 | 31 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| DBI | 4603 | 4890K| 48 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DBI"."TABLE_NAME"="DBT"."TABLE_NAME")
2 - filter("DBT"."NUM_ROWS">1000)
Note
-----
- dynamic sampling used for this statement (level=2)
SQL> select * from dbi where table_name=(select table_name from dbt where num_rows>1000);
Execution Plan
----------------------------------------------------------
Plan hash value: 3796313462
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 46 | 50048 | 50 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL | DBI | 46 | 50048 | 48 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| DBT | 2 | 46 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TABLE_NAME"= (SELECT "TABLE_NAME" FROM "DBT" "DBT" WHERE
"NUM_ROWS">1000))
2 - filter("NUM_ROWS">1000)
Note
-----
- dynamic sampling used for this statement (level=2)
由於oracle在生成執行計劃前會對語句進行幾個操作:
對檢視進行謂詞合併,即將對簡單檢視的操作改寫為對錶的操作;
對複雜檢視,如果可以將條件合併為對錶的操作,將進行改寫;
對巢狀語句的改寫,即如果oracle認為巢狀語句能夠改寫為兩個表的合併操作,將進行改寫;
當語句呼叫了物化檢視時,嘗試物化檢視重寫來提高效率。
官方說法是query transformation,包括view merging、predicate pushing、subquery unnesting in subquery nesting、query rewrite with materialized view
上面的語句為什麼沒有進行subquery nesting的操作呢?
由於我們沒有對兩個表進行關聯,oracle不知道兩個表的對應關係,因此無從改寫。
來看下面的例子:
SQL> select * from emp where deptno=(select deptno from dept where dname='SALES');
Execution Plan
----------------------------------------------------------
Plan hash value: 2858400319
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 185 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL | EMP | 5 | 185 | 3 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| DEPT | 1 | 13 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DEPTNO"= (SELECT "DEPTNO" FROM "DEPT" "DEPT" WHERE
"DNAME"='SALES'))
2 - filter("DNAME"='SALES')
SQL> select * from emp where deptno in(select deptno from dept );
Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 555 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| EMP | 15 | 555 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DEPTNO" IS NOT NULL)
SQL> select * from emp where deptno in(select deptno from dept where dname='SALES');
Execution Plan
----------------------------------------------------------
Plan hash value: 844388907
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 250 | 6 (17)| 00:00:01 |
| 1 | MERGE JOIN | | 5 | 250 | 6 (17)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 13 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | PK_DEPT | 4 | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 15 | 555 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 15 | 555 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("DNAME"='SALES')
4 - access("DEPTNO"="DEPTNO")
filter("DEPTNO"="DEPTNO")
通過上面的三個執行計劃可以看到:
使用等號連線兩個巢狀表時,沒有進行巢狀到非巢狀的轉化
當兩個表間存在外來鍵約束時,oracle知道子表中的外來鍵列在主表中肯定是存在的,因此直接進行全表掃描,而不需要再去主表中檢查是否有對應關係了。
使用in操作符連線兩個巢狀表,且oracle知道兩個表間存在依賴關係時,進行了從巢狀查詢到非巢狀查詢的改寫。
更進一步地將,第1、3個例子的結果其實是等價的。但從cost值可以看出很大的差別。
我們的最終目的是成本最低的得到結果,而例1中的成本顯然比例3低,雖然使用了巢狀查詢。
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SQL> select d.deptno ,e.empno from emp e full outer join dept d
2 on e.deptno=d.deptno
3 order by d.deptno;
Execution Plan
----------------------------------------------------------
Plan hash value: 2430776502
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16 | 416 | 7 (29)| 00:00:01 |
| 1 | SORT ORDER BY | | 16 | 416 | 7 (29)| 00:00:01 |
| 2 | VIEW | VW_FOJ_0 | 16 | 416 | 6 (17)| 00:00:01 |
|* 3 | HASH JOIN FULL OUTER | | 16 | 160 | 6 (17)| 00:00:01 |
| 4 | INDEX FAST FULL SCAN| PK_DEPT | 4 | 12 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 15 | 105 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
----
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("E"."DEPTNO"="D"."DEPTNO")
以上是hash join full outer的例子
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/26451536/viewspace-751601/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- nested loop,sort merge join,hash joinOOP
- HASH JOIN ,MERGE JOIN ,NESTED LOOPOOP
- 表的連線方式:NESTED LOOP、HASH JOIN、SORT MERGE JOIN(轉)OOP
- HASH JOIN ,MERGE JOIN ,NESTED LOOP的比較OOP
- HASH JOIN ,MERGE JOIN ,NESTED LOOP(R2)OOP
- Mysql join 的本質,Nested Loop Join 和 Hash JoinMySqlOOP
- NEST LOOP改為HASH JOIN 效能提高6倍OOP
- 多表連線的三種方式詳解 hash join、merge join、 nested loopOOP
- Nested loops、Hash join、Sort merge join(三種連線型別原理、使用要點)OOP型別
- Oracle Sort JoinOracle
- ORACLE Hash JoinOracle
- 排序合併連線(sort merge join)的原理排序
- HASH join詳解
- join、inner join、left join、right join、outer join的區別
- oracle實驗記錄 (計算hash join cost)Oracle
- 【sql調優之執行計劃】merge sort joinSQL
- oracle實驗記錄 (PGA manual or auto 與hash join)Oracle
- 【sql調優之執行計劃】merge semi join and merge anti joinSQL
- sql中的join、left join、right joinSQL
- Oracle中的Hash Join祥解Oracle
- Oracle 的 hash join連線方式Oracle
- Oracle中的Hash Join詳解Oracle
- MySQL LEFT JOIN/ INNER JOIN/RIGHT JOINMySql
- Hash join演算法原理演算法
- PostgreSQL/GreenPlum Merge Inner Join解密SQL解密
- 資料庫實現原理#4(Hash Join)資料庫
- 大表範圍掃描走SORT MERGE JOIN的SQL優化SQL優化
- Oracle表連線操作——Merge Sort Join(合併排序連線)Oracle排序
- sql之left join、right join、inner join的區別SQL
- Oracle中的Hash Join詳解 ztOracle
- Hash join演算法原理(轉)演算法
- oracle hash join演算法原理Oracle演算法
- 資料庫實現原理#1(Nested Loop Join)資料庫OOP
- ZT:簡單介紹join,outer-join,semi-join,anti-join的區別
- 【sql調優之執行計劃】nested loops join and nested loop join outerSQLOOP
- sql的left join 、right join 、inner join之間的區別SQL
- mysql中的left join、right join 、inner join的詳細用法MySql
- mysql left join轉inner joinMySql