hash join\nest loop join\sort merge join的實驗

to_be_Dba發表於2012-12-25

前幾天看了join相關的知識,做了個實驗,今天整理看到了,貼出來與大家探討。分析部分是自己看文件後想的,不知道對不對。

以下結果都是執行多次,消除物理讀後的結果

SQL> select ename from emp,dept where dept.dname='SALES' and dept.deptno=emp.deptno;

ENAME
----------
……

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 844388907

----------------------------------------------------------------------------------------

| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |         |     5 |   110 |     6  (17)| 00:00:01 |

|   1 |  MERGE JOIN                  |         |     5 |   110 |     6  (17)| 00:00:01 |

|*  2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    13 |     2   (0)| 00:00:01 |

|   3 |    INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)| 00:00:01 |

|*  4 |   SORT JOIN                  |         |    15 |   135 |     4  (25)| 00:00:01 |

|   5 |    TABLE ACCESS FULL         | EMP     |    15 |   135 |     3   (0)| 00:00:01 |

----------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DEPT"."DNAME"='SALES')
   4 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
       filter("DEPT"."DEPTNO"="EMP"."DEPTNO")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        508  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          6  rows processed


預設情況下選擇了sort merge join。

SQL> select /*+use_nl(emp,dept)*/ename from emp,dept where dept.dname='SALES' and dept.deptno=emp.deptno;

ENAME
----------
……

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4192419542

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     5 |   110 |     6   (0)| 00:00:01 |
|   1 |  NESTED LOOPS      |      |     5 |   110 |     6   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| DEPT |     1 |    13 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| EMP  |     5 |    45 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DEPT"."DNAME"='SALES')
   3 - filter("DEPT"."DEPTNO"="EMP"."DEPTNO")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         12  consistent gets
          0  physical reads
          0  redo size
        508  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

來看看hash join的效果:
SQL>  select /*+use_hash(emp,dept)*/ename from emp,dept where dept.dname='SALES' and dept.deptno=emp.deptno;

ENAME
----------
BLAKE
JAMES
ALLEN
WARD
MARTIN
TURNER

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 615168685

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     5 |   110 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN         |      |     5 |   110 |     7  (15)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| DEPT |     1 |    13 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| EMP  |    15 |   135 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
   2 - filter("DEPT"."DNAME"='SALES')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        508  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed


可以看到,第一種方式的consistent gets值是最小的。可以這樣理解:
我們的目的是以部門名為條件,根據部門編號到職員表中得到某一個部門的職員名。
有三種方法可以實現:
(1)sort merge join:先將兩個表分別排序,然後按照外來鍵值deptno來對應。先對emp表按照deptno進行排序,然後根據‘SALES’通過索引找到對應的部門編號30,再將已經排序的emp中相應資料輸出。由於emp、dept表存在確定的對應關係,即外來鍵約束,因此使用這種方式是比較高效的。
(2)nest loop join:相當於對兩個表分別做了一次迴圈,對兩個表做全表掃描。由於dept表中對30以外的部門編號不需要查詢,這時候對deptno全表掃描就會多付出一定的成本。
(3)hash join:oracle根據對行的評估,認為dept表中資料較少,將dept表滿足條件的資料(即deptno=30的資料)讀入記憶體,稱為hash表,然後與磁碟上的emp表匹配,得到結果。


再來看看查詢emp全表資料的情況:
select emp.*  from emp,dept where dept.dname='SALES' and dept.deptno=emp.deptno;

……

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 844388907

----------------------------------------------------------------------------------------

| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |         |     5 |   250 |     6  (17)| 00:00:01 |

|   1 |  MERGE JOIN                  |         |     5 |   250 |     6  (17)| 00:00:01 |

|*  2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    13 |     2   (0)| 00:00:01 |

|   3 |    INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)| 00:00:01 |

|*  4 |   SORT JOIN                  |         |    15 |   555 |     4  (25)| 00:00:01 |

|   5 |    TABLE ACCESS FULL         | EMP     |    15 |   555 |     3   (0)| 00:00:01 |

----------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DEPT"."DNAME"='SALES')
   4 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
       filter("DEPT"."DEPTNO"="EMP"."DEPTNO")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       1089  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          6  rows processed

執行計劃和只查詢職員名基本一樣。
下面換一種思路,如果是查詢工資最高員工的部門名稱,會使用哪種方式呢?
按照最普通的想法,應該是先對emp表全表掃描得到工資最高的員工部門號,再根據部門編號通過索引找到部門名稱。應該還是sort merge join吧。

SQL> select dname from dept where deptno=(select deptno from emp where sal=(select max(sal) from emp));

DNAME
--------------
ACCOUNTING


Execution Plan
----------------------------------------------------------
Plan hash value: 603094018

------------------------------------------------------------------------------------------

| Id  | Operation                      | Name    | Rows  | Bytes | Cost (%CPU)|Time     |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT               |         |     1 |    13 |     4   (0)|00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID   | DEPT    |     1 |    13 |     1   (0)|00:00:01 |

|*  2 |   INDEX UNIQUE SCAN            | PK_DEPT |     1 |       |     0   (0)|00:00:01 |

|   3 |    TABLE ACCESS BY INDEX ROWID | EMP     |     1 |     7 |     2   (0)|00:00:01 |

|*  4 |     INDEX RANGE SCAN           | IDX_SAL |     1 |       |     1   (0)|00:00:01 |

|   5 |      SORT AGGREGATE            |         |     1 |     4 |            |         |

|   6 |       INDEX FULL SCAN (MIN/MAX)| IDX_SAL |     1 |     4 |     1   (0)|00:00:01 |

------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPTNO"= (SELECT "DEPTNO" FROM "EMP" "EMP" WHERE "SAL"= (SELECT
              MAX("SAL") FROM "EMP" "EMP")))
   4 - access("SAL"= (SELECT MAX("SAL") FROM "EMP" "EMP"))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        427  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

結果與想法差異很大,這個語句確實不好表示。根據執行計劃,應該是逐層處理的。oracle根據索引idx_sal找到最大值,然後sort aggregate操作,再對idx_sal進行範圍查詢;然後根據索引查詢rowid,這時應該得到了sal最大的職員資訊,在通過主鍵唯一索引找到對應的部門編號,最後通過dept表中deptno列的索引找到rowid。

Execution Plan
----------------------------------------------------------
Plan hash value: 2360289771

---------------------------------------------------------------------------------------

| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |         |     1 |    13 |     7   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    13 |     1   (0)| 00:00:01 |

|*  2 |   INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |

|*  3 |    TABLE ACCESS FULL        | EMP     |     1 |     7 |     3   (0)| 00:00:01 |

|   4 |     SORT AGGREGATE          |         |     1 |     4 |            |      |

|   5 |      TABLE ACCESS FULL      | EMP     |    15 |    60 |     3   (0)| 00:00:01 |

---------------------------------------------------------------------------------------

刪除sal列上的索引後,執行計劃如上。對sort aggregate很是不理解,論壇中給出的答案是在:如果返回值是一行,就會有此操作。

 

 

============================================================================================
SQL> create table dbt as select * from dba_tables;

Table created.

SQL> create table dbi as select * from dba_indexes;

Table created.


SQL> select dbi.* from dbi,dbt where dbi.table_name=dbt.table_name and dbt.num_rows>1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3672305388

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  4612 |  5003K|    80   (2)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  4612 |  5003K|    80   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| DBT  |  2689 | 61847 |    31   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| DBI  |  4603 |  4890K|    48   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DBI"."TABLE_NAME"="DBT"."TABLE_NAME")
   2 - filter("DBT"."NUM_ROWS">1000)

Note
-----
   - dynamic sampling used for this statement (level=2)


SQL> select * from dbi where table_name=(select table_name from dbt where num_rows>1000);

Execution Plan
----------------------------------------------------------
Plan hash value: 3796313462

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    46 | 50048 |    50   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL | DBI  |    46 | 50048 |    48   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| DBT  |     2 |    46 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TABLE_NAME"= (SELECT "TABLE_NAME" FROM "DBT" "DBT" WHERE
              "NUM_ROWS">1000))
   2 - filter("NUM_ROWS">1000)

Note
-----
   - dynamic sampling used for this statement (level=2)

由於oracle在生成執行計劃前會對語句進行幾個操作:
對檢視進行謂詞合併,即將對簡單檢視的操作改寫為對錶的操作;
對複雜檢視,如果可以將條件合併為對錶的操作,將進行改寫;
對巢狀語句的改寫,即如果oracle認為巢狀語句能夠改寫為兩個表的合併操作,將進行改寫;
當語句呼叫了物化檢視時,嘗試物化檢視重寫來提高效率。


官方說法是query transformation,包括view merging、predicate pushing、subquery unnesting in subquery nesting、query rewrite with materialized view

上面的語句為什麼沒有進行subquery nesting的操作呢?
由於我們沒有對兩個表進行關聯,oracle不知道兩個表的對應關係,因此無從改寫。
來看下面的例子:


SQL> select * from emp where deptno=(select deptno from dept where dname='SALES');

Execution Plan
----------------------------------------------------------
Plan hash value: 2858400319

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     5 |   185 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL | EMP  |     5 |   185 |     3   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| DEPT |     1 |    13 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DEPTNO"= (SELECT "DEPTNO" FROM "DEPT" "DEPT" WHERE
              "DNAME"='SALES'))
   2 - filter("DNAME"='SALES')

SQL> select * from emp where deptno in(select deptno from dept );

Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    15 |   555 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |    15 |   555 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DEPTNO" IS NOT NULL)

SQL>  select * from emp where deptno in(select deptno from dept where dname='SALES');

Execution Plan
----------------------------------------------------------
Plan hash value: 844388907

----------------------------------------------------------------------------------------

| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |         |     5 |   250 |     6  (17)| 00:00:01 |

|   1 |  MERGE JOIN                  |         |     5 |   250 |     6  (17)| 00:00:01 |

|*  2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    13 |     2   (0)| 00:00:01 |

|   3 |    INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)| 00:00:01 |

|*  4 |   SORT JOIN                  |         |    15 |   555 |     4  (25)| 00:00:01 |

|   5 |    TABLE ACCESS FULL         | EMP     |    15 |   555 |     3   (0)| 00:00:01 |

----------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DNAME"='SALES')
   4 - access("DEPTNO"="DEPTNO")
       filter("DEPTNO"="DEPTNO")


通過上面的三個執行計劃可以看到:
使用等號連線兩個巢狀表時,沒有進行巢狀到非巢狀的轉化
當兩個表間存在外來鍵約束時,oracle知道子表中的外來鍵列在主表中肯定是存在的,因此直接進行全表掃描,而不需要再去主表中檢查是否有對應關係了。
使用in操作符連線兩個巢狀表,且oracle知道兩個表間存在依賴關係時,進行了從巢狀查詢到非巢狀查詢的改寫。

更進一步地將,第1、3個例子的結果其實是等價的。但從cost值可以看出很大的差別。
我們的最終目的是成本最低的得到結果,而例1中的成本顯然比例3低,雖然使用了巢狀查詢。


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SQL> select d.deptno ,e.empno from emp e full outer join dept d
  2  on e.deptno=d.deptno
  3  order by d.deptno;

Execution Plan
----------------------------------------------------------
Plan hash value: 2430776502

------------------------------------------------------------------------------------

| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time   |

------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT        |          |    16 |   416 |     7  (29)| 00:00:01 |

|   1 |  SORT ORDER BY          |          |    16 |   416 |     7  (29)| 00:00:01 |

|   2 |   VIEW                  | VW_FOJ_0 |    16 |   416 |     6  (17)| 00:00:01 |

|*  3 |    HASH JOIN FULL OUTER |          |    16 |   160 |     6  (17)| 00:00:01 |

|   4 |     INDEX FAST FULL SCAN| PK_DEPT  |     4 |    12 |     2   (0)| 00:00:01 |

|   5 |     TABLE ACCESS FULL   | EMP      |    15 |   105 |     3   (0)| 00:00:01 |

--------------------------------------------------------------------------------
----


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("E"."DEPTNO"="D"."DEPTNO")

以上是hash join full outer的例子

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/26451536/viewspace-751601/,如需轉載,請註明出處,否則將追究法律責任。

相關文章