exists和not exists及in和not in的用法與區別

不一樣的天空w發表於2017-05-20
一、exists和not exists的用法

exists : 強調的是是否返回結果集,不要求知道返回什麼, 比如: select name from student where sex = 'm' and mark exists(select 1 from grade where ...) ,只要exists引導的子句有結果集返回,那麼exists這個條件就算成立了,大家注意返回的欄位始終為1,如果改成“select 2 from grade where ...”,那麼返回的欄位就是2,這個數字沒有意義。所以exists子句不在乎返回什麼,而是在乎是不是有結果集返回。

而 exists 與 in 最大的區別在於 in引導的子句只能返回一個欄位,比如:
select name from student where sex = 'm' and mark in (select 1,2,3 from grade where ...),in子句返回了三個欄位,這是不正確的,exists子句是允許的,但in只允許有一個欄位返回,在1,2,3中隨便去了兩個欄位即可。

而not exists 和not in 分別是exists 和 in 的 對立面。

exists (sql 返回結果集為真)
not exists (sql 不返回結果集為真)


表A
ID   NAME
1    A1
2    A2
3    A3

表B
ID   AID NAME
1    1   B1
2    2   B2
3    2   B3

表A和表B是1對多的關係 A.ID => B.AID

實驗如下


SQL> create table a (id number,name varchar2(10));

Table created.

SQL> create table b (id number,aid number,name varchar2(10));

Table created.

SQL> insert into a values(1,'A1');

1 row created.

SQL> insert into a values(2,'A2');

1 row created.

SQL> insert into a values(3,'A3');

1 row created.

SQL> insert into b values(1,1,'B1');

1 row created.

SQL> insert into b values(2,2,'B2');

1 row created.

SQL> insert into b values(3,2,'B3');

1 row created.

SQL> commit;

SQL> select id,name from a;

        ID NAME
---------- ----------
         1 A1
         2 A2
         3 A3

SQL> select * from b;

        ID        AID NAME
---------- ---------- ----------
         1          1 B1
         2          2 B2
         3          2 B3

SQL> select * from b,a where a.id=b.aid;

        ID        AID NAME               ID NAME
---------- ---------- ---------- ---------- ----------
         1          1 B1                  1 A1
         3          2 B3                  2 A2
         2          2 B2                  2 A2

SQL> select id,name from a where exists (select 1 from b where a.id=b.aid);

        ID NAME
---------- ----------
         1 A1
         2 A2
         
SQL> select id,name from a where exists (select * from b where a.id=b.aid);

        ID NAME
---------- ----------
         1 A1
         2 A2
         
上述結果分析:原因可以按照如下分析
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=1)
--->SELECT * FROM B WHERE B.AID=1有值返回真所以有資料

SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=2)
--->SELECT * FROM B WHERE B.AID=2有值返回真所以有資料

SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=3)
--->SELECT * FROM B WHERE B.AID=3無值返回真所以沒有資料

2.NOT EXISTS 就是反過來         
SQL> select id,name from a where not exists (select * from b where a.id=b.aid);

        ID NAME
---------- ----------         3 A3

3.in /not in
EXISTS = IN,意思相同不過語法上有點點區別,好像使用IN效率要差點,應該是不會執行索引的原因
SQL> SELECT ID,NAME FROM A WHERE ID IN (SELECT AID FROM B);

        ID NAME
---------- --------------------
         1 A1
         2 A2

NOT EXISTS = NOT IN ,意思相同不過語法上有點點區別
SQL> SELECT ID,NAME FROM A WHERE ID NOT IN(SELECT AID FROM B);

        ID NAME
---------- --------------------
         3 A3

二、in與exist , not in與not exist 的區別


in和exists   in 是把外表和內表作hash 連線,而exists是對外表作loop迴圈,每次loop迴圈再對內表進行查詢。一直以來認為exists比in效率高的說法是不準確的。  

如果查詢的兩個表大小相當,那麼用in和exists差別不大。  
如果兩個表中一個較小,一個是大表,則子查詢表大的用exists,子查詢表小的用in:  
例如:表A(小表),表B(大表)
1:select * from A where cc in (select cc from B)  
效率低,用到了A表上cc列的索引;
select * from A where exists(select cc from B where cc=A.cc)  
效率高,用到了B表上cc列的索引。
 
相反的2:
select * from B where cc in (select cc from A)  
效率高,用到了B表上cc列的索引;
select * from B where exists(select cc from A where cc=B.cc)  
效率低,用到了A表上cc列的索引。  
 
not in 和not exists  
 
如果查詢語句使用了not in 那麼內外表都進行全表掃描,沒有用到索引;而not extsts 的子查詢依然能用到表上的索引。所以無論那個表大,用not exists都比not in要快。  

not in 邏輯上不完全等同於not exists,如果你誤用了not in,小心你的程式存在致命的BUG:  
 
請看下面的例子:  
create table t1 (c1 number,c2 number);  
create table t2 (c1 number,c2 number);  
 
insert into t1 values (1,2);  
insert into t1 values (1,3);  
insert into t2 values (1,2);  
insert into t2 values (1,null);      

SQL> select * from t1;

        C1         C2
---------- ----------
         1          2
         1          3

SQL>
SQL> select * from t2;

        C1         C2
---------- ----------
         1          2
         1

--查詢如下:
SQL> select * from t1 where c2 not in (select c2 from t2);  

no rows selected

SQL> select * from t1 where not exists (select 1 from t2 where t1.c2=t2.c2);

        C1         C2
---------- ----------
         1          3
 
正如所看到的,not in 出現了不期望的結果集,存在邏輯錯誤。如果看一下上述兩個select語句的執行計劃,也會不同。後者使用了hash_aj。  
因此,請儘量不要使用not in(它會呼叫子查詢),而儘量使用not exists(它會呼叫關聯子查詢)。如果子查詢中返回的任意一條記錄含有空值,則查詢將不返回任何記錄,正如上面例子所示。  除非子查詢欄位有非空限制,這時可以使用not in ,並且也可以透過提示讓它使用hasg_aj或merge_aj連線 

三、實驗測試in,exists,not in, not exists

建立測試表test1/test2:

SQL> create table test1 as select object_id,object_name,object_type from dba_objects;

Table created.

SQL> select count(1) from test1;

  COUNT(1)
----------
     87042

SQL> create table test2 as select object_id,object_name,object_type from dba_objects where rownum <5000;

Table created.

SQL> select count(1) from test2;

  COUNT(1)
----------
      4999
      
SQL> alter table test1 add constraint pk_test1 primary key(object_id);

Table altered.

SQL> alter table test2 add constraint pk_test2 primary key(object_id);

Table altered.

SQL> create index idx_test1_name on test1(object_name);

Index created.

SQL> create index idx_test2_name on test2(object_name);

Index created.

SQL>
SQL> select TABLE_OWNER,table_name,index_name,status from dba_indexes where table_name like 'TEST%';

TABLE_OWNER                    TABLE_NAME                     INDEX_NAME                     STATUS
------------------------------ ------------------------------ ------------------------------ --------
HR                             TEST1                          PK_TEST1                       VALID
HR                             TEST1                          IDX_TEST1_NAME                 VALID
HR                             TEST2                          PK_TEST2                       VALID
HR                             TEST2                          IDX_TEST2_NAME                 VALID

SQL> col COLUMN_NAME for a15
SQL> select TABLE_OWNER,TABLE_NAME,INDEX_NAME,COLUMN_NAME,COLUMN_POSITION from dba_ind_columns where table_name like 'TEST%';

TABLE_OWNER                    TABLE_NAME                     INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ ------------------------------ ------------------------------ --------------- ---------------
HR                             TEST1                          PK_TEST1                       OBJECT_ID                     1
HR                             TEST1                          IDX_TEST1_NAME                 OBJECT_NAME                   1
HR                             TEST2                          PK_TEST2                       OBJECT_ID                     1
HR                             TEST2                          IDX_TEST2_NAME                 OBJECT_NAME                   1

1.in & exists 測試

select a.* from test1 a where a.object_name in (select b.object_name from test2 b );

select a.* from test1 a where exists (select 1 from test2 b where a.object_name=b.object_name);

1.1 in 的測試
SQL>  set autot trace;
SQL> set lines 200
SQL> select a.* from test1 a where a.object_name in (select b.object_name from test2 b );

5017 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 311842540
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                | 41686 |  6350K|   158   (1)| 00:00:02 |
|*  1 |  HASH JOIN RIGHT SEMI |                | 41686 |  6350K|   158   (1)| 00:00:02 |
|   2 |   INDEX FAST FULL SCAN| IDX_TEST2_NAME |  4999 |   322K|     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL   | TEST1          | 93550 |  8222K|   149   (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
         26  recursive calls
          0  db block gets
       1009  consistent gets
         20  physical reads
          0  redo size
     205691  bytes sent via SQL*Net to client
       4193  bytes received via SQL*Net from client
        336  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       5017  rows processed


1.2 exists的測試
SQL> select a.* from test1 a where exists (select 1 from test2 b where a.object_name=b.object_name);

5017 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 311842540
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                | 41686 |  6350K|   158   (1)| 00:00:02 |
|*  1 |  HASH JOIN RIGHT SEMI |                | 41686 |  6350K|   158   (1)| 00:00:02 |
|   2 |   INDEX FAST FULL SCAN| IDX_TEST2_NAME |  4999 |   322K|     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL   | TEST1          | 93550 |  8222K|   149   (1)| 00:00:02 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
--------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        989  consistent gets
          0  physical reads
          0  redo size
     205691  bytes sent via SQL*Net to client
       4193  bytes received via SQL*Net from client
        336  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       5017  rows processed

1.3總結
透過比較執行計劃我們可以發現in和exists的執行計劃基本是一致的,都以小表(test2)作為外部迴圈表,然後大表(test1)作為內部迴圈表,所以執行效率是一樣的。

注意: 這裡我們把test1和test2的位置交換一下oracle依然能做出正確的判斷,以小表作為外部迴圈表:

select a.* from test2 a where a.object_name in (select b.object_name from test1 b );

select a.* from test2 a where exists (select 1 from test1 b where b.object_name=a.object_name);


SQL> select a.* from test2 a where a.object_name in (select b.object_name from test1 b );

4999 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2045358773
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |  4999 |   761K|   135   (1)| 00:00:02 |
|*  1 |  HASH JOIN SEMI       |                |  4999 |   761K|   135   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL   | TEST2          |  4999 |   439K|     9   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 |  6029K|   125   (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        836  consistent gets
        369  physical reads
          0  redo size
     190890  bytes sent via SQL*Net to client
       4182  bytes received via SQL*Net from client
        335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       4999  rows processed

SQL>
SQL> select a.* from test2 a where exists (select 1 from test1 b where b.object_name=a.object_name);

4999 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2045358773
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |  4999 |   761K|   135   (1)| 00:00:02 |
|*  1 |  HASH JOIN SEMI       |                |  4999 |   761K|   135   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL   | TEST2          |  4999 |   439K|     9   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 |  6029K|   125   (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("B"."OBJECT_NAME"="A"."OBJECT_NAME")
Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        836  consistent gets
          0  physical reads
          0  redo size
     190890  bytes sent via SQL*Net to client
       4182  bytes received via SQL*Net from client
        335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       4999  rows processed

2. not in & not exists 測試
2.1 not in 測試

select a.* from test1 a where a.object_name not in (select b.object_name from test2 b );

select a.* from test2 a where a.object_name not in (select b.object_name from test1 b );

SQL> select a.* from test1 a where a.object_name not in (select b.object_name from test2 b );

82025 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3618458784
---------------------------------------------------------------------------------
| Id  | Operation               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |       | 93550 |    13M|   158   (1)| 00:00:02 |
|*  1 |  HASH JOIN RIGHT ANTI NA|       | 93550 |    13M|   158   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL     | TEST2 |  4999 |   322K|     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL     | TEST1 | 93550 |  8222K|   149   (1)| 00:00:02 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
       6089  consistent gets
          0  physical reads
          0  redo size
    4297213  bytes sent via SQL*Net to client
      60667  bytes received via SQL*Net from client
       5470  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      82025  rows processed

SQL>
SQL> select a.* from test2 a where a.object_name not in (select b.object_name from test1 b );

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1072756877
----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    50 |  2600 |   158   (1)| 00:00:02 |
|*  1 |  HASH JOIN ANTI NA |       |    50 |  2600 |   158   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL| TEST2 |  4999 |   131K|     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| TEST1 | 87042 |  2125K|   149   (1)| 00:00:02 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        551  consistent gets
         17  physical reads
          0  redo size
        483  bytes sent via SQL*Net to client
        508  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed


2.2 not exists 測試

select a.* from test1 a where not exists (select 1 from test2 b where a.object_name=b.object_name);

select a.* from test2 a where not exists (select 1 from test1 b where a.object_name=b.object_name);

SQL> select a.* from test1 a where not exists (select 1 from test2 b where a.object_name=b.object_name);

82025 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3712868752
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                | 93550 |    13M|   158   (1)| 00:00:02 |
|*  1 |  HASH JOIN RIGHT ANTI |                | 93550 |    13M|   158   (1)| 00:00:02 |
|   2 |   INDEX FAST FULL SCAN| IDX_TEST2_NAME |  4999 |   322K|     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL   | TEST1          | 93550 |  8222K|   149   (1)| 00:00:02 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
       6088  consistent gets
          0  physical reads
          0  redo size
    4297213  bytes sent via SQL*Net to client
      60667  bytes received via SQL*Net from client
       5470  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      82025  rows processed

SQL>  select a.* from test2 a where not exists (select 1 from test1 b where a.object_name=b.object_name);

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 319202148
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |  4999 |   761K|   135   (1)| 00:00:02 |
|*  1 |  HASH JOIN ANTI       |                |  4999 |   761K|   135   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL   | TEST2          |  4999 |   439K|     9   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 |  6029K|   125   (0)| 00:00:02 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        572  consistent gets
          0  physical reads
          0  redo size
        483  bytes sent via SQL*Net to client
        508  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

2.3 總結
透過觀察這個兩個執行計劃我們可以清楚的看到not in 用的是filter來連線兩張表,而not exists 用的hash join 來連線兩張表。對於只有兩個子節點的filter來說,我們可以把這個filter看成是一個nested loop。

透過執行計劃我們可以清楚的看到,oracle把test2表的結果集作為構造輸入(build input),test1的結果集作為探測資料(probe input),然後把這兩個子集做hash join。
(對於hash join 一般把小表作為build input,這些資料將存放在PGA記憶體中,如果記憶體不夠將放在temporary tablespace)


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/31397003/viewspace-2139624/,如需轉載,請註明出處,否則將追究法律責任。

相關文章