exists和not exists及in和not in的用法與區別
一、exists和not exists的用法
exists : 強調的是是否返回結果集,不要求知道返回什麼, 比如: select name from student where sex = 'm' and mark exists(select 1 from grade where ...) ,只要exists引導的子句有結果集返回,那麼exists這個條件就算成立了,大家注意返回的欄位始終為1,如果改成“select 2 from grade where ...”,那麼返回的欄位就是2,這個數字沒有意義。所以exists子句不在乎返回什麼,而是在乎是不是有結果集返回。
而 exists 與 in 最大的區別在於 in引導的子句只能返回一個欄位,比如:
select name from student where sex = 'm' and mark in (select 1,2,3 from grade where ...),in子句返回了三個欄位,這是不正確的,exists子句是允許的,但in只允許有一個欄位返回,在1,2,3中隨便去了兩個欄位即可。
而not exists 和not in 分別是exists 和 in 的 對立面。
exists (sql 返回結果集為真)
not exists (sql 不返回結果集為真)
表A
ID NAME
1 A1
2 A2
3 A3
表B
ID AID NAME
1 1 B1
2 2 B2
3 2 B3
表A和表B是1對多的關係 A.ID => B.AID
實驗如下
SQL> create table a (id number,name varchar2(10));
Table created.
SQL> create table b (id number,aid number,name varchar2(10));
Table created.
SQL> insert into a values(1,'A1');
1 row created.
SQL> insert into a values(2,'A2');
1 row created.
SQL> insert into a values(3,'A3');
1 row created.
SQL> insert into b values(1,1,'B1');
1 row created.
SQL> insert into b values(2,2,'B2');
1 row created.
SQL> insert into b values(3,2,'B3');
1 row created.
SQL> commit;
SQL> select id,name from a;
ID NAME
---------- ----------
1 A1
2 A2
3 A3
SQL> select * from b;
ID AID NAME
---------- ---------- ----------
1 1 B1
2 2 B2
3 2 B3
SQL> select * from b,a where a.id=b.aid;
ID AID NAME ID NAME
---------- ---------- ---------- ---------- ----------
1 1 B1 1 A1
3 2 B3 2 A2
2 2 B2 2 A2
SQL> select id,name from a where exists (select 1 from b where a.id=b.aid);
ID NAME
---------- ----------
1 A1
2 A2
SQL> select id,name from a where exists (select * from b where a.id=b.aid);
ID NAME
---------- ----------
1 A1
2 A2
上述結果分析:原因可以按照如下分析
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=1)
--->SELECT * FROM B WHERE B.AID=1有值返回真所以有資料
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=2)
--->SELECT * FROM B WHERE B.AID=2有值返回真所以有資料
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=3)
--->SELECT * FROM B WHERE B.AID=3無值返回真所以沒有資料
2.NOT EXISTS 就是反過來
SQL> select id,name from a where not exists (select * from b where a.id=b.aid);
ID NAME
---------- ---------- 3 A3
3.in /not in
EXISTS = IN,意思相同不過語法上有點點區別,好像使用IN效率要差點,應該是不會執行索引的原因
SQL> SELECT ID,NAME FROM A WHERE ID IN (SELECT AID FROM B);
ID NAME
---------- --------------------
1 A1
2 A2
NOT EXISTS = NOT IN ,意思相同不過語法上有點點區別
SQL> SELECT ID,NAME FROM A WHERE ID NOT IN(SELECT AID FROM B);
ID NAME
---------- --------------------
3 A3
二、in與exist , not in與not exist 的區別
in和exists in 是把外表和內表作hash 連線,而exists是對外表作loop迴圈,每次loop迴圈再對內表進行查詢。一直以來認為exists比in效率高的說法是不準確的。
如果查詢的兩個表大小相當,那麼用in和exists差別不大。
如果兩個表中一個較小,一個是大表,則子查詢表大的用exists,子查詢表小的用in:
例如:表A(小表),表B(大表)
1:select * from A where cc in (select cc from B)
效率低,用到了A表上cc列的索引;
select * from A where exists(select cc from B where cc=A.cc)
效率高,用到了B表上cc列的索引。
相反的2:
select * from B where cc in (select cc from A)
效率高,用到了B表上cc列的索引;
select * from B where exists(select cc from A where cc=B.cc)
效率低,用到了A表上cc列的索引。
not in 和not exists
如果查詢語句使用了not in 那麼內外表都進行全表掃描,沒有用到索引;而not extsts 的子查詢依然能用到表上的索引。所以無論那個表大,用not exists都比not in要快。
not in 邏輯上不完全等同於not exists,如果你誤用了not in,小心你的程式存在致命的BUG:
請看下面的例子:
create table t1 (c1 number,c2 number);
create table t2 (c1 number,c2 number);
insert into t1 values (1,2);
insert into t1 values (1,3);
insert into t2 values (1,2);
insert into t2 values (1,null);
SQL> select * from t1;
C1 C2
---------- ----------
1 2
1 3
SQL>
SQL> select * from t2;
C1 C2
---------- ----------
1 2
1
--查詢如下:
SQL> select * from t1 where c2 not in (select c2 from t2);
no rows selected
SQL> select * from t1 where not exists (select 1 from t2 where t1.c2=t2.c2);
C1 C2
---------- ----------
1 3
正如所看到的,not in 出現了不期望的結果集,存在邏輯錯誤。如果看一下上述兩個select語句的執行計劃,也會不同。後者使用了hash_aj。
因此,請儘量不要使用not in(它會呼叫子查詢),而儘量使用not exists(它會呼叫關聯子查詢)。如果子查詢中返回的任意一條記錄含有空值,則查詢將不返回任何記錄,正如上面例子所示。 除非子查詢欄位有非空限制,這時可以使用not in ,並且也可以透過提示讓它使用hasg_aj或merge_aj連線
三、實驗測試in,exists,not in, not exists
建立測試表test1/test2:
SQL> create table test1 as select object_id,object_name,object_type from dba_objects;
Table created.
SQL> select count(1) from test1;
COUNT(1)
----------
87042
SQL> create table test2 as select object_id,object_name,object_type from dba_objects where rownum <5000;
Table created.
SQL> select count(1) from test2;
COUNT(1)
----------
4999
SQL> alter table test1 add constraint pk_test1 primary key(object_id);
Table altered.
SQL> alter table test2 add constraint pk_test2 primary key(object_id);
Table altered.
SQL> create index idx_test1_name on test1(object_name);
Index created.
SQL> create index idx_test2_name on test2(object_name);
Index created.
SQL>
SQL> select TABLE_OWNER,table_name,index_name,status from dba_indexes where table_name like 'TEST%';
TABLE_OWNER TABLE_NAME INDEX_NAME STATUS
------------------------------ ------------------------------ ------------------------------ --------
HR TEST1 PK_TEST1 VALID
HR TEST1 IDX_TEST1_NAME VALID
HR TEST2 PK_TEST2 VALID
HR TEST2 IDX_TEST2_NAME VALID
SQL> col COLUMN_NAME for a15
SQL> select TABLE_OWNER,TABLE_NAME,INDEX_NAME,COLUMN_NAME,COLUMN_POSITION from dba_ind_columns where table_name like 'TEST%';
TABLE_OWNER TABLE_NAME INDEX_NAME COLUMN_NAME COLUMN_POSITION
------------------------------ ------------------------------ ------------------------------ --------------- ---------------
HR TEST1 PK_TEST1 OBJECT_ID 1
HR TEST1 IDX_TEST1_NAME OBJECT_NAME 1
HR TEST2 PK_TEST2 OBJECT_ID 1
HR TEST2 IDX_TEST2_NAME OBJECT_NAME 1
1.in & exists 測試
select a.* from test1 a where a.object_name in (select b.object_name from test2 b );
select a.* from test1 a where exists (select 1 from test2 b where a.object_name=b.object_name);
1.1 in 的測試
SQL> set autot trace;
SQL> set lines 200
SQL> select a.* from test1 a where a.object_name in (select b.object_name from test2 b );
5017 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 311842540
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 41686 | 6350K| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT SEMI | | 41686 | 6350K| 158 (1)| 00:00:02 |
| 2 | INDEX FAST FULL SCAN| IDX_TEST2_NAME | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
26 recursive calls
0 db block gets
1009 consistent gets
20 physical reads
0 redo size
205691 bytes sent via SQL*Net to client
4193 bytes received via SQL*Net from client
336 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5017 rows processed
1.2 exists的測試
SQL> select a.* from test1 a where exists (select 1 from test2 b where a.object_name=b.object_name);
5017 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 311842540
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 41686 | 6350K| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT SEMI | | 41686 | 6350K| 158 (1)| 00:00:02 |
| 2 | INDEX FAST FULL SCAN| IDX_TEST2_NAME | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
989 consistent gets
0 physical reads
0 redo size
205691 bytes sent via SQL*Net to client
4193 bytes received via SQL*Net from client
336 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5017 rows processed
1.3總結
透過比較執行計劃我們可以發現in和exists的執行計劃基本是一致的,都以小表(test2)作為外部迴圈表,然後大表(test1)作為內部迴圈表,所以執行效率是一樣的。
注意: 這裡我們把test1和test2的位置交換一下oracle依然能做出正確的判斷,以小表作為外部迴圈表:
select a.* from test2 a where a.object_name in (select b.object_name from test1 b );
select a.* from test2 a where exists (select 1 from test1 b where b.object_name=a.object_name);
SQL> select a.* from test2 a where a.object_name in (select b.object_name from test1 b );
4999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2045358773
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4999 | 761K| 135 (1)| 00:00:02 |
|* 1 | HASH JOIN SEMI | | 4999 | 761K| 135 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 439K| 9 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 | 6029K| 125 (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
836 consistent gets
369 physical reads
0 redo size
190890 bytes sent via SQL*Net to client
4182 bytes received via SQL*Net from client
335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4999 rows processed
SQL>
SQL> select a.* from test2 a where exists (select 1 from test1 b where b.object_name=a.object_name);
4999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2045358773
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4999 | 761K| 135 (1)| 00:00:02 |
|* 1 | HASH JOIN SEMI | | 4999 | 761K| 135 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 439K| 9 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 | 6029K| 125 (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."OBJECT_NAME"="A"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
836 consistent gets
0 physical reads
0 redo size
190890 bytes sent via SQL*Net to client
4182 bytes received via SQL*Net from client
335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4999 rows processed
2. not in & not exists 測試
2.1 not in 測試
select a.* from test1 a where a.object_name not in (select b.object_name from test2 b );
select a.* from test2 a where a.object_name not in (select b.object_name from test1 b );
SQL> select a.* from test1 a where a.object_name not in (select b.object_name from test2 b );
82025 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3618458784
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 93550 | 13M| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT ANTI NA| | 93550 | 13M| 158 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
6089 consistent gets
0 physical reads
0 redo size
4297213 bytes sent via SQL*Net to client
60667 bytes received via SQL*Net from client
5470 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
82025 rows processed
SQL>
SQL> select a.* from test2 a where a.object_name not in (select b.object_name from test1 b );
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1072756877
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 2600 | 158 (1)| 00:00:02 |
|* 1 | HASH JOIN ANTI NA | | 50 | 2600 | 158 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL| TEST2 | 4999 | 131K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST1 | 87042 | 2125K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
551 consistent gets
17 physical reads
0 redo size
483 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
2.2 not exists 測試
select a.* from test1 a where not exists (select 1 from test2 b where a.object_name=b.object_name);
select a.* from test2 a where not exists (select 1 from test1 b where a.object_name=b.object_name);
SQL> select a.* from test1 a where not exists (select 1 from test2 b where a.object_name=b.object_name);
82025 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3712868752
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 93550 | 13M| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT ANTI | | 93550 | 13M| 158 (1)| 00:00:02 |
| 2 | INDEX FAST FULL SCAN| IDX_TEST2_NAME | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
6088 consistent gets
0 physical reads
0 redo size
4297213 bytes sent via SQL*Net to client
60667 bytes received via SQL*Net from client
5470 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
82025 rows processed
SQL> select a.* from test2 a where not exists (select 1 from test1 b where a.object_name=b.object_name);
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 319202148
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4999 | 761K| 135 (1)| 00:00:02 |
|* 1 | HASH JOIN ANTI | | 4999 | 761K| 135 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 439K| 9 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 | 6029K| 125 (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
572 consistent gets
0 physical reads
0 redo size
483 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
2.3 總結
透過觀察這個兩個執行計劃我們可以清楚的看到not in 用的是filter來連線兩張表,而not exists 用的hash join 來連線兩張表。對於只有兩個子節點的filter來說,我們可以把這個filter看成是一個nested loop。
透過執行計劃我們可以清楚的看到,oracle把test2表的結果集作為構造輸入(build input),test1的結果集作為探測資料(probe input),然後把這兩個子集做hash join。
(對於hash join 一般把小表作為build input,這些資料將存放在PGA記憶體中,如果記憶體不夠將放在temporary tablespace)
exists : 強調的是是否返回結果集,不要求知道返回什麼, 比如: select name from student where sex = 'm' and mark exists(select 1 from grade where ...) ,只要exists引導的子句有結果集返回,那麼exists這個條件就算成立了,大家注意返回的欄位始終為1,如果改成“select 2 from grade where ...”,那麼返回的欄位就是2,這個數字沒有意義。所以exists子句不在乎返回什麼,而是在乎是不是有結果集返回。
而 exists 與 in 最大的區別在於 in引導的子句只能返回一個欄位,比如:
select name from student where sex = 'm' and mark in (select 1,2,3 from grade where ...),in子句返回了三個欄位,這是不正確的,exists子句是允許的,但in只允許有一個欄位返回,在1,2,3中隨便去了兩個欄位即可。
而not exists 和not in 分別是exists 和 in 的 對立面。
exists (sql 返回結果集為真)
not exists (sql 不返回結果集為真)
表A
ID NAME
1 A1
2 A2
3 A3
表B
ID AID NAME
1 1 B1
2 2 B2
3 2 B3
表A和表B是1對多的關係 A.ID => B.AID
實驗如下
SQL> create table a (id number,name varchar2(10));
Table created.
SQL> create table b (id number,aid number,name varchar2(10));
Table created.
SQL> insert into a values(1,'A1');
1 row created.
SQL> insert into a values(2,'A2');
1 row created.
SQL> insert into a values(3,'A3');
1 row created.
SQL> insert into b values(1,1,'B1');
1 row created.
SQL> insert into b values(2,2,'B2');
1 row created.
SQL> insert into b values(3,2,'B3');
1 row created.
SQL> commit;
SQL> select id,name from a;
ID NAME
---------- ----------
1 A1
2 A2
3 A3
SQL> select * from b;
ID AID NAME
---------- ---------- ----------
1 1 B1
2 2 B2
3 2 B3
SQL> select * from b,a where a.id=b.aid;
ID AID NAME ID NAME
---------- ---------- ---------- ---------- ----------
1 1 B1 1 A1
3 2 B3 2 A2
2 2 B2 2 A2
SQL> select id,name from a where exists (select 1 from b where a.id=b.aid);
ID NAME
---------- ----------
1 A1
2 A2
SQL> select id,name from a where exists (select * from b where a.id=b.aid);
ID NAME
---------- ----------
1 A1
2 A2
上述結果分析:原因可以按照如下分析
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=1)
--->SELECT * FROM B WHERE B.AID=1有值返回真所以有資料
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=2)
--->SELECT * FROM B WHERE B.AID=2有值返回真所以有資料
SELECT ID,NAME FROM A WHERE EXISTS (SELECT * FROM B WHERE B.AID=3)
--->SELECT * FROM B WHERE B.AID=3無值返回真所以沒有資料
2.NOT EXISTS 就是反過來
SQL> select id,name from a where not exists (select * from b where a.id=b.aid);
ID NAME
---------- ---------- 3 A3
3.in /not in
EXISTS = IN,意思相同不過語法上有點點區別,好像使用IN效率要差點,應該是不會執行索引的原因
SQL> SELECT ID,NAME FROM A WHERE ID IN (SELECT AID FROM B);
ID NAME
---------- --------------------
1 A1
2 A2
NOT EXISTS = NOT IN ,意思相同不過語法上有點點區別
SQL> SELECT ID,NAME FROM A WHERE ID NOT IN(SELECT AID FROM B);
ID NAME
---------- --------------------
3 A3
二、in與exist , not in與not exist 的區別
in和exists in 是把外表和內表作hash 連線,而exists是對外表作loop迴圈,每次loop迴圈再對內表進行查詢。一直以來認為exists比in效率高的說法是不準確的。
如果查詢的兩個表大小相當,那麼用in和exists差別不大。
如果兩個表中一個較小,一個是大表,則子查詢表大的用exists,子查詢表小的用in:
例如:表A(小表),表B(大表)
1:select * from A where cc in (select cc from B)
效率低,用到了A表上cc列的索引;
select * from A where exists(select cc from B where cc=A.cc)
效率高,用到了B表上cc列的索引。
相反的2:
select * from B where cc in (select cc from A)
效率高,用到了B表上cc列的索引;
select * from B where exists(select cc from A where cc=B.cc)
效率低,用到了A表上cc列的索引。
not in 和not exists
如果查詢語句使用了not in 那麼內外表都進行全表掃描,沒有用到索引;而not extsts 的子查詢依然能用到表上的索引。所以無論那個表大,用not exists都比not in要快。
not in 邏輯上不完全等同於not exists,如果你誤用了not in,小心你的程式存在致命的BUG:
請看下面的例子:
create table t1 (c1 number,c2 number);
create table t2 (c1 number,c2 number);
insert into t1 values (1,2);
insert into t1 values (1,3);
insert into t2 values (1,2);
insert into t2 values (1,null);
SQL> select * from t1;
C1 C2
---------- ----------
1 2
1 3
SQL>
SQL> select * from t2;
C1 C2
---------- ----------
1 2
1
--查詢如下:
SQL> select * from t1 where c2 not in (select c2 from t2);
no rows selected
SQL> select * from t1 where not exists (select 1 from t2 where t1.c2=t2.c2);
C1 C2
---------- ----------
1 3
正如所看到的,not in 出現了不期望的結果集,存在邏輯錯誤。如果看一下上述兩個select語句的執行計劃,也會不同。後者使用了hash_aj。
因此,請儘量不要使用not in(它會呼叫子查詢),而儘量使用not exists(它會呼叫關聯子查詢)。如果子查詢中返回的任意一條記錄含有空值,則查詢將不返回任何記錄,正如上面例子所示。 除非子查詢欄位有非空限制,這時可以使用not in ,並且也可以透過提示讓它使用hasg_aj或merge_aj連線
三、實驗測試in,exists,not in, not exists
建立測試表test1/test2:
SQL> create table test1 as select object_id,object_name,object_type from dba_objects;
Table created.
SQL> select count(1) from test1;
COUNT(1)
----------
87042
SQL> create table test2 as select object_id,object_name,object_type from dba_objects where rownum <5000;
Table created.
SQL> select count(1) from test2;
COUNT(1)
----------
4999
SQL> alter table test1 add constraint pk_test1 primary key(object_id);
Table altered.
SQL> alter table test2 add constraint pk_test2 primary key(object_id);
Table altered.
SQL> create index idx_test1_name on test1(object_name);
Index created.
SQL> create index idx_test2_name on test2(object_name);
Index created.
SQL>
SQL> select TABLE_OWNER,table_name,index_name,status from dba_indexes where table_name like 'TEST%';
TABLE_OWNER TABLE_NAME INDEX_NAME STATUS
------------------------------ ------------------------------ ------------------------------ --------
HR TEST1 PK_TEST1 VALID
HR TEST1 IDX_TEST1_NAME VALID
HR TEST2 PK_TEST2 VALID
HR TEST2 IDX_TEST2_NAME VALID
SQL> col COLUMN_NAME for a15
SQL> select TABLE_OWNER,TABLE_NAME,INDEX_NAME,COLUMN_NAME,COLUMN_POSITION from dba_ind_columns where table_name like 'TEST%';
TABLE_OWNER TABLE_NAME INDEX_NAME COLUMN_NAME COLUMN_POSITION
------------------------------ ------------------------------ ------------------------------ --------------- ---------------
HR TEST1 PK_TEST1 OBJECT_ID 1
HR TEST1 IDX_TEST1_NAME OBJECT_NAME 1
HR TEST2 PK_TEST2 OBJECT_ID 1
HR TEST2 IDX_TEST2_NAME OBJECT_NAME 1
1.in & exists 測試
select a.* from test1 a where a.object_name in (select b.object_name from test2 b );
select a.* from test1 a where exists (select 1 from test2 b where a.object_name=b.object_name);
1.1 in 的測試
SQL> set autot trace;
SQL> set lines 200
SQL> select a.* from test1 a where a.object_name in (select b.object_name from test2 b );
5017 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 311842540
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 41686 | 6350K| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT SEMI | | 41686 | 6350K| 158 (1)| 00:00:02 |
| 2 | INDEX FAST FULL SCAN| IDX_TEST2_NAME | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
26 recursive calls
0 db block gets
1009 consistent gets
20 physical reads
0 redo size
205691 bytes sent via SQL*Net to client
4193 bytes received via SQL*Net from client
336 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5017 rows processed
1.2 exists的測試
SQL> select a.* from test1 a where exists (select 1 from test2 b where a.object_name=b.object_name);
5017 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 311842540
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 41686 | 6350K| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT SEMI | | 41686 | 6350K| 158 (1)| 00:00:02 |
| 2 | INDEX FAST FULL SCAN| IDX_TEST2_NAME | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
989 consistent gets
0 physical reads
0 redo size
205691 bytes sent via SQL*Net to client
4193 bytes received via SQL*Net from client
336 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5017 rows processed
1.3總結
透過比較執行計劃我們可以發現in和exists的執行計劃基本是一致的,都以小表(test2)作為外部迴圈表,然後大表(test1)作為內部迴圈表,所以執行效率是一樣的。
注意: 這裡我們把test1和test2的位置交換一下oracle依然能做出正確的判斷,以小表作為外部迴圈表:
select a.* from test2 a where a.object_name in (select b.object_name from test1 b );
select a.* from test2 a where exists (select 1 from test1 b where b.object_name=a.object_name);
SQL> select a.* from test2 a where a.object_name in (select b.object_name from test1 b );
4999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2045358773
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4999 | 761K| 135 (1)| 00:00:02 |
|* 1 | HASH JOIN SEMI | | 4999 | 761K| 135 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 439K| 9 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 | 6029K| 125 (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
836 consistent gets
369 physical reads
0 redo size
190890 bytes sent via SQL*Net to client
4182 bytes received via SQL*Net from client
335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4999 rows processed
SQL>
SQL> select a.* from test2 a where exists (select 1 from test1 b where b.object_name=a.object_name);
4999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2045358773
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4999 | 761K| 135 (1)| 00:00:02 |
|* 1 | HASH JOIN SEMI | | 4999 | 761K| 135 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 439K| 9 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 | 6029K| 125 (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."OBJECT_NAME"="A"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
836 consistent gets
0 physical reads
0 redo size
190890 bytes sent via SQL*Net to client
4182 bytes received via SQL*Net from client
335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4999 rows processed
2. not in & not exists 測試
2.1 not in 測試
select a.* from test1 a where a.object_name not in (select b.object_name from test2 b );
select a.* from test2 a where a.object_name not in (select b.object_name from test1 b );
SQL> select a.* from test1 a where a.object_name not in (select b.object_name from test2 b );
82025 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3618458784
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 93550 | 13M| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT ANTI NA| | 93550 | 13M| 158 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
6089 consistent gets
0 physical reads
0 redo size
4297213 bytes sent via SQL*Net to client
60667 bytes received via SQL*Net from client
5470 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
82025 rows processed
SQL>
SQL> select a.* from test2 a where a.object_name not in (select b.object_name from test1 b );
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1072756877
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 2600 | 158 (1)| 00:00:02 |
|* 1 | HASH JOIN ANTI NA | | 50 | 2600 | 158 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL| TEST2 | 4999 | 131K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST1 | 87042 | 2125K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
551 consistent gets
17 physical reads
0 redo size
483 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
2.2 not exists 測試
select a.* from test1 a where not exists (select 1 from test2 b where a.object_name=b.object_name);
select a.* from test2 a where not exists (select 1 from test1 b where a.object_name=b.object_name);
SQL> select a.* from test1 a where not exists (select 1 from test2 b where a.object_name=b.object_name);
82025 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3712868752
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 93550 | 13M| 158 (1)| 00:00:02 |
|* 1 | HASH JOIN RIGHT ANTI | | 93550 | 13M| 158 (1)| 00:00:02 |
| 2 | INDEX FAST FULL SCAN| IDX_TEST2_NAME | 4999 | 322K| 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST1 | 93550 | 8222K| 149 (1)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
6088 consistent gets
0 physical reads
0 redo size
4297213 bytes sent via SQL*Net to client
60667 bytes received via SQL*Net from client
5470 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
82025 rows processed
SQL> select a.* from test2 a where not exists (select 1 from test1 b where a.object_name=b.object_name);
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 319202148
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4999 | 761K| 135 (1)| 00:00:02 |
|* 1 | HASH JOIN ANTI | | 4999 | 761K| 135 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | TEST2 | 4999 | 439K| 9 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IDX_TEST1_NAME | 93550 | 6029K| 125 (0)| 00:00:02 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_NAME"="B"."OBJECT_NAME")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
572 consistent gets
0 physical reads
0 redo size
483 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
2.3 總結
透過觀察這個兩個執行計劃我們可以清楚的看到not in 用的是filter來連線兩張表,而not exists 用的hash join 來連線兩張表。對於只有兩個子節點的filter來說,我們可以把這個filter看成是一個nested loop。
透過執行計劃我們可以清楚的看到,oracle把test2表的結果集作為構造輸入(build input),test1的結果集作為探測資料(probe input),然後把這兩個子集做hash join。
(對於hash join 一般把小表作為build input,這些資料將存放在PGA記憶體中,如果記憶體不夠將放在temporary tablespace)
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/31397003/viewspace-2139624/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- EXISTS、IN、NOT EXISTS、NOT IN用法區別
- SQL中IN和EXISTS用法的區別SQL
- SQL中IN,NOT IN,EXISTS,NOT EXISTS的用法和差別SQL
- in 和 exists區別
- 詳解not in與not exists的區別與用法
- oracle中的exists和not exists和in用法詳解Oracle
- oracle中的exists 和not exists 用法詳解Oracle
- [精選] SQL中的IN與NOT IN、EXISTS與NOT EXISTS的區別及效能分析SQL
- oracle中in和exists的區別Oracle
- EXISTS、IN、NOT EXISTS、NOT IN的區別(ZT)
- Oralce 使用SQL中的exists 和not exists 用法詳解SQL
- SQL語句中exists和in的區別SQL
- (轉)ORACLE 中IN和EXISTS的區別Oracle
- in和exists的一些區別
- 對線面試官:SQL中的IN與NOT IN、EXISTS與NOT EXISTS的區別及效能分析面試SQL
- NOT IN 與NOT EXISTS的區別何在?
- oracle in與exists 的區別Oracle
- In和exists使用及效能分析(三):in和exists的效能分析
- In和exists使用及效能分析(二):exists的使用
- [Oracle] exists 和 not existsOracle
- oracle中關於in和exists,not in 和 not existsOracle
- in/exists和not in/not exists執行效率
- 大神級回答exists與in的區別
- in,exists和not exists ,not in與null的一些關係記載Null
- PTSQLServer中exists和except用法介紹wkaSQLServer
- In和exists使用及效能分析(一):in的使用
- 子查詢中的IN與EXISTS的區別(轉)
- sql中in和exists的原理及使用場景。SQL
- Exists和IN的原理解析
- exists和連線方式
- 在關聯子查詢中in與exists的區別
- oracle sql tuning_in與exists的區別_轉摘OracleSQL
- NOT IN ,NOT EXISTS 區別 11G改變
- fs.exists 與 fs.access的區別是什麼
- mysql 關於exists 和in分析MySql
- [Oracle] minus 和 not exists比較Oracle
- ORACLE 中IN和EXISTS比較Oracle
- 對IN & EXISTS NOT IN & NOT EXISTS的優化優化