今天來學習下， MySQL中in和exists的效能優劣以及各自的檢索資料的過程，文章主要以下面的語句為例子：

select * from user a where name='liuwenhe' and exists (select stuid from department b where depname='yunwei' and a.stuid =b.stuid )；

select * from user where name='liuwenhe' and stuid in (select stuid from department where depname='yunwei')；

一：關於MySQL exists和in檢索資料的過程：

1.首先說下exists檢索過程,

注意其中A代表（user a where name='liuwenhe'）的結果集，B代表（department b where depname='yunwei' ）的結果集：

exists對外表A用loop逐條查詢，每次查詢都會去驗證exists的條件語句（也就是exists後面括號裡面的語句），當 exists裡的條件語句能夠返回記錄行時(只要能返回結果即可，不管你查詢的是什麼內容！！！)，條件就為真，就會返回當前loop到的A的這條記錄，反之如果exists裡的條件語句不能返回記錄行，條件為假，則當前loop到的A的這條記錄被丟棄，注意：exists的條件就像一個bool條件，當能返回結果集則為true，不能返回結果集則為 false；

對於exists的檢索過程可以用下面的指令碼概括：

for ($i = 0; $i < count(A); $i++) {

　　$a = get_record(A, $i); #從A表逐條獲取記錄

　　if (B.id = $a[id]) #如果子條件成立，即返回true

　　　　$result[] = $a;

}

return $result;

舉例說明：

例如：

select * from user where exists (select stuid from department where depname='yunwei' );

對user表的記錄逐條取出，由於exists條件中的select stuid from department where depname='yunwei' 永遠能返回記錄行，那麼user表的所有記錄都將被加入結果集，所以與 select * from user;是一樣的

例如：

select * from user where exists (select stuid from department where depname='yunwei+' );

not exists與exists相反，也就是當exists條件有結果集返回時，loop到的記錄將被丟棄，否則將loop到的記錄加入結果集

總的來說，如果user表結果集有n條記錄，那麼exists查詢就是將這n條記錄逐條取出，然後判斷n遍exists條件。

2.關於in子查詢的檢索過程：

MySQL先將子查詢結果存入臨時表T（可能在記憶體中，也可能磁碟中），確保子查詢只執行一次，該表不記錄重複資料且採用雜湊索引遍歷資料，然後透過T表的資料去遍歷外表，透過關聯關係得到外表的需要的資料，in查詢相當於多個or條件的疊加，這個比較好理解，比如下面的查詢

select * from user where userId in (1, 2, 3);

等效於

select * from user where userId = 1 or userId = 2 or userId = 3;

not in與in相反，如下

select * from user where userId not in (1, 2, 3);

等效於

select * from user where userId != 1 and userId != 2 and userId != 3;

總的來說，in查詢就是先將子查詢條件的記錄全都查出來，假設結果集為B，共有m條記錄，

然後在將子查詢條件的結果集分解成m個，再進行m次主查詢，值得一提的是，in查詢的子條件返回結果必須只有一個欄位，例如

select * from user where userId in (select id from B);

而不能是

select * from user where userId in (select id, age from B);

而exists就沒有這個限制

二：下面來考慮exists和in的效能

select * from user a where name='liuwenhe' and exists (select stuid from department b where depname='yunwei' and a.stuid =b.stuid )；

select * from user where name='liuwenhe' and stuid in (select stuid from department where depname='yunwei')；

1）根據前面介紹的檢索資料的過程，可以知道，針對上面的兩條sql中exists這種方式，是需要遍歷user表name='liuwenhe'的所有資料行N，並且判斷exists條件N次；並且如果department表的stuid 有索引，exists子查詢可以使用連線關係（也就是stuid）上的索引；所以exists方式適合 user表的結果集小，子查詢的結果集大的情況；一句話總結：子查詢可以使用關聯關係列上的索引，所以效率高，故內表大的適合使用exists;

2）not exists類似於exists的遍歷方式，也是loop外表，然後判斷exists條件

3）in是把外表user結果集和內表department結果集做hash連線（應該說類似hash join，因為MySQL不支援hash join的方式），先查詢內表department結果集，再把內表結果集與外表結果集匹配，對外表可以使用關係索引（也就是stuid列上的索引），而內表結果集多大都需要查詢，也就是說department where depname='yunwei'的結果集D多大，都得遍歷全部的D，不可避免，故外表大的使用in，可加快效率。一句話總結：主查詢可以使用關聯關係列上的索引，所以效率高，故外表結果集合大的適合使用in;

3）如果用not in ，和in一樣，內表結果集需要全部掃描，由於not in ，所以外表的結果集也需要權標掃描，都無法使用關係列上的索引（這種!=的範圍查詢無法使用任何索引），效率低，可考慮使用not exists，也可使用A left join B on A.id=B.id where B.id is null 進行最佳化。

總結：

exists先對外表結果集loop迴圈再對內表結果集進行查詢。一直大家都認為exists比in語句的效率要高，這種說法其實是不準確的。這個是要區分環境的。如果查詢的兩個表大小相當，那麼用in和exists差別不大。如果兩個表中一個較小，一個是大表，則子查詢表結果集大的用exists，如果外表結果集大的則適合使用in，然後就是網路中說的外表的和內表大的說法也不準確，應該是外表結果集和內表結果集合的大小，至於結果集前面已經解釋過了。

mysql 關於exists 和in分析

相關文章