對IN & EXISTS NOT IN & NOT EXISTS的優化

在平時工作中,IN & EXISTS NOT IN & NOT EXISTS是使用頻率比較高的SQL語句,
所以對它們的優化工作是很有必要的

測試環境：Oracle 9.2.0.1 for Windows2000

1、IN 和 EXISTS

IN和EXISTS的處理流程是不一樣的：

IN的執行流程
  select * from T1 where x in ( select y from T2 )
  可以理解為:
  select * from t1, ( select distinct y from t2 ) t2 where t1.x = t2.y;

EXISTS的執行流程
  select * from t1 where exists ( select null from t2 where y = x )
  可以理解為:
     for x in ( select * from t1 )
     loop
        if ( exists ( select null from t2 where y = x.x )
        then
           OUTPUT THE RECORD
        end if
     end loop

建立測試用例表big(4000 row)和small(400 row)

create table big as select * from dba_objects where rownum <= 10000;
insert into big select * from big;
insert into big select * from big;
commit;
create table small as select * from dba_objects where rownum <= 400;

當內層表為small,外層表為big時,兩種語法的查詢如下：

  SQL> select count(1) from big a where a.object_id in
       (select b.object_id from sall b);

    COUNT(1)
  ----------
        1600

  已用時間: 00: 00: 00.56

  Execution Plan
  -----------------------------------------------------
     0      SELECT STATEMENT ptimizer=CHOOSE
     1    0   SORT (AGGREGATE)
     2    1     MERGE JOIN
     3    2       SORT (JOIN)
     4    3         TABLE ACCESS (FULL) OF 'BIG'
     5    2       SORT (JOIN)
     6    5         VIEW OF 'VW_NSO_1'
     7    6           SORT (UNIQUE)
     8    7             TABLE ACCESS (FULL) OF 'SMALL'

  Statistics
  -----------------------------------------------------
            0 recursive calls
            0 db block gets
          543 consistent gets
            0 physical reads

  SQL> select count(1) from big a where exists
        (select 1 from small b where a.object_id=b.object_id);

    COUNT(1)
  ----------
         1600

   已用時間: 00: 00: 03.10

  Execution Plan
  -----------------------------------------------------
      0      SELECT STATEMENT ptimizer=CHOOSE
      1    0   SORT (AGGREGATE)
      2    1     FILTER
      3    2       TABLE ACCESS (FULL) OF 'BIG'
      4    2       TABLE ACCESS (FULL) OF 'SMALL'

  Statistics
  -----------------------------------------------------
            0 recursive calls
            0 db block gets
       312157 consistent gets
            0 physical reads

當內層表為big,外層表為small時,兩種語法的查詢如下：

  SQL> select count(1) from small a where a.object_id in
       (select b.object_id from big b);

    COUNT(1)
  ----------
         400

  已用時間: 00: 00: 00.56

  Execution Plan
  -----------------------------------------------------
     0      SELECT STATEMENT ptimizer=CHOOSE
     1    0   SORT (AGGREGATE)
     2    1     MERGE JOIN
     3    2       SORT (JOIN)
     4    3         TABLE ACCESS (FULL) OF 'SMALL'
     5    2       SORT (JOIN)
     6    5         VIEW OF 'VW_NSO_1'
     7    6           SORT (UNIQUE)
     8    7             TABLE ACCESS (FULL) OF 'BIG'

  Statistics
  -----------------------------------------------------
            0 recursive calls
            0 db block gets
          543 consistent gets
            0 physical reads

  SQL> select count(1) from small a where exists
       (select null from big b where a.bject_id=b.object_id);

    COUNT(1)
  ----------
         400

  已用時間: 00: 00: 00.25

  Execution Plan
  -----------------------------------------------------
     0      SELECT STATEMENT ptimizer=CHOOSE
     1    0   SORT (AGGREGATE)
     2    1     FILTER
     3    2       TABLE ACCESS (FULL) OF 'SMALL'
     4    2       TABLE ACCESS (FULL) OF 'BIG'

  Statistics
  -----------------------------------------------------
            0 recursive calls
            0 db block gets
         2562 consistent gets
            0 physical reads

在對錶big、small進行分析後,發現CBO下兩種語法的執行計劃是一樣的,都使用hash連線或者hash半連線

  SQL> analyze table big compute statistics;
  SQL> analyze table small compute statistics;

  SQL> select count(1) from big a where a.object_id in
         (select b.object_id from small b);

    COUNT(1)
  ----------
        1600

  已用時間: 00: 00: 00.09

  Execution Plan
  -------------------------------------------------------
     0      SELECT STATEMENT ptimizer=CHOOSE (Cost=58
                                       Card=1 Bytes=8)
     1    0   SORT (AGGREGATE)
     2    1     HASH JOIN (Cost=58 Card=1600 Bytes=12800)
     3    2       SORT (UNIQUE)
     4    3         TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
     5    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)

  Statistics
  -------------------------------------------------------
            0 recursive calls
            0 db block gets
          543 consistent gets
            0 physical reads

  SQL> select count(1) from big a where exists
       (select 1 from small b where a.object_id=b.object_id);

    COUNT(1)
  ----------
        1600

  已用時間: 00: 00: 00.09

  Execution Plan
  ----------------------------------------------------------
     0      SELECT STATEMENT ptimizer=CHOOSE (Cost=58 Card=1 Bytes=8)
     1    0   SORT (AGGREGATE)
     2    1     HASH JOIN (Cost=58 Card=1600 Bytes=12800)
     3    2       SORT (UNIQUE)
     4    3         TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
     5    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)

  Statistics
  --------------------------------------------------------
            0 recursive calls
            0 db block gets
          543 consistent gets
            0 physical reads


  SQL> select count(1) from small a where a.object_id in
       (select b.object_id from big b);

    COUNT(1)
  ----------
         400

  已用時間: 00: 00: 00.09

  Execution Plan
  ------------------------------------------------------
     0      SELECT STATEMENT ptimizer=CHOOSE (Cost=56 Card=1 Bytes=8)
     1    0   SORT (AGGREGATE)
     2    1     HASH JOIN (SEMI) (Cost=56 Card=400 Bytes=3200)
     3    2       TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
     4    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)

  Statistics
  ------------------------------------------------------
            0 recursive calls
            0 db block gets
          543 consistent gets
            0 physical reads

  SQL> select count(1) from small a where exists
       (select 1 from big b where a.object_id=b.object_id);

    COUNT(1)
  ----------
         400

  已用時間: 00: 00: 00.09

  Execution Plan
  -------------------------------------------------------
     0      SELECT STATEMENT ptimizer=CHOOSE (Cost=56 Card=1 Bytes=8)
     1    0   SORT (AGGREGATE)
     2    1     HASH JOIN (SEMI) (Cost=56 Card=400 Bytes=3200)
     3    2       TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
     4    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)

  Statistics
  -------------------------------------------------------
            0 recursive calls
            0 db block gets
          543 consistent gets
            0 physical reads

刪除表分析,使用提示/*+ use_hash(a,b) */ 或者 /*+ hash_sj */,
兩種語法都可以達到CBO的執行計劃

SQL> analyze table big delete statistics;
SQL> analyze table small delete statistics;

SQL> select /*+ use_hash(a,b) */count(1) from big a where a.object_id in
(select b.object_id from small b);

SQL> select /*+ use_hash(a,b) */count(1) from big a where exists
(select 1 from small b where a.object_id=b.object_id);

SQL> select count(1) from small a where a.object_id in
(select /*+ hash_sj */ b.object_id from big b);

SQL> select count(1) from small a where exists
(select /*+ hash_sj */ 1 from big b where a.object_id=b.object_id);

下表列出了各種情況下的速度情況：
┌───────────┬──────────────────────────┬─────────────────────────┬─────────────┐
│           │ outer big,inner small   │ outer small,inner big │ table rows │
├───────────┼──────────┬───────────────┼──────────┬──────────────┼─────────────┤
│           │ IN SQL │ EXISTS SQL   │ IN SQL │ EXISTS SQL │             │
├───────────┼──────────┼───────────────┼──────────┼──────────────┼─────────────┤
│un-analyze │ 0.56s   │ 3.10s        │ 0.56s   │ 0.25s       │ big=40000 │
├───────────┼──────────┼───────────────┼──────────┼──────────────┤     and     │
│ analyzed │ 0.09s   │ 0.09s        │ 0.09s   │ 0.09s       │ small=400 │
├───────────┼──────────┼───────────────┼──────────┼──────────────┼─────────────┤
│un-analyze │ 0.72s   │ 3.53s        │ 0.25s   │ 2.97s       │ big=5000   │
├───────────┼──────────┼───────────────┼──────────┼──────────────┤     and     │
│ analyzed │ 0.09s   │ 0.09s        │ 0.09s   │ 0.09s       │ small=4000 │
└───────────┴──────────┴───────────────┴──────────┴──────────────┴─────────────┘

結論：在未對錶進行分析前,若兩個表資料量差異很大,則外層表是大表時使用IN較快,
     外層表是小表時使用EXISTS較快;若兩表資料量接近,則使用IN較快;
     分析表後無論用IN還是EXISTS都變得更快,由於執行計劃一樣,所以速度一樣;

     所以：無論使用IN還是EXISTS,只要使用雜湊連線,即提示/*+ use_hash(a,b) */,
          或者在子句中雜湊半連線提示/*+ hash_sj */, 就使其達到最優速度;

附註：半連線的提示有hash_sj、merge_sj、nl_sj

***********************************************************************************************************************
***********************************************************************************************************************

2、NOT IN 和 NOT EXISTS

NOT EXISTS的執行流程
select .....
   from rollup R
where not exists ( select 'Found' from title T
                              where R.source_id = T.Title_ID);
可以理解為:
for x in ( select * from rollup )
       loop
           if ( not exists ( that query ) ) then
                  OUTPUT
           end if;
        end;

注意:NOT EXISTS 與 NOT IN 不能完全互相替換，看具體的需求。如果選擇的列可以為空，則不能被替換。

對於not in 和 not exists的效能區別：
   not in 只有當子查詢中，select 關鍵字後的欄位有not null約束或者有這種暗示時用not in,另外如果主查詢中表大，
   子查詢中的表小但是記錄多，則應當使用not in,並使用anti hash join.

   如果主查詢表中記錄少，子查詢表中記錄多，並有索引，可以使用not exists，另外not in最好也可以用/*+ HASH_AJ */
   或者外連線+is null，NOT IN 在基於成本的應用中較好

   建立測試用例表big(40000 row)和small(1000 row)：

   truncate table big;
   truncate table small;
   insert into big   select * from dba_objects where rownum <=20000;
   insert into big   select * from dba_objects where rownum <=20000;
   insert into small select * from dba_objects where rownum <=1000;
   commit;

   基本句型：
   <1> not in
   SQL> select count(1) from big a where a.object_id not in (select b.object_id from small b);

   COUNT(1)
----------
      38000

已用時間: 00: 00: 12.56

Execution Plan
-----------------------------------------------------
    0      SELECT STATEMENT ptimizer=CHOOSE
    1    0   SORT (AGGREGATE)
    2    1     FILTER
    3    2       TABLE ACCESS (FULL) OF 'BIG'
    4    2       TABLE ACCESS (FULL) OF 'SMALL'

Statistics
--------------------------------

來自 “ ITPUB部落格 ” ，連結：http://blog.itpub.net/10742223/viewspace-343292/，如需轉載，請註明出處，否則將追究法律責任。

對IN & EXISTS NOT IN & NOT EXISTS的優化

相關文章