對IN & EXISTS NOT IN & NOT EXISTS的優化

star_guan2008發表於2008-06-10

在平時工作中,IN & EXISTS NOT IN & NOT EXISTS是使用頻率比較高的SQL語句,
所以對它們的優化工作是很有必要的

測試環境:Oracle 9.2.0.1 for Windows2000


1、IN 和 EXISTS

   IN和EXISTS的處理流程是不一樣的:

 IN的執行流程
  select * from T1 where x in ( select y from T2 )
  可以理解為:
  select * from t1, ( select distinct y from t2 ) t2 where t1.x = t2.y;
 
 EXISTS的執行流程       
  select * from t1 where exists ( select null from t2 where y = x )
  可以理解為:
     for x in ( select * from t1 )
     loop
        if ( exists ( select null from t2 where y = x.x )
        then
           OUTPUT THE RECORD
        end if
     end loop

   建立測試用例表big(4000 row)和small(400 row)

 create table big as select * from dba_objects where rownum <= 10000;
 insert into big select * from big;
 insert into big select * from big;
 commit;
 create table small as select * from dba_objects where rownum <= 400;
  
當內層表為small,外層表為big時,兩種語法的查詢如下:

  SQL> select count(1) from big a where a.object_id in      
       (select b.object_id from sall b);                    
                                                            
    COUNT(1)                                                
  ----------                                                
        1600                                                
                                                            
  已用時間:  00: 00: 00.56                                  
                                                            
  Execution Plan                                            
  -----------------------------------------------------   
     0      SELECT STATEMENT ptimizer=CHOOSE             
     1    0   SORT (AGGREGATE)                              
     2    1     MERGE JOIN                                  
     3    2       SORT (JOIN)                               
     4    3         TABLE ACCESS (FULL) OF 'BIG'            
     5    2       SORT (JOIN)                               
     6    5         VIEW OF 'VW_NSO_1'                      
     7    6           SORT (UNIQUE)                         
     8    7             TABLE ACCESS (FULL) OF 'SMALL'      
                                                            
  Statistics                                                
  -----------------------------------------------------     
            0  recursive calls                              
            0  db block gets                                
          543  consistent gets                            
            0  physical reads                             
  
  SQL> select count(1) from big a where exists               
        (select 1 from small b where a.object_id=b.object_id);
                                                             
    COUNT(1)                                                 
  ----------                                                 
         1600                                                
                                                             
   已用時間:  00: 00: 03.10                                  
                                                             
  Execution Plan                                             
  -----------------------------------------------------      
      0      SELECT STATEMENT ptimizer=CHOOSE               
      1    0   SORT (AGGREGATE)                              
      2    1     FILTER                                      
      3    2       TABLE ACCESS (FULL) OF 'BIG'              
      4    2       TABLE ACCESS (FULL) OF 'SMALL'            
                                                              
  Statistics                                                 
  -----------------------------------------------------      
            0  recursive calls                               
            0  db block gets                                 
       312157  consistent gets                               
            0  physical reads                                 

當內層表為big,外層表為small時,兩種語法的查詢如下:

  SQL> select count(1) from small a where a.object_id in      
       (select b.object_id from big b);                       
                                                              
    COUNT(1)                                                         
  ----------                                                         
         400                                                         
                                                                     
  已用時間:  00: 00: 00.56                                           
                                                                     
  Execution Plan                                                     
  -----------------------------------------------------        
     0      SELECT STATEMENT ptimizer=CHOOSE                        
     1    0   SORT (AGGREGATE)                                       
     2    1     MERGE JOIN                                           
     3    2       SORT (JOIN)                                        
     4    3         TABLE ACCESS (FULL) OF 'SMALL'                   
     5    2       SORT (JOIN)                                        
     6    5         VIEW OF 'VW_NSO_1'                      
     7    6           SORT (UNIQUE)                         
     8    7             TABLE ACCESS (FULL) OF 'BIG'        
                                                            
  Statistics                                                         
  -----------------------------------------------------         
            0  recursive calls                                       
            0  db block gets                                         
          543  consistent gets                                       
            0  physical reads                                        
  
  SQL> select count(1) from small a where exists            
       (select null from big b where a.bject_id=b.object_id);
                                                            
    COUNT(1)                                                
  ----------                                                
         400                                                
                                                            
  已用時間:  00: 00: 00.25                                  
                                                            
  Execution Plan                                            
  -----------------------------------------------------     
     0      SELECT STATEMENT ptimizer=CHOOSE               
     1    0   SORT (AGGREGATE)                              
     2    1     FILTER                                      
     3    2       TABLE ACCESS (FULL) OF 'SMALL'            
     4    2       TABLE ACCESS (FULL) OF 'BIG'              
                                                           
  Statistics                                                
  -----------------------------------------------------     
            0  recursive calls                              
            0  db block gets                                
         2562  consistent gets                              
            0  physical reads                               
                               

在對錶big、small進行分析後,發現CBO下兩種語法的執行計劃是一樣的,都使用hash連線或者hash半連線

  SQL> analyze table big compute statistics;
  SQL> analyze table small compute statistics;
  
  SQL> select count(1) from big a where a.object_id in    
         (select b.object_id from small b);                  
                                                          
    COUNT(1)                                              
  ----------                                              
        1600                                              
                                                          
  已用時間:  00: 00: 00.09                                
                                                          
  Execution Plan                                          
  ------------------------------------------------------- 
     0      SELECT STATEMENT ptimizer=CHOOSE (Cost=58    
                                       Card=1 Bytes=8)    
     1    0   SORT (AGGREGATE)                            
     2    1     HASH JOIN (Cost=58 Card=1600 Bytes=12800) 
     3    2       SORT (UNIQUE)                           
     4    3         TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
     5    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)
                                                          
  Statistics                                              
  ------------------------------------------------------- 
            0  recursive calls                            
            0  db block gets                              
          543  consistent gets                            
            0  physical reads                             
  
  SQL> select count(1) from big a where exists              
       (select 1 from small b where a.object_id=b.object_id);
                                                            
    COUNT(1)                                                
  ----------                                                
        1600                                                
                                                            
  已用時間:  00: 00: 00.09                                  
                                                            
  Execution Plan                                            
  ----------------------------------------------------------
     0      SELECT STATEMENT ptimizer=CHOOSE (Cost=58 Card=1 Bytes=8)      
     1    0   SORT (AGGREGATE)                              
     2    1     HASH JOIN (Cost=58 Card=1600 Bytes=12800)   
     3    2       SORT (UNIQUE)                             
     4    3         TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)  
     5    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)    
                                                            
  Statistics                                                
  --------------------------------------------------------  
            0  recursive calls                              
            0  db block gets                                
          543  consistent gets                              
            0  physical reads                               
  
  
  SQL> select count(1) from small a where a.object_id in  
       (select b.object_id from big b);                   
                                                          
    COUNT(1)                                              
  ----------                                              
         400                                              
                                                          
  已用時間:  00: 00: 00.09                                
                                                          
  Execution Plan                                          
  ------------------------------------------------------  
     0      SELECT STATEMENT ptimizer=CHOOSE (Cost=56 Card=1  Bytes=8)  
     1    0   SORT (AGGREGATE)                            
     2    1     HASH JOIN (SEMI) (Cost=56 Card=400 Bytes=3200)     
     3    2       TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600) 
     4    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000) 
                                                          
  Statistics                                              
  ------------------------------------------------------  
            0  recursive calls                            
            0  db block gets                              
          543  consistent gets                            
            0  physical reads                             
  
  SQL> select count(1) from small a where exists           
       (select 1 from big b where a.object_id=b.object_id);
                                                           
    COUNT(1)                                               
  ----------                                               
         400                                               
                                                           
  已用時間:  00: 00: 00.09                                 
                                                           
  Execution Plan                                           
  -------------------------------------------------------  
     0      SELECT STATEMENT ptimizer=CHOOSE (Cost=56 Card=1 Bytes=8)    
     1    0   SORT (AGGREGATE)                             
     2    1     HASH JOIN (SEMI) (Cost=56 Card=400 Bytes=3200)    
     3    2       TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)  
     4    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)  
                                                           
  Statistics                                               
  -------------------------------------------------------  
            0  recursive calls                             
            0  db block gets                               
          543  consistent gets                             
            0  physical reads                              

刪除表分析,使用提示/*+ use_hash(a,b) */ 或者 /*+ hash_sj */,
兩種語法都可以達到CBO的執行計劃

SQL> analyze table big delete statistics;
SQL> analyze table small delete statistics;

SQL> select /*+ use_hash(a,b) */count(1) from big a where a.object_id in
     (select b.object_id from small b);

SQL> select /*+ use_hash(a,b) */count(1) from big a where exists
     (select 1 from small b where a.object_id=b.object_id);

SQL> select count(1) from small a where a.object_id in
     (select /*+ hash_sj */ b.object_id from big b);

SQL> select count(1) from small a where exists
     (select /*+ hash_sj */ 1 from big b where a.object_id=b.object_id);

下表列出了各種情況下的速度情況:
┌───────────┬──────────────────────────┬─────────────────────────┬─────────────┐
│           │  outer big,inner small   │  outer small,inner big  │  table rows │
├───────────┼──────────┬───────────────┼──────────┬──────────────┼─────────────┤
│           │  IN SQL  │  EXISTS SQL   │  IN SQL  │  EXISTS SQL  │             │
├───────────┼──────────┼───────────────┼──────────┼──────────────┼─────────────┤
│un-analyze │  0.56s   │  3.10s        │  0.56s   │  0.25s       │  big=40000  │
├───────────┼──────────┼───────────────┼──────────┼──────────────┤     and     │
│ analyzed  │  0.09s   │  0.09s        │  0.09s   │  0.09s       │  small=400  │
├───────────┼──────────┼───────────────┼──────────┼──────────────┼─────────────┤
│un-analyze │  0.72s   │  3.53s        │  0.25s   │  2.97s       │  big=5000   │
├───────────┼──────────┼───────────────┼──────────┼──────────────┤     and     │
│ analyzed  │  0.09s   │  0.09s        │  0.09s   │  0.09s       │  small=4000 │
└───────────┴──────────┴───────────────┴──────────┴──────────────┴─────────────┘

結論:在未對錶進行分析前,若兩個表資料量差異很大,則外層表是大表時使用IN較快,
     外層表是小表時使用EXISTS較快;若兩表資料量接近,則使用IN較快;
     分析表後無論用IN還是EXISTS都變得更快,由於執行計劃一樣,所以速度一樣;
    
     所以:無論使用IN還是EXISTS,只要使用雜湊連線,即提示/*+ use_hash(a,b) */,
          或者在子句中雜湊半連線提示/*+ hash_sj */, 就使其達到最優速度;

附註:半連線的提示有hash_sj、merge_sj、nl_sj
    

***********************************************************************************************************************
***********************************************************************************************************************


2、NOT IN 和 NOT EXISTS

 NOT EXISTS的執行流程
 select .....
   from rollup R
 where not exists ( select 'Found' from title T
                              where R.source_id = T.Title_ID);
 可以理解為:
 for x in ( select * from rollup )
       loop
           if ( not exists ( that query ) ) then
                  OUTPUT
           end if;
        end;
       
注意:NOT EXISTS 與 NOT IN 不能完全互相替換,看具體的需求。如果選擇的列可以為空,則不能被替換。

對於not in 和 not exists的效能區別:
   not in 只有當子查詢中,select 關鍵字後的欄位有not null約束或者有這種暗示時用not in,另外如果主查詢中表大,
   子查詢中的表小但是記錄多,則應當使用not in,並使用anti hash join.
  
   如果主查詢表中記錄少,子查詢表中記錄多,並有索引,可以使用not exists,另外not in最好也可以用/*+ HASH_AJ */
   或者外連線+is null,NOT IN 在基於成本的應用中較好
  
   建立測試用例表big(40000 row)和small(1000 row):
  
   truncate table big;
   truncate table small;
   insert into big   select * from dba_objects where rownum <=20000;
   insert into big   select * from dba_objects where rownum <=20000;
   insert into small select * from dba_objects where rownum <=1000;
   commit;
  
   基本句型:
   <1> not in
   SQL> select count(1) from big a where a.object_id not in (select b.object_id from small b);          
                                                        
   COUNT(1)                                              
 ----------                                              
      38000                                              
                                                         
 已用時間:  00: 00: 12.56                                  
                                                         
 Execution Plan                                          
 -----------------------------------------------------   
    0      SELECT STATEMENT ptimizer=CHOOSE             
    1    0   SORT (AGGREGATE)                            
    2    1     FILTER                                    
    3    2       TABLE ACCESS (FULL) OF 'BIG'            
    4    2       TABLE ACCESS (FULL) OF 'SMALL'          

                                                                                                                                                                                                                                                                                                                                                                                                          
 Statistics                                              
 --------------------------------

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/10742223/viewspace-343292/,如需轉載,請註明出處,否則將追究法律責任。

相關文章