PostgreSQL: an optimization example for a composite SQL query (multiple EXISTS, range search, IN search, and fuzzy search combined)

Posted by 德哥 on 2018-09-15

Original URL: https://flycode.co/archives/185365

SQL optimization

Background

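When a SQL statement combines several EXISTS subqueries, range conditions, IN lists, and fuzzy (pattern-matching) conditions, poorly chosen indexes can make the query slow.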

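The core problem is that unsuitable indexes can cause several issues:

1. The index scans themselves take too long.

2. The bitmap scans introduce too many rechecks.

3. The subplans (from the EXISTS clauses) introduce too much filtering.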

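Here is a real-world case. The time is concentrated in the recheck and filter steps: each index scan returns a large number of rows, yet once combined, not a single row satisfies all the conditions.

The root cause is that the indexes are not the right ones.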
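The plan shape described here can be illustrated with a toy model in Python (not PostgreSQL code): sets of row ids stand in for TID bitmaps, two unselective conditions are combined with a BitmapAnd, and every surviving candidate is then fetched and run through a remaining filter that never matches. The table, its columns, and the "needle" pattern are all hypothetical.

```python
# Toy model: Python sets of row ids stand in for TID bitmaps (not PostgreSQL code).
N = 60_000  # hypothetical table size
rows = [{"id": i, "a": i % 2, "b": i % 3, "text": f"row {i}"} for i in range(N)]

def bitmap_index_scan(predicate):
    """Collect the ids of all rows matching one indexed condition."""
    return {r["id"] for r in rows if predicate(r)}

bitmap_1 = bitmap_index_scan(lambda r: r["a"] == 0)  # condition 1: half of the rows
bitmap_2 = bitmap_index_scan(lambda r: r["b"] == 0)  # condition 2: a third of the rows
candidates = bitmap_1 & bitmap_2                     # BitmapAnd: still a sixth of the rows

# Bitmap Heap Scan: fetch every candidate and apply the remaining, non-indexed
# filter (a substring test with a hypothetical pattern that never matches).
fetched = 0
result = []
for rid in sorted(candidates):
    fetched += 1
    if "needle" in rows[rid]["text"]:
        result.append(rid)

print(len(bitmap_1), len(bitmap_2), len(candidates), fetched, len(result))
# → 30000 20000 10000 10000 0
```

The result is empty, yet the heap phase still does work proportional to the size of the intersected bitmap. That work is what the Rows Removed by Index Recheck and Rows Removed by Filter counters report in a real plan.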

->  Subquery Scan on "*SELECT* 2"  (cost=273453.65..432483146.70 rows=223 width=349) (actual time=25932.371..25932.371 rows=0 loops=1)  
      Output: ...................................  
      Buffers: shared hit=920071 read=269255  
      I/O Timings: read=1552.767  
      ->  Bitmap Heap Scan on zjxftypt.tab1010201 t_1  (cost=273453.65..432483144.47 rows=223 width=349) (actual time=25932.370..25932.370 rows=0 loops=1)  
            Output: t_1.storeid, t_1.xfjbh, t_1.wtsd, t_1.rs, t_1.digoal123x, t_1.dz, t_1.blfsjd, t_1.qx, t_1.gk, t_1.xfrq, t_1.djsj, t_1.djdw, t_1.xfjclzt, t_1.digoal123, t_1.xfxs  
            -- Recheck of the bitmap scan conditions: far too many rows removed
            Recheck Cond: ((t_1.xfrq < (to_date('2018-06-11'::character varying, 'yyyy-mm-dd'::character varying) + 1)) AND (t_1.xfrq >= to_date('2014-02-12'::character varying, 'yyyy-mm-dd'::character varying)) AND (t_1.digoal123 = 1::numeric))
            Rows Removed by Index Recheck: 1214155  
              
            -- Filter checking whether the EXISTS join conditions are satisfied: far too many rows removed
            Filter: (((t_1.digoal123x)::text ~~ '%阿里巴巴%'::text) AND ((alternatives: SubPlan 4 or hashed SubPlan 5) OR (alternatives: SubPlan 6 or hashed SubPlan 7)))
            Rows Removed by Filter: 5215804  
            Buffers: shared hit=920071 read=269255  
            I/O Timings: read=1552.767  
            -- Bitmap scans for conditions 1 and 2
            ->  BitmapAnd  (cost=273453.65..273453.65 rows=4909643 width=0) (actual time=2510.718..2510.718 rows=0 loops=1)
                  Buffers: shared hit=27036 read=16539  
                  I/O Timings: read=101.425  
                  -- Condition 1 on the table's own columns: far too many matching rows
                  ->  Bitmap Index Scan on index_tab1010201_xfrq  (cost=0.00..126565.99 rows=4943755 width=0) (actual time=1085.429..1085.429 rows=5268071 loops=1)
                        Index Cond: ((t_1.xfrq < (to_date('2018-06-11'::character varying, 'yyyy-mm-dd'::character varying) + 1)) AND (t_1.xfrq >= to_date('2014-02-12'::character varying, 'yyyy-mm-dd'::character varying)))
                        Buffers: shared hit=3288 read=16539  
                        I/O Timings: read=101.425  
                  -- Condition 2 on the table's own columns: far too many matching rows
                  ->  Bitmap Index Scan on index_tab1010201_digoal123  (cost=0.00..146887.30 rows=6599316 width=0) (actual time=1355.825..1355.825 rows=6845646 loops=1)
                        Index Cond: (t_1.digoal123 = 1::numeric)  
                        Buffers: shared hit=23748  
                  ..............sub plans
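The plan's own numbers support this. A node's actual time includes its children, so subtracting the BitmapAnd time from the Bitmap Heap Scan time approximates the time spent after the bitmaps were built: fetching heap rows, rechecking, filtering, and running the EXISTS subplans. A quick calculation in Python, using only figures copied from the plan above:

```python
# Figures copied from the plan above (times in ms).
heap_scan_ms = 25932.370      # Bitmap Heap Scan: actual total time (includes its children)
bitmap_and_ms = 2510.718      # BitmapAnd over the two Bitmap Index Scans
removed_by_recheck = 1214155  # Rows Removed by Index Recheck
removed_by_filter = 5215804   # Rows Removed by Filter
rows_returned = 0

# Time spent after the bitmaps were built: heap fetches, recheck, filter, EXISTS subplans.
heap_phase_ms = heap_scan_ms - bitmap_and_ms
heap_share = heap_phase_ms / heap_scan_ms

# Heap rows that were fetched and then discarded.
rows_discarded = removed_by_recheck + removed_by_filter

print(f"{heap_phase_ms:.3f} ms ({heap_share:.1%}) after the index scans")
print(f"{rows_discarded} rows discarded, {rows_returned} returned")
# → 23421.652 ms (90.3%) after the index scans
# → 6429959 rows discarded, 0 returned
```

Roughly 90% of the node's time goes into examining about 6.4 million heap rows, none of which is returned.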

Optimization example

1. Reproduce the problem: create a test table

create table test(id int, c1 text, c2 date, c3 text);

The query is:

select * from test   
where   
(  
  exists (select 1 from pg_class where oid::int = test.id)   
  or   
  exists (select 1 from pg_attribute where attrelid::int=test.id)   
)   
and c1 in ('1','2','3')
and c2 between current_date-1 and current_date
and c3 ~ 'abcdef';

2. Insert 10 million rows of test data

insert into test select id, (random()*10)::int::text, current_date, md5(random()::text) from generate_series(1,10000000) t(id);
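Because random()*10 is cast to int, which rounds to the nearest integer rather than truncating, c1 takes the values '0' to '10', with '0' and '10' only half as likely as the others. The Python sketch below computes the exact shares with fractions, assuming random() is uniform on [0, 1). It shows that c1 in ('1','2','3') matches about 30% of the rows, roughly 3 million.

```python
from fractions import Fraction

def prob_of_value(k):
    """P((random()*10)::int = k), assuming random() is uniform on [0, 1).

    random()*10 is uniform on [0, 10); rounding to the nearest integer maps
    the interval [k - 0.5, k + 0.5) to k (ties have probability zero)."""
    lo = max(Fraction(0), k - Fraction(1, 2))
    hi = min(Fraction(10), k + Fraction(1, 2))
    return max(Fraction(0), hi - lo) / 10

probs = {k: prob_of_value(k) for k in range(11)}
share_c1 = probs[1] + probs[2] + probs[3]          # c1 in ('1','2','3')
expected_rows = share_c1 * 10_000_000

print(probs[0], probs[5], probs[10], share_c1, expected_rows)
# → 1/20 1/10 1/20 3/10 3000000
```

Every row also gets c2 = current_date, so the c2 range condition matches all 10 million rows when the query runs on the day the data is loaded or the next day.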

3. Create an index that lets all conditions be filtered at the index level

create extension pg_trgm;  
create extension btree_gin;  
create index idx_test_1 on test using gin (c1, c2, c3 gin_trgm_ops);

To reproduce the original problem instead, the indexes would be these two:

create index idx1 on test (c1);  
  
create index idx2 on test (c2);
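The gin_trgm_ops operator class indexes c3 by its trigrams (3-character substrings). For a condition like c3 ~ 'abcdef', the index can only require that a row contain the pattern's trigrams. The candidates it returns must therefore be rechecked against the real condition on the heap row. Below is a simplified Python sketch of that idea. It ignores pg_trgm's lowercasing, word padding, and full regular-expression analysis, and the stored strings are made up.

```python
def trigrams(s):
    """All 3-character substrings of s (pg_trgm's word padding is ignored here)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

# For an unanchored literal, every matching value must contain all of its trigrams.
required = trigrams("abcdef")

# Hypothetical stored c3 values keyed by row id.
stored = {
    1: "xxabcdefyy",        # contains 'abcdef'
    2: "abczbcdzcdezdef",   # contains every trigram, but not 'abcdef' itself
    3: "0123456789abcd",    # lacks 'cde' and 'def'
}

# Index phase: keep rows whose trigrams include all required trigrams.
candidates = [rid for rid, v in stored.items() if required <= trigrams(v)]
# Recheck phase: evaluate the real condition on each candidate heap row.
matches = [rid for rid in candidates if "abcdef" in stored[rid]]

print(sorted(required), candidates, matches)
# → ['abc', 'bcd', 'cde', 'def'] [1, 2] [1]
```

Row 2 passes the index phase and is removed only by the recheck. Rows like this are what Rows Removed by Index Recheck counts when a trigram index is used.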

4. Inspect the execution plan

postgres=# explain (analyze,verbose,timing,costs,buffers)   
select * from test   
where   
(  
  exists (select 1 from pg_class where oid::int = test.id)   
  or   
  exists (select 1 from pg_attribute where attrelid::int=test.id)   
)   
and c1 in ('1','2','3')
and c2 between current_date-1 and current_date
and c3 ~ 'abcdef';
                                                                                         QUERY PLAN                                                                                           
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test  (cost=156.43..8593.79 rows=228 width=43) (actual time=837.151..837.151 rows=0 loops=1)  
   Output: test.id, test.c1, test.c2, test.c3  
   -- Recheck filtering of the bitmap scan
   Recheck Cond: ((test.c1 = ANY ('{1,2,3}'::text[])) AND (test.c2 >= (CURRENT_DATE - 1)) AND (test.c2 <= CURRENT_DATE) AND (test.c3 ~ 'abcdef'::text))
   Rows Removed by Index Recheck: 1  
     
   -- Condition check (filter) for the EXISTS clauses
   Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))  
   Rows Removed by Filter: 7  
   Heap Blocks: exact=8  
   Buffers: shared hit=11658 read=23  
     
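   -- All conditions are pushed down into the composite GIN index
   -- With multiple conditions, GIN automatically combines them with an internal bitmap scan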
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..156.37 rows=304 width=0) (actual time=834.418..834.418 rows=8 loops=1)  
         Index Cond: ((test.c1 = ANY ('{1,2,3}'::text[])) AND (test.c2 >= (CURRENT_DATE - 1)) AND (test.c2 <= CURRENT_DATE) AND (test.c3 ~ 'abcdef'::text))
         Buffers: shared hit=11582 read=23  
   SubPlan 1  
     ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..15.84 rows=1 width=0) (never executed)  
           Filter: ((pg_class.oid)::integer = test.id)  
   SubPlan 2  
     ->  Seq Scan on pg_catalog.pg_class pg_class_1  (cost=0.00..14.87 rows=387 width=4) (actual time=0.014..0.155 rows=388 loops=1)  
           Output: (pg_class_1.oid)::integer  
           Buffers: shared hit=11  
   SubPlan 3  
     ->  Index Only Scan using pg_attribute_relid_attnum_index on pg_catalog.pg_attribute  (cost=0.28..84.39 rows=8 width=0) (never executed)  
           Filter: ((pg_attribute.attrelid)::integer = test.id)  
           Heap Fetches: 0  
   SubPlan 4  
     ->  Index Only Scan using pg_attribute_relid_attnum_index on pg_catalog.pg_attribute pg_attribute_1  (cost=0.28..77.13 rows=2904 width=4) (actual time=0.029..1.081 rows=2941 loops=1)  
           Output: (pg_attribute_1.attrelid)::integer  
           Heap Fetches: 459  
           Buffers: shared hit=57  
 Planning time: 1.070 ms  
 Execution time: 839.834 ms  
(29 rows)

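This plan looks good at first glance, but on closer inspection it is not much of an improvement: the Bitmap Index Scan on idx_test_1 alone takes about 834 ms of the roughly 840 ms total, so there is room for a better optimization.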
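In the step 4 plan, each EXISTS appears as "alternatives: SubPlan N or hashed SubPlan N+1". SubPlan 1 is never executed, while SubPlan 2 runs once (loops=1) and scans all of pg_class. This indicates that the hashed alternative was used: the subquery runs once, its output goes into a hash table, and each outer row only probes that table. A minimal Python sketch of the two strategies follows. The ids are made up, and Python sets stand in for the executor's hash table.

```python
# Hypothetical ids standing in for pg_class.oid (subquery output) and test.id (outer rows).
subquery_ids = [10, 20, 30, 40]
outer_ids = [5, 20, 40, 99]

def exists_rescan(test_id):
    """Non-hashed alternative: scan the subquery output again for every outer row."""
    return any(sub_id == test_id for sub_id in subquery_ids)

# Hashed alternative: run the subquery once and build a hash table...
hashed = set(subquery_ids)

def exists_hashed(test_id):
    """...then each outer row costs a single hash probe."""
    return test_id in hashed

per_row = [exists_rescan(i) for i in outer_ids]
probed = [exists_hashed(i) for i in outer_ids]
print(per_row, probed)
# → [False, True, True, False] [False, True, True, False]
```

Both strategies give the same answer. The hashed form pays for one subquery run plus memory for the table, instead of rescanning the subquery for every outer row.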

5. Going further requires understanding how a composite GIN index executes internally (bitmap scans).

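Very few rows satisfy the c3 condition on its own. By contrast, c1 and c2 are very unselective on this data: c1 in ('1','2','3') matches roughly 30% of the rows, and c2 matches every row, since all rows were inserted with current_date. Reading and combining the large c1 and c2 entry lists inside the GIN index is therefore wasted work, and GIN's internal bitmap scan is not needed at all. Indexing c3 alone and checking the remaining conditions on the few rows it returns is enough.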

postgres=# select count(*) from test where c3 ~ 'abcdef';
 count   
-------  
    23  
(1 row)
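As a deliberately crude model of why this matters, count the TIDs each strategy has to read from the index. The c1 and c2 figures are estimates for this test data (about 30% of 10 million rows and all 10 million rows). The c3 figure is the count just shown. Real GIN posting lists are compressed, so this gives only a proportional picture, not a cost formula.

```python
# Crude model: number of TIDs each strategy must read from the index.
# Match counts are estimates for this test data set, not measurements.
matches = {
    "c1 in ('1','2','3')": 3_000_000,                          # about 30% of 10 million rows
    "c2 between current_date-1 and current_date": 10_000_000,  # every row
    "c3 ~ 'abcdef'": 23,                                       # count(*) shown above
}

# Composite GIN index (c1, c2, c3): every condition's entries are read,
# then combined inside the index.
composite_tids = sum(matches.values())

# Single trigram index on c3: read only the most selective condition's entries,
# then check c1, c2 and the EXISTS clauses on the few heap rows returned
# (a trigram index may also return some false positives that need a recheck).
most_selective = min(matches, key=matches.get)
single_index_tids = matches[most_selective]

print(most_selective, composite_tids, single_index_tids)
# → c3 ~ 'abcdef' 13000023 23
```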

Replace it with the following index:

postgres=# drop index idx_test_1 ;  
DROP INDEX  
  
postgres=# create index idx_test_1 on test using gin (c3 gin_trgm_ops) ;  
CREATE INDEX

6. Execution time drops to about 24 milliseconds

postgres=# explain (analyze,verbose,timing,costs,buffers)   
select * from test   
where   
(  
  exists (select 1 from pg_class where oid::int = test.id)   
  or   
  exists (select 1 from pg_attribute where attrelid::int=test.id)  
)   
and c1 in ('1','2','3')
and c2 between current_date-1 and current_date
and c3 ~ 'abcdef';
                                                                                                       QUERY PLAN                                                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test  (cost=53.76..27798.16 rows=228 width=43) (actual time=24.287..24.287 rows=0 loops=1)  
   Output: test.id, test.c1, test.c2, test.c3  
   Recheck Cond: (test.c3 ~ 'abcdef'::text)
   Rows Removed by Index Recheck: 6  
   Filter: ((test.c1 = ANY ('{1,2,3}'::text[])) AND (test.c2 <= CURRENT_DATE) AND (test.c2 >= (CURRENT_DATE - 1)) AND ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4)))
   Rows Removed by Filter: 23  
   Heap Blocks: exact=29  
   Buffers: shared hit=226  
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..53.70 rows=1000 width=0) (actual time=21.517..21.517 rows=29 loops=1)  
         Index Cond: (test.c3 ~ 'abcdef'::text)
         Buffers: shared hit=128  
   SubPlan 1  
     ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..15.84 rows=1 width=0) (never executed)  
           Filter: ((pg_class.oid)::integer = test.id)  
   SubPlan 2  
     ->  Seq Scan on pg_catalog.pg_class pg_class_1  (cost=0.00..14.87 rows=387 width=4) (actual time=0.011..0.156 rows=387 loops=1)  
           Output: (pg_class_1.oid)::integer  
           Buffers: shared hit=11  
   SubPlan 3  
     ->  Index Only Scan using pg_attribute_relid_attnum_index on pg_catalog.pg_attribute  (cost=0.28..84.39 rows=8 width=0) (never executed)  
           Filter: ((pg_attribute.attrelid)::integer = test.id)  
           Heap Fetches: 0  
   SubPlan 4  
     ->  Index Only Scan using pg_attribute_relid_attnum_index on pg_catalog.pg_attribute pg_attribute_1  (cost=0.28..77.13 rows=2904 width=4) (actual time=0.028..1.099 rows=2938 loops=1)  
           Output: (pg_attribute_1.attrelid)::integer  
           Heap Fetches: 456  
           Buffers: shared hit=58  
 Planning time: 0.801 ms  
 Execution time: 24.403 ms  
(29 rows)  
  
Time: 26.052 ms
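The figures from the two EXPLAIN ANALYZE outputs can be compared side by side. The values below are copied from steps 4 and 6, and shared buffers are counted as hit plus read.

```python
# Figures copied from the EXPLAIN ANALYZE output in step 4 (before) and step 6 (after).
before = {"exec_ms": 839.834, "index_ms": 834.418, "buffers": 11658 + 23}  # hit + read
after = {"exec_ms": 24.403, "buffers": 226}

index_share_before = before["index_ms"] / before["exec_ms"]
speedup = before["exec_ms"] / after["exec_ms"]
buffer_ratio = before["buffers"] / after["buffers"]

print(f"index scan share of execution time before: {index_share_before:.1%}")
print(f"execution time: {speedup:.1f}x faster")
print(f"buffers: {before['buffers']} -> {after['buffers']} ({buffer_ratio:.1f}x fewer)")
# → index scan share of execution time before: 99.4%
# → execution time: 34.4x faster
# → buffers: 11681 -> 226 (51.7x fewer)
```

Almost all of the step 4 time was spent inside the composite GIN index scan. Indexing only the selective c3 column removes that cost, which shows up as both a shorter execution time and far fewer buffers touched.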