PostgreSQL執行計劃變化

dbs101發表於2022-03-06

PostgreSQL執行計劃變化,當PostgreSQL使用繫結變數時,由於優化器無法預估表示式返回的行數,就預設選擇統計資訊收集行數的5%,評估的行數會遠遠低於實際執行時返回的行數。在表關聯時,這種評估會對關聯方式產生影響,驅動表返回行數少,優化器更傾向於使用nested loop關聯方式,執行效率會下降。下面是測試案例:

 

建立測試關聯表。

create table tym_p1 (

id varchar(32) not null default sys_guid()

, tym_no varchar(32) not null

, tym_date timestamp(0) not null

, primary key (id)

)

;

 

create index tym_p1_no on tym_p1 (tym_no);

 

建立測試關聯子表。

create table tym_c1 (

id varchar(32) not null default sys_guid()

, tym_no varchar(32) not null

, tym_date timestamp(0) not null

, tym_user varchar(32) not null

, primary key (id)

)

;

 

create index tym_c1_no on tym_c1 (tym_no);

 

生成測試資料

for i in `seq 100000`; do psql -d phroc -c "insert into tym_p1 (tym_no,tym_date) values(to_char($i,'999999'),now())";done

 

insert into tym_c1 select a.*, tym_no from tym_p1 a;

 

準備語句

prepare tym_prep04 (

varchar(50),varchar(50)

) as

select * from tym_p1 p where

p.tym_date

BETWEEN to_timestamp($1, 'yyyy-MM-dd hh24:mi:ss')

AND to_timestamp($2, 'yyyy-MM-dd hh24:mi:ss')

;

 

explain analyze execute tym_prep04(

'2022-03-02 00:00:00','2022-03-09 23:59:59');

 

執行結果,第一次至第五次的執行計劃,從執行計劃預估行數是返回 99980 行,實際執行時 100000 行。

explain analyze execute tym_prep04(

'2022-03-02 00:00:00','2022-03-09 23:59:59');

                                                                                          QUERY PLAN                                                        

-------------------------------------------------------------------------------------------------------------------------------------------------------------

 Seq Scan on tym_p1 p  (cost=0.00..3031.00 rows= 99980 width=49) (actual time=0.011..148.967 rows=100000 loops=1)

   Filter: ((tym_date >= to_timestamp('2022-03-02 00:00:00'::text, 'yyyy-MM-dd hh24:mi:ss'::text)) AND (tym_date <= to_timestamp('2022-03-09 23:59:59'::text,

 Planning Time: 0.119 ms

 Execution Time: 152.085 ms

(4 rows)

 

執行結果,第六次的執行計劃,,從執行計劃預估行數是返回 500 行,實際執行時 100000 行。

explain analyze execute tym_prep04(

'2022-03-02 00:00:00','2022-03-09 23:59:59');

                                                                         QUERY PLAN                                                                         

-------------------------------------------------------------------------------------------------------------------------------------------------------------

 Seq Scan on tym_p1 p  (cost=0.00..3031.00 rows= 500 width=49) (actual time=0.041..151.032 rows=100000 loops=1)

   Filter: ((tym_date >= to_timestamp(($1)::text, 'yyyy-MM-dd hh24:mi:ss'::text)) AND (tym_date <= to_timestamp(($2)::text, 'yyyy-MM-dd hh24:mi:ss'::text)))

 Planning Time: 0.122 ms

 Execution Time: 154.185 ms

(4 rows)

 

如果是幾張表關聯時,這種評估會對連線方式產生影響。

prepare tym_prep05 (

varchar(50),varchar(50)

) as

select * from tym_p1 p, tym_c1 c where

p.tym_no = c.tym_no and

p.tym_date

BETWEEN to_timestamp($1, 'yyyy-MM-dd hh24:mi:ss')

AND to_timestamp($2, 'yyyy-MM-dd hh24:mi:ss')

;

 

explain analyze execute tym_prep05(

'2022-03-02 00:00:00','2022-03-09 23:59:59');

 

第一次到第五次執行,由於預估返回表tym_p1的行數是99980,優先採用了hash join的方式。

 

explain analyze execute tym_prep05(

'2022-03-02 00:00:00','2022-03-09 23:59:59');

                                                                                             QUERY PLAN                                                     

-------------------------------------------------------------------------------------------------------------------------------------------------------------

 Hash Join  (cost=3387.00..7792.72 rows=99980 width=106) (actual time=35.425..229.255 rows=100000 loops=1)

   Hash Cond: ((p.tym_no)::text = (c.tym_no)::text)

   ->  Seq Scan on tym_p1 p  (cost=0.00..3031.00 rows= 99980 width=49) (actual time=0.016..152.250 rows=100000 loops=1)

         Filter: ((tym_date >= to_timestamp('2022-03-02 00:00:00'::text, 'yyyy-MM-dd hh24:mi:ss'::text)) AND (tym_date <= to_timestamp('2022-03-09 23:59:59':

   ->  Hash  (cost=2137.00..2137.00 rows=100000 width=57) (actual time=34.740..34.741 rows=100000 loops=1)

         Buckets: 131072  Batches: 1  Memory Usage: 10399kB

         ->  Seq Scan on tym_c1 c  (cost=0.00..2137.00 rows=100000 width=57) (actual time=0.006..12.159 rows=100000 loops=1)

 Planning Time: 0.434 ms

 Execution Time: 234.067 ms

(9 rows)

 

Time: 234.852 ms

 

第六次執行,由於預估返回表tym_p1的行數是294,優先採用了nested loop的方式。

explain analyze execute tym_prep05(

'2022-03-02 00:00:00','2022-03-09 23:59:59');

                                                                               QUERY PLAN                                                                   

-------------------------------------------------------------------------------------------------------------------------------------------------------------

 Gather  (cost=1000.42..4474.63 rows=500 width=106) (actual time=0.509..538.023 rows=100000 loops=1)

   Workers Planned: 1

   Workers Launched: 1

   ->  Nested Loop  (cost=0.42..3424.63 rows=294 width=106) (actual time=0.086..496.033 rows=50000 loops=2)

         ->  Parallel Seq Scan on tym_p1 p  (cost=0.00..2207.47 rows= 294 width=49) (actual time=0.035..85.307 rows=50000 loops=2)

               Filter: ((tym_date >= to_timestamp(($1)::text, 'yyyy-MM-dd hh24:mi:ss'::text)) AND (tym_date <= to_timestamp(($2)::text, 'yyyy-MM-dd hh24:mi:s

         ->  Index Scan using tym_c1_no on tym_c1 c  (cost=0.42..4.13 rows=1 width=57) (actual time=0.008..0.008 rows=1 loops=100000)

               Index Cond: ((tym_no)::text = (p.tym_no)::text)

 Planning Time: 0.380 ms

 Execution Time: 543.929 ms

(10 rows)


解決方案:

  1. 不使用繫結變數,優化器可以根據統計資訊的直方圖評估較為接近的行數,生成相對較優的執行計劃。

  2. 使用pg hint外掛,通過外掛固化執行計劃。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/25105315/viewspace-2865281/,如需轉載,請註明出處,否則將追究法律責任。

相關文章