【PostgreSQL 】PostgreSQL 15對distinct的優化

abce發表於2022-07-17

示例表

table t_ex;
 c1 | c2
----+----
  2 | B
  4 | C
  6 | A
  2 | C
  4 | B
  6 | B
  2 | A
  4 | B
  6 | C
  2 | C

以下SQL語句有序地返回"c1"列中唯一值:

select distinct on(c1) * from abce;

對於c2列,會根據c1的唯一性,從表中找到的第一個值。

postgres=# select distinct on(c1) * from abce;
 c1 | c2 
----+----
 2  | B
 4  | B
 6  | B
(3 rows)

以下SQL語句有序地返回"c2"列中唯一值:

# select distinct on(c2) * from abce;
 c1 | c2 
----+----
 6  | A
 2  | B
 4  | C
(3 rows)

最後從表中返回唯一性的記錄

postgres=# select distinct * from abce;
 c1 | c2 
----+----
 6  | C
 4  | C
 4  | B
 2  | C
 2  | A
 6  | B
 6  | A
 2  | B
(8 rows)

那麼你可能會問,在postgresql15中,distinct的增強體現在哪些方面呢?答案是:併發


在此之前,只有一個cpu或程式來計算不同的值。在postgresql15中,可以使用併發,使用多個cpu程式。
這一特性涉及好幾個引數,但是,我們只聚焦在引數max_parallel_workers_per_gather。

為了演示這個改進,我們建立三個表,沒有索引,填充大約5000000條記錄。注意,表的列數分別為1,5,10。

                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 c1     | integer |           |          |
 
 
                        Table "public.t5"
 Column |         Type          | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
 c1     | integer               |           |          |
 c2     | integer               |           |          |
 c3     | integer               |           |          |
 c4     | integer               |           |          |
 c5     | character varying(40) |           |          |
 
                        Table "public.t10"
 Column |         Type          | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
 c1     | integer               |           |          |
 c2     | integer               |           |          |
 c3     | integer               |           |          |
 c4     | integer               |           |          |
 c5     | character varying(40) |           |          |
 c6     | integer               |           |          |
 c7     | integer               |           |          |
 c8     | integer               |           |          |
 c9     | integer               |           |          |
 c10    | integer               |           |          |

  

insert into t1 select generate_series(1,500);

insert into t5
select   generate_series(1,500)
        ,generate_series(500,1000)
        ,generate_series(1000,1500)
        ,(random()*100)::int
        ,'aofjaofjwaoeev$#^&ETHE#@#Fasrhk!!@%Q@';
		
		
insert into t10
select   generate_series(1,500)
        ,generate_series(500,1000)
        ,generate_series(1000,1500)
        ,(random()*100)::int
        ,'aofjaofjwaoeev$#^&ETHE#@#Fasrhk!!@%Q@'
        ,generate_series(1500,2000)
        ,generate_series(2500,3000)
        ,generate_series(3000,3500)
        ,generate_series(3500,4000)
        ,generate_series(4000,4500);
		
		
                           List of relations
 Schema | Name | Type  |  Owner   | Persistence | Access method |  Size  |
--------+------+-------+----------+-------------+---------------+--------+
 public | t1   | table | postgres | permanent   | heap          | 173 MB |
 public | t10  | table | postgres | permanent   | heap          | 522 MB |
 public | t5   | table | postgres | permanent   | heap          | 404 MB |

  

下一步是將生成的資料dump到以下的版本中:

PG VERSION
    pg96
    pg10
    pg11
    pg12
    pg13
    pg14
    pg15

  

資料匯入後,使用下面的指令碼生成結果:

#!/bin/bash
for v in 96 10 11 12 13 14 15
do
    # run the explain analzye 5X in order to derive consistent numbers
    for u in $(seq 1 5)
    do
        echo "--- explain analyze: pg${v}, ${u}X ---"
        psql -p 100$v db01 -c "explain analyze select distinct on (c1) * from t1" > t1.pg$v.explain.txt
        psql -p 100$v db01 -c "explain analyze select distinct * from t5" > t5.pg$v.explain.txt
        psql -p 100$v db01 -c "explain analyze select distinct * from t10" > t10.pg$v.explain.txt
    done
done

  

以下是結果比較,可以看到表越大,效能收穫越大。

PG VERSION

1 column (t1), ms

5 column (t5), ms

10 column (t10), ms

pg96

3,382

9,743

20,026

pg10

2,004

5,746

13,241

pg11

1,932

6,062

14,295

pg12

1,876

5,832

13,214

pg13

1,973

2,358

3,135

pg14

1,948

2,316

2,909

pg15

1,439

1,025

1,245

來看看不同版本之間的執行計劃:

                        PG96 QUERY PLAN, TABLE T1
-------------------------------------------------------------------------------
 Unique  (cost=765185.42..790185.42 rows=500 width=4) (actual time=2456.805..3381.230 rows=500 loops=1)
   ->  Sort  (cost=765185.42..777685.42 rows=5000000 width=4) (actual time=2456.804..3163.600 rows=5000000 loops=1)
         Sort Key: c1
         Sort Method: external merge  Disk: 68432kB
         ->  Seq Scan on t1  (cost=0.00..72124.00 rows=5000000 width=4) (actual time=0.055..291.523 rows=5000000 loops=1)
 Planning time: 0.161 ms
 Execution time: 3381.662 ms

  

                        PG15 QUERY PLAN, TABLE T1
---------------------------------------------------------------------------
 Unique  (cost=557992.61..582992.61 rows=500 width=4) (actual time=946.556..1411.421 rows=500 loops=1)
   ->  Sort  (cost=557992.61..570492.61 rows=5000000 width=4) (actual time=946.554..1223.289 rows=5000000 loops=1)
         Sort Key: c1
         Sort Method: external merge  Disk: 58720kB
         ->  Seq Scan on t1  (cost=0.00..72124.00 rows=5000000 width=4) (actual time=0.038..259.329 rows=5000000 loops=1)
 Planning Time: 0.229 ms
 JIT:
   Functions: 1
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.150 ms, Inlining 31.332 ms, Optimization 6.746 ms, Emission 6.847 ms, Total 45.074 ms
 Execution Time: 1438.683 ms

 

當DISTINCT列的數量增加時,真正的差異出現了,如查詢表 t10 所示。 可以看到並行化在起作用!

                         PG96 QUERY PLAN, TABLE T10
-------------------------------------------------------------------------------------------
 Unique  (cost=1119650.30..1257425.30 rows=501000 width=73) (actual time=14257.801..20024.271 rows=50601 loops=1)
   ->  Sort  (cost=1119650.30..1132175.30 rows=5010000 width=73) (actual time=14257.800..19118.145 rows=5010000 loops=1)
         Sort Key: c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
         Sort Method: external merge  Disk: 421232kB
         ->  Seq Scan on t10  (cost=0.00..116900.00 rows=5010000 width=73) (actual time=0.073..419.701 rows=5010000 loops=1)
 Planning time: 0.352 ms
 Execution time: 20025.956 ms

  

                         PG15 QUERY PLAN, TABLE T10
------------------------------------------------------------------------------------------- HashAggregate  (cost=699692.77..730144.18 rows=501000 width=73) (actual time=1212.779..1232.667 rows=50601 loops=1)
   Group Key: c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
   Planned Partitions: 16  Batches: 17  Memory Usage: 8373kB  Disk Usage: 2976kB
   ->  Gather  (cost=394624.22..552837.15 rows=1002000 width=73) (actual time=1071.280..1141.814 rows=151803 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  HashAggregate  (cost=393624.22..451637.15 rows=501000 width=73) (actual time=1064.261..1122.628 rows=50601 loops=3)
               Group Key: c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
               Planned Partitions: 16  Batches: 17  Memory Usage: 8373kB  Disk Usage: 15176kB
               Worker 0:  Batches: 17  Memory Usage: 8373kB  Disk Usage: 18464kB
               Worker 1:  Batches: 17  Memory Usage: 8373kB  Disk Usage: 19464kB
               ->  Parallel Seq Scan on t10  (cost=0.00..87675.00 rows=2087500 width=73) (actual time=0.072..159.083 rows=1670000 loops=3)
 Planning Time: 0.286 ms
 JIT:
   Functions: 31
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 3.510 ms, Inlining 123.698 ms, Optimization 200.805 ms, Emission 149.608 ms, Total 477.621 ms
 Execution Time: 1244.556 ms

提高效能:
postgres執行時引數max_parallel_workers_per_gather來提高效能。新初始化的叢集中的預設值為2。
如下表所示,由於測試硬體本身的能力有限,它很快成為收益遞減的原因。

​​在postgresql 15中:

max_parallel_workers_per_gather

1 column (t1)

5 column (t5)

10 column (t10)

2

1,439

1,025

1,245

3

1,464

875

1,013

4

1,391

858

977

6

1,401

846

1,045

8

1,428

856

993

 

 

關於索引:如本查詢計劃中所示,應用索引時未實現效能改進。

PG15,表T10,max_parallel_workers_per_gather=4:

                                     QUERY PLAN                                                                                 
-----------------------------------------------------------------------------------
 Unique  (cost=0.43..251344.40 rows=501000 width=73) (actual time=0.060..1240.729 rows=50601 loops=1)
   ->  Index Only Scan using t10_c1_c2_c3_c4_c5_c6_c7_c8_c9_c10_idx on t10  (cost=0.43..126094.40 rows=5010000 width=73) (actual time=0.058..710.780 rows=5010000 loops=1)
         Heap Fetches: 582675
 Planning Time: 0.596 ms
 JIT:
   Functions: 1
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.262 ms, Inlining 0.000 ms, Optimization 0.122 ms, Emission 2.295 ms, Total 2.679 ms
 Execution Time: <strong>1249.391 ms</strong>

  

跨多個CPU執行DISTINCT是效能能力的一大進步。
但是請記住,當增加max_parallel_workers_per_gather的數量並接近硬體的限制時,效能下降的風險。
在正常情況下,查詢計劃器可能會決定使用索引而不是執行並行工作程式。
解決此問題的一種方法是考慮禁用執行時引數,例如enable_indexonlyscan和enable_indexscan。
最後,不要忘記執行EXPLAIN ANALYZE以瞭解發生了什麼。

相關文章