PostgreSQLdblink非同步呼叫實現並行hash分片JOIN-含資料交、並、差提速案例

德哥發表於2018-02-02

標籤

PostgreSQL , 並行雜湊join , parall hash join , dblink , 非同步呼叫 , hash


背景

資料交、並、差是分析型場景常見的需求。例如用來篩選目標使用者、店鋪等。

pic

PostgreSQL中交、並、差可以使用SQL語法來實現(union , union all , except , intersect)。其中只有union all是不去重的,其他都會去重。

例子

select generate_series(1,10) except select generate_series(5,12);    
    
select generate_series(1,10) union select generate_series(5,12);    
    
select generate_series(1,10) union all select generate_series(5,12);    
    
select generate_series(1,10) intersect select generate_series(5,12);    

當資料量非常龐大時,求交集、差集的話這種方法的效果可能會不夠理想。

那麼有什麼好方法呢?

1、我們自己對資料進行hash切片,然後使用dblink的非同步呼叫介面,一對一的並行操作(求交、差)。

pic

2、PostgreSQL 11 已經支援了parallel hash join,可以解決大資料量求交、差的效能問題。

《PostgreSQL 11 preview – parallel hash join(並行雜湊JOIN) 效能極大提升》

原生求交、差效能

測試結構和資料

postgres=# create table tbl(id int, c1 int);  
CREATE TABLE  
postgres=# insert into tbl select generate_series(1,10000000), random()*99;  
INSERT 0 10000000  

1、1千萬 交 1千萬

耗時,31.7秒。

postgres=# explain analyze select * from tbl intersect select * from tbl;    
                                                                      QUERY PLAN                                                                          
------------------------------------------------------------------------------------------------------------------------------------------------------    
 HashSetOp Intersect  (cost=0.00..413495.25 rows=9999977 width=12) (actual time=27730.818..30412.898 rows=10000000 loops=1)    
   ->  Append  (cost=0.00..313495.48 rows=19999954 width=12) (actual time=0.402..18889.746 rows=20000000 loops=1)    
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..156747.74 rows=9999977 width=12) (actual time=0.401..7744.308 rows=10000000 loops=1)    
               ->  Gather  (cost=0.00..56747.97 rows=9999977 width=8) (actual time=0.397..5947.082 rows=10000000 loops=1)    
                     Workers Planned: 8    
                     Workers Launched: 8    
                     ->  Parallel Seq Scan on tbl  (cost=0.00..56747.97 rows=1249997 width=8) (actual time=0.015..248.653 rows=1111111 loops=9)    
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..156747.74 rows=9999977 width=12) (actual time=0.329..8366.856 rows=10000000 loops=1)    
               ->  Gather  (cost=0.00..56747.97 rows=9999977 width=8) (actual time=0.326..6567.651 rows=10000000 loops=1)    
                     Workers Planned: 8    
                     Workers Launched: 8    
                     ->  Parallel Seq Scan on tbl tbl_1  (cost=0.00..56747.97 rows=1249997 width=8) (actual time=0.013..195.661 rows=1111111 loops=9)    
 Planning time: 0.098 ms    
 Execution time: 31691.115 ms    
(14 rows)    

2、1千萬 差 1千萬

耗時,30秒。

postgres=# explain analyze select * from tbl except select * from tbl;    
                                                                      QUERY PLAN                                                                          
------------------------------------------------------------------------------------------------------------------------------------------------------    
 HashSetOp Except  (cost=0.00..413495.25 rows=9999977 width=12) (actual time=30021.111..30021.111 rows=0 loops=1)    
   ->  Append  (cost=0.00..313495.48 rows=19999954 width=12) (actual time=0.415..20454.584 rows=20000000 loops=1)    
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..156747.74 rows=9999977 width=12) (actual time=0.414..8500.176 rows=10000000 loops=1)    
               ->  Gather  (cost=0.00..56747.97 rows=9999977 width=8) (actual time=0.409..6696.932 rows=10000000 loops=1)    
                     Workers Planned: 8    
                     Workers Launched: 8    
                     ->  Parallel Seq Scan on tbl  (cost=0.00..56747.97 rows=1249997 width=8) (actual time=0.019..233.999 rows=1111111 loops=9)    
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..156747.74 rows=9999977 width=12) (actual time=0.341..9162.134 rows=10000000 loops=1)    
               ->  Gather  (cost=0.00..56747.97 rows=9999977 width=8) (actual time=0.337..7358.837 rows=10000000 loops=1)    
                     Workers Planned: 8    
                     Workers Launched: 8    
                     ->  Parallel Seq Scan on tbl tbl_1  (cost=0.00..56747.97 rows=1249997 width=8) (actual time=0.015..196.848 rows=1111111 loops=9)    
 Planning time: 0.080 ms    
 Execution time: 30358.560 ms    
(14 rows)    

優化手段1 – 使用hash切片,然後使用dblink的非同步呼叫介面,求交、差效能

dblink非同步呼叫的例子,參考

《驚天效能!單RDS PostgreSQL例項 支撐 2000億 – 實時標籤透視案例》

《阿里雲RDS PostgreSQL OSS 外部表 – (dblink非同步呼叫封裝)並行寫提速案例》

這個方法純屬個人高階玩法。建議我們們還是直接用PG 11。

參與計算相交、差的欄位中的任意一個或多個,作為HASH切片欄位即可。PostgreSQL內建了好多型別轉HASH值得到函式:

postgres=# df *.hash*    
                                    List of functions    
   Schema   |         Name          | Result data type |   Argument data types    | Type     
------------+-----------------------+------------------+--------------------------+------    
 pg_catalog | hash_aclitem          | integer          | aclitem                  | func    
 pg_catalog | hash_aclitem_extended | bigint           | aclitem, bigint          | func    
 pg_catalog | hash_array            | integer          | anyarray                 | func    
 pg_catalog | hash_array_extended   | bigint           | anyarray, bigint         | func    
 pg_catalog | hash_numeric          | integer          | numeric                  | func    
 pg_catalog | hash_numeric_extended | bigint           | numeric, bigint          | func    
 pg_catalog | hash_range            | integer          | anyrange                 | func    
 pg_catalog | hash_range_extended   | bigint           | anyrange, bigint         | func    
 pg_catalog | hashbpchar            | integer          | character                | func    
 pg_catalog | hashbpcharextended    | bigint           | character, bigint        | func    
 pg_catalog | hashchar              | integer          | "char"                   | func    
 pg_catalog | hashcharextended      | bigint           | "char", bigint           | func    
 pg_catalog | hashenum              | integer          | anyenum                  | func    
 pg_catalog | hashenumextended      | bigint           | anyenum, bigint          | func    
 pg_catalog | hashfloat4            | integer          | real                     | func    
 pg_catalog | hashfloat4extended    | bigint           | real, bigint             | func    
 pg_catalog | hashfloat8            | integer          | double precision         | func    
 pg_catalog | hashfloat8extended    | bigint           | double precision, bigint | func    
 pg_catalog | hashhandler           | index_am_handler | internal                 | func    
 pg_catalog | hashinet              | integer          | inet                     | func    
 pg_catalog | hashinetextended      | bigint           | inet, bigint             | func    
 pg_catalog | hashint2              | integer          | smallint                 | func    
 pg_catalog | hashint2extended      | bigint           | smallint, bigint         | func    
 pg_catalog | hashint4              | integer          | integer                  | func    
 pg_catalog | hashint4extended      | bigint           | integer, bigint          | func    
 pg_catalog | hashint8              | integer          | bigint                   | func    
 pg_catalog | hashint8extended      | bigint           | bigint, bigint           | func    
 pg_catalog | hashmacaddr           | integer          | macaddr                  | func    
 pg_catalog | hashmacaddr8          | integer          | macaddr8                 | func    
 pg_catalog | hashmacaddr8extended  | bigint           | macaddr8, bigint         | func    
 pg_catalog | hashmacaddrextended   | bigint           | macaddr, bigint          | func    
 pg_catalog | hashname              | integer          | name                     | func    
 pg_catalog | hashnameextended      | bigint           | name, bigint             | func    
 pg_catalog | hashoid               | integer          | oid                      | func    
 pg_catalog | hashoidextended       | bigint           | oid, bigint              | func    
 pg_catalog | hashoidvector         | integer          | oidvector                | func    
 pg_catalog | hashoidvectorextended | bigint           | oidvector, bigint        | func    
 pg_catalog | hashtext              | integer          | text                     | func    
 pg_catalog | hashtextextended      | bigint           | text, bigint             | func    
 pg_catalog | hashvarlena           | integer          | internal                 | func    
 pg_catalog | hashvarlenaextended   | bigint           | internal, bigint         | func    
(41 rows)    

首先看看切成小片後,求交、差執行時間需要多久:

不開並行,切成48份,每份的intersect時間,大概是1.9秒。

postgres=# explain analyze select t1.* from tbl t1 where mod(abs(hashint4(id)), 48)=0 intersect select t1.* from tbl t1 where mod(abs(hashint4(id)), 48)=0;    
                                                               QUERY PLAN                                                                   
----------------------------------------------------------------------------------------------------------------------------------------    
 HashSetOp Intersect  (cost=0.00..489995.08 rows=50000 width=12) (actual time=1822.887..1867.381 rows=208902 loops=1)    
   ->  Append  (cost=0.00..489495.08 rows=100000 width=12) (actual time=0.021..1679.633 rows=417804 loops=1)    
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..244747.54 rows=50000 width=12) (actual time=0.020..811.669 rows=208902 loops=1)    
               ->  Seq Scan on tbl t1  (cost=0.00..244247.54 rows=50000 width=8) (actual time=0.019..774.864 rows=208902 loops=1)    
                     Filter: (mod(abs(hashint4(id)), 48) = 0)    
                     Rows Removed by Filter: 9791098    
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..244747.54 rows=50000 width=12) (actual time=0.027..807.215 rows=208902 loops=1)    
               ->  Seq Scan on tbl t1_1  (cost=0.00..244247.54 rows=50000 width=8) (actual time=0.026..770.958 rows=208902 loops=1)    
                     Filter: (mod(abs(hashint4(id)), 48) = 0)    
                     Rows Removed by Filter: 9791098    
 Planning time: 0.116 ms    
 Execution time: 1887.638 ms    
(12 rows)    

也就是說,開48個並行切片的話,最理想的效能是1.9秒。

注意

因為這裡面的HASH分片是掃全表得到的,所以開的併發越多,掃描次數越多。最好是掃一次,並均分到N個臨時空間,然後再從臨時空間中掃,這樣就只需要掃一遍。當然會增加複雜度,如果表不大,實際上多掃幾次也無所謂。

hash並行切片+非同步dblink

1、建立生成dblink連線的函式,重複建立不報錯。

create or replace function conn(      
  name,   -- dblink名字      
  text    -- 連線串,URL      
) returns void as $$        
declare        
begin        
  perform dblink_connect($1, $2);       
  return;        
exception when others then        
  return;        
end;        
$$ language plpgsql strict;        

2、建立一個函式,用於跑並行求交

create or replace function get_intersect(    
  conn text,         -- 連線串    
  OUT id int,    
  OUT c1 int    
) returns setof record as $$       
declare      
begin      
for i in 0..47 loop       
  perform conn(`link`||i,  conn);       
  perform 1 from dblink_get_result(`link`||i) as t(id int, c1 int);      
  perform dblink_send_query(`link`||i, format(`select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s intersect select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s`, i, i));      
end loop;      
      
for i in 0..47 loop      
  return query SELECT * FROM dblink_get_result(`link`||i) as t(id int, c1 int);      
end loop;      
end;      
$$ language plpgsql strict;      

使用這個方法,可以看到執行時間大概3秒。但是耗費了很多時間在將1000萬條記錄從所有的遠端返回給呼叫端。總共差不多8秒。

如果改成返回遊標,響應速度就快得不得了了,比如在圖計算中,用遊標流式返回:

《金融風控、公安刑偵、社會關係、人脈分析等需求分析與資料庫實現 – PostgreSQL圖資料庫場景應用》

create or replace function get_intersect1()   
  returns setof refcursor as $$       
declare   
  ref refcursor[];    
  res refcursor;  
begin      
for i in 0..47 loop       
  ref[i] := `cur`||i;  
  res := ref[i];  
  open res for execute format(`select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s intersect select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s`, i, i);    
  return next res;  
end loop;      
  return;    
end;      
$$ language plpgsql strict;      

用法

postgres=# begin;  
BEGIN  
postgres=# select * from get_intersect1();  
 get_intersect1   
----------------  
 cur0  
 cur1  
 cur2  
 cur3  
 cur4  
 cur5  
 cur6  
 cur7  
 cur8  
 cur9  
 cur10  
 cur11  
 cur12  
 cur13  
 cur14  
 cur15  
 cur16  
 cur17  
 cur18  
 cur19  
 cur20  
 cur21  
 cur22  
 cur23  
 cur24  
 cur25  
 cur26  
 cur27  
 cur28  
 cur29  
 cur30  
 cur31  
 cur32  
 cur33  
 cur34  
 cur35  
 cur36  
 cur37  
 cur38  
 cur39  
 cur40  
 cur41  
 cur42  
 cur43  
 cur44  
 cur45  
 cur46  
 cur47  
(48 rows)  
  
Time: 46.471 ms  
  
-- 第一頁比較慢  
  
postgres=# fetch 10 from cur1;  
   id    | c1   
---------+----  
 3591658 | 70  
 6100015 | 17  
 3222328 | 90  
 5500150 | 23  
 9087335 | 45  
 2463228 | 86  
  870261 | 51  
 9276428 | 85  
 7672240 | 32  
 6828314 | 41  
(10 rows)  
  
Time: 1645.906 ms (00:01.646)  
  
-- 後面就飛快了。  
  
postgres=# fetch 10 from cur1;  
   id    | c1   
---------+----  
 7335851 |  5  
 8007430 | 10  
 6230301 | 27  
 9111491 | 91  
 1400805 | 65  
 3651088 | 33  
 3292697 | 65  
 1431682 | 66  
 2959698 | 66  
 4580225 | 39  
(10 rows)  
  
Time: 0.187 ms  

是不是飛快了呢,使用遊標,從使用者發出請求,到獲取資料,大概的延遲是1.7秒。.

求差與之類似,只是改一下SQL。

create or replace function get_except1()   
  returns setof refcursor as $$       
declare   
  ref refcursor[];    
  res refcursor;  
begin      
for i in 0..47 loop       
  ref[i] := `cur`||i;  
  res := ref[i];  
  open res for execute format(`select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s except select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s`, i, i);    
  return next res;  
end loop;      
  return;    
end;      
$$ language plpgsql strict;     
postgres=# begin;  
BEGIN  
Time: 0.169 ms  
postgres=# select * from get_except1();  
 get_except1   
-------------  
 cur0  
 cur1  
 ..........  
 cur44  
 cur45  
 cur46  
 cur47  
(48 rows)  
  
Time: 46.482 ms  
  
postgres=# fetch 10 from cur1;  
 id | c1   
----+----  
(0 rows)  
  
Time: 1681.922 ms (00:01.682)  

優化手段2 – PostgreSQL 11 求交、差效能

使用PostgreSQL 11,JOIN的手法來求交、差。語義相同。

1、求交

select * from tbl intersect select * from tbl;    

相當於

select t1.* from tbl t1 join tbl t2 on (t1.id=t2.id and t1.c1=t2.c1);  -- 所有參與求交的欄位都加到JOIN ON裡面    

2、求差

select * from tbl except select * from tbl;    

相當於

select * from tbl t1 where not exists     
  ( select 1 from      
      (select t1.id,t1.c1 from tbl t1 join tbl t2 on (t1.id=t2.id and t1.c1=t2.c1) ) t   -- 所有參與求交的欄位都加到JOIN ON裡面    
    where t.id=t1.id and t.c1=t1.c1    
  );    

PostgreSQL 11 求交、差效能如下

1、求交集,3.3秒。

postgres=# explain analyze select t1.* from tbl t1 join tbl t2 on (t1.id = t2.id and t1.c1 = t2.c1);    
                                                                QUERY PLAN                                                                     
-------------------------------------------------------------------------------------------------------------------------------------------    
 Gather  (cost=52060.48..101778.20 rows=100921 width=8) (actual time=407.118..2395.421 rows=10000000 loops=1)    
   Workers Planned: 32    
   Workers Launched: 32    
   ->  Parallel Hash Join  (cost=52060.48..101778.20 rows=3154 width=8) (actual time=378.294..691.692 rows=303030 loops=33)    
         Hash Cond: ((t1.id = t2.id) AND (t1.c1 = t2.c1))    
         ->  Parallel Seq Scan on tbl t1  (cost=0.00..47372.99 rows=312499 width=8) (actual time=0.014..41.780 rows=303030 loops=33)    
         ->  Parallel Hash  (cost=47372.99..47372.99 rows=312499 width=8) (actual time=374.931..374.931 rows=303030 loops=33)    
               Buckets: 16777216  Batches: 1  Memory Usage: 522848kB    
               ->  Parallel Seq Scan on tbl t2  (cost=0.00..47372.99 rows=312499 width=8) (actual time=0.022..48.013 rows=303030 loops=33)    
 Planning time: 0.137 ms    
 Execution time: 3316.010 ms    
(11 rows)    

2、求差集,1.9秒

postgres=# explain analyze select * from tbl t1 where not exists     
  ( select 1 from      
      (select t1.id,t1.c1 from tbl t1 join tbl t2 on (t1.id=t2.id and t1.c1=t2.c1) ) t   -- 所有參與求交的欄位都加到JOIN ON裡面    
    where t.id=t1.id and t.c1=t1.c1    
  );    
                                                                      QUERY PLAN                                                                           
-------------------------------------------------------------------------------------------------------------------------------------------------------    
 Gather  (cost=101825.51..153939.67 rows=9899056 width=8) (actual time=1557.867..1557.867 rows=0 loops=1)    
   Workers Planned: 32    
   Workers Launched: 32    
   ->  Parallel Hash Anti Join  (cost=101825.51..153939.67 rows=309346 width=8) (actual time=1495.529..1495.529 rows=0 loops=33)    
         Hash Cond: ((t1.id = t1_1.id) AND (t1.c1 = t1_1.c1))    
         ->  Parallel Seq Scan on tbl t1  (cost=0.00..47372.99 rows=312499 width=8) (actual time=0.013..44.749 rows=303030 loops=33)    
         ->  Parallel Hash  (cost=101778.20..101778.20 rows=3154 width=8) (actual time=1260.916..1260.916 rows=303030 loops=33)    
               Buckets: 16777216 (originally 131072)  Batches: 1 (originally 1)  Memory Usage: 652800kB    
               ->  Parallel Hash Join  (cost=52060.48..101778.20 rows=3154 width=8) (actual time=387.651..740.551 rows=303030 loops=33)    
                     Hash Cond: ((t1_1.id = t2.id) AND (t1_1.c1 = t2.c1))    
                     ->  Parallel Seq Scan on tbl t1_1  (cost=0.00..47372.99 rows=312499 width=8) (actual time=0.013..46.111 rows=303030 loops=33)    
                     ->  Parallel Hash  (cost=47372.99..47372.99 rows=312499 width=8) (actual time=384.666..384.666 rows=303030 loops=33)    
                           Buckets: 16777216  Batches: 1  Memory Usage: 522784kB    
                           ->  Parallel Seq Scan on tbl t2  (cost=0.00..47372.99 rows=312499 width=8) (actual time=0.024..47.326 rows=303030 loops=33)    
 Planning time: 0.251 ms    
 Execution time: 1939.745 ms    
(16 rows)    

小結

1000萬 與 1000萬 求交、差的效能指標:

方法 求交 求差
原生intersect, except 31.7秒 30秒
自定義切片+dblink非同步呼叫 1.7秒 1.7秒
PostgreSQL 11 並行hashjoin 3.3秒 1.9秒

通過改寫SQL,PostgreSQL 11可以利用平行計算,更好的支撐求資料交、差的效能。(但是需要注意,NULL值在except, intersect中會視為相同,而join時取等的話,是匹配不到的。這個特別需要注意。(所以語義上不完全一樣))

postgres=# select 1,null except select 1,null;    
 ?column? | ?column?     
----------+----------    
(0 rows)    
    
postgres=# select 1,null intersect select 1,null;    
 ?column? | ?column?     
----------+----------    
        1 |     
(1 row)    

如果要讓語義完全一樣,可以用這種寫法,但是就用不到hashjoin了。

即:  等號改成 is not distinct from    
    
select t1.* from tbl t1 join tbl t2 on ((t1.id is not distinct from t2.id) and (t1.c1 is not distinct from t2.c1));    

而使用dblink非同步的方式,需要注意:

因為我們使用dblink的方法進行HASH分片是掃全表得到的,所以開的併發越多,掃描次數越多。最好是掃一次,並均分到N個臨時空間,然後再從臨時空間中掃,這樣就只需要掃一遍。當然會增加複雜度,如果表不大,實際上多掃幾次也無所謂。

參考

《驚天效能!單RDS PostgreSQL例項 支撐 2000億 – 實時標籤透視案例》

《PostgreSQL 11 preview – parallel hash join(並行雜湊JOIN) 效能極大提升》

https://www.postgresql.org/docs/10/static/dblink.html

《阿里雲RDS PostgreSQL OSS 外部表 – (dblink非同步呼叫封裝)並行寫提速案例》

《金融風控、公安刑偵、社會關係、人脈分析等需求分析與資料庫實現 – PostgreSQL圖資料庫場景應用》


相關文章