PostgreSQL dblink asynchronous calls for parallel hash-sharded JOIN – a case study on speeding up data intersection, union, and difference
Tags
PostgreSQL , parallel hash join , dblink , asynchronous calls , hash
Background
Computing the intersection, union, and difference of data sets is a common requirement in analytical scenarios, for example when filtering target users or shops.
In PostgreSQL, intersection, union, and difference can be expressed directly in SQL (union, union all, except, intersect). Only union all keeps duplicates; the other three de-duplicate.
Examples
select generate_series(1,10) except select generate_series(5,12);
select generate_series(1,10) union select generate_series(5,12);
select generate_series(1,10) union all select generate_series(5,12);
select generate_series(1,10) intersect select generate_series(5,12);
When the data volume is very large, computing intersections or differences this way may not perform well enough.
So what are the better options?
1. Hash-shard the data ourselves, then use dblink's asynchronous interface to run the per-shard intersect/except operations in parallel, one connection per shard.
2. PostgreSQL 11 already supports parallel hash join, which addresses the performance problem of computing intersections and differences over large data sets.
《PostgreSQL 11 preview – parallel hash join(並行雜湊JOIN) 效能極大提升》
Performance of native intersect / except
Test schema and data
postgres=# create table tbl(id int, c1 int);
CREATE TABLE
postgres=# insert into tbl select generate_series(1,10000000), random()*99;
INSERT 0 10000000
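The query plans in this article show 8 parallel workers for the native tests and 32 for the PostgreSQL 11 tests. The exact configuration is not given in the original; below is a minimal sketch of session-level settings that would permit such plans (all values are assumptions, not taken from the source):
set max_parallel_workers = 32;                -- system-wide cap on parallel workers
set max_parallel_workers_per_gather = 32;     -- workers allowed under one Gather node
set min_parallel_table_scan_size = 0;         -- allow parallel scans even on modest tables
set parallel_setup_cost = 0;                  -- bias the planner toward parallel plans
set parallel_tuple_cost = 0;
set work_mem = '1GB';                         -- parallel hash builds its shared hash table in memory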
1. 10 million intersect 10 million
Elapsed time: 31.7 seconds.
postgres=# explain analyze select * from tbl intersect select * from tbl;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
HashSetOp Intersect (cost=0.00..413495.25 rows=9999977 width=12) (actual time=27730.818..30412.898 rows=10000000 loops=1)
-> Append (cost=0.00..313495.48 rows=19999954 width=12) (actual time=0.402..18889.746 rows=20000000 loops=1)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..156747.74 rows=9999977 width=12) (actual time=0.401..7744.308 rows=10000000 loops=1)
-> Gather (cost=0.00..56747.97 rows=9999977 width=8) (actual time=0.397..5947.082 rows=10000000 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Parallel Seq Scan on tbl (cost=0.00..56747.97 rows=1249997 width=8) (actual time=0.015..248.653 rows=1111111 loops=9)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..156747.74 rows=9999977 width=12) (actual time=0.329..8366.856 rows=10000000 loops=1)
-> Gather (cost=0.00..56747.97 rows=9999977 width=8) (actual time=0.326..6567.651 rows=10000000 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Parallel Seq Scan on tbl tbl_1 (cost=0.00..56747.97 rows=1249997 width=8) (actual time=0.013..195.661 rows=1111111 loops=9)
Planning time: 0.098 ms
Execution time: 31691.115 ms
(14 rows)
2. 10 million except 10 million
Elapsed time: about 30 seconds.
postgres=# explain analyze select * from tbl except select * from tbl;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
HashSetOp Except (cost=0.00..413495.25 rows=9999977 width=12) (actual time=30021.111..30021.111 rows=0 loops=1)
-> Append (cost=0.00..313495.48 rows=19999954 width=12) (actual time=0.415..20454.584 rows=20000000 loops=1)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..156747.74 rows=9999977 width=12) (actual time=0.414..8500.176 rows=10000000 loops=1)
-> Gather (cost=0.00..56747.97 rows=9999977 width=8) (actual time=0.409..6696.932 rows=10000000 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Parallel Seq Scan on tbl (cost=0.00..56747.97 rows=1249997 width=8) (actual time=0.019..233.999 rows=1111111 loops=9)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..156747.74 rows=9999977 width=12) (actual time=0.341..9162.134 rows=10000000 loops=1)
-> Gather (cost=0.00..56747.97 rows=9999977 width=8) (actual time=0.337..7358.837 rows=10000000 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Parallel Seq Scan on tbl tbl_1 (cost=0.00..56747.97 rows=1249997 width=8) (actual time=0.015..196.848 rows=1111111 loops=9)
Planning time: 0.080 ms
Execution time: 30358.560 ms
(14 rows)
Optimization 1 – hash sharding + dblink asynchronous calls for intersect / except
For examples of dblink asynchronous calls, see
《驚天效能!單RDS PostgreSQL例項 支撐 2000億 – 實時標籤透視案例》
《阿里雲RDS PostgreSQL OSS 外部表 – (dblink非同步呼叫封裝)並行寫提速案例》
This method is purely an advanced trick of mine; in most cases I recommend simply using PostgreSQL 11.
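The dblink extension's asynchronous interface is what makes the "one connection per shard" trick possible: open a named connection, send a query on it without blocking, and collect the result later. A minimal sketch of that pattern (the connection string is an illustrative assumption):
create extension if not exists dblink;
select dblink_connect('c1', 'dbname=postgres');
select dblink_send_query('c1', 'select count(*) from tbl');   -- returns immediately, query runs remotely
-- do other work here, e.g. send queries on other named connections
select * from dblink_get_result('c1') as t(cnt bigint);       -- blocks until the remote result is ready
select dblink_disconnect('c1');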
Any one (or several) of the columns involved in the intersect/except can serve as the hash-sharding key. PostgreSQL ships with many built-in functions that hash a value of a given type:
postgres=# \df *.hash*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+-----------------------+------------------+--------------------------+------
pg_catalog | hash_aclitem | integer | aclitem | func
pg_catalog | hash_aclitem_extended | bigint | aclitem, bigint | func
pg_catalog | hash_array | integer | anyarray | func
pg_catalog | hash_array_extended | bigint | anyarray, bigint | func
pg_catalog | hash_numeric | integer | numeric | func
pg_catalog | hash_numeric_extended | bigint | numeric, bigint | func
pg_catalog | hash_range | integer | anyrange | func
pg_catalog | hash_range_extended | bigint | anyrange, bigint | func
pg_catalog | hashbpchar | integer | character | func
pg_catalog | hashbpcharextended | bigint | character, bigint | func
pg_catalog | hashchar | integer | "char" | func
pg_catalog | hashcharextended | bigint | "char", bigint | func
pg_catalog | hashenum | integer | anyenum | func
pg_catalog | hashenumextended | bigint | anyenum, bigint | func
pg_catalog | hashfloat4 | integer | real | func
pg_catalog | hashfloat4extended | bigint | real, bigint | func
pg_catalog | hashfloat8 | integer | double precision | func
pg_catalog | hashfloat8extended | bigint | double precision, bigint | func
pg_catalog | hashhandler | index_am_handler | internal | func
pg_catalog | hashinet | integer | inet | func
pg_catalog | hashinetextended | bigint | inet, bigint | func
pg_catalog | hashint2 | integer | smallint | func
pg_catalog | hashint2extended | bigint | smallint, bigint | func
pg_catalog | hashint4 | integer | integer | func
pg_catalog | hashint4extended | bigint | integer, bigint | func
pg_catalog | hashint8 | integer | bigint | func
pg_catalog | hashint8extended | bigint | bigint, bigint | func
pg_catalog | hashmacaddr | integer | macaddr | func
pg_catalog | hashmacaddr8 | integer | macaddr8 | func
pg_catalog | hashmacaddr8extended | bigint | macaddr8, bigint | func
pg_catalog | hashmacaddrextended | bigint | macaddr, bigint | func
pg_catalog | hashname | integer | name | func
pg_catalog | hashnameextended | bigint | name, bigint | func
pg_catalog | hashoid | integer | oid | func
pg_catalog | hashoidextended | bigint | oid, bigint | func
pg_catalog | hashoidvector | integer | oidvector | func
pg_catalog | hashoidvectorextended | bigint | oidvector, bigint | func
pg_catalog | hashtext | integer | text | func
pg_catalog | hashtextextended | bigint | text, bigint | func
pg_catalog | hashvarlena | integer | internal | func
pg_catalog | hashvarlenaextended | bigint | internal, bigint | func
(41 rows)
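For example, the bucket number of a row can be derived from the id column like this (hashint4 can return negative values, hence the abs() before mod()); this is exactly the expression used in the sharded queries below:
select id, mod(abs(hashint4(id)), 48) as bucket from tbl limit 3;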
First, let's see how long intersect takes on a single small shard:
Without parallelism, splitting the table into 48 shards, the intersect of one shard takes about 1.9 seconds.
postgres=# explain analyze select t1.* from tbl t1 where mod(abs(hashint4(id)), 48)=0 intersect select t1.* from tbl t1 where mod(abs(hashint4(id)), 48)=0;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
HashSetOp Intersect (cost=0.00..489995.08 rows=50000 width=12) (actual time=1822.887..1867.381 rows=208902 loops=1)
-> Append (cost=0.00..489495.08 rows=100000 width=12) (actual time=0.021..1679.633 rows=417804 loops=1)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..244747.54 rows=50000 width=12) (actual time=0.020..811.669 rows=208902 loops=1)
-> Seq Scan on tbl t1 (cost=0.00..244247.54 rows=50000 width=8) (actual time=0.019..774.864 rows=208902 loops=1)
Filter: (mod(abs(hashint4(id)), 48) = 0)
Rows Removed by Filter: 9791098
-> Subquery Scan on "*SELECT* 2" (cost=0.00..244747.54 rows=50000 width=12) (actual time=0.027..807.215 rows=208902 loops=1)
-> Seq Scan on tbl t1_1 (cost=0.00..244247.54 rows=50000 width=8) (actual time=0.026..770.958 rows=208902 loops=1)
Filter: (mod(abs(hashint4(id)), 48) = 0)
Rows Removed by Filter: 9791098
Planning time: 0.116 ms
Execution time: 1887.638 ms
(12 rows)
In other words, with 48 shards running in parallel, the best achievable time is about 1.9 seconds.
Note
Because the hash shards here are obtained by scanning the whole table, the more concurrent shards you open, the more full scans are performed. Ideally you would scan the table once, split the rows into N temporary staging areas, and have each worker scan only its own staging area, so the table is read just once. That adds complexity, of course; if the table is not large, a few extra scans do not really matter. A sketch of the scan-once idea follows.
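A minimal sketch of that scan-once approach, using a hypothetical staging table tbl_buckets (not part of the original example):
create unlogged table tbl_buckets (bucket int, id int, c1 int);
insert into tbl_buckets select mod(abs(hashint4(id)), 48), id, c1 from tbl;  -- one full scan of tbl
create index on tbl_buckets (bucket);
-- each of the 48 workers then reads only its own bucket, e.g.
-- select id, c1 from tbl_buckets where bucket = 0;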
Hash parallel sharding + asynchronous dblink
1. Create a function that opens a dblink connection; calling it again for an existing connection does not raise an error.
create or replace function conn(
name, -- dblink connection name
text -- connection string / URL
) returns void as $$
declare
begin
perform dblink_connect($1, $2);
return;
exception when others then
return;
end;
$$ language plpgsql strict;
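Assumed usage (the connection string is an illustrative assumption; it should point at the database that holds tbl):
select conn('link0', 'dbname=postgres user=postgres');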
2. Create a function that runs the intersect in parallel across the shards.
create or replace function get_intersect(
conn text, -- connection string
OUT id int,
OUT c1 int
) returns setof record as $$
declare
begin
for i in 0..47 loop
-- (re)open a named connection for this shard
perform conn('link'||i, conn);
-- consume any leftover result on this connection
perform 1 from dblink_get_result('link'||i) as t(id int, c1 int);
-- send the per-shard intersect without waiting for it to finish
perform dblink_send_query('link'||i, format('select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s intersect select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s', i, i));
end loop;
-- now collect the results from all 48 connections
for i in 0..47 loop
return query SELECT * FROM dblink_get_result('link'||i) as t(id int, c1 int);
end loop;
end;
$$ language plpgsql strict;
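Assumed usage, again with an illustrative connection string; the function dispatches the 48 per-shard intersects asynchronously and then streams all results back:
select count(*) from get_intersect('dbname=postgres user=postgres');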
With this method the per-shard queries finish in about 3 seconds, but a lot of time is spent shipping the 10 million result rows from all the remote connections back to the caller, for a total of roughly 8 seconds.
If we return cursors instead, the response becomes extremely fast. This is the same cursor-based streaming technique used for graph queries in:
《金融風控、公安刑偵、社會關係、人脈分析等需求分析與資料庫實現 – PostgreSQL圖資料庫場景應用》
create or replace function get_intersect1()
returns setof refcursor as $$
declare
ref refcursor[];
res refcursor;
begin
for i in 0..47 loop
ref[i] := 'cur'||i;
res := ref[i];
open res for execute format('select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s intersect select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s', i, i);
return next res;
end loop;
return;
end;
$$ language plpgsql strict;
Usage
postgres=# begin;
BEGIN
postgres=# select * from get_intersect1();
get_intersect1
----------------
cur0
cur1
cur2
cur3
cur4
cur5
cur6
cur7
cur8
cur9
cur10
cur11
cur12
cur13
cur14
cur15
cur16
cur17
cur18
cur19
cur20
cur21
cur22
cur23
cur24
cur25
cur26
cur27
cur28
cur29
cur30
cur31
cur32
cur33
cur34
cur35
cur36
cur37
cur38
cur39
cur40
cur41
cur42
cur43
cur44
cur45
cur46
cur47
(48 rows)
Time: 46.471 ms
-- the first page is relatively slow
postgres=# fetch 10 from cur1;
id | c1
---------+----
3591658 | 70
6100015 | 17
3222328 | 90
5500150 | 23
9087335 | 45
2463228 | 86
870261 | 51
9276428 | 85
7672240 | 32
6828314 | 41
(10 rows)
Time: 1645.906 ms (00:01.646)
-- subsequent fetches are blazingly fast
postgres=# fetch 10 from cur1;
id | c1
---------+----
7335851 | 5
8007430 | 10
6230301 | 27
9111491 | 91
1400805 | 65
3651088 | 33
3292697 | 65
1431682 | 66
2959698 | 66
4580225 | 39
(10 rows)
Time: 0.187 ms
Pretty fast, isn't it? With cursors, the latency from issuing the request to receiving the first rows is about 1.7 seconds.
Computing the difference works the same way; only the SQL changes.
create or replace function get_except1()
returns setof refcursor as $$
declare
ref refcursor[];
res refcursor;
begin
for i in 0..47 loop
ref[i] := 'cur'||i;
res := ref[i];
open res for execute format('select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s except select * from tbl t1 where mod(abs(hashint4(id)), 48)=%s', i, i);
return next res;
end loop;
return;
end;
$$ language plpgsql strict;
postgres=# begin;
BEGIN
Time: 0.169 ms
postgres=# select * from get_except1();
get_except1
-------------
cur0
cur1
..........
cur44
cur45
cur46
cur47
(48 rows)
Time: 46.482 ms
postgres=# fetch 10 from cur1;
id | c1
----+----
(0 rows)
Time: 1681.922 ms (00:01.682)
Optimization 2 – intersect / except performance in PostgreSQL 11
In PostgreSQL 11 we can compute intersections and differences by rewriting them as JOINs with equivalent semantics (see the NULL caveat in the summary).
1. Intersection
select * from tbl intersect select * from tbl;
is equivalent to
select t1.* from tbl t1 join tbl t2 on (t1.id=t2.id and t1.c1=t2.c1); -- put every column involved in the operation into the JOIN ON clause
2. Difference
select * from tbl except select * from tbl;
is equivalent to
select * from tbl t1 where not exists
( select 1 from
(select t1.id,t1.c1 from tbl t1 join tbl t2 on (t1.id=t2.id and t1.c1=t2.c1) ) t -- put every column involved in the operation into the JOIN ON clause
where t.id=t1.id and t.c1=t1.c1
);
PostgreSQL 11 intersect / except performance:
1. Intersection: 3.3 seconds.
postgres=# explain analyze select t1.* from tbl t1 join tbl t2 on (t1.id = t2.id and t1.c1 = t2.c1);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=52060.48..101778.20 rows=100921 width=8) (actual time=407.118..2395.421 rows=10000000 loops=1)
Workers Planned: 32
Workers Launched: 32
-> Parallel Hash Join (cost=52060.48..101778.20 rows=3154 width=8) (actual time=378.294..691.692 rows=303030 loops=33)
Hash Cond: ((t1.id = t2.id) AND (t1.c1 = t2.c1))
-> Parallel Seq Scan on tbl t1 (cost=0.00..47372.99 rows=312499 width=8) (actual time=0.014..41.780 rows=303030 loops=33)
-> Parallel Hash (cost=47372.99..47372.99 rows=312499 width=8) (actual time=374.931..374.931 rows=303030 loops=33)
Buckets: 16777216 Batches: 1 Memory Usage: 522848kB
-> Parallel Seq Scan on tbl t2 (cost=0.00..47372.99 rows=312499 width=8) (actual time=0.022..48.013 rows=303030 loops=33)
Planning time: 0.137 ms
Execution time: 3316.010 ms
(11 rows)
2. Difference: 1.9 seconds.
postgres=# explain analyze select * from tbl t1 where not exists
( select 1 from
(select t1.id,t1.c1 from tbl t1 join tbl t2 on (t1.id=t2.id and t1.c1=t2.c1) ) t -- put every column involved in the operation into the JOIN ON clause
where t.id=t1.id and t.c1=t1.c1
);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=101825.51..153939.67 rows=9899056 width=8) (actual time=1557.867..1557.867 rows=0 loops=1)
Workers Planned: 32
Workers Launched: 32
-> Parallel Hash Anti Join (cost=101825.51..153939.67 rows=309346 width=8) (actual time=1495.529..1495.529 rows=0 loops=33)
Hash Cond: ((t1.id = t1_1.id) AND (t1.c1 = t1_1.c1))
-> Parallel Seq Scan on tbl t1 (cost=0.00..47372.99 rows=312499 width=8) (actual time=0.013..44.749 rows=303030 loops=33)
-> Parallel Hash (cost=101778.20..101778.20 rows=3154 width=8) (actual time=1260.916..1260.916 rows=303030 loops=33)
Buckets: 16777216 (originally 131072) Batches: 1 (originally 1) Memory Usage: 652800kB
-> Parallel Hash Join (cost=52060.48..101778.20 rows=3154 width=8) (actual time=387.651..740.551 rows=303030 loops=33)
Hash Cond: ((t1_1.id = t2.id) AND (t1_1.c1 = t2.c1))
-> Parallel Seq Scan on tbl t1_1 (cost=0.00..47372.99 rows=312499 width=8) (actual time=0.013..46.111 rows=303030 loops=33)
-> Parallel Hash (cost=47372.99..47372.99 rows=312499 width=8) (actual time=384.666..384.666 rows=303030 loops=33)
Buckets: 16777216 Batches: 1 Memory Usage: 522784kB
-> Parallel Seq Scan on tbl t2 (cost=0.00..47372.99 rows=312499 width=8) (actual time=0.024..47.326 rows=303030 loops=33)
Planning time: 0.251 ms
Execution time: 1939.745 ms
(16 rows)
Summary
Performance of 10 million vs 10 million intersect / except:
Method | Intersect | Except |
---|---|---|
Native intersect, except | 31.7 s | 30 s |
Custom hash sharding + dblink async calls (cursor mode, latency to first rows) | 1.7 s | 1.7 s |
PostgreSQL 11 parallel hash join | 3.3 s | 1.9 s |
By rewriting the SQL, PostgreSQL 11 can use parallel computation to support data intersection and difference far better. Note, however, that except and intersect treat NULL values as equal, whereas an equality condition in a JOIN never matches NULLs, so the two forms are not strictly equivalent semantically.
postgres=# select 1,null except select 1,null;
?column? | ?column?
----------+----------
(0 rows)
postgres=# select 1,null intersect select 1,null;
?column? | ?column?
----------+----------
1 |
(1 row)
To make the semantics exactly the same, you can use the following form, but then a hash join can no longer be used.
That is, replace the equality operator with is not distinct from:
select t1.* from tbl t1 join tbl t2 on ((t1.id is not distinct from t2.id) and (t1.c1 is not distinct from t2.c1));
For the dblink asynchronous approach, keep in mind:
Because the hash shards are obtained by scanning the whole table, the more concurrent shards you open, the more full scans are performed. Ideally, scan the table once, split the rows into N temporary staging areas, and have each worker scan only its own staging area, so the table is read just once. That adds complexity; if the table is not large, a few extra scans do not really matter.
References
《驚天效能!單RDS PostgreSQL例項 支撐 2000億 – 實時標籤透視案例》
《PostgreSQL 11 preview – parallel hash join(並行雜湊JOIN) 效能極大提升》
https://www.postgresql.org/docs/10/static/dblink.html
《阿里雲RDS PostgreSQL OSS 外部表 – (dblink非同步呼叫封裝)並行寫提速案例》
《金融風控、公安刑偵、社會關係、人脈分析等需求分析與資料庫實現 – PostgreSQL圖資料庫場景應用》