PostgreSQLOracle相容性之-PartitionByOuterJoin實現稠化報表
標籤
PostgreSQL , Oracle , 稠化報表 , partition by outer join
背景
背景介紹:借用Oracle的一篇例子:
http://blog.sina.com.cn/s/blog_4cef5c7b01016lm5.html
在資料庫表中,儲存的資料經常是稀疏資料(sparse data),而不是稠密資料(dense data)。
先來了解一下什麼是稀疏資料,比如一個產品銷售情況表(比如有產品名、銷售時間(精確到年月)、銷售量3個列),假設某個時間某些產品它沒有銷售,一般也不會將這些產品的銷售量儲存為0,而是不儲存,這樣在產品銷售情況表中就會產生很多缺失的行(gap rows),導致的結果就是特定產品銷售資料按時間維度進行排序,是不連續的,或者說此產品銷售在時間序列上是有缺失的。顧名思義,稠密資料是相對於稀疏資料來說的,還是用上面的假設說明,也就是說產品在某個時間沒有銷售,也必須儲存此產品銷售情況,銷售量置0儲存,這樣對某個特定產品來說它在時間序列就是連續的,但是事實經常不是如此,所以才有將稀疏資料稠密化的過程,資料稠密化在資料倉儲應用中很常見。
例如以下報表,由於某些商品在某些月份沒有銷售,所以不會有記錄。
postgres=# create table t_sales(year text, month text, prod name, sales int);
CREATE TABLE
postgres=# insert into t_sales values (`2008`, `1`, `a`, 1000);
INSERT 0 1
postgres=# insert into t_sales values (`2008`, `1`, `b`, 1500);
INSERT 0 1
postgres=# insert into t_sales values (`2008`, `2`, `a`, 2000);
INSERT 0 1
postgres=# insert into t_sales values (`2008`, `2`, `b`, 3000);
INSERT 0 1
postgres=# insert into t_sales values (`2008`, `2`, `c`, 1000);
INSERT 0 1
postgres=# insert into t_sales values (`2008`, `3`, `a`, 3000);
INSERT 0 1
postgres=# select * from t_sales;
year | month | prod | sales
------+-------+------+-------
2008 | 1 | a | 1000
2008 | 1 | b | 1500
2008 | 2 | a | 2000
2008 | 2 | b | 3000
2008 | 2 | c | 1000
2008 | 3 | a | 3000
(6 rows)
Oracle 通過partition by outer join語法可以方便的實現稠化。
SELECT .....
FROM table_reference
PARTITION BY (expr [, expr ]... )
RIGHT OUTER JOIN table_reference
SELECT .....
FROM table_reference
LEFT OUTER JOIN table_reference
達到類似這樣的效果
postgres=# select * from continue_out();
year | month | prod | sales
------+-------+------+-------
2008 | 2 | a | 2000
2008 | 2 | b | 3000
2008 | 2 | c | 1000
2008 | 3 | a | 3000
2008 | 3 | b | 0
2008 | 3 | c | 0
2008 | 1 | a | 1000
2008 | 1 | b | 1500
2008 | 1 | c | 0
(9 rows)
PostgreSQL Partition By Outer Join實現
通過UDF,可以達到類似的效果:
create or replace function continue_out () returns setof t_sales as $$
declare
v_year text;
v_month text;
begin
-- 按年、月為分割槽 補齊prod (select distinct year, month from t_sales)
-- 如果已有元表,則直接取元表
-- 如無元表,對於連續型資料,亦可以使用PG的generate_series直接生成。
for v_year, v_month in
select distinct year, month from t_sales
loop
-- 全量 prod (select distinct prod from t_sales)
-- 如已有元表,則直接取元表
-- 如無元表,對於連續型資料,亦可以使用PG的generate_series直接生成。
return query
select
coalesce(t1.year,v_year),
coalesce(t1.month,v_month),
t2.prod,
coalesce(t1.sales, 0)
from
t_sales t1
right outer join
(select distinct prod from t_sales) t2
on (t1.prod=t2.prod and t1.year=v_year and t1.month=v_month);
end loop;
end;
$$ language plpgsql strict;
如下
postgres=# select * from continue_out();
year | month | prod | sales
------+-------+------+-------
2008 | 2 | a | 2000
2008 | 2 | b | 3000
2008 | 2 | c | 1000
2008 | 3 | a | 3000
2008 | 3 | b | 0
2008 | 3 | c | 0
2008 | 1 | a | 1000
2008 | 1 | b | 1500
2008 | 1 | c | 0
(9 rows)
通過視窗,可以實現同比,從而實現快速的搜尋到銷量逆增長的商品。
postgres=# select *, sales - coalesce(lag(sales) over w1, sales) from continue_out() window w1 as (partition by prod order by year,month);
year | month | prod | sales | ?column?
------+-------+------+-------+----------
2008 | 1 | a | 1000 | 0
2008 | 2 | a | 2000 | 1000
2008 | 3 | a | 3000 | 1000
2008 | 1 | b | 1500 | 0
2008 | 2 | b | 3000 | 1500
2008 | 3 | b | 0 | -3000
2008 | 1 | c | 0 | 0
2008 | 2 | c | 1000 | 1000
2008 | 3 | c | 0 | -1000
(9 rows)
詳細執行計劃如下:
postgres=# load `auto_explain`;
LOAD
Time: 22.171 ms
postgres=# set auto_explain.
auto_explain.log_analyze auto_explain.log_format auto_explain.log_nested_statements auto_explain.log_triggers auto_explain.sample_rate
auto_explain.log_buffers auto_explain.log_min_duration auto_explain.log_timing auto_explain.log_verbose
postgres=# set auto_explain.log_analyze =on;
SET
Time: 0.272 ms
postgres=# set auto_explain.log_buffers =on;
SET
Time: 0.257 ms
postgres=# set auto_explain.log_min_duration =0;
SET
Time: 0.279 ms
postgres=# set auto_explain.log_nested_statements =on;
SET
Time: 0.262 ms
postgres=# set auto_explain.log_timing =on;
SET
Time: 0.269 ms
postgres=# set auto_explain.log_verbose =on;
SET
Time: 0.245 ms
postgres=# set client_min_messages =`log`;
SET
Time: 0.265 ms
postgres=# select *, sales - coalesce(lag(sales) over w1, sales) from continue_out() window w1 as (partition by prod order by year,month);
LOG: duration: 0.030 ms plan:
Query Text: select coalesce(t1.year,v_year), coalesce(t1.month,v_month), t2.prod, coalesce(t1.sales, 0) from t_sales t1
right outer join
(select distinct prod from t_sales) t2
on (t1.prod=t2.prod and t1.year=v_year and t1.month=v_month)
Hash Left Join (cost=34.04..38.80 rows=200 width=132) (actual time=0.024..0.027 rows=3 loops=1)
Output: COALESCE(t1.year, $2), COALESCE(t1.month, $3), t_sales.prod, COALESCE(t1.sales, 0)
Hash Cond: (t_sales.prod = t1.prod)
Buffers: shared hit=2
-> HashAggregate (cost=16.38..18.38 rows=200 width=64) (actual time=0.009..0.009 rows=3 loops=1)
Output: t_sales.prod
Group Key: t_sales.prod
Buffers: shared hit=1
-> Seq Scan on public.t_sales (cost=0.00..15.10 rows=510 width=64) (actual time=0.002..0.003 rows=6 loops=1)
Output: t_sales.year, t_sales.month, t_sales.prod, t_sales.sales
Buffers: shared hit=1
-> Hash (cost=17.65..17.65 rows=1 width=132) (actual time=0.007..0.007 rows=3 loops=1)
Output: t1.year, t1.month, t1.sales, t1.prod
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on public.t_sales t1 (cost=0.00..17.65 rows=1 width=132) (actual time=0.003..0.004 rows=3 loops=1)
Output: t1.year, t1.month, t1.sales, t1.prod
Filter: ((t1.year = $2) AND (t1.month = $3))
Rows Removed by Filter: 3
Buffers: shared hit=1
LOG: duration: 0.020 ms plan:
Query Text: select coalesce(t1.year,v_year), coalesce(t1.month,v_month), t2.prod, coalesce(t1.sales, 0) from t_sales t1
right outer join
(select distinct prod from t_sales) t2
on (t1.prod=t2.prod and t1.year=v_year and t1.month=v_month)
Hash Left Join (cost=34.04..38.80 rows=200 width=132) (actual time=0.016..0.018 rows=3 loops=1)
Output: COALESCE(t1.year, $2), COALESCE(t1.month, $3), t_sales.prod, COALESCE(t1.sales, 0)
Hash Cond: (t_sales.prod = t1.prod)
Buffers: shared hit=2
-> HashAggregate (cost=16.38..18.38 rows=200 width=64) (actual time=0.008..0.009 rows=3 loops=1)
Output: t_sales.prod
Group Key: t_sales.prod
Buffers: shared hit=1
-> Seq Scan on public.t_sales (cost=0.00..15.10 rows=510 width=64) (actual time=0.003..0.004 rows=6 loops=1)
Output: t_sales.year, t_sales.month, t_sales.prod, t_sales.sales
Buffers: shared hit=1
-> Hash (cost=17.65..17.65 rows=1 width=132) (actual time=0.004..0.004 rows=1 loops=1)
Output: t1.year, t1.month, t1.sales, t1.prod
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on public.t_sales t1 (cost=0.00..17.65 rows=1 width=132) (actual time=0.002..0.003 rows=1 loops=1)
Output: t1.year, t1.month, t1.sales, t1.prod
Filter: ((t1.year = $2) AND (t1.month = $3))
Rows Removed by Filter: 5
Buffers: shared hit=1
LOG: duration: 0.019 ms plan:
Query Text: select coalesce(t1.year,v_year), coalesce(t1.month,v_month), t2.prod, coalesce(t1.sales, 0) from t_sales t1
right outer join
(select distinct prod from t_sales) t2
on (t1.prod=t2.prod and t1.year=v_year and t1.month=v_month)
Hash Left Join (cost=34.04..38.80 rows=200 width=132) (actual time=0.014..0.017 rows=3 loops=1)
Output: COALESCE(t1.year, $2), COALESCE(t1.month, $3), t_sales.prod, COALESCE(t1.sales, 0)
Hash Cond: (t_sales.prod = t1.prod)
Buffers: shared hit=2
-> HashAggregate (cost=16.38..18.38 rows=200 width=64) (actual time=0.007..0.008 rows=3 loops=1)
Output: t_sales.prod
Group Key: t_sales.prod
Buffers: shared hit=1
-> Seq Scan on public.t_sales (cost=0.00..15.10 rows=510 width=64) (actual time=0.002..0.003 rows=6 loops=1)
Output: t_sales.year, t_sales.month, t_sales.prod, t_sales.sales
Buffers: shared hit=1
-> Hash (cost=17.65..17.65 rows=1 width=132) (actual time=0.004..0.004 rows=2 loops=1)
Output: t1.year, t1.month, t1.sales, t1.prod
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on public.t_sales t1 (cost=0.00..17.65 rows=1 width=132) (actual time=0.001..0.003 rows=2 loops=1)
Output: t1.year, t1.month, t1.sales, t1.prod
Filter: ((t1.year = $2) AND (t1.month = $3))
Rows Removed by Filter: 4
Buffers: shared hit=1
LOG: duration: 0.045 ms plan:
Query Text: select distinct year, month from t_sales
HashAggregate (cost=17.65..19.65 rows=200 width=64) (actual time=0.040..0.041 rows=3 loops=1)
Output: year, month
Group Key: t_sales.year, t_sales.month
Buffers: shared hit=1
-> Seq Scan on public.t_sales (cost=0.00..15.10 rows=510 width=64) (actual time=0.028..0.029 rows=6 loops=1)
Output: year, month, prod, sales
Buffers: shared hit=1
LOG: duration: 0.832 ms plan:
Query Text: select *, sales - coalesce(lag(sales) over w1, sales) from continue_out() window w1 as (partition by prod order by year,month);
WindowAgg (cost=60.08..85.08 rows=1000 width=136) (actual time=0.808..0.818 rows=9 loops=1)
Output: year, month, prod, sales, (sales - COALESCE(lag(sales) OVER (?), sales))
Buffers: shared hit=9
-> Sort (cost=60.08..62.58 rows=1000 width=132) (actual time=0.803..0.804 rows=9 loops=1)
Output: year, month, prod, sales
Sort Key: continue_out.prod, continue_out.year, continue_out.month
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=9
-> Function Scan on public.continue_out (cost=0.25..10.25 rows=1000 width=132) (actual time=0.782..0.783 rows=9 loops=1)
Output: year, month, prod, sales
Function Call: continue_out()
Buffers: shared hit=9
year | month | prod | sales | ?column?
------+-------+------+-------+----------
2008 | 1 | a | 1000 | 0
2008 | 2 | a | 2000 | 1000
2008 | 3 | a | 3000 | 1000
2008 | 1 | b | 1500 | 0
2008 | 2 | b | 3000 | 1500
2008 | 3 | b | 0 | -3000
2008 | 1 | c | 0 | 0
2008 | 2 | c | 1000 | 1000
2008 | 3 | c | 0 | -1000
(9 rows)
Time: 1.362 ms
PostgreSQL的UDF非常強大,可以實現很多功能。
當然也期待PostgreSQL直接將partition by語法引入,那就更加方便了。
參考
https://www.postgresql.org/docs/10/static/functions-window.html
https://www.postgresql.org/docs/10/static/plpgsql.html
http://blog.sina.com.cn/s/blog_4cef5c7b01016lm5.html
相關文章
- 使用Partitioned Outer Join實現稠化報表
- PostgreSQLOracle相容性之NUMTODSINTERVALSQLOracle
- PostgreSQLOracle相容性之-全域性臨時表globaltemptableSQLOracle
- PostgreSQLOracle相容性之-roundintervalSQLOracle
- PostgreSQLOracle相容性-Analysis函式之keepSQLOracle函式
- PostgreSQLOracle相容性-synonym匿名SQLOracle
- PostgreSQLOracle相容性-connectby2SQLOracle
- PostgreSQLOracle相容性-substrb-orafcesubstrb適配OraclesubstrbSQLOracle
- PostgreSQLOracle相容性之-系統列(ctid,oid,cmin,cmax,xmin,xmax)SQLOracle
- PostgreSQLOracle相容性之-PL/SQLDETERMINISTIC與PG函式穩定性(immutable,stable,volatile)SQLOracle函式
- Python 實現自動化 Excel 報表PythonExcel
- 分欄報表-物品清單報表實現
- 輕鬆實現報表整合
- 報表展現時如何實現固定表頭效果
- 如何實現報表視覺化,有沒有工具推薦視覺化
- PostgreSQLoracle相容性-字串內嵌NULL字元(空字元)chr(0)轉換為chr(32)SQLOracle字串Null字元
- PostgreSQLOracle相容性之-connectby高階選項CONNECT_BY_ISLEAF、SYS_CONNECT_BY_PATH、CONNECT_BY_ISCYCLE、LEVELSQLOracle
- 如何實現報表直接列印需求
- Ruby Ruport實踐—報表引數實現
- 如何透過前端表格控制元件實現自動化報表?前端控制元件
- 如何實現報表的批次列印需求
- 報表如何實現行列互換效果?
- 報表合計需求的實現方法
- python實現報表用什麼?Python
- 如何實現報表的點選表頭排序需求排序
- Ruby Ruport實踐—報表引數實現(二)
- 如何橋接優化Java方法返回型別實現相容性? - Gunnar橋接優化Java型別
- Ruby Ruport實踐—中文PDF報表之PRAWN
- Ruby Ruport實踐—中文PDF報表之FPDF
- 填報表實現隔行異色的效果
- 怎樣實現填報表定時提交
- 實現報表資料分庫儲存
- 實現報表資料預先計算
- JS原生實現表單序列化JS
- 自動化報警的實現思路
- 複雜報表設計之動態報表
- 怎樣實現動態列報表,也就是列數不固定的報表?
- 不用第三方之C#實現大容量報表系統(轉)C#