數倉sql場景:迭代求結果問題

技术即艺术發表於2024-08-02

1.需求

2.sql實現

這道題先需要去分析結果集,本質上是一個迭代累加的過程,先要得到如下結果

如果在面試數倉中實現了以上結果,基本上面試官會很透過,也在短時間內可以實現,實現sql如下
with tb as (
select 1 as s,'a' as pv
union all
select 2 as s,'b' as pv
union all
select 3 as s,'c' as pv
union all
select 4 as s,'d' as pv
union all
select 5 as s,'e' as pv
union all
select 6 as s,'f' as pv
)

select s,pv,concat_ws('+',collect_list(re1) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) from 
(select *,concat_ws('+',collect_list(pv) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as re1 from tb) tb1;
首先開窗將pv分組連線起來作為一個結果集,然後再將結果集分組連線起來。核心知識點:collect_list,concat_ws,開窗函式及開窗中的ROWS BETWEEN。其中`ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`為開始行到當前行

另一種實現方式,透過不斷解析實現,分組實現,這種實現主要考察對sql的深度理解和組裝,實現起來還是比較複雜的,sql如下
with tb as (
select '1' as s,'a' as pv
union all
select '2' as s,'b' as pv
union all
select '3' as s,'c' as pv
union all
select '4' as s,'d' as pv
union all
select '5' as s,'e' as pv
union all
select '6' as s,'f' as pv
),
tb1 as 
(select *,
concat_ws(',',collect_list(pv) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as pv_n,
reverse(concat_ws(',',collect_list(s) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))) as s_n 
from tb),
tb2 as 
(select s,pv,pv_n,s_n from tb1 
lateral view explode(SPLIT(pv_n,',')) tb1 as pv_n),
tb3 as 
(select *,row_number() over(partition by s order by pv_n) as rn from tb2)
select s,pv,concat_ws('+',collect_list(concat(pv_n,'*',split(s_n,',')[rn-1]))) as res from tb3 group by s,pv order by s;
該種實現方式包含的知識點比較多,包括collect_list,concat_ws,lateral view explode,開窗函式。其中將開窗出來的值作為切割出來的陣列下標來使用這種思路在實際中不容易想到。但這個sql本質上有隱患,其中reverse函式如果s欄位超過個位數後存在問題,當做練sql吧 實際中要多想想

上圖中,要取pv_n*s_n[rn-1],然後再將結果連線起來

然後再分組連線起來

相關文章