[Hive] Using concat_ws to merge multiple rows into a single row

Posted by OkidoGreen on 2020-04-05

https://blog.csdn.net/yeweiouyang/article/details/41286469

https://blog.csdn.net/waiwai3/article/details/79071544

Requirement: analyze users' orders and show how many orders each user has for each order type, with one row per user.

Source data:

user    order_type    order_number
user1    delivered    10
user2    returned    1
user1    returned    3
user2    delivered    20

Target:

user    order
user1    delivered(10),returned(3)
user2    delivered(20),returned(1)
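
To try the steps on a test cluster, the sample data could be loaded with something like the following sketch. The table name order_table and the column types are assumptions, since the original does not give them. user is quoted with backticks because it is a reserved word in recent Hive versions, and INSERT ... VALUES requires Hive 0.14 or later.

```sql
-- Hypothetical table holding the sample data (name and types are assumed)
CREATE TABLE IF NOT EXISTS order_table (
  `user`       string,
  order_type   string,
  order_number int
);

-- Load the four sample rows shown above
INSERT INTO TABLE order_table VALUES
  ('user1', 'delivered', 10),
  ('user2', 'returned',  1),
  ('user1', 'returned',  3),
  ('user2', 'delivered', 20);
```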

1. Use the concat() function to join order_type and order_number

concat(order_type,'(',order_number,')')
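
As a sketch, the expression could be placed in a complete query like the one below. The table name order_table is a placeholder, not a name from the original, and the explicit cast is only a defensive choice in case order_number is a numeric column.

```sql
-- order_table is a hypothetical table name
SELECT `user`,
       concat(order_type, '(', cast(order_number AS string), ')') AS `order`
FROM order_table;
```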

user    order
user1    delivered(10)
user2    returned(1)
user1    returned(3)
user2    delivered(20)
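2. Use concat_ws() and collect_set() to merge the rows
In the list above a single user may occupy several rows. This step converts it into the target format with one row per user, which in effect collapses multiple rows into one.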

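select `user`, concat_ws(',', collect_set(concat(order_type,'(',order_number,')'))) as `order` from order_table group by `user`;

`order` is the column alias. order and table are reserved words in Hive (as is user in recent versions), so identifiers with those names have to be quoted with backticks. order_table stands in for the real table name.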

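What collect_set does:

(1) Deduplication: it drops duplicate values within each group, here identical order_type(order_number) strings belonging to the same user. The one-row-per-user grouping itself comes from group by, not from collect_set.

(2) Aggregation: for the rows that belong to the same user after group by, it builds a collection (an array). concat_ws then joins the elements of that array with the given separator to form a single string. The order of the elements in the array is not guaranteed.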
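
Because collect_set does not promise any element order, the merged string can differ between runs. One possible way to make it deterministic, sketched here with the built-in sort_array function, is to sort the array before joining it. The table name order_table is again a placeholder.

```sql
-- Sort the collected strings so the joined result is stable,
-- e.g. "delivered(10),returned(3)" for user1 in the sample data
SELECT `user`,
       concat_ws(',',
                 sort_array(
                   collect_set(concat(order_type, '(', cast(order_number AS string), ')'))
                 )) AS `order`
FROM order_table
GROUP BY `user`;
```

If duplicates should be kept rather than dropped, collect_list (available since Hive 0.13) can be used in place of collect_set.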
 

 

The table is created as follows:

# Create the mapping table between products and promotions
hive -e "set mapred.job.queue.name=pms;
set hive.exec.reducers.max=32;
set mapred.reduce.tasks=32;
 
drop table if exists product_promotion;
create table product_promotion(product_id bigint, promotion_id String);
 
insert into table product_promotion 
select p2.product_id, p2.promotion_id 
from pms.promotionv2 p1 inner join pms.promotionv2_main_product_sku p2 
on (p1.id=p2.promotion_id)
where from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss') between p1.start_date and p1.end_date;"
The records in the table look like this:

5112 960024
5112 960025
5112 960026
5112 960027
5112 960028
5113 960043
5113 960044
5113 960045
5113 960046
Merge the promotion_id values:

select product_id, concat_ws('_',collect_set(promotion_id)) as promotion_ids from product_promotion group by product_id
Execution result:

hive > select product_id, concat_ws('_',collect_set(promotion_id)) as promotion_ids from product_promotion group by product_id;
OK
5112 960024_960025_960026_960027_960028
5113 960043_960044_960045_960046
Time taken: 3.116 seconds
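
The query above works directly because promotion_id is declared as String, and concat_ws expects strings or an array of strings. If the column were numeric instead (an assumption, not the case in this table), a cast before collecting would be one way to handle it:

```sql
-- Variant for a numeric promotion_id column (hypothetical; the table above uses String)
SELECT product_id,
       concat_ws('_', collect_set(cast(promotion_id AS string))) AS promotion_ids
FROM product_promotion
GROUP BY product_id;
```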
