How to write a Hive job for a user tagging system

Posted by huobumingbai1234 on 2020-12-03

Background:

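I recently received a business requirement: every day, compute a set of tags for a targeted group of users, then send downstream the tags that were added and removed, along with each user's current information.

For example, if a user gained tag3 and tag2 yesterday and has no other tags, both tags are reported as added, and the current codes are those same two.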

Job design:

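First, store the full computed result for each day. Additions and removals can only be identified by comparing against earlier data, and the complete list of current codes is also taken from the full result.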
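As a rough sketch of what such a daily full snapshot could look like, the DDL below defines one row per device and tag, partitioned by day. The table name `user_tag_full`, the column types, and the comments are assumptions made for illustration; the post does not show the real table.

```sql
-- Hypothetical snapshot table: one row per (device, tag) per day.
-- Column names follow the identifiers used in the final query; types are assumed.
create table if not exists user_tag_full (
    client_str      string comment 'client identifier',
    registration_id string comment 'push registration id',
    hupu_uid        string comment 'user id',
    code            string comment 'tag code'
)
partitioned by (dt string comment 'snapshot date, yyyymmdd');
```

Keeping one partition per day means any two days can be compared later without recomputing the tags.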

Next, full join yesterday's data with the data from the day before. This produces a table that relates the additions and the removals.
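The full join might be sketched as follows. This is only a guess at how the step-1 table is built: the source table `user_tag_full`, the parameter `${prev_bizdate}`, the join keys, and the column order of `tmp_push_step1` are all assumptions. The `1`/`2` suffixes follow the final query, where suffix 1 is yesterday and suffix 2 is the day before.

```sql
-- Assumed source: user_tag_full(client_str, registration_id, hupu_uid, code, dt).
insert overwrite table tmp_push_step1 partition (dt = '${bdp.system.bizdate}')
select  y.client_str       as client_str1
        ,y.registration_id as registration_id1
        ,y.hupu_uid        as hupu_uid1
        ,y.code            as code1
        ,p.client_str      as client_str2
        ,p.registration_id as registration_id2
        ,p.hupu_uid        as hupu_uid2
        ,p.code            as code2
from    (
            -- yesterday's snapshot
            select client_str, registration_id, hupu_uid, code
            from   user_tag_full
            where  dt = '${bdp.system.bizdate}'
        ) y
full outer join (
            -- snapshot of the day before; ${prev_bizdate} is a hypothetical parameter
            select client_str, registration_id, hupu_uid, code
            from   user_tag_full
            where  dt = '${prev_bizdate}'
        ) p
on      y.client_str = p.client_str
and     y.registration_id = p.registration_id
and     y.code = p.code;
```

Each day is filtered inside its own subquery before the join. Filtering on `dt` in an outer `where` clause would discard the unmatched rows that the full join exists to keep. After the join, a row whose `*2` columns are null is a newly added tag, and a row whose `*1` columns are null is a removed tag.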

Finally, extract the added codes, the removed codes, and the current codes value in the required format.
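To make the target format concrete, the query below assembles one message from literal values, using the same field layout as the job's final query. The values for `client_str` and `registration_id` are placeholders, and the tags are the ones from the example above.

```sql
-- Placeholder values; only the field layout mirrors the real job.
select concat(
           '{"add":['
           ,'"tag3","tag2"'            -- added codes, each one quoted
           ,'],"client_str":'
           ,'"client-placeholder"'
           ,',"registration_id":'
           ,'"reg-placeholder"'
           ,',"codes":'
           ,'"tag3,tag2"'              -- all current codes as one quoted string
           ,',"remove":['
           ,''                         -- nothing was removed
           ,']}'
       ) as message;
-- {"add":["tag3","tag2"],"client_str":"client-placeholder","registration_id":"reg-placeholder","codes":"tag3,tag2","remove":[]}
```

Note that `add` and `remove` are JSON arrays of quoted codes, while `codes` is a single comma-separated string.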

 

Here is the processing logic for the last step:

-- Static partition: the select list produces only the message column.
insert overwrite table ba_user_push partition (dt = '${bdp.system.bizdate}')
select  concat(
            '{"add":['
            ,coalesce(add_code, '')
            ,'],"client_str":'
            ,stringaddquotation(t1.client_str)
            ,',"registration_id":'
            ,stringaddquotation(t1.registration_id)
            ,',"codes":'
            ,coalesce(all_code, '""')
            ,',"remove":['
            ,coalesce(remove_code, '')
            ,']}'
        ) as message
from    (
            -- t1: devices with at least one change.
            -- Suffix 1 = yesterday, suffix 2 = the day before.
            select  coalesce(client_str1, client_str2) as client_str
                    ,coalesce(registration_id1, registration_id2) as registration_id
                    -- aggregated so that the group by stays valid
                    ,max(coalesce(hupu_uid1, hupu_uid2)) as hupu_uid
            from    tmp_push_step1  -- assumed to be the same step-1 table as in the subqueries below
            where   dt = '${bdp.system.bizdate}'
            and     (registration_id1 is null        -- tag removed
            or      registration_id2 is null)        -- tag added; only these two cases are processed
            group by coalesce(client_str1, client_str2)
                     ,coalesce(registration_id1, registration_id2)
        ) t1
left join (
            -- t2: all current codes per device, as one quoted comma-separated string
            select  client_str1 as client_str
                    ,registration_id1 as registration_id
                    ,stringaddquotation(wm_concat(',', code1)) as all_code
            from    bigdata2c.tmp_shihuo_push_step1
            where   dt = '${bdp.system.bizdate}'
            and     client_str1 is not null
            group by client_str1
                     ,registration_id1
        ) t2
on      t1.client_str = t2.client_str
and     t1.registration_id = t2.registration_id
left join (
            -- t3: added codes (present yesterday, absent the day before), each one quoted
            select  client_str1 as client_str
                    ,registration_id1 as registration_id
                    ,wm_concat(',', stringaddquotation(code1)) as add_code
            from    bigdata2c.tmp_shihuo_push_step1
            where   dt = '${bdp.system.bizdate}'
            and     registration_id2 is null
            group by client_str1
                     ,registration_id1
        ) t3
on      t1.client_str = t3.client_str
and     t1.registration_id = t3.registration_id
left join (
            -- t4: removed codes (present the day before, absent yesterday), each one quoted
            select  client_str2 as client_str
                    ,registration_id2 as registration_id
                    ,wm_concat(',', stringaddquotation(code2)) as remove_code
            from    bigdata2c.tmp_shihuo_push_step1
            where   dt = '${bdp.system.bizdate}'
            and     registration_id1 is null
            group by client_str2
                     ,registration_id2
        ) t4
on      t1.client_str = t4.client_str
and     t1.registration_id = t4.registration_id
;
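The query relies on `wm_concat`, which appears to be the MaxCompute aggregate, and on `stringaddquotation`, which looks like a custom UDF. On stock Apache Hive, the added-codes subquery could be approximated with built-in functions as sketched below. The sketch assumes that `stringaddquotation` simply wraps its argument in double quotes, which the post does not confirm.

```sql
-- Hive-native approximation of subquery t3.
-- Assumption: stringaddquotation(x) is equivalent to concat('"', x, '"').
select  client_str1 as client_str
        ,registration_id1 as registration_id
        ,concat_ws(',', collect_list(concat('"', code1, '"'))) as add_code
from    bigdata2c.tmp_shihuo_push_step1
where   dt = '${bdp.system.bizdate}'
and     registration_id2 is null
group by client_str1
         ,registration_id1;
```

`collect_list` keeps duplicates. If the step-1 table could contain the same code twice for one device, `collect_set` would deduplicate them.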

 
