Hive-lateral view explode

weixin_34007291發表於2017-04-14

Hive lateral view explode

介紹

當我們想對hive表中某一列進行split之後,想對其轉換成1 to N的模式,即一行轉多列。

hive不允許我們在UDTF函式之外,再新增其它select語句。

如下,我們想將登入某個遊戲的使用者id放在一個欄位user_ids裡,對每一行資料用UDTF後輸出多行。

lateral view explain

Lateral view is used in conjunction with user-defined table generatingfunctions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows foreach input row. A lateral view first applies the UDTF to each row of base tableand then joins resulting output rows to the input rows to form a virtual tablehaving the supplied table alias.

Lateral view 其實就是用來和像類似explode這種UDTF函式聯用的。lateral view 會將UDTF生成的結果放到一個虛擬表中,然後這個虛擬表會和輸入行即每個game_id進行join 來達到連線UDTF外的select欄位的目的。

舉個例子:

建立一個檔案。裡面兩列用\t分割。

hive> create table test_lateral_view_shengli(game_id string,userl_ids string) row format delimited fields terminated by '\t' stored as textfile;  
OK  
Time taken: 2.451 seconds  
hive> load data local inpath '/home/hadoop/test_lateral' into table test_lateral_view_shengli;  
Copying data from file:/home/hadoop/test_lateral  
Copying file: file:/home/hadoop/test_lateral  
Loading data to table dw.test_lateral_view_shengli  
OK  
Time taken: 6.716 seconds  
hive> select * from test_lateral_view_shengli;                                                                                                             
OK  
game101       15358083654[[[ab33787873[[[zjy18052480603[[[shlg1881826[[[lxqab110  
game66       winning1ren[[[13810537508  
game101       hu330602003[[[hu330602004[[[hu330602005[[[15967506560 

下面使用lateral view

hive> select game_id, user_id    
    > from test_lateral_view_shengli lateral view explode(split(userl_ids,'\\[\\[\\[')) snTable as user_id   
    > ;  
Total MapReduce jobs = 1  
Launching Job 1 out of 1  
Number of reduce tasks is set to 0 since there's no reduce operator  
Starting Job = job_201403301416_445839, Tracking URL = http://10.1.9.10:50030/jobdetails.jsp?jobid=job_201403301416_445839  
Kill Command = /app/home/hadoop/src/hadoop-0.20.2-cdh3u5/bin/../bin/hadoop job  -Dmapred.job.tracker=10.1.9.10:9001 -kill job_201403301416_445839  
2014-05-16 17:39:19,108 Stage-1 map = 0%,  reduce = 0%  
2014-05-16 17:39:28,157 Stage-1 map = 100%,  reduce = 0%  
2014-05-16 17:39:38,830 Stage-1 map = 100%,  reduce = 100%  
Ended Job = job_201403301416_445839  
OK  
game101       hu330602003  
game101       hu330602004  
game101       hu330602005  
game101       15967506560  
game101       15358083654  
game101       ab33787873  
game101       zjy18052480603  
game101       shlg1881826  
game101       lxqab110  
game66       winning1ren  
game66       13810537508 

多個lateral view

From語句後可以跟多個Lateral View。

給定資料

Array<int> col1 Array<string> col2
[1, 2] [a", "b", "c"]
[3, 4] [d", "e", "f"]

語句

SELECT myCol1, myCol2 FROM baseTable  
LATERAL VIEW explode(col1) myTable1 AS myCol1  
LATERAL VIEW explode(col2) myTable2 AS myCol2; 

還有一種情況,如果UDTF轉換的Array是空的怎麼辦呢?

在Hive0.12裡面會支援outer關鍵字,如果UDTF的結果是空,預設會被忽略輸出。

如果加上outer關鍵字,則會像left outer join 一樣,還是會輸出select出的列,而UDTF的輸出結果是NULL。

hive> select * FROM test_lateral_view_shengli LATERAL VIEW explode(array()) C AS a ;  

什麼也不輸出

如果加上outer關鍵字

SELECT * FROM src LATERAL VIEW OUTER explode(array()) C AS a limit 10;  

結果

238 val_238 NULL  
86 val_86 NULL  
311 val_311 NULL  
27 val_27 NULL  
165 val_165 NULL  
409 val_409 NULL  
255 val_255 NULL  
278 val_278 NULL  
98 val_98 NULL

轉載:http://blog.csdn.net/oopsoom/article/details/26001307

相關文章