1. Create the table and bind it to the data directory:
Start the Hive command line:
hive
Then run:
create external table wordcounts(line string) row format delimited fields terminated by '\n' stored as textfile location '/input/wordcount';

2. Create the file ruozedata.txt and upload it to the /input/wordcount directory in HDFS.
vi ruozedata.txt
hello,ruoze
hello,jepsondb
hello,
hi,man
hi,gril
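
As an alternative to typing the lines in vi, the same file can be written non-interactively with a heredoc. This is only a minimal sketch, assuming a POSIX shell and that the current directory is where the file should be created:

```shell
# Write the five sample lines to ruozedata.txt in the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > ruozedata.txt <<'EOF'
hello,ruoze
hello,jepsondb
hello,
hi,man
hi,gril
EOF

# Show the file so the contents can be checked before uploading.
cat ruozedata.txt
```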

Upload command: hdfs dfs -put ruozedata.txt /input/wordcount

3. In Hive, check that the data is now visible through the table:
select * from wordcounts;

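4. In Hive, split each line into words. This uses one of Hive's built-in table-generating functions (UDTF), explode(array): it takes an array as its argument and emits one row per element, so a single input row becomes multiple rows.
split is the splitting function and works much like Java's String.split; here it splits each line on commas. The outer query then groups the subquery's result by word and counts each group. The complete HQL statement is: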
select word, count(*) from (select explode(split(line, ",")) as word from wordcounts) t group by word;
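
To sanity-check the Hive result, the same split-and-count can be reproduced locally on the sample file. The sketch below is not part of the Hive workflow. It assumes a POSIX shell with awk and sort, and the output file name local_counts.txt is a hypothetical choice for illustration. The trailing comma in `hello,` produces an empty token in awk; Hive's split may return an empty string there as well, so it is worth checking whether an empty word shows up in your own query output.

```shell
# Recreate the sample data (the same five lines as ruozedata.txt in step 2).
printf '%s\n' 'hello,ruoze' 'hello,jepsondb' 'hello,' 'hi,man' 'hi,gril' > ruozedata.txt

# Split every line on commas (awk keeps the trailing empty field),
# count each token, and print "word<TAB>count" sorted by word.
awk -F',' '{ for (i = 1; i <= NF; i++) counts[$i]++ }
     END { for (w in counts) printf "%s\t%d\n", w, counts[w] }' ruozedata.txt \
  | LC_ALL=C sort > local_counts.txt

cat local_counts.txt
```

The counts for hello (3) and hi (2) should match what the Hive query returns; every other word appears once.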

Summary: Hive makes this kind of job fairly simple; for more complex statistics, you can build intermediate tables.

