Hive內部函式簡介及查詢語法

weixin_34054866發表於2018-07-02

原文網址 : https://blog.csdn.net/weixin_34054866/article/details/86838739

1.Hive內建函式：

在Hive中系統給我們內建了很多函式具體參考官方地址

看下官網給我們的介紹:

SHOW FUNCTIONS; --檢視所有內建函式
DESCRIBE FUNCTION <function_name>; --檢視某個函式的描述
DESCRIBE FUNCTION EXTENDED <function_name>; --檢視某個函式的具體使用方法

hive> DESCRIBE FUNCTION case;
OK
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f
Time taken: 0.006 seconds, Fetched: 1 row(s)
hive> DESCRIBE FUNCTION EXTENDED case;
OK
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f
Example:
 SELECT
 CASE deptno
   WHEN 1 THEN Engineering
   WHEN 2 THEN Finance
   ELSE admin
 END,
 CASE zone
   WHEN 7 THEN Americas
   ELSE Asia-Pac
 END
 FROM emp_details
Time taken: 0.008 seconds, Fetched: 13 row(s)
# DESCRIBE 可簡寫為desc
hive> desc FUNCTION EXTENDED case;
OK
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f
Example:
 SELECT
 CASE deptno
   WHEN 1 THEN Engineering
   WHEN 2 THEN Finance
   ELSE admin
 END,
 CASE zone
   WHEN 7 THEN Americas
   ELSE Asia-Pac
 END
 FROM emp_details
Time taken: 0.009 seconds, Fetched: 13 row(s)

下面我們瞭解下常用函式的使用方法：

# 為了方便測試 我們建立常用的dual表
hive> create table dual(x string);
OK
Time taken: 0.11 seconds
hive> insert into table dual values('');
Query ID = hadoop_20180702100505_f0566585-06b2-4c53-910a-b6a58791fc2d
Total jobs = 3
Launching Job 1 out of 3
...
OK
Time taken: 29.535 seconds
hive> select * from dual;
OK

Time taken: 0.147 seconds, Fetched: 1 row(s)
# 測試當前時間
hive> select current_date from dual;
OK
2018-07-02
Time taken: 0.111 seconds, Fetched: 1 row(s)
# 測試當前時間戳
hive> select current_timestamp from dual;
OK
2018-07-02 15:03:28.919
Time taken: 0.117 seconds, Fetched: 1 row(s)
# 測試substr函式 用於擷取字串
hive> desc function extended substr;
OK
substr(str, pos[, len]) - returns the substring of str that starts at pos and is of length len orsubstr(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len
Synonyms: substring
pos is a 1-based index. If pos<0 the starting position is determined by counting backwards from the end of str.
Example:
   > SELECT substr('Facebook', 5) FROM src LIMIT 1;
  'book'
  > SELECT substr('Facebook', -5) FROM src LIMIT 1;
  'ebook'
  > SELECT substr('Facebook', 5, 1) FROM src LIMIT 1;
  'b'
Time taken: 0.016 seconds, Fetched: 10 row(s)
hive> SELECT substr('helloworld',-5) FROM dual;
OK
world
Time taken: 0.171 seconds, Fetched: 1 row(s)
hive> SELECT substr('helloworld',5) FROM dual;
OK
oworld
Time taken: 0.12 seconds, Fetched: 1 row(s)
hive> SELECT substr('helloworld',5,3) FROM dual;
OK
owo
Time taken: 0.142 seconds, Fetched: 1 row(s)
# 測試函式concat 用於將字元連線起來
hive> desc function extended concat_ws;
OK
concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator.
Example:
  > SELECT concat_ws('.', 'www', array('facebook', 'com')) FROM src LIMIT 1;
  'www.facebook.com'
Time taken: 0.019 seconds, Fetched: 4 row(s)
hive> select concat_ws(".","192","168","199","151") from dual;
OK
192.168.199.151
Time taken: 0.152 seconds, Fetched: 1 row(s)
# 測試函式split 用於拆分
hive> desc function extended split;
OK
split(str, regex) - Splits str around occurances that match regex
Example:
  > SELECT split('oneAtwoBthreeC', '[ABC]') FROM src LIMIT 1;
  ["one", "two", "three"]
Time taken: 0.021 seconds, Fetched: 4 row(s)
hive> select split("192.168.199.151","\\.") from dual;
OK
["192","168","199","151"]
Time taken: 0.169 seconds, Fetched: 1 row(s)

2.Hive查詢語法：

簡單select語法：

# 簡單select語法
hive> select * from emp where deptno=10;
OK
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
Time taken: 0.899 seconds, Fetched: 3 row(s)
hive> select * from emp where empno <= 7800;
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
Time taken: 0.277 seconds, Fetched: 8 row(s)
hive> select * from emp where salary between 1000 and 1500;
OK
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
Time taken: 0.187 seconds, Fetched: 5 row(s)
hive> select * from emp limit 5;
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
Time taken: 0.154 seconds, Fetched: 5 row(s)
hive> select * from emp where empno in(7566,7499);
OK
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
Time taken: 0.153 seconds, Fetched: 2 row(s)
hive> select * from emp where comm is not null;
OK
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
Time taken: 0.291 seconds, Fetched: 4 row(s)

聚合函式及分組函式：

# 聚合函式及分組函式
#  max/min/count/sum/avg 特點：多進一出，進來很多條記錄出去只有一條記錄

# 查詢部門編號為10的有多少條記錄
hive> select count(1) from emp where deptno=10;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
3
Time taken: 38.951 seconds, Fetched: 1 row(s)
# 求最大工資，最小工資，平均工資，工資的和
hive> select max(salary),min(salary),avg(salary),sum(salary) from emp;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
5000.0  800.0   2073.214285714286       29025.0
Time taken: 23.748 seconds, Fetched: 1 row(s)

# 分組函式 group by
# 求部門的平均工資
# 注：select中出現的欄位，如果沒有出現在組函式/聚合函式中，必須出現在group by裡面，否則就會產生報錯
hive> select deptno,avg(salary) from emp group by deptno;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
10      2916.6666666666665
20      2175.0
30      1566.6666666666667
Time taken: 36.502 seconds, Fetched: 3 row(s)
# 求每個部門(deptno)、工作崗位(job)的最高工資(salary)
hive> select deptno,job,max(salary) from emp group by deptno,job;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
10      CLERK   1300.0
10      MANAGER 2450.0
10      PRESIDENT       5000.0
20      ANALYST 3000.0
20      CLERK   1100.0
20      MANAGER 2975.0
30      CLERK   950.0
30      MANAGER 2850.0
30      SALESMAN        1600.0
Time taken: 36.096 seconds, Fetched: 9 row(s)
# 查詢平均工資大於2000的部門（使用having子句限定分組查詢）
hive> select deptno,avg(salary) from emp group by deptno having avg(salary) >2000;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
10      2916.6666666666665
20      2175.0
Time taken: 24.71 seconds, Fetched: 2 row(s)

# case when then end（不會跑mr）
hive> select ename, salary, 
    > case
    > when salary > 1 and salary <= 1000 then 'LOWER'
    > when salary > 1000 and salary <= 2000 then 'MIDDLE'
    > when salary > 2000 and salary <= 4000 then 'HIGH'
    > ELSE 'HIGHEST'
    > end
    > from emp;
OK
SMITH   800.0   LOWER
ALLEN   1600.0  MIDDLE
WARD    1250.0  MIDDLE
JONES   2975.0  HIGH
MARTIN  1250.0  MIDDLE
BLAKE   2850.0  HIGH
CLARK   2450.0  HIGH
SCOTT   3000.0  HIGH
KING    5000.0  HIGHEST
TURNER  1500.0  MIDDLE
ADAMS   1100.0  MIDDLE
JAMES   950.0   LOWER
FORD    3000.0  HIGH
MILLER  1300.0  MIDDLE
Time taken: 0.096 seconds, Fetched: 14 row(s)

多表join查詢：

# 建立測試表
hive> create table a(
    > id int, name string
    > ) row format delimited fields terminated by '\t';
OK
Time taken: 0.311 seconds
hive> create table b(
    > id int, age int
    > ) row format delimited fields terminated by '\t';
OK
Time taken: 0.142 seconds
# insert或load資料 最後表資料如下
hive> select * from a;
OK
1       zhangsan
2       lisi
3       wangwu
hive> select * from b;
OK
1       20
2       30
4       40
Time taken: 0.2 seconds, Fetched: 3 row(s)

# 內連線 inner join = join 僅列出表1和表2符合連線條件的資料
hive> select a.id,a.name,b.age from a join b on a.id=b.id;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
1       zhangsan        20
2       lisi    30
Time taken: 24.415 seconds, Fetched: 2 row(s)
# 左外連線（left join） 以左邊的為基準，左邊的資料全部資料全部出現，如果沒有出現就賦null值
hive> select a.id,a.name,b.age from a left join b on a.id=b.id;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
1       zhangsan        20
2       lisi    30
3       wangwu  NULL
Time taken: 26.218 seconds, Fetched: 3 row(s)
# 右外連線（right join） 以右表為基準
hive> select a.id,a.name,b.age from a right join b on a.id=b.id;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
1       zhangsan        20
2       lisi    30
NULL    NULL    40
Time taken: 24.027 seconds, Fetched: 3 row(s)
# 全連線（full join）相當於表1和表2的資料都顯示，如果沒有對應的資料，則顯示Null.
hive> select a.id,a.name,b.age from a full join b on a.id=b.id;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
1       zhangsan        20
2       lisi    30
3       wangwu  NULL
NULL    NULL    40
Time taken: 32.94 seconds, Fetched: 4 row(s)
# 笛卡爾積（cross join） 沒有連線條件 會針對表1和表2的每條資料做連線
hive> select a.id,a.name,b.age from a cross join b;
Warning: Map Join MAPJOIN[7][bigTable=a] in task 'Stage-3:MAPRED' is a cross product
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
1       zhangsan        20
1       zhangsan        30
1       zhangsan        40
2       lisi    20
2       lisi    30
2       lisi    40
3       wangwu  20
3       wangwu  30
3       wangwu  40
Time taken: 29.825 seconds, Fetched: 9 row(s)

3.利用Hive sql實現wordcount：

# 建立表 載入測試資料
hive> create table hive_wc(sentence string);
OK
Time taken: 0.149 seconds

[hadoop@hadoop000 ~]$ cat hive-wc.txt
hello,world,welcome
hello,welcome

hive> load data local inpath '/home/hadoop/hive-wc.txt' into table hive_wc;
Loading data to table default.hive_wc
Table default.hive_wc stats: [numFiles=1, totalSize=34]
OK
Time taken: 0.729 seconds
hive> select * from hive_wc;
OK
hello,world,welcome
hello,welcome
Time taken: 0.13 seconds, Fetched: 2 row(s)

# 獲取每個單詞 利用split分割
hive> select split(sentence,",") from hive_wc;
OK
["hello","world","welcome"]
["hello","welcome"]
Time taken: 0.163 seconds, Fetched: 2 row(s)
# explode把陣列轉成多行 結合split使用如下
hive> select explode(split(sentence,",")) from hive_wc;
OK
hello
world
welcome
hello
welcome
Time taken: 0.068 seconds, Fetched: 5 row(s)
# 做group by操作 一條語句即可實現wordcount統計
hive> select word, count(1) as c 
    > from (select explode(split(sentence,",")) as word from hive_wc) t
    > group by word ;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
hello   2
welcome 2
world   1
Time taken: 34.168 seconds, Fetched: 3 row(s)

簡單的查詢語法
2020-12-11
hive內建函式
2018-10-17
Hive函式
MySQL入門系列：查詢簡介（三）之表示式和函式
2019-03-11
MySql函式
ElasticSearch基礎及查詢語法
2019-05-03
Elasticsearch
Hive常用函式及自定義函式
2018-06-08
Hive函式
Hive函式（內建函式+自定義標準函式UDF）
2020-09-23
Hive函式
Hive的基本介紹以及常用函式
2020-06-04
Hive函式
Python語法—函式及引數傳遞
2019-02-28
Python函式
TiDB 查詢優化及調優系列（二）TiDB 查詢計劃簡介
2022-04-27
TiDB優化
函式式API簡介
2024-05-31
函式API
Solr複雜查詢一：函式查詢
2020-04-17
Solr函式
如何在函式內部定義函式？
2024-03-01
函式
JMeter36個內建函式及11個新增函式介紹
2021-06-03
JMeter函式
php 函式簡介
2019-05-11
PHP函式
MongoDB 常用查詢語法
2019-10-09
MongoDB
hive函式
2020-12-14
Hive函式
SPF 記錄：原理、語法及配置方法簡介
2018-06-21
好程式設計師大資料學習路線hive內部函式
2019-08-12
程式設計師大資料Hive函式
JavaScript 函式語法
2019-03-24
JavaScript函式
Python中compile函式的語法及例項！
2024-01-22
PythonCompile函式
Hive -------- hive常見查詢練習
2018-11-13
Hive
[Hive]Hive實現抽樣查詢
2018-08-16
Hive
MySQL入門系列：查詢簡介（五）之子查詢
2019-03-13
MySql
10-函式-1-函式簡介
2020-11-18
函式
webgl內建函式--向量函式與紋理查詢函式
2018-10-23
Web函式
好程式設計師大資料培訓分享之hive常用內部函式
2020-05-25
程式設計師大資料Hive函式
PostgreSQL 查詢替換函式
2021-12-21
SQL函式
C++純虛擬函式簡介及區別
2021-05-02
C++函式
Django：查詢基礎語法
2024-08-19
Django
Hive函式大全
2018-08-20
Hive函式
gin框架函式語法
2024-03-21
框架函式
MySQL 函式語法整理
2020-12-12
MySql函式
(譯) 函式式 JS #1：簡介
2019-01-11
函式JS
hive學習筆記之七：內建函式
2021-07-06
Hive筆記函式
PostgreSQL 原始碼解讀（83）- 查詢語句#68（PortalStart函式）
2018-11-12
SQL原始碼函式
函式內部This的指向問題
2020-11-18
函式
函式內部的變數提升
2024-06-08
函式變數
SQL-基礎語法 - 條件查詢 - 模糊查詢
2024-12-04
SQL

Hive內部函式簡介及查詢語法

1.Hive內建函式：

2.Hive查詢語法：

3.利用Hive sql實現wordcount：

相關文章