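Good Programmer Big Data Learning Roadmap: Hive built-in functions. We keep updating this big data learning roadmap and hope it helps everyone who is learning big data.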

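1. Random number function: rand()

Syntax: rand(), rand(int seed)
Returns: double
Description: returns a random number between 0 and 1. If a seed is specified, the sequence of random numbers is deterministic, so the same seed always produces the same values.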

select rand();
select rand(10);
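The following minimal Python sketch makes the seed behavior concrete. It assumes that Hive's rand(seed) draws from java.util.Random seeded with that value, which is our understanding of Hive's implementation. The class re-creates Java's documented 48-bit linear congruential generator and its nextDouble(). JavaRandom and rand_column are hypothetical names used for illustration and are not Hive APIs.

```python
# Hypothetical model: assumes Hive's rand(seed) uses java.util.Random(seed).
# Re-creates Java's documented 48-bit linear congruential generator.
MULTIPLIER = 0x5DEECE66D
INCREMENT = 0xB
MASK = (1 << 48) - 1


class JavaRandom:
    def __init__(self, seed: int) -> None:
        # java.util.Random scrambles the seed with the multiplier first
        self.state = (seed ^ MULTIPLIER) & MASK

    def _next(self, bits: int) -> int:
        # Advance the generator and return the top `bits` bits of the 48-bit state
        self.state = (self.state * MULTIPLIER + INCREMENT) & MASK
        return self.state >> (48 - bits)

    def next_double(self) -> float:
        # 26 + 27 = 53 random bits scaled into [0, 1), like Random.nextDouble()
        return ((self._next(26) << 27) + self._next(27)) / float(1 << 53)


def rand_column(seed: int, rows: int) -> list:
    """One rand(seed) value per row, as a seeded query would produce."""
    rng = JavaRandom(seed)
    return [rng.next_double() for _ in range(rows)]


if __name__ == "__main__":
    print(rand_column(10, 3))
    print(rand_column(10, 3) == rand_column(10, 3))  # True: same seed, same sequence
```

Calling rand_column twice with the same seed returns identical lists. This is what "a deterministic sequence" means above.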

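2. String split function: split(str, pat)

Syntax: split(string str, string pat)
Returns: array<string>
Description: splits str around pat and returns the resulting array of strings. pat is a regular expression, so special characters such as . or | must be escaped. Inside a Hive string literal, the backslash itself must also be doubled, e.g. "\\.".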

select split(5.0,"\\.")[0];               -- "5"
select split(rand(10)*100,"\\.")[0];      -- integer part of rand(10)*100, as a string
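Because pat is a regular expression, escaping decides what actually gets split. The Python sketch below uses re.split as a stand-in for Java's regex split. We assume the two are equivalent for simple patterns like these. hive_split is a hypothetical helper. The raw string r"\." corresponds to the Hive literal "\\.".

```python
import re


def hive_split(s: str, pat: str) -> list:
    """Model of Hive split(str, pat): pat is a regular expression, not a literal."""
    # re.split keeps trailing empty strings, like Java's str.split(regex, -1)
    return re.split(pat, s)


if __name__ == "__main__":
    print(hive_split("5.0", r"\."))       # ['5', '0']
    print(hive_split("37.52", r"\.")[0])  # 37
    # An unescaped "." matches every character, so nothing useful is left
    print(hive_split("5.0", "."))         # ['', '', '', '']
```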

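3. Substring functions: substr, substring

Syntax: substr(string A, int start), substring(string A, int start)
Returns: string
Description: returns the part of A from position start to the end of the string.

Syntax: substr(string A, int start, int len), substring(string A, int start, int len)
Returns: string
Description: returns the part of A that begins at position start and is len characters long.

Positions are 1-based. A start of 0 is treated like 1, so both queries below return the first two characters.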

select substr(rand()*100,0,2);
select substring(rand()*100,0,2);
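The position rules are easy to get wrong, so here is a small Python model of them. It follows our reading of Hive's UDFSubstr and has not been checked against every Hive version. Positions are 1-based, and 0 is treated as 1. Negative positions count from the end. The result is an empty string when the start is out of range or len is not positive. hive_substr is a hypothetical name.

```python
from typing import Optional


def hive_substr(s: str, pos: int, length: Optional[int] = None) -> str:
    """Sketch of Hive substr/substring position rules (assumed, 1-based)."""
    n = len(s)
    if length is not None and length <= 0:
        return ""            # a non-positive len yields an empty string
    if abs(pos) > n:
        return ""            # start lies outside the string
    if pos > 0:
        start = pos - 1      # 1-based -> 0-based
    elif pos < 0:
        start = n + pos      # negative positions count from the end
    else:
        start = 0            # position 0 behaves like position 1
    end = n if length is None else min(n, start + length)
    return s[start:end]


if __name__ == "__main__":
    print(hive_substr("foobar", 4))    # bar (the example in the Hive docs)
    print(hive_substr("37.52", 0, 2))  # 37, same as hive_substr("37.52", 1, 2)
```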

4. IF function: if

Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
Returns: T
Description: returns valueTrue when testCondition is TRUE. Otherwise, when testCondition is FALSE or NULL, it returns valueFalseOrNull.

select if(100>10,"this is true","this is false");
select if(2=1,"male","female");
select if(1=1,"male",(if(1=2,"female","unknown")));
select if(3=1,"male",(if(3=2,"female","unknown")));
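The Python sketch below models the IF semantics. It mainly shows what happens when the condition is NULL. Only TRUE selects valueTrue, so both FALSE and NULL (None here) return the third argument. hive_if and gender are hypothetical helpers that mirror the queries above.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def hive_if(test: Optional[bool], value_true: T, value_false_or_null: T) -> T:
    """Only TRUE selects value_true; FALSE and NULL (None) fall through."""
    return value_true if test is True else value_false_or_null


def gender(code: Optional[int]) -> str:
    # Nested IF, like the last two queries above; comparing NULL yields NULL
    is_one = None if code is None else code == 1
    is_two = None if code is None else code == 2
    return hive_if(is_one, "male", hive_if(is_two, "female", "unknown"))


if __name__ == "__main__":
    print(gender(1), gender(2), gender(3), gender(None))  # male female unknown unknown
```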

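5. Conditional expression: CASE

Form 1 (searched CASE):

Syntax: CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
Returns: T
Description: if a is TRUE, returns b; if c is TRUE, returns d; otherwise returns e.

Form 2 (simple CASE):

Syntax: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
Returns: T
Description: if a equals b, returns c; if a equals d, returns e; otherwise returns f.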

select
case 6
when 1 then "100"
when 2 then "200"
when 3 then "300"
when 4 then "400"
else "others"
end
;
-- Create the table
create table if not exists cw(
flag int
)
;
load data local inpath '/home/flag' into table cw;
-- Form 2 (simple CASE): compare c.flag with each WHEN value
select
case c.flag
when 1 then "100"
when 2 then "200"
when 3 then "300"
when 4 then "400"
else "others"
end
from cw c
;
-- Form 1 (searched CASE): each WHEN holds a boolean condition
select
case
when 1=c.flag then "100"
when 2=c.flag then "200"
when 3=c.flag then "300"
when 4=c.flag then "400"
else "others"
end
from cw c
;
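Both CASE forms can be modeled in a few lines of Python. The flag values are hypothetical, since the contents of /home/flag are not shown. None plays the role of NULL. A NULL flag matches no WHEN branch in either form, so it falls through to ELSE.

```python
from typing import Optional

# Hypothetical values standing in for the cw table's flag column
FLAGS = [1, 2, 3, 4, 5, None]


def simple_case(flag: Optional[int]) -> str:
    """CASE flag WHEN 1 THEN '100' ... ELSE 'others' END"""
    for when, then in ((1, "100"), (2, "200"), (3, "300"), (4, "400")):
        # flag = when is NULL (not TRUE) when flag is NULL, so no branch matches
        if flag is not None and flag == when:
            return then
    return "others"


def searched_case(flag: Optional[int]) -> str:
    """CASE WHEN 1=flag THEN '100' ... ELSE 'others' END"""
    branches = [
        (flag is not None and 1 == flag, "100"),
        (flag is not None and 2 == flag, "200"),
        (flag is not None and 3 == flag, "300"),
        (flag is not None and 4 == flag, "400"),
    ]
    for condition, then in branches:
        if condition:
            return then
    return "others"


if __name__ == "__main__":
    print([simple_case(f) for f in FLAGS])
```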

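6. Regular-expression replacement: regexp_replace

Syntax: regexp_replace(string A, string B, string C)
Returns: string
Description: replaces every part of A that matches the Java regular expression B with C. Some cases require escape characters; for example, a literal dot is written "\\." in a Hive string. The function is similar to Oracle's regexp_replace.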

select regexp_replace("1.jsp",".jsp",".html");
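The example above works, but only by accident, because the dot in ".jsp" is a regex wildcard. This Python sketch uses re.sub as a stand-in for Java's regex replacement. Replacement strings are not interchangeable between the two: back-references are written $1 in Java but \1 in Python. regexp_replace here is a hypothetical model, not Hive's code. In Hive, the stricter pattern would be written "\\.jsp$".

```python
import re


def regexp_replace(a: str, b: str, c: str) -> str:
    """Model of Hive regexp_replace(A, B, C); Python re stands in for Java regex."""
    return re.sub(b, c, a)


if __name__ == "__main__":
    print(regexp_replace("1.jsp", ".jsp", ".html"))     # 1.html
    # The unescaped dot matches any character, so "1xjsp" is rewritten too
    print(regexp_replace("1xjsp", ".jsp", ".html"))     # 1.html
    # Escaping the dot and anchoring at the end matches only a real ".jsp" suffix
    print(regexp_replace("1xjsp", r"\.jsp$", ".html"))  # 1xjsp
```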

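7. Type conversion function: cast

Syntax: cast(expr as <type>)
Returns: <type>
Description: converts the result of expr to <type>. For example, cast('1' as bigint) converts the string '1' to its integer representation. NULL is returned if the conversion fails.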

select 1;
select cast(1 as double);
select cast("12" as int);
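Here is a minimal Python model of the failure rule, limited to plain integer strings. cast_to_int and cast_to_double are hypothetical helpers. The model does not cover Hive's exact parsing of edge cases such as whitespace, decimals, or overflow.

```python
import re
from typing import Optional

_INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def cast_to_int(s: Optional[str]) -> Optional[int]:
    """Hypothetical model of cast(<string> as int) for plain integer strings."""
    if s is None:
        return None          # cast(NULL as int) is NULL
    if _INT_PATTERN.fullmatch(s) is None:
        return None          # conversion fails -> NULL rather than an error
    return int(s)


def cast_to_double(n: int) -> float:
    # cast(1 as double) -> 1.0
    return float(n)


if __name__ == "__main__":
    print(cast_to_int("12"), cast_to_int("abc"), cast_to_double(1))  # 12 None 1.0
```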

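8. String concatenation: concat; concatenation with a separator: concat_ws

Syntax: concat(string A, string B...)
Returns: string
Description: returns the concatenation of the input strings. Any number of arguments is accepted.

Syntax: concat_ws(string SEP, string A, string B...)
Returns: string
Description: returns the concatenation of the input strings, with the separator SEP placed between them.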

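-- + is arithmetic in Hive, not concatenation: the non-numeric strings convert to NULL, so this returns NULL
select "Qianfeng" + 1603 + "class";
select concat("Qianfeng",1603,"class");            -- Qianfeng1603class
select concat_ws("|","Qianfeng","1603","class");   -- Qianfeng|1603|class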
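The two functions differ in how they treat NULL arguments. The Python sketch below assumes that Hive follows MySQL here. Under that assumption, concat returns NULL if any argument is NULL. concat_ws skips NULL arguments and returns NULL only when the separator is NULL. The helpers are hypothetical models. Python's str() stands in for the implicit conversion of 1603 to a string.

```python
from typing import Optional


def concat(*args: Optional[object]) -> Optional[str]:
    """Assumed concat rule: any NULL argument makes the whole result NULL."""
    if any(a is None for a in args):
        return None
    return "".join(str(a) for a in args)


def concat_ws(sep: Optional[str], *args: Optional[str]) -> Optional[str]:
    """Assumed concat_ws rule: NULL arguments are skipped; a NULL separator gives NULL."""
    if sep is None:
        return None
    return sep.join(a for a in args if a is not None)


if __name__ == "__main__":
    print(concat("Qianfeng", 1603, "class"))           # Qianfeng1603class
    print(concat_ws("|", "Qianfeng", "1603", "class"))  # Qianfeng|1603|class
```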

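9. Ranking functions:

row_number(): every row gets a distinct number, even when values tie
rank(): tied rows share a rank, and the following ranks are skipped (leaving gaps)
dense_rank(): tied rows share a rank, with no gaps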

Sample data:
id class score
1 1 90
2 1 85
3 1 87
4 1 60
5 2 82
6 2 70
7 2 67
8 2 88
9 2 93

Expected result (the top three scores in each class, with their rank):
1 1 90 1
3 1 87 2
2 1 85 3
9 2 93 1
8 2 88 2
5 2 82 3

create table if not exists uscore(
uid int,
classid int,
score double
)
row format delimited fields terminated by '\t'
;
load data local inpath '/home/uscore' into table uscore;
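-- A naive attempt: GROUP BY plus LIMIT 3 returns three rows in total, not the top three of each class
select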
u.uid,
u.classid,
u.score
from uscore u
group by u.classid,u.uid,u.score
limit 3
;
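-- row_number() numbers the rows within each class (distribute by) in descending score order (sort by)
select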
u.uid,
u.classid,
u.score,
row_number() over(distribute by u.classid sort by u.score desc) rn
from uscore u
;

Take the top three of each class:

select
t.uid,
t.classid,
t.score
from
(
select
u.uid,
u.classid,
u.score,
row_number() over(distribute by u.classid sort by u.score desc) rn
from uscore u
) t
where t.rn < 4
;

Compare the three ranking functions side by side:

select
u.uid,
u.classid,
u.score,
row_number() over(distribute by u.classid sort by u.score desc) rn,
rank() over(distribute by u.classid sort by u.score desc) rank,
dense_rank() over(distribute by u.classid sort by u.score desc) dr
from uscore u
;
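You can see what the three columns contain without running a cluster. This Python sketch applies the same partition (classid) and ordering (score descending) to the sample data above. It models the window functions and is not Hive's implementation. rank_within_class and top_n are hypothetical names. When rows tie, Hive does not guarantee the row_number() order, whereas the sketch keeps the input order.

```python
from itertools import groupby

# The sample uscore rows from above: (uid, classid, score)
ROWS = [(1, 1, 90), (2, 1, 85), (3, 1, 87), (4, 1, 60),
        (5, 2, 82), (6, 2, 70), (7, 2, 67), (8, 2, 88), (9, 2, 93)]


def rank_within_class(rows):
    """Emulate OVER(PARTITION BY classid ORDER BY score DESC) for three functions."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))  # stable sort
    for _, group in groupby(ordered, key=lambda r: r[1]):
        prev_score, rank, dense = None, 0, 0
        for rn, (uid, classid, score) in enumerate(group, start=1):
            if score != prev_score:
                rank = rn       # rank(): jumps to the row number, leaving gaps after ties
                dense += 1      # dense_rank(): advances by one, no gaps
                prev_score = score
            # columns: uid, classid, score, row_number, rank, dense_rank
            out.append((uid, classid, score, rn, rank, dense))
    return out


def top_n(rows, n=3):
    # Same filter as "where t.rn < 4" in the query above
    return [r[:4] for r in rank_within_class(rows) if r[3] <= n]


if __name__ == "__main__":
    for row in top_n(ROWS):
        print(*row)
```

top_n(ROWS) reproduces the six-row result listed after the sample data. Feeding in two equal scores shows how the functions differ: row_number() gives 1, 2, 3, rank() gives 1, 1, 3, and dense_rank() gives 1, 1, 2.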

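10. Aggregate functions:

min() max() count() count(distinct col) sum() avg()

count(*) and count(1): count every row, including rows that contain NULL values.
count(col): counts the rows where col is not NULL.
count(distinct col): counts the distinct non-NULL values of col.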
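The count variants differ only in how they handle NULL. In this Python sketch, a list stands for a single column and None stands for NULL. counts is a hypothetical helper.

```python
def counts(column):
    """Model the count variants over one column; None stands for NULL."""
    return {
        "count(*)": len(column),                                 # every row
        "count(col)": sum(1 for v in column if v is not None),   # non-NULL values
        "count(distinct col)": len({v for v in column if v is not None}),
    }


if __name__ == "__main__":
    print(counts(["a", "b", None, "a"]))
```

A row that holds only NULLs is still counted by count(*). It is skipped by count(col).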

11. Operations on NULL

Almost any operation involving NULL returns NULL:

select 1+null;   -- NULL
select 1/0;      -- NULL: Hive returns NULL for division by zero instead of raising an error
select null%2;   -- NULL
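Here is a Python sketch of these NULL rules, with None standing for NULL. Division follows Hive in returning NULL for a zero divisor. Returning NULL for a zero divisor in the modulus is an assumption. The sketch uses math.fmod because its result takes the sign of the dividend, as Java's % does, whereas Python's % takes the sign of the divisor. add, divide and mod are hypothetical helpers.

```python
import math
from typing import Optional

Num = Optional[float]


def add(a: Num, b: Num) -> Num:
    # Any NULL operand makes the result NULL
    return None if a is None or b is None else a + b


def divide(a: Num, b: Num) -> Num:
    # NULL operands and a zero divisor both yield NULL
    if a is None or b is None or b == 0:
        return None
    return a / b


def mod(a: Num, b: Num) -> Num:
    # Zero divisor: assumed NULL, like division
    if a is None or b is None or b == 0:
        return None
    return math.fmod(a, b)  # sign follows the dividend, as in Java


if __name__ == "__main__":
    print(add(1, None), divide(1, 0), mod(None, 2))  # None None None
```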

12. Equality comparison

select null=null;     -- NULL
select null<=>null;   -- true: <=> is the NULL-safe equality operator
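The difference between = and <=> comes from three-valued logic. This Python sketch models both operators, with None standing for NULL. eq and null_safe_eq are hypothetical names. The <=> rule follows the Hive documentation: TRUE if both operands are NULL, FALSE if only one of them is.

```python
from typing import Optional


def eq(a, b) -> Optional[bool]:
    """a = b: NULL if either side is NULL (three-valued logic)."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b) -> bool:
    """a <=> b: TRUE when both are NULL, FALSE when exactly one is."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


if __name__ == "__main__":
    print(eq(None, None), null_safe_eq(None, None))  # None True
```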
