@
目錄
官方文件地址
Hive官網,點我就進
oracle,sqlserver都提供了視窗函式,但是在mysql5.5和5.6都沒有提供視窗函式!
視窗函式: 視窗+函式
- 視窗: 函式執行時計算的資料集的範圍
- 函式: 執行的函式!
僅僅支援以下函式:
Windowing functions
- LEAD (scalar_expression [,offset] [,default]): 返回當前行以下N行的指定列的列值!如果找不到,就採用預設值
- LAG (scalar_expression [,offset] [,default]): 返回當前行以上N行的指定列的列值!如果找不到,就採用預設值
- FIRST_VALUE(列名,[false(預設)]):返回當前視窗指定列的第一個值,第二個引數如果為true,代表加入第一個值為null,跳過空值,繼續尋找!
- LAST_VALUE(列名,[false(預設)]):返回當前視窗指定列的最後一個值,第二個引數如果為true,代表加入第一個值為null,跳過空值,繼續尋找!
統計類的函式(一般都需要結合over使用):min,max,avg,sum,count
排名分析:
- RANK
- ROW_NUMBER
- DENSE_RANK
- CUME_DIST
- PERCENT_RANK
- NTILE
注意:不是所有的函式在執行都是可以通過改變視窗的大小,來控制計算的資料集的範圍!所有的排名函式和LAG,LEAD,支援使用over(),但是在over()中不能定義 window_clause
格式: 函式 over( partition by 欄位 ,order by 欄位 window_clause )
視窗的大小可以通過windows_clause來指定:
(rows | range) between (unbounded | [num]) preceding and ([num] preceding | current row | (unbounded | [num]) following)
(rows | range) between current row and (current row | (unbounded | [num]) following)
(rows | range) between [num] following and (unbounded | [num]) following
特殊情況:
- ①在over()中既沒有出現windows_clause,也沒有出現order by,視窗預設為rows between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
- ②在over()中(沒有出現windows_clause),指定了order by,視窗預設為rows between UNBOUNDED PRECEDING and CURRENT ROW
視窗函式和分組有什麼區別?
- ①如果是分組操作,select後只能寫分組後的欄位
- ②如果是視窗函式,視窗函式是在指定的視窗內,對每條記錄都執行一次函式
- ③如果是分組操作,有去重效果,而partition不去重!
練習
(9) 查詢前20%時間的訂單資訊
精確演算法:
select *
from
(select name,orderdate,cost,cume_dist() over(order by orderdate ) cdnum
from business) tmp
where cdnum<=0.2
不精確計算:
select *
from
(select name,orderdate,cost,ntile(5) over(order by orderdate ) cdnum
from business) tmp
where cdnum=1
(8)查詢顧客的購買明細及顧客最近三次cost花費
最近三次: 當前和之前兩次 或 當前+前一次+後一次
當前和之前兩次:
select name,orderdate,cost,sum(cost) over(partition by name order by orderdate rows between 2 PRECEDING and CURRENT row)
from business
當前+前一次+後一次:
select name,orderdate,cost,sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and 1 FOLLOWING)
from business
或
select name,orderdate,cost,cost+
lag(cost,1,0) over(partition by name order by orderdate )+
lead(cost,1,0) over(partition by name order by orderdate )
from business
(7) 查詢顧客的購買明細及顧客本月最後一次購買的時間
select name,orderdate,cost,LAST_VALUE(orderdate,true) over(partition by name,substring(orderdate,1,7) order by orderdate rows between CURRENT row and UNBOUNDED FOLLOWING)
from business
(6) 查詢顧客的購買明細及顧客本月第一次購買的時間
select name,orderdate,cost,FIRST_VALUE(orderdate,true) over(partition by name,substring(orderdate,1,7) order by orderdate )
from business
(5) 查詢顧客的購買明細及顧客下次的購買時間
select name,orderdate,cost,lead(orderdate,1,'無資料') over(partition by name order by orderdate )
from business
(4)查詢顧客的購買明細及顧客上次的購買時間
select name,orderdate,cost,lag(orderdate,1,'無資料') over(partition by name order by orderdate )
from business
(3)查詢顧客的購買明細要將cost按照日期進行累加
select name,orderdate,cost,sum(cost) over(partition by name order by orderdate )
from business
(2)查詢顧客的購買明細及月購買總額
select name,orderdate,cost,sum(cost) over(partition by name,substring(orderdate,1,7) )
from business
(1)查詢在2017年4月份購買過的顧客及總人數
select name,count(*) over(rows between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING)
from business
where substring(orderdate,1,7)='2017-04'
group by name
等價於
select name,count(*) over()
from business
where substring(orderdate,1,7)='2017-04'
group by name