SQL入門-進階教程
前言
接上一次課程《SQL入門-基礎教程》,本次課程是對上一次課程的深入,主要包括分組、聚合、連線、分析&視窗函式等方面,目的是通過講解常用的語法和函式,爭取用最短的時間對SQL有個整體的認識,每一塊內容的詳細講解會在後面出一個系列的文章進行詳細的詳解,敬請關注…
好了,我們開始吧
00、預備知識
先講幾個資料倉儲中的概念,目的是將口徑拉齊,大家有個初步的瞭解即可。
1、事實表
發生在現實世界中的操作型事件,其所產生的可度量數值,比如,門店交易資料、倉庫出庫量資料、供應鏈進銷存資料等
2、維度表
維度表就是你觀察該事物的角度(維度),比如商品資訊表、門店資訊表、物流商品表等
3、大寬表
指業務主題相關的指標、維度、屬性關聯在一起的一張表,一般是由N張事件表和N張維度表關聯生成
一、資料準備
1、執行環境
本次課程的執行環境為開源的Hue系統:HUE連結
使用者名稱:demo 密碼:demo
PS:如果有公司的Hive SQL環境也可以,本次課程中的SQL都可以在Hive下面執行
2、資料表準備
本次使用with函式來代替資料表,在學習過程中with函式中的資料不需要改變,只需要在下面寫自己的邏輯即可。
with dw_wms_outbound_info_v100 as(
select '10700001' as order_no, '0011' as shop_no, '5501' as product_code, '2020-07-31' as order_date, 10 as payment_amount, '20200901' as dt union all
select '10700001' as order_no, '0011' as shop_no, '5502' as product_code, '2020-08-01' as order_date, 90 as payment_amount, '20200901' as dt union all
select '10700002' as order_no, '0011' as shop_no, '5501' as product_code, '2020-08-02' as order_date, 20 as payment_amount, '20200901' as dt union all
select '10700002' as order_no, '0011' as shop_no, '5503' as product_code, '2020-08-02' as order_date, 30 as payment_amount, '20200901' as dt union all
select '10700003' as order_no, '0011' as shop_no, '5501' as product_code, '2020-08-02' as order_date, 20 as payment_amount, '20200901' as dt union all
select '10700004' as order_no, '0012' as shop_no, '6601' as product_code, '2020-07-31' as order_date, 40 as payment_amount, '20200901' as dt union all
select '10700004' as order_no, '0012' as shop_no, '6601' as product_code, '2020-08-01' as order_date, 50 as payment_amount, '20200901' as dt
)
, dim_store_info_v100 as (
select '0011' as shop_no, '北京一店' as shop_name, '20200901' as dt union all
select '0012' as shop_no, '北京二店' as shop_name, '20200901' as dt
)
, dim_sku_info_v100 as (
select '5501' as product_code, '可樂350ml' as product_name, '20200901' as dt union all
select '5502' as product_code, '雪碧500ml' as product_name, '20200901' as dt union all
select '5504' as product_code, '芬達243ml' as product_name, '20200901' as dt union all
select '6601' as product_code, '巧克力' as product_name, '20200901' as dt
)
---------------------上面的SQL不要動-----在下面寫自己的SQL--------------------
select
t1.dt,
t1.order_no,
t1.shop_no,
t2.shop_name,
t1.product_code,
t3.product_name,
t1.order_date,
t1.payment_amount
from dw_wms_outbound_info_v100 t1
left join dim_store_info_v100 t2 on t2.dt = '20200901' and t1.shop_no = t2.shop_no
left join dim_sku_info_v100 t3 on t3.dt = '20200901' and t1.product_code = t3.product_code
where t1.dt = '20200901'
如下圖所示:
3、資料表的關係
如下圖所示:dw_wms_outbound_info_v100表為事實表,dim_store_info_v100和dim_sku_info_v100表為維度表,分別通過shop_no和product_code關聯起來。
二、進階語法
1、 with … as 語法
當我們在寫一些結構相對複雜的SQL語句時,可能某個子查詢在多個層級多個地方存在重複使用的情況,這個時候我們可以使用 with as 語句將其獨立出來,極大提高SQL可讀性,簡化SQL~
樣例:參考本次課程的[資料表準備SQL]
2、分組&聚合
2.0 語法結構
基本語法如下:
2.1 分組(group by)
分組顧名思義,即將原來的一整塊資料按某種維度分成幾小塊,語法關鍵字為 group by,對比excel的透視表功能。
語法:group by col1, col2, …
例如:按門店和商品對資料進行分組
select
shop_no,
product_code
from dw_wms_outbound_info_v100
where dt = '20200901'
group by
shop_no,
product_code
查詢結果:
shop_no | product_code |
---|---|
0011 | 5501 |
0011 | 5502 |
0012 | 6601 |
0011 | 5503 |
2.2 聚合
聚合是在每個分組內進行一些統計,如在分組內的最大值(max)、最小值(min)、平均值(avg)、資料條數(count)等聚合函式
例:統計每個分組內的資料條數?(使用count函式來統計)
select
shop_no,
product_code,
count(*) as record_cnt
from dw_wms_outbound_info_v100
where dt = '20200901'
group by
shop_no,
product_code
查詢結果:
shop_no | product_code | record_cnt |
---|---|---|
0011 | 5501 | 3 |
0011 | 5502 | 1 |
0012 | 6601 | 2 |
0011 | 5503 | 1 |
注:count(*)代表的是統計組內所有的行,詳細使用參考下面的聚合函式使用
2.2 分組過濾(having)
當在gropu by 子句中使用having 子句時,查詢結果中只返回滿足having條件的組
例:統計每筆訂單支付的金額大於50的訂單號?(使用count函式來統計)
select
order_no,
sum(payment_amount) as payment_amount
from dw_wms_outbound_info_v100
group by order_no
having payment_amount > 50
查詢結果:
order_no | payment_amount |
---|---|
10700004 | 90 |
10700001 | 100 |
注:sum(payment_amount)代表的是統計組內所有的行的支付金額加總
3、表連線
3.0 語法結構
需要從多個資料表中讀取資料,這時我們就可以使用SQL語句中的連線(JOIN),基本格式為 join ... on...and,如果不顯示指定join型別,預設為inner,
在兩個或多個資料表中查詢資料。大家可以想象excel中的vlook使用過程。
</br>
PS:
1. 如下圖中的A和B為兩個表名
2. 下面圖來源於網上部落格
3.1 內連線(Inner join)
產生的結果是A和B的交集(相同列裡面的相同值)
語法:[inner] join … on 條件1 and 條件2 and …
如下圖所示:
例:查詢商品表dim_sku_info_v100中所有商品在門店有銷售的記錄
SQL:
select
t1.dt,
t1.order_no,
t1.shop_no,
t1.product_code,
t3.product_name,
t1.order_date,
t1.payment_amount
from dw_wms_outbound_info_v100 t1
inner join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
結果:
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|dt |order_no|shop_no|product_code|product_code2|product_name|order_date|payment_amount|
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|20200901|10700001|0011 |5501 |5501 |可樂350ml |2020-07-31|10 |
|20200901|10700001|0011 |5502 |5502 |雪碧500ml |2020-08-01|90 |
|20200901|10700002|0011 |5501 |5501 |可樂350ml |2020-08-02|20 |
|20200901|10700003|0011 |5501 |5501 |可樂350ml |2020-08-02|20 |
|20200901|10700004|0012 |6601 |6601 |巧克力 |2020-07-31|40 |
|20200901|10700004|0012 |6601 |6601 |巧克力 |2020-08-01|50 |
+--------+--------+-------+------------+-------------+------------+----------+--------------+
說明:連線條件為商品編碼,商品5503和5504沒有顯示出來
3.2 外連線(Full Outer Join)
產生的結果是A和B的並集(如果沒有相同的值會用null作為值)
語法:full join … on 條件1 and 條件2 and …
例:查詢商品表dim_sku_info_v100中所有商品在門店的記錄,如果銷售記錄中的商品編碼不在dim_sku_info_v100表中也要顯示
SQL:
select
t1.dt,
t1.order_no,
t1.shop_no,
t1.product_code,
t3.product_code as product_code2,
t3.product_name,
t1.order_date,
t1.payment_amount
from dw_wms_outbound_info_v100 t1
full join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
結果:
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|dt |order_no|shop_no|product_code|product_code2|product_name|order_date|payment_amount|
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|20200901|10700002|0011 |5501 |5501 |可樂350ml |2020-08-02|20 |
|20200901|10700003|0011 |5501 |5501 |可樂350ml |2020-08-02|20 |
|20200901|10700001|0011 |5501 |5501 |可樂350ml |2020-07-31|10 |
|20200901|10700001|0011 |5502 |5502 |雪碧500ml |2020-08-01|90 |
|20200901|10700004|0012 |6601 |6601 |巧克力 |2020-07-31|40 |
|20200901|10700004|0012 |6601 |6601 |巧克力 |2020-08-01|50 |
|NULL |NULL |NULL |NULL |5504 |芬達243ml |NULL |NULL |
|20200901|10700002|0011 |5503 |NULL |NULL |2020-08-02|30 |
+--------+--------+-------+------------+-------------+------------+----------+--------------+
3.3 左連線(Left outer join)
產生表A的完全集,根據on條件,B表中匹配的則有值(沒有匹配的則以null值取代)
語法:left join … on 條件1 and 條件2 and …
例:查詢門店所有銷售記錄的商品名稱
SQL:
select
t1.dt,
t1.order_no,
t1.shop_no,
t1.product_code,
t3.product_code as product_code2,
t3.product_name,
t1.order_date,
t1.payment_amount
from dw_wms_outbound_info_v100 t1
left join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
結果:
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|dt |order_no|shop_no|product_code|product_code2|product_name|order_date|payment_amount|
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|20200901|10700001|0011 |5501 |5501 |可樂350ml |2020-07-31|10 |
|20200901|10700001|0011 |5502 |5502 |雪碧500ml |2020-08-01|90 |
|20200901|10700002|0011 |5501 |5501 |可樂350ml |2020-08-02|20 |
|20200901|10700002|0011 |5503 |NULL |NULL |2020-08-02|30 |
|20200901|10700003|0011 |5501 |5501 |可樂350ml |2020-08-02|20 |
|20200901|10700004|0012 |6601 |6601 |巧克力 |2020-07-31|40 |
|20200901|10700004|0012 |6601 |6601 |巧克力 |2020-08-01|50 |
+--------+--------+-------+------------+-------------+------------+----------+--------------+
3.4 右連線(Right Outer Join)
產生表B的完全集,根據on條件,A表中匹配的則有值(沒有匹配的則以null值取代),與Left Join相反
語法:right join … on 條件1 and 條件2 and …
例:查詢在商品表中所有商品在門店的銷售記錄
SQL:
select
t1.dt,
t1.order_no,
t1.shop_no,
t1.product_code,
t3.product_code as product_code2,
t3.product_name,
t1.order_date,
t1.payment_amount
from dw_wms_outbound_info_v100 t1
right join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
結果:
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|dt |order_no|shop_no|product_code|product_code2|product_name|order_date|payment_amount|
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|20200901|10700003|0011 |5501 |5501 |可樂350ml |2020-08-02|20 |
|20200901|10700002|0011 |5501 |5501 |可樂350ml |2020-08-02|20 |
|20200901|10700001|0011 |5501 |5501 |可樂350ml |2020-07-31|10 |
|20200901|10700001|0011 |5502 |5502 |雪碧500ml |2020-08-01|90 |
|NULL |NULL |NULL |NULL |5504 |芬達243ml |NULL |NULL |
|20200901|10700004|0012 |6601 |6601 |巧克力 |2020-08-01|50 |
|20200901|10700004|0012 |6601 |6601 |巧克力 |2020-07-31|40 |
+--------+--------+-------+------------+-------------+------------+----------+--------------+
3.5 笛卡爾積(cross join)
cross對兩個表執行笛卡爾乘積(可以理解為沒有連線條件)。它為左錶行和右錶行的每種可能的組合返回一行,一般出現笛卡爾積時大概率現出了邏輯錯誤,比如連線條件寫錯表名,導致兩個連線時沒有連線條件。除非特殊情況下需要使用笛卡爾積
例:查詢門店所有銷售記錄的商品名稱
ps:注意下圖中的連線條件
SQL:
select
t1.dt,
t1.order_no,
t1.shop_no,
t1.product_code,
t3.product_code as product_code2,
t3.product_name,
t1.order_date,
t1.payment_amount
from dw_wms_outbound_info_v100 t1
left join dim_sku_info_v100 t3 on t1.product_code = t1.product_code
結果:如下
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|dt |order_no|shop_no|product_code|product_code2|product_name|order_date|payment_amount|
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|20200901|10700001|0011 |5501 |5501 |可樂350ml |2020-07-31|10 |
|20200901|10700001|0011 |5501 |5502 |雪碧500ml |2020-07-31|10 |
|20200901|10700001|0011 |5501 |5504 |芬達243ml |2020-07-31|10 |
|20200901|10700001|0011 |5501 |6601 |巧克力 |2020-07-31|10 |
|20200901|10700001|0011 |5502 |5501 |可樂350ml |2020-08-01|90 |
|20200901|10700001|0011 |5502 |5502 |雪碧500ml |2020-08-01|90 |
|20200901|10700001|0011 |5502 |5504 |芬達243ml |2020-08-01|90 |
|20200901|10700001|0011 |5502 |6601 |巧克力 |2020-08-01|90 |
|20200901|10700002|0011 |5501 |5501 |可樂350ml |2020-08-02|20 |
|20200901|10700002|0011 |5501 |5502 |雪碧500ml |2020-08-02|20 |
|20200901|10700002|0011 |5501 |5504 |芬達243ml |2020-08-02|20 |
|20200901|10700002|0011 |5501 |6601 |巧克力 |2020-08-02|20 |
|20200901|10700002|0011 |5503 |5501 |可樂350ml |2020-08-02|30 |
|20200901|10700002|0011 |5503 |5502 |雪碧500ml |2020-08-02|30 |
|20200901|10700002|0011 |5503 |5504 |芬達243ml |2020-08-02|30 |
|20200901|10700002|0011 |5503 |6601 |巧克力 |2020-08-02|30 |
|20200901|10700003|0011 |5501 |5501 |可樂350ml |2020-08-02|20 |
|20200901|10700003|0011 |5501 |5502 |雪碧500ml |2020-08-02|20 |
|20200901|10700003|0011 |5501 |5504 |芬達243ml |2020-08-02|20 |
|20200901|10700003|0011 |5501 |6601 |巧克力 |2020-08-02|20 |
|20200901|10700004|0012 |6601 |5501 |可樂350ml |2020-07-31|40 |
|20200901|10700004|0012 |6601 |5502 |雪碧500ml |2020-07-31|40 |
|20200901|10700004|0012 |6601 |5504 |芬達243ml |2020-07-31|40 |
|20200901|10700004|0012 |6601 |6601 |巧克力 |2020-07-31|40 |
|20200901|10700004|0012 |6601 |5501 |可樂350ml |2020-08-01|50 |
|20200901|10700004|0012 |6601 |5502 |雪碧500ml |2020-08-01|50 |
|20200901|10700004|0012 |6601 |5504 |芬達243ml |2020-08-01|50 |
|20200901|10700004|0012 |6601 |6601 |巧克力 |2020-08-01|50 |
+--------+--------+-------+------------+-------------+------------+----------+--------------+
Time taken: 0.815 seconds, Fetched 28 row(s)
可見,由於出現了笛卡爾積,由於寫錯將條件寫為(t1.product_code = t1.product_code),正確為t1.product_code = t2.product_code,最終資料最從7條變為了28條,資料量增長了4倍,
問題:如果A表有10萬條資料,B表有10萬條資料,如果發生邏輯錯誤,產生了笛卡爾積的操作,結果會是多少條資料?
10萬? 100萬? 10億? 100億?
4、連線後聚合
實際操作中常會對多表連線後的結果執行聚合操作,語法格式為:
SQL的執行順序為 from -> join -> group -> having -> select -> order -> limit 在後面的課程中會專門對SQL的執行順序進行詳細講解
例:查詢門店每天商品的銷售額,按日期+商品名稱聚合
SQL:
select
t1.order_date,
t3.product_name,
sum(t1.payment_amount) as payment_amount
from dw_wms_outbound_info_v100 t1
left join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
group by
t1.order_date,
t3.product_name
結果:如下
+----------+------------+--------------+
|order_date|product_name|payment_amount|
+----------+------------+--------------+
|2020-07-31|可樂350ml |10 |
|2020-08-02|NULL |30 |
|2020-08-01|雪碧500ml |90 |
|2020-08-02|可樂350ml |40 |
|2020-08-01|巧克力 |50 |
|2020-07-31|巧克力 |40 |
+----------+------------+--------------+
5、union[distinct | all]
UNION將多個SELECT語句的結果集合併為一個獨立的結果集,distinct會對結果進行去重 (預設為 distinct,去重操作會影響效能),all保留所有
語法:
select_statement UNION [ALL | DISTINCT]
select_statement UNION [ALL | DISTINCT]
...
select_statement
例:使用union 來對0011門店和0012門店的有銷售商品進行彙總,
SQL:
select
shop_no,product_code
from dw_wms_outbound_info_v100
where shop_no = '0011'
union
select
shop_no,product_code
from dw_wms_outbound_info_v100
where shop_no = '0012'
結果:去重
+-------+------------+
|shop_no|product_code|
+-------+------------+
|0011 |5501 |
|0011 |5502 |
|0012 |6601 |
|0011 |5503 |
+-------+------------+
四、進階函式
1、聚合函式
函式名 | 用途 |
---|---|
count([DISTINCT] col) | 統計行數 |
avg(col) | 平均數 |
min(col) | 最小值 |
max(col) | 最大值 |
sum(col) | 求和 |
percentile_approx(col, p) | 返回組中列的精確第p個百分點 p必須介於0和1之間 |
stddev(col) | 標準差 |
collect_set(col) | 欄位去重後的集合 |
collect_list(col) | 欄位不去重的集合 |
corr(col1, col2) | 返回兩列的皮爾遜相關係數 |
解釋:
- count(統計行數):對分組內的資料條數進行統計,有如下幾種方式:
- count(1):該種方式會對分組所有的記錄進行統計,沒有進行任何的過濾(1可以換成任何整數,習慣使用1)
- count(*):該方式同count(1)
- count(col):根據某列進行統計,如果某一條記錄該欄位為null,那麼此記錄不會統計
- count(distinct col):與count(col)相比,該方法會對欄位的值進行去重後統計
例如,首先查詢門店所有銷售記錄的商品名稱,並根據商品名稱欄位進行統計資料條數
SQL:
select
order_date as od,
count(1) as count1,
count(*) as count2,
count(product_code) as count3,
count(product_name) as count4,
round(avg(payment_amount),1) as avg,
min(payment_amount) as min,
max(payment_amount) as max,
sum(payment_amount) as sum,
round(percentile_approx(payment_amount, 0.5),1) as percentile,
round(stddev(payment_amount),1) as stddev,
collect_set(shop_no) as set,
collect_list(shop_no) as list
from (
select
t1.dt,
t1.order_no,
t1.shop_no,
t1.product_code,
t3.product_code as product_code2,
t3.product_name,
t1.order_date,
t1.payment_amount
from dw_wms_outbound_info_v100 t1
left join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
) t1
group by order_date
查詢結果:
+----------+------+------+------+------+----+---+---+---+----------+------+---------------+----------------------+
|od |count1|count2|count3|count4|avg |min|max|sum|percentile|stddev|set |list |
+----------+------+------+------+------+----+---+---+---+----------+------+---------------+----------------------+
|2020-08-02|3 |3 |3 |2 |23.3|20 |30 |70 |20 |5.8 |["0011"] |["0011","0011","0011"]|
|2020-08-01|2 |2 |2 |2 |70.0|50 |90 |140|50 |28.3 |["0011","0012"]|["0011","0012"] |
|2020-07-31|2 |2 |2 |2 |25.0|10 |40 |50 |10 |21.2 |["0011","0012"]|["0012","0011"] |
+----------+------+------+------+------+----+---+---+---+----------+------+---------------+----------------------+
2、分析&視窗函式
常用的函式:
函式名 | 用途 |
---|---|
row_number() OVER([partition_by_clause] order_by_clause) | 分組排序,同一組內相同值會給不同序號 |
rank() OVER([partition_by_clause] order_by_clause) | 組內從1開始按order by排序後,將相同的值輸出為相同的值,其它值不變(好亂,看下面樣例吧) |
dense_rank() OVER([partition_by_clause] order_by_clause) | 返回從1開始的遞增整數序列。輸出序列為ORDER BY表示式的重複值生成重複整數 |
lag(expr [, offset] [, default]) OVER ([partition_by_clause] order_by_clause) | 該函式使用前一行的列值返回表示式的值。您可以指定一個整數偏移量,它指定一個行位置,位於當前行之前的某些行數 |
lead | 與lag對應,記住一個就可以 |
2.1 row_number | rank | dense_rank
例如 :根據支付金額排序
SQL:
select
order_date,
order_no,
shop_no,
payment_amount,
row_number() over(order by payment_amount) as row_number,
rank() over(order by payment_amount) as rank,
dense_rank() over(order by payment_amount) as dense_rank
from dw_wms_outbound_info_v100
結果:
+----------+--------+-------+--------------+----------+----+----------+
|order_date|order_no|shop_no|payment_amount|row_number|rank|dense_rank|
+----------+--------+-------+--------------+----------+----+----------+
|2020-07-31|10700001|0011 |10 |1 |1 |1 |
|2020-08-02|10700002|0011 |20 |2 |2 |2 |
|2020-08-02|10700003|0011 |20 |3 |2 |2 |
|2020-08-02|10700002|0011 |30 |4 |4 |3 |
|2020-07-31|10700004|0012 |40 |5 |5 |4 |
|2020-08-01|10700004|0012 |50 |6 |6 |5 |
|2020-08-01|10700001|0011 |90 |7 |7 |6 |
+----------+--------+-------+--------------+----------+----+----------+
2.2 lag | lead
例如 :計算每家門店每天相對於昨天的銷售額的變化?
SQL:
select
t1.order_date,
t1.shop_no,
t1.payment_amount,
-- 以下為昨天的銷售額
lag(payment_amount) over(partition by shop_no order by t1.order_date) as yest_payment_amount,
lag(payment_amount,1, 0) over(partition by shop_no order by t1.order_date) as yest_payment_amount2,
lag(payment_amount,2, 0) over(partition by shop_no order by t1.order_date) as yest_payment_amount3,
lead(payment_amount,1, 0) over(partition by shop_no order by t1.order_date) as yest_payment_amount4,
lag(payment_amount,1, 0) over(partition by shop_no order by t1.order_date desc) as yest_payment_amount5
from (
select
order_date,
shop_no,
sum(payment_amount) as payment_amount --當天銷售額
from dw_wms_outbound_info_v100
group by
order_date,
shop_no
) t1
結果:
+----------+-------+--------------+-------------------+--------------------+--------------------+--------------------+--------------------+
|order_date|shop_no|payment_amount|yest_payment_amount|yest_payment_amount2|yest_payment_amount3|yest_payment_amount4|yest_payment_amount5|
+----------+-------+--------------+-------------------+--------------------+--------------------+--------------------+--------------------+
|2020-07-31|0012 |40 |NULL |0 |0 |50 |50 |
|2020-08-01|0012 |50 |40 |40 |0 |0 |0 |
|2020-07-31|0011 |10 |NULL |0 |0 |90 |90 |
|2020-08-01|0011 |90 |10 |10 |0 |70 |70 |
|2020-08-02|0011 |70 |90 |90 |10 |0 |0 |
+----------+-------+--------------+-------------------+--------------------+--------------------+--------------------+--------------------+
2.3 over
與over 配置使用的聚合函式:
- COUNT
- SUM
- MIN
- MAX
- AVG
使用樣例:
SQL:
select
t1.order_date,
t1.order_no,
t1.shop_no,
t1.product_code,
t1.payment_amount,
--統計每天門店每天的單量
count(order_no) over(partition by order_date, shop_no) as shop_order_cnt,
--統計每天門店每天的銷售額
sum(payment_amount) over(partition by order_date, shop_no) as payment_amount_sum,
--統計每天所有門店單品最高支付金額
max(payment_amount) over(partition by order_date) as payment_amount_max
from dw_wms_outbound_info_v100 t1
結果:
+----------+--------+-------+------------+--------------+--------------+------------------+------------------+
|order_date|order_no|shop_no|product_code|payment_amount|shop_order_cnt|payment_amount_sum|payment_amount_max|
+----------+--------+-------+------------+--------------+--------------+------------------+------------------+
|2020-08-02|10700002|0011 |5503 |30 |3 |70 |30 |
|2020-08-02|10700002|0011 |5501 |20 |3 |70 |30 |
|2020-08-02|10700003|0011 |5501 |20 |3 |70 |30 |
|2020-08-01|10700004|0012 |6601 |50 |1 |50 |90 |
|2020-08-01|10700001|0011 |5502 |90 |1 |90 |90 |
|2020-07-31|10700004|0012 |6601 |40 |1 |40 |40 |
|2020-07-31|10700001|0011 |5501 |10 |1 |10 |40 |
+----------+--------+-------+------------+--------------+--------------+------------------+------------------+
3、複雜型別
函式名 | 說明 | demo | 結果 |
---|---|---|---|
map(key1, value1, …) | 根據給定的key-value生成map | select map(‘a’, ‘001’, ‘b’,‘002’) as res select map(‘a’, ‘001’, ‘b’,‘002’, ‘a’, ‘003’)[‘a’] as res | {“a”:“001”,“b”:“002”} 003 |
named_struct(name1, val1, name2, val2, …) | 建立struct結構 | select named_struct(‘expression’,‘1*1’,‘unit’,10).expression | 1*1 |
array(val1, val2, …) | 根據給定的元素生成array | 生成陣列:select array(‘a’, ‘b’) as res 查詢元素:select array(‘a’, ‘b’)[0] as res | [“a”,“b”] a |
4、集合函式
函式名 | 說明 | demo | 結果 |
---|---|---|---|
array_contains | Returns TRUE if the array contains value. | select array_contains(array(‘a’, ‘b’), ‘b’) as res select array_contains(array(‘a’, ‘b’), ‘c’) as res | true false |
size | 返回陣列或map元素的個數 | select size(array(‘a’, ‘b’)) as res select size(map(‘a’, ‘b’)) as res | 2 1 |
sort_array | 對陣列進行排序 | select sort_array( array(‘b’, ‘c’,‘a’ ) ) as res | [“a”,“b”,“c”] |
參考文件
相關文章
- Linux Capabilities 入門教程:進階實戰篇Linux
- elk 入門到進階
- Redis入門--進階詳解Redis
- Redis從入門到進階Redis
- Python入門教程100天:Day09-物件導向進階Python物件
- (一)《SQL進階教程》學習記錄--CASESQL
- SQL進階SQL
- Laravel 入門到進階-推薦Laravel
- Android DataBinding 從入門到進階Android
- (二)《SQL進階教程》學習記錄--GROUP BY、PARTITION BYSQL
- 教程:Apache Spark SQL入門及實踐指南!ApacheSparkSQL
- python3入門與進階(二)Python
- Flutter入門進階之旅(三)Text WidgetsFlutter
- Flutter入門進階之旅(十三)Flutter 路由Flutter路由
- Android DataBinding 從入門到進階(2)Android
- Python 從入門到進階之路(三)Python
- Python 從入門到進階之路(四)Python
- Python 從入門到進階之路(五)Python
- React 從入門到進階之路(八)React
- React 從入門到進階之路(九)React
- React 從入門到進階之路(七)React
- Python 從入門到進階之路(七)Python
- 前端基礎入門六(JQuery進階)前端jQuery
- Gradle外掛從入門到進階Gradle
- SparkStreaming入門教程(三)高階輸入源:Flume、KaSpark
- Python教程學習入門影片到進階,Python全套影片合集助你快速提升Python
- Jmeter(三十四) - 從入門到精通進階篇 - 引數化(詳解教程)JMeter
- SQL 入門SQL
- Flutter入門進階之旅(十四)ListView&GridViewFlutterView
- Flutter入門進階之旅(十)Dialog&ToastFlutterAST
- 前端入門13-JavaScript進階之原型前端JavaScript原型
- Apache Flink 進階入門(二):Time 深度解析Apache
- Python爬蟲進階之JS逆向入門Python爬蟲JS
- Arthas 進階教程
- python入門與進階篇(六)之高階語法及用法Python
- C#快速入門教程(27)—— SQL Server資料庫C#SQLServer資料庫
- DbForge Studio for SQL Server入門教程:如何編輯資料SQLServer
- 10、Entity Framework Core 3.1入門教程-執行原生SQLFrameworkSQL