Flink SQL Over Aggregation
Use case: computing an aggregation over a recent sliding range of data.
Example: for each product, query the total order amount over the last hour:
SELECT
    order_id,
    order_time,
    amount,
    SUM(amount) OVER (
        PARTITION BY product
        ORDER BY order_time
        RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
    ) AS one_hour_prod_amount_sum
FROM Orders
The general syntax of an Over aggregation is:
SELECT
    agg_func(agg_col) OVER (
        [PARTITION BY col1[, col2, ...]]
        ORDER BY time_col
        range_definition),
    ...
FROM ...
Where:
- ORDER BY: must be a time attribute column (event time or processing time)
- PARTITION BY: defines the grouping granularity of the aggregation; in the example above the aggregation is per product
- range_definition: defines which rows of the partition are included in the aggregation. Flink offers two ways to specify this range: by row count or by time interval (both forms are sketched right after this list).
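For quick reference, the two range_definition forms look like this; the column name and window sizes below are simply the ones used in the examples that follow:

-- time-interval range: rows of the same partition whose order_time lies
-- within the last hour, up to and including the current row
RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW

-- row-count range: the current row plus the 5 preceding rows of the same partition
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW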
The examples below show both forms.
Time-interval aggregation
Aggregating over a time interval behaves like a time-based sliding window. With the one-hour interval in the case below, the sum emitted for the latest row covers the amount of all rows of the same product from the preceding hour up to the current row.
CREATE TABLE source_table (
    order_id BIGINT,
    product BIGINT,
    amount BIGINT,
    order_time AS CAST(CURRENT_TIMESTAMP AS TIMESTAMP(3)),
    WATERMARK FOR order_time AS order_time - INTERVAL '0.001' SECOND
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'fields.order_id.min' = '1',
    'fields.order_id.max' = '2',
    'fields.amount.min' = '1',
    'fields.amount.max' = '10',
    'fields.product.min' = '1',
    'fields.product.max' = '2'
);

CREATE TABLE sink_table (
    product BIGINT,
    order_time TIMESTAMP(3),
    amount BIGINT,
    one_hour_prod_amount_sum BIGINT
) WITH (
    'connector' = 'print'
);

INSERT INTO sink_table
SELECT
    product,
    order_time,
    amount,
    SUM(amount) OVER (
        PARTITION BY product
        ORDER BY order_time
        -- the range covers the last 1 hour of data for the same product
        RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
    ) AS one_hour_prod_amount_sum
FROM source_table
The output looks like this (each +I row is an insert record containing product, order_time, amount, one_hour_prod_amount_sum):
+I[2, 2021-12-24T22:08:26.583, 7, 73]
+I[2, 2021-12-24T22:08:27.583, 7, 80]
+I[2, 2021-12-24T22:08:28.583, 4, 84]
+I[2, 2021-12-24T22:08:29.584, 7, 91]
+I[2, 2021-12-24T22:08:30.583, 8, 99]
+I[1, 2021-12-24T22:08:31.583, 9, 138]
+I[2, 2021-12-24T22:08:32.584, 6, 105]
+I[1, 2021-12-24T22:08:33.584, 7, 145]
Row-count aggregation
Aggregating by row count behaves like a sliding window defined by the number of rows. In the case below, the sum emitted for the latest row covers the current row plus the 5 preceding rows of the same product (6 rows in total).
CREATE TABLE source_table (
    order_id BIGINT,
    product BIGINT,
    amount BIGINT,
    order_time AS CAST(CURRENT_TIMESTAMP AS TIMESTAMP(3)),
    WATERMARK FOR order_time AS order_time - INTERVAL '0.001' SECOND
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'fields.order_id.min' = '1',
    'fields.order_id.max' = '2',
    'fields.amount.min' = '1',
    'fields.amount.max' = '2',
    'fields.product.min' = '1',
    'fields.product.max' = '2'
);

CREATE TABLE sink_table (
    product BIGINT,
    order_time TIMESTAMP(3),
    amount BIGINT,
    one_hour_prod_amount_sum BIGINT
) WITH (
    'connector' = 'print'
);

INSERT INTO sink_table
SELECT
    product,
    order_time,
    amount,
    SUM(amount) OVER (
        PARTITION BY product
        ORDER BY order_time
        -- the range covers the current row and the 5 preceding rows of the same product
        ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) AS one_hour_prod_amount_sum
FROM source_table
A sample run produces:
+I[2, 2021-12-24T22:18:19.147, 1, 9]
+I[1, 2021-12-24T22:18:20.147, 2, 11]
+I[1, 2021-12-24T22:18:21.147, 2, 12]
+I[1, 2021-12-24T22:18:22.147, 2, 12]
+I[1, 2021-12-24T22:18:23.148, 2, 12]
+I[1, 2021-12-24T22:18:24.147, 1, 11]
+I[1, 2021-12-24T22:18:25.146, 1, 10]
+I[1, 2021-12-24T22:18:26.147, 1, 9]
+I[2, 2021-12-24T22:18:27.145, 2, 11]
+I[2, 2021-12-24T22:18:28.148, 1, 10]
+I[2, 2021-12-24T22:18:29.145, 2, 10]
If a single SELECT applies several aggregations over the same Over window, Flink SQL also supports a shorthand that defines the window once with a WINDOW clause, as in the following example:
SELECT
    order_id,
    order_time,
    amount,
    SUM(amount) OVER w AS sum_amount,
    AVG(amount) OVER w AS avg_amount
FROM Orders
-- the WINDOW clause below defines the Over window once so both aggregates can reuse it
WINDOW w AS (
    PARTITION BY product
    ORDER BY order_time
    RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW)
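To make the shorthand runnable end to end, the sketch below reuses the datagen source_table defined earlier; the sink table sink_table_w, its schema, and the cast of the average input to DOUBLE are assumptions made for illustration only.

-- assumed sink for the named-window example
CREATE TABLE sink_table_w (
    product BIGINT,
    order_time TIMESTAMP(3),
    amount BIGINT,
    sum_amount BIGINT,
    avg_amount DOUBLE
) WITH (
    'connector' = 'print'
);

INSERT INTO sink_table_w
SELECT
    product,
    order_time,
    amount,
    SUM(amount) OVER w AS sum_amount,
    -- cast the input to DOUBLE so the average keeps fractional precision
    AVG(CAST(amount AS DOUBLE)) OVER w AS avg_amount
FROM source_table
-- w is defined once and shared by both aggregates
WINDOW w AS (
    PARTITION BY product
    ORDER BY order_time
    RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW);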
Source: ITPUB blog, http://blog.itpub.net/69901823/viewspace-2903055/. Please credit the source when reposting.