Over Aggregation in Flink SQL

Use case: computing an aggregate over a sliding window of the most recent data.

Example: query the total order amount per product over the last hour:
```sql
SELECT order_id,
       order_time,
       amount,
       SUM(amount) OVER (
           PARTITION BY product
           ORDER BY order_time
           RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
       ) AS one_hour_prod_amount_sum
FROM Orders
```
The general syntax of an Over aggregation is:
```sql
SELECT agg_func(agg_col) OVER (
    [PARTITION BY col1[, col2, ...]]
    ORDER BY time_col
    range_definition
), ...
FROM ...
```
Where:
- ORDER BY: must be a timestamp column (event time or processing time)
- PARTITION BY: defines the grouping granularity of the aggregation window; in the example above, the aggregation is partitioned by product
- range_definition: defines the data range of the aggregation window. Flink supports two ways of specifying this range: by row count, or by time interval.

Both are shown in the examples below.
Time-interval aggregation

A time-interval aggregation is a sliding window over a time range. In the example below the range is 1 hour, so the sum emitted for the latest row is the total amount over the most recent hour of data.
```sql
CREATE TABLE source_table (
    order_id BIGINT,
    product BIGINT,
    amount BIGINT,
    order_time AS CAST(CURRENT_TIMESTAMP AS TIMESTAMP(3)),
    WATERMARK FOR order_time AS order_time - INTERVAL '0.001' SECOND
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'fields.order_id.min' = '1',
    'fields.order_id.max' = '2',
    'fields.amount.min' = '1',
    'fields.amount.max' = '10',
    'fields.product.min' = '1',
    'fields.product.max' = '2'
);

CREATE TABLE sink_table (
    product BIGINT,
    order_time TIMESTAMP(3),
    amount BIGINT,
    one_hour_prod_amount_sum BIGINT
) WITH (
    'connector' = 'print'
);

INSERT INTO sink_table
SELECT product,
       order_time,
       amount,
       SUM(amount) OVER (
           PARTITION BY product
           ORDER BY order_time
           -- the window covers the most recent 1 hour of data per product
           RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
       ) AS one_hour_prod_amount_sum
FROM source_table
```
The output is:

```
+I[2, 2021-12-24T22:08:26.583, 7, 73]
+I[2, 2021-12-24T22:08:27.583, 7, 80]
+I[2, 2021-12-24T22:08:28.583, 4, 84]
+I[2, 2021-12-24T22:08:29.584, 7, 91]
+I[2, 2021-12-24T22:08:30.583, 8, 99]
+I[1, 2021-12-24T22:08:31.583, 9, 138]
+I[2, 2021-12-24T22:08:32.584, 6, 105]
+I[1, 2021-12-24T22:08:33.584, 7, 145]
```
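To make the RANGE window semantics concrete, here is a minimal Python sketch (not Flink code; the rows and the `one_hour_sum` helper are hypothetical) that computes, for each row, the sum of amounts of the same product whose timestamp falls within the last hour:

```python
from datetime import datetime, timedelta

# Hypothetical in-memory rows: (product, order_time, amount), ordered by time.
rows = [
    (2, datetime(2021, 12, 24, 21, 30), 7),
    (2, datetime(2021, 12, 24, 22, 8), 7),
    (1, datetime(2021, 12, 24, 22, 9), 9),
    (2, datetime(2021, 12, 24, 22, 40), 4),
]

def one_hour_sum(rows):
    """For each row, sum the amounts of rows with the same product whose
    order_time lies in [current_time - 1 hour, current_time]."""
    out = []
    for product, ts, amount in rows:
        window_sum = sum(
            a for p, t, a in rows
            if p == product and ts - timedelta(hours=1) <= t <= ts
        )
        out.append((product, ts, amount, window_sum))
    return out

for r in one_hour_sum(rows):
    print(r)
```

Note how the row for product 2 at 22:40 no longer counts the 21:30 row, because that row has slid out of the one-hour range.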
Row-count aggregation

A row-count aggregation is a sliding window over a fixed number of rows. In the example below, the sum emitted for the latest row covers the current row plus the 5 preceding rows of the same product (6 rows in total).
```sql
CREATE TABLE source_table (
    order_id BIGINT,
    product BIGINT,
    amount BIGINT,
    order_time AS CAST(CURRENT_TIMESTAMP AS TIMESTAMP(3)),
    WATERMARK FOR order_time AS order_time - INTERVAL '0.001' SECOND
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'fields.order_id.min' = '1',
    'fields.order_id.max' = '2',
    'fields.amount.min' = '1',
    'fields.amount.max' = '2',
    'fields.product.min' = '1',
    'fields.product.max' = '2'
);

CREATE TABLE sink_table (
    product BIGINT,
    order_time TIMESTAMP(3),
    amount BIGINT,
    one_hour_prod_amount_sum BIGINT
) WITH (
    'connector' = 'print'
);

INSERT INTO sink_table
SELECT product,
       order_time,
       amount,
       SUM(amount) OVER (
           PARTITION BY product
           ORDER BY order_time
           -- the window covers the current row plus the 5 preceding rows per product
           ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
       ) AS one_hour_prod_amount_sum
FROM source_table
```
The output of a trial run is:

```
+I[2, 2021-12-24T22:18:19.147, 1, 9]
+I[1, 2021-12-24T22:18:20.147, 2, 11]
+I[1, 2021-12-24T22:18:21.147, 2, 12]
+I[1, 2021-12-24T22:18:22.147, 2, 12]
+I[1, 2021-12-24T22:18:23.148, 2, 12]
+I[1, 2021-12-24T22:18:24.147, 1, 11]
+I[1, 2021-12-24T22:18:25.146, 1, 10]
+I[1, 2021-12-24T22:18:26.147, 1, 9]
+I[2, 2021-12-24T22:18:27.145, 2, 11]
+I[2, 2021-12-24T22:18:28.148, 1, 10]
+I[2, 2021-12-24T22:18:29.145, 2, 10]
```
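The ROWS window can be sketched the same way in plain Python (again hypothetical, not Flink code): each partition keeps its rows in arrival order, and the sum covers the current row plus up to 5 preceding rows of that partition.

```python
def rows_window_sum(rows, preceding=5):
    """Mimics ROWS BETWEEN 5 PRECEDING AND CURRENT ROW: for each
    (product, amount) row, sum the current amount plus up to `preceding`
    earlier amounts of the same product."""
    out = []
    seen = {}  # product -> amounts seen so far, in arrival order
    for product, amount in rows:
        hist = seen.setdefault(product, [])
        hist.append(amount)
        out.append((product, amount, sum(hist[-(preceding + 1):])))
    return out

# Hypothetical stream of (product, amount) rows.
rows = [(2, 1), (1, 2), (1, 2), (2, 2), (1, 1), (1, 1), (1, 2), (1, 1)]
for r in rows_window_sum(rows):
    print(r)
```

Unlike the time-interval window, rows never expire here by age; a row only drops out once 6 newer rows of the same partition have arrived.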
If a single SELECT applies several aggregation functions over the same window, Flink SQL supports a shorthand that defines the Over window once with a WINDOW clause:
```sql
SELECT order_id,
       order_time,
       amount,
       SUM(amount) OVER w AS sum_amount,
       AVG(amount) OVER w AS avg_amount
FROM Orders
-- the WINDOW clause below defines the Over window once, reused above
WINDOW w AS (
    PARTITION BY product
    ORDER BY order_time
    RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
)
```
Source: ITPUB blog, http://blog.itpub.net/69901823/viewspace-2903055/. Please credit the source when reposting.