How Flink computes a window's start time and end time

Posted by zyj_369 on 2020-11-08

1. Source code:


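The source shows that the window start is computed by TimeWindow.getWindowStartWithOffset, and the tumbling window assigner then sets the window end to start + windowSize. The start formula is: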

/**
 * Method to get the window start for a timestamp.
 *
 * @param timestamp epoch millisecond to get the window start.
 * @param offset The offset which window start would be shifted by.
 * @param windowSize The size of the generated windows.
 * @return window start
 */
public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
    return timestamp - (timestamp - offset + windowSize) % windowSize;
}
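To make the formula concrete, here is a small Java sketch. WindowBounds is a hypothetical helper class, not part of Flink: windowStart copies the formula above, and windowEnd adds windowSize, mirroring how the tumbling assigner builds the window. All values are in milliseconds, and the timestamps are made-up examples.

```java
// Hypothetical helper class (not part of Flink). windowStart copies the formula
// from TimeWindow.getWindowStartWithOffset; windowEnd adds the window size.
class WindowBounds {

    // Start of the tumbling window that contains `timestamp` (all values in ms).
    static long windowStart(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    // Exclusive end of that window: the window covers [start, start + windowSize).
    static long windowEnd(long timestamp, long offset, long windowSize) {
        return windowStart(timestamp, offset, windowSize) + windowSize;
    }

    public static void main(String[] args) {
        long size = 15_000L; // 15-second tumbling window, as in the test program

        // 199 s falls in [195 s, 210 s); the last millisecond inside it is 209999
        long start = windowStart(199_000L, 0L, size);
        long end = windowEnd(199_000L, 0L, size);
        System.out.println("[" + start + ", " + end + "), maxTimestamp = " + (end - 1));

        // A timestamp equal to the end already belongs to the next window
        System.out.println(windowStart(210_000L, 0L, size));

        // A 5 s offset shifts the window grid to ..., 185 s, 200 s, ...
        System.out.println(windowStart(199_000L, 5_000L, size));
    }
}
```

Running it prints `[195000, 210000), maxTimestamp = 209999`, then `210000`, then `185000`. The start is inclusive and the end exclusive, which is why a record stamped exactly 210 s opens a new window.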

2. Testing with code:

import java.time.Duration
import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Case class for a temperature sensor reading (timeStamp is in seconds)
case class SensorReading(id: String,timeStamp: Long,temperature: Double)

object WindowTest {
  def main(args: Array[String]): Unit = {
    // Get the execution environment
    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Set the global parallelism to 1 to make testing easier
    environment.setParallelism(1)
    // Use event time as the time characteristic
    environment.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    // Read data from a socket
    val inputStream2: DataStream[String] = environment.socketTextStream("centos7-1", 4444)

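    // Parse each line into a SensorReading, then assign event-time timestamps (seconds -> ms) and watermarks with a 3-second out-of-orderness bound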
    val dataStream: DataStream[SensorReading] = inputStream2.map(data => {
      val arr = data.split(",")
      SensorReading(arr(0), arr(1).toLong, arr(2).toDouble)
    }).assignTimestampsAndWatermarks(WatermarkStrategy.forBoundedOutOfOrderness[SensorReading](Duration.ofSeconds(3))
      .withTimestampAssigner(new SerializableTimestampAssigner[SensorReading] {
        override def extractTimestamp(element: SensorReading, recordTimestamp: Long): Long = element.timeStamp * 1000L
      }))

    // Tag for the side output that collects late data
    val latetag = new OutputTag[(String, Double, Long)]("late")

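    // Every 15 seconds, emit per sensor the minimum temperature in the window and the timestamp of the most recently received record
    val resultStream: DataStream[(String, Double, Long)] = dataStream
      .map(data => (data.id, data.temperature, data.timeStamp))
      .keyBy(_._1) // key by sensor id
      .timeWindow(Time.seconds(15)) // 15-second tumbling event-time window
      .allowedLateness(Time.minutes(1)) // keep window state 1 more minute to process late data
      .sideOutputLateData(latetag) // data later than that goes to the side output
      .reduce(
        // (String, Double, Long): id, minimum temperature, most recent timestamp
        (currRes, newData) => (currRes._1, currRes._2.min(newData._2), newData._3)
      )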

    resultStream.getSideOutput(latetag).print("late")
    resultStream.print("window result")

    // Execute the job
    environment.execute("window test")
  }
}
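The program's settings decide when a window's result is printed and when late records stop being merged into it. The sketch below (a hypothetical LatenessTimeline class, not Flink code) recomputes those thresholds for the window [195 s, 210 s), using relative seconds instead of epoch seconds for readability. It assumes Flink 1.11-era behavior: the bounded-out-of-orderness strategy emits watermark = largest timestamp seen - 3000 - 1, EventTimeTrigger fires once the watermark reaches end - 1, and a window stops accepting elements once end - 1 + allowedLateness <= watermark. It also assumes records arrive in timestamp order and that each watermark has already been emitted.

```java
// Hypothetical helper class that re-derives, by hand, the thresholds Flink applies
// to one window of the test program. Assumed semantics (Flink 1.11-era):
//  - forBoundedOutOfOrderness emits watermark = max timestamp seen - delay - 1
//  - EventTimeTrigger fires when watermark >= window.maxTimestamp() (= end - 1)
//  - a window is late once maxTimestamp + allowedLateness <= watermark
class LatenessTimeline {

    static long watermark(long maxSeenTimestampMs, long outOfOrdernessMs) {
        return maxSeenTimestampMs - outOfOrdernessMs - 1;
    }

    static boolean fires(long windowEndMs, long watermark) {
        return watermark >= windowEndMs - 1;
    }

    static boolean goesToSideOutput(long windowEndMs, long allowedLatenessMs, long watermark) {
        return windowEndMs - 1 + allowedLatenessMs <= watermark;
    }

    public static void main(String[] args) {
        long windowEnd = 210_000L; // window [195 s, 210 s), relative times for readability
        long delay = 3_000L;       // Duration.ofSeconds(3)
        long lateness = 60_000L;   // Time.minutes(1)

        // Scan whole seconds to find the first arriving timestamp that crosses each threshold
        long fireAt = -1, sideOutputFrom = -1;
        for (long s = 195; s <= 400; s++) {
            long wm = watermark(s * 1000L, delay);
            if (fireAt < 0 && fires(windowEnd, wm)) fireAt = s;
            if (sideOutputFrom < 0 && goesToSideOutput(windowEnd, lateness, wm)) sideOutputFrom = s;
        }
        System.out.println(fireAt);
        System.out.println(sideOutputFrom);
    }
}
```

Under these assumptions it prints 213 and 273. The first result for [195 s, 210 s) appears once a record stamped 213 s arrives; each late record for that window arriving before the 273 s record triggers another firing with the updated reduce result; once the 273 s record has pushed the watermark to 269999, records belonging to that window are printed under "late" instead.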

Testing the window start time:
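For the first record, (timestamp - offset + windowSize) % windowSize comes out to 4 seconds, so the window starts at the record's timestamp minus 4 seconds, i.e. at a time ending in 195.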
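The arithmetic behind this, worked for a hypothetical first record stamped 1547718199 s (the post does not show its actual input), with offset 0 and windowSize = 15000 ms. Because extractTimestamp multiplies by 1000, the formula operates on milliseconds:

```latex
\begin{aligned}
t &= 1547718199 \times 1000 = 1547718199000\ \text{ms} \\
(t - 0 + 15000) \bmod 15000 &= 1547718199000 \bmod 15000 = 4000 \\
\text{start} &= t - 4000 = 1547718195000 \\
\text{end} &= \text{start} + 15000 = 1547718210000
\end{aligned}
```

So this record lands in the window [1547718195000, 1547718210000), and any timestamp from ...195 s up to but not including ...210 s maps to the same start.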
