Flink State and Fault Tolerance

Posted by weixin_33797791 on 2018-03-14

What is State?

  • An intermediate result of a task/operator at a given point in time
  • A snapshot
  • In Flink, state can be understood as a kind of data structure
  • Example
    For an input stream of <key, value> records, compute the maximum value for a given key. A HashMap would also work, but every update would then require re-traversing the data; with state, the most recent result can be fetched directly, reducing the amount of recomputation (see the sketch after this list).
  • Recovery after the program crashes
  • Rescaling the program (changing its parallelism)
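
A minimal sketch of the max-per-key example above, keeping the previous maximum in Flink's ValueState inside a RichFlatMapFunction (class and state names are illustrative, not from the original post):

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Keeps the running maximum per key; the previous maximum is read from
// ValueState instead of re-traversing all earlier records.
public class MaxPerKey extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

    private transient ValueState<Long> maxState;

    @Override
    public void open(Configuration parameters) {
        maxState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("max", Long.class));
    }

    @Override
    public void flatMap(Tuple2<String, Long> in, Collector<Tuple2<String, Long>> out) throws Exception {
        Long currentMax = maxState.value();              // null for the first record of a key
        if (currentMax == null || in.f1 > currentMax) {
            maxState.update(in.f1);
            out.collect(Tuple2.of(in.f0, in.f1));        // emit only when the maximum changes
        }
    }
}
```

Used as `stream.keyBy(t -> t.f0).flatMap(new MaxPerKey())`; because the function runs on a KeyedStream, the state handle is automatically scoped to the current key.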

State Types

Operator State


With Operator State (or non-keyed state), each operator state is bound to one parallel operator instance. The Kafka Connector is a good motivating example for the use of Operator State in Flink. Each parallel instance of the Kafka consumer maintains a map of topic partitions and offsets as its Operator State.
The Operator State interfaces support redistributing state among parallel operator instances when the parallelism is changed. There can be different schemes for doing this redistribution.

[Figure: Kafka example]

The official Flink documentation uses the Kafka consumer as its example: the consumer's partition IDs and offsets are analogous to Flink's operator state.

Provided data structure: ListState<T>
Each parallel operator instance holds its own state (a sketch follows below).
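
A minimal sketch of operator state in the spirit of the Kafka-offset example, exposing ListState through the CheckpointedFunction interface; instead of partition offsets it checkpoints a per-instance record count, and all names are illustrative:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

// Counts records per parallel instance and stores the count as operator (non-keyed)
// ListState, so it can be redistributed when the parallelism changes.
public class CountingMapper implements MapFunction<String, String>, CheckpointedFunction {

    private transient ListState<Long> checkpointedCount;
    private long count;

    @Override
    public String map(String value) {
        count++;
        return value;
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        checkpointedCount.clear();
        checkpointedCount.add(count);                // write the current count into the snapshot
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        checkpointedCount = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("count", Long.class));
        for (Long c : checkpointedCount.get()) {     // on restore, sum the redistributed entries
            count += c;
        }
    }
}
```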

Keyed State


Keyed State is always relative to keys and can only be used in functions and operators on a KeyedStream.
You can think of Keyed State as Operator State that has been partitioned, or sharded, with exactly one state-partition per key. Each keyed-state is logically bound to a unique composite of <parallel-operator-instance, key>, and since each key “belongs” to exactly one parallel instance of a keyed operator, we can think of this simply as <operator, key>.
Keyed State is further organized into so-called Key Groups. Key Groups are the atomic unit by which Flink can redistribute Keyed State; there are exactly as many Key Groups as the defined maximum parallelism. During execution each parallel instance of a keyed operator works with the keys for one or more Key Groups.

State on top of a KeyedStream
It can be understood as the keyed counterpart of Operator State after dataStream.keyBy(): Operator State records state per operator instance, whereas Keyed State records state per key on the KeyedStream produced by keyBy().
Data types provided by Keyed State: ValueState<T>, ListState<T>, ReducingState<T>, MapState<T> (see the sketch below).
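
A minimal sketch showing how each of these keyed-state types is declared in a rich function's open() method; the state names and element types are illustrative:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;

// Declares one handle of each keyed-state type; each handle is scoped to the
// current key of the KeyedStream, not to the whole operator instance.
public class KeyedStateKinds extends RichMapFunction<String, String> {

    private transient ValueState<Long> lastValue;
    private transient ListState<String> recentEvents;
    private transient ReducingState<Long> runningSum;
    private transient MapState<String, Long> countsPerField;

    @Override
    public void open(Configuration parameters) {
        lastValue = getRuntimeContext().getState(
                new ValueStateDescriptor<>("lastValue", Long.class));
        recentEvents = getRuntimeContext().getListState(
                new ListStateDescriptor<>("recentEvents", String.class));
        runningSum = getRuntimeContext().getReducingState(
                new ReducingStateDescriptor<Long>("runningSum", (a, b) -> a + b, Long.class));
        countsPerField = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("countsPerField", String.class, Long.class));
    }

    @Override
    public String map(String value) {
        return value;   // the state handles would be read and updated here, per key
    }
}
```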

State Fault Tolerance

  • Introduction
    Apache Flink offers a fault tolerance mechanism to consistently recover the state of data streaming applications. The mechanism ensures that even in the presence of failures, the program’s state will eventually reflect every record from the data stream exactly once. Note that there is a switch to downgrade the guarantees to at least once (described below).
    The fault tolerance mechanism continuously draws snapshots of the distributed streaming data flow. For streaming applications with small state, these snapshots are very light-weight and can be drawn frequently without impacting the performance much. The state of the streaming applications is stored at a configurable place (such as the master node, or HDFS).
    In case of a program failure (due to machine-, network-, or software failure), Flink stops the distributed streaming dataflow. The system then restarts the operators and resets them to the latest successful checkpoint. The input streams are reset to the point of the state snapshot. Any records that are processed as part of the restarted parallel dataflow are guaranteed to not have been part of the checkpointed state before.
    Note: For this mechanism to realize its full guarantees, the data stream source (such as message queue or broker) needs to be able to rewind the stream to a defined recent point. Apache Kafka has this ability and Flink’s connector to Kafka exploits this ability.
    Note: Because Flink’s checkpoints are realized through distributed snapshots, we use the words snapshot and checkpoint interchangeably.
Relies on checkpoints

Checkpoint concept: take a global snapshot and durably store the state of every task/operator

  • Characteristics:
    Asynchronous: lightweight, does not get in the way of normal data processing
    Barrier mechanism
    On failure, the job can be rolled back to the most recent successful checkpoint
    Periodic
  • Guarantees exactly-once (see the configuration sketch below)


    [Figure: Checkpoint]

    [Figure: Restore]
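
A minimal sketch of enabling periodic checkpointing with the exactly-once guarantee and a durable (HDFS) snapshot location as described above; the interval, timeouts, and path are illustrative and assume the Flink 1.x DataStream API:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Draw a checkpoint every 10 seconds (periodic, asynchronous snapshots).
        env.enableCheckpointing(10_000L);

        CheckpointConfig config = env.getCheckpointConfig();
        // Exactly-once is the default; it can be downgraded to AT_LEAST_ONCE.
        config.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        // Leave at least 5 seconds between checkpoints and time each one out after 1 minute.
        config.setMinPauseBetweenCheckpoints(5_000L);
        config.setCheckpointTimeout(60_000L);

        // Store snapshots in a durable location such as HDFS (the path is illustrative).
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

        // ... build the job (sources, operators, sinks), then:
        // env.execute("checkpointed job");
    }
}
```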
Snapshot
  • Barriers
    Barriers are a key element of Flink's distributed snapshots.
    [Figure: Barriers in a single parallel stream]

    [Figure: Barriers across multiple parallel streams]

    Barriers are injected into the data stream and flow with the records. Each barrier carries a snapshot ID and is very lightweight, so it does not interrupt the flow of data. Barriers belonging to different snapshots can be in the stream at the same time, which means several snapshots may be in progress concurrently.
    Compared with the single-parallelism case, a snapshot over multiple parallel streams requires that the barriers carrying the same snapshot ID have passed the operator from every input stream before that operator can take its checkpoint (barrier alignment).

Personal understanding: Flink's state migration and fault tolerance rely mainly on the checkpoint mechanism, whose most important element is the barrier; barriers guarantee that the records flowing into an operator are covered by a checkpoint.
