Workflow: Basic Usage of Prefect (qbit)

Posted by qbit on 2022-02-25

Preface

Installation

  • After initializing the project with poetry, add the following dependencies to pyproject.toml, then run poetry update -vvv

    # Mirror source in mainland China (optional)
    [[tool.poetry.source]]
    name = "aliyun"
    url = "https://mirrors.aliyun.com/pypi/simple/"
    default = true
    
    [tool.poetry.dependencies]
    python = "^3.8"
    prefect = "~1.0.0"
  • Set the environment variable that enables checkpointing

    # Linux
    export PREFECT__FLOWS__CHECKPOINTING=true
    # Windows PowerShell
    $env:PREFECT__FLOWS__CHECKPOINTING="true"
    # Windows cmd (make sure there is no trailing space at the end of the line)
    set PREFECT__FLOWS__CHECKPOINTING=true
  • Test code
  • test_prefect.py

    # encoding: utf-8
    # author: qbit
    # date: 2022-01-12
    # summary: test prefect with addition, subtraction, multiplication, and division
    
    import os
    import sys
    import shutil
    import prefect
    from prefect import task, Flow
    from prefect.engine.results import LocalResult
    
    logger = prefect.context.get("logger")
    cur_dir_fullpath = os.path.dirname(os.path.abspath(__file__))
    cur_filename = os.path.basename(__file__)
    dirname = f".{os.path.splitext(cur_filename)[0]}"       # use the current .py file name as the cache directory name
    PrefectLocalResultDir = os.path.join(cur_dir_fullpath, dirname)
    
    def ClearDirectory(dir):
        r""" Empty the directory (the directory itself is kept) """
        for filename in os.listdir(dir):
            file = os.path.join(dir, filename)
            try:
                if os.path.isfile(file) or os.path.islink(file):
                    os.remove(file)
                elif os.path.isdir(file):
                    shutil.rmtree(file)
            except Exception as e:
                print(f'Failed to delete {file}. Reason: {e}')
    
    # The result is persisted under PrefectLocalResultDir using the "{task_name}.target" template
    @task(target="{task_name}.target", checkpoint=True, result=LocalResult(dir=PrefectLocalResultDir))
    def TaskAdd(x, y):
        result = x + y
        logger.info(f"{x} + {y} = {result}")
        return result
    
    @task
    def TaskSubtract(x):
        r""" Subtract 1 from the input """
        result = x - 1
        logger.info(f"{x} - 1 = {result}")
        return result
    
    @task
    def TaskMultiply(x):
        r""" Multiply the input by 2 """
        result = x * 2
        logger.info(f"{x} * 2 = {result}")
        return result
    
    @task(log_stdout=True)
    def TaskDivide(x, y):
        r""" Divide y by x """
        result = y / x
        logger.info(f"{y} / {x} = {result}")
        print(f"result: {result}")    # routed to the task logger because log_stdout=True
        return result
    
    if __name__ == '__main__':
        # Pass "restart" on the command line to clear the stored results first
        if len(sys.argv) > 1 and sys.argv[1] == "restart":
            print(f"****** Clear {PrefectLocalResultDir} ...")
            ClearDirectory(PrefectLocalResultDir)
    
        # The flow name means "Example: four arithmetic operations"
        with Flow("示例: 四則運算") as flow:
            addResult = TaskAdd(2, 1)
            subResult = TaskSubtract(addResult)
            mulResult = TaskMultiply(addResult)
            TaskDivide(subResult, mulResult)
    
        flow_state = flow.run()
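
The idea behind target="{task_name}.target" can be pictured with a small standard-library sketch. This is not Prefect's implementation; it only mimics the visible behavior under stated assumptions: the helper name run_with_target is hypothetical, plain pickle stands in for whatever serializer the result class uses, and the state names are just strings.

```python
import os
import pickle
import tempfile

def run_with_target(func, args, result_dir, target_template, task_name):
    """Return (state, value), reusing the stored value when the target file exists."""
    # Render the file name from the template, e.g. "{task_name}.target" -> "TaskAdd.target"
    target_path = os.path.join(result_dir, target_template.format(task_name=task_name))
    if os.path.exists(target_path):
        # Target found: skip the computation and load the stored value
        with open(target_path, "rb") as f:
            return "Cached", pickle.load(f)
    # No target yet: compute, then persist the value for later runs
    value = func(*args)
    os.makedirs(result_dir, exist_ok=True)
    with open(target_path, "wb") as f:
        pickle.dump(value, f)
    return "Success", value

def add(x, y):
    return x + y

with tempfile.TemporaryDirectory() as demo_dir:
    first = run_with_target(add, (2, 1), demo_dir, "{task_name}.target", "TaskAdd")
    second = run_with_target(add, (2, 1), demo_dir, "{task_name}.target", "TaskAdd")
    print(first)   # ('Success', 3)
    print(second)  # ('Cached', 3)
```

In this sketch the only cache key is the rendered file name, so the stored value is reused until the file is deleted, which is what the restart argument in test_prefect.py is for.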

Running

  • First run (note that the first computation, TaskAdd, finishes in state 'Success')

    # Command
    poetry run python ./test_prefect.py
    # Output
    [2022-02-24 17:35:48+0800] INFO - prefect.FlowRunner | Beginning Flow run for '示例: 四則運算'
    [2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | Task 'TaskAdd': Starting task run...
    [2022-02-24 17:35:49+0800] INFO - prefect | 2 + 1 = 3
    [2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | Task 'TaskAdd': Finished task run for task with final state: 'Success'
    [2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | Task 'TaskMultiply': Starting task run...
    [2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | Task 'TaskSubtract': Starting task run...
    [2022-02-24 17:35:49+0800] INFO - prefect | 3 * 2 = 6
    [2022-02-24 17:35:49+0800] INFO - prefect | 3 - 1 = 2
    [2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | Task 'TaskMultiply': Finished task run for task with final state: 'Success'
    [2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | Task 'TaskSubtract': Finished task run for task with final state: 'Success'
    [2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | Task 'TaskDivide': Starting task run...
    [2022-02-24 17:35:49+0800] INFO - prefect | 6 / 2 = 3.0
    [2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | result: 3.0
    [2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | Task 'TaskDivide': Finished task run for task with final state: 'Success'
    [2022-02-24 17:35:49+0800] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
  • Second run (note that the first computation, TaskAdd, finishes in state 'Cached')

    # Command
    poetry run python ./test_prefect.py
    # Output
    [2022-02-24 17:36:47+0800] INFO - prefect.FlowRunner | Beginning Flow run for '示例: 四則運算'
    [2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | Task 'TaskAdd': Starting task run...
    [2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | Task 'TaskAdd': Finished task run for task with final state: 'Cached'
    [2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | Task 'TaskMultiply': Starting task run...
    [2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | Task 'TaskSubtract': Starting task run...
    [2022-02-24 17:36:47+0800] INFO - prefect | 3 * 2 = 6
    [2022-02-24 17:36:47+0800] INFO - prefect | 3 - 1 = 2
    [2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | Task 'TaskMultiply': Finished task run for task with final state: 'Success'
    [2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | Task 'TaskSubtract': Finished task run for task with final state: 'Success'
    [2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | Task 'TaskDivide': Starting task run...
    [2022-02-24 17:36:47+0800] INFO - prefect | 6 / 2 = 3.0
    [2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | result: 3.0
    [2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | Task 'TaskDivide': Finished task run for task with final state: 'Success'
    [2022-02-24 17:36:47+0800] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
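
To compare the two runs at a glance, the log lines can be reduced to a mapping from task name to final state. The following is a minimal sketch that assumes the log format shown above; final_states is a hypothetical helper, not part of Prefect, and only two lines per run are pasted in to keep it short.

```python
import re

# Matches lines such as:
# ... | Task 'TaskAdd': Finished task run for task with final state: 'Cached'
FINAL_STATE = re.compile(
    r"Task '(?P<task>[^']+)': Finished task run for task with final state: '(?P<state>[^']+)'"
)

def final_states(log_text):
    """Map each task name to the final state reported in the log text."""
    return {m.group("task"): m.group("state") for m in FINAL_STATE.finditer(log_text)}

first_run = """\
[2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | Task 'TaskAdd': Finished task run for task with final state: 'Success'
[2022-02-24 17:35:49+0800] INFO - prefect.TaskRunner | Task 'TaskDivide': Finished task run for task with final state: 'Success'
"""
second_run = """\
[2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | Task 'TaskAdd': Finished task run for task with final state: 'Cached'
[2022-02-24 17:36:47+0800] INFO - prefect.TaskRunner | Task 'TaskDivide': Finished task run for task with final state: 'Success'
"""

before = final_states(first_run)
after = final_states(second_run)
# Keep only the tasks whose final state differs between the two runs
changed = {
    task: (state, after.get(task))
    for task, state in before.items()
    if state != after.get(task)
}
print(changed)  # {'TaskAdd': ('Success', 'Cached')}
```

Only TaskAdd changes state, which matches the fact that it is the only task declared with a target and a result location.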

Static DAG Diagram

  • Official documentation: https://docs.prefect.io/core/...
  • Download Graphviz and add it to the PATH environment variable
  • Modify pyproject.toml to add the viz extra, then run poetry update -vvv

    [tool.poetry.dependencies]
    python = "^3.8"
    prefect = { version = "~1.0.0", extras = ["viz"] }
  • Modify the main block of test_prefect.py

    if __name__ == '__main__':
        if len(sys.argv) > 1 and sys.argv[1] == "restart":
            print(f"****** Clear {PrefectLocalResultDir} ...")
            ClearDirectory(PrefectLocalResultDir)
    
        with Flow("示例: 四則運算") as flow:
            addResult = TaskAdd(2, 1)
            subResult = TaskSubtract(addResult)
            mulResult = TaskMultiply(addResult)
            TaskDivide(subResult, mulResult)
    
        # Draw the DAG before the run, then again afterwards with the task states
        flow.visualize(filename='flow_start', format='png')
        flow_state = flow.run()
        flow.visualize(flow_state=flow_state, filename='flow_end', format='png')
  • Running the code generates two images: flow_start.png and flow_end.png

    poetry run python ./test_prefect.py restart
  • flow_start.png
    [Image: flow_start.png]
  • flow_end.png
    [Image: flow_end.png]
  • States represented by each color: https://docs.prefect.io/api/l...
    [Image: image.png]
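
Since the diagram step depends on Graphviz being reachable through PATH, a quick check before calling flow.visualize() can save a confusing error. This is a minimal sketch using only the standard library; find_graphviz_dot is a hypothetical helper name, and it assumes the Graphviz executable is called dot, as in standard Graphviz installs.

```python
import shutil

def find_graphviz_dot():
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")

dot_path = find_graphviz_dot()
if dot_path is None:
    print("Graphviz 'dot' was not found on PATH; install Graphviz before calling flow.visualize()")
else:
    print(f"Graphviz found: {dot_path}")
```
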
This article is from qbit snap
