ArgoWorkflow教程(七)---高效的步驟間檔案共享策略

探索云原生發表於2024-10-22

argoworkflow-7-file-share-between-steps.png

之前我們分析了使用 artifact 實現步驟間檔案共享,今天分享一下如何使用 PVC 實現高效的步驟間檔案共享。

1. 概述

之前在 artifact 篇我們演示瞭如何使用 artifact 實現步驟間檔案傳遞,今天介紹一種更為簡單的檔案傳遞方式:PVC 共享

artifact 畢竟是藉助 S3 實現中轉,效率上肯定是低於直接共享 PVC 的,而且 artifact 一般用於結果輸出,將最終結果儲存到 S3,而不是單純的用來共享檔案。

2. 使用 artifact 共享檔案

之前已經分享過了怎麼透過 artifact 在不同步驟之間傳遞檔案,這裡在回顧一下。

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: whalesay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          # bind message to the hello-art artifact
          # generated by the generate-artifact step
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      # generate hello-art artifact from /tmp/hello_world.txt
      # artifacts can be directories as well as files
      - name: hello-art
        path: /tmp/hello_world.txt

  - name: print-message
    inputs:
      artifacts:
      # unpack the message input artifact
      # and put it at /tmp/message
      - name: message
        path: /tmp/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]

可以看到,artifact 方式共享檔案步驟間傳遞引數是比較類似。

匯出 artifact

outputs:
  artifacts:
  # generate hello-art artifact from /tmp/hello_world.txt
  # artifacts can be directories as well as files
  - name: hello-art
    path: /tmp/hello_world.txt

後續步驟引用匯出的 artifact

arguments:
  artifacts:
  # bind message to the hello-art artifact
  # generated by the generate-artifact step
  - name: message
    from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"

以及步驟中怎麼將 artifact 引入,比如下面 demo 則是將 artifact 做為 /tmp/message 掛載到 Pod 中。

inputs:
  artifacts:
  # unpack the message input artifact
  # and put it at /tmp/message
  - name: message
    path: /tmp/message

3. 使用 PVC 高效共享檔案

顧名思義,就是不同步驟都掛載同一個 PVC,這樣自然就實現了檔案共享。

ArgoWorkflow 中的每一步都會單獨啟動一個 Pod 來執行

也有兩種方式:

  • 1)動態建立 PVC:Workflow 執行時建立 PVC,執行結束後刪除 PVC
  • 2)使用已有 PVC:不會建立也不會刪除

動態建立 PVC

完整 Demo 如下:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: volumes-pvc-
spec:
  entrypoint: volumes-pvc-example
  volumeClaimTemplates:                 # define volume, same syntax as k8s Pod spec
  - metadata:
      name: workdir                     # name of volume claim
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi                  # Gi => 1024 * 1024 * 1024

  templates:
  - name: volumes-pvc-example
    steps:
    - - name: generate
        template: whalesay
    - - name: print
        template: print-message

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["echo generating message in volume; cowsay hello world | tee /mnt/vol/hello_world.txt"]
      # Mount workdir volume at /mnt/vol before invoking docker/whalesay
      volumeMounts:                     # same syntax as k8s Pod spec
      - name: workdir
        mountPath: /mnt/vol

  - name: print-message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo getting message from volume; find /mnt/vol; cat /mnt/vol/hello_world.txt"]
      # Mount workdir volume at /mnt/vol before invoking docker/whalesay
      volumeMounts:                     # same syntax as k8s Pod spec
      - name: workdir
        mountPath: /mnt/vol

首先定義了一個 PVC 模版,Workflow 執行時就會使用該模版建立一個 PVC

spec:
  entrypoint: volumes-pvc-example
  volumeClaimTemplates:                 # define volume, same syntax as k8s Pod spec
  - metadata:
      name: workdir                     # name of volume claim
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi                  # Gi => 1024 * 1024 * 1024

然後其他步驟需要將該 PVC 掛載到對應目錄

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["echo generating message in volume; cowsay hello world | tee /mnt/vol/hello_world.txt"]
      # Mount workdir volume at /mnt/vol before invoking docker/whalesay
      volumeMounts:                     # same syntax as k8s Pod spec
      - name: workdir
        mountPath: /mnt/vol

  - name: print-message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo getting message from volume; find /mnt/vol; cat /mnt/vol/hello_world.txt"]
      # Mount workdir volume at /mnt/vol before invoking docker/whalesay
      volumeMounts:                     # same syntax as k8s Pod spec
      - name: workdir
        mountPath: /mnt/vol

這樣就實現了檔案共享,非常簡單。

等 Workflow 執行結束後,Argo 會自動將建立出的 PVC 刪除。

使用已有 PVC

在某些情況下,我們可以希望訪問已經存在的卷,而不是動態建立/銷燬卷。

完整 Demo 如下:

# Define Kubernetes PVC
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-existing-volume
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 1Gi

---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: volumes-existing-
spec:
  entrypoint: volumes-existing-example
  volumes:
  # Pass my-existing-volume as an argument to the volumes-existing-example template
  # Same syntax as k8s Pod spec
  - name: workdir
    persistentVolumeClaim:
      claimName: my-existing-volume

  templates:
  - name: volumes-existing-example
    steps:
    - - name: generate
        template: whalesay
    - - name: print
        template: print-message

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["echo generating message in volume; cowsay hello world | tee /mnt/vol/hello_world.txt"]
      volumeMounts:
      - name: workdir
        mountPath: /mnt/vol

  - name: print-message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo getting message from volume; find /mnt/vol; cat /mnt/vol/hello_world.txt"]
      volumeMounts:
      - name: workdir
        mountPath: /mnt/vol

首先就是手動建立一個 PVC

# Define Kubernetes PVC
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-existing-volume
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 1Gi

然後在 Workflow 中定義要使用這個 PVC

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: volumes-existing-
spec:
  entrypoint: volumes-existing-example
  volumes:
  # Pass my-existing-volume as an argument to the volumes-existing-example template
  # Same syntax as k8s Pod spec
  - name: workdir
    persistentVolumeClaim:
      claimName: my-existing-volume

可以看做是使用 persistentVolumeClaim 來替換了之前的 volumeClaimTemplates

然後就是步驟將這個 PVC 掛載到對應目錄

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["echo generating message in volume; cowsay hello world | tee /mnt/vol/hello_world.txt"]
      volumeMounts:
      - name: workdir
        mountPath: /mnt/vol

  - name: print-message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo getting message from volume; find /mnt/vol; cat /mnt/vol/hello_world.txt"]
      volumeMounts:
      - name: workdir
        mountPath: /mnt/vol

這一步和使用動態建立 PVC 時沒有任何變化。


【ArgoWorkflow 系列】持續更新中,搜尋公眾號【探索雲原生】訂閱,閱讀更多文章。


4. 小結

本文主要分析了 Argo 中的 Workflow 中怎麼使用 PVC 共享檔案。

  • 1)定義 PVC 模版或者指定使用已有的 PVC
  • 2)步驟中將 PVC 掛載到對應目錄使用

相關文章