30-StatefulSets

Posted by cucytoman on 2019-10-28

concepts/workloads/controllers/statefulset/

StatefulSet is the workload API object used to manage stateful applications.

A StatefulSet manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.

Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods. These Pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.

Using StatefulSets

StatefulSets are valuable for applications that require one or more of the following.

  • Stable, unique network identifiers.
  • Stable, persistent storage.
  • Ordered, graceful deployment and scaling.
  • Ordered, automated rolling updates.

In the above, stable is synonymous with persistence across Pod (re)scheduling. If an application doesn’t require any stable identifiers or ordered deployment, deletion, or scaling, you should deploy your application using a workload object that provides a set of stateless replicas. A Deployment or ReplicaSet may be better suited to your stateless needs.

Limitations

  • The storage for a given Pod must either be provisioned by a PersistentVolume Provisioner based on the requested storage class, or pre-provisioned by an admin.
  • Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet. This is done to ensure data safety, which is generally more valuable than an automatic purge of all related StatefulSet resources.
  • StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service.
  • StatefulSets do not provide any guarantees on the termination of Pods when a StatefulSet is deleted. To achieve ordered and graceful termination of the Pods in the StatefulSet, it is possible to scale the StatefulSet down to 0 prior to deletion.
  • When using Rolling Updates with the default Pod Management Policy (OrderedReady), it’s possible to get into a broken state that requires manual intervention to repair.

Components

The example below demonstrates the components of a StatefulSet.

  • A Headless Service, named nginx, is used to control the network domain.
  • The StatefulSet, named web, has a Spec that indicates that 3 replicas of the nginx container will be launched in unique Pods.
  • The volumeClaimTemplates will provide stable storage using PersistentVolumes provisioned by a PersistentVolume Provisioner.

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"
      resources:
        requests:
          storage: 1Gi

Pod Selector

You must set the .spec.selector field of a StatefulSet to match the labels in its .spec.template.metadata.labels. Prior to Kubernetes 1.8, the .spec.selector field was defaulted when omitted. In 1.8 and later versions, failing to specify a matching Pod Selector will result in a validation error during StatefulSet creation.

Pod Identity

StatefulSet Pods have a unique identity that is comprised of an ordinal, a stable network identity, and stable storage. The identity sticks to the Pod, regardless of which node it’s (re)scheduled on.

Ordinal Index

For a StatefulSet with N replicas, each Pod in the StatefulSet will be assigned an integer ordinal, from 0 up through N-1, that is unique over the Set.

Stable Network ID

Each Pod in a StatefulSet derives its hostname from the name of the StatefulSet and the ordinal of the Pod. The pattern for the constructed hostname is $(statefulset name)-$(ordinal). The example above will create three Pods named web-0, web-1, web-2. A StatefulSet can use a Headless Service to control the domain of its Pods. The domain managed by this Service takes the form: $(service name).$(namespace).svc.cluster.local, where “cluster.local” is the cluster domain. As each Pod is created, it gets a matching DNS subdomain, taking the form: $(podname).$(governing service domain), where the governing service is defined by the serviceName field on the StatefulSet.

As mentioned in the limitations section, you are responsible for creating the Headless Service responsible for the network identity of the Pods.

Here are some examples of choices for Cluster Domain, Service name, and StatefulSet name, and how they affect the DNS names for the StatefulSet’s Pods.

| Cluster Domain | Service (ns/name) | StatefulSet (ns/name) | StatefulSet Domain | Pod DNS | Pod Hostname |
| --- | --- | --- | --- | --- | --- |
| cluster.local | default/nginx | default/web | nginx.default.svc.cluster.local | web-{0..N-1}.nginx.default.svc.cluster.local | web-{0..N-1} |
| cluster.local | foo/nginx | foo/web | nginx.foo.svc.cluster.local | web-{0..N-1}.nginx.foo.svc.cluster.local | web-{0..N-1} |
| kube.local | foo/nginx | foo/web | nginx.foo.svc.kube.local | web-{0..N-1}.nginx.foo.svc.kube.local | web-{0..N-1} |

Note: Cluster Domain will be set to cluster.local unless otherwise configured.

Stable Storage

Kubernetes creates one PersistentVolume for each VolumeClaimTemplate. In the nginx example above, each Pod will receive a single PersistentVolume with a StorageClass of my-storage-class and 1 GiB of provisioned storage. If no StorageClass is specified, then the default StorageClass will be used. When a Pod is (re)scheduled onto a node, its volumeMounts mount the PersistentVolumes associated with its PersistentVolumeClaims. Note that the PersistentVolumes associated with the Pods’ PersistentVolumeClaims are not deleted when the Pods or the StatefulSet are deleted. This must be done manually.
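
As a rough illustration, the claim created for the first Pod from the www template in the nginx example would look something like the manifest below. The name www-web-0 assumes the controller's usual naming pattern of $(volumeClaimTemplate name)-$(pod name), which this page does not spell out, so treat it as an assumption and confirm it with kubectl get pvc on your own cluster.

```yaml
# Sketch of the PersistentVolumeClaim generated for Pod web-0.
# You do not write this object yourself; the StatefulSet controller
# creates it from the volumeClaimTemplates entry named "www".
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: www-web-0            # assumed pattern: <template name>-<pod name>
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "my-storage-class"
  resources:
    requests:
      storage: 1Gi
```

Because the claim is tied to the Pod's name rather than to a node, a rescheduled web-0 would bind to the same claim again, and the claim remains after the Pod or the StatefulSet is deleted.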

Pod Name Label

When the StatefulSet Controller creates a Pod, it adds a label, statefulset.kubernetes.io/pod-name, that is set to the name of the Pod. This label allows you to attach a Service to a specific Pod in the StatefulSet.
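
A minimal sketch of such a Service, assuming the web StatefulSet from the nginx example. The Service name web-0-direct is hypothetical and chosen only for illustration.

```yaml
# Routes traffic to exactly one Pod (web-0) by selecting on the
# label that the StatefulSet controller adds to every Pod it creates.
apiVersion: v1
kind: Service
metadata:
  name: web-0-direct         # hypothetical name
spec:
  selector:
    statefulset.kubernetes.io/pod-name: web-0
  ports:
  - port: 80
    targetPort: 80           # containerPort of the nginx container
```

Since the label value is the Pod name, the selector keeps matching web-0 after the Pod is recreated or rescheduled.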

Deployment and Scaling Guarantees

  • For a StatefulSet with N replicas, when Pods are being deployed, they are created sequentially, in order from {0..N-1}.
  • When Pods are being deleted, they are terminated in reverse order, from {N-1..0}.
  • Before a scaling operation is applied to a Pod, all of its predecessors must be Running and Ready.
  • Before a Pod is terminated, all of its successors must be completely shut down.

The StatefulSet should not specify a pod.Spec.TerminationGracePeriodSeconds of 0. This practice is unsafe and strongly discouraged. For further explanation, please refer to force deleting StatefulSet Pods.

When the nginx example above is created, three Pods will be deployed in the order web-0, web-1, web-2. web-1 will not be deployed before web-0 is Running and Ready, and web-2 will not be deployed until web-1 is Running and Ready. If web-0 should fail, after web-1 is Running and Ready, but before web-2 is launched, web-2 will not be launched until web-0 is successfully relaunched and becomes Running and Ready.

If a user were to scale the deployed example down by patching the StatefulSet such that replicas=1, web-2 would be terminated first. web-1 would not be terminated until web-2 is fully shut down and deleted. If web-0 were to fail after web-2 has been terminated and is completely shut down, but prior to web-1’s termination, web-1 would not be terminated until web-0 is Running and Ready.

Pod Management Policies

In Kubernetes 1.7 and later, StatefulSet allows you to relax its ordering guarantees while preserving its uniqueness and identity guarantees via its .spec.podManagementPolicy field.

OrderedReady Pod Management

OrderedReady pod management is the default for StatefulSets. It implements the behavior described above.

Parallel Pod Management

Parallel pod management tells the StatefulSet controller to launch or terminate all Pods in parallel, and not to wait for Pods to become Running and Ready or completely terminated prior to launching or terminating another Pod. This option only affects the behavior for scaling operations. Updates are not affected.
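
As a sketch, opting in to parallel management for the web StatefulSet from the nginx example would mean adding one field to its spec. The fragment below is not a complete manifest; the omitted fields are assumed to stay as in that example.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  podManagementPolicy: Parallel   # default is OrderedReady
  serviceName: "nginx"
  replicas: 3
  # selector, template, and volumeClaimTemplates as in the nginx example
```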

Update Strategies

In Kubernetes 1.7 and later, StatefulSet’s .spec.updateStrategy field allows you to configure and disable automated rolling updates for containers, labels, resource requests/limits, and annotations for the Pods in a StatefulSet.

On Delete

The OnDelete update strategy implements the legacy (1.6 and prior) behavior. When a StatefulSet’s .spec.updateStrategy.type is set to OnDelete, the StatefulSet controller will not automatically update the Pods in a StatefulSet. Users must manually delete Pods to cause the controller to create new Pods that reflect modifications made to a StatefulSet’s .spec.template.
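
A minimal sketch of selecting this strategy on the web StatefulSet from the nginx example. Only the relevant field is shown; the rest of the spec is assumed unchanged.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  updateStrategy:
    type: OnDelete   # Pods pick up template changes only after you delete them
  # serviceName, replicas, selector, template, and volumeClaimTemplates
  # as in the nginx example
```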

Rolling Updates

The RollingUpdate update strategy implements automated rolling updates for the Pods in a StatefulSet. It is the default strategy when .spec.updateStrategy is left unspecified. When a StatefulSet’s .spec.updateStrategy.type is set to RollingUpdate, the StatefulSet controller will delete and recreate each Pod in the StatefulSet. It will proceed in the same order as Pod termination (from the largest ordinal to the smallest), updating each Pod one at a time. It will wait until an updated Pod is Running and Ready prior to updating its predecessor.

Partitions

The RollingUpdate update strategy can be partitioned by specifying a .spec.updateStrategy.rollingUpdate.partition. If a partition is specified, all Pods with an ordinal that is greater than or equal to the partition will be updated when the StatefulSet’s .spec.template is updated. All Pods with an ordinal that is less than the partition will not be updated, and, even if they are deleted, they will be recreated at the previous version. If a StatefulSet’s .spec.updateStrategy.rollingUpdate.partition is greater than its .spec.replicas, updates to its .spec.template will not be propagated to its Pods. In most cases you will not need to use a partition, but they are useful if you want to stage an update, roll out a canary, or perform a phased rollout.
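
For example, a canary on the three-replica web StatefulSet from the nginx example might be staged with a partition of 2, so that a template change reaches only web-2. This is a sketch showing just the relevant fields; the value 2 is an illustrative choice, not something the page prescribes.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # only Pods with ordinal >= 2 (here: web-2) are updated
  # serviceName, selector, template, and volumeClaimTemplates
  # as in the nginx example
```

Lowering the partition step by step (2, then 1, then 0) would extend the update to web-1 and finally web-0, which is one way to run a phased rollout.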

Forced Rollback

When using Rolling Updates with the default Pod Management Policy (OrderedReady), it’s possible to get into a broken state that requires manual intervention to repair.

If you update the Pod template to a configuration that never becomes Running and Ready (for example, due to a bad binary or application-level configuration error), StatefulSet will stop the rollout and wait.

In this state, it’s not enough to revert the Pod template to a good configuration. Due to a known issue, StatefulSet will continue to wait for the broken Pod to become Ready (which never happens) before it will attempt to revert it back to the working configuration.

After reverting the template, you must also delete any Pods that StatefulSet had already attempted to run with the bad configuration. StatefulSet will then begin to recreate the Pods using the reverted template.

This work is licensed under a CC license; reposts must credit the author and link to this article.