25-pod-Disruptions

Posted by cucytoman on 2019-10-16

concepts/workloads/pods/disruptions/

This guide is for application owners who want to build highly available applications, and thus need to understand what types of disruptions can happen to Pods.

It is also for Cluster Administrators who want to perform automated cluster actions, like upgrading and autoscaling clusters.

Voluntary and Involuntary Disruptions

Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.

We call these unavoidable cases involuntary disruptions to an application. Examples are:

  • a hardware failure of the physical machine backing the node
  • cluster administrator deletes VM (instance) by mistake
  • cloud provider or hypervisor failure makes VM disappear
  • a kernel panic
  • the node disappears from the cluster due to cluster network partition
  • eviction of a pod due to the node being out-of-resources

Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.

We call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator. Typical application owner actions include:

  • deleting the deployment or other controller that manages the pod
  • updating a deployment’s pod template causing a restart
  • directly deleting a pod (e.g. by accident)

Cluster Administrator actions include:

  • Draining a node for repair or upgrade.
  • Draining a node from a cluster to scale the cluster down (learn about Cluster Autoscaling).
  • Removing a pod from a node to permit something else to fit on that node.

These actions might be taken directly by the cluster administrator, by automation run by the cluster administrator, or by your cluster hosting provider.

Ask your cluster administrator or consult your cloud provider or distribution documentation to determine if any sources of voluntary disruptions are enabled for your cluster. If none are enabled, you can skip creating Pod Disruption Budgets.

Caution: Not all voluntary disruptions are constrained by Pod Disruption Budgets. For example, deleting deployments or pods bypasses Pod Disruption Budgets.

Dealing with Disruptions

Here are some ways to mitigate involuntary disruptions:

  • Ensure your pod requests the resources it needs.
  • Replicate your application if you need higher availability. (Learn about running replicated stateless and stateful applications.)
  • For even higher availability when running replicated applications, spread applications across racks (using anti-affinity) or across zones (if using a multi-zone cluster). A sketch combining resource requests and anti-affinity follows this list.
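
The following manifest is a minimal sketch of the first and third points above, assuming a hypothetical Deployment named myapp whose pods carry the label app: myapp; the image and resource values are illustrative, not taken from this guide. It sets resource requests and a soft anti-affinity rule so replicas prefer to land on different nodes.

```yaml
# Hypothetical example: resource requests plus soft pod anti-affinity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      affinity:
        podAntiAffinity:
          # Prefer (but do not require) spreading replicas across nodes.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: myapp
              topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: nginx:1.17        # illustrative image
        resources:
          requests:
            cpu: 100m            # illustrative values
            memory: 128Mi
```

Depending on the cluster version, using a zone topology key such as failure-domain.beta.kubernetes.io/zone or topology.kubernetes.io/zone spreads replicas across zones instead of nodes.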

The frequency of voluntary disruptions varies. On a basic Kubernetes cluster, there are no voluntary disruptions at all. However, your cluster administrator or hosting provider may run some additional services which cause voluntary disruptions. For example, rolling out node software updates can cause voluntary disruptions. Also, some implementations of cluster (node) autoscaling may cause voluntary disruptions to defragment and compact nodes. Your cluster administrator or hosting provider should have documented what level of voluntary disruptions, if any, to expect.

Kubernetes offers features to help run highly available applications at the same time as frequent voluntary disruptions. We call this set of features Disruption Budgets.

How Disruption Budgets Work

An Application Owner can create a PodDisruptionBudget object (PDB) for each application. A PDB limits the number of pods of a replicated application that are down simultaneously from voluntary disruptions. For example, a quorum-based application would like to ensure that the number of replicas running is never brought below the number needed for a quorum. A web front end might want to ensure that the number of replicas serving load never falls below a certain percentage of the total.
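
As a rough sketch, a PDB that keeps at least two matching pods available might look like the following. The name myapp-pdb and the label app: myapp are assumptions for illustration; the selector must match the labels used by the application's pods.

```yaml
# Hypothetical PDB: keep at least 2 pods labeled app: myapp available.
apiVersion: policy/v1beta1   # policy/v1 on Kubernetes 1.21 and later
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2            # an absolute count; a percentage such as "90%" also works
  selector:
    matchLabels:
      app: myapp
```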

Cluster managers and hosting providers should use tools which respect Pod Disruption Budgets by calling the Eviction API instead of directly deleting pods or deployments. Examples are the kubectl drain command and the Kubernetes-on-GCE cluster upgrade script (cluster/gce/upgrade.sh).
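
For reference, an eviction is requested by POSTing an Eviction object to the pod's eviction subresource rather than deleting the pod directly. A minimal sketch, with the pod name pod-a and the namespace default used purely for illustration:

```yaml
# Hypothetical eviction request, POSTed to
# /api/v1/namespaces/default/pods/pod-a/eviction
apiVersion: policy/v1beta1   # policy/v1 on newer clusters
kind: Eviction
metadata:
  name: pod-a
  namespace: default
```

kubectl drain issues requests of this form for each pod on the node, which is why it honors Pod Disruption Budgets.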

When a cluster administrator wants to drain a node, they use the kubectl drain command. That tool tries to evict all the pods on the machine. The eviction request may be temporarily rejected, and the tool periodically retries all failed requests until all pods are terminated, or until a configurable timeout is reached.

A PDB specifies the number of replicas that an application can tolerate having, relative to how many it is intended to have. For example, a Deployment which has a .spec.replicas: 5 is supposed to have 5 pods at any given time. If its PDB allows for there to be 4 at a time, then the Eviction API will allow voluntary disruption of one, but not two pods, at a time.
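
A sketch of that situation, using the hypothetical label app: web; for a 5-replica Deployment, expressing the budget as maxUnavailable: 1 is equivalent to minAvailable: 4:

```yaml
# Hypothetical: with .spec.replicas: 5, at most one pod may be down voluntarily.
apiVersion: policy/v1beta1   # policy/v1 on Kubernetes 1.21 and later
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web
```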

The group of pods that comprise the application is specified using a label selector, the same as the one used by the application’s controller (deployment, stateful-set, etc).

The “intended” number of pods is computed from the .spec.replicas of the pods’ controller. The controller is discovered from the pods using the .metadata.ownerReferences of the object.
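
For illustration, a pod created by a Deployment carries an owner reference pointing at the intermediate ReplicaSet, along the lines of the following fragment (the ReplicaSet name is a made-up example):

```yaml
# Hypothetical fragment of a pod's metadata, as set by its ReplicaSet.
metadata:
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: myapp-5b7c6f8d9f   # illustrative generated name
    controller: true
```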

PDBs cannot prevent involuntary disruptions from occurring, but they do count against the budget.

Pods which are deleted or unavailable due to a rolling upgrade to an application do count against the disruption budget, but controllers (like deployment and stateful-set) are not limited by PDBs when doing rolling upgrades – the handling of failures during application updates is configured in the controller spec. (Learn about updating a deployment.)

When a pod is evicted using the eviction API, it is gracefully terminated (see terminationGracePeriodSeconds in PodSpec).
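
A minimal sketch of where that field lives; the 60-second value and the container are illustrative (the Kubernetes default grace period is 30 seconds):

```yaml
# Hypothetical pod giving its container 60 seconds to shut down cleanly.
apiVersion: v1
kind: Pod
metadata:
  name: graceful-example
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: nginx:1.17
```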

PDB Example

Consider a cluster with 3 nodes, node-1 through node-3. The cluster is running several applications. One of them has 3 replicas initially called pod-a, pod-b, and pod-c. Another, unrelated pod without a PDB, called pod-x, is also shown. Initially, the pods are laid out as follows:

node-1          | node-2          | node-3
pod-a available | pod-b available | pod-c available
pod-x available |                 |

All 3 pods are part of a deployment, and they collectively have a PDB which requires there be at least 2 of the 3 pods to be available at all times.
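
For concreteness, that PDB might be written as follows, assuming the hypothetical label app: example-app on pod-a, pod-b, and pod-c (the name and label are not given in the walkthrough):

```yaml
# Hypothetical PDB for the walkthrough: at least 2 of the 3 pods must stay up.
apiVersion: policy/v1beta1   # policy/v1 on Kubernetes 1.21 and later
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: example-app
```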

For example, assume the cluster administrator wants to reboot into a new kernel version to fix a bug in the kernel. The cluster administrator first tries to drain node-1 using the kubectl drain command. That tool tries to evict pod-a and pod-x. This succeeds immediately. Both pods go into the terminating state at the same time. This puts the cluster in this state:

node-1 draining   | node-2          | node-3
pod-a terminating | pod-b available | pod-c available
pod-x terminating |                 |

The deployment notices that one of the pods is terminating, so it creates a replacement called pod-d. Since node-1 is cordoned, it lands on another node. Something has also created pod-y as a replacement for pod-x.

(Note: for a StatefulSet, pod-a, which would be called something like pod-0, would need to terminate completely before its replacement, which is also called pod-0 but has a different UID, could be created. Otherwise, the example applies to a StatefulSet as well.)

Now the cluster is in this state:

node-1 draining   | node-2          | node-3
pod-a terminating | pod-b available | pod-c available
pod-x terminating | pod-d starting  | pod-y

At some point, the pods terminate, and the cluster looks like this:

node-1 drained | node-2          | node-3
               | pod-b available | pod-c available
               | pod-d starting  | pod-y

At this point, if an impatient cluster administrator tries to drain node-2 or node-3, the drain command will block, because there are only 2 available pods for the deployment, and its PDB requires at least 2. After some time passes, pod-d becomes available.

The cluster state now looks like this:

node-1 drained | node-2          | node-3
               | pod-b available | pod-c available
               | pod-d available | pod-y

Now, the cluster administrator tries to drain node-2. The drain command will try to evict the two pods in some order, say pod-b first and then pod-d. It will succeed at evicting pod-b. But, when it tries to evict pod-d, it will be refused because that would leave only one pod available for the deployment.

The deployment creates a replacement for pod-b called pod-e. Because there are not enough resources in the cluster to schedule pod-e, the drain will again block. The cluster may end up in this state:

node-1 drained | node-2            | node-3          | no node
               | pod-b terminating | pod-c available | pod-e pending
               | pod-d available   | pod-y           |

At this point, the cluster administrator needs to add a node back to the cluster to proceed with the upgrade.

You can see how Kubernetes varies the rate at which disruptions can happen, according to:

  • how many replicas an application needs
  • how long it takes to gracefully shutdown an instance
  • how long it takes a new instance to start up
  • the type of controller
  • the cluster’s resource capacity

Separating Cluster Owner and Application Owner Roles

Often, it is useful to think of the Cluster Manager and Application Owner as separate roles with limited knowledge of each other. This separation of responsibilities may make sense in these scenarios:

  • when there are many application teams sharing a Kubernetes cluster, and there is natural specialization of roles
  • when third-party tools or services are used to automate cluster management

Pod Disruption Budgets support this separation of roles by providing an interface between the roles.

If you do not have such a separation of responsibilities in your organization, you may not need to use Pod Disruption Budgets.

How to perform Disruptive Actions on your Cluster

If you are a Cluster Administrator, and you need to perform a disruptive action on all the nodes in your cluster, such as a node or system software upgrade, here are some options:

  • Accept downtime during the upgrade.
  • Failover to another complete replica cluster.
    • No downtime, but may be costly both for the duplicated nodes and for human effort to orchestrate the switchover.
  • Write disruption-tolerant applications and use PDBs.
    • No downtime.
    • Minimal resource duplication.
    • Allows more automation of cluster administration.
    • Writing disruption-tolerant applications is tricky, but the work to tolerate voluntary disruptions largely overlaps with work to support autoscaling and tolerating involuntary disruptions.


This work is licensed under a CC license; reposts must credit the author and link to this article.