An Analysis of Karmada's Cross-Cluster Graceful Failover Feature

Published by the Huawei Cloud Developer Alliance on 2022-11-23

Original article: https://www.cnblogs.com/huaweiyun/p/16917787.html

Abstract: In Karmada's latest release, v1.3, cross-cluster failover supports graceful failover, which keeps the migration process smooth.

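This article is shared from the Huawei Cloud community post "An Analysis of Karmada's Cross-Cluster Graceful Failover Feature", written by the Karmada community.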

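In multi-cloud, multi-cluster scenarios, users may deploy their workloads across several clusters to improve availability. When one of those clusters fails, users want the workloads on the failed cluster to be migrated automatically to other suitable clusters so that the service stays available and continuous. That is the goal of failover.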

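Karmada supported cross-cluster failover even before the v1.0 release, and the capability has been refined over several community releases. In the latest release, v1.3 (https://github.com/karmada-io/karmada/tree/release-1.3), cross-cluster failover supports graceful failover, which keeps the migration process smooth.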

The sections below examine this feature.

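▍Review: single-cluster failover

In the Kubernetes architecture, a Node is the unit that runs Pod instances, and it can fail; causes include, but are not limited to, running short of its own resources and losing its connection to the Kubernetes control plane. Keeping services reliable and stable after a node failure has always been one of Kubernetes' main concerns. On the Kubernetes control plane, when a node fails or the user does not want Pods to run on it, the node is marked as unavailable and the node-controller taints it. The taint prevents new instances from being scheduled onto the node and causes existing Pod instances to be migrated to other nodes.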

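▍Determining cluster failure

Compared with single-cluster failover, the unit of Karmada's cross-cluster failover changes from the node to the cluster. Karmada supports two modes for managing member clusters, Push and Pull; for details on cluster registration, see Cluster Registration (http://karmada.io/docs/next/userguide/clustermanager/cluster-registration/). Karmada determines a cluster's current state from its heartbeats. There are two kinds of cluster heartbeat:
1. Cluster status collection, which updates the cluster's .status field (for both Push and Pull modes).
2. Lease objects in the karmada-cluster namespace of the control plane; each Pull cluster has an associated Lease object.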

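Cluster status collection

For Push clusters, the clusterStatus-controller in the Karmada control plane periodically collects the cluster status. For Pull clusters, the karmada-agent component deployed in the cluster creates and periodically updates the cluster's .status field. The update period can be configured with the --cluster-status-update-frequency flag (default: 10 seconds). The cluster's Ready condition is set to False when either of the following holds:
· The cluster has been unreachable for a period of time.
· The cluster's health check has returned an abnormal response for a period of time.
That period can be configured with the --cluster-failure-threshold flag (default: 30 seconds).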
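The threshold rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Karmada's implementation: the ReadyTracker class and its observe method are hypothetical names, and the 30-second default simply mirrors the --cluster-failure-threshold default quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadyTracker:
    """Hypothetical helper: flips Ready to False only after a sustained failure."""
    failure_threshold: float = 30.0          # mirrors --cluster-failure-threshold
    first_failure_at: Optional[float] = None  # when the current failure streak began
    ready: str = "True"

    def observe(self, now: float, healthy: bool) -> str:
        """Record one status probe taken at time `now` (seconds)."""
        if healthy:
            # Any healthy probe ends the failure streak.
            self.first_failure_at = None
            self.ready = "True"
        else:
            if self.first_failure_at is None:
                self.first_failure_at = now
            # Only a failure that has lasted the whole threshold changes Ready.
            if now - self.first_failure_at >= self.failure_threshold:
                self.ready = "False"
        return self.ready
```

With probes every 10 seconds, a cluster that starts failing at t=10 is still reported as Ready at t=20 and only becomes False once the failure has lasted 30 seconds.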

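Cluster Lease object updates

Whenever a Pull cluster joins, Karmada creates a Lease object and a lease-controller for it. Each lease-controller is responsible for updating its Lease object; the renewal interval can be configured with the --cluster-lease-duration and --cluster-lease-renew-interval-fraction flags (default: 10 seconds). Because cluster status updates are maintained by the clusterStatus-controller, the Lease update process and the cluster status update process are independent of each other. The cluster-controller in the Karmada control plane checks the state of Pull clusters every --cluster-monitor-period (default: 5 seconds). When the cluster-controller has received no message from a cluster within --cluster-monitor-grace-period (default: 40 seconds), the cluster's Ready condition is changed to Unknown.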
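The monitoring check described above can be modelled roughly as follows. The function names are hypothetical, the defaults mirror the --cluster-monitor-period and --cluster-monitor-grace-period values quoted above, and the sketch assumes that checks start at the moment of the last heartbeat; it is not taken from the Karmada code base.

```python
def ready_after_monitor_check(current_ready: str, last_heartbeat: float,
                              now: float, grace_period: float = 40.0) -> str:
    """Return the Ready condition after one monitoring pass at time `now`."""
    if now - last_heartbeat > grace_period:
        # No heartbeat within the grace period: the state can no longer be trusted.
        return "Unknown"
    return current_ready


def first_unknown_check(last_heartbeat: float, monitor_period: float = 5.0,
                        grace_period: float = 40.0) -> float:
    """Time of the first periodic check that reports Unknown (monitor_period > 0)."""
    now = last_heartbeat
    while ready_after_monitor_check("True", last_heartbeat, now, grace_period) != "Unknown":
        now += monitor_period
    return now
```

Under these assumptions, a cluster whose last heartbeat arrived at t=0 is first reported as Unknown by the check that runs at t=45.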

Checking cluster status

You can use kubectl to inspect the details of a cluster's status:

kubectl describe cluster

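▍Failover process

Adding cluster taints

Once a cluster is judged to be unhealthy, a taint with Effect NoSchedule is added to it:

· When the cluster's Ready status is False, the following taint is added: key: cluster.karmada.io/not-ready effect: NoSchedule

· When the cluster's Ready status is Unknown, the following taint is added: key: cluster.karmada.io/unreachable effect: NoSchedule

If the cluster remains unhealthy for a period of time (configurable with the --failover-eviction-timeout flag, default: 5 minutes), a taint with Effect NoExecute is added to it:

· When the cluster's Ready status is False, the following taint is added: key: cluster.karmada.io/not-ready effect: NoExecute

· When the cluster's Ready status is Unknown, the following taint is added: key: cluster.karmada.io/unreachable effect: NoExecute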
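The two-stage tainting rule can be summarised in a small sketch. The taints_for function is a hypothetical helper, the 300-second default mirrors the --failover-eviction-timeout default, and keeping the NoSchedule taint alongside the NoExecute taint is an assumption based on the text saying the second taint is added rather than replaced.

```python
from typing import List, Tuple

NOT_READY = "cluster.karmada.io/not-ready"
UNREACHABLE = "cluster.karmada.io/unreachable"


def taints_for(ready: str, unhealthy_seconds: float,
               eviction_timeout: float = 300.0) -> List[Tuple[str, str]]:
    """Return (key, effect) pairs for a cluster given its Ready status.

    `ready` is "True", "False" or "Unknown"; `unhealthy_seconds` is how long
    the cluster has been in a non-ready state.
    """
    if ready == "True":
        return []
    key = NOT_READY if ready == "False" else UNREACHABLE
    taints = [(key, "NoSchedule")]  # added as soon as the cluster is unhealthy
    if unhealthy_seconds >= eviction_timeout:
        # Assumption: NoExecute is added on top of the existing NoSchedule taint.
        taints.append((key, "NoExecute"))
    return taints
```

For example, a cluster whose Ready status has been Unknown for 5 minutes would carry both the NoSchedule and NoExecute variants of the unreachable taint in this model.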

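Tolerating cluster taints

After a user creates a PropagationPolicy/ClusterPropagationPolicy resource, Karmada uses a webhook to automatically add the following cluster taint tolerations to it (taking PropagationPolicy as an example):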

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
  namespace: default
spec:
  placement:
    clusterTolerations:
    - effect: NoExecute
      key: cluster.karmada.io/not-ready
      operator: Exists
      tolerationSeconds: 600
    - effect: NoExecute
      key: cluster.karmada.io/unreachable
      operator: Exists
      tolerationSeconds: 600
  ...

Here, the tolerationSeconds values can be configured with the --default-not-ready-toleration-seconds and --default-unreachable-toleration-seconds flags; both default to 600.
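How a toleration with tolerationSeconds plays out against a NoExecute taint can be sketched as below. The is_tolerated function and the dictionary layout are illustrative assumptions modelled on the YAML above; treating a taint as no longer tolerated once its age reaches tolerationSeconds is also an assumption about the boundary.

```python
from typing import Dict, List

# Mirrors the tolerations the webhook adds by default, per the YAML above.
DEFAULT_TOLERATIONS: List[Dict] = [
    {"key": "cluster.karmada.io/not-ready", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 600},
    {"key": "cluster.karmada.io/unreachable", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 600},
]


def is_tolerated(taint_key: str, taint_effect: str, taint_age_seconds: float,
                 tolerations: List[Dict]) -> bool:
    """Return True while some toleration still covers the given taint."""
    for tol in tolerations:
        if (tol.get("operator") == "Exists"
                and tol["key"] == taint_key
                and tol["effect"] == taint_effect):
            seconds = tol.get("tolerationSeconds")
            # A toleration without tolerationSeconds never expires.
            if seconds is None or taint_age_seconds < seconds:
                return True
    return False
```

With the defaults, a not-ready NoExecute taint is tolerated for the first 600 seconds, after which the cluster becomes a candidate for removal from the scheduling result.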

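Failover

When Karmada detects that a failed cluster is no longer tolerated by a PropagationPolicy/ClusterPropagationPolicy, the cluster is removed from the resource scheduling result, and karmada-scheduler then reschedules the affected workloads. Rescheduling is subject to the following constraints:
· Each rescheduled workload must still satisfy the constraints of its PropagationPolicy/ClusterPropagationPolicy, such as ClusterAffinity or SpreadConstraints.
· Clusters that were healthy in the initial scheduling result are retained during rescheduling.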

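- Duplicated scheduling type

For the Duplicated scheduling type, when the number of candidate clusters that satisfy the propagation policy's constraints is not less than the number of failed clusters, the workload is rescheduled onto as many candidate clusters as there are failed clusters; otherwise, no rescheduling takes place.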

  ...
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
        - member3
        - member5
    spreadConstraints:
      - maxGroups: 2
        minGroups: 2
    replicaScheduling:
      replicaSchedulingType: Duplicated
  ...

Suppose there are 5 member clusters and the initial scheduling result places the workload on member1 and member2. When member2 fails, karmada-scheduler is triggered to reschedule.

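Note that rescheduling does not delete the workload on member1, which is still Ready. Of the remaining 3 clusters, only member3 and member5 match the clusterAffinity policy. Because of the spread constraints, the final scheduling result will be [member1, member3] or [member1, member5].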
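The Duplicated example can be traced with a simplified model. The reschedule_duplicated function is hypothetical and deliberately ignores everything except cluster affinity and the count of failed clusters; it picks substitutes in list order, whereas the real scheduler may choose either member3 or member5. Returning only the surviving clusters when there are too few candidates is also an assumption.

```python
from typing import List, Set


def reschedule_duplicated(current: List[str], failed: Set[str],
                          affinity: List[str]) -> List[str]:
    """Simplified Duplicated rescheduling.

    `current` is the initial scheduling result, `failed` the clusters no longer
    tolerated, `affinity` the clusterNames allowed by clusterAffinity.
    """
    healthy = [c for c in current if c not in failed]  # healthy clusters are kept
    lost = len(current) - len(healthy)
    candidates = [c for c in affinity if c not in current and c not in failed]
    if len(candidates) < lost:
        # Not enough candidates: no rescheduling, only the survivors remain.
        return healthy
    return healthy + candidates[:lost]
```

Calling it with the values from the example, current result [member1, member2], failed cluster member2 and affinity [member1, member2, member3, member5], yields [member1, member3], one of the two outcomes named above.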

- Divided scheduling type

For the Divided scheduling type, karmada-scheduler tries to migrate the application replicas to other healthy clusters.

  ...
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
    replicaScheduling:
      replicaDivisionPreference: Weighted
      replicaSchedulingType: Divided
      weightPreference:
        staticWeightList:
          - targetCluster:
              clusterNames:
                - member1
            weight: 1
          - targetCluster:
              clusterNames:
                - member2
            weight: 2
  ...

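karmada-scheduler divides the application replicas according to the weight list in weightPreference. In the initial scheduling result, member1 has 1 replica and member2 has 2 replicas. When member1 fails, rescheduling is triggered and the final result is that member2 has 3 replicas.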
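The replica arithmetic in this example can be checked with a short sketch of static-weight division. The divide_replicas function is a hypothetical helper; in particular, handing leftover replicas to the heaviest clusters first is an assumption made for illustration, not a description of how karmada-scheduler resolves remainders.

```python
from typing import Dict


def divide_replicas(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split `total` replicas across clusters in proportion to positive weights."""
    if not weights:
        return {}
    weight_sum = sum(weights.values())
    # Floor share for every cluster first.
    result = {name: total * w // weight_sum for name, w in weights.items()}
    remainder = total - sum(result.values())
    # Assumption: leftover replicas go to the heaviest clusters (name as tie-break).
    for name in sorted(weights, key=lambda n: (-weights[n], n))[:remainder]:
        result[name] += 1
    return result
```

With 3 replicas and weights member1: 1, member2: 2, the split is 1 and 2. After member1 fails, only member2 remains in the weight list, so all 3 replicas land on member2.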

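▍Graceful failover

To keep the service from being interrupted during cluster failover, Karmada must make sure that application replicas on the failed cluster are deleted only after the replicas are available on the new cluster. A GracefulEvictionTasks field was added to ResourceBinding/ClusterResourceBinding to represent the queue of graceful eviction tasks: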

 // GracefulEvictionTasks holds the eviction tasks that are expected to perform
 // the eviction in a graceful way.
 // The intended workflow is:
 // 1. Once the controller(such as 'taint-manager') decided to evict the resource that
 //    is referenced by current ResourceBinding or ClusterResourceBinding from a target
 //    cluster, it removes(or scale down the replicas) the target from Clusters(.spec.Clusters)
 //    and builds a graceful eviction task.
 // 2. The scheduler may perform a re-scheduler and probably select a substitute cluster
 //    to take over the evicting workload(resource).
 // 3. The graceful eviction controller takes care of the graceful eviction tasks and
 //    performs the final removal after the workload(resource) is available on the substitute
 //    cluster or exceed the grace termination period(defaults to 10 minutes).
 //
 // +optional
 GracefulEvictionTasks []GracefulEvictionTask `json:"gracefulEvictionTasks,omitempty"`

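When a failed cluster is removed from the resource scheduling result by taint-manager, it is added to the graceful eviction task queue. The gracefulEviction-controller processes the tasks in that queue, evaluating them one by one to decide whether each can be removed. A task can be removed when either of the following conditions is met: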

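· The health of the resource in the current scheduling result is checked. If the resource is healthy, the condition is met.
· The task's waiting time is checked against the timeout, which can be configured with the --graceful-eviction-timeout flag (default: 10 minutes). If the timeout has been exceeded, the condition is met.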
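The two removal conditions can be expressed as a small predicate. The EvictionTask class, its field names and the can_remove function are hypothetical names chosen for this sketch; only the 600-second default comes from the timeout quoted above.

```python
from dataclasses import dataclass


@dataclass
class EvictionTask:
    """Hypothetical stand-in for one entry in the graceful eviction queue."""
    from_cluster: str   # cluster the workload is being evicted from
    created_at: float   # when the task was queued, in seconds


def can_remove(task: EvictionTask, now: float, resource_healthy: bool,
               timeout: float = 600.0) -> bool:
    """Return True when the task may leave the queue.

    `resource_healthy` says whether the resource is healthy in the current
    scheduling result, that is, on the clusters that took over.
    """
    if resource_healthy:
        return True  # condition 1: the workload is available elsewhere
    return now - task.created_at > timeout  # condition 2: waited too long
```

A task queued at t=0 therefore stays in the queue at t=300 while the substitute is still unhealthy, and is released either as soon as the substitute turns healthy or once more than 600 seconds have passed.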

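▍Summary

Karmada's cross-cluster graceful failover feature makes workload migration smoother after a cluster failure. We hope this analysis helps you better understand and use Karmada's cross-cluster failover capability. More details about the feature are available on the Karmada website, and you can follow the Karmada releases page (https://github.com/karmada-io/karmada/releases) for the latest versions. If you have further interest in or ideas about cross-cluster failover, or are interested in other features, you are welcome to join the Karmada community and take part in its discussions and development.

Appendix: Karmada community technical channels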

Project:

https://github.com/karmada-io/karmada

Slack: https://slack.cncf.io/
