In-Depth Analysis of Kubernetes Critical Pod (Part 2)

Published by 清俠 on 2018-06-28

In-Depth Analysis of Kubernetes Critical Pod (Part 1) covered how the Scheduler handles Critical Pods. In this part we look at how the Kubelet Eviction Manager handles Critical Pods, so we can understand whether the kubelet protects Critical Pods when it evicts pods and, if so, how that protection works.

Kubelet Eviction Manager Admit

In its syncLoop, the kubelet calls syncLoopIteration in a loop every 1s; each iteration fetches an event from one of the config change channel, pleg channel, sync channel, houseKeeping channel, or liveness manager's update channel, and invokes the corresponding event handler:

  • configCh: dispatch the pods for the config change to the appropriate handler callback for the event type
  • plegCh: update the runtime cache; sync pod
  • syncCh: sync all pods waiting for sync
  • houseKeepingCh: trigger cleanup of pods
  • liveness manager's update channel: sync pods that have failed or in which one or more containers have failed liveness checks

Worth calling out: the houseKeeping channel receives an event every housekeeping period (10s), which triggers HandlePodCleanups to perform the following cleanup work (a simplified sketch of how these channels are wired into syncLoopIteration follows the list):

  • Stop the workers for pods that no longer exist. (Each pod has a corresponding worker, i.e. a goroutine.)
  • Kill unwanted pods.
  • Remove the volumes of pods that should not be running and that have no containers running.
  • Remove any orphaned mirror pods.
  • Remove any cgroups in the hierarchy for pods that are no longer running.
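
Before looking at syncLoopIteration itself, here is a minimal sketch of how these channels are wired up by the kubelet's syncLoop. This is simplified rather than verbatim source: the 1s sync ticker and 10s housekeeping ticker match the behavior described above, while updates and handler are assumed to be the pod config channel and the SyncHandler passed into syncLoop.

// Simplified sketch of the kubelet syncLoop wiring (not verbatim source).
syncTicker := time.NewTicker(time.Second)                // drives syncCh every 1s
defer syncTicker.Stop()
housekeepingTicker := time.NewTicker(housekeepingPeriod) // housekeepingPeriod is 10s
defer housekeepingTicker.Stop()
plegCh := kl.pleg.Watch()                                // pod lifecycle events
for {
    // Each iteration selects a single event from configCh / plegCh / syncCh /
    // housekeepingCh / livenessManager.Updates() and dispatches it to a handler.
    if !kl.syncLoopIteration(updates, handler, syncTicker.C, housekeepingTicker.C, plegCh) {
        break
    }
}
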
pkg/kubelet/kubelet.go:1753

func (kl *Kubelet) syncLoopIteration(configCh <-chan kubetypes.PodUpdate, handler SyncHandler,
    syncCh <-chan time.Time, housekeepingCh <-chan time.Time, plegCh <-chan *pleg.PodLifecycleEvent) bool {
    select {
    case u, open := <-configCh:
        
        if !open {
            glog.Errorf("Update channel is closed. Exiting the sync loop.")
            return false
        }

        switch u.Op {
        case kubetypes.ADD:
            
            handler.HandlePodAdditions(u.Pods)
        ...
        case kubetypes.RESTORE:
            glog.V(2).Infof("SyncLoop (RESTORE, %q): %q", u.Source, format.Pods(u.Pods))
            // These are pods restored from the checkpoint. Treat them as new
            // pods.
            handler.HandlePodAdditions(u.Pods)
        ...
        }

        if u.Op != kubetypes.RESTORE {
            ...
        }
    case e := <-plegCh:
        ...
    case <-syncCh:
        ...
    case update := <-kl.livenessManager.Updates():
        ...
    case <-housekeepingCh:
        ...
    }
    return true
}

syncLoopIteration also defines the logic for when the kubelet restarts after a configuration change: the kubelet runs Admission on the Pods that are already running, and the Admission result may cause a Pod to be rejected by this node.

HandlePodAdditions is the handler that processes the events coming from the kubelet's configCh.

// HandlePodAdditions is the callback in SyncHandler for pods being added from a config source.
func (kl *Kubelet) HandlePodAdditions(pods []*v1.Pod) {
    start := kl.clock.Now()
    sort.Sort(sliceutils.PodsByCreationTime(pods))
    for _, pod := range pods {
        ...

        if !kl.podIsTerminated(pod) {
            ...
            // Check if we can admit the pod; if not, reject it.
            if ok, reason, message := kl.canAdmitPod(activePods, pod); !ok {
                kl.rejectPod(pod, reason, message)
                continue
            }
        }
        ...
    }
}

If the Pod's status is not Terminated, canAdmitPod is called to run an admission check on the Pod. If the admission check rejects the Pod, the Pod's Phase is set to Failed.
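
The rejection itself happens in rejectPod, which records a warning event and marks the Pod as Failed through the status manager. A slightly condensed sketch (not verbatim source):

// rejectPod records an event about the pod and updates its status to Failed,
// which is how a rejected pod ends up with Phase=Failed.
func (kl *Kubelet) rejectPod(pod *v1.Pod, reason, message string) {
    kl.recorder.Eventf(pod, v1.EventTypeWarning, reason, message)
    kl.statusManager.SetPodStatus(pod, v1.PodStatus{
        Phase:   v1.PodFailed,
        Reason:  reason,
        Message: "Pod " + message})
}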

pkg/kubelet/kubelet.go:1643

func (kl *Kubelet) canAdmitPod(pods []*v1.Pod, pod *v1.Pod) (bool, string, string) {
    // the kubelet will invoke each pod admit handler in sequence
    // if any handler rejects, the pod is rejected.
    // TODO: move out of disk check into a pod admitter
    // TODO: out of resource eviction should have a pod admitter call-out
    attrs := &lifecycle.PodAdmitAttributes{Pod: pod, OtherPods: pods}
    for _, podAdmitHandler := range kl.admitHandlers {
        if result := podAdmitHandler.Admit(attrs); !result.Admit {
            return false, result.Reason, result.Message
        }
    }

    return true, "", ""
}

canAdmitPod invokes the series of admitHandlers that were registered when the kubelet started to run admission checks on the Pod, and among them is the admit handler belonging to the kubelet eviction manager.
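
Where does that admit handler come from? When the kubelet is constructed, eviction.NewManager returns both the eviction manager and a lifecycle.PodAdmitHandler, and the latter is appended to the kubelet's admitHandlers. A rough sketch of the wiring in NewMainKubelet (arguments elided; treat it as illustrative rather than verbatim):

// The eviction manager doubles as a pod admit handler; registering it here is
// what makes canAdmitPod consult it for every pod added to the node.
evictionManager, evictionAdmitHandler := eviction.NewManager(/* resource analyzer, eviction config, kill-pod func, ... */)
klet.evictionManager = evictionManager
klet.admitHandlers.AddPodAdmitHandler(evictionAdmitHandler)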

pkg/kubelet/eviction/eviction_manager.go:123

// Admit rejects a pod if its not safe to admit for node stability.
func (m *managerImpl) Admit(attrs *lifecycle.PodAdmitAttributes) lifecycle.PodAdmitResult {
    m.RLock()
    defer m.RUnlock()
    if len(m.nodeConditions) == 0 {
        return lifecycle.PodAdmitResult{Admit: true}
    }
    
    if utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) && kubelettypes.IsCriticalPod(attrs.Pod) {
        return lifecycle.PodAdmitResult{Admit: true}
    }

    if hasNodeCondition(m.nodeConditions, v1.NodeMemoryPressure) {
        notBestEffort := v1.PodQOSBestEffort != v1qos.GetPodQOS(attrs.Pod)
        if notBestEffort {
            return lifecycle.PodAdmitResult{Admit: true}
        }
    }

    return lifecycle.PodAdmitResult{
        Admit:   false,
        Reason:  reason,
        Message: fmt.Sprintf(message, m.nodeConditions),
    }
}

The eviction manager's Admit logic is as follows:

  • If the node's Conditions are empty, the Pod is admitted;
  • If the ExperimentalCriticalPodAnnotation feature gate is enabled and the Pod is a Critical Pod (it carries the critical-pod annotation, or its priority is not lower than SystemCriticalPriority), the Pod is admitted; see the sketch after this list.

    • SystemCriticalPriority is 2 billion (2000000000).
  • If the node has the MemoryPressure condition and the Pod's QoS class is not BestEffort, the Pod is admitted;
  • In all other cases admission fails, i.e. the Pod is not allowed to run on this node.
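
For reference, the critical-pod check itself boils down to the annotation or the pod priority. The helper below is a simplified stand-in for kubelettypes.IsCriticalPod, not the verbatim source; the annotation key and the 2000000000 threshold are as described above.

// isCriticalPod (simplified): a pod is treated as critical if it carries the
// critical-pod annotation, or if its priority is at least SystemCriticalPriority.
const systemCriticalPriority = 2000000000 // 2 billion

func isCriticalPod(pod *v1.Pod) bool {
    if _, ok := pod.Annotations["scheduler.alpha.kubernetes.io/critical-pod"]; ok {
        return true
    }
    return pod.Spec.Priority != nil && *pod.Spec.Priority >= systemCriticalPriority
}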

Kubelet Eviction Manager SyncLoop

In addition, the kubelet eviction manager's own syncLoop also gives Critical Pods special treatment; the code is shown below.

pkg/kubelet/eviction/eviction_manager.go:226

// synchronize is the main control loop that enforces eviction thresholds.
// Returns the pod that was killed, or nil if no pod was killed.
func (m *managerImpl) synchronize(diskInfoProvider DiskInfoProvider, podFunc ActivePodsFunc) []*v1.Pod {
    ...

    // we kill at most a single pod during each eviction interval
    for i := range activePods {
        pod := activePods[i]
        
        if utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) &&
            kubelettypes.IsCriticalPod(pod) && kubepod.IsStaticPod(pod) {
            continue
        }
        ...
        return []*v1.Pod{pod}
    }
    glog.Infof("eviction manager: unable to evict any pods from the node")
    return nil
}

When a kubelet pod eviction is triggered, a pod will not be killed by the kubelet eviction manager if it satisfies all of the following conditions:

  • The Pod's status is not Terminated;
  • The ExperimentalCriticalPodAnnotation feature gate is enabled;
  • The Pod is a Critical Pod;
  • The Pod is a Static Pod (a sketch of this check follows the list).
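
For completeness, the static-pod check looks at where the pod's config came from: the kubelet records the source in the "kubernetes.io/config.source" annotation, and anything other than the apiserver (i.e. file or http manifests) makes it a static pod. A condensed stand-in for kubepod.IsStaticPod, not the verbatim source:

// isStaticPod (simplified): a pod is static if its config source annotation is
// present and is not "api" (the apiserver source).
func isStaticPod(pod *v1.Pod) bool {
    source, ok := pod.Annotations["kubernetes.io/config.source"]
    return ok && source != "api"
}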

Summary

From the analysis above, the key points of how the Kubelet Eviction Manager handles Critical Pods are:

  • After the kubelet restarts, the eviction manager's Admit flow gives Critical Pods the following special treatment: if the ExperimentalCriticalPodAnnotation feature gate is enabled, the Critical Pod is admitted to the node regardless of the node's Conditions.
  • When a kubelet pod eviction is triggered, a Critical Pod will not be killed by the kubelet eviction manager if it satisfies all of the following conditions:

    • The Pod's status is not Terminated;
    • The ExperimentalCriticalPodAnnotation feature gate is enabled;
    • The Pod is a Static Pod.

