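In-Depth Analysis of Kubernetes Critical Pod (Part 2)
In-Depth Analysis of Kubernetes Critical Pod (Part 1) covered how the Scheduler handles Critical Pods. This part looks at how the Kubelet Eviction Manager handles them: whether Critical Pods get any protection when the kubelet evicts Pods and, if so, how that protection works.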
Kubelet Eviction Manager Admit
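Inside syncLoop, the kubelet calls syncLoopIteration in a loop (the sync channel ticks every 1s). Each iteration receives an event from one of the config change channel, pleg channel, sync channel, houseKeeping channel, or the liveness manager's update channel, and then calls the corresponding event handler: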
- configCh: dispatch the pods for the config change to the appropriate handler callback for the event type
- plegCh: update the runtime cache; sync pod
- syncCh: sync all pods waiting for sync
- houseKeepingCh: trigger cleanup of pods
- liveness manager's update channel: sync pods that have failed or in which one or more containers have failed liveness checks
One detail worth noting: the houseKeeping channel delivers an event every housekeeping period (10s), which runs HandlePodCleanups to perform the following cleanup:
- Stop the workers for pods that no longer exist (each pod has a corresponding worker, i.e. a goroutine).
- Kill unwanted pods.
- Remove the volumes of pods that should not be running and that have no containers running.
- Remove any orphaned mirror pods.
- Remove any cgroups in the hierarchy for pods that are no longer running.
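The channel-driven loop described above can be sketched in a few lines of Go. This is a minimal illustration, not kubelet code: `handleOne`, the string-typed config channel, and the shortened ticker periods are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// handleOne waits for a single event on any of the channels and reports
// which handler would run. It returns false when configCh is closed,
// mirroring how the real loop exits. All names here are hypothetical
// simplifications, not kubelet APIs.
func handleOne(configCh <-chan string, syncCh, housekeepingCh <-chan time.Time) (string, bool) {
	select {
	case u, open := <-configCh:
		if !open {
			return "config channel closed", false
		}
		return "config: " + u, true
	case <-syncCh:
		return "sync", true
	case <-housekeepingCh:
		return "housekeeping", true
	}
}

func main() {
	configCh := make(chan string, 1)
	// Shortened periods so the example finishes quickly.
	syncTicker := time.NewTicker(10 * time.Millisecond)
	housekeepingTicker := time.NewTicker(50 * time.Millisecond)
	defer syncTicker.Stop()
	defer housekeepingTicker.Stop()

	configCh <- "ADD pod-a"
	for i := 0; i < 5; i++ {
		event, ok := handleOne(configCh, syncTicker.C, housekeepingTicker.C)
		fmt.Println(event)
		if !ok {
			return
		}
	}
}
```

Each call handles exactly one event, so a slow handler delays every other channel. The order in which the ticks arrive is not deterministic.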
pkg/kubelet/kubelet.go:1753
func (kl *Kubelet) syncLoopIteration(configCh <-chan kubetypes.PodUpdate, handler SyncHandler,
	syncCh <-chan time.Time, housekeepingCh <-chan time.Time, plegCh <-chan *pleg.PodLifecycleEvent) bool {
	select {
	case u, open := <-configCh:
		if !open {
			glog.Errorf("Update channel is closed. Exiting the sync loop.")
			return false
		}
		switch u.Op {
		case kubetypes.ADD:
			handler.HandlePodAdditions(u.Pods)
		...
		case kubetypes.RESTORE:
			glog.V(2).Infof("SyncLoop (RESTORE, %q): %q", u.Source, format.Pods(u.Pods))
			// These are pods restored from the checkpoint. Treat them as new
			// pods.
			handler.HandlePodAdditions(u.Pods)
		...
		}
		if u.Op != kubetypes.RESTORE {
			...
		}
	case e := <-plegCh:
		...
	case <-syncCh:
		...
	case update := <-kl.livenessManager.Updates():
		...
	case <-housekeepingCh:
		...
	}
	return true
}
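syncLoopIteration also defines what happens after the kubelet restarts following a configuration change: the kubelet runs admission on the Pods that are already running, and the admission result may cause a Pod to be rejected by this node.
HandlePodAdditions is the handler that processes pod additions (the ADD and RESTORE operations) arriving on the kubelet's configCh.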
// HandlePodAdditions is the callback in SyncHandler for pods being added from a config source.
func (kl *Kubelet) HandlePodAdditions(pods []*v1.Pod) {
	start := kl.clock.Now()
	sort.Sort(sliceutils.PodsByCreationTime(pods))
	for _, pod := range pods {
		...
		if !kl.podIsTerminated(pod) {
			...
			// Check if we can admit the pod; if not, reject it.
			if ok, reason, message := kl.canAdmitPod(activePods, pod); !ok {
				kl.rejectPod(pod, reason, message)
				continue
			}
		}
		...
	}
}
If the Pod's status is not Terminated, canAdmitPod runs the admission check on it. If the check rejects the Pod, the Pod's phase is set to Failed.
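The flow just described (oldest pod first, skip terminated pods, mark rejected pods as Failed) can be sketched with simplified types. `Pod`, `isTerminated`, and `handleAdditions` below are hypothetical stand-ins for illustration. The real podIsTerminated and rejectPod do more than this, and the steps elided with `...` in the kubelet code are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a simplified stand-in for v1.Pod; the fields are hypothetical.
type Pod struct {
	Name      string
	CreatedAt int    // creation order, smaller means older
	Phase     string // "Pending", "Running", "Succeeded" or "Failed"
	Reason    string
}

// isTerminated reports whether the pod has already finished.
func isTerminated(p *Pod) bool {
	return p.Phase == "Succeeded" || p.Phase == "Failed"
}

// handleAdditions walks pods oldest first and marks every pod that fails
// admission as Failed. canAdmit is a hypothetical admission callback.
func handleAdditions(pods []*Pod, canAdmit func(*Pod) (bool, string)) {
	sort.Slice(pods, func(i, j int) bool { return pods[i].CreatedAt < pods[j].CreatedAt })
	for _, p := range pods {
		if isTerminated(p) {
			continue // terminated pods skip the admission check
		}
		if ok, reason := canAdmit(p); !ok {
			p.Phase, p.Reason = "Failed", reason
		}
	}
}

func main() {
	pods := []*Pod{
		{Name: "b", CreatedAt: 2, Phase: "Pending"},
		{Name: "a", CreatedAt: 1, Phase: "Running"},
		{Name: "done", CreatedAt: 0, Phase: "Succeeded"},
	}
	// Reject the pod named "b" to show the rejection path.
	handleAdditions(pods, func(p *Pod) (bool, string) {
		return p.Name != "b", "Evicted"
	})
	for _, p := range pods {
		fmt.Printf("%s phase=%s reason=%q\n", p.Name, p.Phase, p.Reason)
	}
}
```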
pkg/kubelet/kubelet.go:1643
func (kl *Kubelet) canAdmitPod(pods []*v1.Pod, pod *v1.Pod) (bool, string, string) {
	// the kubelet will invoke each pod admit handler in sequence
	// if any handler rejects, the pod is rejected.
	// TODO: move out of disk check into a pod admitter
	// TODO: out of resource eviction should have a pod admitter call-out
	attrs := &lifecycle.PodAdmitAttributes{Pod: pod, OtherPods: pods}
	for _, podAdmitHandler := range kl.admitHandlers {
		if result := podAdmitHandler.Admit(attrs); !result.Admit {
			return false, result.Reason, result.Message
		}
	}
	return true, "", ""
}
canAdmitPod calls the series of admitHandlers registered when the kubelet starts, running each one against the Pod. The kubelet eviction manager's admit handler is one of them.
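The handler chain stops at the first rejection, so later handlers never see a Pod that an earlier one refused. A minimal sketch of that pattern, assuming simplified `AdmitResult` and `AdmitHandler` types in place of the lifecycle package's types (the `AdmitFunc` adapter and the reason and message strings are illustrative only):

```go
package main

import "fmt"

// AdmitResult and AdmitHandler are simplified stand-ins for the lifecycle
// package's PodAdmitResult and PodAdmitHandler.
type AdmitResult struct {
	Admit   bool
	Reason  string
	Message string
}

type AdmitHandler interface {
	Admit(podName string) AdmitResult
}

// AdmitFunc adapts a plain function to the AdmitHandler interface
// (a hypothetical helper for this example).
type AdmitFunc func(podName string) AdmitResult

func (f AdmitFunc) Admit(podName string) AdmitResult { return f(podName) }

// canAdmit runs the handlers in order and stops at the first rejection.
func canAdmit(handlers []AdmitHandler, podName string) (bool, string, string) {
	for _, h := range handlers {
		if r := h.Admit(podName); !r.Admit {
			return false, r.Reason, r.Message
		}
	}
	return true, "", ""
}

func main() {
	calls := 0
	allow := AdmitFunc(func(string) AdmitResult {
		calls++
		return AdmitResult{Admit: true}
	})
	deny := AdmitFunc(func(string) AdmitResult {
		calls++
		return AdmitResult{Reason: "Evicted", Message: "node has conditions"}
	})
	ok, reason, _ := canAdmit([]AdmitHandler{allow, deny, allow}, "pod-a")
	fmt.Println(ok, reason, calls) // prints: false Evicted 2
}
```

The call count of 2 shows that the third handler is skipped once the second one rejects.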
pkg/kubelet/eviction/eviction_manager.go:123
// Admit rejects a pod if its not safe to admit for node stability.
func (m *managerImpl) Admit(attrs *lifecycle.PodAdmitAttributes) lifecycle.PodAdmitResult {
	m.RLock()
	defer m.RUnlock()
	if len(m.nodeConditions) == 0 {
		return lifecycle.PodAdmitResult{Admit: true}
	}
	if utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) && kubelettypes.IsCriticalPod(attrs.Pod) {
		return lifecycle.PodAdmitResult{Admit: true}
	}
	if hasNodeCondition(m.nodeConditions, v1.NodeMemoryPressure) {
		notBestEffort := v1.PodQOSBestEffort != v1qos.GetPodQOS(attrs.Pod)
		if notBestEffort {
			return lifecycle.PodAdmitResult{Admit: true}
		}
	}
	return lifecycle.PodAdmitResult{
		Admit:   false,
		Reason:  reason,
		Message: fmt.Sprintf(message, m.nodeConditions),
	}
}
The eviction manager's Admit logic is as follows:
- If the node has no Conditions, the Pod is admitted.
- If the ExperimentalCriticalPodAnnotation feature gate is enabled and the Pod is a Critical Pod (it carries the critical annotation, or its priority is at least SystemCriticalPriority, which is 2 billion), the Pod is admitted.
- If the node has the Memory Pressure condition and the Pod's QoS class is not best-effort, the Pod is admitted.
- In every other case admission fails, meaning the Pod is not allowed to run on this node.
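The four rules above can be written as a small pure function, which makes the edge cases easy to check. This is a sketch under stated assumptions: `Pod`, `isCritical`, `hasCondition`, and `admit` are hypothetical names, the critical annotation is reduced to a boolean, and any other checks the real IsCriticalPod performs are ignored.

```go
package main

import "fmt"

// systemCriticalPriority is 2 billion, as stated above.
const systemCriticalPriority = 2000000000

// Pod is a simplified stand-in for v1.Pod; the fields are hypothetical.
type Pod struct {
	Name               string
	CriticalAnnotation bool  // pod carries the critical annotation
	Priority           int32 // pod priority
	BestEffort         bool  // QoS class is BestEffort
}

// isCritical follows the description above: annotation or high priority.
func isCritical(p Pod) bool {
	return p.CriticalAnnotation || p.Priority >= systemCriticalPriority
}

func hasCondition(conditions []string, want string) bool {
	for _, c := range conditions {
		if c == want {
			return true
		}
	}
	return false
}

// admit mirrors the four rules listed above, in the same order.
func admit(p Pod, conditions []string, gateEnabled bool) bool {
	if len(conditions) == 0 {
		return true
	}
	if gateEnabled && isCritical(p) {
		return true
	}
	if hasCondition(conditions, "MemoryPressure") && !p.BestEffort {
		return true
	}
	return false
}

func main() {
	memPressure := []string{"MemoryPressure"}
	diskPressure := []string{"DiskPressure"}
	critical := Pod{Name: "critical", CriticalAnnotation: true, BestEffort: true}
	burstable := Pod{Name: "burstable"}
	bestEffort := Pod{Name: "best-effort", BestEffort: true}

	fmt.Println("no conditions:", admit(bestEffort, nil, true))
	fmt.Println("critical, gate on:", admit(critical, diskPressure, true))
	fmt.Println("critical, gate off:", admit(critical, diskPressure, false))
	fmt.Println("memory pressure, burstable:", admit(burstable, memPressure, true))
	fmt.Println("memory pressure, best-effort:", admit(bestEffort, memPressure, true))
}
```

With the gate off, a Critical Pod gets no special treatment and falls through to the same checks as any other Pod.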
Kubelet Eviction Manager SyncLoop
The kubelet eviction manager's syncLoop also gives Critical Pods special treatment, as the following code shows.
pkg/kubelet/eviction/eviction_manager.go:226
// synchronize is the main control loop that enforces eviction thresholds.
// Returns the pod that was killed, or nil if no pod was killed.
func (m *managerImpl) synchronize(diskInfoProvider DiskInfoProvider, podFunc ActivePodsFunc) []*v1.Pod {
	...
	// we kill at most a single pod during each eviction interval
	for i := range activePods {
		pod := activePods[i]
		if utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) &&
			kubelettypes.IsCriticalPod(pod) && kubepod.IsStaticPod(pod) {
			continue
		}
		...
		return []*v1.Pod{pod}
	}
	glog.Infof("eviction manager: unable to evict any pods from the node")
	return nil
}
When a kubelet eviction is triggered, a Pod is not killed by the kubelet eviction manager if it satisfies all of the following conditions:
- The Pod's status is not Terminated.
- The ExperimentalCriticalPodAnnotation feature gate is enabled.
- The Pod is a Critical Pod.
- The Pod is a Static Pod.
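The skip rule can be illustrated with a short sketch. It assumes the candidate list is already ranked and contains only active pods, and that killing the chosen pod always succeeds. `Pod` and `pickVictim` are hypothetical names for this example, not part of the eviction package.

```go
package main

import "fmt"

// Pod is a simplified stand-in for v1.Pod; the fields are hypothetical.
type Pod struct {
	Name     string
	Critical bool
	Static   bool
}

// pickVictim returns the first pod in the ranked list that may be evicted,
// or nil if every candidate is protected. At most one pod is chosen per call.
func pickVictim(ranked []Pod, gateEnabled bool) *Pod {
	for i := range ranked {
		p := &ranked[i]
		if gateEnabled && p.Critical && p.Static {
			continue // protected: critical and static, with the gate enabled
		}
		return p
	}
	return nil
}

func main() {
	ranked := []Pod{
		{Name: "kube-apiserver", Critical: true, Static: true},
		{Name: "critical-addon", Critical: true},
		{Name: "web"},
	}
	if v := pickVictim(ranked, true); v != nil {
		fmt.Println("evict:", v.Name) // prints: evict: critical-addon
	}
}
```

The example shows that being critical is not enough on its own: the critical but non-static pod is still chosen for eviction.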
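Summary
The analysis above gives the following key points about how the Kubelet Eviction Manager handles Critical Pods:
- After the kubelet restarts, the eviction manager's Admit flow treats Critical Pods specially: if the ExperimentalCriticalPodAnnotation feature gate is enabled, a Critical Pod is admitted to the node regardless of the node's Conditions.
- When a kubelet eviction is triggered, a Critical Pod is not killed by the kubelet eviction manager if it satisfies all of the following conditions:
  - The Pod's status is not Terminated.
  - The ExperimentalCriticalPodAnnotation feature gate is enabled.
  - The Pod is a Static Pod.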