Kubernetes Scheduler
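Kubernetes relies on the scheduler component to make sure every pod waiting to be scheduled finds a suitable node in the cluster and runs there in its desired state. While scheduling, the scheduler does not modify the Pod resource. It reads data from it, picks the most suitable node according to the configured policies, and then binds the Pod to that node through an API call, which completes the scheduling process.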
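How it works
- kubelet overview
When a user request reaches the scheduler through the API server, the scheduler's algorithm works out the node best suited to run the pod and returns the result to the API server, which stores it in etcd. Unless the node goes down or the pod is evicted (for example because of OOM), the pod keeps running on that node; even if its containers are restarted, the scheduling result does not change. Meanwhile, the kubelet on each node keeps watching the API server. As soon as an event concerning its own node appears, the kubelet fetches the resource manifest declared on the API server and creates the pod from it: pulling the image or starting from a local one, mounting volumes if required, and so on.
- kube-proxy overview
Creating a service works the same way as creating a pod. The only difference is that a service is just a set of iptables or IPVS (LVS) rules on the node, which the node's kube-proxy generates after watching the API server.
- API server data serialization
To the API server, every request comes from a client and is checked for authentication and authorization. Clients differ only in how they serialize data: kubectl uses JSON, while components inside the cluster communicate using Protobuf, which was developed by Google.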
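Scheduler algorithm
Kubernetes ships with a default scheduler that covers the Pod scheduling needs of most scenarios. It combines built-in and customizable policies to pick the node in the cluster best suited to run the current Pod. Its core goal is to spread Pods fairly across cluster nodes based on resource availability. The default scheduler, also called the generic scheduler, completes scheduling in three steps: node filtering (Predicate), node priority ranking (Priority), and node selection (Select).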
Predicate
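A container can be constrained along two dimensions. The first is the resource request, the baseline that must be satisfied before the container can run. The second is the resource limit; once the limit is reached, no further memory is allocated. The container itself reports its current usage. Nodes that cannot satisfy the baseline request are filtered out in the Predicate step. The same goes for other cases, such as a container that needs to listen on a node port that is already taken. In short, this step removes every node that cannot meet the basic requirements for running the pod. Filtering follows a one-vote veto rule: a node that fails any single predicate is rejected.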
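The one-vote veto can be sketched in Python as follows. This is a simplified illustration, not the scheduler's actual code: the `Node` and `Pod` classes and the two predicate functions are hypothetical stand-ins for the resource and host-port checks described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical, simplified view of a node (illustration only).
    name: str
    free_cpu_millis: int
    free_mem_mib: int
    used_host_ports: set = field(default_factory=set)


@dataclass
class Pod:
    cpu_request_millis: int
    mem_request_mib: int
    host_ports: set = field(default_factory=set)


def fits_resources(pod, node):
    """The node must have enough free CPU and memory for the pod's requests."""
    return (pod.cpu_request_millis <= node.free_cpu_millis
            and pod.mem_request_mib <= node.free_mem_mib)


def fits_host_ports(pod, node):
    """None of the pod's host ports may already be taken on the node."""
    return not (pod.host_ports & node.used_host_ports)


PREDICATES = [fits_resources, fits_host_ports]


def filter_nodes(pod, nodes):
    """Keep only nodes that pass every predicate; one failure vetoes the node."""
    return [n for n in nodes if all(check(pod, n) for check in PREDICATES)]


nodes = [
    Node("node-a", 2000, 4096),
    Node("node-b", 100, 4096),         # too little free CPU
    Node("node-c", 2000, 4096, {80}),  # port 80 already in use
]
pod = Pod(500, 256, {80})
feasible = [n.name for n in filter_nodes(pod, nodes)]
print(feasible)
```

Only `node-a` survives here, because each of the other two nodes fails one predicate.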
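Kubernetes 1.10 supports the predicates listed below. Only a subset of the scheduler's policies is enabled by default; to activate the others, add them when deploying the scheduler or in a later configuration change.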
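Common predicates
- CheckNodeCondition: checks whether a pod may be scheduled onto a node that reports its disk or network as unavailable, or that is not ready. Enabled by default.
- GeneralPredicates: a group of predicates, enabled by default, made up of:
  - HostName: checks pod.spec.nodeName; if the pod names a node, only the node with that name passes.
  - PodFitsHostPorts: checks pod.spec.containers.ports.hostPort; if a container declares host ports, checks whether another pod on the node already uses them.
  - MatchNodeSelector: checks whether the node carries the labels required by the pod's node selector.
  - PodFitsResources: checks whether the node can satisfy the pod's resource requests; a node's current allocation is listed under Allocated resources in the output of kubectl describe node.
- NoDiskConflict: checks that there is no disk conflict, that is, whether the node can satisfy the pod's volume requirements. Not enabled by default.
- PodToleratesNodeTaints: checks whether the pod's pod.spec.tolerations tolerate the node's taints. Enabled by default.
- PodToleratesNodeNoExecuteTaints: checks whether the pod's pod.spec.tolerations tolerate the node's NoExecute taints. Not enabled by default.
- CheckNodeLabelPresence: checks for the presence of labels on the node. Not enabled by default.
- CheckServiceAffinity: decides whether to schedule onto a node based on whether other pods of the same service already run there. Not enabled by default.
- Three cloud-provider volume-count predicates (AWS EBS, GCE PD, Azure Disk), enabled by default:
  - MaxEBSVolumeCount
  - MaxGCEPDVolumeCount
  - MaxAzureDiskVolumeCount
- CheckVolumeBinding: checks whether the bound and unbound PVCs on the node can satisfy the pod's volume requirements. Enabled by default.
- NoVolumeZoneConflict: checks, within the current zone, whether the node's volumes conflict with the pod. Enabled by default.
- CheckNodeMemoryPressure: checks whether the node is under memory pressure. Enabled by default.
- CheckNodePIDPressure: checks whether the node is under PID pressure. Enabled by default.
- CheckNodeDiskPressure: checks whether the node is under disk pressure. Enabled by default.
- MatchInterPodAffinity: checks whether the node satisfies the pod's affinity and anti-affinity rules. Enabled by default.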
Priority
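Once the predicates have produced a list of feasible nodes, the second phase begins. Here the scheduler runs a series of priority functions against every node that passed filtering to compute its priority score. Scores range from 0 to 10, where 0 means unsuitable and 10 means best suited to host the Pod.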
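Common priority functions
- LeastRequested: the ratio of a node's free resources to its total capacity; a higher score means more free capacity and therefore a better fit. It is computed as:
  (CPU(capacity - sum(pod_requested)) * 10 / capacity + MEM(capacity - sum(pod_requested)) * 10 / capacity) / 2
  Each term is multiplied by 10 because every priority function scores on a scale that tops out at 10. The CPU and MEM scores are added and the sum is divided by 2 because there are two dimensions.
- BalancedResourceAllocation: the closer the CPU and MEM utilization ratios are to each other, the higher the score. It has to be combined with LeastRequested to assess a node's resource usage.
- NodePreferAvoidPods: this priority function has a default weight of 10000. It scores a node according to whether the node carries the annotation scheduler.alpha.kubernetes.io/preferAvoidPods:
  - Without the annotation, the node scores 10, multiplied by the weight 10000.
  - With the annotation, Pods managed by a ReplicationController or ReplicaSet score 0 on that node; other Pods are ignored (they get the highest score).
- NodeAffinity: scores nodes by the pod's node affinity preferences. It checks how well a given node matches the node selector terms in the Pod; the more terms match, the higher the score. The evaluation uses the preferred, non-mandatory PreferredDuringSchedulingIgnoredDuringExecution selectors.
- TaintToleration: scores nodes by the Pod's toleration of node taints. It matches the Pod's tolerations list against the node's taints; the more entries match, the lower the node's score.
- SelectorSpread: selector spreading. It looks up the Service, ReplicationController, ReplicaSet (RS), and StatefulSet objects that match the current pod, then finds the existing Pods matched by those selectors and the nodes they run on. Nodes running fewer such Pods score higher. Put simply, and as the name suggests, this function tries to spread Pods matched by the same selector across different nodes.
- InterPodAffinity: walks the pod's affinity terms and adds up those that match the given node; the larger the sum, the higher the score.
- MostRequested: uses the same calculation as LeastRequested but scores the other way round, so it tries to fill a node up as much as possible. It is generally not used together with LeastRequested.
- NodeLabel: scores a node by whether it carries certain labels: it scores when the label is present and does not when it is absent, or it scores by the number of labels.
- ImageLocality: scores a node by the container images it already holds that the current Pod depends on. A node holding none of the images scores 0. Among nodes that do hold some, the larger the total size of the required images already present, the higher the score, which saves download bandwidth.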
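The LeastRequested formula above, and its MostRequested mirror image, can be worked through in a short Python sketch. It uses floating-point arithmetic and made-up capacities for readability, so it follows the formula as written here rather than the scheduler's exact implementation; `requested` stands for the sum of pod requests already on the node.

```python
def least_requested(capacity, requested):
    """Score 0-10: share of capacity that is still free, scaled by 10."""
    if capacity == 0 or requested > capacity:
        return 0.0
    return (capacity - requested) * 10 / capacity


def most_requested(capacity, requested):
    """Score 0-10: share of capacity that is already requested, scaled by 10."""
    if capacity == 0 or requested > capacity:
        return 0.0
    return requested * 10 / capacity


def node_score(score_fn, cpu_cap, cpu_req, mem_cap, mem_req):
    """Average the CPU and memory scores, since there are two dimensions."""
    return (score_fn(cpu_cap, cpu_req) + score_fn(mem_cap, mem_req)) / 2


# Example node: 4000m CPU with 1000m requested, 8192 MiB memory with 4096 MiB requested.
least = node_score(least_requested, 4000, 1000, 8192, 4096)
most = node_score(most_requested, 4000, 1000, 8192, 4096)
print(least, most)  # 6.25 3.75
```

The CPU dimension contributes 7.5 and the memory dimension 5.0 to LeastRequested, giving 6.25; MostRequested scores the same node 3.75, which is why the two are normally not enabled together.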
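Priority evaluation:
Each pod is evaluated against every enabled priority function and the scores are added up. The node with the highest total is the best; if several nodes tie, the Select phase decides. The scheduler also lets you assign each priority function a simple weight, expressed as a positive number. When computing a node's priority, it first multiplies each function's score by its weight (most priority functions have a default weight of 1) and then sums the results to obtain the node's final score. Weights give administrators a way to bias the scheduler towards particular priority functions. The final score of each node is computed as: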
finalScoreNode=(weight1*priorityFunc1)+(weight2*priorityFunc2)+ ...
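A minimal Python sketch of this weighted sum follows. The per-function scores and node names are invented for illustration; only the weight of 10000 for NodePreferAvoidPods and the default weight of 1 come from the text.

```python
def final_score(scores, weights):
    """Sum weight * score over all priority functions (default weight is 1)."""
    return sum(weights.get(name, 1) * score for name, score in scores.items())


weights = {"NodePreferAvoidPods": 10000}

# Hypothetical per-function scores (0-10) for two nodes.
node_scores = {
    "node-a": {"LeastRequested": 6, "BalancedResourceAllocation": 8,
               "NodePreferAvoidPods": 10},
    "node-b": {"LeastRequested": 9, "BalancedResourceAllocation": 9,
               "NodePreferAvoidPods": 0},
}

totals = {node: final_score(s, weights) for node, s in node_scores.items()}
print(totals)  # {'node-a': 100014, 'node-b': 18}
```

The large weight means a node that scores 0 on NodePreferAvoidPods cannot win on the other functions, however well it scores on them.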
Select
Bind the pod to the best-scoring node; if more than one node shares the top score, pick one of them at random.
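The selection step is small enough to sketch directly. The function below is an illustration with hypothetical names, not the scheduler's code; the `rng` parameter exists only so the random tie-break can be made repeatable.

```python
import random


def select_node(totals, rng=random):
    """Pick a node with the highest total score; break ties at random."""
    best = max(totals.values())
    candidates = sorted(name for name, score in totals.items() if score == best)
    return rng.choice(candidates)


totals = {"node-a": 17, "node-b": 21, "node-c": 21}
chosen = select_node(totals, random.Random(0))
print(chosen)  # either node-b or node-c
```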
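Special preferences
Special preferences give particular pods a way to choose their nodes. They take part in, or change, the outcome of the Predicate and Priority phases, which makes advanced scheduling possible. There are three kinds: node labels, affinity, and taints with tolerations.
Node labels
When some pods must run on particular nodes, classify the nodes with labels; the pod definition can then set pods.spec.nodeName or pods.spec.nodeSelector. This is evaluated in the Predicate phase.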
- Manifest field reference
[root@master-0 ~]# kubectl explain pod.spec.nodeSelector
KIND:     Pod
VERSION:  v1
FIELD:    nodeSelector <map[string]string>
DESCRIPTION:
     NodeSelector is a selector which must be true for the pod to fit on a node.
     Selector which must match a node's labels for the pod to be scheduled on
     that node. More info:
     https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
- Example
[root@master-0 ~]# cat nodeselector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: default
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  nodeSelector:
    disktype: ssd
[root@master-0 ~]# kubectl apply -f nodeselector.yaml
pod/pod-demo created
[root@master-0 ~]# kubectl label nodes slave-0.shared disktype=ssd
node/slave-0.shared labeled
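The nodeSelector check amounts to requiring every key/value pair of the selector to be present in the node's labels. A small Python sketch of that rule, using plain dictionaries in place of real API objects:

```python
def matches_node_selector(node_selector, node_labels):
    """True when every key/value pair in nodeSelector is present on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


selector = {"disktype": "ssd"}

# Labels of slave-0.shared before and after `kubectl label nodes ... disktype=ssd`.
before = matches_node_selector(selector, {"kubernetes.io/hostname": "slave-0.shared"})
after = matches_node_selector(
    selector, {"kubernetes.io/hostname": "slave-0.shared", "disktype": "ssd"})
print(before, after)  # False True
```

In the transcript above the pod is created before the node is labeled, so it would be expected to stay Pending until the label is added.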
Affinity
Affinity is defined under pod.spec.affinity, along two dimensions: node and pod.
[root@master-0 ~]# kubectl explain pod.spec.affinity
KIND: Pod
VERSION: v1
RESOURCE: affinity <Object>
DESCRIPTION:
If specified, the pod's scheduling constraints
Affinity is a group of affinity scheduling rules.
FIELDS:
nodeAffinity <Object>
Describes node affinity scheduling rules for the pod.
podAffinity <Object>
Describes pod affinity scheduling rules (e.g. co-locate this pod in the
same node, zone, etc. as some other pod(s)).
podAntiAffinity <Object>
Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod
in the same node, zone, etc. as some other pod(s)).
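Node affinity
Node affinity rules come in two types:
- Hard affinity (required): a mandatory rule that must be satisfied when the Pod is scheduled. If no node satisfies it, the Pod is left in the Pending state.
- Soft affinity (preferred): a flexible scheduling constraint. It prefers to run the Pod on a certain class of nodes and the scheduler tries to honor that, but when it cannot, it falls back to a node that does not match the rule.
For both required and preferred rules, if node labels change after the Pod has been scheduled so that the node no longer satisfies the rule, the scheduler does not remove the Pod from the node.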
Node hard affinity
- Node hard affinity is defined in pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
[root@master-0 ~]# kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND:     Pod
VERSION:  v1
RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <Object>
DESCRIPTION:
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to an update), the system may or may not try
     to eventually evict the pod from its node.
     A node selector represents the union of the results of one or more label
     queries over a set of nodes; that is, it represents the OR of the selectors
     represented by the node selector terms.
FIELDS:
   nodeSelectorTerms <[]Object> -required-   # the nodes to be affine to
     Required. A list of node selector terms. The terms are ORed.
[root@master-0 ~]# cat nodeaffinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity
  namespace: default
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
[root@master-0 ~]# kubectl apply -f nodeaffinity.yaml
pod/pod-nodeaffinity created
# The pod reaches Running only if some node has a zone label whose value is foo or bar
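- The two matching methods available under pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms
  - matchExpressions: a list of node selector requirements by node labels
  - matchFields: a list of node selector requirements by node fields. Field selectors in general let you filter Kubernetes resources by the value of one or more resource fields, for example:
    - metadata.name=my-service
    - metadata.namespace!=default
    - status.phase=Pending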
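How nodeSelectorTerms are evaluated can be sketched in Python. The terms are ORed, as the field description says, while the expressions inside one term must all hold. The sketch covers only the In, NotIn, Exists, and DoesNotExist operators and works on plain dictionaries, so it is an approximation of the matching rules rather than the scheduler's code.

```python
def expr_matches(expr, labels):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not covered by this sketch: {op}")


def term_matches(term, labels):
    """All expressions in a term must match; an empty term matches nothing."""
    exprs = term.get("matchExpressions", [])
    return bool(exprs) and all(expr_matches(e, labels) for e in exprs)


def node_matches(terms, labels):
    """The terms are ORed: one matching term is enough."""
    return any(term_matches(t, labels) for t in terms)


terms = [{"matchExpressions": [
    {"key": "zone", "operator": "In", "values": ["foo", "bar"]},
]}]

print(node_matches(terms, {"zone": "foo"}))  # True
print(node_matches(terms, {"zone": "baz"}))  # False
print(node_matches(terms, {}))               # False
```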
Node soft affinity
Node soft affinity is defined in pod.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution
[root@master-0 ~]# kubectl explain pod.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution
KIND: Pod
VERSION: v1
RESOURCE: preferredDuringSchedulingIgnoredDuringExecution <[]Object>
DESCRIPTION:
The scheduler will prefer to schedule pods to nodes that satisfy the
affinity expressions specified by this field, but it may choose a node that
violates one or more of the expressions. The node that is most preferred is
the one with the greatest sum of weights, i.e. for each node that meets all
of the scheduling requirements (resource request, requiredDuringScheduling
affinity expressions, etc.), compute a sum by iterating through the
elements of this field and adding "weight" to the sum if the node matches
the corresponding matchExpressions; the node(s) with the highest sum are
the most preferred.
An empty preferred scheduling term matches all objects with implicit weight
0 (i.e. it's a no-op). A null preferred scheduling term matches no objects
(i.e. is also a no-op).
FIELDS:
preference <Object> -required- # the preferred nodes
A node selector term, associated with the corresponding weight.
weight <integer> -required- # weight of the preference
Weight associated with matching the corresponding nodeSelectorTerm, in the
range 1-100.
[root@master-0 ~]# cat nodeaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity-demo
  namespace: default
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
        weight: 60
[root@master-0 ~]# kubectl apply -f nodeaffinity-demo.yaml
pod/pod-nodeaffinity-demo created
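As the field description puts it, each node's preference is the sum of the weights of the preferred terms it matches. The Python sketch below illustrates that sum. Only the In operator is handled, and the second term (`disktype`, weight 30) is a hypothetical addition so that the example shows more than one weight.

```python
def in_expr_matches(expr, labels):
    """Minimal matcher for the In operator only."""
    return labels.get(expr["key"]) in expr["values"]


def preference_score(preferred_terms, labels):
    """Add up the weights of all preferred terms whose expressions all match."""
    total = 0
    for term in preferred_terms:
        exprs = term["preference"]["matchExpressions"]
        if exprs and all(in_expr_matches(e, labels) for e in exprs):
            total += term["weight"]
    return total


preferred = [
    {"weight": 60, "preference": {"matchExpressions": [
        {"key": "zone", "operator": "In", "values": ["foo", "bar"]}]}},
    # Hypothetical second term, not part of the manifest above.
    {"weight": 30, "preference": {"matchExpressions": [
        {"key": "disktype", "operator": "In", "values": ["ssd"]}]}},
]

print(preference_score(preferred, {"zone": "foo", "disktype": "ssd"}))  # 90
print(preference_score(preferred, {"zone": "bar"}))                     # 60
print(preference_score(preferred, {}))                                  # 0
```

A node that matches nothing still remains schedulable; it simply receives no bonus from this rule.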
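Pod affinity
Pod affinity lets related pods run together. Node affinity can achieve the same thing, but only with careful planning. With pod affinity the scheduler places the first pod anywhere, and pods that have an affinity or anti-affinity relationship with it are then placed dynamically relative to it. Some mechanism, such as node labels, is still needed to define what counts as the same location for pod affinity and anti-affinity.
If some pods tend to run in the same location, they have affinity; if they tend not to, they have anti-affinity, for example two Nginx instances that both listen on port 80, or pods kept apart for security reasons.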
Pod hard affinity
- Pod hard affinity is defined in pods.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution
[root@master-0 ~]# kubectl explain pods.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND:     Pod
VERSION:  v1
RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <[]Object>
DESCRIPTION:
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to a pod label update), the system may or
     may not try to eventually evict the pod from its node. When there are
     multiple elements, the lists of nodes corresponding to each podAffinityTerm
     are intersected, i.e. all terms must be satisfied.
     Defines a set of pods (namely those matching the labelSelector relative to
     the given namespace(s)) that this pod should be co-located (affinity) or not
     co-located (anti-affinity) with, where co-located is defined as running on a
     node whose value of the label with key <topologyKey> matches that of any
     node on which a pod of the set of pods is running
FIELDS:
   labelSelector <Object>   # which pods to be affine to; selects the target pods
     A label query over a set of resources, in this case pods.
   namespaces <[]string>   # namespaces of the pods matched by the label selector; defaults to the namespace of the pod being created
     namespaces specifies which namespaces the labelSelector applies to (matches
     against); null or empty list means "this pod's namespace"
   topologyKey <string> -required-   # key of the topology domain
     This pod should be co-located (affinity) or not co-located (anti-affinity)
     with the pods matching the labelSelector in the specified namespaces, where
     co-located is defined as running on a node whose value of the label with
     key topologyKey matches that of any node on which any of the selected pods
     is running. Empty topologyKey is not allowed.
- Defining a reference pod and a pod with hard affinity to it
[root@master-0 ~]# cat pod-requiredaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  namespace: default
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  namespace: default
  labels:
    app: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {"key": "app", "operator": "In", "values": ["myapp"]}   # selects the label of the reference pod
        # Decides which nodes the busybox pod may run on; with the hostname key it can only be the node running the reference pod
        topologyKey: kubernetes.io/hostname
[root@master-0 ~]# kubectl apply -f pod-requiredaffinity-demo.yaml
pod/pod-first created
pod/pod-second created
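Pod affinity based on a single node is needed only in rare cases. More common are topology constraints based on the same region, zone, or rack. For example, when deploying application pods together with their database pods, the db Pod might be placed on a node in one of the two zones foo or bar, and the myapp Pod that depends on the data service can then be placed on any node in the zone where the db Pod runs. If the db Pod has replicas running in both foo and bar, the myapp Pod can run on any node in either zone.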
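The role of topologyKey can be illustrated with a short Python sketch. Node names, labels, and the helper function are hypothetical; the point is only that "co-located" means sharing the value of the topologyKey label with a node that runs a matching pod.

```python
def affinity_candidates(nodes, pods, selector_labels, topology_key):
    """nodes: {node_name: labels}; pods: list of (pod_labels, node_name).

    Return the nodes whose topology_key value equals that of any node
    already running a pod matched by selector_labels.
    """
    domains = {
        nodes[node_name].get(topology_key)
        for pod_labels, node_name in pods
        if all(pod_labels.get(k) == v for k, v in selector_labels.items())
    }
    domains.discard(None)  # nodes without the label define no domain
    return sorted(name for name, labels in nodes.items()
                  if labels.get(topology_key) in domains)


nodes = {
    "n1": {"kubernetes.io/hostname": "n1", "zone": "foo"},
    "n2": {"kubernetes.io/hostname": "n2", "zone": "foo"},
    "n3": {"kubernetes.io/hostname": "n3", "zone": "bar"},
}
pods = [({"app": "db"}, "n1")]  # one db pod already runs on n1

by_zone = affinity_candidates(nodes, pods, {"app": "db"}, "zone")
by_host = affinity_candidates(nodes, pods, {"app": "db"}, "kubernetes.io/hostname")
print(by_zone)  # ['n1', 'n2']
print(by_host)  # ['n1']
```

With `zone` as the key, every node in zone foo qualifies; with the hostname key, only the node running the db pod does.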
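Pod anti-affinity
The difference is that the pods must not end up in the same topologyKey domain; apart from that it works exactly like pod affinity.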
[root@master-0 ~]# kubectl label nodes slave-0.shared zone=foo
node/slave-0.shared labeled
[root@master-0 ~]# kubectl label nodes slave-1.shared zone=foo
node/slave-1.shared labeled
[root@master-0 ~]# cat pod-required-antiaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  namespace: default
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  namespace: default
  labels:
    app: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {"key": "app", "operator": "In", "values": ["myapp"]}
        topologyKey: zone
[root@master-0 ~]# kubectl apply -f pod-required-antiaffinity-demo.yaml
pod/pod-first created
pod/pod-second created
[root@master-0 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod-first 1/1 Running 0 3s
pod-second 0/1 Pending 0 3s
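Why pod-second stays Pending can be reproduced with a small Python sketch. It assumes that slave-0.shared and slave-1.shared are the only schedulable nodes and that pod-first landed on slave-0.shared; neither detail is shown in the transcript, and the helper function is hypothetical.

```python
def anti_affinity_candidates(nodes, pods, selector_labels, topology_key):
    """nodes: {node_name: labels}; pods: list of (pod_labels, node_name).

    Return the nodes that do NOT share a topology_key value with any node
    running a pod matched by selector_labels.
    """
    blocked = {
        nodes[node_name].get(topology_key)
        for pod_labels, node_name in pods
        if all(pod_labels.get(k) == v for k, v in selector_labels.items())
    }
    blocked.discard(None)
    return sorted(name for name, labels in nodes.items()
                  if labels.get(topology_key) not in blocked)


# Both nodes carry zone=foo, as labeled in the transcript above.
nodes = {"slave-0.shared": {"zone": "foo"}, "slave-1.shared": {"zone": "foo"}}
pods = [({"app": "myapp"}, "slave-0.shared")]  # assumed placement of pod-first

same_zone = anti_affinity_candidates(nodes, pods, {"app": "myapp"}, "zone")
print(same_zone)  # [] -> no node is eligible, so pod-second stays Pending

# If slave-1.shared were relabeled zone=bar, it would become eligible.
nodes["slave-1.shared"] = {"zone": "bar"}
other_zone = anti_affinity_candidates(nodes, pods, {"app": "myapp"}, "zone")
print(other_zone)  # ['slave-1.shared']
```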
Pod soft affinity and soft anti-affinity
These work the same way as node soft affinity, so they are not repeated here.
Taints and Tolerations
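A taint is a key-value attribute added to a node; tolerations are the list, defined on a pod, of taints that the pod can tolerate. A node can be marked with taints, and whether a pod can run on that node depends on whether the pod tolerates them.
Taint checks take part in both the Predicate and the Priority phase. When a node gains a new taint that a pod does not tolerate, the outcome depends on the taint's effect field, which defines how pods are repelled:
- NoSchedule: affects scheduling only; pods already on the node are left alone
- NoExecute: affects both scheduling and existing pods; pods that do not tolerate the taint are evicted. A pod that tolerates it can set pods.spec.tolerations.tolerationSeconds to limit how long it stays after the taint is added; when the field is unset, the taint is tolerated indefinitely
- PreferNoSchedule: a soft version of NoSchedule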
Taints
- Taints are defined on the node; first the field reference
[root@master-0 ~]# kubectl explain node.spec.taints
KIND:     Node
VERSION:  v1
RESOURCE: taints <[]Object>
DESCRIPTION:
     If specified, the node's taints.
     The node this Taint is attached to has the "effect" on any pod that does
     not tolerate the Taint.
FIELDS:
   effect <string> -required-
     Required. The effect of the taint on pods that do not tolerate the taint.
     Valid effects are NoSchedule, PreferNoSchedule and NoExecute.
   key <string> -required-
     Required. The taint key to be applied to a node.
   timeAdded <string>
     TimeAdded represents the time at which the taint was added. It is only
     written for NoExecute taints.
   value <string>
     The taint value corresponding to the taint key.
- Command-line form
Usage:
  kubectl taint NODE NAME KEY_1=VAL_1:TAINT_EFFECT_1 ... KEY_N=VAL_N:TAINT_EFFECT_N [options]
[root@master-0 ~]# kubectl taint node slave-0.shared node-type=production:NoSchedule
node/slave-0.shared tainted
[root@master-0 ~]# kubectl get pod -owide   # none of the pods has a toleration
NAME                           READY   STATUS              RESTARTS   AGE     IP       NODE             NOMINATED NODE   READINESS GATES
myapp-98skj                    0/1     ContainerCreating   0          6m27s   <none>   slave-1.shared   <none>           <none>
myapp-deploy-5d645d645-7dsg5   0/1     ContainerCreating   0          30s     <none>   slave-1.shared   <none>           <none>
myapp-deploy-5d645d645-fm8tm   0/1     ContainerCreating   0          30s     <none>   slave-1.shared   <none>           <none>
myapp-deploy-5d645d645-wskql   0/1     ContainerCreating   0          30s     <none>   slave-1.shared   <none>           <none>
myapp-ms6lv                    0/1     ContainerCreating   0          6m27s   <none>   slave-1.shared   <none>           <none>
[root@master-0 ~]# kubectl taint node slave-1.shared node-type=dev:NoExecute
node/slave-1.shared tainted
[root@master-0 ~]# kubectl get pod
NAME                           READY   STATUS    RESTARTS   AGE
myapp-deploy-5d645d645-dppsh   0/1     Pending   0          23s
myapp-deploy-5d645d645-pcpfp   0/1     Pending   0          23s
myapp-deploy-5d645d645-rtghf   0/1     Pending   0          23s
myapp-gmxm6                    0/1     Pending   0          23s
myapp-j8dhg                    0/1     Pending   0          23s
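Tolerations
When defining tolerations on a Pod, two operators are supported:
- Equal: the toleration and the taint must match exactly on key, value, and effect
- Exists: the key and effect must match exactly, and the toleration's value field must be left empty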
- Toleration field reference
[root@master-0 ~]# kubectl explain pods.spec.tolerations
KIND:     Pod
VERSION:  v1
RESOURCE: tolerations <[]Object>
DESCRIPTION:
     If specified, the pod's tolerations.
     The pod this Toleration is attached to tolerates any taint that matches the
     triple <key,value,effect> using the matching operator <operator>.
FIELDS:
   effect <string>
     Effect indicates the taint effect to match. Empty means match all taint
     effects. When specified, allowed values are NoSchedule, PreferNoSchedule
     and NoExecute.
   key <string>
     Key is the taint key that the toleration applies to. Empty means match all
     taint keys. If the key is empty, operator must be Exists; this combination
     means to match all values and all keys.
   operator <string>   # Equal compares values, Exists checks for presence
     Operator represents a key's relationship to the value. Valid operators are
     Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for
     value, so that a pod can tolerate all taints of a particular category.
   tolerationSeconds <integer>   # how long the taint is tolerated
     TolerationSeconds represents the period of time the toleration (which must
     be of effect NoExecute, otherwise this field is ignored) tolerates the
     taint. By default, it is not set, which means tolerate the taint forever
     (do not evict). Zero and negative values will be treated as 0 (evict
     immediately) by the system.
   value <string>
     Value is the taint value the toleration matches to. If the operator is
     Exists, the value should be empty, otherwise just a regular string.
- A toleration list using Equal
[root@master-0 ~]# cat deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: nginx:1.7
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "production"
        effect: "NoSchedule"
[root@master-0 ~]# kubectl apply -f deploy.yaml
deployment.apps/myapp-deploy configured
[root@master-0 ~]# kubectl get pod -owide
NAME                           READY   STATUS              RESTARTS   AGE   IP       NODE             NOMINATED NODE   READINESS GATES
myapp-deploy-9f9d6df86-8w6qb   0/1     ContainerCreating   0          2s    <none>   slave-0.shared   <none>           <none>
myapp-deploy-9f9d6df86-d6vjg   0/1     ContainerCreating   0          2s    <none>   slave-0.shared   <none>           <none>
myapp-deploy-9f9d6df86-lhh78   0/1     ContainerCreating   0          2s    <none>   slave-0.shared   <none>           <none>
- A toleration list using Exists
[root@master-0 ~]# cat deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: nginx:1.7
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Exists"
        value: ""
        # With Exists the value acts as a wildcard, so effect decides which taints are tolerated;
        # for example, with effect NoSchedule all pods would be scheduled onto slave-0
        effect: ""
[root@master-0 ~]# kubectl apply -f deploy.yaml
deployment.apps/myapp-deploy configured
[root@master-0 ~]# kubectl get pod -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
myapp-deploy-7c7968f87c-d6b69   1/1     Running   0          12s   10.244.1.24   slave-1.shared   <none>           <none>
myapp-deploy-7c7968f87c-f798g   1/1     Running   0          12s   10.244.2.21   slave-0.shared   <none>           <none>
myapp-deploy-7c7968f87c-nvf9m   1/1     Running   0          12s   10.244.2.22   slave-0.shared   <none>           <none>
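The matching rules for Equal and Exists can be condensed into one Python function. It is a sketch that follows the field descriptions quoted above (empty effect matches all effects, empty key with Exists matches all keys) and uses plain dictionaries in place of API objects.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    operator = toleration.get("operator") or "Equal"  # Equal is the default
    key = toleration.get("key", "")
    effect = toleration.get("effect", "")

    # An empty effect matches every taint effect.
    if effect and effect != taint["effect"]:
        return False
    # An empty key (only valid together with Exists) matches every key.
    if key and key != taint["key"]:
        return False
    if operator == "Exists":
        return True  # value acts as a wildcard
    return toleration.get("value", "") == taint.get("value", "")


# The two taints applied earlier in this section.
taint_prod = {"key": "node-type", "value": "production", "effect": "NoSchedule"}
taint_dev = {"key": "node-type", "value": "dev", "effect": "NoExecute"}

equal_tol = {"key": "node-type", "operator": "Equal",
             "value": "production", "effect": "NoSchedule"}
exists_tol = {"key": "node-type", "operator": "Exists", "value": "", "effect": ""}

print(tolerates(equal_tol, taint_prod), tolerates(equal_tol, taint_dev))    # True False
print(tolerates(exists_tol, taint_prod), tolerates(exists_tol, taint_dev))  # True True
```

This agrees with the two transcripts: the Equal toleration admits pods only to slave-0.shared, while the Exists toleration with an empty effect lets them run on both nodes.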
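Marking problem nodes
Since version 1.6, Kubernetes can use taints to mark problem nodes automatically: the node controller adds taints to a node under specific conditions. They all use the NoExecute effect, so existing Pods that cannot tolerate them are evicted as well. The built-in taints of this kind currently include:
- node.kubernetes.io/not-ready: added automatically when the node enters the NotReady state
- node.alpha.kubernetes.io/unreachable: added automatically when the node enters the NotReachable state
- node.kubernetes.io/out-of-disk: added automatically when the node enters the OutOfDisk state
- node.kubernetes.io/memory-pressure: the node is under memory pressure
- node.kubernetes.io/disk-pressure: the node is under disk pressure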