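1. Overview
Node Feature Discovery (NFD) is a project originally created by Intel and now hosted under kubernetes-sigs. It helps a Kubernetes cluster manage node resources more intelligently. NFD detects the features of each node (for example CPU capabilities, PCI devices such as GPUs, and memory characteristics) and reports them to the Kubernetes API server (kube-apiserver), which records them as labels on the Node object. The scheduler (kube-scheduler) can then use these labels to place Pods on the nodes best suited to a given workload.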
GitHub: https://github.com/kubernetes-sigs/node-feature-discovery
Docs: https://kubernetes-sigs.github.io/node-feature-discovery/master/get-started/index.html
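2. Component architecture
NFD consists of two components, NFD-Master and NFD-Worker:
NFD-Master: a Deployment Pod responsible for communicating with the Kubernetes API server. It receives node features from NFD-Worker and updates the Node object (labels and annotations) accordingly.
NFD-Worker: a DaemonSet Pod that detects the features of the node it runs on and passes them to NFD-Master. One NFD-Worker should run on every node.
The feature sources that can be detected include: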
- CPU
- IOMMU
- Kernel
- Memory
- Network
- PCI
- Storage
- System
- USB
- Custom (rule-based custom features)
- Local (hooks for user-specific features)
3. Component installation
(1) Check the cluster node status before installation
[root@master-10 ~]# kubectl get nodes
NAME                  STATUS   ROLES                         AGE   VERSION
master-10.20.31.105   Ready    control-plane,master,worker   31h   v1.21.5
Node details. The parts to note are the labels and annotations.
[root@master-10 ~]# kubectl describe nodes master-10.20.31.105
Name:               master-10.20.31.105
Roles:              control-plane,master,worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=master-10.20.31.105
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"c6:fb:4b:8a:bb:12"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.20.31.105
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 12 Mar 2024 21:01:31 -0400
Taints:             <none>
........
(2) Install the components
[root@master-10 opt]# kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.14.2
namespace/node-feature-discovery created
customresourcedefinition.apiextensions.k8s.io/nodefeaturerules.nfd.k8s-sigs.io created
customresourcedefinition.apiextensions.k8s.io/nodefeatures.nfd.k8s-sigs.io created
serviceaccount/nfd-master created
serviceaccount/nfd-worker created
role.rbac.authorization.k8s.io/nfd-worker created
clusterrole.rbac.authorization.k8s.io/nfd-master created
rolebinding.rbac.authorization.k8s.io/nfd-worker created
clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
configmap/nfd-master-conf created
configmap/nfd-worker-conf created
service/nfd-master created
deployment.apps/nfd-master created
daemonset.apps/nfd-worker created
(3) Check component status
[root@master-10 opt]# kubectl get pods -n=node-feature-discovery
NAME                          READY   STATUS    RESTARTS   AGE
nfd-master-5c4684f5cb-hvjjb   1/1     Running   0          4m11s
nfd-worker-cpwx6              1/1     Running   0          4m11s
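Instead of polling the Pod list by hand, one option is to wait for both workloads to finish rolling out. The following is a minimal sketch, assuming the Deployment and DaemonSet names created by the install step above (nfd-master and nfd-worker). The KUBECTL variable is a hypothetical override introduced here only so the function can be dry-run without a cluster; it is not part of kubectl or NFD.

```shell
# Block until both NFD workloads report a completed rollout.
# KUBECTL is a hypothetical override (e.g. KUBECTL=echo for a dry run).
wait_for_nfd() {
  ns="node-feature-discovery"
  timeout="${1:-120s}"   # per-workload timeout, defaults to 120s
  "${KUBECTL:-kubectl}" rollout status deployment/nfd-master --namespace "$ns" --timeout="$timeout" &&
  "${KUBECTL:-kubectl}" rollout status daemonset/nfd-worker --namespace "$ns" --timeout="$timeout"
}

# Dry run: print the kubectl arguments that would be used.
KUBECTL=echo wait_for_nfd 60s
```

On a real cluster, calling wait_for_nfd without the KUBECTL override runs the two rollout checks in sequence and stops at the first one that fails or times out.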
(4) Check component logs
The log shows that nfd-worker runs feature discovery once every minute by default.
[root@master-10 ~]# kubectl logs -f -n=node-feature-discovery nfd-worker-rlf5t
I0314 06:30:32.003264 1 main.go:66] "-server is deprecated, will be removed in a future release along with the deprecated gRPC API"
I0314 06:30:32.003372 1 nfd-worker.go:219] "Node Feature Discovery Worker" version="v0.14.2" nodeName="master-10.20.31.105" namespace="node-feature-discovery"
I0314 06:30:32.003589 1 nfd-worker.go:520] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-worker.conf"
I0314 06:30:32.004500 1 nfd-worker.go:552] "configuration successfully updated" configuration={"Core":{"Klog":{},"LabelWhiteList":{},"NoPublish":false,"FeatureSources":["all"],"Sources":null,"LabelSources":["all"],"SleepInterval":{"Duration":60000000000}},"Sources":{"cpu":{"cpuid":{"attributeBlacklist":["BMI1","BMI2","CLMUL","CMOV","CX16","ERMS","F16C","HTT","LZCNT","MMX","MMXEXT","NX","POPCNT","RDRAND","RDSEED","RDTSCP","SGX","SGXLC","SSE","SSE2","SSE3","SSE4","SSE42","SSSE3","TDX_GUEST"]}},"custom":[],"fake":{"labels":{"fakefeature1":"true","fakefeature2":"true","fakefeature3":"true"},"flagFeatures":["flag_1","flag_2","flag_3"],"attributeFeatures":{"attr_1":"true","attr_2":"false","attr_3":"10"},"instanceFeatures":[{"attr_1":"true","attr_2":"false","attr_3":"10","attr_4":"foobar","name":"instance_1"},{"attr_1":"true","attr_2":"true","attr_3":"100","name":"instance_2"},{"name":"instance_3"}]},"kernel":{"KconfigFile":"","configOpts":["NO_HZ","NO_HZ_IDLE","NO_HZ_FULL","PREEMPT"]},"local":{},"pci":{"deviceClassWhitelist":["03","0b40","12"],"deviceLabelFields":["class","vendor"]},"usb":{"deviceClassWhitelist":["0e","ef","fe","ff"],"deviceLabelFields":["class","vendor","device"]}}}
I0314 06:30:32.004796 1 metrics.go:70] "metrics server starting" port=8081
I0314 06:30:32.019135 1 nfd-worker.go:562] "starting feature discovery..."
I0314 06:30:32.019364 1 nfd-worker.go:577] "feature discovery completed"
I0314 06:31:32.021520 1 nfd-worker.go:562] "starting feature discovery..."
I0314 06:31:32.021695 1 nfd-worker.go:577] "feature discovery completed"
I0314 06:32:32.027970 1 nfd-worker.go:562] "starting feature discovery..."
I0314 06:32:32.028141 1 nfd-worker.go:577] "feature discovery completed"
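The log shows that nfd-master updates the Node object (labels and annotations) within the first minute after startup and then once every hour by default (the ResyncPeriod in the configuration dump is 3600000000000 ns). In other words, if a user accidentally modifies a node's feature labels or annotations by hand, it can take up to one hour before nfd-master corrects them automatically.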
[root@master-10 ~]# kubectl logs -n=node-feature-discovery nfd-master-5c4684f5cb-hvjjb
I0314 06:23:08.190218 1 nfd-master.go:213] "Node Feature Discovery Master" version="v0.14.2" nodeName="master-10.20.31.105" namespace="node-feature-discovery"
I0314 06:23:08.190356 1 nfd-master.go:1214] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-master.conf"
I0314 06:23:08.190912 1 nfd-master.go:1274] "configuration successfully updated" configuration=<
    DenyLabelNs: {}
    EnableTaints: false
    ExtraLabelNs: {}
    Klog: {}
    LabelWhiteList: {}
    LeaderElection:
      LeaseDuration:
        Duration: 15000000000
      RenewDeadline:
        Duration: 10000000000
      RetryPeriod:
        Duration: 2000000000
    NfdApiParallelism: 10
    NoPublish: false
    ResourceLabels: {}
    ResyncPeriod:
      Duration: 3600000000000
 >
I0314 06:23:08.190928 1 nfd-master.go:1338] "starting the nfd api controller"
I0314 06:23:08.191105 1 node-updater-pool.go:79] "starting the NFD master node updater pool" parallelism=10
I0314 06:23:08.860810 1 metrics.go:115] "metrics server starting" port=8081
I0314 06:23:08.861033 1 component.go:36] [core][Server #1] Server created
I0314 06:23:08.861050 1 nfd-master.go:347] "gRPC server serving" port=8080
I0314 06:23:08.861084 1 component.go:36] [core][Server #1 ListenSocket #2] ListenSocket created
I0314 06:23:09.860886 1 nfd-master.go:694] "will process all nodes in the cluster"
I0314 06:23:09.923362 1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
I0314 07:23:09.224254 1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
I0314 08:23:09.081362 1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
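The Duration values in both configuration dumps are Go time.Duration values, which are expressed in nanoseconds, so the intervals are easier to read after conversion. A small sketch of that arithmetic (the helper name ns_to_seconds is made up for this illustration):

```shell
# Convert a nanosecond Duration value, as printed in the NFD logs,
# into whole seconds using integer arithmetic.
ns_to_seconds() {
  echo $(( $1 / 1000000000 ))
}

ns_to_seconds 60000000000      # nfd-worker SleepInterval -> 60   (1 minute)
ns_to_seconds 3600000000000    # nfd-master ResyncPeriod  -> 3600 (1 hour)
```

These two results match the spacing of the log timestamps above: worker discovery runs at 06:30:32, 06:31:32 and 06:32:32, and the master "node updated" entries appear at 06:23, 07:23 and 08:23.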
(5) Check the node feature information
NFD has now written the node features into the node's labels and annotations. The default label prefix is feature.node.kubernetes.io/.
[root@master-10 opt]# kubectl describe node master-10.20.31.105
Name:               master-10.20.31.105
Roles:              control-plane,master,worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.FXSR=true
                    feature.node.kubernetes.io/cpu-cpuid.FXSROPT=true
                    feature.node.kubernetes.io/cpu-cpuid.HLE=true
                    feature.node.kubernetes.io/cpu-cpuid.HYPERVISOR=true
                    feature.node.kubernetes.io/cpu-cpuid.LAHF=true
                    feature.node.kubernetes.io/cpu-cpuid.MOVBE=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-cpuid.OSXSAVE=true
                    feature.node.kubernetes.io/cpu-cpuid.RTM=true
                    feature.node.kubernetes.io/cpu-cpuid.SYSCALL=true
                    feature.node.kubernetes.io/cpu-cpuid.SYSEE=true
                    feature.node.kubernetes.io/cpu-cpuid.X87=true
                    feature.node.kubernetes.io/cpu-cpuid.XSAVE=true
                    feature.node.kubernetes.io/cpu-cpuid.XSAVEC=true
                    feature.node.kubernetes.io/cpu-cpuid.XSAVEOPT=true
                    feature.node.kubernetes.io/cpu-cpuid.XSAVES=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=false
                    feature.node.kubernetes.io/cpu-model.family=6
                    feature.node.kubernetes.io/cpu-model.id=85
                    feature.node.kubernetes.io/cpu-model.vendor_id=Intel
                    feature.node.kubernetes.io/kernel-config.NO_HZ=true
                    feature.node.kubernetes.io/kernel-config.NO_HZ_FULL=true
                    feature.node.kubernetes.io/kernel-version.full=3.10.0-1160.105.1.el7.x86_64
                    feature.node.kubernetes.io/kernel-version.major=3
                    feature.node.kubernetes.io/kernel-version.minor=10
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-0300_15ad.present=true
                    feature.node.kubernetes.io/system-os_release.ID=centos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=7
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=7
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=master-10.20.31.105
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"c6:fb:4b:8a:bb:12"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.20.31.105
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    nfd.node.kubernetes.io/feature-labels: cpu-cpuid.ADX,cpu-cpuid.AESNI,cpu-cpuid.AVX,cpu-cpuid.AVX2,cpu-cpuid.AVX512BW,cpu-cpuid.AVX512CD,cpu-cpuid.AVX512DQ,cpu-cpuid.AVX512F,cpu-...
                    nfd.node.kubernetes.io/master.version: v0.14.2
                    nfd.node.kubernetes.io/worker.version: v0.14.2
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 12 Mar 2024 21:01:31 -0400
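Because the full describe output is long, it can help to list only the labels that NFD added. The sketch below filters on the feature.node.kubernetes.io/ prefix; the function name nfd_labels is made up for this illustration, and the demo input is a shortened sample row rather than real cluster output.

```shell
# Read `kubectl get nodes --show-labels --no-headers` style text on stdin and
# print only the NFD labels, one per line, with the common prefix removed.
nfd_labels() {
  tr ' ,' '\n\n' |
    grep '^feature\.node\.kubernetes\.io/' |
    sed 's|^feature\.node\.kubernetes\.io/||' |
    sort
}

# Demo on a shortened sample row (no cluster needed).
printf '%s\n' 'master-10.20.31.105 Ready control-plane,master,worker 31h v1.21.5 kubernetes.io/os=linux,feature.node.kubernetes.io/cpu-model.vendor_id=Intel,feature.node.kubernetes.io/cpu-cpuid.AVX2=true' | nfd_labels
# prints:
#   cpu-cpuid.AVX2=true
#   cpu-model.vendor_id=Intel

# Against a real cluster:
#   kubectl get node master-10.20.31.105 --show-labels --no-headers | nfd_labels
```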
4. Component use cases
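One common way to consume NFD labels is a nodeSelector, so that a Pod is only scheduled onto nodes that advertise a required feature. The following is a minimal sketch, assuming a workload that needs AVX-512 and using the cpu-cpuid.AVX512F label seen on the node above. The Pod name, container name, and image are hypothetical placeholders.

```shell
# Print a Pod manifest that can only be scheduled onto nodes carrying the
# NFD label for AVX512F. Pod name, container name, and image are placeholders.
render_avx512_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: avx512-demo
spec:
  nodeSelector:
    feature.node.kubernetes.io/cpu-cpuid.AVX512F: "true"
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF
}

# Show the manifest.
render_avx512_pod

# To create the Pod on a real cluster:
#   render_avx512_pod | kubectl apply -f -
```

If no node carries the label, the Pod stays Pending, which makes a missing hardware capability visible at scheduling time rather than at runtime.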
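5. Summary
If your Kubernetes cluster needs to schedule workloads based on node hardware features, or otherwise needs to be aware of and make use of node hardware resources, installing Node Feature Discovery (NFD) is worthwhile. If, on the other hand, all nodes in the cluster have similar hardware and hardware differences do not need to be considered, NFD is not needed.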