In the previous post we went over the k8s ConfigMap and Secret resources and their usage; for a refresher see https://www.cnblogs.com/qiuhom-1874/p/14194944.html. Today let's look at the StatefulSet controller in k8s.
1. What the StatefulSet controller does
Put simply, the StatefulSet controller manages stateful application pods on k8s. The applications we typically operate can be classified along two dimensions, whether they are stateful and whether they keep data on storage, giving four classes: stateful with storage, stateful without storage, stateless with storage, and stateless without storage. Most applications are either stateful with storage or stateless without storage; only a few fall into the remaining two classes. A MySQL replication cluster, for example, is stateful with storage, while plain HTTP services such as nginx or apache (with no user-uploaded data) are stateless without storage.

The essential difference is that for a stateful application every request changes state, and that state changes constantly. If such an application runs on k8s and its pod crashes, the replacement pod must satisfy two conditions: it must hold the same data as the old pod, and it must fit back into the existing cluster topology. In a MySQL replication cluster, when a replica fails, the rebuilt pod must be able to mount the previous pod's PVC so the two stay consistent, and it must then be adapted back into the current replication architecture. To host a stateful service on k8s, these problems have to be solved before the service can serve users normally.

For this k8s provides the StatefulSet controller, but it does not solve all of the problems above. It only starts the requested number of pods and gives each one an ordinal name: if a pod crashes, the rebuilt pod keeps exactly the same name. "Ordinal name" means the pod name is no longer the controller name plus a random string, but the controller name plus a sequential number; if the StatefulSet is called web-demo, its pods are named web-demo-0, web-demo-1, and so on. The StatefulSet also automatically mounts the previous pod's PVC onto the rebuilt pod (provided the PV reclaim policy is Retain, so the backing volume survives deletion of the pod), which keeps the new pod's data identical to the old pod's. In short, a StatefulSet starts the requested number of pods, gives each a fixed name that never changes no matter how the pod is scheduled or how often it is deleted and rebuilt, and remounts the matching PVC onto the rebuilt pod so the data is preserved. That is all it does; how the containerized application inside the pod adapts itself to the cluster architecture is business logic that we have to implement ourselves, because every application organizes its cluster differently and the StatefulSet controller cannot provide one mechanism that fits the logical architecture of every stateful application.
2. StatefulSet controller diagram
Tip: a StatefulSet consists mainly of a pod template and a PVC template. The pod template defines the pod's attributes; the PVC template provides a storage volume for each pod. That volume can be created dynamically and bound to a PV through an SC (StorageClass) resource, or an administrator can create and bind the PVs manually.
3. Creating and using the StatefulSet controller
[root@master01 ~]# cat statefulset-demo.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: nginx:1.14-alpine
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
[root@master01 ~]#
Tip: a StatefulSet relies on a headless Service to manage access to its pods. Based on the defined replica count, pod template, and PVC template it starts the pods and gives each a fixed, unchanging name, normally the StatefulSet name plus an ordinal index; the manifest above creates three pods named web-0, web-1, and web-2. Combined with the headless service, each pod name is given its own DNS subdomain, so a pod can be reached directly at $(pod_name).$(service_name).$(namespace).svc.<cluster domain> (the cluster domain defaults to cluster.local if none was specified at initialization). The manifest above defines one headless service and one StatefulSet containing a pod template and a PVC template. In the pod template, terminationGracePeriodSeconds sets the grace period for terminating containers and defaults to 30 seconds when omitted. The PVC template is defined with the volumeClaimTemplates field, whose value is a list of objects; if the backend storage supports dynamic PV provisioning, the template can also reference the corresponding SC resource directly, as sketched below.
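For reference, a minimal sketch of what the PVC template looks like when it delegates volume creation to a StorageClass. The class name nfs-sc is an assumption for illustration and must match an SC object that actually exists in the cluster:

  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: nfs-sc        # assumed SC name; replace with your own StorageClass
      resources:
        requests:
          storage: 1Gi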
Export the shared directories on the NFS server
[root@docker_registry ~]# cat /etc/exports
/data/v1 192.168.0.0/24(rw,no_root_squash)
/data/v2 192.168.0.0/24(rw,no_root_squash)
/data/v3 192.168.0.0/24(rw,no_root_squash)
[root@docker_registry ~]# ll /data/v*
/data/v1:
total 0

/data/v2:
total 0

/data/v3:
total 0
[root@docker_registry ~]# exportfs -av
exporting 192.168.0.0/24:/data/v3
exporting 192.168.0.0/24:/data/v2
exporting 192.168.0.0/24:/data/v1
[root@docker_registry ~]#
Create the PVs manually
[root@master01 ~]# cat pv-demo.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v1
  labels:
    storsystem: nfs
    rel: stable
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v1
    server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v2
  labels:
    storsystem: nfs
    rel: stable
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v2
    server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v3
  labels:
    storsystem: nfs
    rel: stable
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v3
    server: 192.168.0.99
[root@master01 ~]#
Tip: when creating PVs manually, set the reclaim policy to Retain, so that when a claim is later released the PV and its data are kept rather than being reclaimed and becoming unusable.
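If a PV was already created with a different reclaim policy, it does not have to be recreated; the policy can be patched in place. A quick sketch against one of the PVs defined above (nfs-pv-v1):

kubectl patch pv nfs-pv-v1 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'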
Apply the manifest
[root@master01 ~]# kubectl apply -f pv-demo.yaml
persistentvolume/nfs-pv-v1 created
persistentvolume/nfs-pv-v2 created
persistentvolume/nfs-pv-v3 created
[root@master01 ~]# kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
nfs-pv-v1   1Gi        RWO,ROX,RWX    Retain           Available                                    3s
nfs-pv-v2   1Gi        RWO,ROX,RWX    Retain           Available                                    3s
nfs-pv-v3   1Gi        RWO,ROX,RWX    Retain           Available                                    3s
[root@master01 ~]#
Tip: if the backend storage supports dynamic PV provisioning, this step can be skipped. Simply create an SC resource object and reference its name from the PVC template in the StatefulSet manifest, and PVs will be provisioned and bound to the PVCs automatically; a minimal example of such an SC object is sketched below.
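A minimal sketch of an SC object, assuming an external NFS provisioner (for example the nfs-subdir-external-provisioner project) has already been deployed in the cluster; both the class name and the provisioner string are assumptions and must match your environment:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-sc                                             # assumed name, referenced from the PVC template
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner   # assumed; must match the deployed provisioner
reclaimPolicy: Retain                                      # keep provisioned volumes after the claim is deleted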
Apply the StatefulSet manifest
[root@master01 ~]# kubectl apply -f statefulset-demo.yaml
service/nginx created
statefulset.apps/web created
[root@master01 ~]# kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
nfs-pv-v1   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-0                           4m7s
nfs-pv-v2   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-1                           4m7s
nfs-pv-v3   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-2                           4m7s
[root@master01 ~]# kubectl get pvc
NAME        STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
www-web-0   Bound    nfs-pv-v1   1Gi        RWO,ROX,RWX                   14s
www-web-1   Bound    nfs-pv-v2   1Gi        RWO,ROX,RWX                   12s
www-web-2   Bound    nfs-pv-v3   1Gi        RWO,ROX,RWX                   7s
[root@master01 ~]# kubectl get sts
NAME   READY   AGE
web    3/3     27s
[root@master01 ~]# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          38s
web-1   1/1     Running   0          36s
web-2   1/1     Running   0          31s
[root@master01 ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   48m
nginx        ClusterIP   None         <none>        80/TCP    41s
[root@master01 ~]#
Tip: after applying the StatefulSet manifest, the PVs went from the Available state to Bound and three PVCs were created automatically. The pod names are no longer the controller name plus a random string, but the StatefulSet name plus an ordered number; the number normally starts at 0 and counts up, and we call it the pod's ordinal index.
View the StatefulSet's details
[root@master01 ~]# kubectl describe sts web
Name:               web
Namespace:          default
CreationTimestamp:  Mon, 28 Dec 2020 19:34:11 +0800
Selector:           app=nginx
Labels:             <none>
Annotations:        <none>
Replicas:           3 desired | 3 total
Update Strategy:    RollingUpdate
  Partition:        0
Pods Status:        3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=nginx
  Containers:
   nginx:
    Image:        nginx:1.14-alpine
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /usr/share/nginx/html from www (rw)
  Volumes:  <none>
Volume Claims:
  Name:          www
  StorageClass:
  Labels:        <none>
  Annotations:   <none>
  Capacity:      1Gi
  Access Modes:  [ReadWriteOnce]
Events:
  Type    Reason            Age    From                    Message
  ----    ------            ----   ----                    -------
  Normal  SuccessfulCreate  5m59s  statefulset-controller  create Claim www-web-0 Pod web-0 in StatefulSet web success
  Normal  SuccessfulCreate  5m59s  statefulset-controller  create Pod web-0 in StatefulSet web successful
  Normal  SuccessfulCreate  5m57s  statefulset-controller  create Claim www-web-1 Pod web-1 in StatefulSet web success
  Normal  SuccessfulCreate  5m57s  statefulset-controller  create Pod web-1 in StatefulSet web successful
  Normal  SuccessfulCreate  5m52s  statefulset-controller  create Claim www-web-2 Pod web-2 in StatefulSet web success
  Normal  SuccessfulCreate  5m52s  statefulset-controller  create Pod web-2 in StatefulSet web successful
[root@master01 ~]#
Tip: the details above show that the StatefulSet creates a PVC first and then the corresponding pod; only once the first pod and its PVC have been created, bound, and become ready does it move on to the next PVC and pod. In other words, creation is serial and ordered.
Verification: on any node in the k8s cluster, query the nginx service name and see whether the pod records under that service domain can be resolved
Install the DNS utilities
[root@master01 ~]# yum install -y bind-utils
Use dig to query CoreDNS for the resolution records of nginx.default.svc.cluster.local
[root@master01 ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   64m
nginx        ClusterIP   None         <none>        80/TCP    20m
[root@master01 ~]# kubectl get svc -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   20d
[root@master01 ~]# dig nginx.default.svc.cluster.local @10.96.0.10

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.3 <<>> nginx.default.svc.cluster.local @10.96.0.10
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61539
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;nginx.default.svc.cluster.local. IN    A

;; ANSWER SECTION:
nginx.default.svc.cluster.local. 30 IN  A       10.244.2.109
nginx.default.svc.cluster.local. 30 IN  A       10.244.4.27
nginx.default.svc.cluster.local. 30 IN  A       10.244.3.108

;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Mon Dec 28 19:54:34 CST 2020
;; MSG SIZE  rcvd: 201

[root@master01 ~]# kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE             NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          22m   10.244.4.27    node04.k8s.org   <none>           <none>
web-1   1/1     Running   0          22m   10.244.2.109   node02.k8s.org   <none>           <none>
web-2   1/1     Running   0          22m   10.244.3.108   node03.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: the query shows three records in CoreDNS for the nginx service name in the default namespace, and the resolved addresses are exactly the IPs of the pods behind the service.
Verification: is the record for web-0 the IP address of the web-0 pod?
[root@master01 ~]# kubectl get pods web-0 -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          24m   10.244.4.27   node04.k8s.org   <none>           <none>
[root@master01 ~]# dig web-0.nginx.default.svc.cluster.local @10.96.0.10

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.3 <<>> web-0.nginx.default.svc.cluster.local @10.96.0.10
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13000
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;web-0.nginx.default.svc.cluster.local. IN A

;; ANSWER SECTION:
web-0.nginx.default.svc.cluster.local. 30 IN A  10.244.4.27

;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Mon Dec 28 19:58:58 CST 2020
;; MSG SIZE  rcvd: 119

[root@master01 ~]#
Tip: prefixing the service domain with the pod name resolves in CoreDNS to that pod's IP address, which means we can later reach a specific pod directly through its pod name plus the service name.
Verification: point the cluster node's DNS server at the CoreDNS service IP, then access the pod by pod name plus service domain name and see whether it responds
[root@master01 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search k8s.org
nameserver 10.96.0.10
[root@master01 ~]# curl web-0.nginx.default.svc.cluster.local
<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx/1.14.2</center>
</body>
</html>
[root@master01 ~]#
Tip: the 403 response shows the pod is reachable; it returns 403 only because the pod has no index page yet.
Verification: exec into the pod, create an index page, and access it again to see whether the page content is returned
[root@master01 ~]# kubectl exec -it web-0 -- /bin/sh
/ # cd /usr/share/nginx/html/
/usr/share/nginx/html # ls
/usr/share/nginx/html # echo "this web-0 pod index" > index.html
/usr/share/nginx/html # ls
index.html
/usr/share/nginx/html # cat index.html
this web-0 pod index
/usr/share/nginx/html # exit
[root@master01 ~]# curl web-0.nginx.default.svc.cluster.local
this web-0 pod index
[root@master01 ~]#
Tip: the pod can now be reached and serves the page.
Delete web-0 and see whether the pod is recreated automatically
[root@master01 ~]# kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE             NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          33m   10.244.4.27    node04.k8s.org   <none>           <none>
web-1   1/1     Running   0          33m   10.244.2.109   node02.k8s.org   <none>           <none>
web-2   1/1     Running   0          33m   10.244.3.108   node03.k8s.org   <none>           <none>
[root@master01 ~]# kubectl delete pod web-0
pod "web-0" deleted
[root@master01 ~]# kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE             NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          7s    10.244.4.28    node04.k8s.org   <none>           <none>
web-1   1/1     Running   0          33m   10.244.2.109   node02.k8s.org   <none>           <none>
web-2   1/1     Running   0          33m   10.244.3.108   node03.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: after web-0 is deleted manually, the controller automatically rebuilds a pod named web-0 from the pod template and runs it.
Verification: access the rebuilt pod and see whether the index page is still served
[root@master01 ~]# curl web-0.nginx.default.svc.cluster.local
this web-0 pod index
[root@master01 ~]#
Tip: using pod name plus service domain name, the rebuilt pod serves the same index page, which means the new pod automatically mounted the PVC of the deleted pod at the corresponding directory.
Scaling out the pod replicas
Create and export additional shared directories on the NFS server
[root@docker_registry ~]# mkdir -pv /data/v{4,5,6}
mkdir: created directory ‘/data/v4’
mkdir: created directory ‘/data/v5’
mkdir: created directory ‘/data/v6’
[root@docker_registry ~]# echo "/data/v4 192.168.0.0/24(rw,no_root_squash)" >> /etc/exports
[root@docker_registry ~]# echo "/data/v5 192.168.0.0/24(rw,no_root_squash)" >> /etc/exports
[root@docker_registry ~]# echo "/data/v6 192.168.0.0/24(rw,no_root_squash)" >> /etc/exports
[root@docker_registry ~]# cat /etc/exports
/data/v1 192.168.0.0/24(rw,no_root_squash)
/data/v2 192.168.0.0/24(rw,no_root_squash)
/data/v3 192.168.0.0/24(rw,no_root_squash)
/data/v4 192.168.0.0/24(rw,no_root_squash)
/data/v5 192.168.0.0/24(rw,no_root_squash)
/data/v6 192.168.0.0/24(rw,no_root_squash)
[root@docker_registry ~]# exportfs -av
exporting 192.168.0.0/24:/data/v6
exporting 192.168.0.0/24:/data/v5
exporting 192.168.0.0/24:/data/v4
exporting 192.168.0.0/24:/data/v3
exporting 192.168.0.0/24:/data/v2
exporting 192.168.0.0/24:/data/v1
[root@docker_registry ~]#
Copy the PV manifest and adjust it for the new PVs
[root@master01 ~]# cat pv-demo2.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v4
  labels:
    storsystem: nfs
    rel: stable
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v4
    server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v5
  labels:
    storsystem: nfs
    rel: stable
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v5
    server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v6
  labels:
    storsystem: nfs
    rel: stable
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v6
    server: 192.168.0.99
[root@master01 ~]#
Apply the manifest to create the PVs
[root@master01 ~]# kubectl apply -f pv-demo2.yaml
persistentvolume/nfs-pv-v4 created
persistentvolume/nfs-pv-v5 created
persistentvolume/nfs-pv-v6 created
[root@master01 ~]# kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM               STORAGECLASS   REASON   AGE
nfs-pv-v1   1Gi        RWO,ROX,RWX    Retain           Bound       default/www-web-0                           55m
nfs-pv-v2   1Gi        RWO,ROX,RWX    Retain           Bound       default/www-web-1                           55m
nfs-pv-v3   1Gi        RWO,ROX,RWX    Retain           Bound       default/www-web-2                           55m
nfs-pv-v4   1Gi        RWO,ROX,RWX    Retain           Available                                               4s
nfs-pv-v5   1Gi        RWO,ROX,RWX    Retain           Available                                               4s
nfs-pv-v6   1Gi        RWO,ROX,RWX    Retain           Available                                               4s
[root@master01 ~]#
Scale the sts up to 6 replicas
[root@master01 ~]# kubectl get sts
NAME   READY   AGE
web    3/3     53m
[root@master01 ~]# kubectl scale sts web --replicas=6
statefulset.apps/web scaled
[root@master01 ~]#
Watch the scale-out process
[root@master01 ~]# kubectl get pod -w
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          19m
web-1   1/1     Running   0          53m
web-2   1/1     Running   0          53m
web-3   0/1     Pending   0          0s
web-3   0/1     Pending   0          0s
web-3   0/1     Pending   0          0s
web-3   0/1     ContainerCreating   0          0s
web-3   1/1     Running             0          2s
web-4   0/1     Pending             0          0s
web-4   0/1     Pending             0          0s
web-4   0/1     Pending             0          2s
web-4   0/1     ContainerCreating   0          2s
web-4   1/1     Running             0          4s
web-5   0/1     Pending             0          0s
web-5   0/1     Pending             0          0s
web-5   0/1     Pending             0          2s
web-5   0/1     ContainerCreating   0          2s
web-5   1/1     Running             0          4s
Tip: the scale-out above proceeds serially; only after web-3 is running and ready does web-4 start, and so on. A sketch of the alternative policy follows this note.
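This ordered, one-at-a-time behaviour is the default OrderedReady pod management policy. If an application does not need strict ordering when scaling, the StatefulSet spec also supports parallel pod management; a minimal sketch of the relevant field, which would be added to the web StatefulSet spec from this example:

spec:
  podManagementPolicy: Parallel   # default is OrderedReady; Parallel launches and terminates pods without
                                  # waiting for each other. This affects scaling only, not rolling updates.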
Check the PVs and PVCs
[root@master01 ~]# kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
nfs-pv-v1   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-0                           60m
nfs-pv-v2   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-1                           60m
nfs-pv-v3   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-2                           60m
nfs-pv-v4   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-4                           5m6s
nfs-pv-v5   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-3                           5m6s
nfs-pv-v6   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-5                           5m6s
[root@master01 ~]# kubectl get pvc
NAME        STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
www-web-0   Bound    nfs-pv-v1   1Gi        RWO,ROX,RWX                   57m
www-web-1   Bound    nfs-pv-v2   1Gi        RWO,ROX,RWX                   57m
www-web-2   Bound    nfs-pv-v3   1Gi        RWO,ROX,RWX                   57m
www-web-3   Bound    nfs-pv-v5   1Gi        RWO,ROX,RWX                   3m31s
www-web-4   Bound    nfs-pv-v4   1Gi        RWO,ROX,RWX                   3m29s
www-web-5   Bound    nfs-pv-v6   1Gi        RWO,ROX,RWX                   3m25s
[root@master01 ~]#
Tip: the output shows that both the PVs and the PVCs are in the Bound state.
Scale the pods down to 4
[root@master01 ~]# kubectl scale sts web --replicas=4
statefulset.apps/web scaled
[root@master01 ~]# kubectl get pods
NAME    READY   STATUS        RESTARTS   AGE
web-0   1/1     Running       0          28m
web-1   1/1     Running       0          61m
web-2   1/1     Running       0          61m
web-3   1/1     Running       0          7m46s
web-4   1/1     Running       0          7m44s
web-5   0/1     Terminating   0          7m40s
[root@master01 ~]# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          28m
web-1   1/1     Running   0          62m
web-2   1/1     Running   0          62m
web-3   1/1     Running   0          8m4s
[root@master01 ~]# kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
nfs-pv-v1   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-0                           66m
nfs-pv-v2   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-1                           66m
nfs-pv-v3   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-2                           66m
nfs-pv-v4   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-4                           10m
nfs-pv-v5   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-3                           10m
nfs-pv-v6   1Gi        RWO,ROX,RWX    Retain           Bound    default/www-web-5                           10m
[root@master01 ~]# kubectl get pvc
NAME        STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
www-web-0   Bound    nfs-pv-v1   1Gi        RWO,ROX,RWX                   62m
www-web-1   Bound    nfs-pv-v2   1Gi        RWO,ROX,RWX                   62m
www-web-2   Bound    nfs-pv-v3   1Gi        RWO,ROX,RWX                   62m
www-web-3   Bound    nfs-pv-v5   1Gi        RWO,ROX,RWX                   8m13s
www-web-4   Bound    nfs-pv-v4   1Gi        RWO,ROX,RWX                   8m11s
www-web-5   Bound    nfs-pv-v6   1Gi        RWO,ROX,RWX                   8m7s
[root@master01 ~]#
Tip: scaling down removes pods in reverse ordinal order, starting with the highest index; after the scale-down the corresponding PVs and PVCs remain in the Bound state. The scale-down and scale-up process is illustrated in the figure below.
Tip: the figure above shows scaling pods down and back up under an sts controller. When the replica count is reduced, the state of the corresponding PVCs and PVs does not change; when the replica count is increased again, each PVC re-attaches to the pod of the same name, so the scaled-up pod ends up with the same stored data as the pod that was previously removed.
Rolling update of the pod version
[root@master01 ~]# kubectl set image sts web nginx=nginx:1.16-alpine
statefulset.apps/web image updated
[root@master01 ~]#
Watch the update process
[root@master01 ~]# kubectl get pod -w
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          38m
web-1   1/1     Running   0          71m
web-2   1/1     Running   0          71m
web-3   1/1     Running   0          17m
web-3   1/1     Terminating         0          20m
web-3   0/1     Terminating         0          20m
web-3   0/1     Terminating         0          20m
web-3   0/1     Terminating         0          20m
web-3   0/1     Pending             0          0s
web-3   0/1     Pending             0          0s
web-3   0/1     ContainerCreating   0          0s
web-3   1/1     Running             0          1s
web-2   1/1     Terminating         0          74m
web-2   0/1     Terminating         0          74m
web-2   0/1     Terminating         0          74m
web-2   0/1     Terminating         0          74m
web-2   0/1     Pending             0          0s
web-2   0/1     Pending             0          0s
web-2   0/1     ContainerCreating   0          0s
web-2   1/1     Running             0          2s
web-1   1/1     Terminating         0          74m
web-1   0/1     Terminating         0          74m
web-1   0/1     Terminating         0          75m
web-1   0/1     Terminating         0          75m
web-1   0/1     Pending             0          0s
web-1   0/1     Pending             0          0s
web-1   0/1     ContainerCreating   0          0s
web-1   1/1     Running             0          2s
web-0   1/1     Terminating         0          41m
web-0   0/1     Terminating         0          41m
web-0   0/1     Terminating         0          41m
web-0   0/1     Terminating         0          41m
web-0   0/1     Pending             0          0s
web-0   0/1     Pending             0          0s
web-0   0/1     ContainerCreating   0          0s
web-0   1/1     Running             0          1s
Tip: the StatefulSet rolling update starts from the pod with the highest ordinal and updates one pod at a time; only after the previous pod has finished updating and is running does it move on to the next, and so on.
Verification: check the sts and see whether the image has been updated to the specified version
[root@master01 ~]# kubectl get sts -o wide
NAME   READY   AGE   CONTAINERS   IMAGES
web    4/4     79m   nginx        nginx:1.16-alpine
[root@master01 ~]#
Roll back to the previous version
[root@master01 ~]# kubectl get sts -o wide
NAME   READY   AGE   CONTAINERS   IMAGES
web    4/4     80m   nginx        nginx:1.16-alpine
[root@master01 ~]# kubectl rollout undo sts/web
statefulset.apps/web rolled back
[root@master01 ~]# kubectl get pods
NAME    READY   STATUS              RESTARTS   AGE
web-0   1/1     Running             0          6m6s
web-1   1/1     Running             0          6m13s
web-2   0/1     ContainerCreating   0          1s
web-3   1/1     Running             0          12s
[root@master01 ~]# kubectl get pods
NAME    READY   STATUS              RESTARTS   AGE
web-0   1/1     Running             0          6m14s
web-1   0/1     ContainerCreating   0          1s
web-2   1/1     Running             0          9s
web-3   1/1     Running             0          20s
[root@master01 ~]# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          1s
web-1   1/1     Running   0          8s
web-2   1/1     Running   0          16s
web-3   1/1     Running   0          27s
[root@master01 ~]# kubectl get sts -o wide
NAME   READY   AGE   CONTAINERS   IMAGES
web    4/4     81m   nginx        nginx:1.14-alpine
[root@master01 ~]#
Using a canary update strategy
The default update strategy of the sts controller is to update pods one at a time in reverse ordinal order, starting from the highest index: it deletes a pod, waits for the replacement to finish updating and reach the running state, then moves on to the next, until all pods are updated. To implement a canary release we need to tell the sts where to stop updating. With a Deployment controller we can pause a rollout with kubectl rollout pause; a StatefulSet does not support pausing, but it lets us control how far an update proceeds through the sts.spec.updateStrategy.rollingUpdate.partition field. The default partition value is 0, meaning pods with an ordinal greater than or equal to 0 are updated, i.e. all of them, as shown in the figure below.
Tip: when updating the pod template image of an sts, the partition field controls how far the update proceeds. partition=3 means only pods with an ordinal greater than or equal to 3 are updated, while pods with an ordinal below 3 are left untouched; partition=0 means all pods are updated. The relevant stanza is sketched below.
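A minimal sketch of where the field lives in the spec of the web StatefulSet used in this example:

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 3   # only pods with ordinal >= 3 are rolled to the new template; pods 0-2 keep the old one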
Example: change the StatefulSet's partition field to 3 on the fly
Tip: to modify a live sts, run kubectl edit with the resource type and instance name to open its configuration for editing, find the partition field under updateStrategy -> rollingUpdate, change 0 to 3, then save and quit for the change to take effect. Alternatively, edit the manifest and re-apply it; if the manifest does not define the field, simply add it.
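Instead of kubectl edit, the same change can be made non-interactively with kubectl patch, which is easier to script; a sketch against the web StatefulSet from this example:

kubectl patch sts web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":3}}}}'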
Update the pod version again and see whether only pods with an ordinal greater than or equal to 3 are updated
[root@master01 ~]# kubectl get sts -o wide
NAME   READY   AGE    CONTAINERS   IMAGES
web    4/4     146m   nginx        nginx:1.14-alpine
[root@master01 ~]# kubectl set image sts web nginx=nginx:1.16-alpine
statefulset.apps/web image updated
[root@master01 ~]# kubectl get sts -o wide
NAME   READY   AGE    CONTAINERS   IMAGES
web    3/4     146m   nginx        nginx:1.16-alpine
[root@master01 ~]#
Watch the update process
[root@master01 ~]# kubectl get pods -w
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          64m
web-1   1/1     Running   0          64m
web-2   1/1     Running   0          64m
web-3   1/1     Running   0          64m
web-3   1/1     Terminating         0          51s
web-3   0/1     Terminating         0          51s
web-3   0/1     Terminating         0          60s
web-3   0/1     Terminating         0          60s
web-3   0/1     Pending             0          0s
web-3   0/1     Pending             0          0s
web-3   0/1     ContainerCreating   0          0s
web-3   1/1     Running             0          1s
^C[root@master01 ~]# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          65m
web-1   1/1     Running   0          65m
web-2   1/1     Running   0          65m
web-3   1/1     Running   0          50s
[root@master01 ~]#
Tip: this time the sts controller only updated web-3; the pods with ordinals below 3 were not updated at all.
Resume updating all pods
Tip: as the demonstration shows, once the partition field of the controller is changed from 3 back to 0, the update of the remaining pods starts immediately.
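For completeness, setting the partition back to 0 can likewise be done with kubectl edit or, non-interactively, with kubectl patch; a sketch:

kubectl patch sts web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'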
That covers the basic use of the sts controller. The examples above use nginx to demonstrate the operations, but in production we deploy genuinely stateful services, and we still have to work out how to adapt them to their cluster: how each pod joins the cluster, how scaling in and out is handled, and a series of other operational tasks all need to be expressed in the pod template.
Example: deploying a ZooKeeper cluster on k8s with an sts controller
apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  labels:
    app: zk
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
---
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  labels:
    app: zk
spec:
  ports:
  - port: 2181
    name: client
  selector:
    app: zk
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zk
spec:
  selector:
    matchLabels:
      app: zk
  serviceName: zk-hs
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: zk
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - zk
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: kubernetes-zookeeper
        image: gcr.io/google-containers/kubernetes-zookeeper:1.0-3.4.10
        resources:
          requests:
            memory: "1Gi"
            cpu: "0.5"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        command:
        - sh
        - -c
        - "start-zookeeper \
          --servers=3 \
          --data_dir=/var/lib/zookeeper/data \
          --data_log_dir=/var/lib/zookeeper/data/log \
          --conf_dir=/opt/zookeeper/conf \
          --client_port=2181 \
          --election_port=3888 \
          --server_port=2888 \
          --tick_time=2000 \
          --init_limit=10 \
          --sync_limit=5 \
          --heap=512M \
          --max_client_cnxns=60 \
          --snap_retain_count=3 \
          --purge_interval=12 \
          --max_session_timeout=40000 \
          --min_session_timeout=4000 \
          --log_level=INFO"
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: data
          mountPath: /var/lib/zookeeper
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gluster-dynamic
      resources:
        requests:
          storage: 5Gi
Example: deploying an etcd cluster on k8s with an sts controller
apiVersion: v1
kind: Service
metadata:
  name: etcd
  labels:
    app: etcd
  annotations:
    # Create endpoints also if the related pod isn't ready
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  ports:
  - port: 2379
    name: client
  - port: 2380
    name: peer
  clusterIP: None
  selector:
    app: etcd-member
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-client
  labels:
    app: etcd
spec:
  ports:
  - name: etcd-client
    port: 2379
    protocol: TCP
    targetPort: 2379
  selector:
    app: etcd-member
  type: NodePort
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
  labels:
    app: etcd
spec:
  serviceName: etcd
  # changing replicas value will require a manual etcdctl member remove/add
  # command (remove before decreasing and add after increasing)
  replicas: 3
  selector:
    matchLabels:
      app: etcd-member
  template:
    metadata:
      name: etcd
      labels:
        app: etcd-member
    spec:
      containers:
      - name: etcd
        image: "quay.io/coreos/etcd:v3.2.16"
        ports:
        - containerPort: 2379
          name: client
        - containerPort: 2380
          name: peer
        env:
        - name: CLUSTER_SIZE
          value: "3"
        - name: SET_NAME
          value: "etcd"
        volumeMounts:
        - name: data
          mountPath: /var/run/etcd
        command:
        - "/bin/sh"
        - "-ecx"
        - |
          IP=$(hostname -i)
          PEERS=""
          for i in $(seq 0 $((${CLUSTER_SIZE} - 1))); do
            PEERS="${PEERS}${PEERS:+,}${SET_NAME}-${i}=http://${SET_NAME}-${i}.${SET_NAME}:2380"
          done
          # start etcd. If cluster is already initialized the `--initial-*` options will be ignored.
          exec etcd --name ${HOSTNAME} \
            --listen-peer-urls http://${IP}:2380 \
            --listen-client-urls http://${IP}:2379,http://127.0.0.1:2379 \
            --advertise-client-urls http://${HOSTNAME}.${SET_NAME}:2379 \
            --initial-advertise-peer-urls http://${HOSTNAME}.${SET_NAME}:2380 \
            --initial-cluster-token etcd-cluster-1 \
            --initial-cluster ${PEERS} \
            --initial-cluster-state new \
            --data-dir /var/run/etcd/default.etcd
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: gluster-dynamic
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi
Tip: both examples rely on an SC object to provision PVs automatically and bind them to the PVCs; before applying them, prepare the backing storage and create the SC object. If you do not want dynamic PV provisioning, create the PVs first and then apply the manifest (when creating PVs manually, remove the storageClassName field from the PVC template).
Finally, a few words about the k8s operator
As the sts examples above show, to run a truly stateful service on k8s the hardest part is writing the pod template: how the image joins its cluster, how pods handle scaling in and out, and so on, and the approach differs for every service, which makes running stateful services on k8s quite laborious. CoreOS came up with a solution: it packaged most of the operational work of running a stateful application on k8s into an SDK, the operator. Developers use this SDK to write a program implementing the operational tasks their particular service needs and run that program on k8s. Each specialized service thus gets its own operator, and a user who wants to run that service on k8s only has to tell the operator to run it. Put simply, an operator is an all-round "ops engineer" for a particular stateful service: to create a cluster you tell it to create one; to grow or shrink the cluster's pods you tell it to scale; what the "ops engineer" is capable of depends on which abilities its developer built into it. With this approach, running a stateful application on k8s only requires deploying the corresponding operator into the cluster and then writing resource manifests against it. Where can operators be found? https://github.com/operator-framework/awesome-operators is a list of operators with links to the sites of many services; find the one for the service you need and follow its documentation to deploy and use it.