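In the previous post we looked at how the k8s APIService resource, combined with a custom apiserver, extends the native apiserver; for a recap see https://www.cnblogs.com/qiuhom-1874/p/14279850.html. Today we turn to monitoring a k8s cluster.
In that post we used the custom apiserver metrics-server to extend the native apiserver, so that kubectl top node/pod can show CPU and memory metrics for nodes, or for the pods in a namespace. These metrics give a rough picture of how pods and nodes use resources, and in essence this is already a form of monitoring. However, metrics-server only collects CPU and memory metrics, which is not enough when we need other data about nodes or pods. For that we need a dedicated monitoring system for the cluster's nodes and pods. Prometheus is a high-performance monitoring program with three main internal components: Retrieval collects data and can work together with external programs (exporters) to do so; TSDB is a time-series storage system that stores the metric data; HttpServer exposes a RESTful API that clients use for queries. By default Prometheus listens on port 9090.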
Overall topology of the Prometheus monitoring system
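Tip: the figure above shows the topology of the Prometheus monitoring system. The Pushgateway component acts somewhat like a proxy for Prometheus retrieval: it receives the metrics that pods (or short-lived jobs) push to it, and Prometheus then scrapes them from the Pushgateway. Prometheus also distinguishes active and passive monitoring: in active (push) mode the monitored side pushes its data to the server, while in passive (pull) mode the monitored side waits for the server to come and pull the data. By default Prometheus works in pull mode, i.e. the server goes out to the monitored targets to collect data. Node-level metrics can be collected with node-exporter, while per-container metrics usually come from cAdvisor, which is built into the kubelet (the kubernetes-nodes-cadvisor job in the Prometheus configuration below scrapes it). Alertmanager provides alerting for the Prometheus monitoring system, and the Prometheus web UI provides a web page for running queries.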
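To make the push path concrete, here is a minimal Python sketch of what a short-lived job might send to a Pushgateway. No Pushgateway is deployed in this post, so the gateway address http://pushgateway.example:9091, the job name and the metric name are all hypothetical. The sketch targets the Pushgateway's /metrics/job/<job> endpoint with a body in the Prometheus text exposition format, and only builds the request without sending it.

```python
import urllib.parse
import urllib.request
from typing import Dict


def build_push_request(gateway: str, job: str, metrics: Dict[str, float]) -> urllib.request.Request:
    """Build (but do not send) a request that pushes gauge values to a Pushgateway."""
    # Pushgateway groups pushed metrics under /metrics/job/<job>
    url = f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The text exposition format requires every line, including the last, to end with a newline
    body = ("\n".join(lines) + "\n").encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


if __name__ == "__main__":
    # Hypothetical gateway address, job name and metric name
    req = build_push_request("http://pushgateway.example:9091", "nightly-backup",
                             {"backup_duration_seconds": 42.5})
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"), end="")
    # Actually sending it would be: urllib.request.urlopen(req, timeout=5)
```

POST only replaces metrics with the same names inside that job's group; PUT would replace the whole group. Either way, Prometheus still pulls the data, just from the Pushgateway instead of from the job.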
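Prometheus monitoring system components
kube-state-metrics: exposes the state of the objects in the k8s cluster as metrics, which gives us counts such as how many nodes or how many pods there are;
node-exporter: collects metrics from the node it runs on;
alertmanager: provides alerting for the Prometheus monitoring system;
prometheus-server: stores and processes metric data, and provides users with a RESTful API for queries;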
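Annotations that control whether Prometheus scrapes a pod (Prometheus honors them only because the kubernetes-pods job in the configuration below relabels on them; the kubernetes-service-endpoints job reads the same annotations from Services)
prometheus.io/scrape: whether the pod's metrics may be scraped; "true" allows it, any other value (e.g. "false") does not;
prometheus.io/path: the URL path used to scrape the metrics, usually /metrics (also the default when the annotation is absent);
prometheus.io/port: the port used to scrape the metrics;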
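How these three annotations turn into a scrape URL can be sketched in Python. The function below mimics three rules of the kubernetes-pods job in the configuration deployed later: keep the target only if prometheus.io/scrape is "true", set __metrics_path__ from a non-empty prometheus.io/path, and rewrite __address__ with the regex ([^:]+)(?::\d+)?;(\d+) and replacement $1:$2. It is a simplified model rather than Prometheus's relabeling engine, and the pod IP and annotation values are made up.

```python
import re
from typing import Dict, Optional

# Regex copied from the kubernetes-pods job. Prometheus anchors relabel regexes
# at both ends, so re.fullmatch is used to mimic that.
ADDRESS_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def scrape_url(address: str, annotations: Dict[str, str]) -> Optional[str]:
    """Return the URL Prometheus would scrape for a discovered pod target, or None if dropped."""
    # Rule 1: action keep, regex "true", source label = prometheus.io/scrape annotation
    if not re.fullmatch(r"true", annotations.get("prometheus.io/scrape", "")):
        return None
    # Rule 2: regex (.+) only matches a non-empty annotation, otherwise /metrics stays
    path = "/metrics"
    path_ann = annotations.get("prometheus.io/path", "")
    if re.fullmatch(r"(.+)", path_ann):
        path = path_ann
    # Rule 3: source labels are joined with ";" and rewritten to $1:$2 if the regex matches
    joined = f"{address};{annotations.get('prometheus.io/port', '')}"
    m = ADDRESS_RE.fullmatch(joined)
    if m:
        address = f"{m.group(1)}:{m.group(2)}"
    return f"http://{address}{path}"


if __name__ == "__main__":
    # Hypothetical pod IP and annotations
    print(scrape_url("10.244.1.23:8080", {"prometheus.io/scrape": "true",
                                          "prometheus.io/port": "9102"}))
```

Note that when prometheus.io/port is missing, the regex needs at least one digit after ";" and therefore does not match, so the address discovered from the pod's container port is kept unchanged.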
Deploying the Prometheus monitoring system
1. Deploy kube-state-metrics
Create the RBAC manifest for kube-state-metrics
[root@master01 kube-state-metrics]# cat kube-state-metrics-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions","apps"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-state-metrics-resizer
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions","apps"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
[root@master01 kube-state-metrics]#
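Tip: the manifest above creates a ServiceAccount and two roles (the ClusterRole kube-state-metrics and the Role kube-state-metrics-resizer), and binds the ServiceAccount to each of them so that it receives the permissions those roles grant.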
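To see what these rules actually permit, the following Python sketch checks a request (API group, resource, verb and optional object name) against a list of rules. It is a simplified model of RBAC rule matching that ignores namespaces, aggregation and nonResourceURLs and treats "*" as a wildcard; the two rule lists are a hand-transcribed subset of the ClusterRole and Role above, and the PolicyRule class is a hypothetical helper, not a Kubernetes client type.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PolicyRule:
    api_groups: List[str]
    resources: List[str]
    verbs: List[str]
    resource_names: List[str] = field(default_factory=list)


def _has(values: List[str], wanted: str) -> bool:
    # "*" acts as a wildcard in RBAC rules
    return "*" in values or wanted in values


def allowed(rules: List[PolicyRule], api_group: str, resource: str, verb: str,
            name: Optional[str] = None) -> bool:
    """RBAC is purely additive: a request is allowed if any single rule matches it."""
    for r in rules:
        if not (_has(r.api_groups, api_group) and _has(r.resources, resource) and _has(r.verbs, verb)):
            continue
        # An empty resourceNames list means "any object"; otherwise the name must be listed
        if r.resource_names and name not in r.resource_names:
            continue
        return True
    return False


# Subset of the kube-state-metrics ClusterRole above
CLUSTER_ROLE = [
    PolicyRule([""], ["pods", "nodes", "services"], ["list", "watch"]),
    PolicyRule(["extensions", "apps"], ["daemonsets", "deployments", "replicasets"], ["list", "watch"]),
]
# The kube-state-metrics-resizer Role above (namespace kube-system, ignored here)
RESIZER_ROLE = [
    PolicyRule([""], ["pods"], ["get"]),
    PolicyRule(["extensions", "apps"], ["deployments"], ["get", "update"], ["kube-state-metrics"]),
]

if __name__ == "__main__":
    print(allowed(RESIZER_ROLE, "apps", "deployments", "update", "kube-state-metrics"))
```

This shows why the addon-resizer sidecar can update only its own Deployment: the resourceNames list restricts the update verb to the object named kube-state-metrics.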
Create the Service manifest for kube-state-metrics
[root@master01 kube-state-metrics]# cat kube-state-metrics-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "kube-state-metrics"
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
[root@master01 kube-state-metrics]#
Create the Deployment manifest for kube-state-metrics
[root@master01 kube-state-metrics]# cat kube-state-metrics-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.0.0-beta
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
      version: v2.0.0-beta
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
        version: v2.0.0-beta
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v2.0.0-beta
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: k8s.gcr.io/addon-resizer:1.8.7
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 30Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config
        command:
          - /pod_nanny
          - --config-dir=/etc/config
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics
      volumes:
        - name: config-volume
          configMap:
            name: kube-state-metrics-config
---
# Config map for resource configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-state-metrics-config
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
[root@master01 kube-state-metrics]#
Apply the three manifests above to deploy kube-state-metrics
[root@master01 kube-state-metrics]# ls
kube-state-metrics-deployment.yaml  kube-state-metrics-rbac.yaml  kube-state-metrics-service.yaml
[root@master01 kube-state-metrics]# kubectl apply -f .
deployment.apps/kube-state-metrics created
configmap/kube-state-metrics-config created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics-resizer created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
[root@master01 kube-state-metrics]#
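Verify: were the pod and the service both created successfully?
Tip: both the pod and the svc have been created normally.
Verify: access port 8080 of the service at the URL /metrics and check whether any data comes back.
Tip: port 8080 of the service returns data at /metrics, which shows that kube-state-metrics has been deployed successfully.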
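The /metrics endpoint returns plain text in the Prometheus text exposition format: # HELP and # TYPE comment lines followed by samples of the form name{label="value",...} value, optionally followed by a timestamp. The minimal Python parser below is an illustrative sketch, not the official client library: it does not handle escaped quotes, commas or braces inside label values. It is enough to pick samples out of output saved with curl; the sample text in the test is hand-written, not captured from this cluster.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# metric name, optional {labels}, then the value (an optional timestamp may follow)
LINE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> List[Sample]:
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = LINE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        # float() also accepts the special values NaN, +Inf and -Inf used by Prometheus
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    import sys
    # Usage: curl -s http://<svc-ip>:8080/metrics | python parse_metrics.py
    if not sys.stdin.isatty():
        for name, labels, value in parse_exposition(sys.stdin.read())[:10]:
            print(name, labels, value)
```

For example, counting the samples named kube_pod_info in the parsed list gives the number of pods kube-state-metrics knows about, which is the kind of count this component is deployed for.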
2. Deploy node-exporter
Create the Service manifest for node-exporter
[root@master01 node_exporter]# cat node-exporter-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: kube-system
  annotations:
    prometheus.io/scrape: "true"
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "NodeExporter"
spec:
  clusterIP: None
  ports:
    - name: metrics
      port: 9100
      protocol: TCP
      targetPort: 9100
  selector:
    k8s-app: node-exporter
[root@master01 node_exporter]#
Create the DaemonSet manifest for node-exporter
[root@master01 node_exporter]# cat node-exporter-ds.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.0.1
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
      version: v1.0.1
  updateStrategy:
    type: OnDelete
  template:
    metadata:
      labels:
        k8s-app: node-exporter
        version: v1.0.1
    spec:
      priorityClassName: system-node-critical
      containers:
        - name: prometheus-node-exporter
          image: "prom/node-exporter:v1.0.1"
          imagePullPolicy: "IfNotPresent"
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
          ports:
            - name: metrics
              containerPort: 9100
              hostPort: 9100
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
          resources:
            limits:
              memory: 50Mi
            requests:
              cpu: 100m
              memory: 50Mi
      hostNetwork: true
      hostPID: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
[root@master01 node_exporter]#
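Tip: the manifest above uses a DaemonSet controller to run the node-exporter pods. Each pod shares the host's network and PID namespaces and tolerates the master node's taint, so node-exporter runs one pod on every node of the k8s cluster, and each pod collects the metrics of the node it runs on.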
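The toleration is what allows a node-exporter pod on the master node(s), which on kubeadm-style clusters typically carry the taint node-role.kubernetes.io/master:NoSchedule (an assumption about this cluster). The Python sketch below is a simplified version of the Kubernetes toleration-matching rule: an empty effect matches every effect, an empty key with Exists matches every key, Exists ignores the value and Equal compares it. The Taint and Toleration classes are hypothetical helpers, not Kubernetes client types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes treats an empty operator as Equal
    value: str = ""
    effect: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False  # an empty effect matches all effects
    if tol.key and tol.key != taint.key:
        return False  # an empty key (only valid with Exists) matches all keys
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value  # operator Equal


def schedulable(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod may be placed on a node only if every NoSchedule/NoExecute taint is tolerated."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


if __name__ == "__main__":
    master = [Taint("node-role.kubernetes.io/master", "NoSchedule")]
    ds_tolerations = [Toleration(key="node-role.kubernetes.io/master", operator="Exists",
                                 effect="NoSchedule")]
    print(schedulable(ds_tolerations, master), schedulable([], master))
```

Without the toleration, the DaemonSet would skip the tainted master and node-exporter would only report on the worker nodes.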
Apply the two manifests above to deploy node-exporter
[root@master01 node_exporter]# ls
node-exporter-ds.yml  node-exporter-service.yaml
[root@master01 node_exporter]# kubectl apply -f .
daemonset.apps/node-exporter created
service/node-exporter created
[root@master01 node_exporter]#
Verify: were the pods and the svc created normally?
[root@master01 node_exporter]# kubectl get pods -l "k8s-app=node-exporter" -n kube-system
NAME                  READY   STATUS    RESTARTS   AGE
node-exporter-6zgkz   1/1     Running   0          107s
node-exporter-9mvxr   1/1     Running   0          107s
node-exporter-jbll7   1/1     Running   0          107s
node-exporter-s7vvt   1/1     Running   0          107s
node-exporter-xmrjh   1/1     Running   0          107s
[root@master01 node_exporter]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        20m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  46h
node-exporter        ClusterIP   None             <none>        9100/TCP                 116s
[root@master01 node_exporter]#
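Verify: access port 9100 on any node at the URL /metrics and check whether metrics come back.
Tip: the /metrics URL on that port returns data, which shows that node-exporter has been deployed successfully.
3. Deploy alertmanager
Create the PVC manifest for alertmanager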
[root@master01 alertmanager]# cat alertmanager-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
spec:
  # storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "2Gi"
[root@master01 alertmanager]#
Create the PVs
[root@master01 ~]# cat pv-demo.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v1
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v1
    server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v2
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v2
    server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v3
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v3
    server: 192.168.0.99
[root@master01 ~]#
Apply the manifest to create the PVs
[root@master01 ~]# kubectl apply -f pv-demo.yaml
persistentvolume/nfs-pv-v1 created
persistentvolume/nfs-pv-v2 created
persistentvolume/nfs-pv-v3 created
[root@master01 ~]# kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
nfs-pv-v1   5Gi        RWO,ROX,RWX    Retain           Available                                   4s
nfs-pv-v2   5Gi        RWO,ROX,RWX    Retain           Available                                   4s
nfs-pv-v3   5Gi        RWO,ROX,RWX    Retain           Available                                   4s
[root@master01 ~]#
Create the Service manifest for alertmanager
[root@master01 alertmanager]# cat alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Alertmanager"
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 9093
      nodePort: 30093
  selector:
    k8s-app: alertmanager
  type: "NodePort"
[root@master01 alertmanager]#
Create the ConfigMap manifest for alertmanager
[root@master01 alertmanager]# cat alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  alertmanager.yml: |
    global: null
    receivers:
    - name: default-receiver
    route:
      group_interval: 5m
      group_wait: 10s
      receiver: default-receiver
      repeat_interval: 3h
[root@master01 alertmanager]#
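The three timing fields in the route decide how often default-receiver is notified: group_wait delays the first notification for a new alert group, group_interval spaces out the follow-up checks of that group, and repeat_interval decides when an unchanged, still-firing group is sent again. The Python sketch below is a simplified model of that interplay for a single group (alerts never resolve, flushes happen exactly every group_interval, all times in seconds, the alert arrival times are hypothetical); it is meant to illustrate the three values, not to reproduce Alertmanager's scheduler exactly.

```python
from typing import List


def notification_times(arrivals: List[int], group_wait: int, group_interval: int,
                       repeat_interval: int, horizon: int) -> List[int]:
    """Simplified model of when one alert group sends notifications (seconds).

    Assumes every alert keeps firing until `horizon`, and that the group is
    flushed at (first arrival + group_wait) and then exactly every group_interval.
    """
    if not arrivals:
        return []
    arrivals = sorted(arrivals)
    sent: List[int] = []
    notified = 0  # number of alerts covered by the last notification
    t = arrivals[0] + group_wait
    while t <= horizon:
        firing = sum(1 for a in arrivals if a <= t)
        if firing > notified:
            # First notification, or new alerts joined the group since the last one
            sent.append(t)
            notified = firing
        elif t - sent[-1] > repeat_interval:
            # Nothing changed, but more than repeat_interval has passed
            sent.append(t)
        t += group_interval
    return sent


if __name__ == "__main__":
    # Values from the ConfigMap above: group_wait 10s, group_interval 5m, repeat_interval 3h.
    # Two hypothetical alerts arrive at t=0s and t=60s; simulate four hours.
    print(notification_times([0, 60], 10, 300, 3 * 3600, 4 * 3600))
```

In this model the second alert does not trigger an immediate notification; it waits for the next flush (5 minutes later), and the repeat is only checked on flushes, so an unchanged group is re-sent roughly 3h5m after its last notification rather than exactly 3h.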
Create the Deployment manifest for alertmanager
[root@master01 alertmanager]# cat alertmanager-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    k8s-app: alertmanager
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.14.0
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: alertmanager
      version: v0.14.0
  template:
    metadata:
      labels:
        k8s-app: alertmanager
        version: v0.14.0
    spec:
      priorityClassName: system-cluster-critical
      containers:
        - name: prometheus-alertmanager
          image: "prom/alertmanager:v0.14.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --config.file=/etc/config/alertmanager.yml
            - --storage.path=/data
            - --web.external-url=/
          ports:
            - containerPort: 9093
          readinessProbe:
            httpGet:
              path: /#/status
              port: 9093
            initialDelaySeconds: 30
            timeoutSeconds: 30
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: storage-volume
              mountPath: "/data"
              subPath: ""
          resources:
            limits:
              cpu: 10m
              memory: 50Mi
            requests:
              cpu: 10m
              memory: 50Mi
#        - name: prometheus-alertmanager-configmap-reload
#          image: "jimmidyson/configmap-reload:v0.1"
#          imagePullPolicy: "IfNotPresent"
#          args:
#            - --volume-dir=/etc/config
#            - --webhook-url=http://localhost:9093/-/reload
#          volumeMounts:
#            - name: config-volume
#              mountPath: /etc/config
#              readOnly: true
#          resources:
#            limits:
#              cpu: 10m
#              memory: 10Mi
#            requests:
#              cpu: 10m
#              memory: 10Mi
      volumes:
        - name: config-volume
          configMap:
            name: alertmanager-config
        - name: storage-volume
          persistentVolumeClaim:
            claimName: alertmanager
[root@master01 alertmanager]#
Apply the four manifests above to deploy alertmanager
[root@master01 alertmanager]# ls
alertmanager-configmap.yaml  alertmanager-deployment.yaml  alertmanager-pvc.yaml  alertmanager-service.yaml
[root@master01 alertmanager]# kubectl apply -f .
configmap/alertmanager-config created
deployment.apps/alertmanager created
persistentvolumeclaim/alertmanager created
service/alertmanager created
[root@master01 alertmanager]#
Verify: were the pod and the svc created normally?
[root@master01 alertmanager]# kubectl get pods -l "k8s-app=alertmanager" -n kube-system
NAME                            READY   STATUS    RESTARTS   AGE
alertmanager-6546bf7676-lt9jq   1/1     Running   0          85s
[root@master01 alertmanager]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             92s
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        31m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  47h
node-exporter        ClusterIP   None             <none>        9100/TCP                 13m
[root@master01 alertmanager]#
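Verify: access port 30093 on any node and check whether alertmanager is reachable.
Tip: the page above is reachable on that port, which shows that alertmanager has been deployed successfully.
4. Deploy prometheus-server
Create the RBAC manifest for Prometheus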
[root@master01 prometheus-server]# cat prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
[root@master01 prometheus-server]#
Create the Service manifest for Prometheus
[root@master01 prometheus-server]# cat prometheus-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/name: "Prometheus"
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
      nodePort: 30090
  selector:
    k8s-app: prometheus
  type: NodePort
[root@master01 prometheus-server]#
Create the ConfigMap manifest for Prometheus
[root@master01 prometheus-server]# cat prometheus-configmap.yaml
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  prometheus.yml: |
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090
    - job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    - job_name: kubernetes-nodes-kubelet
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    - job_name: kubernetes-nodes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __metrics_path__
        replacement: /metrics/cadvisor
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name
    - job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
    alerting:
      alertmanagers:
      - kubernetes_sd_configs:
          - role: pod
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          regex: kube-system
          action: keep
        - source_labels: [__meta_kubernetes_pod_label_k8s_app]
          regex: alertmanager
          action: keep
        - source_labels: [__meta_kubernetes_pod_container_port_number]
          regex:
          action: drop
[root@master01 prometheus-server]#
Create the StatefulSet manifest for Prometheus
[root@master01 prometheus-server]# cat prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.24.0
spec:
  serviceName: "prometheus"
  replicas: 1
  podManagementPolicy: "Parallel"
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: prometheus
      initContainers:
      - name: "init-chown-data"
        image: "busybox:latest"
        imagePullPolicy: "IfNotPresent"
        command: ["chown", "-R", "65534:65534", "/data"]
        volumeMounts:
        - name: prometheus-data
          mountPath: /data
          subPath: ""
      containers:
#        - name: prometheus-server-configmap-reload
#          image: "jimmidyson/configmap-reload:v0.1"
#          imagePullPolicy: "IfNotPresent"
#          args:
#            - --volume-dir=/etc/config
#            - --webhook-url=http://localhost:9090/-/reload
#          volumeMounts:
#            - name: config-volume
#              mountPath: /etc/config
#              readOnly: true
#          resources:
#            limits:
#              cpu: 10m
#              memory: 10Mi
#            requests:
#              cpu: 10m
#              memory: 10Mi
        - name: prometheus-server
          image: "prom/prometheus:v2.24.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          # based on 10 running nodes with 30 pods each
          resources:
            limits:
              cpu: 200m
              memory: 1000Mi
            requests:
              cpu: 200m
              memory: 1000Mi
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: prometheus-data
              mountPath: /data
              subPath: ""
      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
      # storageClassName: standard
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: "5Gi"
[root@master01 prometheus-server]#
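Tip: before applying the manifests above, make sure the remaining PVs have enough capacity: the volumeClaimTemplate requests 5Gi with ReadWriteOnce, and one of the three 5Gi PVs created earlier is already bound to the alertmanager claim.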
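Whether the claims can be satisfied comes down to how PVCs bind to statically created PVs. The Python sketch below is a simplified model of that matching (it ignores storage classes, selectors, volume mode and the PV controller's other rules): a claim binds to the smallest Available PV whose capacity is at least the requested size and whose access modes include the requested ones, and a bound PV is no longer available. Only binary suffixes such as Gi are parsed, and the dict layout is a hypothetical stand-in for the real objects.

```python
import re
from typing import Dict, List, Optional

_UNITS = {"": 1, "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Parse a binary-suffixed quantity such as "5Gi" (other suffixes are not handled)."""
    m = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", quantity)
    if not m:
        raise ValueError(f"unsupported quantity: {quantity}")
    return int(m.group(1)) * _UNITS[m.group(2) or ""]


def bind(claims: List[dict], pvs: List[dict]) -> Dict[str, Optional[str]]:
    """Return {claim name: bound PV name or None}, processing claims in order."""
    available = {pv["name"]: pv for pv in pvs}
    result: Dict[str, Optional[str]] = {}
    for claim in claims:
        need = to_bytes(claim["storage"])
        candidates = [
            pv for pv in available.values()
            if to_bytes(pv["capacity"]) >= need
            and set(claim["accessModes"]) <= set(pv["accessModes"])
        ]
        # Prefer the smallest volume that fits, to waste as little capacity as possible
        best = min(candidates, key=lambda pv: to_bytes(pv["capacity"]), default=None)
        result[claim["name"]] = best["name"] if best else None
        if best:
            del available[best["name"]]  # a PV binds to exactly one claim
    return result


if __name__ == "__main__":
    modes = ["ReadWriteOnce", "ReadWriteMany", "ReadOnlyMany"]
    pvs = [{"name": f"nfs-pv-v{i}", "capacity": "5Gi", "accessModes": modes} for i in (1, 2, 3)]
    claims = [
        {"name": "alertmanager", "storage": "2Gi", "accessModes": ["ReadWriteOnce"]},
        {"name": "prometheus-data-prometheus-0", "storage": "5Gi", "accessModes": ["ReadWriteOnce"]},
    ]
    print(bind(claims, pvs))
```

The model also shows that the 2Gi alertmanager claim occupies a whole 5Gi PV, which is why the remaining capacity is worth checking before the StatefulSet's claim (named prometheus-data-prometheus-0 after the template and the pod) is created.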
Apply the four manifests above to deploy Prometheus server
[root@master01 prometheus-server]# ls
prometheus-configmap.yaml  prometheus-rbac.yaml  prometheus-service.yaml  prometheus-statefulset.yaml
[root@master01 prometheus-server]# kubectl apply -f .
configmap/prometheus-config created
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
statefulset.apps/prometheus created
[root@master01 prometheus-server]#
Verify: were the pod and the svc created successfully?
[root@master01 prometheus-server]# kubectl get pods -l "k8s-app=prometheus" -n kube-system
NAME           READY   STATUS    RESTARTS   AGE
prometheus-0   1/1     Running   0          2m20s
[root@master01 prometheus-server]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             10m
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        40m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  47h
node-exporter        ClusterIP   None             <none>        9100/TCP                 22m
prometheus           NodePort    10.111.155.1     <none>        9090:30090/TCP           2m27s
[root@master01 prometheus-server]#
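Verify: access port 30090 on any node and check whether Prometheus is reachable.
Tip: the page above is reachable, which shows that Prometheus server has been deployed correctly.
View the monitoring metrics through the page above
Tip: select the metric you want to view and click Execute, and the corresponding graph is displayed. At this point the Prometheus monitoring system is fully deployed; next we deploy grafana and configure it to use Prometheus as its data source for displaying the monitoring data.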
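The same queries can be issued against the RESTful API served on port 9090, which is handy for scripts. The Python sketch below calls the /api/v1/query endpoint for an instant query and flattens the JSON result into (labels, value) pairs. It assumes Prometheus is reached through the NodePort as http://<node-ip>:30090, where <node-ip> is a placeholder you must fill in; the response used in the test is hand-written in the documented shape, not real output from this cluster.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Turn a resultType=vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value as a string"]
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(base: str, promql: str, timeout: float = 5.0):
    with urllib.request.urlopen(query_url(base, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))


if __name__ == "__main__":
    # Replace <node-ip> with the address of any cluster node before sending a real query:
    # for labels, value in instant_query("http://<node-ip>:30090", "up"):
    #     print(labels.get("job"), labels.get("instance"), value)
    print(query_url("http://<node-ip>:30090", 'up{job="kubernetes-pods"}'))
```

The metric up is a convenient first query: Prometheus records it for every scrape target, with 1 meaning the last scrape succeeded, so it quickly shows which of the jobs configured above are actually being scraped.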
Deploying grafana
Create the grafana deployment manifest
[root@master01 grafana]# cat grafana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: true
        - mountPath: /var
          name: grafana-storage
        env:
#        - name: INFLUXDB_HOST
#          value: monitoring-influxdb
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
      - name: grafana-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: monitoring-grafana
  namespace: kube-system
spec:
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
  type: "NodePort"
[root@master01 grafana]#
Apply the manifest to deploy grafana
[root@master01 grafana]# ls
grafana.yaml
[root@master01 grafana]# kubectl apply -f .
deployment.apps/monitoring-grafana created
service/monitoring-grafana created
[root@master01 grafana]#
Verify: were the pod and the svc both created?
[root@master01 grafana]# kubectl get pods -l "k8s-app=grafana" -n kube-system
NAME                                  READY   STATUS    RESTARTS   AGE
monitoring-grafana-6c74ccc5dd-grjzf   1/1     Running   0          87s
[root@master01 grafana]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             82m
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        112m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  2d
monitoring-grafana   NodePort    10.100.230.71    <none>        80:30196/TCP             92s
node-exporter        ClusterIP   None             <none>        9100/TCP                 94m
prometheus           NodePort    10.111.155.1     <none>        9090:30090/TCP           74m
[root@master01 grafana]#
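Tip: the grafana svc is exposed on node port 30196; since the manifest does not set a nodePort, Kubernetes picked this port automatically.
Verify: access the port exposed by the grafana service and check whether the pod is reachable.
Tip: the page above is reachable, which shows that grafana has been deployed successfully.
Configuring grafana
1. Configure Prometheus as grafana's data source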
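Instead of clicking through the UI, the data source can also be created with Grafana's HTTP API (POST /api/datasources). The Python sketch below only builds the request. It assumes Grafana is reached through its NodePort (30196 here, with <node-ip> as a placeholder), that the anonymous Admin access configured in the manifest above is enough to authorize the call (otherwise add basic auth or an API key), and that Grafana reaches Prometheus through the cluster DNS name http://prometheus.kube-system.svc.cluster.local:9090, which presumes the default cluster.local domain. The data source name "Prometheus" is arbitrary.

```python
import json
import urllib.request


def datasource_request(grafana: str, prometheus_url: str, name: str = "Prometheus") -> urllib.request.Request:
    """Build (but do not send) a request that registers Prometheus as a Grafana data source."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # the Grafana backend proxies queries, so browsers never hit Prometheus directly
        "isDefault": True,
    }
    return urllib.request.Request(
        f"{grafana.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = datasource_request("http://<node-ip>:30196",
                             "http://prometheus.kube-system.svc.cluster.local:9090")
    print(req.get_method(), req.full_url)
    # After replacing <node-ip>, send it with: urllib.request.urlopen(req, timeout=5)
```

Using proxy access means only the grafana pod needs to resolve the in-cluster Service name, which is why a cluster-internal URL works even though the browser sits outside the cluster.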
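2. Create a new monitoring dashboard
Tip: go to the grafana.com website and download a dashboard template.
After downloading the template file, import it into grafana
Tip: select the downloaded template file, choose the corresponding data source, and click Import. Some panels above show no data because the metric names used in the template differ from the metric names in our Prometheus; we can edit the template file to match the metric names in our own Prometheus environment.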
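Editing the template by hand works, but the renaming can also be scripted. The Python sketch below walks a dashboard JSON document and rewrites metric names inside every "expr" string (panel queries, including panels nested in rows) according to a mapping you supply; template-variable queries are not touched. The mapping shown, such as node_cpu to node_cpu_seconds_total, is only an example: node-exporter 0.16 renamed many metrics in this way, but check the names in your own Prometheus before using any mapping. The script and file names are hypothetical.

```python
import json
import re
import sys
from typing import Any, Dict


def rename_metrics(obj: Any, mapping: Dict[str, str]) -> Any:
    """Recursively rewrite metric names in every "expr" string of a dashboard JSON."""
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            if key == "expr" and isinstance(value, str):
                for old, new in mapping.items():
                    # Only replace whole metric names, never a prefix of a longer name
                    pattern = r"(?<![A-Za-z0-9_:])" + re.escape(old) + r"(?![A-Za-z0-9_:])"
                    value = re.sub(pattern, new, value)
                out[key] = value
            else:
                out[key] = rename_metrics(value, mapping)
        return out
    if isinstance(obj, list):
        return [rename_metrics(item, mapping) for item in obj]
    return obj


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python fix_dashboard.py downloaded.json fixed.json   (hypothetical names)
    example_mapping = {"node_cpu": "node_cpu_seconds_total",
                       "node_memory_MemTotal": "node_memory_MemTotal_bytes"}
    with open(sys.argv[1], encoding="utf-8") as f:
        dashboard = json.load(f)
    with open(sys.argv[2], "w", encoding="utf-8") as f:
        json.dump(rename_metrics(dashboard, example_mapping), f, indent=2, ensure_ascii=False)
```

Because the pattern refuses to match a name followed by another name character, running the script twice is harmless: node_cpu_seconds_total is left alone on the second pass.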