Deploying Prometheus Operator: the complete process and how to work through the pitfalls

Posted by 王又又的锅 on 2024-05-13

Environment

Software                    Version
Linux                       CentOS 7.9
k8s                         v1.26.9
Docker                      25.0.4
kube-prometheus             v0.13.0
nginx-ingress-controller    v1.10.1

K8s cluster information (set up your own cluster beforehand; this article does not cover cluster installation)

Hostname      IP
k8s-master    192.168.2.11
k8s-node01    192.168.2.12
k8s-node02    192.168.2.13

I. Installing Prometheus Operator

Choose a version, copy its download URL, and download it locally:
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.tar.gz

Downloads from GitHub are slow; a proxy mirror can speed them up. I used: wget https://mirror.ghproxy.com/https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.tar.gz

You can choose a different version; for version compatibility see: https://github.com/prometheus-operator/kube-prometheus/releases

1. Extract the archive and enter the manifests directory:

tar -zxvf v0.13.0.tar.gz && cd kube-prometheus-0.13.0/manifests

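Pitfall 1: registry.k8s.io cannot be reached from mainland China, so the image addresses in the manifests that use this registry have to be replaced.

Some posts suggest the bitnami repository and others registry.aliyuncs.com/google_containers, but pulls from both failed for me. In the end I used docker search to find images described as synced from the official ones; since this is a test environment, I used them without further checks.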

2. Replace the image addresses

sed -i 's#registry.k8s.io/kube-state-metrics#jerrymei#' kubeStateMetrics-deployment.yaml
sed -i 's#registry.k8s.io/prometheus-adapter#jerrymei#' prometheusAdapter-deployment.yaml
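
The two substitutions above can be wrapped in a small helper that also prints the resulting image lines, so a typo in the pattern shows up before anything is deployed. This is only a sketch: the function name swap_registry is made up for illustration, and jerrymei is simply the Docker Hub namespace used above.

```shell
# Hypothetical helper (not part of kube-prometheus): rewrite one registry
# prefix in a manifest, then print the image lines so the result can be checked.
swap_registry() {
  local file="$1" from="$2" to="$3"
  # '#' is used as the sed delimiter because the patterns contain '/'.
  sed -i "s#${from}#${to}#" "$file"
  grep -n 'image:' "$file"
}

# Usage, run from the manifests directory:
#   swap_registry kubeStateMetrics-deployment.yaml registry.k8s.io/kube-state-metrics jerrymei
#   swap_registry prometheusAdapter-deployment.yaml registry.k8s.io/prometheus-adapter jerrymei
```

If grep prints nothing, or still shows registry.k8s.io, the pattern did not match and the manifest should be inspected by hand.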

3. Deploy Prometheus

kubectl apply --server-side -f ./setup
kubectl create -f ./
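
The kube-prometheus README places a wait between these two steps so that the CRDs are established before the resources that depend on them are created. A minimal sketch of that sequence follows, assuming it is run from the manifests directory; the function name deploy_stack is illustrative only.

```shell
# Sketch: the same two steps as above, with a wait in between so the CRDs
# are registered before the custom resources that use them are created.
# The function name is illustrative; run it from the manifests directory.
deploy_stack() {
  kubectl apply --server-side -f ./setup || return 1
  kubectl wait --for condition=Established --all CustomResourceDefinition \
    --namespace=monitoring || return 1
  kubectl create -f ./
}
```

Without the wait, the second step can fail on a slow cluster with errors about unknown resource kinds; re-running it once the CRDs are ready usually resolves that.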

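Alternatively, pull the images ahead of time and retag them; in that case imagePullPolicy must be IfNotPresent rather than Always. I did not find imagePullPolicy set in the manifests. When the field is omitted, Kubernetes defaults to IfNotPresent for images with a tag other than :latest and to Always for :latest or untagged images. If it needs changing, this can be done after deployment: run kubectl -n monitoring get deploy to find the relevant Deployment, then kubectl -n monitoring edit deploy <YOUR DEPLOY NAME>.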
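
For a non-interactive alternative to kubectl edit, the field can be set with a JSON patch. The sketch below is an assumption-laden example rather than a tested recipe: the function name set_pull_policy is hypothetical, and the containers/0 index assumes the container to change is the first one listed in the Deployment.

```shell
# Sketch: set imagePullPolicy on the first container of a Deployment in the
# monitoring namespace without opening an editor. The function name is
# illustrative; containers/0 assumes the container to change is listed first.
set_pull_policy() {
  local deploy="$1" policy="${2:-IfNotPresent}"
  kubectl -n monitoring patch deployment "$deploy" --type=json \
    -p "[{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/0/imagePullPolicy\",\"value\":\"${policy}\"}]"
}

# Usage: set_pull_policy <YOUR DEPLOY NAME>
```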

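4. Expose the services externally through Ingress

k8s needs an ingress controller installed; I chose the ingress-nginx controller.
If one is already installed, or you prefer a different ingress controller, skip this step or see the official documentation: https://v1-26.docs.kubernetes.io/zh-cn/docs/concepts/services-networking/ingress-controllers/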

1. Install the ingress-nginx controller. It can also be deployed with Helm; see the official documentation for details.

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.1/deploy/static/provider/cloud/deploy.yaml

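Note: the ingress controller pod may fail to pull its image. In that case, download the YAML file first and change the image to registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller:v1.10.1; adjust the version to match your setup.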
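
The image change described above can be scripted roughly as follows. This is a sketch under assumptions: the function name use_mirror_image is made up, and the pattern assumes the controller image line in the downloaded file has the form registry.k8s.io/ingress-nginx/controller:<tag>, possibly followed by an @sha256 digest. Only the controller image is rewritten; any other registry.k8s.io images in the file are left as they are.

```shell
# Sketch: point the controller image in a downloaded deploy.yaml at the mirror.
# The function name is illustrative. The pattern also consumes any @sha256
# digest after the tag, because a digest pinned for registry.k8s.io is not
# guaranteed to match the mirrored image.
use_mirror_image() {
  local file="$1"
  local mirror="registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller:v1.10.1"
  sed -i -E "s#image: registry\.k8s\.io/ingress-nginx/controller:[^[:space:]]+#image: ${mirror}#" "$file"
  grep -n 'image:' "$file"
}

# Usage:
#   wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.1/deploy/static/provider/cloud/deploy.yaml
#   use_mirror_image deploy.yaml && kubectl apply -f deploy.yaml
```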


2. Deploy an Ingress

kubectl apply -f ingress-prometheus.yaml

ingress-prometheus.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: monitoring
  name: ingress-monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: "www.prometheus.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
  - host: "www.grafana.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: "www.alertmanager.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093

II. Testing access locally (firewall and SELinux are disabled)

kubectl get ingress -n monitoring

Test the domain names from the internal network; change the Host header to the host actually configured for each service.

curl -H "host: www.prometheus.com" 10.99.98.214
curl -H "host: www.grafana.com" 10.99.98.214
curl -H "host: www.alertmanager.com" 10.99.98.214

All three return: 504 Gateway Time-out
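
The three requests above can be repeated as one loop that prints only the HTTP status code, which makes it quick to re-run the check after each troubleshooting step. A minimal sketch; the function name check_hosts is illustrative, and the address argument should be whatever your own ingress controller reports.

```shell
# Sketch: request each virtual host through the ingress controller and print
# only the HTTP status code. The function name is illustrative; pass the
# address reported for your own ingress controller.
check_hosts() {
  local addr="$1" host code
  for host in www.prometheus.com www.grafana.com www.alertmanager.com; do
    # -m 10 gives up after 10 seconds so a hanging backend does not block the loop.
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' -H "Host: ${host}" "http://${addr}/")
    echo "${host} ${code}"
  done
}

# Usage: check_hosts 10.99.98.214
```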

Pitfall 2: curling the ingress ClusterIP directly returns 504. The troubleshooting steps follow.

kubectl get pods -n monitoring -owide
kubectl get svc -n monitoring

1. Check the IPVS load-balancing rules: the Service-to-Pod rules are normal
ipvsadm -L -n | egrep "3000\s"

2. Enter a pod and curl the services: responses are normal, pods can reach one another, and the prometheus, grafana and alertmanager services are all working

kubectl -n monitoring exec -it grafana-79f47474f7-hxjh9 -- /bin/bash

3. Accessing the Service ClusterIP and the backend Pod IP directly: no response in either case

4. Using port-forward to forward a local port to the Service and to the Pod: both are reachable

kubectl port-forward --address=0.0.0.0 svc/grafana 3000 -nmonitoring

kubectl port-forward --address=0.0.0.0 pod/grafana-79f47474f7-hxjh9 3000 -nmonitoring

5. In the end, a post (https://zhuanlan.zhihu.com/p/624478715) pointed to the problem:

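Fix: Prometheus Operator (kube-prometheus) sets up NetworkPolicy objects by default, and they have to be deleted manually before the services become reachable. The paths below are relative to the kube-prometheus-0.13.0 directory.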
kubectl delete -f manifests/prometheus-networkPolicy.yaml
kubectl delete -f manifests/grafana-networkPolicy.yaml
kubectl delete -f manifests/alertmanager-networkPolicy.yaml

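Reflection: if the NetworkPolicy ingress rules were blocking access, it is puzzling that port-forward to the Service works while the Service ClusterIP does not; my understanding of k8s networking is clearly still incomplete. A likely explanation is that kubectl port-forward is tunnelled through the API server and the kubelet, and the final connection is made from inside the Pod's own network namespace, so it does not pass through the point where the CNI plugin enforces NetworkPolicy.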

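Since the NetworkPolicy rules are the cause, I looked at the NetworkPolicy files of the three services. If you do not want to remove the NetworkPolicies, it should also be possible to add a label that the policies allow to the ingress-nginx controller in its YAML file (not tested).

Alternatively, modify the NetworkPolicy rules of the three services so that they allow a label the ingress controller already has (tested successfully):
app.kubernetes.io/name: ingress-nginx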

Following that guess, I tried modifying Grafana's NetworkPolicy:

kubectl get networkPolicy -n monitoring
kubectl edit networkPolicy -n monitoring grafana


curl -H "host: www.grafana.com" 10.99.98.214
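
Instead of editing the shipped policy in place, the same effect can be sketched as a separate NetworkPolicy, since NetworkPolicies are additive. One thing to keep in mind: a podSelector on its own only matches pods in the policy's own namespace, so if the controller runs in a different namespace (the static manifest used above creates ingress-nginx), a namespaceSelector is needed alongside it. Everything below is an assumption to adapt, not the configuration actually used in this article: the policy name grafana-allow-ingress-nginx is made up, the controller is assumed to run in the ingress-nginx namespace, and the Grafana pods are assumed to carry the label app.kubernetes.io/name=grafana.

```shell
# Sketch: print an additional NetworkPolicy that lets the ingress-nginx
# controller reach Grafana on port 3000, leaving the shipped policy untouched.
# Assumptions: the policy name is made up; the controller runs in the
# "ingress-nginx" namespace; Grafana pods carry app.kubernetes.io/name=grafana.
grafana_allow_ingress_policy() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-allow-ingress-nginx
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: grafana
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - protocol: TCP
      port: 3000
EOF
}

# Usage: grafana_allow_ingress_policy | kubectl apply -f -
```

The same pattern would apply to Prometheus (port 9090) and Alertmanager (port 9093) with their own pod labels.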


References:

Kubernetes official documentation, Ingress:
https://v1-26.docs.kubernetes.io/zh-cn/docs/concepts/services-networking/ingress/
kube-prometheus on GitHub:
https://github.com/prometheus-operator/kube-prometheus
ingress-nginx official deployment guide:
https://kubernetes.github.io/ingress-nginx/deploy/

Other:
https://zhuanlan.zhihu.com/p/624478715
https://cloud.tencent.com/developer/article/2327634
