Environment
Software | Version |
---|---|
Linux | CentOS 7.9 |
k8s | v1.26.9 |
Docker | 25.0.4 |
kube-prometheus | v0.13.0 |
nginx-ingress-controller | v1.10.1 |
K8s cluster information (set up your own cluster in advance; this article does not cover cluster installation)
Hostname | IP |
---|---|
k8s-master | 192.168.2.11 |
k8s-node01 | 192.168.2.12 |
k8s-node02 | 192.168.2.13 |
I. Install Prometheus Operator
Choose a version, copy its download URL, and download the archive locally
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.tar.gz
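Downloads from GitHub can be slow, so you can go through a proxy mirror instead. I used:
wget https://mirror.ghproxy.com/https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.tar.gz
You can choose a different version; for the available releases and their compatibility, see: https://github.com/prometheus-operator/kube-prometheus/releases
1. Extract the archive and enter the manifests directory: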
tar -zxvf v0.13.0.tar.gz && cd kube-prometheus-0.13.0/manifests
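Pitfall 1: registry.k8s.io is not reachable from mainland China, so the image addresses in the manifests that point to that registry have to be replaced.
Some posts suggest the bitnami repository, others registry.aliyuncs.com/google_containers, but pulls from both failed for me. In the end I used docker search and found images (under jerrymei) that claim to be synced from the official ones. Since this is a test environment, I used them as they are.
2. Replace the image addresses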
sed -i 's#registry.k8s.io/kube-state-metrics#jerrymei#' kubeStateMetrics-deployment.yaml
sed -i 's#registry.k8s.io/prometheus-adapter#jerrymei#' prometheusAdapter-deployment.yaml
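Before editing the real manifests, the same substitution can be tried on a scratch file. The following is a minimal sketch: the helper name `rewrite_registry` is hypothetical, and the sample image line and its tag are assumptions for illustration, so check the actual `image:` lines in your copy of the manifests.

```shell
#!/usr/bin/env bash
# Dry run of the image rewrite on a scratch file; the real manifests are untouched.

rewrite_registry() {
  # $1 = registry prefix to replace, $2 = replacement prefix, $3 = file edited in place
  # (hypothetical helper; uses GNU sed's -i, as the commands above do)
  sed -i "s#$1#$2#" "$3"
}

tmp=$(mktemp)
# ASSUMPTION: illustrative image line; the tag in your manifest may differ.
echo '        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2' > "$tmp"
rewrite_registry 'registry.k8s.io/kube-state-metrics' 'jerrymei' "$tmp"
rewritten=$(cat "$tmp")
echo "$rewritten"
rm -f "$tmp"
```

After running the real sed commands, `grep -rn 'registry.k8s.io' .` in the manifests directory is a quick way to see whether any other manifest still points at the unreachable registry.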
3. Deploy Prometheus
kubectl apply --server-side -f ./setup
kubectl create -f ./
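If the second command runs before the CRDs from ./setup are registered, it can fail with "no matches for kind" errors. A hedged sketch of the same two steps with a wait in between follows; the function name `deploy_kube_prometheus` and the `KUBECTL` override are illustrative assumptions, while the `kubectl wait` line mirrors the step suggested in the kube-prometheus quickstart.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands
# without touching a cluster.
KUBECTL=${KUBECTL:-kubectl}

deploy_kube_prometheus() {
  # Hypothetical helper; run it from the manifests directory.
  $KUBECTL apply --server-side -f ./setup || return 1
  # Block until every CRD reports the Established condition.
  $KUBECTL wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring || return 1
  $KUBECTL create -f ./
}
```

Calling `KUBECTL=echo deploy_kube_prometheus` prints the three commands in order, and calling `deploy_kube_prometheus` on its own runs them against the current kubectl context.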
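Alternatively, pull the images ahead of time and re-tag them locally. For that to work, imagePullPolicy has to be IfNotPresent rather than Always. I did not see imagePullPolicy set in the manifests; when it is omitted, Kubernetes defaults to IfNotPresent for images with a tag other than :latest and to Always only for :latest or untagged images, so check the effective value first. If it needs changing after deployment, find the deployment with kubectl -n monitoring get deploy and then edit it with kubectl -n monitoring edit deploy <YOUR DEPLOY NAME>.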
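4. Expose the services externally through an Ingress
k8s needs an ingress controller installed; I chose the ingress-nginx controller here.
If you already have one installed, or you prefer a different ingress controller, skip this step or refer to the official documentation: https://v1-26.docs.kubernetes.io/zh-cn/docs/concepts/services-networking/ingress-controllers/
1. Install the ingress-nginx controller. It can also be deployed with Helm; see the official documentation for details.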
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.1/deploy/static/provider/cloud/deploy.yaml
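Note: the ingress controller pod may fail to pull its image. In that case, download the YAML file first and change the image to registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller:v1.10.1 (adjust the version to your situation).
2. Deploy an Ingress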
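The image swap described in the note can be scripted against a local copy of deploy.yaml. This is a sketch under assumptions: the helper name `swap_controller_image` is hypothetical, and the sample line assumes the controller image is written as a tag followed by an `@sha256:` digest, which you should confirm in your downloaded file.

```shell
#!/usr/bin/env bash
swap_controller_image() {
  # $1 = path to a local copy of deploy.yaml, edited in place (hypothetical helper).
  # The digest suffix is dropped because a mirror may not publish the same digest.
  sed -i -E 's#image: registry\.k8s\.io/ingress-nginx/controller:[^[:space:]]+#image: registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller:v1.10.1#' "$1"
}

# Demonstration on a scratch file.
# ASSUMPTION: illustrative image line with a placeholder digest.
sample=$(mktemp)
printf '%s\n' '        image: registry.k8s.io/ingress-nginx/controller:v1.10.1@sha256:PLACEHOLDER' > "$sample"
swap_controller_image "$sample"
swapped=$(cat "$sample")
echo "$swapped"
rm -f "$sample"
```

In practice the flow would be to download deploy.yaml, run `swap_controller_image deploy.yaml`, and then `kubectl apply -f deploy.yaml`. Other images referenced in that file are left untouched by this sketch and may need the same treatment if they also fail to pull.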
kubectl apply -f ingress-prometheus.yaml
ingress-prometheus.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: monitoring
  name: ingress-monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: "www.prometheus.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
  - host: "www.grafana.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: "www.alertmanager.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
II. Test access locally (firewall and SELinux are disabled)
kubectl get ingress -n monitoring
Test the domains from inside the network; change the Host header to the host actually configured for each service
curl -H "host: www.prometheus.com" 10.99.98.214
curl -H "host: www.grafana.com" 10.99.98.214
curl -H "host: www.alertmanager.com" 10.99.98.214
All three return: 504 Gateway Time-out
Pitfall 2: curling the ingress ClusterIP directly returns 504. The troubleshooting steps follow.
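The three checks can be repeated in one go while troubleshooting. The sketch below assumes the address being curled is the ClusterIP of the `ingress-nginx-controller` Service in the `ingress-nginx` namespace (the names created by the static manifest used earlier); the function name and the `KUBECTL`/`CURL` overrides are hypothetical.

```shell
#!/usr/bin/env bash
# KUBECTL and CURL can be overridden to dry-run the helper without a cluster.
KUBECTL=${KUBECTL:-kubectl}
CURL=${CURL:-curl}

check_ingress_hosts() {
  # ASSUMPTION: the controller Service is ingress-nginx/ingress-nginx-controller.
  local ip host
  ip=$($KUBECTL -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}') || return 1
  for host in www.prometheus.com www.grafana.com www.alertmanager.com; do
    # Print the host followed by the HTTP status code returned through the ingress.
    printf '%s ' "$host"
    $CURL -s -o /dev/null -w '%{http_code}\n' -H "Host: $host" "http://$ip/"
  done
}
```

Running `check_ingress_hosts` prints one line per host, which makes it easy to re-check all three services after each change.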
kubectl get pods -n monitoring -owide
kubectl get svc -n monitoring
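1. Check the IPVS load-balancing rules: the svc-to-pod rules look normal
ipvsadm -L -n | grep -E "3000\s"
2. Exec into a pod: curl against the services works, any pod can reach any other, and the prometheus, grafana, and alertmanager services are all healthy
kubectl -n monitoring exec -it grafana-79f47474f7-hxjh9 -- /bin/bash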
3. Accessing the svc ClusterIP and the backend pod IP directly: neither responds
4. Forwarding a local port to the svc and to the pod with port-forward: both work.
kubectl port-forward --address=0.0.0.0 svc/grafana 3000 -nmonitoring
kubectl port-forward --address=0.0.0.0 pod/grafana-79f47474f7-hxjh9 3000 -nmonitoring
5. In the end, a post (https://zhuanlan.zhihu.com/p/624478715) pointed to the problem:
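Fix: the Prometheus Operator manifests (kube-prometheus) create NetworkPolicy resources by default, and these have to be deleted manually before the services become reachable. The paths below are relative to the kube-prometheus-0.13.0 directory.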
kubectl delete -f manifests/prometheus-networkPolicy.yaml
kubectl delete -f manifests/grafana-networkPolicy.yaml
kubectl delete -f manifests/alertmanager-networkPolicy.yaml
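Thought: if the NetworkPolicy ingress rules are what block access, it is curious that port-forward to the svc works while the svc ClusterIP does not. One plausible explanation is that kubectl port-forward is tunneled through the API server and the kubelet and connects from inside the pod's own network namespace, so that traffic does not cross the path where NetworkPolicy is enforced. My understanding of k8s networking still has gaps here.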
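Since the NetworkPolicy rules are the cause, I looked at the NetworkPolicy files of the three services. If you would rather not remove the NetworkPolicies, it should also be possible to add a label that the NetworkPolicies already allow to the ingress-nginx controller's YAML (not tested).
Alternatively, modify the NetworkPolicy rules of the three services so that they allow a label the ingress controller already carries (tested successfully):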
app.kubernetes.io/name: ingress-nginx
Based on this guess, try modifying the grafana NetworkPolicy
kubectl get networkPolicy -n monitoring
kubectl edit networkPolicy -n monitoring grafana
curl -H "host: www.grafana.com" 10.99.98.214
References:
Kubernetes official documentation on Ingress:
https://v1-26.docs.kubernetes.io/zh-cn/docs/concepts/services-networking/ingress/
kube-prometheus on GitHub:
https://github.com/prometheus-operator/kube-prometheus
ingress-nginx official site:
https://kubernetes.github.io/ingress-nginx/deploy/
Other:
https://zhuanlan.zhihu.com/p/624478715
https://cloud.tencent.com/developer/article/2327634