Download kube-prometheus
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.14.0.tar.gz
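A GitHub tag archive usually unpacks into a `<repo>-<version>` directory (most likely `kube-prometheus-0.14.0` here), not a directory named after the tag. A small helper can print the top-level directory before you `cd` into it. This is a sketch; the function name `archive_top_dir` is hypothetical.

```shell
# Hypothetical helper: print the top-level directory stored in a .tar.gz archive.
archive_top_dir() {
  local archive="$1"
  # List the archive members and keep the first path component of the first entry.
  tar -tzf "$archive" | head -n 1 | cut -d/ -f1
}
```

Usage: cd "$(archive_top_dir v0.14.0.tar.gz)"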
Install
tar -zxvf v0.14.0.tar.gz
cd kube-prometheus-0.14.0
kubectl apply --server-side -f manifests/setup
kubectl wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring
kubectl apply -f manifests/
Check the installation
kubectl get pod,svc -n monitoring -o wide
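The READY column of `kubectl get pod` shows ready/total containers, so a quick filter can flag pods that are not healthy yet. A sketch with a hypothetical function `not_ready_pods`, assuming the default column order NAME READY STATUS (which `-o wide` keeps):

```shell
# Hypothetical helper: read `kubectl get pod --no-headers` output on stdin and
# print the names of pods whose READY count is incomplete (x/y with x != y)
# or whose STATUS is not Running.
not_ready_pods() {
  awk '{
    split($2, r, "/")                 # READY column, e.g. "2/2"
    if (r[1] != r[2] || $3 != "Running") print $1
  }'
}
```

Usage: kubectl get pod -n monitoring --no-headers | not_ready_pods
A pod stuck in ImagePullBackOff usually means its image could not be pulled; see the image replacement section below.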
Uninstall
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
Reference
https://www.cnblogs.com/liugp/p/16444580.html
Access the services via NodePort (edit these files under manifests/)
- prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.54.1
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort # added
  ports:
  - name: web
    port: 9090
    nodePort: 30080 # added
    targetPort: web
  - name: reloader-web
    port: 8080
    targetPort: reloader-web
  selector:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP
- grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 11.2.0
  name: grafana
  namespace: monitoring
spec:
  type: NodePort # added
  ports:
  - name: http
    port: 3000
    nodePort: 30081 # added
    targetPort: http
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
- alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.27.0
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort # added
  ports:
  - name: web
    port: 9093
    nodePort: 30082 # added
    targetPort: web
  - name: reloader-web
    port: 8080
    targetPort: reloader-web
  selector:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP
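Instead of editing the files, the same change can be made to the live Services with `kubectl patch`. By default `kubectl patch` sends a strategic merge patch, and Service ports are merged by their `port` field, so the `reloader-web` port should be left untouched. A sketch with a hypothetical helper `nodeport_patch` that only builds the patch body:

```shell
# Hypothetical helper: build a strategic-merge patch that switches a Service to
# NodePort and pins the nodePort of one existing port (matched by its "port").
nodeport_patch() {
  local port="$1" node_port="$2"
  printf '{"spec":{"type":"NodePort","ports":[{"port":%d,"nodePort":%d}]}}' \
    "$port" "$node_port"
}
```

kubectl -n monitoring patch svc prometheus-k8s -p "$(nodeport_patch 9090 30080)"
kubectl -n monitoring patch svc grafana -p "$(nodeport_patch 3000 30081)"
kubectl -n monitoring patch svc alertmanager-main -p "$(nodeport_patch 9093 30082)"
This only changes the live objects; the files in manifests/ still declare the default ClusterIP type.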
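Problem: after changing the Service type from "ClusterIP" to "NodePort", the services still cannot be reached at <node IP>:<nodePort>.
Fix: delete the NetworkPolicies in the monitoring namespace. The NetworkPolicies shipped in manifests/ only allow ingress from other pods of the monitoring stack, so traffic arriving through the NodePort is dropped.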
kubectl delete networkpolicy --all -n monitoring
Replace images with mirrors reachable from mainland China
cd kube-prometheus-0.14.0/manifests
grep -riE 'ghcr.io/|registry.k8s.io/|quay.io|k8s.gcr|grafana/' *
grafana/grafana:11.2.0
replace with swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/grafana/grafana:11.2.0
registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0
replace with swr.cn-north-4.myhuaweicloud.com/ddn-k8s/registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0
ghcr.io/jimmidyson/configmap-reload:v0.13.1
replace with swr.cn-east-3.myhuaweicloud.com/kubesre/ghcr.io/jimmidyson/configmap-reload:v0.13.1
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
replace with swr.cn-north-4.myhuaweicloud.com/ddn-k8s/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
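The four replacements above can be applied in one pass with sed. This is a sketch assuming GNU sed (macOS sed needs `-i ''`) and that the images appear as unquoted `image: ...` values in the YAML files; the function name `replace_images` is hypothetical. Matching on `image: <original>` keeps the rewrite idempotent, because a rewritten line no longer starts with the original registry.

```shell
# Hypothetical helper: rewrite the four image references listed above in every
# *.yaml file under a manifests directory (GNU sed, edits files in place).
replace_images() {
  local dir="${1:-.}"
  local n4='swr.cn-north-4.myhuaweicloud.com/ddn-k8s'
  local e3='swr.cn-east-3.myhuaweicloud.com/kubesre'
  find "$dir" -name '*.yaml' -exec sed -i \
    -e "s|image: grafana/grafana:11.2.0|image: ${n4}/docker.io/grafana/grafana:11.2.0|" \
    -e "s|image: registry.k8s.io/prometheus-adapter/|image: ${n4}/registry.k8s.io/prometheus-adapter/|" \
    -e "s|image: registry.k8s.io/kube-state-metrics/|image: ${n4}/registry.k8s.io/kube-state-metrics/|" \
    -e "s|image: ghcr.io/jimmidyson/configmap-reload:|image: ${e3}/ghcr.io/jimmidyson/configmap-reload:|" \
    {} +
}
```

Usage (from inside manifests/): replace_images .
Re-running the grep above should then show only the mirror addresses for these four images.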
List the custom resource definitions (CRDs)
kubectl get customresourcedefinitions.apiextensions.k8s.io | grep monitoring.coreos.com
Define alerting rules
https://developer.aliyun.com/article/1046908
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: custom-rule
  namespace: monitoring
spec:
  groups:
  - name: disk
    rules:
    - alert: diskFree
      annotations:
        summary: "{{ $labels.job }} instance {{ $labels.instance }}: disk usage above 80%"
        description: "{{ $labels.instance }} {{ $labels.mountpoint }} disk usage is above 80% (current value: {{ $value }}%), please handle it promptly"
      expr: |
        (1-(node_filesystem_free_bytes{fstype=~"ext4|xfs",mountpoint!="/boot"} / node_filesystem_size_bytes{fstype=~"ext4|xfs",mountpoint!="/boot"}) )*100 > 80
      for: 1m
      labels:
        level: disaster
        severity: warning
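The rule's expression computes used% = (1 - free_bytes / size_bytes) * 100 and fires when the result stays above 80 for the `for: 1m` window. For example, a 100 GiB filesystem with 15 GiB free is 85% used, so it would alert. The arithmetic can be checked locally with awk; the function names below are hypothetical, and the inputs are any two byte counts you supply. Note that `node_filesystem_free_bytes` includes blocks reserved for root, so the result can be slightly lower than the Use% reported by `df`.

```shell
# Hypothetical helper mirroring the PromQL expression in the rule above:
# used% = (1 - free_bytes / size_bytes) * 100
disk_used_pct() {
  awk -v free="$1" -v size="$2" 'BEGIN { printf "%.2f\n", (1 - free / size) * 100 }'
}

# Returns success (exit 0) when used% is above the threshold (default 80).
disk_would_alert() {
  local threshold="${3:-80}"
  awk -v free="$1" -v size="$2" -v t="$threshold" \
    'BEGIN { exit !((1 - free / size) * 100 > t) }'
}
```

Usage: disk_used_pct 15 100 prints 85.00, and disk_would_alert 15 100 succeeds.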
More alert rule examples: https://gitee.com/crazywjj/K8S/blob/main/promethues/prometheus告警規則.md
References
https://cloud.tencent.cn/developer/article/2327634
https://docker.aityp.com/manage/add
https://github.com/prometheus-operator/kube-prometheus
https://www.cnblogs.com/huangjiabobk/p/18126130