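These are Prometheus alerting rule configurations. To actually deliver alerts they must be used together with Alertmanager, which is covered in the next article.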
Ready-made alerting rules (awesome-prometheus-alerts):
https://samber.github.io/awesome-prometheus-alerts/rules
JVM example
wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/master/dist/rules/jvm/jvm-exporter.yml
Example rule file contents:
groups:
  - name: exceptionRule
    rules:
      - alert: exceptionAlert
        expr: application_exception{application="userDemo"} < 10
        for: 1m
        labels:
          severity: warning
          team: frontend
        annotations:
          summary: "Server is reporting errors frequently"
          description: "Error rate reached (current value: {{ $value }}%)"
  - name: ckExceptionRule
    rules:
      - alert: ckExceptionAlert
        expr: sum(increase(bbc_request_timer_ID_seconds_count{}[5m])) by (business_name) > 10
        for: 2m
        labels:
          severity: warning
          app: "gateway"
        annotations:
          summary: "test system: service errors in the last 5 minutes"
          description: "Error rate reached (current value: {{ $value }})"
Validate the rule files
./promtool check rules first_rules.yml
./promtool check rules jvm-exporter.yml
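Passing promtool is not enough on its own: Prometheus only evaluates rule files that prometheus.yml lists under rule_files. A minimal excerpt, assuming the custom rules above were saved as first_rules.yml and that both rule files sit in the same directory as prometheus.yml (relative paths are resolved against the config file's directory). The interval values are illustrative assumptions, not taken from this setup:

```yaml
# prometheus.yml (excerpt); file names and interval values are assumptions
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often rule expressions are evaluated (Prometheus default: 1m)
rule_files:
  - "first_rules.yml"       # the custom rules above (assumed file name)
  - "jvm-exporter.yml"      # the file downloaded with wget above
```

With this in place, a rule such as `for: 1m` only fires after its expression has stayed true across evaluations for at least one minute; until then the alert is shown as pending. Running ./promtool check config prometheus.yml validates the main config and the rule files it lists in one step.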
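Stop (pkill sends SIGTERM by default, which lets Prometheus shut down gracefully; kill -9 prevents that, and `grep prometheus` also matches the grep process itself)
pkill -f 'prometheus --config.file'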
Start
nohup ./prometheus --config.file=./prometheus.yml --web.enable-lifecycle --storage.tsdb.retention.time=20d --web.external-url=http://8.219.198.22:9090 > server_prometheus.log 2>&1 &
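Reload the configuration without restarting (the /-/reload endpoint is only enabled because Prometheus was started with --web.enable-lifecycle)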
curl -X POST http://localhost:9090/-/reload