Prometheus Alerting Configuration

Posted by kunchengs on 2024-09-05

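This post covers Prometheus alerting rule configuration only. Prometheus evaluates the rules, but actually sending notifications requires Alertmanager, which is covered in the next article.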

Rule collection
https://samber.github.io/awesome-prometheus-alerts/rules

JVM example
wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/master/dist/rules/jvm/jvm-exporter.yml

File contents

groups:
- name: exceptionRule
  rules:
  - alert: exceptionAlert
    expr: application_exception{application="userDemo"} < 10
    for: 1m
    labels:
      severity: warning
      team: frontend
    annotations:
      summary: "Server is reporting errors frequently"
      description: "Error frequency reached (current value: {{ $value }}%)"
- name: ckExceptionRule
  rules:
  - alert: ckExceptionAlert
    expr: sum(increase(bbc_request_timer_ID_seconds_count{}[5m])) by (business_name) > 10
    for: 2m
    labels:
      severity: warning
      app: "gateway"
    annotations:
      summary: "Service anomalies in the test system over the last 5 minutes"
      description: "Error frequency reached (current value: {{ $value }})"
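
For Prometheus to evaluate these rules, the rule files have to be listed under rule_files in prometheus.yml. The following is a minimal sketch, assuming the two file names used in the promtool commands in this post sit in the same directory as prometheus.yml. The evaluation interval and the Alertmanager address (localhost:9093, its default port) are illustrative assumptions, not values taken from the original setup.

```yaml
# prometheus.yml (fragment)
global:
  evaluation_interval: 1m        # assumed value: how often the rules are evaluated

rule_files:
  - "first_rules.yml"            # relative paths are resolved against this config file
  - "jvm-exporter.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093"   # hypothetical Alertmanager address
```

Once the rules are loaded, they should appear on the Rules page of the Prometheus web UI.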

Validate the rule files
./promtool check rules first_rules.yml
./promtool check rules jvm-exporter.yml
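Stop (the [p] pattern keeps grep from matching itself; a plain kill sends SIGTERM so Prometheus can shut down cleanly)
ps -ef | grep '[p]rometheus' | awk '{print $2}' | xargs kill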
Start
nohup ./prometheus --config.file=./prometheus.yml --web.enable-lifecycle --storage.tsdb.retention.time=20d --web.external-url=http://8.219.198.22:9090 > server_prometheus.log 2>&1 &

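Reload the configuration without restarting (requires the --web.enable-lifecycle flag used in the start command)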
curl -X POST http://localhost:9090/-/reload
