Configuring MySQL email alerts in prometheus

Published by 沃趣科技 on 2019-11-04

Original article: http://blog.itpub.net/28218939/viewspace-2662524/

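The previous two articles in the 沃趣技術 series, 《prometheus監控多個MySQL例項》 (monitoring multiple MySQL instances with prometheus) and 《構建狂拽炫酷屌的MySQL監控平臺》 (building a MySQL monitoring platform), covered installing prometheus, grafana, and the exporter, and monitoring MySQL nodes centrally. This article covers configuring email alerts in prometheus.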

Downloading alertmanager

Alerting in prometheus relies on the alertmanager component, which can be downloaded from the official prometheus site.

https://prometheus.io/download/

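Because the latest alertmanager release at the time of writing had some problems with its email configuration, we download version 0.14 of alertmanager from GitHub instead.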

https://github.com/prometheus/alertmanager

Direct download link:

https://github.com/prometheus/alertmanager/releases/download/v0.14.0/alertmanager-0.14.0.linux-amd64.tar.gz

Installing and configuring alertmanager

Extract the downloaded alertmanager package and move it into place.

tar -xf alertmanager-0.14.0.linux-amd64.tar.gz
mv alertmanager-0.14.0.linux-amd64 /data/alertmanager

Edit the alertmanager configuration file and add the mailbox settings.

# cd /data/alertmanager
# cat alertmanager.yml
global:
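  smtp_smarthost: smtp.exmail.xxx.com:465 # SMTP server (host:port) of the sending mailbox
  smtp_auth_username: xxxx@xxx.com # login account of the sending mailbox
  smtp_from: xxx@xxx.com # address shown in the From header
  smtp_auth_password: xxxxxx # password of the sending mailbox
  resolve_timeout: 5m
  smtp_require_tls: false
route:
  # group_by: ['alertname'] # labels used to group alerts
  group_wait: 10s # how long to wait before sending the first notification for a new group of alerts
  group_interval: 10s # how long to wait before notifying about new alerts added to an already-notified group
  repeat_interval: 1m # how long to wait before re-sending a notification for alerts that are still firing
  receiver: 'email'
receivers:
- name: email
  email_configs:
  - send_resolved: true
    to: xxx@xxx.com # recipient mailbox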
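
Before starting alertmanager it can help to confirm that the mailbox itself accepts the credentials, independently of alertmanager. The following is a minimal sketch in Python (standard library only), assuming the provider accepts implicit TLS on port 465 as the smarthost above suggests. The constants are the same placeholders used in alertmanager.yml, and the helper names split_smarthost, build_test_message, and send_test_message are hypothetical, introduced only for this illustration.

```python
import smtplib
from email.message import EmailMessage

# Placeholder values copied from alertmanager.yml; replace them before use.
SMARTHOST = "smtp.exmail.xxx.com:465"
USERNAME = "xxxx@xxx.com"
PASSWORD = "xxxxxx"
SENDER = "xxx@xxx.com"
RECIPIENT = "xxx@xxx.com"


def split_smarthost(smarthost):
    """Split a 'host:port' string into (host, port)."""
    host, _, port = smarthost.rpartition(":")
    return host, int(port)


def build_test_message(sender, recipient):
    """Build a small plain-text test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "alertmanager SMTP test"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message():
    """Log in over implicit TLS and send the test message (needs network access)."""
    host, port = split_smarthost(SMARTHOST)
    with smtplib.SMTP_SSL(host, port, timeout=10) as smtp:
        smtp.login(USERNAME, PASSWORD)
        smtp.send_message(build_test_message(SENDER, RECIPIENT))


# Offline preview of what would be sent; call send_test_message() to actually send it.
print(build_test_message(SENDER, RECIPIENT))
```

If send_test_message() raises an authentication or connection error, the same settings are unlikely to work in alertmanager.yml either.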

Start alertmanager.

# cd /data/alertmanager
# ./alertmanager --config.file=alertmanager.yml &

alertmanager listens on port 9093 by default.
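
To check the email path end to end without waiting for a real failure, a hand-made alert can be posted to alertmanager directly. The sketch below assumes alertmanager is reachable at localhost:9093 and serves the v1 HTTP API at /api/v1/alerts, which the 0.14 release used here does; newer releases use /api/v2/alerts instead. The alert name TestAlert, the severity label, the instance value, and the function names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: alertmanager 0.14 on the local host, default port, v1 API.
ALERTS_URL = "http://localhost:9093/api/v1/alerts"


def build_test_alert(instance):
    """Return a one-element alert list in the JSON shape the alerts endpoint accepts."""
    return [{
        "labels": {
            "alertname": "TestAlert",  # hypothetical name, not one of the real rules
            "instance": instance,
            "severity": "test",
        },
        "annotations": {"summary": "Test alert sent by hand"},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }]


def post_alerts(alerts, url=ALERTS_URL):
    """POST the alerts to alertmanager and return the HTTP status (needs network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Offline preview of the payload; call post_alerts(...) to actually send it.
print(json.dumps(build_test_alert("172.18.0.24:3306"), indent=2))
```

With the route settings above, a successful post should produce an email after the group_wait of 10s.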

Configuring prometheus

In the prometheus directory, edit the alert rules file alert_rules.yml and add some custom alert rules.

# cd /data/prometheus
# cat alert_rules.yml
groups:
- name: MySQL-rules
  rules:
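  - alert: MySQL Status # alert name
    expr: up == 0 # true when the scrape of the target fails
    for: 5s # how long the condition must hold before the alert fires
    annotations: # extra detail describing the alert
      summary: "{{$labels.instance}}: MySQL has stopped !!!"
      value: "{{$value}}"
      alertname: "MySQL database stopped"
      description: "Checks whether the MySQL database is running"
      message: "Database instance {{$labels.instance}} has stopped running; please handle it promptly"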
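  - alert: MySQL Slave IO Thread Status # alert name
    expr: mysql_slave_status_slave_io_running == 0
    for: 5s # how long the condition must hold before the alert fires
    annotations: # extra detail describing the alert
      summary: "{{$labels.instance}}: MySQL Slave IO Thread has stopped !!!"
      value: "{{$value}}"
      alertname: "MySQL replication IO thread stopped"
      description: "Checks whether the MySQL replication IO thread is running"
      message: "The IO thread on database instance {{$labels.instance}} has stopped running; please handle it promptly"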
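  - alert: MySQL Slave SQL Thread Status # alert name
    expr: mysql_slave_status_slave_sql_running == 0
    for: 5s # how long the condition must hold before the alert fires
    annotations: # extra detail describing the alert
      summary: "{{$labels.instance}}: MySQL Slave SQL Thread has stopped !!!"
      value: "{{$value}}"
      alertname: "MySQL replication SQL thread stopped"
      description: "Checks whether the MySQL replication SQL thread is running"
      message: "The SQL thread on database instance {{$labels.instance}} has stopped running; please handle it promptly"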
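  - alert: MySQL Slave Delay Status # alert name
    # seconds_behind_master is the measured replication lag; sql_delay is only the configured MASTER_DELAY
    expr: mysql_slave_status_seconds_behind_master > 30
    for: 5s # how long the condition must hold before the alert fires
    annotations: # extra detail describing the alert
      summary: "{{$labels.instance}}: MySQL Slave Delay is more than 30s !!!"
      value: "{{$value}}"
      alertname: "MySQL replication lag too high"
      description: "Checks the MySQL replication lag"
      message: "Replication lag on database instance {{$labels.instance}} has exceeded 30s; please handle it promptly"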
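
The `for` clause in these rules interacts with how often prometheus evaluates them: an alert first becomes pending, and only turns firing at an evaluation where the condition has held for at least the `for` duration. The sketch below is a simplified Python model of that behaviour, not prometheus's actual code; the function name alert_states is hypothetical, and the 15-second interval is an assumed evaluation_interval.

```python
def alert_states(expr_true, eval_interval, hold):
    """Return the alert state after each rule evaluation.

    expr_true     -- one boolean per evaluation: did the expression match?
    eval_interval -- seconds between evaluations
    hold          -- the rule's `for` duration in seconds
    """
    states = []
    active_since = None  # time of the first evaluation at which the condition held
    for index, matched in enumerate(expr_true):
        now = index * eval_interval
        if not matched:
            # Condition no longer holds: the alert goes back to inactive.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # Firing only once the condition has held for at least `hold` seconds.
        states.append("firing" if now - active_since >= hold else "pending")
    return states


# for: 5s with a 15s evaluation interval: pending for one cycle, then firing.
print(alert_states([False, True, True, True, False], eval_interval=15, hold=5))
```

In this model a `for` shorter than the evaluation interval still delays the alert by one full evaluation cycle, so with 15-second evaluations `for: 5s` behaves much like `for: 15s`.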

In the prometheus directory, edit the prometheus configuration file prometheus.yml and add the alerting configuration to it.

# cd /data/prometheus
# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 172.18.0.24:9093 # port 9093 on the alertmanager node started above
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert_rules.yml" # the alert rules file alert_rules.yml edited above
# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
  - file_sd_configs:
    - files:
      - mysql.yml
    job_name: MySQL
    metrics_path: /metrics
    relabel_configs:
    - source_labels: [__address__]
      regex: (.*)
      target_label: __address__
      replacement: $1

After editing, reload the configuration so that the changes take effect.
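
One way to trigger the reload is the HTTP endpoint /-/reload. The sketch below assumes prometheus 2.x listening on localhost:9090 and started with the --web.enable-lifecycle flag, which that endpoint requires; sending SIGHUP to the prometheus process is an alternative that needs no flag. The function names reload_url and reload_prometheus are hypothetical.

```python
import urllib.request


def reload_url(base_url):
    """Build the reload endpoint URL from the prometheus base URL."""
    return base_url.rstrip("/") + "/-/reload"


def reload_prometheus(base_url="http://localhost:9090"):
    """POST to /-/reload and return the HTTP status (needs network access)."""
    request = urllib.request.Request(reload_url(base_url), data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Offline preview of the URL; call reload_prometheus() to actually trigger the reload.
print(reload_url("http://localhost:9090"))
```

If the new configuration is invalid, prometheus keeps running with the old one and reports the error, so it is worth checking the response and the server log after a reload.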