Prometheus Alertmanager email notifications + Grafana alert display

Posted by 老楊伏櫪 on 2021-08-18

Preface

In my previous blog post, I showed how to monitor MySQL with Prometheus.

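In this post I'll show how to send alert emails with Alertmanager (WeChat and DingTalk work in a similar way, but they require an enterprise account, so I didn't try them), and how to view the alerts in Grafana.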

Walkthrough

Test machines

Prometheus: 192.168.56.140

Host01: 192.168.56.103

 

Install Alertmanager

Download the package

wget https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz

 

Create the directories

mkdir -p /etc/alertmanager/

mkdir -p /etc/alertmanager/data

mkdir -p /etc/alertmanager/template/

 

Download the email template

[root@prometheus-server template]# pwd
/etc/alertmanager/template
[root@prometheus-server template]# wget https://raw.githubusercontent.com/prometheus/alertmanager/master/template/default.tmpl

 

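Extract the package and copy the files to /etc/alertmanager

[root@prometheus-server ftpusr]# tar -xzf alertmanager-0.22.2.linux-amd64.tar.gz
[root@prometheus-server ftpusr]# cp ./alertmanager-0.22.2.linux-amd64/alertmanager* /etc/alertmanager/.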

 

Configure the systemd service

[root@prometheus-server alertmanager]# cat /etc/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/etc/alertmanager/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/etc/alertmanager/data
Restart=on-failure

[Install]
WantedBy=multi-user.target

Configure Alertmanager to send email

In the example below I use a 163.com mailbox to send the emails.

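To use SMTP, you first have to enable the SMTP service in the mailbox settings. Once it is enabled, create an authorization code: smtp_auth_password in the configuration file below takes this authorization code, not your personal mailbox password.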
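Before putting the authorization code into alertmanager.yml, it may be worth confirming that it works on its own. Below is a minimal sketch using Python's standard smtplib; the host, port, mailbox and code are the placeholder values from the configuration below, and the function names and test subject are illustrative only. It logs in over plain SMTP (matching smtp_require_tls: false) and sends one message.

```python
import smtplib
from email.message import EmailMessage

# Placeholder values copied from alertmanager.yml; replace with your own.
SMTP_HOST = "smtp.163.com"
SMTP_PORT = 25
USERNAME = "xxxx@163.com"
AUTH_CODE = "xxxxxxxxxxx"  # the SMTP authorization code, not the mailbox password


def build_message(sender: str, recipient: str) -> EmailMessage:
    """Build a small plain-text test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SMTP authorization code test"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_mail(recipient: str, smtp_factory=smtplib.SMTP) -> None:
    """Log in with the authorization code and send one test message.

    smtp_factory is injectable so the function can be tried without a network.
    """
    msg = build_message(USERNAME, recipient)
    with smtp_factory(SMTP_HOST, SMTP_PORT, timeout=10) as smtp:
        smtp.login(USERNAME, AUTH_CODE)
        smtp.send_message(msg)
```

For example, send_test_mail("you@example.com") with your real values filled in should deliver one message. If login raises smtplib.SMTPAuthenticationError, the most likely causes are that the SMTP service is not enabled yet or that the mailbox password was used instead of the authorization code.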

 

 

 

 

[root@prometheus-server alertmanager]# cat alertmanager.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'xxxx@163.com'
  smtp_auth_username: 'xxxx@163.com'
  smtp_auth_password: 'xxxxxxxxxxx'
  smtp_require_tls: false

templates:
  - '/etc/alertmanager/template/*.tmpl'

route:
  group_by: ['alertname','cluster','service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: 'default-receiver'

receivers:
- name: 'default-receiver'
  email_configs:
  - to: '20889922@qq.com'
    html: '{{ template "email.default.html" . }}'
    headers: { Subject: "Prometheus alert test email" }
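The route section controls how alerts are batched: alerts whose group_by labels (alertname, cluster, service) have identical values end up in the same notification. The Python sketch below models only that bucketing step, not the group_wait, group_interval or repeat_interval timing. It treats a missing label as an empty value, which matches the Prometheus convention that an empty label and an absent label are the same. The alert dictionaries are simplified, hypothetical inputs.

```python
from collections import defaultdict

# Same labels as group_by in alertmanager.yml.
GROUP_BY = ("alertname", "cluster", "service")


def group_alerts(alerts, group_by=GROUP_BY):
    """Bucket alerts by the values of their group_by labels.

    Each alert is a dict with a "labels" dict; a missing label counts as "".
    Returns {group key tuple: [alerts in that group]}.
    """
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)
```

Since the rules in this post set neither a cluster nor a service label, grouping effectively happens by alertname alone, so two hosts going down at the same time would arrive in one email.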

Start the service

service alertmanager start
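Instead of waiting for a real failure, you can push a hand-made alert into Alertmanager to check that the email route works. This Python sketch posts to Alertmanager's v2 HTTP API (POST /api/v2/alerts); the alert name, labels and five-minute endsAt are made up for the test and are not part of the original setup.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# From the test setup in this post; adjust to your own Alertmanager address.
ALERTMANAGER_URL = "http://192.168.56.140:9093"


def rfc3339(dt: datetime) -> str:
    """Format a datetime as a UTC RFC 3339 timestamp such as 2021-08-18T10:00:00Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_test_alert(minutes: int = 5) -> list:
    """Build a one-alert list in the JSON shape accepted by POST /api/v2/alerts.

    The labels and annotations are invented for testing; endsAt makes the
    alert resolve on its own after `minutes` minutes.
    """
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": "ManualTestAlert", "severity": "Warning", "instance": "manual-test"},
        "annotations": {
            "summary": "Manual test alert",
            "description": "Posted by hand to check email delivery",
        },
        "startsAt": rfc3339(now),
        "endsAt": rfc3339(now + timedelta(minutes=minutes)),
    }]


def post_alerts(alerts: list, base_url: str = ALERTMANAGER_URL) -> int:
    """POST the alerts to Alertmanager as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

post_alerts(build_test_alert()) should return 200, and with group_wait set to 10s the test email should arrive shortly afterwards.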

 

Configure Prometheus to use Alertmanager

prometheus.yml configuration

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

rules.yml configuration

[root@prometheus-server prometheus]# cat rules.yml
# hostStatsAlert
groups:
- name: hostStatsAlert
  rules:
  - alert: NodeDown
    expr: up == 0
    for: 1m
    labels:
      severity: "Critical"
    annotations:
      summary: "Instance {{$labels.instance}} down"
      description: "{{$labels.instance}} of job {{$labels.job}} has been down for more than 1 minute."
  - alert: NodeCPUUsage
    expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
    for: 1m
    labels:
      severity: "Warning"
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
    for: 1m
    labels:
      severity: "Warning"
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
  - alert: filesystemUsageAlert
    expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype=~"ext4|xfs"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype=~"ext4|xfs"}) > 85
    for: 1m
    labels:
      severity: "Warning"
    annotations:
      summary: "Instance {{ $labels.instance }} root DISK usage high"
      description: "{{ $labels.instance }} root DISK usage above 85% (current value: {{ $value }})"
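Every rule above uses for: 1m: the expression has to keep matching for one minute before the alert moves from pending to firing, and only firing alerts are sent to Alertmanager. The following is a simplified Python model of that behaviour for a single alert series; the 15-second evaluation interval used in the example is an assumption, so substitute the evaluation_interval from your prometheus.yml.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Simplified model of one alert series under a rule with a `for` duration."""
    for_seconds: float
    active_since: Optional[float] = None  # time (seconds) the expression first matched

    def evaluate(self, now: float, condition_true: bool) -> str:
        """Return "inactive", "pending" or "firing" after one evaluation at time `now`."""
        if not condition_true:
            # Expression no longer matches: forget when it started.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            # First evaluation where the expression matches.
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"
```

With 15-second evaluations, the alert fires on the fifth consecutive matching evaluation, and a single non-matching evaluation resets the timer. Adding group_wait on the Alertmanager side, the first email arrives at least a minute after the condition starts to hold.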

Restart Prometheus so the changes take effect

service prometheus restart
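After the restart, Prometheus' own HTTP API can confirm that both pieces of configuration were picked up: /api/v1/alertmanagers lists the Alertmanagers it sends alerts to, and /api/v1/rules lists the loaded rules with their current state. A Python sketch, assuming Prometheus listens on its default port 9090 on the test server:

```python
import json
import urllib.request

# Assumption: Prometheus runs on its default port 9090 on the test server.
PROMETHEUS_URL = "http://192.168.56.140:9090"


def get_json(path: str, base_url: str = PROMETHEUS_URL) -> dict:
    """GET a Prometheus HTTP API path and decode the JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def active_alertmanagers(body: dict) -> list:
    """Alertmanager URLs from a /api/v1/alertmanagers response."""
    if body.get("status") != "success":
        return []
    return [am["url"] for am in body["data"].get("activeAlertmanagers", [])]


def loaded_alert_rules(body: dict) -> dict:
    """Map alert name -> state ("inactive", "pending" or "firing") from a /api/v1/rules response."""
    if body.get("status") != "success":
        return {}
    return {
        rule["name"]: rule.get("state", "")
        for group in body["data"].get("groups", [])
        for rule in group.get("rules", [])
        if rule.get("type") == "alerting"  # skip recording rules
    }
```

For example, active_alertmanagers(get_json("/api/v1/alertmanagers")) should include a localhost:9093 URL, and loaded_alert_rules(get_json("/api/v1/rules")) should contain the four alert names from rules.yml. An empty rule list usually points to a wrong path in rule_files or a YAML error in rules.yml.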

 

Check the alert email

After a few minutes, the alert shows up in the mailbox.

 

You can also view the alerts in the Alertmanager web UI on port 9093:

http://192.168.56.140:9093/
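The alerts shown in the web UI are also available as JSON from Alertmanager's v2 API (GET /api/v2/alerts), which is handy for scripting. A small Python sketch that fetches them and reduces each alert to a few fields; the chosen fields are only an illustration:

```python
import json
import urllib.request

# From the test setup in this post.
ALERTMANAGER_URL = "http://192.168.56.140:9093"


def fetch_alerts(base_url: str = ALERTMANAGER_URL) -> list:
    """GET /api/v2/alerts and return the decoded JSON list."""
    with urllib.request.urlopen(base_url + "/api/v2/alerts", timeout=10) as resp:
        return json.load(resp)


def summarize(alerts: list) -> list:
    """Reduce each alert to (alertname, instance, severity, state), sorted for stable output."""
    rows = []
    for alert in alerts:
        labels = alert.get("labels", {})
        rows.append((
            labels.get("alertname", ""),
            labels.get("instance", ""),
            labels.get("severity", ""),
            alert.get("status", {}).get("state", ""),
        ))
    return sorted(rows)
```

Calling summarize(fetch_alerts()) gives one tuple per active alert, for example a NodeDown row for Host01 while it is unreachable.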

 

Displaying Alertmanager alerts in Grafana

Install the plugin

grafana-cli plugins install camptocamp-prometheus-alertmanager-datasource

 

After installation, restart grafana-server

service grafana-server restart

Add the Alertmanager datasource
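The datasource can be added in the Grafana UI; the same step can also be scripted against Grafana's HTTP API (POST /api/datasources). The sketch below assumes Grafana runs on the Prometheus host on its default port 3000 with placeholder admin/admin credentials. The datasource name "Alertmanager" is hypothetical, while the type is the plugin id installed above.

```python
import base64
import json
import urllib.request

# Assumptions: Grafana on the Prometheus host, default port 3000, placeholder credentials.
GRAFANA_URL = "http://192.168.56.140:3000"
PLUGIN_ID = "camptocamp-prometheus-alertmanager-datasource"


def datasource_payload(alertmanager_url: str = "http://192.168.56.140:9093") -> dict:
    """JSON body for POST /api/datasources; the type is the installed plugin id."""
    return {
        "name": "Alertmanager",  # hypothetical display name
        "type": PLUGIN_ID,
        "url": alertmanager_url,
        "access": "proxy",  # Grafana's backend, not the browser, calls Alertmanager
    }


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def create_datasource(payload: dict, user: str = "admin", password: str = "admin",
                      base_url: str = GRAFANA_URL) -> dict:
    """Create the datasource through Grafana's HTTP API and return the response JSON."""
    req = urllib.request.Request(
        base_url + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(user, password),
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

create_datasource(datasource_payload()) would return Grafana's JSON response for the new datasource; if a datasource with the same name already exists, Grafana rejects the request instead.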

 

Import the dashboard

 

 

 

Result

 

 

Problems encountered and how I solved them

When displaying the alerts, the alerts panel showed two alerts, but the downnode panel showed none.

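After downloading the dashboard's JSON file, I found that the alertname used in my alert rules file did not match the one in the JSON file. Once I made them match, the panel worked.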
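A mismatch like this is easy to spot by machine. The Python sketch below walks the downloaded dashboard JSON, collects every alertname="..." value found in any string, and reports the names that no rule in rules.yml defines. It assumes the panels filter alerts with that label-matcher syntax in their query strings (regex matchers such as alertname=~ are not picked up), and the file name dashboard.json is hypothetical.

```python
import json
import re

# Alert names defined in rules.yml above.
RULE_ALERT_NAMES = {"NodeDown", "NodeCPUUsage", "NodeMemoryUsage", "filesystemUsageAlert"}

# Matches alertname="X" or alertname=X (equality matchers only).
ALERTNAME_RE = re.compile(r'alertname\s*=\s*"?([A-Za-z_][A-Za-z0-9_]*)')


def load_dashboard(path: str) -> dict:
    """Read an exported Grafana dashboard JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def referenced_alertnames(node) -> set:
    """Recursively collect alertname values mentioned in any string of the JSON tree."""
    found = set()
    if isinstance(node, dict):
        for value in node.values():
            found |= referenced_alertnames(value)
    elif isinstance(node, list):
        for value in node:
            found |= referenced_alertnames(value)
    elif isinstance(node, str):
        found.update(ALERTNAME_RE.findall(node))
    return found


def missing_rules(dashboard: dict, rule_names=RULE_ALERT_NAMES) -> set:
    """Alert names the dashboard filters on that no rule defines."""
    return referenced_alertnames(dashboard) - set(rule_names)
```

For example, missing_rules(load_dashboard("dashboard.json")) returning {"InstanceDown"} (a hypothetical name) would mean the dashboard expects an alert called InstanceDown while the rules file calls it NodeDown; renaming either side makes them match.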

 

The severity label is displayed correctly in the email, but Grafana does not display it properly. I have not figured out why yet.

I'll probably have to Google it. But can you imagine how painful it is that people in China can't use Google?

References:

https://www.cnblogs.com/danny-djy/p/11097726.html

https://medium.com/devops-dudes/prometheus-alerting-with-alertmanager-e1bbba8e6a8e

 
