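Preface
In the previous post I showed how to monitor MySQL with Prometheus.
This post shows how to send alert emails through Alertmanager (WeChat and DingTalk notifications work in a similar way, but they require an enterprise account, so I did not try them), and how to view the alerts in Grafana.
Walkthrough
Test machines
Prometheus: 192.168.56.140
Host01: 192.168.56.103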
Installing Alertmanager
Download the release package
wget https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
Create the directories
mkdir -p /etc/alertmanager/
mkdir -p /etc/alertmanager/data
mkdir -p /etc/alertmanager/template/
Download the email template
[root@prometheus-server template]# pwd
/etc/alertmanager/template
[root@prometheus-server template]# wget https://raw.githubusercontent.com/prometheus/alertmanager/master/template/default.tmpl
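Extract the archive and copy the files to the /etc/alertmanager directory
[root@prometheus-server ftpusr]# tar -xzf alertmanager-0.22.2.linux-amd64.tar.gz
[root@prometheus-server ftpusr]# cp ./alertmanager-0.22.2.linux-amd64/alertmanager* /etc/alertmanager/.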
Configure the systemd service
[root@prometheus-server alertmanager]# cat /etc/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/etc/alertmanager/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/etc/alertmanager/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
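Configuring Alertmanager to send email
I use a 163 mailbox to send the alert emails.
To use its SMTP service you must first enable it in the mailbox settings. After enabling it, generate an authorization code: the smtp_auth_password in the configuration file below is that authorization code, not the mailbox password.
[root@prometheus-server alertmanager]# cat alertmanager.yml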
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'xxxx@163.com'
  smtp_auth_username: 'xxxx@163.com'
  smtp_auth_password: 'xxxxxxxxxxx'
  smtp_require_tls: false
templates:
  - '/etc/alertmanager/template/*.tmpl'
route:
  group_by: ['alertname','cluster','service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: 'default-receiver'
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: '20889922@qq.com'
        html: '{{ template "email.default.html" . }}'
        headers: { Subject: "Prometheus alert test email" }
Start the service
service alertmanager start
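Once the service is running, one way to check the email route without waiting for a real rule to fire is to push a synthetic alert into Alertmanager's v2 HTTP API (POST /api/v2/alerts). The following is only a minimal sketch: the function names, the TestAlert name, and the 5-minute lifetime are assumptions made for illustration, and the base URL assumes Alertmanager listens on its default port 9093 on the local host.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamp in UTC


def build_test_alert(alertname, instance, severity="Warning", minutes=5):
    """Build the JSON body for POST /api/v2/alerts: a list holding one alert."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {
            "alertname": alertname,
            "instance": instance,
            "severity": severity,
        },
        "annotations": {"summary": "Test alert for " + instance},
        "startsAt": now.strftime(TIME_FORMAT),
        # Alertmanager treats the alert as resolved once endsAt has passed.
        "endsAt": (now + timedelta(minutes=minutes)).strftime(TIME_FORMAT),
    }]


def send_alerts(alerts, base_url="http://localhost:9093"):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling send_alerts(build_test_alert("TestAlert", "192.168.56.103")) on the Prometheus host should return 200 if Alertmanager accepted the alert; with the group_wait of 10s configured above, the email should follow shortly after.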
Configuring Prometheus to use Alertmanager
prometheus.yml configuration
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"
rules.yml configuration
[root@prometheus-server prometheus]# cat rules.yml
# hostStatsAlert
groups:
- name: hostStatsAlert
  rules:
  - alert: NodeDown
    expr: up == 0
    for: 1m
    labels:
      severity: "Critical"
    annotations:
      summary: "Instance {{$labels.instance}} down"
      description: "{{$labels.instance}} of job {{$labels.job}} has been down for more than 1 minute."
  - alert: NodeCPUUsage
    expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
    for: 1m
    labels:
      severity: "Warning"
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
    for: 1m
    labels:
      severity: "Warning"
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
  - alert: filesystemUsageAlert
    expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype=~"ext4|xfs"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype=~"ext4|xfs"}) > 85
    for: 1m
    labels:
      severity: "Warning"
    annotations:
      summary: "Instance {{ $labels.instance }} root DISK usage high"
      description: "{{ $labels.instance }} root DISK usage above 85% (current value: {{ $value }})"
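The memory rule compares a ratio against 0.85 while the filesystem rule compares a percentage against 85, which is easy to mix up when adjusting thresholds. The sketch below repeats the arithmetic of those two expressions in Python; the function names and the byte counts in the usage note are made up for illustration and do not come from the test machines.

```python
def memory_usage_ratio(mem_total_bytes, mem_available_bytes):
    """Mirror of the NodeMemoryUsage expression: a ratio between 0 and 1."""
    return (mem_total_bytes - mem_available_bytes) / mem_total_bytes


def root_fs_usage_percent(avail_bytes, size_bytes):
    """Mirror of the filesystemUsageAlert expression: a percentage, 0 to 100."""
    return 100 - (avail_bytes * 100) / size_bytes


def exceeds(value, threshold):
    """A rule becomes pending as soon as its comparison is true."""
    return value > threshold


GIB = 2 ** 30  # bytes per GiB
```

For a hypothetical host with 8 GiB of memory and 1 GiB available, memory_usage_ratio(8 * GIB, 1 * GIB) is 0.875, so exceeds(0.875, 0.85) is true. A root filesystem of 50 GiB with 10 GiB free gives root_fs_usage_percent(10 * GIB, 50 * GIB) equal to 80.0, which stays below 85. In both cases the alert only fires after the condition has held for the whole "for: 1m" window.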
Restart Prometheus so that the changes take effect
service prometheus restart
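Viewing the alert email
After waiting a few minutes, the alert information arrives by email.
The alerts can also be viewed by opening the Alertmanager web UI on port 9093.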
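The same information shown in the web UI can be read from Alertmanager's v2 HTTP API (GET /api/v2/alerts), which is handy for a quick check from the command line. This is a rough sketch rather than a complete client: the function names are my own, the base URL assumes the default port 9093 on the local host, and only the labels field of each returned alert is used.

```python
import json
import urllib.request
from collections import Counter


def fetch_alerts(base_url="http://localhost:9093"):
    """Return the alerts currently known to Alertmanager as a list of dicts."""
    with urllib.request.urlopen(base_url + "/api/v2/alerts", timeout=10) as response:
        return json.load(response)


def count_by_name_and_severity(alerts):
    """Count alerts per (alertname, severity) pair; '?' marks a missing label."""
    counts = Counter()
    for alert in alerts:
        labels = alert.get("labels", {})
        key = (labels.get("alertname", "?"), labels.get("severity", "?"))
        counts[key] += 1
    return counts
```

Running count_by_name_and_severity(fetch_alerts()) on the Prometheus host gives a small table of what is firing, including the exact spelling and capitalization of each severity label.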
Displaying Alertmanager alerts in Grafana
Installation
grafana-cli plugins install camptocamp-prometheus-alertmanager-datasource
After the installation, restart grafana-server
service grafana-server restart
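Add the Alertmanager data source
Import the dashboard
Result
Problems encountered and solutions
On the dashboard, the alerts panel listed two alerts, but the down-node panel showed none.
After downloading the dashboard's JSON file, I found that the alertname in my rules file did not match the alertname used in the JSON file. Once the two matched, the panel worked.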
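A mismatch of this kind can be spotted without reading the whole dashboard file by hand. The sketch below collects the alert names from a rules file and reports those that never appear anywhere in the dashboard JSON text. It is a deliberately crude, case-sensitive substring check written for illustration: the function names are hypothetical, and it assumes the rules file uses the usual "- alert: Name" layout.

```python
import re


def rule_alert_names(rules_text):
    """Return the names declared with '- alert: <name>' in a rules file."""
    return re.findall(r"^\s*-\s*alert:\s*(\S+)\s*$", rules_text, flags=re.MULTILINE)


def names_missing_from_dashboard(names, dashboard_text):
    """Return the alert names that never occur in the dashboard JSON text."""
    return [name for name in names if name not in dashboard_text]
```

Reading rules.yml and the exported dashboard file into two strings and passing them through these functions returns the rule names the dashboard never refers to. A non-empty result points at the panels whose alertname filter needs to be changed, or at the rules that need to be renamed.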
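The severity label displays correctly in the email, but Grafana does not display it properly. I have not figured out why yet.
I will probably have to search Google for it. But can you imagine the pain of not being able to reach Google from China?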
References:
https://www.cnblogs.com/danny-djy/p/11097726.html
https://medium.com/devops-dudes/prometheus-alerting-with-alertmanager-e1bbba8e6a8e