Setting up alert notifications with Prometheus + Alertmanager

Posted by BUG弄潮儿 on 2024-10-19

prometheus

Download prometheus-2.53.2

Edit prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 127.0.0.1:9093

rule_files:
  - "rules/rule-*.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090", "127.0.0.1:9104"]

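Here 127.0.0.1:9104 is the metrics address of mysqld_exporter. Both targets are listed under the same job, so alerts for either one carry job="prometheus".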
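To confirm that both targets are being scraped, you can query the `up` metric through Prometheus' HTTP API (`GET /api/v1/query`). Below is a minimal Java 11+ sketch. It assumes Prometheus is listening on 127.0.0.1:9090 as configured above. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class PromQuery {
    // Build an instant-query URL for Prometheus' HTTP API (GET /api/v1/query).
    static String queryUrl(String baseUrl, String promql) {
        return baseUrl + "/api/v1/query?query="
                + URLEncoder.encode(promql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumes Prometheus runs on 127.0.0.1:9090 (hypothetical local setup).
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(
                URI.create(queryUrl("http://127.0.0.1:9090", "up"))).GET().build();
        // Expect one "up" sample per target: localhost:9090 and 127.0.0.1:9104.
        System.out.println(client.send(req, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

A value of 1 means the last scrape of that target succeeded, and 0 means it failed. The encoded form of `up == 0` (`up+%3D%3D+0`) is the same one that appears in the alert's generatorURL.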

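Create a rules directory and add the rule file rule-first.yml in it (the name matches the rules/rule-*.yml pattern in rule_files):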

groups:
  - name: InstanceDown_Rule
    rules:
      - alert: InstanceDown  # alert name
        expr: up == 0        # alert condition
        for: 30s             # how long the condition must keep holding before the alert fires
        labels:
          severity: critical # alert severity
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "Instance {{ $labels.instance }} has been down for more than 5 minutes."
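The `for: 30s` clause means an alert passes through a "pending" state before it becomes "firing". The Java sketch below is a simplified, hypothetical model of that state machine. It is not Prometheus source code. It assumes evaluations every 15s (evaluation_interval) and the 30s `for` duration from the rule.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of a rule with a `for` clause. */
final class ForClauseModel {
    enum State { INACTIVE, PENDING, FIRING }

    /**
     * @param conditionAtEval whether `up == 0` held at each evaluation
     * @param evalIntervalSec evaluation_interval (15s in prometheus.yml)
     * @param forSec          the rule's `for` duration (30s in rule-first.yml)
     */
    static List<State> simulate(boolean[] conditionAtEval, long evalIntervalSec, long forSec) {
        List<State> states = new ArrayList<>();
        long activeSince = -1; // evaluation time at which the condition first became true
        for (int i = 0; i < conditionAtEval.length; i++) {
            long now = i * evalIntervalSec;
            if (!conditionAtEval[i]) {
                activeSince = -1; // condition cleared: back to inactive
                states.add(State.INACTIVE);
            } else {
                if (activeSince < 0) activeSince = now;
                states.add(now - activeSince >= forSec ? State.FIRING : State.PENDING);
            }
        }
        return states;
    }

    public static void main(String[] args) {
        // The exporter goes down before the 2nd evaluation and stays down.
        boolean[] down = {false, true, true, true, true};
        System.out.println(simulate(down, 15, 30));
        // prints [INACTIVE, PENDING, PENDING, FIRING, FIRING]
    }
}
```

There is a further delay of up to one scrape_interval before `up` itself drops to 0. With these settings, the alert should therefore reach firing roughly 30 to 60 seconds after the exporter stops, not after 5 minutes as the description text says. Adjust that text if you change `for`.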

Start Prometheus

prometheus --config.file=prometheus.yml --storage.tsdb.path=./data --web.enable-lifecycle
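The `--web.enable-lifecycle` flag enables Prometheus' lifecycle endpoints. After you edit prometheus.yml or a rule file, you can then reload the configuration without a restart by sending `POST /-/reload` (for example `curl -X POST http://127.0.0.1:9090/-/reload`). Below is a Java 11+ sketch of the same request. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class PromReload {
    // POST /-/reload asks Prometheus to re-read prometheus.yml and the rule files.
    // Prometheus only honours it when started with --web.enable-lifecycle.
    static HttpRequest reloadRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/-/reload"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(reloadRequest("http://127.0.0.1:9090"), HttpResponse.BodyHandlers.ofString());
        // 200 means the reload succeeded; otherwise status and body describe the problem.
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```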

Open Prometheus in a browser

http://127.0.0.1:9090/

alertmanager

Download alertmanager-0.27.0

Edit the configuration file alertmanager.yml

route:
  group_by: ['instance']
  group_wait: 10s
  group_interval: 20s
  #repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/alert/hook'
        send_resolved: true
#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['instance']

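Here http://127.0.0.1:5001/alert/hook is the webhook endpoint that receives the alerts. Because send_resolved is true, it is also called when an alert resolves.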
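With `group_by: ['instance']`, Alertmanager batches alerts that share the same instance label into one notification. The Java sketch below is a hypothetical illustration of that grouping, not Alertmanager code. The second alert name, MysqlDown, is made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical illustration of `group_by: ['instance']`. Not Alertmanager code. */
final class GroupByInstance {
    static Map<String, List<String>> group(List<Map<String, String>> alerts) {
        return alerts.stream().collect(Collectors.groupingBy(
                a -> a.getOrDefault("instance", ""), // grouping key = instance label
                TreeMap::new,                         // sorted keys for stable printing
                Collectors.mapping(a -> a.get("alertname"), Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Map<String, String>> alerts = List.of(
                Map.of("alertname", "InstanceDown", "instance", "127.0.0.1:9104"),
                Map.of("alertname", "MysqlDown", "instance", "127.0.0.1:9104"),
                Map.of("alertname", "InstanceDown", "instance", "localhost:9090"));
        // Two groups, so two webhook calls, each with its own groupLabels.
        System.out.println(group(alerts));
        // prints {127.0.0.1:9104=[InstanceDown, MysqlDown], localhost:9090=[InstanceDown]}
    }
}
```

Timing is controlled separately:

- **group_wait (10s):** how long a new group waits before its first notification.
- **group_interval (20s):** the minimum gap before a changed group is notified again.
- **repeat_interval:** how often an unchanged, still-firing group is re-sent. Since it is commented out here, Alertmanager's default of 4h applies.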

Start Alertmanager

alertmanager --config.file=alertmanager.yml
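To test the Alertmanager-to-webhook path without stopping any service, you can post an alert directly to Alertmanager's v2 API (`POST /api/v2/alerts`, which takes a JSON array of alerts). Below is a Java 11+ sketch. The alert name TestAlert, the instance test-host:1234, and the class name are all made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class SendTestAlert {
    // Builds a one-element alert array. No JSON escaping is done,
    // so only pass simple values without quotes or backslashes.
    static String testAlertJson(String instance) {
        return "[{\"labels\":{\"alertname\":\"TestAlert\",\"instance\":\"" + instance
                + "\",\"severity\":\"critical\"},"
                + "\"annotations\":{\"summary\":\"Test alert for " + instance + "\"}}]";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(URI.create("http://127.0.0.1:9093/api/v2/alerts"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(testAlertJson("test-host:1234")))
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode()); // a 2xx status means the alert was accepted
    }
}
```

Because no startsAt or endsAt is given, Alertmanager fills them in itself. After group_wait, the hook should receive a "firing" payload. Once the alert expires (after resolve_timeout, 5m by default), it should receive a "resolved" payload.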

Open Alertmanager in a browser

http://127.0.0.1:9093/

grafana

Download grafana-11.2.1

Start Grafana

grafana-server

Open Grafana in a browser

http://127.0.0.1:3000
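Each of the three servers started so far exposes a lightweight status endpoint:

- Prometheus and Alertmanager answer `GET /-/ready`.
- Grafana answers `GET /api/health`.

The Java 11+ sketch below checks all three. It assumes the default ports used in this article. The class name and the "200 means healthy" rule are simplifications.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

final class HealthCheck {
    // Status endpoints of the components started above (default local ports assumed).
    static final Map<String, String> ENDPOINTS = new LinkedHashMap<>();
    static {
        ENDPOINTS.put("prometheus", "http://127.0.0.1:9090/-/ready");
        ENDPOINTS.put("alertmanager", "http://127.0.0.1:9093/-/ready");
        ENDPOINTS.put("grafana", "http://127.0.0.1:3000/api/health");
    }

    // Simplification: treat only HTTP 200 as healthy.
    static boolean isHealthy(int statusCode) {
        return statusCode == 200;
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(3)).build();
        ENDPOINTS.forEach((name, url) -> {
            try {
                HttpResponse<Void> resp = client.send(
                        HttpRequest.newBuilder(URI.create(url)).GET().build(),
                        HttpResponse.BodyHandlers.discarding());
                System.out.println(name + ": "
                        + (isHealthy(resp.statusCode()) ? "OK" : "HTTP " + resp.statusCode()));
            } catch (IOException e) {
                // Connection refused usually means the process is not running.
                System.out.println(name + ": DOWN (" + e.getClass().getSimpleName() + ")");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop checking
            }
        });
    }
}
```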

mysqld_exporter

Download mysqld_exporter-0.15.1

Create a .my.cnf file in the mysqld_exporter root directory:

[client]
user=root
password=root

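user and password are the MySQL username and password. Here mysqld_exporter is installed on the same server as MySQL: the .my.cnf above sets no host, so the exporter connects to MySQL on localhost.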

Start mysqld_exporter, pointing it at that file (by default it reads ~/.my.cnf):

mysqld_exporter --config.my-cnf=.my.cnf

Open mysqld_exporter in a browser

http://127.0.0.1:9104/
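The InstanceDown rule watches Prometheus' own `up` metric, which only reports whether the exporter's /metrics endpoint answered the scrape. Whether the exporter can actually log in to MySQL is reported separately by the exporter's `mysql_up` gauge. Stopping MySQL while mysqld_exporter keeps running would therefore not trigger InstanceDown.

The Java sketch below reads `mysql_up` from the exporter's text output. It assumes the exporter is on 127.0.0.1:9104. The helper only handles metrics without labels and is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class ExporterMetrics {
    // Returns the value of an un-labelled sample from Prometheus text-format output.
    static double sampleValue(String exposition, String metricName) {
        for (String line : exposition.split("\n")) {
            if (line.startsWith("#")) continue; // skip HELP/TYPE comment lines
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].equals(metricName)) {
                return Double.parseDouble(parts[1]);
            }
        }
        throw new IllegalArgumentException("metric not found: " + metricName);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String body = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("http://127.0.0.1:9104/metrics")).build(),
                HttpResponse.BodyHandlers.ofString()).body();
        // 1 = the exporter can reach MySQL with the .my.cnf credentials, 0 = it cannot.
        System.out.println("mysql_up = " + sampleValue(body, "mysql_up"));
    }
}
```

An additional rule on `mysql_up == 0` would cover the case where the database goes down but the exporter stays up.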

Write the hook

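// Spring MVC endpoint. JSON.toJSONString is presumably fastjson's, and
// Response appears to be the project's own result wrapper (not shown).
@PostMapping("/alert/hook")
public Response<String> alertHook(@RequestBody Map<String, Object> alertDataMap) {
	// TODO: handle the alert here, e.g. notify via WeChat, email, or DingTalk
	System.out.println(JSON.toJSONString(alertDataMap));
	return Response.ok("success");
}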

The endpoint above is written in Java, but in practice it can be written in any language.
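If you would rather not pull in Spring, the JDK's built-in com.sun.net.httpserver.HttpServer can serve the same endpoint. The sketch below listens on 127.0.0.1:5001 at /alert/hook and passes each raw JSON body to a callback. The class and method names are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Minimal, dependency-free alert webhook on the JDK's built-in HTTP server. */
final class AlertHookServer {
    static HttpServer start(int port, Consumer<String> onAlert) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/alert/hook", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // -1 = no response body
                exchange.close();
                return;
            }
            String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            onAlert.accept(body); // e.g. forward to WeChat, email, or DingTalk
            byte[] reply = "success".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(reply);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 5001 and path /alert/hook match the url in alertmanager.yml.
        start(5001, System.out::println);
    }
}
```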

To simulate an outage, stop mysqld_exporter. The /alert/hook endpoint then receives data like the following:

{
	"receiver": "web\\.hook",
	"status": "firing",
	"alerts": [{
		"status": "firing",
		"labels": {
			"alertname": "InstanceDown",
			"instance": "127.0.0.1:9104",
			"job": "prometheus",
			"severity": "critical"
		},
		"annotations": {
			"description": "Instance 127.0.0.1:9104 has been down for more than 5 minutes.",
			"summary": "Instance 127.0.0.1:9104 down"
		},
		"startsAt": "2024-10-10T11:27:58.11Z",
		"endsAt": "0001-01-01T00:00:00Z",
		"generatorURL": "http://olive-my:9090/graph?g0.expr=up+%3D%3D+0&g0.tab=1",
		"fingerprint": "106b3a6075af7628"
	}],
	"groupLabels": {
		"instance": "127.0.0.1:9104"
	},
	"commonLabels": {
		"alertname": "InstanceDown",
		"instance": "127.0.0.1:9104",
		"job": "prometheus",
		"severity": "critical"
	},
	"commonAnnotations": {
		"description": "Instance 127.0.0.1:9104 has been down for more than 5 minutes.",
		"summary": "Instance 127.0.0.1:9104 down"
	},
	"externalURL": "http://olive-my:9093",
	"version": "4",
	"groupKey": "{}:{instance=\"127.0.0.1:9104\"}",
	"truncatedAlerts": 0
}
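The TODO in the hook usually boils down to turning this payload into a short text message. The Java sketch below formats one line per alert, working on the `Map<String, Object>` the Spring hook receives (Jackson turns JSON objects into maps and arrays into lists). The class name and message format are hypothetical. Because send_resolved is true, the per-alert status can be either "firing" or "resolved".

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical formatter: one text line per alert in an Alertmanager webhook payload. */
final class AlertFormatter {
    @SuppressWarnings("unchecked")
    static String format(Map<String, Object> payload) {
        List<Map<String, Object>> alerts = (List<Map<String, Object>>) payload.get("alerts");
        return alerts.stream().map(alert -> {
            Map<String, String> labels = (Map<String, String>) alert.get("labels");
            Map<String, String> annotations = (Map<String, String>) alert.get("annotations");
            // Per-alert status: "firing", or "resolved" since send_resolved is true.
            String status = String.valueOf(alert.get("status")).toUpperCase(Locale.ROOT);
            return "[" + status + "] " + labels.get("alertname") + " "
                    + labels.get("instance") + ": " + annotations.get("summary");
        }).collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        Map<String, Object> alert = Map.of(
                "status", "firing",
                "labels", Map.of("alertname", "InstanceDown", "instance", "127.0.0.1:9104"),
                "annotations", Map.of("summary", "Instance 127.0.0.1:9104 down"));
        System.out.println(format(Map.of("status", "firing", "alerts", List.of(alert))));
        // prints [FIRING] InstanceDown 127.0.0.1:9104: Instance 127.0.0.1:9104 down
    }
}
```

In the Spring hook, `System.out.println(AlertFormatter.format(alertDataMap));` would print the same kind of line. The `endsAt` value `0001-01-01T00:00:00Z` in the firing payload is a zero timestamp, meaning the alert has no end time yet.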
