prometheus
Download prometheus-2.53.2
Edit the prometheus.yml file
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093
rule_files:
  - "rules/rule-*.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090", "127.0.0.1:9104"]
Here 127.0.0.1:9104 is the metrics address of mysqld_exporter.
Create a rules directory and add the rule file rule-first.yml
groups:
  - name: InstanceDown_Rule
    rules:
      - alert: InstanceDown   # alert name
        expr: up == 0         # alert condition
        for: 30s              # how long the condition must hold before the alert fires
        labels:
          severity: critical  # alert severity
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          # Note: the text below says 5 minutes although "for" is 30s; keep the two in sync.
          description: "Instance {{ $labels.instance }} has been down for more than 5 minutes."
Start Prometheus
prometheus --config.file=prometheus.yml --storage.tsdb.path=./data --web.enable-lifecycle
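Because Prometheus is started with --web.enable-lifecycle, it accepts an HTTP POST to /-/reload and re-reads prometheus.yml and the rule files without a restart. The following is a minimal Java sketch of that call, assuming Java 11 or later and the address used in this walkthrough; the class name PrometheusReload and the helper buildReloadRequest are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper: asks a running Prometheus to reload its configuration. */
final class PrometheusReload {

    /** Builds the POST request for the lifecycle reload endpoint. */
    static HttpRequest buildReloadRequest(String baseUrl) {
        // Strip a trailing slash so the path does not contain "//".
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/-/reload"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: Prometheus listens on the address used in this walkthrough.
        HttpRequest request = buildReloadRequest("http://127.0.0.1:9090/");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Reload returned HTTP " + response.statusCode());
        } catch (IOException e) {
            // Typically means Prometheus is not running or not reachable.
            System.out.println("Could not reach Prometheus: " + e);
        }
    }
}
```

This is handy after editing files under rules/, since a rule change then takes effect without stopping the process.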
Open Prometheus
http://127.0.0.1:9090/
alertmanager
Download alertmanager-0.27.0
Edit the configuration file alertmanager.yml
route:
  group_by: ['instance']
  group_wait: 10s
  group_interval: 20s
  #repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/alert/hook'
        send_resolved: true
#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['instance']
Here http://127.0.0.1:5001/alert/hook is the hook endpoint that receives the alerts.
Start alertmanager
alertmanager --config.file=alertmanager.yml
Open alertmanager
http://127.0.0.1:9093/
grafana
Download grafana-11.2.1
Start grafana
grafana-server
Open grafana
http://127.0.0.1:3000
mysqld_exporter
Download mysqld_exporter-0.15.1
Create a .my.cnf file in the mysqld_exporter root directory
[client]
user=root
password=root
user and password are the MySQL user name and password; mysqld_exporter needs to be installed on the same server as the MySQL server.
Start mysqld_exporter
mysqld_exporter
Open mysqld_exporter
http://127.0.0.1:9104/
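mysqld_exporter serves its metrics in the Prometheus text format under /metrics, and its mysql_up gauge reports whether the exporter can reach MySQL with the credentials from .my.cnf. As a rough sketch of how that value could be checked from code, the following Java helper pulls one unlabeled sample out of such a text body. The class name MetricsText, the method sampleValue, and the sample text in main are hypothetical and only for illustration; the parser handles plain numeric values on unlabeled series and nothing more.

```java
import java.util.OptionalDouble;

/** Hypothetical helper: reads one unlabeled sample from Prometheus text output. */
final class MetricsText {

    /** Returns the value of a line such as "mysql_up 1", if the metric is present. */
    static OptionalDouble sampleValue(String body, String metricName) {
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length >= 2 && parts[0].equals(metricName)) {
                return OptionalDouble.of(Double.parseDouble(parts[1]));
            }
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // Illustrative sample text, not captured from a real exporter.
        String sample = "# TYPE mysql_up gauge\nmysql_up 1\n";
        OptionalDouble up = sampleValue(sample, "mysql_up");
        System.out.println(up.isPresent() && up.getAsDouble() == 1.0
                ? "exporter reports MySQL as reachable"
                : "MySQL not reachable or metric missing");
    }
}
```

A value of 0 here usually points at wrong credentials or an unreachable MySQL server rather than at the exporter process itself, which the up == 0 rule covers.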
Write the hook
@PostMapping("/alert/hook")
public Response<String> alertHook(@RequestBody Map<String, Object> alertDataMap) {
    // TODO: handle the alert here, e.g. forward it to WeChat, email, or DingTalk
    System.out.println(JSON.toJSONString(alertDataMap));
    return Response.ok("success");
}
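The endpoint above is written in Java; in practice any language can be used.
To simulate a service outage, stop mysqld_exporter. The /alert/hook endpoint then receives data like the following: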
{
  "receiver": "web\\.hook",
  "status": "firing",
  "alerts": [{
    "status": "firing",
    "labels": {
      "alertname": "InstanceDown",
      "instance": "127.0.0.1:9104",
      "job": "prometheus",
      "severity": "critical"
    },
    "annotations": {
      "description": "Instance 127.0.0.1:9104 has been down for more than 5 minutes.",
      "summary": "Instance 127.0.0.1:9104 down"
    },
    "startsAt": "2024-10-10T11:27:58.11Z",
    "endsAt": "0001-01-01T00:00:00Z",
    "generatorURL": "http://olive-my:9090/graph?g0.expr=up+%3D%3D+0&g0.tab=1",
    "fingerprint": "106b3a6075af7628"
  }],
  "groupLabels": {
    "instance": "127.0.0.1:9104"
  },
  "commonLabels": {
    "alertname": "InstanceDown",
    "instance": "127.0.0.1:9104",
    "job": "prometheus",
    "severity": "critical"
  },
  "commonAnnotations": {
    "description": "Instance 127.0.0.1:9104 has been down for more than 5 minutes.",
    "summary": "Instance 127.0.0.1:9104 down"
  },
  "externalURL": "http://olive-my:9093",
  "version": "4",
  "groupKey": "{}:{instance=\"127.0.0.1:9104\"}",
  "truncatedAlerts": 0
}
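One possible way to turn this payload into a message for WeChat, email, or DingTalk is to walk the alerts array and build one line per alert from its status, labels, and annotations. The following is a minimal sketch under the assumption that the payload has already been deserialized into a Map, as in the hook above; the class AlertMessageFormatter and its methods are hypothetical names, and the message layout is only an example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: turns an Alertmanager webhook payload into message lines. */
final class AlertMessageFormatter {

    /** Returns one line per alert: "[STATUS] alertname (severity): summary". */
    static List<String> format(Map<String, Object> payload) {
        List<String> lines = new ArrayList<>();
        Object alerts = payload.get("alerts");
        if (!(alerts instanceof List)) {
            return lines; // no "alerts" array, nothing to report
        }
        for (Object item : (List<?>) alerts) {
            if (!(item instanceof Map)) {
                continue; // skip malformed entries
            }
            Map<?, ?> alert = (Map<?, ?>) item;
            String status = text(alert, "status").toUpperCase(Locale.ROOT);
            String name = text(child(alert, "labels"), "alertname");
            String severity = text(child(alert, "labels"), "severity");
            String summary = text(child(alert, "annotations"), "summary");
            lines.add("[" + status + "] " + name + " (" + severity + "): " + summary);
        }
        return lines;
    }

    /** Returns the nested object under key, or an empty map if it is missing. */
    private static Map<?, ?> child(Map<?, ?> parent, String key) {
        Object value = parent.get(key);
        if (value instanceof Map) {
            return (Map<?, ?>) value;
        }
        return Map.of();
    }

    /** Returns the string form of a field, or "unknown" if it is missing. */
    private static String text(Map<?, ?> map, String key) {
        Object value = map.get(key);
        return value == null ? "unknown" : value.toString();
    }

    public static void main(String[] args) {
        // Trimmed-down version of the payload shown above.
        Map<String, Object> alert = Map.of(
                "status", "firing",
                "labels", Map.of("alertname", "InstanceDown", "severity", "critical"),
                "annotations", Map.of("summary", "Instance 127.0.0.1:9104 down"));
        Map<String, Object> payload = Map.of("alerts", List.of(alert));
        format(payload).forEach(System.out::println);
    }
}
```

Because send_resolved is set to true in alertmanager.yml, the same endpoint also receives payloads whose alerts carry the status resolved, so including the status in each line lets the recipient tell a recovery from a new failure.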