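1. Black-box monitoring
"White-box monitoring" requires installing the corresponding exporter on the monitored host, where it collects data about the host's resources and their state.
In some cases, for operational, technical, or other reasons, an exporter cannot be deployed in the monitored environment. The typical example is monitoring network quality across the country: the usual approach is to ping selected nodes (an ICMP test), and you cannot deploy an exporter in someone else's environment. For this scenario the Prometheus community provides a black-box solution. Blackbox Exporter does not need to be installed on the monitored target; you only install it somewhere that can reach both Prometheus and the targets. It then probes the targets over HTTP, HTTPS, DNS, TCP, ICMP, and so on, and it can also report when an SSL certificate expires.
blackbox_exporter:
- One of the exporters officially provided by Prometheus; it collects monitoring data by probing over http, dns, tcp, and icmp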
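To make the probing model concrete: blackbox_exporter exposes a /probe endpoint that takes the module and the target as query parameters, and Prometheus (or you, by hand) requests that URL. The following is a minimal sketch that only builds such a URL. The helper name probe_url is hypothetical, and the exporter address 192.168.10.100:9115 is the example address used in this document.

```shell
#!/bin/sh
# Build the /probe URL for a given exporter address, module and target.
# Assumption: the target contains no characters that need URL encoding
# (such as '&' or '#'); encode it first if it does.
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# Print the URL for an ICMP probe of 192.168.10.14.
probe_url 192.168.10.100:9115 icmp 192.168.10.14

# With a running exporter, a probe can be triggered by hand, for example:
#   curl -s "$(probe_url 192.168.10.100:9115 http_2xx https://www.baidu.com)"
```

The response to such a request is a plain-text list of probe_* metrics, which is what Prometheus scrapes.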
2. Installation
2.1 Binary installation (option 1 of 2)
https://prometheus.io/download/#blackbox_exporter
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.23.0/blackbox_exporter-0.23.0.linux-amd64.tar.gz
tar zxvf blackbox_exporter-0.23.0.linux-amd64.tar.gz
mkdir /opt/prometheus -p
mv blackbox_exporter-0.23.0.linux-amd64 /opt/prometheus/blackbox_exporter

# Create the service user
useradd -M -s /usr/sbin/nologin prometheus

# Change ownership of the directory
chown prometheus:prometheus -R /opt/prometheus

# Create the systemd service
cat <<"EOF" >/etc/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/blackbox_exporter/blackbox_exporter \
  --config.file "/opt/prometheus/blackbox_exporter/blackbox.yml" \
  --web.listen-address ":9115"
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Start the service and enable it at boot
systemctl daemon-reload
systemctl start blackbox_exporter
systemctl enable blackbox_exporter
2.2 Docker installation (option 2 of 2)
Create the configuration file. Modules in config.yml that you do not need, such as pop3 or ssh, can be deleted.
mkdir /data/blackbox_exporter/
cat >/data/blackbox_exporter/config.yml<<"EOF"
modules:
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  grpc:
    prober: grpc
    grpc:
      tls: true
      preferred_ip_protocol: "ip4"
  grpc_plain:
    prober: grpc
    grpc:
      tls: false
      service: "service1"
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
      - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ttl5:
    prober: icmp
    timeout: 5s
    icmp:
      ttl: 5
EOF
After deleting the modules you do not need, the file can be reduced to:
cat config.yml
modules:
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  icmp:
    prober: icmp
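Targets behind the Cloudflare (CF) proxy returning a non-200 status code
Example from the official site:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      preferred_ip_protocol: "ip4"
Note: setting preferred_ip_protocol: "ip4" makes the probe prefer IPv4, which allows it to check targets that sit behind Cloudflare and otherwise come back with a non-200 status code.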
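2.2.1 Running directly with docker
The image tag below matches the 0.23.0 release used in section 2.1; the grpc modules in the sample config need an image recent enough to include the grpc prober.
sudo docker run -d --restart=always --name blackbox-exporter \
  -p 9115:9115 \
  -v /data/blackbox_exporter:/etc/blackbox_exporter \
  prom/blackbox-exporter:v0.23.0 \
  --config.file=/etc/blackbox_exporter/config.yml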
2.2.2 Running with docker-compose
cd /data/blackbox_exporter/
cat >docker-compose.yaml <<"EOF"
version: '3.3'
services:
  blackbox_exporter:
    image: prom/blackbox-exporter
    container_name: blackbox_exporter
    restart: always
    volumes:
      - /data/blackbox_exporter:/etc/blackbox_exporter
    ports:
      - 9115:9115
EOF
Start: docker-compose up -d
Check the status page: http://192.168.10.100:9115/
3. Prometheus configuration
Configure Prometheus to scrape (pull) the monitoring samples from blackbox_exporter.
cd /data/docker-prometheus
cat >> prometheus/prometheus.yml <<"EOF"
  # HTTP probe configuration
  - job_name: "blackbox_http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://www.baidu.com
        - https://www.jd.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.100:9115
  # TCP check configuration
  - job_name: "blackbox_tcp"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - 192.168.10.14:22
        - 192.168.10.14:9090
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.100:9115
  # ICMP check configuration (ping)
  - job_name: "blackbox_icmp"
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
        - 192.168.10.14
        - 192.168.10.100
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.100:9115
EOF
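The three relabel_configs steps are the part of this configuration that is easiest to misread, so here is a small sketch that mimics them for a single target. It is not how Prometheus implements relabeling; the function relabel_target is hypothetical and only traces which value ends up in which label.

```shell
#!/bin/sh
# Mimic the three relabel steps for one target.
# $1 = entry from static_configs.targets, $2 = blackbox_exporter address
relabel_target() {
  address="$1"               # initially __address__ holds the listed target
  param_target="$address"    # step 1: copy __address__ into __param_target
  instance="$param_target"   # step 2: copy __param_target into instance
  address="$2"               # step 3: overwrite __address__ with the exporter
  printf '__address__=%s __param_target=%s instance=%s\n' \
    "$address" "$param_target" "$instance"
}

relabel_target https://www.baidu.com 192.168.10.100:9115
```

The result is that Prometheus sends the scrape request to the exporter (__address__), passes the original target as the target query parameter (__param_target), and still labels the resulting series with the original target (instance).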
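Reload the configuration (the reload endpoint only works if Prometheus was started with --web.enable-lifecycle): curl -X POST http://localhost:9090/-/reload
Check the targets page:
http://192.168.10.14:9090/targets?search=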
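3.1 Metrics
The probe metrics all start with the probe_ prefix:
probe_success # whether the probe succeeded (1 = success, 0 = failure)
probe_duration_seconds # how long the probe took
# DNS
probe_dns_lookup_time_seconds # time spent on DNS resolution
probe_ip_protocol # IP protocol version, either 4 or 6
probe_ip_addr_hash # hash of the IP address, useful for detecting IP changes
# HTTP
probe_http_status_code # HTTP response status code; after redirects, the code of the final response
probe_http_content_length # length of the HTTP response body, in bytes
probe_http_version # HTTP protocol version of the response, e.g. 1.1
probe_http_ssl # whether the HTTP response used SSL, either 1 or 0
probe_ssl_earliest_cert_expiry # expiry time of the SSL certificate (the earliest one in the chain), as a Unix timestamp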
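When debugging a probe by hand it is often enough to pull one of these values out of the text the exporter returns. The following is a minimal sketch, assuming the metric is unlabelled and appears as "name value" on its own line; the function metric_value is hypothetical and the sample text is illustrative, not captured from a real exporter.

```shell
#!/bin/sh
# Print the value of an unlabelled metric from Prometheus text format on stdin.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Illustrative sample only; real output contains more metrics and HELP lines.
sample='# TYPE probe_success gauge
probe_success 1
probe_http_status_code 200
probe_duration_seconds 0.25'

printf '%s\n' "$sample" | metric_value probe_success
```

Against a running exporter the same function could be fed from curl, for example by piping the output of a /probe request into metric_value probe_success.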
3.2 Alert rule (trigger) configuration
Add alerting rules for blackbox_exporter.
cat >> prometheus/rules/blackbox_exporter.yml <<"EOF"
groups:
- name: Blackbox
  rules:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Blackbox probe failed: {{ $labels.instance }}"
      description: "Blackbox probe failed, current value: {{ $value }}"
  - alert: BlackboxSlowRequest
    expr: avg_over_time(probe_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Slow request: {{ $labels.instance }}"
      description: "Request took longer than 1 second, value: {{ $value }}"
  - alert: BlackboxHttpStatusCodeFailure
    expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "HTTP status code check failed: {{ $labels.instance }}"
      description: "HTTP status code is outside 200-399, current status code: {{ $value }}"
  - alert: BlackboxSslCertExpiringSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Certificate expiring soon: {{ $labels.instance }}"
      description: "SSL certificate expires in less than 30 days, seconds remaining: {{ $value }}"
  - alert: BlackboxSslCertExpiringSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 3
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Certificate expiring soon: {{ $labels.instance }}"
      description: "SSL certificate expires in less than 3 days, seconds remaining: {{ $value }}"
  - alert: BlackboxSslCertExpired
    expr: probe_ssl_earliest_cert_expiry - time() <= 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Certificate expired: {{ $labels.instance }}"
      description: "SSL certificate has already expired; please confirm whether it is still in use"
EOF
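The threshold arithmetic in the certificate and status-code rules can be checked outside Prometheus. The sketch below reproduces the same comparisons in shell; the functions cert_severity and status_alerts are hypothetical, the timestamps must be integers (Prometheus itself works with floats), and the "current time" is a made-up value. Unlike Prometheus, where several of the certificate rules can fire at once, cert_severity reports only the most severe match.

```shell
#!/bin/sh
# Classify a certificate by seconds remaining, using the thresholds from the
# rules: <= 0 expired, < 3 days critical, < 30 days warning.
# $1 = expiry as Unix timestamp, $2 = current time as Unix timestamp
cert_severity() {
  remaining=$(( $1 - $2 ))
  if [ "$remaining" -le 0 ]; then
    echo expired
  elif [ "$remaining" -lt $(( 86400 * 3 )) ]; then
    echo critical
  elif [ "$remaining" -lt $(( 86400 * 30 )) ]; then
    echo warning
  else
    echo ok
  fi
}

# Mirror of the status-code rule: fires for codes <= 199 or >= 400.
status_alerts() {
  if [ "$1" -le 199 ] || [ "$1" -ge 400 ]; then
    echo firing
  else
    echo ok
  fi
}

now=1700000000   # hypothetical "current time"
cert_severity $(( now + 86400 * 10 )) "$now"
status_alerts 301
```

Note that a failed probe reports a status code of 0, so the status-code comparison also covers targets that did not answer at all.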
Check the configuration and reload it:
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
curl -X POST http://localhost:9090/-/reload
Then check the rules and alerts pages:
http://192.168.10.14:9090/rules
http://192.168.10.14:9090/alerts?search=
4. Grafana dashboards
https://grafana.com/grafana/dashboards/13659-blackbox-exporter-http-prober/
https://grafana.com/grafana/dashboards/9965
In the total probe duration panel, click Edit and find Options, then change the Legend value from {{env}}_{{name}} to {{instance}}.