11. Prometheus monitoring: black-box (blackbox) monitoring

Posted by 杨梅冲 on 2024-04-24

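1. Black-box monitoring

"White-box monitoring" requires installing the corresponding exporter on each monitored host so that it can collect data on the host's resources and their state.

However, for technical or other reasons, an exporter cannot always be deployed in the monitored environment. The typical example is monitoring network quality across the country: the usual approach is to ping selected nodes over ICMP, and you cannot deploy an exporter in someone else's environment. For such cases the Prometheus community provides a black-box solution. Blackbox Exporter does not need to be installed on the monitored target; you only need to install it somewhere that can reach both Prometheus and the targets. It probes the network over HTTP, HTTPS, DNS, TCP, ICMP, and so on, and can also report SSL certificate expiry times.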

blackbox_exporter:

  • One of the official Prometheus exporters; it collects monitoring data through http, dns, tcp, and icmp probes

2. Installation

2.1 Binary installation (option 1 of 2)

https://prometheus.io/download/#blackbox_exporter

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.23.0/blackbox_exporter-0.23.0.linux-amd64.tar.gz

tar zxvf blackbox_exporter-0.23.0.linux-amd64.tar.gz 
mkdir /opt/prometheus -p
mv blackbox_exporter-0.23.0.linux-amd64 /opt/prometheus/blackbox_exporter

# Create the service user
useradd -M -s /usr/sbin/nologin prometheus
# Change ownership of the directory
chown prometheus:prometheus -R /opt/prometheus

# Create the systemd service
cat <<"EOF" >/etc/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/blackbox_exporter/blackbox_exporter \
          --config.file "/opt/prometheus/blackbox_exporter/blackbox.yml" \
          --web.listen-address ":9115"
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Start the service and enable it at boot
systemctl daemon-reload
systemctl start blackbox_exporter
systemctl enable blackbox_exporter

2.2 Docker installation (option 2 of 2)

Create the configuration file. Modules in config.yml that you do not use, such as pop3 or ssh, can be deleted.

mkdir -p /data/blackbox_exporter/

cat >/data/blackbox_exporter/config.yml<<"EOF"
modules:
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  grpc:
    prober: grpc
    grpc:
      tls: true
      preferred_ip_protocol: "ip4"
  grpc_plain:
    prober: grpc
    grpc:
      tls: false
      service: "service1"
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
      - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ttl5:
    prober: icmp
    timeout: 5s
    icmp:
      ttl: 5
EOF

After removing the modules you do not need, the following remains:

cat config.yml
modules:
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  icmp:
    prober: icmp

Targets behind a Cloudflare (CF) proxy that return a non-200 status code

Example from the official documentation:

  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      preferred_ip_protocol: "ip4"

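Note: if the target sits behind Cloudflare and the probe reports a non-200 status code, setting preferred_ip_protocol: "ip4" makes the probe use IPv4, which lets such targets be checked correctly.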

2.2.1 Run directly with Docker

sudo docker run -d --restart=always --name blackbox-exporter -p 9115:9115  -v /data/blackbox_exporter:/etc/blackbox_exporter prom/blackbox-exporter:v0.19.0 --config.file=/etc/blackbox_exporter/config.yml

2.2.2 Run with docker-compose

cd /data/blackbox_exporter/

cat >docker-compose.yaml <<"EOF"
version: '3.3'
services:
  blackbox_exporter:
    image: prom/blackbox-exporter
    container_name: blackbox_exporter
    restart: always
    volumes:
    - /data/blackbox_exporter:/etc/blackbox_exporter
    ports:
    - 9115:9115
EOF

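Start it: docker-compose up -d

Check the status page: http://192.168.10.100:9115/

3. Prometheus configuration

Configure Prometheus to scrape (pull) the monitoring samples from blackbox_exporter. The snippet below is appended to the end of prometheus.yml, so it assumes scrape_configs is the last section in that file.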

cd /data/docker-prometheus 

cat >> prometheus/prometheus.yml <<"EOF"

# HTTP probe configuration
  - job_name: "blackbox_http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://www.baidu.com
        - https://www.jd.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.100:9115

# TCP check configuration
  - job_name: "blackbox_tcp"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: 
        - 192.168.10.14:22
        - 192.168.10.14:9090
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.100:9115

# ICMP (ping) check configuration
  - job_name: "blackbox_icmp"
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: 
        - 192.168.10.14
        - 192.168.10.100
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.100:9115
EOF

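Reload the configuration (the reload endpoint is only available when Prometheus is started with --web.enable-lifecycle): curl -X POST http://localhost:9090/-/reload

Verify on the targets page: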

http://192.168.10.14:9090/targets?search=
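
The three relabel rules in each job have a simple net effect: the listed target becomes the target query parameter, it is also copied into the instance label, and the scrape itself is sent to the exporter at 192.168.10.100:9115. The sketch below builds the URL that results, which is handy for debugging a probe by hand. The helper name probe_url is hypothetical, and targets containing characters such as & or # would need URL-encoding first.

```shell
#!/bin/sh
# probe_url is a hypothetical helper, not part of blackbox_exporter.
# Arguments: exporter address (host:port), module name, probe target.
# Prints the /probe URL that Prometheus effectively scrapes for that target.
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# Example: the blackbox_http job probing https://www.baidu.com
probe_url 192.168.10.100:9115 http_2xx https://www.baidu.com

# To run the probe by hand (needs a reachable exporter), uncomment:
# curl -s "$(probe_url 192.168.10.100:9115 http_2xx https://www.baidu.com)"
```

Appending &debug=true to the URL makes the exporter return a detailed log of the probe instead of only the metrics.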

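3.1 Metrics

Metrics produced by a probe use the probe_ prefix:

probe_success                   # Whether the probe succeeded (1 = success, 0 = failure)
probe_duration_seconds          # How long the probe took, in seconds

# DNS
probe_dns_lookup_time_seconds   # Time spent on DNS resolution
probe_ip_protocol               # IP protocol version used, 4 or 6
probe_ip_addr_hash              # Hash of the IP address, useful for detecting IP changes

# HTTP
probe_http_status_code          # HTTP response status code; after redirects, that of the final response
probe_http_content_length       # Length of the HTTP response body, in bytes
probe_http_version              # HTTP protocol version of the response, e.g. 1.1
probe_http_ssl                  # Whether the HTTP response used SSL (1 or 0)
probe_ssl_earliest_cert_expiry  # SSL certificate expiry time, as a Unix timestamp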
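
As a sketch of how these metrics can be read outside Prometheus, the helpers below pull a single metric value out of the text returned by /probe and convert a certificate expiry timestamp into days remaining. The function names metric_value and days_left are hypothetical, and the sample text is made up for illustration. awk does the arithmetic because the exporter may print large values in scientific notation, which shell integer arithmetic cannot parse.

```shell
#!/bin/sh
# metric_value NAME: read exposition text on stdin, print the value of NAME.
# Comment lines (# HELP, # TYPE) never match; metrics with labels are not handled.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# days_left EXPIRY NOW: days between two Unix timestamps, truncated to an integer.
days_left() {
  awk -v e="$1" -v n="$2" 'BEGIN { printf "%d\n", (e - n) / 86400 }'
}

# Made-up sample of what /probe might return for an HTTPS target.
sample='# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
probe_ssl_earliest_cert_expiry 1700864000'

printf '%s\n' "$sample" | metric_value probe_success
expiry=$(printf '%s\n' "$sample" | metric_value probe_ssl_earliest_cert_expiry)
days_left "$expiry" 1700000000
```

In real use the second argument of days_left would be the current time, for example $(date +%s).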

3.2 Alert rule configuration

Add alert rules for blackbox_exporter:

cat > prometheus/rules/blackbox_exporter.yml <<"EOF"
groups:
- name: Blackbox
  rules:
  - alert: Blackbox probe failed
    expr: probe_success == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Blackbox probe failed: {{ $labels.instance }}"
      description: "Blackbox probe failed, current value: {{ $value }}"
  - alert: Slow request
    expr: avg_over_time(probe_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Slow request: {{ $labels.instance }}"
      description: "Request took longer than 1 second, value: {{ $value }}"
  - alert: HTTP status code check failed
    expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "HTTP status code check failed: {{ $labels.instance }}"
      description: "HTTP status code is outside 200-399, current status code: {{ $value }}"
  - alert: SSL certificate expiring soon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Certificate expiring soon: {{ $labels.instance }}"
      description: "SSL certificate expires in less than 30 days, seconds remaining: {{ $value }}"

  - alert: SSL certificate expiring soon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 3
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Certificate expiring soon: {{ $labels.instance }}"
      description: "SSL certificate expires in less than 3 days, seconds remaining: {{ $value }}"

  - alert: SSL certificate expired
    expr: probe_ssl_earliest_cert_expiry - time() <= 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Certificate expired: {{ $labels.instance }}"
      description: "SSL certificate has already expired, please confirm whether it is still in use"
EOF

Check the configuration and reload it:

docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml

curl -X POST http://localhost:9090/-/reload

http://192.168.10.14:9090/rules

http://192.168.10.14:9090/alerts?search=
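
To sanity-check which status codes the HTTP status rule treats as failures, its condition (probe_http_status_code at or below 199, or at or above 400) can be mirrored in a few lines of shell. This only illustrates the comparison and is not how Prometheus evaluates rules; the function name http_alert_state is hypothetical.

```shell
#!/bin/sh
# http_alert_state CODE: print "firing" if CODE satisfies
# probe_http_status_code <= 199 or probe_http_status_code >= 400, else "ok".
http_alert_state() {
  if [ "$1" -le 199 ] || [ "$1" -ge 400 ]; then
    echo firing
  else
    echo ok
  fi
}

# Walk through a few representative codes.
for code in 0 200 301 404 503; do
  printf '%s %s\n' "$code" "$(http_alert_state "$code")"
done
```

A code of 0 is included on the assumption that a probe which gets no HTTP response reports 0, in which case this rule would match alongside the probe_success rule.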

4. Grafana dashboard visualization

https://grafana.com/grafana/dashboards/13659-blackbox-exporter-http-prober/

https://grafana.com/grafana/dashboards/9965

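Two panels, 檢測總耗時 (total probe duration) and HTTP狀態佔比 (HTTP status ratio), do not render correctly.

To fix the total probe duration panel, click Edit on the panel, find Options, and change the Legend value from {{env}}_{{name}} to {{instance}}.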
