Kafka Monitoring End to End: The Most Powerful Kafka Dashboard Yet, with Monitoring Configuration and Alert Rules

Posted by StarsL on 2024-11-03

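Monitoring multiple Kafka clusters with kafka_exporter

kafka_exporter project: https://github.com/danielqsj/kafka_exporter

  • Deploy several kafka_exporter instances with docker-compose, one exporter per Kafka cluster.
  • Note: list the address of every Kafka broker in the cluster; for Kafka 3 the version must be given explicitly with --kafka.version.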
version: '3.1'
services:
  kafka-exporter-opslogs:
    image: bitnami/kafka-exporter:latest
    command:
      - '--kafka.server=10.2.19.43:9092'
      - '--kafka.server=10.2.24.62:9092'
      - '--kafka.server=10.5.98.190:9092'
      - '--kafka.version=3.2.1'
    restart: always
    ports:
      - 9310:9308

  kafka-exporter-prod:
    image: bitnami/kafka-exporter:latest
    command:
      - '--kafka.server=192.168.53.99:9092'
      - '--kafka.server=192.168.53.53:9092'
      - '--kafka.server=192.168.53.96:9092'
    restart: always
    ports:
      - 9311:9308

Configuring a Prometheus job to scrape kafka-exporter

  • Note: every kafka-exporter target must carry a name label, because the dashboard relies on this label.
  - job_name: 'kafka-exporter'
    metrics_path: /metrics
    scrape_interval: 15s
    scrape_timeout: 10s
    static_configs:
    - targets:
      - 10.0.0.26:9310
      labels:
        name: kafka-opslogs
    - targets:
      - 10.0.0.26:9311
      labels:
        name: kafka-prod

KAFKA Grafana Dashboard

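[Chinese version] Updated 2024-05-16. Built on the Prometheus kafka_exporter: Kafka resource overview, troubleshooting, and fast lag analysis.
  • All panels use the latest panel styles with display performance optimized; compatible with Grafana 10.x.
  • Overall resource status of the Kafka cluster
  • Producer and consumer relationships
  • Detailed message lag information
  • Production and consumption rates
  • Abnormal consumers and topics
  • Partition-level lag and consumption details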

Screenshots

  • Global information, consumers and topics, anomaly and lag analysis
  • Partition-level details

Dashboard download

  • Grafana dashboard ID: 21078
  • Grafana dashboard URL: https://grafana.com/grafana/dashboards/21078
  • Project repository: https://github.com/starsliao/Prometheus/kafka

Prometheus alert rules

- name: kafka
  rules:
  - alert: KAFKA_brokers abnormal
    expr: kafka_broker_info != 1
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} broker currently abnormal: {{ $labels.address }}"

  - alert: E-commerce production Kafka overall message lag
    # Total lag per environment, consumer group, and topic.
    expr: sum(kafka_consumergroup_lag_sum{job="kafka-exporter"}) by (name,consumergroup, topic)>5000
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "[Environment] {{ $labels.name }}\n[Consumer group] {{ $labels.consumergroup }}\n[Topic] {{ $labels.topic }} [Lag]: {{ $value | printf \"%.2f\" }}"

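  - alert: E-commerce production Kafka partition message lag
    # hour() returns the UTC hour, so (hour()+8)%24 is the hour in UTC+8;
    # the rule can only fire when that hour is between 7 and 21 inclusive.
    expr: (sum(kafka_consumergroup_lag{job="kafka-exporter"}) by (name,consumergroup, topic, partition)>1500) AND ON() (hour()+8)%24 >= 7 <= 21
    for: 3m
    labels:
      severity: critical
    annotations:
      description: "[Environment] {{ $labels.name }}\n[Consumer group] {{ $labels.consumergroup }}\n[Topic] {{ $labels.topic }} [Partition] {{ $labels.partition }} [Lag]: {{ $value | printf \"%.2f\" }}"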

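  - alert: E-commerce production Kafka partition count too high
    # Internal topics (names starting with "__") are excluded from the count.
    expr: sum by(name)(kafka_topic_partitions{job="kafka-exporter",topic !~"__.*"})>1500
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} current partition count: {{ $value | printf \"%.2f\" }}"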

  - alert: E-commerce production KAFKA_brokers lost
    expr: kafka_brokers{job="kafka-exporter"} < 3
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} current broker count: {{ $value | printf \"%.2f\" }}"

  - alert: E-commerce production KAFKA_TopicsReplicas
    expr: sum(kafka_topic_partition_in_sync_replica{job="kafka-exporter"}) by (name,topic) <1
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} Kafka topic {{ $labels.topic }} in-sync replicas: {{ $value | printf \"%.2f\" }}"
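
The rule list above starts at the group level (- name: kafka), so it presumably has to be nested under a top-level groups: key in a rule file that prometheus.yml loads. A minimal sketch of that wiring follows. It assumes a hypothetical file name, kafka_rules.yml, and a placeholder alert name, KafkaBrokersLost; only the expression, duration, and severity are taken from the broker-count rule above.

```yaml
# prometheus.yml (fragment): tell Prometheus which rule files to load.
# The file name kafka_rules.yml is an assumption made for this sketch.
rule_files:
  - kafka_rules.yml
---
# kafka_rules.yml: rule groups sit under a top-level `groups:` key.
groups:
  - name: kafka
    rules:
      # One rule shown for illustration. The alert name is a placeholder;
      # the expression is the broker-count rule from the list above.
      - alert: KafkaBrokersLost
        expr: kafka_brokers{job="kafka-exporter"} < 3
        for: 2m
        labels:
          severity: critical
        annotations:
          description: "{{ $labels.name }} current broker count: {{ $value }}"
```

The two YAML documents are shown in one block only for brevity; in practice they are separate files. Before reloading Prometheus, the rule file can be validated with promtool check rules kafka_rules.yml.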
