Using Logstash to read data from Kafka and write it to Elasticsearch (qbit)

Published by qbit on 2022-02-03

Tech stack

OS: Ubuntu 20.04 LTS
docker: 20.10.12
docker-compose: 1.25.0
Elasticsearch: 7.16.3
Logstash: 7.16.3
kafka: 2.13-2.8.1
Python: 3.8.2
kafka-python: 2.0.2

Setting up Logstash with Docker

Official documentation

Configuration steps

  • Pull the image
docker pull docker.elastic.co/logstash/logstash:7.16.3
  • Logstash config file /home/qbit/logstash/settings/logstash.yml
http.host: "0.0.0.0"
xpack.monitoring.elasticsearch.hosts: [ "http://192.168.1.46:9200" ]
  • Pipeline config file /home/qbit/logstash/pipeline/es-pipeline.conf (mounted into the container as /usr/share/logstash/pipeline/es-pipeline.conf)
input {
    kafka {
        codec => json
        bootstrap_servers => "192.168.1.46:9092"
        topics => ["coder_topic"]
    }
}

filter {
    mutate {
        add_field => { "timestamp" => "%{@timestamp}" }
        remove_field => ["@version"]
    }
    date {
        match => [ "timestamp", "ISO8601" ]     # 這裡用 @timestamp 解析會出錯
        target => "time0"
    }
    ruby {
        code => "
            # time1: the Logstash Timestamp object converted to +08:00
            time1 = event.get('@timestamp').time.getlocal('+08:00').strftime('%Y-%m-%dT%H:%M:%S+08')
            # time2: the string copy added by the mutate filter, parsed and converted
            time2 = Time.parse(event.get('timestamp')).getlocal('+08:00').strftime('%Y-%m-%dT%H:%M:%S+08')
            # time3: the wall-clock time when the event is processed
            time3 = Time.now.getlocal('+08:00').strftime('%Y-%m-%dT%H:%M:%S+08')
            event.set('time1', time1)
            event.set('time2', time2)
            event.set('time3', time3)
        "
    }
}

output {
    stdout {
        codec => json_lines
    }
    elasticsearch {
        hosts => ["192.168.1.46:9200"]
        index => "coder_index"
        document_id => "%{id}"
    }
}
  • Create the container (an equivalent docker-compose sketch follows after the command)
docker run --rm -it --name logstash \
-v /home/qbit/logstash/pipeline/:/usr/share/logstash/pipeline/ \
-v /home/qbit/logstash/settings/logstash.yml:/usr/share/logstash/config/logstash.yml \
docker.elastic.co/logstash/logstash:7.16.3
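The tech stack above also lists docker-compose; for reference, a minimal docker-compose.yml that mirrors the docker run command above might look like the following (a sketch assuming the same host paths, not part of the original article):

version: "3"
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:7.16.3
    container_name: logstash
    volumes:
      - /home/qbit/logstash/pipeline/:/usr/share/logstash/pipeline/
      - /home/qbit/logstash/settings/logstash.yml:/usr/share/logstash/config/logstash.yml

It can then be started with docker-compose up from the directory containing the file.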

Sending messages with Python

  • producer.py
# encoding: utf-8
# author: qbit
# date: 2022-01-28
# summary: send messages to Kafka

import json
from kafka import KafkaProducer

def producer():
    producer = KafkaProducer(
        bootstrap_servers="192.168.1.46:9092",
        key_serializer=lambda k: json.dumps(k).encode('utf8'),
        value_serializer=lambda v: json.dumps(v).encode('utf8'),
    )
    id = 'qbit'
    dic = {'id': f"{id}", 'age': '23'}
    producer.send(topic="coder_topic", key=id, value=dic)
    producer.flush()  # send() is asynchronous in kafka-python; flush so the message goes out before the script exits
    print(f"send key: {id}, value: {dic}")

if __name__ == "__main__":
    producer()
  • Run result (a verification consumer sketch follows below)
# python3 producer.py
send key: qbit, value: {'id': 'qbit', 'age': '23'}
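To double-check that the message actually reached coder_topic, a minimal kafka-python consumer can be used. This is an illustrative sketch, not part of the original article; the broker address and topic are taken from the producer above.

# encoding: utf-8
# summary: minimal verification consumer (illustrative sketch)

import json
from kafka import KafkaConsumer

def consume():
    consumer = KafkaConsumer(
        "coder_topic",
        bootstrap_servers="192.168.1.46:9092",
        auto_offset_reset="earliest",      # also read messages produced before this consumer started
        consumer_timeout_ms=5000,          # stop iterating after 5 s without new messages
        key_deserializer=lambda k: json.loads(k.decode('utf8')),
        value_deserializer=lambda v: json.loads(v.decode('utf8')),
    )
    for msg in consumer:
        print(f"key: {msg.key}, value: {msg.value}")

if __name__ == "__main__":
    consume()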

Viewing the data in ES with Kibana

GET coder_index/_search
{
    "_index": "coder_index",
    "_type": "_doc",
    "_id": "qbit",
    "_score": 1.0,
    "_source": {
        "id": "qbit",
        "age": "23",
        "@timestamp": "2022-01-28T01:03:40.733Z",   // logstash event 時間戳
        "timestamp":  "2022-01-28T01:03:40.733Z",
        "time0":      "2022-01-28T01:03:40.733Z",
        "time1":      "2022-01-28T09:03:40+08",
        "time2":      "2022-01-28T09:03:40+08",
        "time3":      "2022-01-28T09:03:40+08"      // filter 中 ruby 程式碼生成的時間戳
    }
}
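time1, time2, and time3 are simply @timestamp rendered in the +08:00 zone (time3 uses Time.now, so it only matches the others because the event was processed immediately after being produced). The same conversion can be checked in Python; a small illustrative sketch, not part of the original pipeline:

from datetime import datetime, timedelta, timezone

# @timestamp as stored by Logstash is UTC; Python 3.8's fromisoformat()
# does not accept a trailing "Z", so the offset is spelled out explicitly.
utc_ts = datetime.fromisoformat("2022-01-28T01:03:40.733+00:00")
local = utc_ts.astimezone(timezone(timedelta(hours=8)))
print(local.strftime("%Y-%m-%dT%H:%M:%S+08"))   # 2022-01-28T09:03:40+08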

Writing messages to AWS S3

  • Pipeline output configuration
output {
    stdout {
        codec => json_lines
    }
    elasticsearch {
        hosts => ["192.168.1.46:9200"]
        index => "coder_index"
        document_id => "%{id}"
    }
    s3 {
        id => "kafka_logstash_s3"
        access_key_id => "your_access_key_id"
        secret_access_key => "your_secret_access_key"
        region => "cn-northwest-1"
        bucket => "my_bucket"
        prefix => "logstash/%{+YYYY-MM-dd}"
        time_file => 1                      # unit: minutes
        codec => "json_lines"
    }
}
  • Output file name format; note that the time in the file names is in UTC (a listing sketch follows below)
logstash/2022-01-28/ls.s3.9be50c52-8f29-437c-84c5-76911ca4d9c5.2022-01-28T08.21.part0.txt
logstash/2022-01-28/ls.s3.ef85ef47-8caf-43a8-b720-0abe9fc6a5ae.2022-01-28T08.22.part1.txt
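To confirm that the part files are being uploaded, the bucket can be listed by prefix. The sketch below uses boto3, which is not used in the original article; the bucket name and prefix are the values from the configuration above, and credentials/region access are assumed to be configured in the environment.

import boto3

# List the objects Logstash wrote under a given day's prefix (illustrative sketch).
s3 = boto3.client("s3", region_name="cn-northwest-1")
resp = s3.list_objects_v2(Bucket="my_bucket", Prefix="logstash/2022-01-28/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])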
This article is from qbit snap
