Using Flume to dump Kafka data into HDFS

Posted by XIAO_WS on 2018-12-19

Flume 1.8: Kafka channel + HDFS sink (without sources)

To dump data from Kafka into HDFS for offline computation, Flume already provides everything we need: add a configuration file and start flume-ng directly.

The Kafka channel can be used for multiple scenarios:

  1. With Flume source and sink - it provides a reliable and highly available channel for events
  2. With Flume source and interceptor but no sink - it allows writing Flume events into a Kafka topic, for use by other apps
  3. With Flume sink, but no source - it is a low-latency, fault tolerant way to send events from Kafka to Flume sinks such as HDFS, HBase or Solr
This post uses scenario 3:

  • $FLUME_HOME/conf/kafka-hdfs.conf
# Kafka channel + HDFS sink (without sources)
a1.channels = c1
a1.sinks = k1

# Define the Kafka channel
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.parseAsFlumeEvent = false
a1.channels.c1.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092,kafka-3:9092
a1.channels.c1.kafka.topic = user
a1.channels.c1.kafka.consumer.group.id = g1

# Define the HDFS sink
a1.sinks.k1.channel = c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop-1:9000/flume/%Y%m%d/%H
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.filePrefix = log
a1.sinks.k1.hdfs.fileType = DataStream
# Do not roll files based on the number of events
a1.sinks.k1.hdfs.rollCount = 0
# Roll a new file on HDFS once the file reaches 128 MB
a1.sinks.k1.hdfs.rollSize = 134217728
# Roll a new file on HDFS every 10 minutes
a1.sinks.k1.hdfs.rollInterval = 600
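Before starting the agent, make sure the topic the channel reads from actually exists. A minimal sketch of creating it with the Kafka 2.0 CLI; the ZooKeeper address and the partition/replication counts below are assumptions for illustration:

# Create the "user" topic consumed by the Kafka channel
# (address and counts are placeholders, adjust to your cluster)
$KAFKA_HOME/bin/kafka-topics.sh --create \
  --zookeeper kafka-1:2181 \
  --replication-factor 3 --partitions 3 \
  --topic user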

Remember to configure /etc/hosts so that the hostnames above resolve.
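For example, a minimal /etc/hosts sketch for every machine running Flume; the IP addresses are placeholders only:

# /etc/hosts (example addresses, substitute your own)
192.168.1.11  kafka-1
192.168.1.12  kafka-2
192.168.1.13  kafka-3
192.168.1.21  hadoop-1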

  • Add the HDFS-related jar packages and configuration files, as copied in the sketch after this list:
commons-configuration-1.6.jar
commons-io-2.4.jar
hadoop-auth-2.8.3.jar
hadoop-common-2.8.3.jar
hadoop-hdfs-2.8.3.jar
hadoop-hdfs-client-2.8.3.jar
htrace-core4-4.0.1-incubating.jar
core-site.xml
hdfs-site.xml
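A sketch of pulling these out of an existing Hadoop 2.8.3 installation; $HADOOP_HOME and the share/hadoop layout are assumptions, adjust to your install:

# Locate and copy the HDFS-related jars into Flume's classpath
# (exact subdirectories vary between Hadoop layouts)
for j in commons-configuration-1.6 commons-io-2.4 hadoop-auth-2.8.3 \
         hadoop-common-2.8.3 hadoop-hdfs-2.8.3 hadoop-hdfs-client-2.8.3 \
         htrace-core4-4.0.1-incubating; do
  find $HADOOP_HOME/share/hadoop -name "$j.jar" -exec cp {} $FLUME_HOME/lib/ \;
done
# Copy the cluster configuration so the HDFS sink can reach the NameNode
cp $HADOOP_HOME/etc/hadoop/core-site.xml $FLUME_HOME/conf/
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $FLUME_HOME/conf/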
  • Flume 1.8's bundled Kafka client defaults to version 0.9, which is nominally forward compatible (don't rely on this, it is a huge pit ~_~#). Replace it with kafka-clients-2.0.0.jar and kafka_2.11-2.0.0.jar.
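A sketch of the swap, assuming a Kafka 2.0.0 distribution at $KAFKA_HOME; the exact version of the bundled 0.9 jars may differ on your install:

# Remove the Kafka 0.9 client jars that ship with Flume 1.8
# (wildcards because the bundled patch version may vary)
rm $FLUME_HOME/lib/kafka-clients-0.9*.jar $FLUME_HOME/lib/kafka_2.*-0.9*.jar
# Drop in the 2.0.0 client jars from the Kafka distribution
cp $KAFKA_HOME/libs/kafka-clients-2.0.0.jar $FLUME_HOME/lib/
cp $KAFKA_HOME/libs/kafka_2.11-2.0.0.jar $FLUME_HOME/lib/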

  • Start ZooKeeper, Kafka, and HDFS first (otherwise you will get all kinds of errors).
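The exact start commands depend on how each service is installed; a minimal sketch assuming a standalone ZooKeeper and standard Kafka/Hadoop layouts ($ZOOKEEPER_HOME, $KAFKA_HOME, and $HADOOP_HOME are assumptions):

# Start ZooKeeper, then Kafka on each broker, then HDFS
$ZOOKEEPER_HOME/bin/zkServer.sh start
$KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
$HADOOP_HOME/sbin/start-dfs.sh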

  • Enter $FLUME_HOME and start Flume:
root@common:/usr/local/flume# ./bin/flume-ng agent -c conf/ -f conf/kafka-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
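To verify the pipeline end to end, push a test message into the topic and check that data shows up under the time-partitioned HDFS path; a sketch using the Kafka 2.0 console producer:

# Produce one test event into the "user" topic
echo "hello flume" | $KAFKA_HOME/bin/kafka-console-producer.sh \
  --broker-list kafka-1:9092,kafka-2:9092,kafka-3:9092 --topic user
# Once rollInterval or rollSize triggers, a log file appears under /flume/<yyyyMMdd>/<HH>
hdfs dfs -ls -R /flume/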
