Flume + Kafka + Spark Streaming Analysis

Published by weixin_34279579 on 2017-06-04

1. Flume installation

First, we use a network port as the data source and send the data to both HDFS and the cmcc topic in Kafka. The Flume configuration file is:

a1.sources = r1  
a1.sinks = k1 k2 
a1.channels = c1 c2  
  
# Describe/configure the source  
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
  
# Describe the sink  
#a1.sinks.k1.type = logger  
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink  
a1.sinks.k1.topic = cmcc  
a1.sinks.k1.brokerList = hostname:9092  
a1.sinks.k1.requiredAcks = 1  
a1.sinks.k1.batchSize = 20  
  
# Use a channel which buffers events in memory  
a1.channels.c1.type = memory  
a1.channels.c1.capacity = 1000  
a1.channels.c1.transactionCapacity = 100  

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Describe the HDFS sink
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.useLocalTimeStamp = true
a1.sinks.k2.hdfs.path = hdfs://hostname:9000/flume/events/%Y/%m/%d/%H/%M
a1.sinks.k2.hdfs.filePrefix = cmcc
a1.sinks.k2.hdfs.minBlockReplicas = 1
a1.sinks.k2.hdfs.fileType = DataStream
a1.sinks.k2.hdfs.writeFormat = Text
# roll files on the 60 s interval only (disable size- and event-count-based rolling)
a1.sinks.k2.hdfs.rollInterval = 60
a1.sinks.k2.hdfs.rollSize = 0
a1.sinks.k2.hdfs.rollCount = 0
# Bind the source and sinks to the channels  
a1.sources.r1.channels = c1 c2  
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

The line a1.sources.r1.channels = c1 c2 means that the source data is a single stream, but it is fanned out to two different channels.
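This fan-out works because a Flume source copies every event to all of its channels by default (the replicating channel selector). A minimal sketch of making that explicit in the same configuration (this selector line is not in the original post) would be:

# explicitly use the replicating selector so every event is copied to both channels
a1.sources.r1.selector.type = replicating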

2. Kafka installation
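The original post does not show the Kafka setup itself. As a rough sketch, assuming a single-node, ZooKeeper-based Kafka installation on hostname (matching the brokerList used in the Flume sink above), starting the services and creating the cmcc topic might look like:

# start ZooKeeper and the Kafka broker in the background
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties

# create the cmcc topic that the Flume Kafka sink writes to
bin/kafka-topics.sh --create --zookeeper hostname:2181 \
  --replication-factor 1 --partitions 1 --topic cmcc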

3. Test whether data sent by Flume is received by HDFS and Kafka

Run Flume with the following command:

bin/flume-ng agent --conf ./conf/ -f conf/flume-conf.properties -n a1
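For interactive testing it can help to run the same command with console logging enabled (a standard flume-ng option, not shown in the original post):

bin/flume-ng agent --conf ./conf/ -f conf/flume-conf.properties -n a1 -Dflume.root.logger=INFO,console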

Then, on Linux, run telnet localhost 44444 and type some data to send:

(Screenshot in the original post: the telnet session sending test data.)

Then check HDFS:

(Screenshot in the original post: files written by the HDFS sink.)

Check the Kafka data:

(Screenshot in the original post: messages arriving in the cmcc topic.)
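As a command-line alternative, a console consumer (old-style --zookeeper flag, matching the Kafka version assumed above) can watch the cmcc topic:

bin/kafka-console-consumer.sh --zookeeper hostname:2181 --topic cmcc --from-beginning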
