Flume + Kafka + Spark Streaming Analysis
1. Install Flume
First, we use netcat port data as the source and send the data both to HDFS and to the Kafka topic cmcc. The Flume configuration file is as follows:
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the Kafka sink
#a1.sinks.k1.type = logger
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = cmcc
a1.sinks.k1.brokerList = hostname:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
# Describe the HDFS sink
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.useLocalTimeStamp = true
a1.sinks.k2.hdfs.path = hdfs://hostname:9000/flume/events/%Y/%m/%d/%H/%M
a1.sinks.k2.hdfs.filePrefix = cmcc
a1.sinks.k2.hdfs.minBlockReplicas = 1
a1.sinks.k2.hdfs.fileType = DataStream
a1.sinks.k2.hdfs.writeFormat = Text
# Roll a new file every 60 seconds; disable size-based rolling
a1.sinks.k2.hdfs.rollInterval = 60
a1.sinks.k2.hdfs.rollSize = 0
# Also disable event-count-based rolling (the default rollCount of 10 would otherwise roll files early)
a1.sinks.k2.hdfs.rollCount = 0
# Bind the source and sinks to the channels
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
The line a1.sources.r1.channels = c1 c2 means the source is a single data stream fanned out to two different channels; Flume's default replicating channel selector copies every event to both c1 and c2.
2. Install Kafka
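The original post does not list the Kafka setup commands here. As a minimal sketch, assuming a standard Kafka distribution whose ZooKeeper runs on hostname:2181 (alongside the broker at hostname:9092 referenced in brokerList above), the cmcc topic can be created and verified like this:

# Start ZooKeeper and the Kafka broker from the Kafka installation directory
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties
# Create the cmcc topic that the Flume sink k1 publishes to
bin/kafka-topics.sh --create --zookeeper hostname:2181 --replication-factor 1 --partitions 1 --topic cmcc
# Confirm the topic exists
bin/kafka-topics.sh --list --zookeeper hostname:2181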
3. Test whether Flume sends data and whether HDFS and Kafka receive it
Start the Flume agent with the following command:
bin/flume-ng agent --conf ./conf/ -f conf/flume-conf.properties -n a1
Then, in a Linux shell, run telnet localhost 44444 and type some lines to send data:
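A quick smoke test might look like the following; the message text is arbitrary, and the netcat source acknowledges each received line with OK:

telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.
hello cmcc
OK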
Then check HDFS:
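For example, listing the sink's output directory recursively shows the cmcc-prefixed files; the exact subdirectory depends on the current timestamp, since the sink path uses %Y/%m/%d/%H/%M:

hdfs dfs -ls -R hdfs://hostname:9000/flume/events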
Check the Kafka data:
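To verify the Kafka side, a console consumer from the same Kafka era as the brokerList-style sink configuration above can read the topic from the beginning; each line typed in the telnet session should appear in its output:

bin/kafka-console-consumer.sh --zookeeper hostname:2181 --topic cmcc --from-beginning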