Installing and deploying the new Flume + Kafka + Storm stack

Posted by 停不下的腳步 on 2015-08-27
I wrote a flume+kafka+storm article last year; revisiting it recently, I found quite a lot has changed, so I am reworking that article here. I hope it helps. I won't repeat the architecture diagram or the component introductions; this article focuses on installation and deployment. If you need the source code, please leave a comment.



Versions used:
zookeeper 3.4.6
flume-ng 1.6
kafka 2.10-0.8.2
storm 0.9.5


Installing ZooKeeper
1. Download the latest ZooKeeper release.
2. Edit the ZooKeeper configuration in $zookeeper_home/conf:
$ cp zoo_sample.cfg zoo_sample.cfg.bak
$ mv zoo_sample.cfg zoo.cfg

Change the directory where zoo.cfg tells ZooKeeper to store its data:
create a tmp directory under $zookeeper_home, then
$ vi zoo.cfg
and change dataDir=/tmp/zookeeper to the directory you just created.
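For example, assuming ZooKeeper is installed at /Users/luobao/study/zookeeper-3.4.6 (the path that appears in the startup log below; use your own), the edited line would be:

# zoo.cfg - dataDir moved out of /tmp (the path is an assumption for this machine)
dataDir=/Users/luobao/study/zookeeper-3.4.6/tmp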
3. Verify that ZooKeeper starts.
In $zookeeper_home/bin, run:
mylover:bin luobao$ sh zkServer.sh start
Output like the following indicates success:
JMX enabled by default
Using config: /Users/luobao/study/zookeeper-3.4.6/bin/../conf/zoo.cfg
-n Starting zookeeper ...
STARTED
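You can also ask the server for its status afterwards; on a single node the key line of output should be the mode (exact output varies a little by environment):

mylover:bin luobao$ sh zkServer.sh status
Mode: standalone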

Installing Storm
1. Download the latest Storm release.
2. Extract the archive and add Storm to your environment variables; if you need to point Storm at ZooKeeper explicitly, see the storm.yaml sketch at the end of this section.
3. Verify that Storm starts.
Note: ZooKeeper must be running before you start Storm.
Start, in order:
$storm nimbus
$storm supervisor
$storm ui
Then open http://localhost:8080 in a browser; if the Storm UI page loads, the startup succeeded.
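On a single machine the defaults usually just work, but for reference, a minimal $storm_home/conf/storm.yaml for this setup might look like the sketch below; the storm.local.dir path is an assumption, the rest simply points at the local ZooKeeper:

storm.zookeeper.servers:
    - "localhost"
nimbus.host: "localhost"
storm.local.dir: "/Users/luobao/study/storm-local"    # assumed scratch directory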


Installing Kafka
1. Download the Kafka build matching your Scala version.
2. Start and verify Kafka.
Startup and test commands:
The startup steps below are copied from the official Kafka site; I previously used Kafka 0.8.0 and found that the commands all differ from 0.8.2's.

Step 1: Download the code 
Download the 0.8.2.0 release and un-tar it.
> tar -xzf kafka_2.10-0.8.2.0.tgz
> cd kafka_2.10-0.8.2.0

Step 2: Start the server

Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.

> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...
Now start the Kafka server:
> bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...

Step 3: Create a topic

Let's create a topic named "test" with a single partition and only one replica:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
We can now see that topic if we run the list topic command:
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test
Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.
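If you prefer auto-creation, the corresponding broker setting in config/server.properties is:

auto.create.topics.enable=true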

Step 4: Send some messages

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default each line will be sent as a separate message.

Run the producer and then type a few messages into the console to send to the server.

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test 
This is a message
This is another message

Step 5: Start a consumer

Kafka also has a command line consumer that will dump out messages to standard output.
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
This is a message
This is another message

Storm and Kafka are now ready; let's wire them together.
Integrating Kafka and Storm
Import the project into Eclipse as a Maven project; once all the dependencies have downloaded, we can write our own topology.
I wrote three topologies for reference; a sketch of the general shape follows below.
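For orientation, here is a minimal sketch of what such a topology can look like, built on the storm-kafka 0.9.x spout. It is not the original MykafkaTopology source: the PrintBolt class, the /kafka-spout zkRoot and the consumer id are illustrative assumptions, while the topic test and the ZooKeeper address match the setup above.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class MyKafkaTopology {

    // Example terminal bolt: prints every line the Kafka spout emits.
    public static class PrintBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            System.out.println(tuple.getString(0));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // emits nothing downstream
        }
    }

    public static void main(String[] args) throws Exception {
        // Read the "test" topic through the ZooKeeper instance started earlier.
        SpoutConfig spoutConf = new SpoutConfig(
                new ZkHosts("localhost:2181"), "test", "/kafka-spout", "print-consumer");
        spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConf), 1);
        builder.setBolt("print-bolt", new PrintBolt(), 1).shuffleGrouping("kafka-spout");

        // Local mode is enough for debugging; submit to a real cluster for production.
        new LocalCluster().submitTopology("kafka-test", new Config(), builder.createTopology());
    }
}

With the console producer from Step 4 still running, each line you type there should be printed by PrintBolt.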

After running MykafkaTopology, go back to the Kafka producer terminal from earlier and type a few test words; you will see the processing logs in the console. Of course, stepping through the program in the debugger is still the best way to learn.
Only the combination of spouts and bolts implements a complete business flow; refer to the architecture diagram from my earlier article and design your own topology.

Kafka + Storm covers most day-to-day workloads, but let me also write up the Kafka and Flume integration: Flume collects the data, while Kafka acts as buffer and transport.
Integrating Kafka and Flume

1. Download the flume-kafka-plus plugin.
2. Take the flume-conf.properties file from the plugin and edit its #source section:
producer.sources.s.type = exec
producer.sources.s.command = tail -f -n+1 /Users/luobao/study/test.log
producer.sources.s.channels = c
Change every topic value in the file to test (a sketch of the assembled file follows this step), then put the edited file into flume/conf.
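Assembled, the agent definition comes out roughly like the sketch below. The source lines are from above and the channel settings are standard Flume; the sink's broker and topic property names are assumptions based on the plugin's sample configuration, so check them against your copy.

# agent "producer" with one exec source, one memory channel, one kafka sink
producer.sources = s
producer.channels = c
producer.sinks = r

# source section
producer.sources.s.type = exec
producer.sources.s.command = tail -f -n+1 /Users/luobao/study/test.log
producer.sources.s.channels = c

# standard in-memory channel
producer.channels.c.type = memory
producer.channels.c.capacity = 1000

# kafka sink from flume-kafka-plus (topic values changed to test)
producer.sinks.r.type = org.apache.flume.plugins.KafkaSink
producer.sinks.r.channel = c
producer.sinks.r.metadata.broker.list = localhost:9092
producer.sinks.r.custom.topic.name = test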
3. Copy flume-kafka-plus/package/flume-kafka-plugins.jar into Flume's lib directory.
Start Flume:
$ bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name producer
Now let's write some characters into /Users/luobao/study/test.log.
Here is a simple script that appends the current date to test.log every three seconds:
while true
do
    echo $(date +"%y-%m-%d %H:%M:%S") >> /Users/luobao/study/test.log
    sleep 3
done
Check Flume's log directory to see the messages Flume has received, and at the same time watch in the debugger whether Storm reads them.

The debugger shows that Storm is steadily processing the collected data.
Note: while browsing Flume's lib directory I noticed that Flume itself ships with Kafka support; my guess is that it is enough to find the right JAR and the classes behind the two settings below:
producer.sinks.r.type = org.apache.flume.plugins.KafkaSink
producer.sinks.r.partitioner.class=org.apache.flume.plugins.SinglePartition
I'll leave that for another time.
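For what it's worth, the sink class bundled with Flume 1.6 is org.apache.flume.sink.kafka.KafkaSink. Based on the 1.6 user guide, its sink section would look roughly like this untested sketch; verify the property names against your Flume version:

producer.sinks.r.type = org.apache.flume.sink.kafka.KafkaSink
producer.sinks.r.brokerList = localhost:9092
producer.sinks.r.topic = test
producer.sinks.r.channel = c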
To sum up, the startup order is: zookeeper - kafka - storm - flume.
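Collecting the commands from this article in that order (run each from the component's own directory):

# zookeeper (from $zookeeper_home/bin)
$ sh zkServer.sh start
# kafka (uses the ZooKeeper started above)
$ bin/kafka-server-start.sh config/server.properties
# storm
$ storm nimbus
$ storm supervisor
$ storm ui
# flume
$ bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name producer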