Installing and deploying the new Flume + Kafka + Storm stack

Posted by 停不下的腳步 on 2015-08-27
I wrote a flume+kafka+storm article last year; revisiting it recently, I found quite a lot has changed, so I am reworking that article here. I hope it helps. I won't repeat the architecture diagram or the component introductions; this article focuses on installation and deployment. If you need the source code, please leave a comment.



Versions used:
zookeeper 3.4.6
flume-ng 1.6
kafka 2.10-0.8.2
storm 0.9.5


Installing ZooKeeper
1. Download the latest ZooKeeper release.
2. Edit the ZooKeeper configuration in $zookeeper_home/conf:
$ cp zoo_sample.cfg zoo_sample.cfg.bak
$ mv zoo_sample.cfg zoo.cfg

Change the directory where zoo.cfg tells ZooKeeper to store its data:
create a tmp directory under $zookeeper_home, then
$ vi zoo.cfg
and change dataDir=/tmp/zookeeper to the directory you just created.
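For example, assuming ZooKeeper is installed at /Users/luobao/study/zookeeper-3.4.6 (the path that appears in the startup log below; use your own), the edited line would be:

# zoo.cfg - dataDir moved out of /tmp (the path is an assumption for this machine)
dataDir=/Users/luobao/study/zookeeper-3.4.6/tmp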
3. Verify that ZooKeeper starts.
In $zookeeper_home/bin, run:
mylover:bin luobao$ sh zkServer.sh start
Output like the following indicates success:
JMX enabled by default
Using config: /Users/luobao/study/zookeeper-3.4.6/bin/../conf/zoo.cfg
-n Starting zookeeper ...
STARTED
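You can also ask the server for its status afterwards; on a single node the key line of output should be the mode (exact output varies a little by environment):

mylover:bin luobao$ sh zkServer.sh status
Mode: standalone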

Installing Storm
1. Download the latest Storm release.
2. Extract the archive and add Storm to your environment variables; if you need to point Storm at ZooKeeper explicitly, see the storm.yaml sketch at the end of this section.
3. Verify that Storm starts.
Note: ZooKeeper must be running before you start Storm.
Start, in order:
$storm nimbus
$storm supervisor
$storm ui
Then open http://localhost:8080 in a browser; if the Storm UI page loads, the startup succeeded.
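On a single machine the defaults usually just work, but for reference, a minimal $storm_home/conf/storm.yaml for this setup might look like the sketch below; the storm.local.dir path is an assumption, the rest simply points at the local ZooKeeper:

storm.zookeeper.servers:
    - "localhost"
nimbus.host: "localhost"
storm.local.dir: "/Users/luobao/study/storm-local"    # assumed scratch directory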


Installing Kafka
1. Download the Kafka build matching your Scala version.
2. Start and verify Kafka.
Startup and test commands:
The startup steps below are copied from the official Kafka site; I previously used Kafka 0.8.0 and found that the commands all differ from 0.8.2's.

Step 1: Download the code 
Download the 0.8.2.0 release and un-tar it.
> tar -xzf kafka_2.10-0.8.2.0.tgz
> cd kafka_2.10-0.8.2.0

Step 2: Start the server

Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.

> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...
Now start the Kafka server:
> bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...

Step 3: Create a topic

Let's create a topic named "test" with a single partition and only one replica:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
We can now see that topic if we run the list topic command:
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test
Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.
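If you prefer auto-creation, the corresponding broker setting in config/server.properties is:

auto.create.topics.enable=true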

Step 4: Send some messages

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default each line will be sent as a separate message.

Run the producer and then type a few messages into the console to send to the server.

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test 
This is a message
This is another message

Step 5: Start a consumer

Kafka also has a command line consumer that will dump out messages to standard output.
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
This is a message
This is another message

Storm and Kafka are now ready; let's wire them together.
Integrating Kafka and Storm
Import the project into Eclipse as a Maven project; once all the dependencies have downloaded, we can write our own topology.
I wrote three topologies for reference; a sketch of the general shape follows below.
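For orientation, here is a minimal sketch of what such a topology can look like, built on the storm-kafka 0.9.x spout. It is not the original MykafkaTopology source: the PrintBolt class, the /kafka-spout zkRoot and the consumer id are illustrative assumptions, while the topic test and the ZooKeeper address match the setup above.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class MyKafkaTopology {

    // Example terminal bolt: prints every line the Kafka spout emits.
    public static class PrintBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            System.out.println(tuple.getString(0));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // emits nothing downstream
        }
    }

    public static void main(String[] args) throws Exception {
        // Read the "test" topic through the ZooKeeper instance started earlier.
        SpoutConfig spoutConf = new SpoutConfig(
                new ZkHosts("localhost:2181"), "test", "/kafka-spout", "print-consumer");
        spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConf), 1);
        builder.setBolt("print-bolt", new PrintBolt(), 1).shuffleGrouping("kafka-spout");

        // Local mode is enough for debugging; submit to a real cluster for production.
        new LocalCluster().submitTopology("kafka-test", new Config(), builder.createTopology());
    }
}

With the console producer from Step 4 still running, each line you type there should be printed by PrintBolt.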

After running MykafkaTopology, go back to the Kafka producer terminal from earlier and type a few test words; you will see the processing logs in the console. Of course, stepping through the program in the debugger is still the best way to learn.
Only the combination of spouts and bolts implements a complete business flow; refer to the architecture diagram from my earlier article and design your own topology.

Kafka + Storm covers most day-to-day workloads, but let me also write up the Kafka and Flume integration: Flume collects the data, while Kafka acts as buffer and transport.
Integrating Kafka and Flume

1. Download the flume-kafka-plus plugin.
2. Take the flume-conf.properties file from the plugin and edit its #source section:
producer.sources.s.type = exec
producer.sources.s.command = tail -f -n+1 /Users/luobao/study/test.log
producer.sources.s.channels = c
Change every topic value in the file to test (a sketch of the assembled file follows this step), then put the edited file into flume/conf.
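Assembled, the agent definition comes out roughly like the sketch below. The source lines are from above and the channel settings are standard Flume; the sink's broker and topic property names are assumptions based on the plugin's sample configuration, so check them against your copy.

# agent "producer" with one exec source, one memory channel, one kafka sink
producer.sources = s
producer.channels = c
producer.sinks = r

# source section
producer.sources.s.type = exec
producer.sources.s.command = tail -f -n+1 /Users/luobao/study/test.log
producer.sources.s.channels = c

# standard in-memory channel
producer.channels.c.type = memory
producer.channels.c.capacity = 1000

# kafka sink from flume-kafka-plus (topic values changed to test)
producer.sinks.r.type = org.apache.flume.plugins.KafkaSink
producer.sinks.r.channel = c
producer.sinks.r.metadata.broker.list = localhost:9092
producer.sinks.r.custom.topic.name = test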
3. Copy flume-kafka-plus/package/flume-kafka-plugins.jar into Flume's lib directory.
Start Flume:
$ bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name producer
Now let's write some characters into /Users/luobao/study/test.log.
Here is a simple script that appends the current date to test.log every three seconds:
while true
do
    echo $(date +"%y-%m-%d %H:%M:%S") >> /Users/luobao/study/test.log
    sleep 3
done
Check Flume's log directory to see the messages Flume has received, and at the same time watch in the debugger whether Storm reads them.

The debugger shows that Storm is steadily processing the collected data.
Note: while browsing Flume's lib directory I noticed that Flume itself ships with Kafka support; my guess is that it is enough to find the right JAR and the classes behind the two settings below:
producer.sinks.r.type = org.apache.flume.plugins.KafkaSink
producer.sinks.r.partitioner.class=org.apache.flume.plugins.SinglePartition
I'll leave that for another time.
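For what it's worth, the sink class bundled with Flume 1.6 is org.apache.flume.sink.kafka.KafkaSink. Based on the 1.6 user guide, its sink section would look roughly like this untested sketch; verify the property names against your Flume version:

producer.sinks.r.type = org.apache.flume.sink.kafka.KafkaSink
producer.sinks.r.brokerList = localhost:9092
producer.sinks.r.topic = test
producer.sinks.r.channel = c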
To sum up, the startup order is: zookeeper - kafka - storm - flume.
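Collecting the commands from this article in that order (run each from the component's own directory):

# zookeeper (from $zookeeper_home/bin)
$ sh zkServer.sh start
# kafka (uses the ZooKeeper started above)
$ bin/kafka-server-start.sh config/server.properties
# storm
$ storm nimbus
$ storm supervisor
$ storm ui
# flume
$ bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name producer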