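This article describes in detail how to set up a Twitter Storm cluster on Arch Linux. Please credit the source when reposting, thanks.
For the base system installation, see the article "Arch Linux Concise Installation Guide" (archlinux簡明安裝指南). Building on that, the following explains step by step how to install a Twitter Storm cluster.
The main installation steps are: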
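- Install the Oracle JDK
- Install the required build tools: gcc, g++, make
- Install python2.7 and unzip
- Build and install ZeroMQ
- Build and install jzmq
- Download lein
- Download storm-starter
- Download the Storm release
- Install ZooKeeper
- Install supervisord to start the Storm cluster automatically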
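Install the Oracle JDK
The standard Java on Linux is OpenJDK. To install Oracle's JDK you would normally have to download the installer from the official site. One of the pleasures of Arch Linux is yaourt, which makes this very simple :)
#yaourt -S jdk
Note the Java installation path, which should be /opt/java. It is needed later.
Edit /etc/profile to add the JAVA_HOME environment variable and append /opt/java/bin to PATH:
PATH="/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/java/bin"
export PATH
export JAVA_HOME="/opt/java"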
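After editing /etc/profile and logging in again, it can be worth confirming that the new settings are visible in the shell. The following is a minimal sketch, not part of the original procedure; the function names path_contains and check_java_env are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original article).

# Return 0 if directory $1 is an entry of the colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether JAVA_HOME is set and $JAVA_HOME/bin is on PATH.
check_java_env() {
    java_home=${JAVA_HOME:-}
    if [ -z "$java_home" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if ! path_contains "$java_home/bin" "$PATH"; then
        echo "$java_home/bin is not on PATH"
        return 1
    fi
    echo "JAVA_HOME=$java_home and $java_home/bin is on PATH"
}
```

Calling check_java_env in a fresh login shell should report /opt/java if the profile changes were picked up.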
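Install the build tools
Twitter Storm uses ZeroMQ, which is written in C and C++, so the corresponding build tools are required. Do not use the ZeroMQ packaged for Arch Linux: pacman and the AUR currently ship ZeroMQ 3.x, whereas Twitter Storm requires ZeroMQ 2.1.7.
#pacman -S gcc libtool pkg-config make autoconf git util-linux
On Arch Linux the gcc package also provides g++, so there is no separate g++ package to install.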
Install python2.7 and unzip
#pacman -S python2 unzip
Build and install ZeroMQ and jzmq
Download ZeroMQ 2.1.7 from http://download.zeromq.org/zeromq-2.1.7.tar.gz
#tar zvxf zeromq-2.1.7.tar.gz
#cd zeromq-2.1.7
#./configure
#make
#make install
The libraries are installed to /usr/local/lib.
Build and install jzmq
#git clone https://github.com/nathanmarz/jzmq.git
#cd jzmq
#./autogen.sh
#./configure --with-zeromq=/usr/local
#make
Note: make may fail at this point. The fix is to edit jzmq/src/Makefile.am and change classdist_noinst.stamp to classnoinst.stamp, then run make again.
#make install
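The Makefile.am change described above can also be applied with sed instead of a manual edit. This is a minimal sketch under that assumption; patch_jzmq_makefile is a hypothetical helper name, and the function only touches the file if the old stamp name is actually present.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article).
# Replace classdist_noinst.stamp with classnoinst.stamp in the file given as $1,
# keeping the unmodified file as <name>.bak.
patch_jzmq_makefile() {
    makefile=$1
    if grep -q 'classdist_noinst\.stamp' "$makefile"; then
        sed -i.bak 's/classdist_noinst\.stamp/classnoinst.stamp/g' "$makefile"
        echo "patched $makefile"
    else
        echo "nothing to patch in $makefile"
    fi
}
```

From inside the jzmq checkout this would be invoked as patch_jzmq_makefile src/Makefile.am before re-running make.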
After installing ZeroMQ and jzmq, edit /etc/ld.so.conf and add the following line to it:
/usr/local/lib
Then run
#ldconfig
To verify that libjzmq really links against the self-built ZeroMQ, use the following command:
#ldd /usr/local/lib/libjzmq.so
linux-gate.so.1 (0xb779e000)
libzmq.so.1 => /usr/local/lib/libzmq.so.1 (0xb7749000)
libuuid.so.1 => /usr/lib/libuuid.so.1 (0xb7743000)
librt.so.1 => /usr/lib/librt.so.1 (0xb773a000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0xb771e000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7635000)
libm.so.6 => /usr/lib/libm.so.6 (0xb75ee000)
libc.so.6 => /usr/lib/libc.so.6 (0xb743e000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0xb7422000)
/usr/lib/ld-linux.so.2 (0xb779f000)
If libzmq.so.1 resolves to the copy in /usr/local/lib, the correct version is in use.
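The same check can be scripted rather than read by eye. The sketch below parses ldd output from standard input. The function names zmq_path_from_ldd and uses_local_zmq are hypothetical, and the default prefix /usr/local/lib is taken from the installation path used in this article.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original article).

# Print the path that libzmq.so.1 resolves to, given `ldd` output on stdin.
zmq_path_from_ldd() {
    awk '$1 == "libzmq.so.1" && $2 == "=>" { print $3 }'
}

# Succeed only if libzmq.so.1 resolves under the prefix $1
# (default: /usr/local/lib). Reads `ldd` output on stdin.
uses_local_zmq() {
    prefix=${1:-/usr/local/lib}
    resolved=$(zmq_path_from_ldd)
    case "$resolved" in
        "$prefix"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible invocation is: ldd /usr/local/lib/libjzmq.so | uses_local_zmq && echo "using self-built ZeroMQ".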
Install storm-starter
storm-starter is a GitHub project created by Storm's author to help newcomers get started quickly.
#git clone https://github.com/nathanmarz/storm-starter.git
Build and run it. Note that this runs in local mode, not cluster mode.
#cd storm-starter
#lein deps
#lein compile
#java -cp $(lein classpath) storm.starter.ExclamationTopology
Note:
Download the lein script directly from http://leiningen.org/ rather than installing it with pacman or yaourt.
#chmod +x ./lein
#cp ./lein /usr/local/bin
To run lein as root, the following variable must be set:
#export LEIN_ROOT=1
Install ZooKeeper
#yaourt -S zookeeper
For a simple configuration, edit /etc/zookeeper/zoo.cfg so that it reads as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
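When scripting around this configuration, for example to confirm which port and data directory ZooKeeper will use, the values can be read back from zoo.cfg. This is a minimal sketch; zoo_cfg_get is a hypothetical helper, and it assumes the simple key=value layout shown above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article).
# Print the value of key $1 from the key=value file $2.
# Comment lines are ignored; if a key appears twice, the last one wins.
zoo_cfg_get() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }            # skip comment lines
        {
            k = $1
            gsub(/[ \t]/, "", k)       # tolerate spaces around the key
        }
        k == key {
            v = $0
            sub(/^[^=]*=/, "", v)      # keep everything after the first "="
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            val = v
        }
        END { if (val != "") print val }
    ' "$2"
}
```

For instance, zoo_cfg_get clientPort /etc/zookeeper/zoo.cfg would print the client port configured in that file.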
In this setup ZooKeeper listens only on IPv6 addresses. To force it to listen on IPv4, edit /opt/zookeeper-3.4.5/bin/zkServer.sh and add "-Djava.net.preferIPv4Stack=true" to the start) section, which then looks like this:
case $1 in
start)
    echo -n "Starting zookeeper ... "
    if [ -f $ZOOPIDFILE ]; then
      if kill -0 `cat $ZOOPIDFILE` > /dev/null 2>&1; then
         echo $command already running as process `cat $ZOOPIDFILE`.
         exit 0
      fi
    fi
    nohup $JAVA "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
    "-Djava.net.preferIPv4Stack=true" \
    -cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &
    if [ $? -eq 0 ]
    then
      if /bin/echo -n $! > "$ZOOPIDFILE"
      then
        sleep 1
        echo STARTED
      else
        echo FAILED TO WRITE PID
        exit 1
      fi
    else
      echo SERVER DID NOT START
      exit 1
    fi
    ;;
The added line is the one containing "-Djava.net.preferIPv4Stack=true".
Start ZooKeeper
#/opt/zookeeper-3.4.5/bin/zkServer.sh start
Download and install Storm
Download storm-0.8.2 from storm-project.net and unzip it into /opt:
#unzip storm-0.8.2.zip -d /opt
Edit /opt/storm-0.8.2/conf/storm.yaml so that it reads as follows:
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
- "localhost"
# - "server2"
#
nimbus.host: "localhost"
#
#
# ##### These may optionally be filled in:
#
## List of custom serializations
# topology.kryo.register:
# - org.mycompany.MyType
# - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
# - org.mycompany.MyDecorator
#
## Locations of the drpc servers
# drpc.servers:
# - "server1"
# - "server2"
## Metrics Consumers
# topology.metrics.consumer.register:
# - class: "backtype.storm.metrics.LoggingMetricsConsumer"
# parallelism.hint: 1
# - class: "org.mycompany.MyMetricsConsumer"
# parallelism.hint: 1
# argument:
# - endpoint: "metrics-collector.mycompany.org"
java.library.path: "/usr/local/lib:/usr/local/share/java"
supervisor.slots.ports:
- 6700
- 6701
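Notes:
YAML is sensitive to indentation: indent configuration entries with spaces, not tabs.
Edit the storm script (/opt/storm-0.8.2/bin/storm) and change #!/usr/bin/python to #!/usr/bin/python2. On Arch Linux /usr/bin/python points to python3, so python2 has to be specified explicitly.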
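The shebang change can be made with a one-line sed command instead of an editor. The sketch below wraps it in a hypothetical helper, use_python2_shebang. It rewrites only the first line, only when that line is exactly #!/usr/bin/python, and it keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article).
# Change a "#!/usr/bin/python" first line of file $1 to "#!/usr/bin/python2".
# Other lines are left untouched; the original is saved as <name>.bak.
use_python2_shebang() {
    sed -i.bak '1s|^#!/usr/bin/python$|#!/usr/bin/python2|' "$1"
}
```

Applied to this installation, the call would be use_python2_shebang /opt/storm-0.8.2/bin/storm. Running it a second time leaves the file unchanged.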
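Ready to run in cluster mode
#/opt/storm-0.8.2/bin/storm nimbus
#/opt/storm-0.8.2/bin/storm supervisor
#/opt/storm-0.8.2/bin/storm ui
Each of the commands above has to run in its own terminal. If the UI starts successfully, open localhost:8080 in a browser to see the state of the whole cluster.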
Deploy a topology to the cluster
#./storm jar $HOME/working/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-standalone.jar storm.starter.ExclamationTopology exclamationTopology
If all goes well, you should see output similar to the following:
0 [main] INFO backtype.storm.StormSubmitter - Jar not uploaded to master yet. Submitting jar...
91 [main] INFO backtype.storm.StormSubmitter - Uploading topology jar /root/working/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-standalone.jar to assigned location: storm-local/nimbus/inbox/stormjar-c73d28f0-68fc-4e6e-98b5-c4d1355aa94f.jar
667 [main] INFO backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: storm-local/nimbus/inbox/stormjar-c73d28f0-68fc-4e6e-98b5-c4d1355aa94f.jar
670 [main] INFO backtype.storm.StormSubmitter - Submitting topology exclamationTopology in distributed mode with conf {"topology.workers":3,"topology.debug":true}
2449 [main] INFO backtype.storm.StormSubmitter - Finished submitting topology: exclamationTopology
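If the submission is run from a script, the final log line can serve as a success marker. This is a minimal sketch that assumes the output format shown above; topology_submitted is a hypothetical helper, and submit.log is a hypothetical file holding the captured output of the storm jar command.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article).
# Read `storm jar` output on stdin and succeed if it reports that
# the topology named $1 finished submitting.
topology_submitted() {
    grep -q -F -- "Finished submitting topology: $1"
}
```

A possible use is: topology_submitted exclamationTopology < submit.log && echo "submitted".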
Run the Storm cluster automatically
Starting the Storm cluster by hand every time is not much fun, so it is better to have it start automatically. One way to do this is with supervisor, a process manager written in Python.
#pacman -S supervisor
#mkdir -p /var/log/storm
Edit the supervisor configuration file and append the following at the end:
[program:storm-nimbus]
environment=JAVA_HOME=/opt/java, PATH="/usr/sbin:/usr/bin:/usr/local/bin:/opt/java/bin"
command=/opt/storm-0.8.2/bin/storm nimbus
;;user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/nimbus.out
logfile_maxbytes=20MB
logfile_backups=10
[program:storm-supervisor]
environment=JAVA_HOME=/opt/java, PATH="/usr/sbin:/usr/bin:/usr/local/bin:/opt/java/bin"
command=/opt/storm-0.8.2/bin/storm supervisor
;;user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/supervisor.out
logfile_maxbytes=20MB
logfile_backups=10
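Note:
The environment line is added explicitly in the configuration above to solve an executable search path problem. Without it, startup fails with an error that the java executable cannot be found, because it is not in any of the standard paths /usr/bin, /usr/sbin, /usr/local/bin or /usr/local/sbin.
Start supervisord
#systemctl start supervisord
To have supervisord start automatically at boot, run
#systemctl enable supervisord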
References
- Running a Multi-Node Storm Cluster http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/