Please credit the author and source when reposting.
Preface
A business requirement called for integrating Storm and Kafka into a Spring Boot project: other services write their logs to a Kafka topic, and Storm processes that topic in real time for data monitoring and other statistics. Tutorials on this are scarce, so this post covers how to integrate Storm + Kafka into Spring Boot, along with the pitfalls I ran into.
Tools and environment
1. Java: JDK 1.8
2. IDE: IntelliJ IDEA 2017
3. Maven for project management
4. Spring Boot 1.5.8.RELEASE
Requirements
1. Why integrate into Spring Boot?
To manage the various microservices uniformly through Spring Boot and to avoid scattering configuration across multiple places.
2. Approach and rationale
Use Spring Boot to manage the beans needed by Kafka, Storm, Redis, and so on in one place: logs from the other services are collected into Kafka, Kafka streams them to Storm in real time, and the processing happens in the Storm bolts.
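For context, here is a minimal sketch (my own illustration, not code from the project) of what the producing side could look like with spring-kafka, which is already among the dependencies below. The `LogProducer` class name is hypothetical; `kafka.default-topic` is the same property the topology later reads:

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class LogProducer {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final String topic;

    public LogProducer(KafkaTemplate<String, String> kafkaTemplate,
                       @Value("${kafka.default-topic}") String topic) {
        this.kafkaTemplate = kafkaTemplate;
        this.topic = topic;
    }

    public void send(String logLine) {
        // spring-kafka handles serialization and asynchronous delivery.
        kafkaTemplate.send(topic, logLine);
    }
}
```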
Problems encountered
1. Spring Boot has no existing Storm integration.
2. It was unclear how to trigger submitting the Topology when starting through Spring Boot.
3. Submitting the Topology hit a "could not find leader nimbus from seed hosts [localhost]" problem.
4. Inside a Storm bolt, Spring beans could not be obtained through annotations.
Solution approach
Before integrating, we need to understand how Spring Boot starts up and is configured (this article assumes you already know and have used Storm, Kafka, and Spring Boot).
- Examples of integrating Storm with Spring Boot are rare online, but since the requirement exists, we integrate it ourselves.
First, import the required jars:
```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.1.1</version>
</dependency>
<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-stream-kafka</artifactId>
  <exclusions>
    <exclusion><groupId>org.apache.zookeeper</groupId><artifactId>zookeeper</artifactId></exclusion>
    <exclusion><groupId>org.springframework.boot</groupId><artifactId>spring-boot-actuator</artifactId></exclusion>
    <exclusion><groupId>org.apache.kafka</groupId><artifactId>kafka-clients</artifactId></exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.springframework.kafka</groupId>
  <artifactId>spring-kafka</artifactId>
  <exclusions>
    <exclusion><groupId>org.apache.kafka</groupId><artifactId>kafka-clients</artifactId></exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.springframework.data</groupId>
  <artifactId>spring-data-hadoop</artifactId>
  <version>2.5.0.RELEASE</version>
  <exclusions>
    <exclusion><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId></exclusion>
    <exclusion><groupId>commons-logging</groupId><artifactId>commons-logging</artifactId></exclusion>
    <exclusion><groupId>io.netty</groupId><artifactId>netty</artifactId></exclusion>
    <exclusion><groupId>org.codehaus.jackson</groupId><artifactId>jackson-core-asl</artifactId></exclusion>
    <exclusion><groupId>org.apache.curator</groupId><artifactId>curator-client</artifactId></exclusion>
    <exclusion><groupId>org.codehaus.jettison</groupId><artifactId>jettison</artifactId></exclusion>
    <exclusion><groupId>org.codehaus.jackson</groupId><artifactId>jackson-mapper-asl</artifactId></exclusion>
    <exclusion><groupId>org.codehaus.jackson</groupId><artifactId>jackson-jaxrs</artifactId></exclusion>
    <exclusion><groupId>org.xerial.snappy</groupId><artifactId>snappy-java</artifactId></exclusion>
    <exclusion><groupId>org.codehaus.jackson</groupId><artifactId>jackson-xc</artifactId></exclusion>
    <exclusion><groupId>com.google.guava</groupId><artifactId>guava</artifactId></exclusion>
    <exclusion><groupId>org.apache.hadoop</groupId><artifactId>hadoop-mapreduce-client-core</artifactId></exclusion>
    <exclusion><groupId>org.apache.zookeeper</groupId><artifactId>zookeeper</artifactId></exclusion>
    <exclusion><groupId>javax.servlet</groupId><artifactId>servlet-api</artifactId></exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.10</version>
  <exclusions>
    <exclusion><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId></exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.2.4</version>
  <exclusions>
    <exclusion><groupId>log4j</groupId><artifactId>log4j</artifactId></exclusion>
    <exclusion><groupId>org.apache.zookeeper</groupId><artifactId>zookeeper</artifactId></exclusion>
    <exclusion><groupId>io.netty</groupId><artifactId>netty</artifactId></exclusion>
    <exclusion><groupId>org.apache.hadoop</groupId><artifactId>hadoop-common</artifactId></exclusion>
    <exclusion><groupId>com.google.guava</groupId><artifactId>guava</artifactId></exclusion>
    <exclusion><groupId>org.apache.hadoop</groupId><artifactId>hadoop-annotations</artifactId></exclusion>
    <exclusion><groupId>org.apache.hadoop</groupId><artifactId>hadoop-yarn-common</artifactId></exclusion>
    <exclusion><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId></exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.3</version>
  <exclusions>
    <exclusion><groupId>commons-logging</groupId><artifactId>commons-logging</artifactId></exclusion>
    <exclusion><groupId>org.apache.curator</groupId><artifactId>curator-client</artifactId></exclusion>
    <exclusion><groupId>org.codehaus.jackson</groupId><artifactId>jackson-mapper-asl</artifactId></exclusion>
    <exclusion><groupId>org.codehaus.jackson</groupId><artifactId>jackson-core-asl</artifactId></exclusion>
    <exclusion><groupId>log4j</groupId><artifactId>log4j</artifactId></exclusion>
    <exclusion><groupId>org.xerial.snappy</groupId><artifactId>snappy-java</artifactId></exclusion>
    <exclusion><groupId>org.apache.zookeeper</groupId><artifactId>zookeeper</artifactId></exclusion>
    <exclusion><groupId>com.google.guava</groupId><artifactId>guava</artifactId></exclusion>
    <exclusion><groupId>org.apache.hadoop</groupId><artifactId>hadoop-auth</artifactId></exclusion>
    <exclusion><groupId>commons-lang</groupId><artifactId>commons-lang</artifactId></exclusion>
    <exclusion><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId></exclusion>
    <exclusion><groupId>javax.servlet</groupId><artifactId>servlet-api</artifactId></exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-examples</artifactId>
  <version>2.7.3</version>
  <exclusions>
    <exclusion><groupId>commons-logging</groupId><artifactId>commons-logging</artifactId></exclusion>
    <exclusion><groupId>io.netty</groupId><artifactId>netty</artifactId></exclusion>
    <exclusion><groupId>com.google.guava</groupId><artifactId>guava</artifactId></exclusion>
    <exclusion><groupId>log4j</groupId><artifactId>log4j</artifactId></exclusion>
    <exclusion><groupId>javax.servlet</groupId><artifactId>servlet-api</artifactId></exclusion>
  </exclusions>
</dependency>
<!-- storm -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>${storm.version}</version>
  <scope>${provided.scope}</scope>
  <exclusions>
    <exclusion><groupId>org.apache.logging.log4j</groupId><artifactId>log4j-slf4j-impl</artifactId></exclusion>
    <exclusion><groupId>javax.servlet</groupId><artifactId>servlet-api</artifactId></exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-kafka</artifactId>
  <version>1.1.1</version>
  <exclusions>
    <exclusion><groupId>org.apache.kafka</groupId><artifactId>kafka-clients</artifactId></exclusion>
  </exclusions>
</dependency>
```
The exclusions are there to resolve conflicting transitive dependencies in the project build. The Storm version used is 1.1.0. The Spring Boot related dependencies are:
```xml
<!-- spring boot -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
<exclusions>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-logging</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-log4j2</artifactId>
</dependency>
<dependency>
<groupId>org.mybatis.spring.boot</groupId>
<artifactId>mybatis-spring-boot-starter</artifactId>
<version>${mybatis-spring.version}</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-configuration-processor</artifactId>
<optional>true</optional>
</dependency>
```
P.S.: this Maven dependency list reflects the needs of my project and is not minimal; treat it purely as a reference.
Project structure:
config - configuration files for the different environments
java-config - the Java-based configuration and related implementation classes for Spring Boot
- Frankly, I had barely touched Storm before starting this integration. Once everything was wired into Spring Boot, I discovered that starting the application does not by itself trigger anything that submits the Topology. I assumed starting Spring Boot was all it took, waited half an hour, and nothing happened... only then did I realize that no submit trigger had been implemented.
To fix this, my first idea was: start Spring Boot -> have a Kafka listener on the topic kick off the Topology. The problem is that such a listener would keep re-triggering the Topology, which is clearly not what we want. After some digging I found that Spring can run a method once startup has completed, which was a lifesaver. The flow for triggering the Topology therefore became:
start Spring Boot -> run the trigger method -> perform the submit
The implementation:
```java
/**
* @author Leezer
* @date 2017/12/28
* Automatically submits the Topology once Spring has finished loading
**/
@Configuration
@Component
public class AutoLoad implements ApplicationListener<ContextRefreshedEvent> {
private static String BROKERZKSTR;
private static String TOPIC;
private static String HOST;
private static String PORT;
public AutoLoad(@Value("${storm.brokerZkstr}") String brokerZkstr,
@Value("${zookeeper.host}") String host,
@Value("${zookeeper.port}") String port,
@Value("${kafka.default-topic}") String topic
){
BROKERZKSTR = brokerZkstr;
HOST= host;
TOPIC= topic;
PORT= port;
}
@Override
public void onApplicationEvent(ContextRefreshedEvent event) {
try {
//Instantiate the TopologyBuilder.
TopologyBuilder topologyBuilder = new TopologyBuilder();
//Configure the spout and its parallelism; the hint controls how many executor threads it gets in the cluster.
BrokerHosts brokerHosts = new ZkHosts(BROKERZKSTR);
// The Kafka topic to subscribe to, plus the ZooKeeper root path and consumer id under which offsets are stored
SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, TOPIC, "/storm", "s32");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConfig.zkServers = Collections.singletonList(HOST);
spoutConfig.zkPort = Integer.parseInt(PORT);
//Read from the latest Kafka offsets
spoutConfig.startOffsetTime = OffsetRequest.LatestTime();
KafkaSpout receiver = new KafkaSpout(spoutConfig);
topologyBuilder.setSpout("kafka-spout", receiver, 1).setNumTasks(2);
topologyBuilder.setBolt("alarm-bolt", new AlarmBolt(), 1).setNumTasks(2).shuffleGrouping("kafka-spout");
Config config = new Config();
config.setDebug(false);
/* Number of worker slots this topology claims on the Storm cluster. One slot
 * corresponds to one worker process on a supervisor node. If you request more
 * slots than the cluster has free workers, the topology may submit successfully
 * but then never run; once other topologies are killed and slots are freed,
 * it will start running normally.
 */
config.setNumWorkers(1);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("kafka-spout", config, topologyBuilder.createTopology());
} catch (Exception e) {
e.printStackTrace();
}
}
}
```
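One caveat, as a hedge: ContextRefreshedEvent can fire more than once when the application has a parent/child context hierarchy, which would re-submit the topology. A minimal guard (my own addition, not in the original code) replaces the method above like this:

```java
@Override
public void onApplicationEvent(ContextRefreshedEvent event) {
    // React only to the root context's refresh, so the topology is
    // submitted exactly once even if child contexts also refresh.
    if (event.getApplicationContext().getParent() != null) {
        return;
    }
    // ... build and submit the topology as shown above ...
}
```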
Note:
- Because the project starts on embedded Tomcat, you may see the following error at startup:
```
[Tomcat-startStop-1] ERROR o.a.c.c.ContainerBase - A child container failed during start
java.util.concurrent.ExecutionException: org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Tomcat].StandardHost[localhost].TomcatEmbeddedContext[]]
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_144]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_144]
at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:939) [tomcat-embed-core-8.5.23.jar:8.5.23]
at org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:872) [tomcat-embed-core-8.5.23.jar:8.5.23]
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) [tomcat-embed-core-8.5.23.jar:8.5.23]
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1419) [tomcat-embed-core-8.5.23.jar:8.5.23]
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1409) [tomcat-embed-core-8.5.23.jar:8.5.23]
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266) [?:1.8.0_144]
at java.util.concurrent.FutureTask.run(FutureTask.java) [?:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
```
This happens because one of the imported jars drags in a servlet-api older than the embedded Tomcat's version. The fix is to locate the offender in the Maven dependency tree (for example with `mvn dependency:tree -Dincludes=javax.servlet:servlet-api`) and exclude it:
```xml
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
```
Then restart and the error is gone.
- During startup you may also hit:
```
org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [localhost]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
    at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:90
```
I puzzled over this one for a long time. Everything online blamed a bad Storm configuration, but my Storm runs on a server and the project carries no such configuration, so it ought to pick up the server's settings; yet it did not. After several failed attempts I finally noticed that, for building a cluster, Storm also provides a local one:
```java
LocalCluster cluster = new LocalCluster();
```
for local testing. If you are testing locally, use it to deploy and test; if you deploy to a server, change:
```java
cluster.submitTopology("kafka-spout", config, topologyBuilder.createTopology());
// change to:
StormSubmitter.submitTopology("kafka-spout", config, topologyBuilder.createTopology());
```
to submit the job to the cluster.
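To avoid editing code when switching between local testing and cluster deployment, the two submission paths can sit behind a flag. This is a sketch under my own assumptions (the class name and the `localMode` flag are hypothetical, not from the project):

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.StormTopology;

public final class TopologySubmitter {

    /**
     * Submits either to an in-process LocalCluster (for development) or to a
     * real cluster via StormSubmitter, which needs nimbus.seeds to be
     * resolvable (from storm.yaml on the classpath or set on the Config).
     */
    public static void submit(String name, Config config, StormTopology topology,
                              boolean localMode) throws Exception {
        if (localMode) {
            new LocalCluster().submitTopology(name, config, topology);
        } else {
            StormSubmitter.submitTopology(name, config, topology);
        }
    }
}
```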
That takes care of problems 1-3 described above.
Problem 4 is about using beans inside a bolt. Even after registering the class with @Component, I could not obtain the instance. My guess is that when we build and submit the Topology, at:
```java
topologyBuilder.setBolt("alarm-bolt", new AlarmBolt(), 1).setNumTasks(2).shuffleGrouping("kafka-spout");
```
it then runs the bolt's lifecycle method:
```java
@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
    StormLauncher stormLauncher = StormLauncher.getStormLauncher();
    dataRepositorys = (AlarmDataRepositorys) stormLauncher.getBean("alarmdataRepositorys");
}
```
rather than letting Spring instantiate the bolt: Storm serializes the bolt instance passed to setBolt and deserializes fresh copies on its own worker threads, so the bolts live outside the Spring container and injection never reaches them. (I am still not entirely sure of the details here; if anyone knows better, please share.)
Yet the whole point of using Spring Boot is to have it manage these tedious objects, so this bothered me for a long time. Eventually it occurred to me that we might fetch the instance from the application context with getBean, so I started defining things as follows.
For example, I need to use a service inside the bolt:
```java
/**
 * @author Leezer
 * @date 2017/12/27
 * Stores the times of failed operations
 **/
@Service("alarmdataRepositorys")
public class AlarmDataRepositorys extends RedisBase implements IAlarmDataRepositorys {

    private static final String ERRO = "erro";

    /**
     * @param type error type
     * @param key  key
     * @return error count
     **/
    @Override
    public String getErrNumFromRedis(String type, String key) {
        if (type == null || key == null) {
            return null;
        } else {
            ValueOperations<String, String> valueOper = primaryStringRedisTemplate.opsForValue();
            return valueOper.get(String.format("%s:%s:%s", ERRO, type, key));
        }
    }

    /**
     * @param type  error type
     * @param key   key
     * @param value value to store
     **/
    @Override
    public void setErrNumToRedis(String type, String key, String value) {
        try {
            ValueOperations<String, String> valueOper = primaryStringRedisTemplate.opsForValue();
            valueOper.set(String.format("%s:%s:%s", ERRO, type, key), value,
                    Dictionaries.ApiKeyDayOfLifeCycle, TimeUnit.SECONDS);
        } catch (Exception e) {
            logger.info(Dictionaries.REDIS_ERROR_PREFIX + String.format("failed to store key %s in redis", key));
        }
    }
}
```
Here I gave the bean an explicit name, so when the bolt's prepare runs, getBean fetches it and the bolt can carry out its operations, as in the sketch below.
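Putting the pieces together, here is a minimal sketch of what the full bolt could look like. Only the prepare wiring comes from the project; the tuple handling is an assumption for illustration:

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class AlarmBolt extends BaseRichBolt {

    private OutputCollector collector;
    private transient AlarmDataRepositorys dataRepositorys;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Fetch the Spring-managed service by name; the bolt itself is
        // instantiated by Storm, not Spring, so injection is unavailable.
        StormLauncher stormLauncher = StormLauncher.getStormLauncher();
        dataRepositorys = (AlarmDataRepositorys) stormLauncher.getBean("alarmdataRepositorys");
    }

    @Override
    public void execute(Tuple input) {
        // Hypothetical handling: read the raw log line emitted by the KafkaSpout
        // (StringScheme declares a single "str" field) and update error counters.
        String logLine = input.getString(0);
        // ... parse logLine and call dataRepositorys.setErrNumToRedis(...) as needed ...
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt in this sketch: no output fields declared.
    }
}
```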
Messages from the subscribed Kafka topic then flow into my bolt for processing. The getBean method used above is defined on the class whose main function boots the application:
```java
@SpringBootApplication
@EnableTransactionManagement
@ComponentScan({"service","storm"})
@EnableMongoRepositories(basePackages = {"storm"})
@PropertySource(value = {"classpath:service.properties", "classpath:application.properties","classpath:storm.properties"})
@ImportResource(locations = {
"classpath:/configs/spring-hadoop.xml",
"classpath:/configs/spring-hbase.xml"})
public class StormLauncher extends SpringBootServletInitializer {
//thread-safe singleton instance of the launcher
private volatile static StormLauncher stormLauncher;
//the application context
private ApplicationContext context;
public static void main(String[] args) {
SpringApplicationBuilder application = new SpringApplicationBuilder(StormLauncher.class);
// application.web(false).run(args); // this variant starts Spring Boot without the web container
application.run(args);
StormLauncher s = new StormLauncher();
s.setApplicationContext(application.context());
setStormLauncher(s);
}
private static void setStormLauncher(StormLauncher stormLauncher) {
StormLauncher.stormLauncher = stormLauncher;
}
public static StormLauncher getStormLauncher() {
return stormLauncher;
}
@Override
protected SpringApplicationBuilder configure(SpringApplicationBuilder application) {
return application.sources(StormLauncher.class);
}
/**
* Gets the application context.
*
* @return the application context
*/
public ApplicationContext getApplicationContext() {
return context;
}
/**
* Sets the application context.
*
* @param appContext the application context
*/
private void setApplicationContext(ApplicationContext appContext) {
this.context = appContext;
}
/**
* Gets a bean instance by its custom name.
*
* @param name the name
* @return the bean
*/
public Object getBean(String name) {
return context.getBean(name);
}
/**
* Gets a bean by its class.
*
* @param <T> the type parameter
* @param clazz the clazz
* @return the bean
*/
public <T> T getBean(Class<T> clazz) {
return context.getBean(clazz);
}
/**
* Returns the bean identified by the given name and class.
*
* @param <T> the type parameter
* @param name the name
* @param clazz the clazz
* @return the bean
*/
public <T> T getBean(String name, Class<T> clazz) {
return context.getBean(name, clazz);
}
}
```
That wraps up integrating Storm and Kafka into Spring Boot; I will put the Kafka and other configuration on GitHub.
Oh, and there is one more kafka-clients pitfall:
```
Async loop died! java.lang.NoSuchMethodError: org.apache.kafka.common.network.NetworkSend.
```
The project complains about the kafka client because storm-kafka depends on Kafka 0.8 by default, while NetworkSend only exists from 0.9 on; the kafka version on the classpath must match the Kafka you are actually integrating with.
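This is exactly why the pom at the top excludes kafka-clients from storm-kafka and declares the version explicitly. As a minimal sketch of that fix, assuming your brokers are on 0.10.x:

```xml
<!-- Pin kafka-clients to the broker's version instead of the 0.8.x
     artifact that storm-kafka would otherwise pull in transitively. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.1.1</version>
</dependency>
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-kafka</artifactId>
  <version>1.1.1</version>
  <exclusions>
    <exclusion><groupId>org.apache.kafka</groupId><artifactId>kafka-clients</artifactId></exclusion>
  </exclusions>
</dependency>
```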
The integration itself is fairly simple, but references were scarce and I was new to Storm, so it took a fair amount of thinking; hence this write-up.
Project source - github