Modifying the Flume source for fan-out multiplexing (exec source), and compiling Flume
1. Purpose
I have recently been studying multiplexing in Flume: with a multiplexing channel selector, a flow can be split further according to configured rules. Besides using an interceptor, this can also be done by modifying the Flume source code or by writing a custom source; this test takes the source-modification route. Working in a virtual-machine Flume + Kafka environment, I built a simple multiplexing pipeline that collects data with Flume's exec source and fans it out to logger, HDFS, and Kafka. I ran into quite a few problems along the way and learned some things, so I am writing these notes both for myself and for others.
2. Build environment
1. JDK: jdk1.7.0_80. Note: per the pom.xml, flume-ng-1.6.0 needs to be built with JDK 1.7; building with JDK 1.8 may cause problems (a quick check is shown below).
2. Flume: flume-ng-1.6.0-cdh5.7.0
3. Maven: apache-maven-3.3.9
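A quick sanity check that the toolchain matches before building:

    java -version   # should report 1.7.0_80
    mvn -version    # should report Apache Maven 3.3.9, running on a 1.7 JDK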
3. Modifying the Flume source
3.1 Download the Flume source
cd /opt/sourcecode
wget
tar -zxvf flume-ng-1.6.0-cdh5.7.0-src.tar.gz
3.2 Modify the source
cd /opt/sourcecode/flume-ng-1.6.0-cdh5.7.0/flume-ng-core/src/main/java/org/apache/flume/source
Add the following code in ExecSource.java:
vi ExecSource.java
Create a map for the event headers (note: insert at line 336 of the source):

    Map<String, String> map = new HashMap<String, String>();

Take the first two characters of each line as the channel flag (adapt this to your own business logic, or use a Flume interceptor instead), and insert the following below line 339. Here "state" corresponds to selector.header in the agent configuration, and the value of channelFlag corresponds to selector.mapping.XXX (e.g. mapping.US). Note that line.substring(0, 2) throws StringIndexOutOfBoundsException for lines shorter than two characters, so guard it if your input may contain such lines:

    String channelFlag = line.substring(0, 2);
    map.put("state", channelFlag);

In this way, each line read from the source is wrapped into an event whose header carries the flag used for splitting the flow; when the event is pushed to the channels, the channel selector routes it by the header value (see the configuration sketch below).

Finally, change line 340 of the source from

    eventList.add(EventBuilder.withBody(line.getBytes(charset)));

to

    eventList.add(EventBuilder.withBody(line.getBytes(charset), map));
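For reference, here is a minimal sketch of the multiplexing configuration this modification is meant to pair with. The selector.header value (state) and the two-character mapping keys come from the code above; everything else (agent/channel/sink names, the tailed file, the HDFS path, the Kafka broker list and topic) is an illustrative placeholder, with the sink set mirroring the logger/HDFS/Kafka fan-out described in the purpose section:

    a1.sources = r1
    a1.channels = c1 c2 c3
    a1.sinks = k1 k2 k3

    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /opt/data/access.log
    a1.sources.r1.channels = c1 c2 c3
    # route on the "state" header set by the modified ExecSource
    a1.sources.r1.selector.type = multiplexing
    a1.sources.r1.selector.header = state
    a1.sources.r1.selector.mapping.US = c1
    a1.sources.r1.selector.mapping.CN = c2
    a1.sources.r1.selector.default = c3

    a1.channels.c1.type = memory
    a1.channels.c2.type = memory
    a1.channels.c3.type = memory

    a1.sinks.k1.type = logger
    a1.sinks.k1.channel = c1
    a1.sinks.k2.type = hdfs
    a1.sinks.k2.hdfs.path = hdfs://localhost:8020/flume/events
    a1.sinks.k2.channel = c2
    a1.sinks.k3.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k3.brokerList = localhost:9092
    a1.sinks.k3.topic = flume-test
    a1.sinks.k3.channel = c3

With this, a line starting with US goes to the logger sink, a line starting with CN goes to HDFS, and everything else falls through to Kafka via the default channel.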
4. Compiling the Flume source
4.1 Build
cd /opt/sourcecode/flume-ng-1.6.0-cdh5.7.0
mvn install -DskipTests
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Flume ....................................... SUCCESS [ 3.621 s]
[INFO] Flume NG SDK ....................................... SUCCESS [ 5.543 s]
[INFO] Flume NG Configuration ............................. SUCCESS [ 1.117 s]
[INFO] Flume Auth ......................................... SUCCESS [ 3.623 s]
[INFO] Flume NG Core ...................................... SUCCESS [ 5.092 s]
[INFO] Flume NG Sinks ..................................... SUCCESS [ 0.180 s]
[INFO] Flume NG HDFS Sink ................................. SUCCESS [ 1.932 s]
[INFO] Flume NG IRC Sink .................................. SUCCESS [ 0.954 s]
[INFO] Flume NG Channels .................................. SUCCESS [ 0.153 s]
[INFO] Flume NG JDBC channel .............................. SUCCESS [ 0.989 s]
[INFO] Flume NG file-based channel ........................ SUCCESS [ 1.078 s]
[INFO] Flume NG Spillable Memory channel .................. SUCCESS [ 0.960 s]
[INFO] Flume NG Node ...................................... SUCCESS [ 1.258 s]
[INFO] Flume NG Embedded Agent ............................ SUCCESS [ 1.937 s]
[INFO] Flume NG HBase Sink ................................ SUCCESS [ 2.021 s]
[INFO] Flume NG ElasticSearch Sink ........................ SUCCESS [ 1.095 s]
[INFO] Flume NG Morphline Solr Sink ....................... SUCCESS [ 6.976 s]
[INFO] Flume Kafka Sink ................................... SUCCESS [ 1.065 s]
[INFO] Flume NG Kite Dataset Sink ......................... SUCCESS [ 4.729 s]
[INFO] Flume NG Hive Sink ................................. SUCCESS [ 0.841 s]
[INFO] Flume Sources ...................................... SUCCESS [ 0.099 s]
[INFO] Flume Scribe Source ................................ SUCCESS [ 0.791 s]
[INFO] Flume JMS Source ................................... SUCCESS [ 0.847 s]
[INFO] Flume Twitter Source ............................... SUCCESS [ 0.785 s]
[INFO] Flume Kafka Source ................................. SUCCESS [ 0.707 s]
[INFO] Flume Taildir Source ............................... SUCCESS [ 0.763 s]
[INFO] flume-kafka-channel ................................ SUCCESS [ 0.695 s]
[INFO] Flume legacy Sources ............................... SUCCESS [ 0.074 s]
[INFO] Flume legacy Avro source ........................... SUCCESS [ 1.625 s]
[INFO] Flume legacy Thrift Source ......................... SUCCESS [ 0.714 s]
[INFO] Flume NG Clients ................................... SUCCESS [ 0.079 s]
[INFO] Flume NG Log4j Appender ............................ SUCCESS [ 7.450 s]
[INFO] Flume NG Tools ..................................... SUCCESS [ 0.843 s]
[INFO] Flume NG distribution .............................. SUCCESS [ 30.321 s]
[INFO] Flume NG Integration Tests ......................... SUCCESS [ 1.106 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:36 min
[INFO] Finished at: 2018-04-14T21:46:36+08:00
[INFO] Final Memory: 106M/705M
[INFO] ------------------------------------------------------------------------
The build finally produces apache-flume-1.6.0-cdh5.7.0-bin.tar.gz under the flume-ng-dist/target directory.
4.2 Build issues
4.2.1 ua-parser-1.3.0 download failure
[ERROR] Failed to execute goal on project flume-ng-morphline-solr-sink: Could not resolve dependencies
for project org.apache.flume.flume-ng-sinks:flume-ng-morphline-solr-sink:jar:1.6.0: Failed to collect dependencies at
org.kitesdk:kite-morphlines-all:pom:1.0.0 -> org.kitesdk:kite-morphlines-useragent:jar:1.0.0 ->
ua_parser:ua-parser:jar:1.3.0: Failed to read artifact descriptor for ua_parser:ua-parser:jar:1.3.0:
Could not transfer artifact ua_parser:ua-parser:pom:1.3.0 from/to maven-twttr ():
maven.twttr.com: Unknown host maven.twttr.com -> [Help 1]
Fix: add a repository to the <repositories> element of the Flume source's top-level pom.xml; any mirror hosting the artifact will do, for example:

    <repository>
      <id>p2.jfrog.org</id>
      <url>http://p2.jfrog.org/libs-releases</url>
    </repository>
4.2.2 pentaho-aggdesigner-algorithm-5.1.5-hyde download failure
This problem persists with the current Hive-1.2.0 release, since it depends on Calcite-1.3.0-incubating, which in turn depends on artifacts like pentaho-aggdesigner-algorithm-5.1.5-hyde (https://github.com/apache/incubator-calcite/blob/branch-1.3/pom.xml#L264); there are also licensing issues around these artifacts. Fix: as above, add a repository to the Flume source's pom.xml; any mirror hosting the artifact will do:
    <repository>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>always</updatePolicy>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
      <id>conjars</id>
      <name>Conjars</name>
      <url>http://conjars.org/repo</url>
      <layout>default</layout>
    </repository>
5. Testing
See the follow-up post for the full test: http://blog.itpub.net/31511218/viewspace-2152461/
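As a quick smoke test (assuming the exec source tails /opt/data/access.log as in the configuration sketch in section 3; the file path and line contents are illustrative), append lines whose first two characters match the configured mappings and watch where each event lands:

    echo "US event for the logger sink" >> /opt/data/access.log
    echo "CN event for the HDFS sink" >> /opt/data/access.log
    echo "XX unmatched prefix, goes to the default channel" >> /opt/data/access.log

The first line should show up in the agent's log output, the second in the configured HDFS directory, and the third on the Kafka topic.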
6. Summary
In short, a small change to ExecSource plus a multiplexing channel selector in the agent configuration is enough to fan exec-source data out to logger, HDFS, and Kafka; the main build hurdles were two dependencies hosted on unreachable Maven repositories.
[From @若澤大資料]
Source: ITPUB blog, http://blog.itpub.net/31511218/viewspace-2152950/