Modifying the Flume source code for fan-out multiplexing (exec source) and compiling Flume

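Posted by loveheping on 2018-04-14
I. Purpose
  I have recently been looking into multiplexing in Flume. A multiplexing channel selector routes events to different channels according to a configured header value. Besides using an interceptor, that header can also be set by modifying the source code or by writing a custom source. This test takes the source-modification approach. In a virtual machine environment running Flume + Kafka, I built a simple multiplexing flow: Flume's exec source collects the data and delivers it to logger, HDFS, and a Kafka queue. I ran into quite a few problems along the way and learned a few things, so I am writing these notes to help myself and others.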

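II. Build environment
1. JDK version: jdk1.7.0_80. Note: according to pom.xml, flume-ng-1.6.0 needs to be compiled with JDK 1.7; building with JDK 1.8 may cause problems.
2. Flume version: flume-ng-1.6.0-cdh5.7.0
3. Maven version: apache-maven-3.3.9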

III. Modifying the Flume source code
3. Compiling the flume-ng-1.6.0-cdh5.7.0 source with Maven
3.1 Download Flume
cd /opt/sourcecode
wget
tar -zxvf flume-ng-1.6.0-cdh5.7.0-src.tar.gz

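3.2 Modify the source:
cd /opt/sourcecode/flume-ng-1.6.0-cdh5.7.0/flume-ng-core/src/main/java/org/apache/flume/source
Add the following code to ExecSource.java:
vi ExecSource.java
// Create the map (note: added at line 336 of the source).
// If ExecSource.java does not already import java.util.Map and java.util.HashMap, add those imports as well.

  Map<String,String> map = new HashMap<String,String>();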

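// Take the first two characters of each line (adapt this to your own business logic, or use a Flume interceptor instead)
// (note: added below line 339 of the source)
// "state" corresponds to selector.header, and the value of channelFlag corresponds to mapping.XXX (for example, mapping.US)
// This wraps each line read from the source into an event and adds a header that serves as the routing flag.
// When the event is pushed to a channel, it is routed to the matching channel according to the header value; see the Flume configuration file for details.

  String channelFlag = line.substring(0,2);
  map.put("state", channelFlag);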
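
One caveat with the snippet above: `line.substring(0,2)` throws a `StringIndexOutOfBoundsException` when a line has fewer than two characters, for example an empty line. The following is a minimal standalone sketch of a guarded version, not the code used in this post. The class name `ChannelFlagSketch`, the helper names, and the fallback value `UNKNOWN` are assumptions made for illustration; only the header name `state` and the two-character prefix come from the steps above.

```java
import java.util.HashMap;
import java.util.Map;

public class ChannelFlagSketch {
    // Hypothetical fallback flag for lines too short to carry a two-character prefix.
    public static final String DEFAULT_FLAG = "UNKNOWN";

    // Returns the first two characters of the line, or DEFAULT_FLAG for null or short lines.
    public static String channelFlag(String line) {
        if (line == null || line.length() < 2) {
            return DEFAULT_FLAG;
        }
        return line.substring(0, 2);
    }

    // Builds the header map for one event; the key "state" must match selector.header.
    public static Map<String, String> headersFor(String line) {
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("state", channelFlag(line));
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(headersFor("US,order-1")); // header carries the prefix "US"
        System.out.println(headersFor(""));           // header carries the fallback flag
    }
}
```

In ExecSource the same guard could be applied inline before `map.put("state", channelFlag)`. Which fallback value makes sense depends on how the selector's default channel is configured.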

Change the following line (note: line 340 of the source):

  eventList.add(EventBuilder.withBody(line.getBytes(charset)));

to:

  eventList.add(EventBuilder.withBody(line.getBytes(charset),map));
Save and exit (:wq)
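
To make the routing step concrete, here is a simplified standalone model of what a multiplexing selector does with the `state` header: it looks up the header value in a mapping and falls back to a default channel when nothing matches. This is an illustrative sketch, not Flume's own selector implementation. The class `MultiplexRoutingSketch` and the channel names `c1`, `c2`, `c3` are hypothetical, and the model maps each value to a single channel for brevity.

```java
import java.util.HashMap;
import java.util.Map;

public class MultiplexRoutingSketch {
    private final String headerName;      // plays the role of selector.header
    private final String defaultChannel;  // plays the role of selector.default
    private final Map<String, String> mapping = new HashMap<String, String>();

    public MultiplexRoutingSketch(String headerName, String defaultChannel) {
        this.headerName = headerName;
        this.defaultChannel = defaultChannel;
    }

    // Registers one rule, comparable to a selector.mapping.<headerValue> entry.
    public void map(String headerValue, String channel) {
        mapping.put(headerValue, channel);
    }

    // Picks the channel for an event's headers; unmatched or missing values go to the default.
    public String channelFor(Map<String, String> headers) {
        String channel = mapping.get(headers.get(headerName));
        return channel != null ? channel : defaultChannel;
    }

    public static void main(String[] args) {
        MultiplexRoutingSketch selector = new MultiplexRoutingSketch("state", "c3");
        selector.map("US", "c1");
        selector.map("CN", "c2");

        Map<String, String> headers = new HashMap<String, String>();
        headers.put("state", "US");
        System.out.println(selector.channelFor(headers)); // matched rule

        headers.put("state", "ZZ");
        System.out.println(selector.channelFor(headers)); // no rule, default channel
    }
}
```

With the modified ExecSource, the header value is the two-character line prefix, so the mapping keys in the agent configuration need to match those prefixes exactly.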

IV. Compiling the Flume source
4.1 Compiling the source
cd /opt/sourcecode/flume-ng-1.6.0-cdh5.7.0
mvn install -DskipTests

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Flume ....................................... SUCCESS [ 3.621 s]
[INFO] Flume NG SDK ....................................... SUCCESS [ 5.543 s]
[INFO] Flume NG Configuration ............................. SUCCESS [ 1.117 s]
[INFO] Flume Auth ......................................... SUCCESS [ 3.623 s]
[INFO] Flume NG Core ...................................... SUCCESS [ 5.092 s]
[INFO] Flume NG Sinks ..................................... SUCCESS [ 0.180 s]
[INFO] Flume NG HDFS Sink ................................. SUCCESS [ 1.932 s]
[INFO] Flume NG IRC Sink .................................. SUCCESS [ 0.954 s]
[INFO] Flume NG Channels .................................. SUCCESS [ 0.153 s]
[INFO] Flume NG JDBC channel .............................. SUCCESS [ 0.989 s]
[INFO] Flume NG file-based channel ........................ SUCCESS [ 1.078 s]
[INFO] Flume NG Spillable Memory channel .................. SUCCESS [ 0.960 s]
[INFO] Flume NG Node ...................................... SUCCESS [ 1.258 s]
[INFO] Flume NG Embedded Agent ............................ SUCCESS [ 1.937 s]
[INFO] Flume NG HBase Sink ................................ SUCCESS [ 2.021 s]
[INFO] Flume NG ElasticSearch Sink ........................ SUCCESS [ 1.095 s]
[INFO] Flume NG Morphline Solr Sink ....................... SUCCESS [ 6.976 s]
[INFO] Flume Kafka Sink ................................... SUCCESS [ 1.065 s]
[INFO] Flume NG Kite Dataset Sink ......................... SUCCESS [ 4.729 s]
[INFO] Flume NG Hive Sink ................................. SUCCESS [ 0.841 s]
[INFO] Flume Sources ...................................... SUCCESS [ 0.099 s]
[INFO] Flume Scribe Source ................................ SUCCESS [ 0.791 s]
[INFO] Flume JMS Source ................................... SUCCESS [ 0.847 s]
[INFO] Flume Twitter Source ............................... SUCCESS [ 0.785 s]
[INFO] Flume Kafka Source ................................. SUCCESS [ 0.707 s]
[INFO] Flume Taildir Source ............................... SUCCESS [ 0.763 s]
[INFO] flume-kafka-channel ................................ SUCCESS [ 0.695 s]
[INFO] Flume legacy Sources ............................... SUCCESS [ 0.074 s]
[INFO] Flume legacy Avro source ........................... SUCCESS [ 1.625 s]
[INFO] Flume legacy Thrift Source ......................... SUCCESS [ 0.714 s]
[INFO] Flume NG Clients ................................... SUCCESS [ 0.079 s]
[INFO] Flume NG Log4j Appender ............................ SUCCESS [ 7.450 s]
[INFO] Flume NG Tools ..................................... SUCCESS [ 0.843 s]
[INFO] Flume NG distribution .............................. SUCCESS [ 30.321 s]
[INFO] Flume NG Integration Tests ......................... SUCCESS [ 1.106 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:36 min
[INFO] Finished at: 2018-04-14T21:46:36+08:00
[INFO] Final Memory: 106M/705M
[INFO] ------------------------------------------------------------------------
When the build finishes, the file apache-flume-1.6.0-cdh5.7.0-bin.tar.gz is generated under flume-ng-dist/target.

4.2 Build problems:
4.2.1 ua-parser-1.3.0 fails to download
 [ERROR] Failed to execute goal on project flume-ng-morphline-solr-sink: Could not resolve dependencies
  for project org.apache.flume.flume-ng-sinks:flume-ng-morphline-solr-sink:jar:1.6.0: Failed to collect dependencies at 
  org.kitesdk:kite-morphlines-all:pom:1.0.0 -> org.kitesdk:kite-morphlines-useragent:jar:1.0.0 -> 
  ua_parser:ua-parser:jar:1.3.0: Failed to read artifact descriptor for ua_parser:ua-parser:jar:1.3.0:
   Could not transfer artifact ua_parser:ua-parser:pom:1.3.0 from/to maven-twttr ():
    maven.twttr.com: Unknown host maven.twttr.com -> [Help 1]
 Solution: add a repository to the pom.xml of the Flume source. You may choose a different repository URL.
<repository>
   <id>p2.jfrog.org</id>
   <url>http://p2.jfrog.org/libs-releases</url>
</repository>
4.2.2 pentaho-aggdesigner-algorithm-5.1.5-hyde fails to download
This problem persists with the current Hive-1.2.0 release since it depends on Calcite-1.3.0-incubating,
which in turn depends on artifacts like pentaho-aggdesigner-algorithm-5.1.5-hyde <https://github.com/apache/incubator-calcite/blob/branch-1.3/pom.xml#L264>.
There are also licensing issues that I am not completely sure of. For instance what is the
 Solution: add a repository to the pom.xml of the Flume source. You may choose a different repository URL.
<repository>
  <releases>
    <enabled>true</enabled>
    <updatePolicy>always</updatePolicy>
    <checksumPolicy>warn</checksumPolicy>
  </releases>
  <id>conjars</id>
  <name>Conjars</name>
  <url>http://conjars.org/repo</url>
  <layout>default</layout>
</repository>
   
V. Testing
   See the blog post: http://blog.itpub.net/31511218/viewspace-2152461/


VI. Summary


[From @若澤大資料]





From the ITPUB blog, link: http://blog.itpub.net/31511218/viewspace-2152950/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
