Big Data: Flume (Part 2)

Posted by 愛學習的老冰棍 on 2020-09-24

Original article: https://blog.csdn.net/qq_43182741/article/details/108783230

3. Advanced Flume

The previous part covered the basics; today let's get into the advanced material!

3.1 Flume Transactions

I've summarized how Flume transactions work, but let's start with a diagram:

[Figure: Flume put and take transactions]

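With the diagram in mind, here is the summary:
Flume uses two transactions. The first covers the path from the Source into the Channel and is usually called the put transaction; the second covers the Sink pulling data out of the Channel and is usually called the take transaction.
Put transaction: after the Source receives data, it wraps the data into events. Each event is written by the put transaction into the transaction's temporary buffer (putList), which is typically sized at 100 events. Once the buffer holds a full batch, the transaction commits and moves the batch into the Channel. If the Channel's in-memory queue cannot hold the batch, the commit fails and the put transaction is rolled back: the data staged in its buffer is discarded, and the data is read from the Source again and resent. Once the batch has been written successfully, the put transaction ends.
Take transaction: the take transaction pulls data from the Channel into its temporary buffer (takeList) and sends it to HDFS. If the transfer fails, the take transaction fails and is rolled back: the events in its temporary buffer are returned to the Channel. Data that has already reached HDFS, however, cannot be withdrawn, which is one of the reasons duplicate data shows up in production. When the data has been written to HDFS successfully, the take transaction commits, clears its temporary buffer, and ends.
Flume's transactions exist to guarantee the integrity of data delivery, so data normally arrives complete. The exception is the memory Channel: if the Flume agent crashes, the events still held in memory are lost. Choosing a memory channel, though, usually means the data is not critical enough for such a loss to matter much.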
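To make the two rollback rules concrete, here is a small, self-contained Java model of a memory channel with put and take transactions. It is a hypothetical simplification, not Flume's Channel/Transaction API: ToyMemoryChannel, putBatch and takeBatch are illustrative names. The put side commits a batch only if the whole batch fits, and a failed take pushes its events back to the head of the queue in their original order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of a memory channel; NOT Flume's real API.
class ToyMemoryChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final int capacity;

    ToyMemoryChannel(int capacity) {
        this.capacity = capacity;
    }

    int size() {
        return queue.size();
    }

    // Put transaction: stage the batch in a temporary buffer (putList),
    // then commit it only if the whole batch fits in the channel.
    boolean putBatch(List<String> events) {
        List<String> putList = new ArrayList<>(events);
        if (queue.size() + putList.size() > capacity) {
            putList.clear(); // rollback: discard the staged batch, channel unchanged
            return false;    // the source has to send the batch again
        }
        queue.addAll(putList); // commit
        return true;
    }

    // Take transaction: move up to batchSize events into a temporary buffer (takeList)
    // and try to deliver them; on failure, return them to the head of the queue.
    List<String> takeBatch(int batchSize, Predicate<List<String>> deliver) {
        List<String> takeList = new ArrayList<>();
        while (takeList.size() < batchSize && !queue.isEmpty()) {
            takeList.add(queue.pollFirst());
        }
        if (deliver.test(takeList)) {
            return takeList; // commit: the events have left the channel for good
        }
        // rollback: push back in reverse so the original order is preserved
        for (int i = takeList.size() - 1; i >= 0; i--) {
            queue.addFirst(takeList.get(i));
        }
        return List.of();
    }
}

class TransactionDemo {
    public static void main(String[] args) {
        ToyMemoryChannel channel = new ToyMemoryChannel(3);
        System.out.println(channel.putBatch(List.of("e1", "e2"))); // true
        System.out.println(channel.putBatch(List.of("e3", "e4"))); // false: 4 events exceed capacity 3
        System.out.println(channel.takeBatch(2, batch -> false));  // []: delivery failed, rolled back
        System.out.println(channel.size());                        // 2
        System.out.println(channel.takeBatch(2, batch -> true));   // [e1, e2]
        System.out.println(channel.size());                        // 0
    }
}
```

What the model cannot show is the duplicate-data case: if a real sink has already written part of a batch to HDFS before failing, the rollback still returns the whole batch to the channel, so that part is written again on the next attempt.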

3.2 Flume Agent Internals

Same routine as before: let's look at the diagram first, and then I'll walk you through it:

[Figure: internal structure of a Flume agent]

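First, the individual components:
1) ChannelSelector
The ChannelSelector decides which Channel(s) an Event will be sent to. There are two types: Replicating and Multiplexing.
The Replicating selector sends the same Event to every Channel; the Multiplexing selector sends different Events to different Channels according to a rule based on the event's header.
2) SinkProcessor
There are three types of SinkProcessor: DefaultSinkProcessor, LoadBalancingSinkProcessor and FailoverSinkProcessor.
DefaultSinkProcessor works with a single Sink, while LoadBalancingSinkProcessor and FailoverSinkProcessor work with a Sink Group. LoadBalancingSinkProcessor provides load balancing, and FailoverSinkProcessor provides **error recovery (failover)**.
OK, now let's follow the life of an event. The Source receives data and wraps it into an event (its birth). The event then reaches the Channel Processor. Because the data collected in production almost always contains some dirty records, the event is usually passed through an interceptor; several interceptors in a row form an interceptor chain. After the interceptor chain, the event is handed to the Channel Selector. The selector types are described above; the Multiplexing selector routes the event to a Channel based on its header. The event is then handled by the SinkProcessor. A word on LoadBalancingSinkProcessor: it supports random and round-robin selection. With random selection, a randomly chosen sink pulls the data; with round-robin, the sinks take turns (so a sink may find nothing to pull when its turn comes). Once the event has passed the processor and been taken by a sink, its life ends.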
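The two selector types can be sketched in a few lines of Java. This is a simplified model, not Flume's ReplicatingChannelSelector or MultiplexingChannelSelector: ToySelectors is a hypothetical name, and the "type" header with letter to c1 and number to c2 mirrors the mapping used in the custom interceptor case later in this article. The fallback to default channels corresponds to Flume's selector.default setting.

```java
import java.util.List;
import java.util.Map;

// Simplified model of the two ChannelSelector types; NOT Flume's classes.
class ToySelectors {
    // Replicating: every event is sent to every configured channel.
    static List<String> replicating(List<String> channels) {
        return channels;
    }

    // Multiplexing: read one header value and look it up in the mapping table;
    // events whose value is missing or unmapped go to the default channels.
    static List<String> multiplexing(Map<String, String> headers, String headerKey,
                                     Map<String, List<String>> mapping,
                                     List<String> defaultChannels) {
        String value = headers.get(headerKey);
        if (value == null) {
            return defaultChannels;
        }
        return mapping.getOrDefault(value, defaultChannels);
    }
}

class SelectorDemo {
    public static void main(String[] args) {
        Map<String, List<String>> mapping =
                Map.of("letter", List.of("c1"), "number", List.of("c2"));
        System.out.println(ToySelectors.replicating(List.of("c1", "c2"))); // [c1, c2]
        System.out.println(ToySelectors.multiplexing(
                Map.of("type", "letter"), "type", mapping, List.of()));    // [c1]
        System.out.println(ToySelectors.multiplexing(
                Map.of("type", "number"), "type", mapping, List.of()));    // [c2]
    }
}
```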

3.3 Flume Topologies

3.3.1 Simple Chaining

[Figure: simple chained topology]

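The advantage of this topology is that more Channels means more buffering, but its drawback is obvious: if any single Flume agent goes down, the whole pipeline breaks. That's why it isn't recommended.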

3.3.2 Replicating and Multiplexing

[Figure: replicating and multiplexing topology]

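Flume can deliver data to one or more destinations. In this topology the same data can be replicated to several channels, or different data can be distributed to different channels, and each sink then delivers to a different destination.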

3.3.3 Load Balancing and Failover

[Figure: load balancing and failover topology]

Flume can logically group several sinks into a sink group; combined with different SinkProcessors, a sink group provides load balancing or error recovery (failover).
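The two load-balancing policies described in 3.2 (round-robin and random) can be modelled like this. It is a hypothetical simplification: ToyLoadBalancer is not a Flume class. In Flume's own configuration the policy is chosen with processor.type = load_balance and processor.selector = round_robin or random. The fixed random seed is only there to make runs repeatable.

```java
import java.util.List;
import java.util.Random;

// Simplified model of the two selection policies of a load-balancing sink group.
class ToyLoadBalancer {
    private final List<String> sinks;
    private final Random random;
    private int next = 0;

    ToyLoadBalancer(List<String> sinks, long seed) {
        this.sinks = List.copyOf(sinks);
        this.random = new Random(seed); // fixed seed only to make runs repeatable
    }

    // round_robin: sinks take turns in a fixed order, wrapping around.
    String roundRobin() {
        String sink = sinks.get(next);
        next = (next + 1) % sinks.size();
        return sink;
    }

    // random: each call picks any sink with equal probability.
    String randomPick() {
        return sinks.get(random.nextInt(sinks.size()));
    }
}

class LoadBalanceDemo {
    public static void main(String[] args) {
        ToyLoadBalancer lb = new ToyLoadBalancer(List.of("k1", "k2"), 42L);
        for (int i = 0; i < 4; i++) {
            System.out.print(lb.roundRobin() + " "); // k1 k2 k1 k2
        }
        System.out.println();
        System.out.println(lb.randomPick()); // k1 or k2
    }
}
```

Round-robin spreads the turns evenly but, as noted in 3.2, the sink whose turn it is may find the channel empty; random selection only balances the load on average.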

3.3.4 Aggregation

[Figure: aggregation topology]

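This is the topology we use most often. Web applications are typically spread over hundreds of servers, and large ones over thousands or even tens of thousands, so handling the logs they produce is a real chore. This combination handles the problem well: each server runs a Flume agent that collects its logs and sends them to a central collector Flume agent, which then uploads them to HDFS, Hive, HBase, etc. for log analysis.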
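The aggregation idea (many edge agents, one collector) is essentially a fan-in. The Java sketch below is purely conceptual and uses no Flume code: each thread stands in for the agent on one web server, and a shared LinkedBlockingQueue stands in for the central collector agent. FanInSketch and collect are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Conceptual fan-in sketch: each thread plays an edge Flume agent, and the shared
// queue plays the single collector agent. No Flume code is involved.
class FanInSketch {
    static List<String> collect(int servers, int linesPerServer) {
        BlockingQueue<String> collector = new LinkedBlockingQueue<>();
        List<Thread> agents = new ArrayList<>();
        for (int s = 0; s < servers; s++) {
            final int id = s;
            Thread agent = new Thread(() -> {
                for (int i = 0; i < linesPerServer; i++) {
                    collector.add("server" + id + ": log line " + i);
                }
            });
            agents.add(agent);
            agent.start();
        }
        for (Thread agent : agents) {
            try {
                agent.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // The collector drains everything; a real one would forward it to HDFS/Hive/HBase.
        List<String> received = new ArrayList<>();
        collector.drainTo(received);
        return received;
    }

    public static void main(String[] args) {
        System.out.println(collect(3, 2).size()); // 6
    }
}
```

Because the threads run concurrently, lines from different servers arrive interleaved in no fixed order, which is also what a real collector agent sees.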

3.4 Flume Enterprise Development Cases

3.4.1 Replicating and Multiplexing

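1) Requirement
Flume-1 monitors a file for changes and passes the new content to Flume-2, which stores it in HDFS. At the same time, Flume-1 passes the new content to Flume-3, which writes it to the local file system.
2) Requirement analysis: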

[Figure: requirement analysis for the replicating case]

3) Steps:
(1) Preparation
Create a group1 folder under /opt/module/flume/job

[atguigu@hadoop102 job]$ mkdir group1
[atguigu@hadoop102 job]$ cd group1/

Create a flume3 folder under /opt/module/datas/

[atguigu@hadoop102 datas]$ mkdir flume3

(2) Create flume-file-flume.conf
Configure one source that reads the log file, plus two channels and two sinks, which send to flume-flume-hdfs and flume-flume-dir respectively.
Edit the configuration file

[atguigu@hadoop102 group1]$ vim flume-file-flume.conf

Add the following content

# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Replicate the data flow to all channels
a1.sources.r1.selector.type = replicating

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/hive/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
# On the sink side, avro acts as a data sender
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

(3) Create flume-flume-hdfs.conf
Configure a Source that receives the output of the upstream Flume, and a Sink that writes to HDFS.
Edit the configuration file

[atguigu@hadoop102 group1]$ vim flume-flume-hdfs.conf

Add the following content

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
# On the source side, avro acts as a data-receiving service
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://hadoop102:8020/flume2/%Y%m%d/%H
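# Prefix of the uploaded files
a2.sinks.k1.hdfs.filePrefix = flume2-
# Round the timestamp down when creating the time-based directories
a2.sinks.k1.hdfs.round = true
# Create a new directory every this many time units
a2.sinks.k1.hdfs.roundValue = 1
# The time unit used for rounding
a2.sinks.k1.hdfs.roundUnit = hour
# Use the local timestamp (instead of a timestamp header)
a2.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events accumulated before each flush to HDFS
a2.sinks.k1.hdfs.batchSize = 100
# File type: DataStream writes uncompressed data (CompressedStream would enable compression)
a2.sinks.k1.hdfs.fileType = DataStream
# Roll to a new file every 600 seconds
a2.sinks.k1.hdfs.rollInterval = 600
# Roll a file when it reaches roughly 128 MB (value in bytes)
a2.sinks.k1.hdfs.rollSize = 134217700
# Do not roll files based on the number of events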
a2.sinks.k1.hdfs.rollCount = 0

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
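To see what the round and roll settings above actually do, here is a simplified Java re-implementation of two decisions the HDFS sink makes: which time-bucketed directory an event lands in (with round = true and roundUnit = hour), and when the open file is rolled. It is a sketch, not Flume's HDFS sink code; HdfsSinkSketch, directoryFor and shouldRoll are hypothetical names. Note that 134217700 is slightly below 128 MiB (134,217,728 bytes), so a file is rolled just before it would exceed one default-sized HDFS block.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Simplified sketch of HDFS sink bucketing and rolling; NOT Flume's implementation.
class HdfsSinkSketch {
    private static final DateTimeFormatter DIR = DateTimeFormatter.ofPattern("yyyyMMdd/HH");

    // round = true, roundUnit = hour: truncate the event time to the hour, round the
    // hour down to a multiple of roundValue, then fill in the %Y%m%d/%H escapes.
    static String directoryFor(LocalDateTime eventTime, int roundValueHours) {
        LocalDateTime hour = eventTime.truncatedTo(ChronoUnit.HOURS);
        int roundedHour = (hour.getHour() / roundValueHours) * roundValueHours;
        return "/flume2/" + hour.withHour(roundedHour).format(DIR);
    }

    // A file is rolled as soon as any enabled condition is met; 0 disables a condition.
    static boolean shouldRoll(long secondsOpen, long bytesWritten, long eventsWritten,
                              long rollInterval, long rollSize, long rollCount) {
        return (rollInterval > 0 && secondsOpen >= rollInterval)
                || (rollSize > 0 && bytesWritten >= rollSize)
                || (rollCount > 0 && eventsWritten >= rollCount);
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2020, 9, 24, 13, 45);
        System.out.println(directoryFor(t, 1)); // /flume2/20200924/13
        // Settings from the config: rollInterval = 600, rollSize = 134217700, rollCount = 0
        System.out.println(shouldRoll(600, 1024, 10, 600, 134217700L, 0));       // true (time)
        System.out.println(shouldRoll(30, 1024, 1_000_000, 600, 134217700L, 0)); // false (count disabled)
    }
}
```

With roundValue = 1 and an hourly %H directory, rounding changes nothing (every hour gets its own directory); it only matters for larger values such as 6, where an event at 13:45 would land in the 12 directory.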

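(4) Create flume-flume-dir.conf
Configure a Source that receives the output of the upstream Flume, and a Sink that writes to a local directory.
Edit the configuration file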

[atguigu@hadoop102 group1]$ vim flume-flume-dir.conf

Add the following content

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop102
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /opt/module/datas/flume3

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2

Note: the local output directory must already exist; the sink will not create it if it is missing.
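Because three agents have to agree on hosts and ports, a mismatch between files (4141 vs. 4142, say) is an easy mistake. Flume configuration files use the Java properties format, so one way to catch such a mismatch before starting anything is a small check with java.util.Properties. AvroLinkCheck, parse and linked are hypothetical names, and the check is deliberately naive: it only compares the sink's hostname/port with the source's bind/port, so a source bound to 0.0.0.0 would need extra handling.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sanity check: does an upstream avro sink point at a downstream avro source?
class AvroLinkCheck {
    static Properties parse(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText)); // lines starting with '#' are comments
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return props;
    }

    // Properties keeps trailing spaces in values (e.g. "hadoop102 "), so trim them.
    private static String get(Properties props, String key) {
        String value = props.getProperty(key);
        return value == null ? null : value.trim();
    }

    static boolean linked(Properties upstream, String upAgent, String sink,
                          Properties downstream, String downAgent, String source) {
        String host = get(upstream, upAgent + ".sinks." + sink + ".hostname");
        String port = get(upstream, upAgent + ".sinks." + sink + ".port");
        String bind = get(downstream, downAgent + ".sources." + source + ".bind");
        String listen = get(downstream, downAgent + ".sources." + source + ".port");
        return host != null && port != null && host.equals(bind) && port.equals(listen);
    }

    public static void main(String[] args) {
        Properties a1 = parse("a1.sinks.k2.hostname = hadoop102\na1.sinks.k2.port = 4142\n");
        Properties a3 = parse("a3.sources.r1.bind = hadoop102\na3.sources.r1.port = 4142\n");
        System.out.println(linked(a1, "a1", "k2", a3, "a3", "r1")); // true
    }
}
```

For the real files you would read job/group1/*.conf with a FileReader instead of the inline strings used here.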
(5) Run the configuration files
Start the corresponding Flume agents in this order: flume-flume-dir, flume-flume-hdfs, flume-file-flume.

[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group1/flume-flume-dir.conf

[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group1/flume-flume-hdfs.conf

[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group1/flume-file-flume.conf

(6) Start Hadoop and Hive

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh

[atguigu@hadoop102 hive]$ bin/hive
hive (default)>

(7) Check the data on HDFS
[Figure: the data on HDFS]
(8) Check the data in the /opt/module/datas/flume3 directory

[atguigu@hadoop102 flume3]$ ll
total 8
-rw-rw-r--. 1 atguigu atguigu 5942 May 22 00:09 1526918887550-3

3.4.2 Load Balancing and Failover

1) Requirement
Flume1 monitors a port; the sinks in its sink group connect to Flume2 and Flume3 respectively, and FailoverSinkProcessor is used to provide failover.
2) Requirement analysis

[Figure: requirement analysis for the failover case]

3) Steps
(1) Preparation
Create a group2 folder under /opt/module/flume/job

[atguigu@hadoop102 job]$ mkdir group2
[atguigu@hadoop102 job]$ cd group2/

(2) Create flume-netcat-flume.conf
Configure one netcat source, one channel and one sink group (two sinks), which send to flume-flume-console1 and flume-flume-console2 respectively.
Edit the configuration file

[atguigu@hadoop102 group2]$ vim flume-netcat-flume.conf

Add the following content

# Name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
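A minimal Java sketch of how the failover group above picks its active sink, assuming a simplified model: the live sink with the largest priority value wins, and a failed sink is put on a penalty window that doubles with each consecutive failure and is capped at maxpenalty. The 1000 ms base penalty and the doubling are assumptions for illustration, not a copy of Flume's FailoverSinkProcessor, and ToyFailover is a hypothetical name. With k1 = 5 and k2 = 10, k2 (the avro sink to port 4142) is the active sink while it is healthy.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Simplified failover model; the back-off formula is an assumption, not Flume's code.
class ToyFailover {
    private static final long BASE_PENALTY_MS = 1000; // assumed base penalty
    private final Map<String, Integer> priority;
    private final long maxPenaltyMs;
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> blockedUntil = new HashMap<>();

    ToyFailover(Map<String, Integer> priority, long maxPenaltyMs) {
        this.priority = Map.copyOf(priority);
        this.maxPenaltyMs = maxPenaltyMs;
    }

    // The live sink with the largest priority value is the active one.
    String pick(long nowMs) {
        return priority.keySet().stream()
                .filter(sink -> blockedUntil.getOrDefault(sink, 0L) <= nowMs)
                .max(Comparator.comparingInt((String sink) -> priority.get(sink)))
                .orElse(null);
    }

    // Each consecutive failure doubles the penalty window, capped at maxpenalty.
    void markFailed(String sink, long nowMs) {
        int n = Math.min(failures.merge(sink, 1, Integer::sum), 30);
        long penalty = Math.min(maxPenaltyMs, (1L << n) * BASE_PENALTY_MS);
        blockedUntil.put(sink, nowMs + penalty);
    }

    // A successful delivery clears the sink's failure history.
    void markSucceeded(String sink) {
        failures.remove(sink);
        blockedUntil.remove(sink);
    }

    public static void main(String[] args) {
        // Same priorities and maxpenalty as flume-netcat-flume.conf.
        ToyFailover g1 = new ToyFailover(Map.of("k1", 5, "k2", 10), 10000);
        System.out.println(g1.pick(0));    // k2 (priority 10 beats 5)
        g1.markFailed("k2", 0);            // k2 is penalized for 2000 ms
        System.out.println(g1.pick(1000)); // k1
        System.out.println(g1.pick(2000)); // k2 is retried once its penalty expires
    }
}
```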

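(3) Create flume-flume-console1.conf
Configure a Source that receives the output of the upstream Flume, and a Sink that prints to the local console.
Edit the configuration file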

[atguigu@hadoop102 group2]$ vim flume-flume-console1.conf

Add the following content

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = logger

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

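(4) Create flume-flume-console2.conf
Configure a Source that receives the output of the upstream Flume, and a Sink that prints to the local console.
Edit the configuration file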

[atguigu@hadoop102 group2]$ vim flume-flume-console2.conf

Add the following content

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop102
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2

(5) Run the configuration files
Start the corresponding configurations in this order: flume-flume-console2, flume-flume-console1, flume-netcat-flume.

[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group2/flume-flume-console2.conf -Dflume.root.logger=INFO,console

[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group2/flume-flume-console1.conf -Dflume.root.logger=INFO,console

[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group2/flume-netcat-flume.conf

(6) Use the netcat tool to send content to port 44444 on the local machine

[atguigu@hadoop102 ~]$ nc localhost 44444

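(7) Check the console logs of Flume2 and Flume3. Because k2 has the higher priority (10 vs. 5), the events should show up on Flume3 (port 4142).
(8) Kill Flume3 and watch Flume2's console: the failover processor should switch to k1, and Flume2 should start printing the events.
Note: use jps -ml to find the Flume processes.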

3.4.3 Aggregation

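1) Requirement:
Flume-1 on hadoop102 monitors the file /opt/module/group.log,
Flume-2 on hadoop103 monitors the data stream on a port,
and Flume-1 and Flume-2 send their data to Flume-3 on hadoop104, which prints the final data to the console.
2) Requirement analysis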

3) Steps:
(1) Preparation
Distribute Flume to the other nodes:
[atguigu@hadoop102 module]$ xsync flume
Create a group3 folder under /opt/module/flume/job on each of hadoop102, hadoop103 and hadoop104.
[atguigu@hadoop102 job]$ mkdir group3
[atguigu@hadoop103 job]$ mkdir group3
[atguigu@hadoop104 job]$ mkdir group3
(2) Create flume1-logger-flume.conf
Configure a Source that monitors the group.log file and a Sink that sends the data to the next-level Flume.
Edit the configuration file on hadoop102
[atguigu@hadoop102 group3]$ vim flume1-logger-flume.conf 
Add the following content
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/group.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop104
a1.sinks.k1.port = 4141

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
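(3) Create flume2-netcat-flume.conf
Configure a Source that monitors the data stream on port 44444 and a Sink that sends the data to the next-level Flume.
Edit the configuration file on hadoop103
[atguigu@hadoop103 group3]$ vim flume2-netcat-flume.conf
Add the following content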
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop103
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop104
a2.sinks.k1.port = 4141

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
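(4) Create flume3-flume-logger.conf
Configure a source that receives the data streams sent by flume1 and flume2, merges them, and sinks the result to the console.
Edit the configuration file on hadoop104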
[atguigu@hadoop104 group3]$ touch flume3-flume-logger.conf
[atguigu@hadoop104 group3]$ vim flume3-flume-logger.conf
Add the following content
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop104
a3.sources.r1.port = 4141

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1
(5) Run the configuration files
Start the configurations in this order: flume3-flume-logger.conf, flume2-netcat-flume.conf, flume1-logger-flume.conf.
[atguigu@hadoop104 flume]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console

[atguigu@hadoop103 flume]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group3/flume2-netcat-flume.conf

[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group3/flume1-logger-flume.conf
(6) On hadoop102, append content to group.log in the /opt/module directory
[atguigu@hadoop102 module]$ echo 'hello' >> group.log
(7) From hadoop102, send data to port 44444 on hadoop103
[atguigu@hadoop102 flume]$ telnet hadoop103 44444
(8) Check the data on hadoop104

3.5 Custom Interceptor

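1) Requirement
Use Flume to collect a server's local logs and send each kind of log to a different analysis system according to its type.
2) Requirement analysis
In practice a single server may produce many types of logs, and different types may need to go to different analysis systems. This is where the Multiplexing structure of the Flume topology comes in: it routes each event to a Channel according to the value of a given key in the event's Header. We therefore need a custom Interceptor that assigns different values to that Header key for different types of events.
In this case we simulate logs with data sent to a port, using a single digit and a single letter as two different log types. The custom interceptor distinguishes digits from letters so they can be sent to different analysis systems (Channels).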

[Figure: requirement analysis for the custom interceptor case]

3) Steps
(1) Create a Maven project and add the following dependency.

<dependency>
   <groupId>org.apache.flume</groupId>
   <artifactId>flume-ng-core</artifactId>
   <version>1.9.0</version>
</dependency>

(2) Define a CustomInterceptor class that implements the Interceptor interface

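package com.atguigu.flume.interceptor;

import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class CustomInterceptor implements Interceptor {

    @Override
    public void initialize() {
        // no state to set up
    }

    // Tag each event with a "type" header based on the first byte of its body;
    // the multiplexing selector routes on this header.
    @Override
    public Event intercept(Event event) {
        byte[] body = event.getBody();
        if (body == null || body.length == 0) {
            return event; // nothing to classify
        }
        if (body[0] >= 'a' && body[0] <= 'z') {
            event.getHeaders().put("type", "letter");
        } else if (body[0] >= '0' && body[0] <= '9') {
            event.getHeaders().put("type", "number");
        }
        return event;
    }

    // Batch version: apply the single-event logic to every event.
    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event event : events) {
            intercept(event);
        }
        return events;
    }

    @Override
    public void close() {
        // no resources to release
    }

    // Flume creates interceptors through this Builder (configured as CustomInterceptor$Builder).
    public static class Builder implements Interceptor.Builder {

        @Override
        public Interceptor build() {
            return new CustomInterceptor();
        }

        @Override
        public void configure(Context context) {
            // no configuration parameters
        }
    }
}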
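To check the boundary cases without starting Flume, the interceptor's first-byte rule can be pulled out into a plain function. TypeClassifier and classify are hypothetical names, not part of the project above; the rule uses inclusive ranges, so the boundary characters a, z, 0 and 9 are classified too, and anything else gets no type.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the interceptor's first-byte rule without any Flume dependency.
class TypeClassifier {
    // Returns "letter" for a-z, "number" for 0-9 (both inclusive), otherwise null.
    static String classify(byte[] body) {
        if (body == null || body.length == 0) {
            return null;
        }
        byte first = body[0];
        if (first >= 'a' && first <= 'z') {
            return "letter";
        }
        if (first >= '0' && first <= '9') {
            return "number";
        }
        return null;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"a", "z", "0", "9", "#"}) {
            byte[] body = input.getBytes(StandardCharsets.UTF_8);
            System.out.println(input + " -> " + classify(body));
        }
        // a -> letter, z -> letter, 0 -> number, 9 -> number, # -> null
    }
}
```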

(3) Edit the flume configuration files
For Flume1 on hadoop102, configure one netcat source, two channels and two avro sinks, together with the corresponding ChannelSelector and interceptor.

# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.atguigu.flume.interceptor.CustomInterceptor$Builder
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.letter = c1
a1.sources.r1.selector.mapping.number = c2
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop103
a1.sinks.k1.port = 4141

a1.sinks.k2.type=avro
a1.sinks.k2.hostname = hadoop104
a1.sinks.k2.port = 4242

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Use a channel which buffers events in memory
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Configure an avro source and a logger sink for Flume2 on hadoop103.

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop103
a1.sources.r1.port = 4141

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1

Configure an avro source and a logger sink for Flume3 on hadoop104.

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop104
a1.sources.r1.port = 4242

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1

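(4) Package the project as a JAR and put it in Flume's lib directory (/opt/module/flume/lib) so the agent can load CustomInterceptor. Then start the Flume agents on hadoop102, hadoop103 and hadoop104, paying attention to the order: start the downstream agents on hadoop103 and hadoop104 before the one on hadoop102.
(5) On hadoop102, use netcat to send letters and digits to localhost:44444.
(6) Watch the logs printed on hadoop103 and hadoop104.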

3.6 Monitoring Flume Data Flow

3.6.1 Installing and Deploying Ganglia

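Ganglia consists of three parts: gmond, gmetad and gweb.
gmond (Ganglia Monitoring Daemon) is a lightweight service installed on every node whose metrics you want to collect. With gmond you can easily collect many system metrics, such as CPU, memory, disk, network and active-process data.
gmetad (Ganglia Meta Daemon) is the service that aggregates all this information and stores it on disk in RRD format.
gweb (Ganglia Web) is Ganglia's visualization tool: a PHP front end that displays the data stored by gmetad in a browser, presenting the metrics collected from the running cluster as charts.
1) Install ganglia
(1) Plan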

Host       gweb  gmetad  gmond
hadoop102  gweb  gmetad  gmond
hadoop103                gmond
hadoop104                gmond

(2) Install epel-release on each of hadoop102, hadoop103 and hadoop104
[atguigu@hadoop102 flume]$ sudo yum -y install epel-release
(3) Install on hadoop102

[atguigu@hadoop102 flume]$ sudo yum -y install ganglia-gmetad 
[atguigu@hadoop102 flume]$ sudo yum -y install ganglia-web
[atguigu@hadoop102 flume]$ sudo yum -y install ganglia-gmond

(4) Install on hadoop103 and hadoop104

[atguigu@hadoop103 flume]$ sudo yum -y install ganglia-gmond
[atguigu@hadoop104 flume]$ sudo yum -y install ganglia-gmond

2) On hadoop102, edit the configuration file /etc/httpd/conf.d/ganglia.conf

[atguigu@hadoop102 flume]$ sudo vim /etc/httpd/conf.d/ganglia.conf

Change it as follows (comment out Require local and add a Require ip line):

# Ganglia monitoring system php web frontend

Alias /ganglia /usr/share/ganglia

<Location /ganglia>
  # Require local
  # To access ganglia from Windows, allow the IP address of the Windows host as seen by the Linux VM
  Require ip 192.168.202.1
  # Require ip 10.1.2.3
  # Require host example.org
</Location>

3) On hadoop102, edit the configuration file /etc/ganglia/gmetad.conf

[atguigu@hadoop102 flume]$ sudo vim /etc/ganglia/gmetad.conf

Change it to:

data_source "my cluster" hadoop102

4) On hadoop102, hadoop103 and hadoop104, edit the configuration file /etc/ganglia/gmond.conf

[atguigu@hadoop102 flume]$ sudo vim /etc/ganglia/gmond.conf

Change it to:

cluster {
 name = "my cluster"
 owner = "unspecified"
 latlong = "unspecified"
 url = "unspecified"
}
udp_send_channel {
 #bind_hostname = yes # Highly recommended, soon to be default.
                      # This option tells gmond to use a source address
                      # that resolves to the machine's hostname.  Without
                      # this, the metrics may appear to come from any
                      # interface and the DNS names associated with
                      # those IPs will be used to create the RRDs.
 # mcast_join = 239.2.11.71
 # send the data to hadoop102
 host = hadoop102
 port = 8649
 ttl = 1
}
udp_recv_channel {
 # mcast_join = 239.2.11.71
 port = 8649
 # accept data from any address
 bind = 0.0.0.0
 retry_bind = true
 # Size of the UDP buffer. If you are handling lots of metrics you really
 # should bump it up to e.g. 10MB or even higher.
 # buffer = 10485760
}

5) On hadoop102, edit the configuration file /etc/selinux/config

[atguigu@hadoop102 flume]$ sudo vim /etc/selinux/config

Change it to:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

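Important: disabling SELinux this way only takes effect after a reboot. If you don't want to reboot right now, you can apply the change temporarily:

[atguigu@hadoop102 flume]$ sudo setenforce 0

6) Start ganglia
(1) Start on hadoop102, hadoop103 and hadoop104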

[atguigu@hadoop102 flume]$ sudo systemctl start gmond

(2) Start on hadoop102

[atguigu@hadoop102 flume]$ sudo systemctl start httpd
[atguigu@hadoop102 flume]$ sudo systemctl start gmetad

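7) Open the ganglia page in a browser
http://hadoop102/ganglia
Important: if you still get a permission error after completing the steps above, change the permissions of the /var/lib/ganglia directory: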

[atguigu@hadoop102 flume]$ sudo chmod -R 777 /var/lib/ganglia

3.6.2 Testing Flume Monitoring

1) Start the Flume job

[atguigu@hadoop102 flume]$ bin/flume-ng agent \
-c conf/ \
-n a1 \
-f datas/netcat-flume-logger.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=ganglia \
-Dflume.monitoring.hosts=hadoop102:8649

2) Send data and watch the ganglia graphs

[atguigu@hadoop102 flume]$ nc localhost 44444

[Figure: Ganglia monitoring graphs]

Legend:

[Figure: explanation of the graph legend]

That's the end of the Flume series. If you need the source code and data, send me a private message!
