Big Data Fundamentals 5: Flume 1.6.0

Posted by 閒人勿- on 2018-04-26

I. Overview of Flume 1.6.0

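Flume, a real-time log collection system developed by Cloudera, is widely recognized and used in industry. The initial releases of Flume are collectively called Flume OG (original generation) and belonged to Cloudera. As Flume's functionality grew, the weaknesses of Flume OG became apparent: a bloated codebase, poorly designed core components, and non-standard core configuration. Log transport was especially unstable in 0.9.4, the last Flume OG release. To solve these problems, on October 22, 2011 Cloudera completed Flume-728, a milestone change that refactored the core components, the core configuration, and the code architecture. The refactored versions are collectively called Flume NG (next generation). Flume was later brought under the Apache umbrella, and Cloudera Flume was renamed Apache Flume.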

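II. Core Flume Concepts

• Data produced by data generators (e.g. Facebook, Twitter) is collected by agents running on the generators' servers; a collector then aggregates the data from the individual agents and stores it in HDFS or HBase.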

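1. Event (Flume Event)

• Flume uses Event objects as its data transfer format. An event is the basic unit of internal data transfer and consists of two parts: a byte array carrying the payload, plus optional headers.

• Headers are key/value pairs that can be used to make routing decisions or to carry other structured information (such as the event's timestamp or the hostname of the server the event came from). Think of them like HTTP headers: they carry extra information alongside the body. Different Flume sources add different headers to the events they generate.

• The body is a byte array containing the actual content.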
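To make the two-part structure concrete, here is a minimal Python model of an event. It is only a sketch: Flume itself is written in Java, where the `org.apache.flume.Event` interface exposes `getHeaders()` and `getBody()`; the `Event` dataclass and the `make_event` helper below are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Hypothetical Python model of a Flume event: optional string headers plus a byte body."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def make_event(text: str, **headers: str) -> Event:
    # Flume treats the body as opaque bytes; here we simply UTF-8 encode the text.
    return Event(body=text.encode("utf-8"), headers=dict(headers))


event = make_event("1:2:3", host="slave1", timestamp="1524700800000")
print(event.headers["host"], event.body)
```

Headers stay a plain string-to-string map, which is what makes them usable for routing (selectors) and for path escapes such as `%{hostname}` later in this article.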

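2. Agent (Flume Agent)

• A Flume deployment contains one or more agents; each agent is an independent daemon process (a JVM).

• An agent receives data from clients or from other agents and promptly passes it on to the next agent.

• An agent consists mainly of three components: source, channel, and sink.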

3.Agent Source

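• A source receives events from an external source (a data generator), such as a web server; the external source sends its events to Flume in a format Flume can recognize.

• When a Flume source receives an event, it stores the event in one or more channels.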

4.Agent Channel

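• A channel is a temporary store: it buffers the events received from the source until sinks consume them, acting as a bridge between source and sink. Channel operations are transactional, which keeps data consistent while it is sent and received. A channel can be connected to any number of sources and sinks.

• Parameters control the maximum number of events stored (capacity) and the maximum number of events handled per transaction (transactionCapacity).

• When data must not be lost, Flume deployments usually choose File Channel rather than Memory Channel:

– Memory Channel: events are held in memory; throughput is very high, but data can be lost

– File Channel: transactional storage on local disk that guarantees data is not lost (implemented with a write-ahead log, WAL)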
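The capacity / transactionCapacity limits and the commit-or-rollback behaviour can be illustrated with a toy Python model. This is not Flume's MemoryChannel or FileChannel: `SketchChannel`, `put_batch`, and `take_batch` are hypothetical names, and a real channel also deals with concurrency and (for File Channel) the on-disk write-ahead log.

```python
from collections import deque
from typing import Callable, List


class ChannelFullError(Exception):
    """Raised when a put would exceed capacity or transactionCapacity."""


class SketchChannel:
    """Toy bounded channel (hypothetical, not Flume's MemoryChannel or FileChannel)."""

    def __init__(self, capacity: int, transaction_capacity: int) -> None:
        if transaction_capacity > capacity:
            raise ValueError("transaction_capacity must not exceed capacity")
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self.queue: deque = deque()

    def put_batch(self, events: List[bytes]) -> None:
        # A source's transaction is all-or-nothing: every event goes in, or none does.
        if len(events) > self.transaction_capacity:
            raise ChannelFullError("batch larger than transactionCapacity")
        if len(self.queue) + len(events) > self.capacity:
            raise ChannelFullError("channel capacity exceeded")
        self.queue.extend(events)

    def take_batch(self, max_events: int, deliver: Callable[[List[bytes]], None]) -> int:
        # A sink takes at most transactionCapacity events in one transaction.
        n = min(max_events, self.transaction_capacity, len(self.queue))
        batch = [self.queue.popleft() for _ in range(n)]
        try:
            deliver(batch)
        except Exception:
            # Rollback: put the events back at the head, in their original order.
            self.queue.extendleft(reversed(batch))
            return 0
        # Commit: the events stay removed from the channel.
        return n
```

If `deliver` raises (for example because the downstream agent is unreachable), the events are returned to the channel, which mirrors the rule that an event leaves the channel only after it has been delivered.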

5.Agent Sink

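• A sink removes events from the channel and either writes them to an external storage medium or passes them to the source of the next Flume agent for further processing. For example, the Flume HDFS Sink writes data into HDFS. Sources and sinks process the events buffered in the channel asynchronously.

• After the sink successfully takes an event, the event is removed from the channel.

• Sink types:

– Terminal sinks that store events at their final destination: HDFS, HBase

– Sinks that simply consume (discard) events: Null Sink

– Sinks for agent-to-agent communication: Avro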

6.Agent Interceptor

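• Interceptors are attached to a source and filter or customize events. They process application logs on their way through the source: after the source receives an event and before the event is written to the channel, interceptors can decorate, clean, or filter it.

• Built-in interceptors provided by Flume include:

– Timestamp Interceptor: adds a header with the key timestamp whose value is the current timestamp

– Host Interceptor: adds a header with the key host whose value is the current machine's hostname or IP

– Static Interceptor: adds a custom key and value to the event header

– Regex Filtering Interceptor: filters events with a regular expression, either excluding or keeping the matching events

– Regex Extractor Interceptor: adds the specified keys to the header, with values taken from the parts matched by the regular expression

• Flume interceptors form a chain: a source can have several interceptors, which are applied one after another in the configured order.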
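A chain of interceptors behaves like a list of functions applied in the configured order. The Python sketch below imitates the timestamp, host, and static interceptors; the function names are hypothetical, and the host version uses the hostname (like `useIP = false`), whereas Flume's Host Interceptor uses the IP address by default.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

# Simplified stand-in for a Flume event: (headers, body).
Event = Tuple[Dict[str, str], bytes]
Interceptor = Callable[[Event], Event]


def timestamp_interceptor(event: Event) -> Event:
    headers, body = event
    # Milliseconds since the epoch, stored under the "timestamp" key.
    return {**headers, "timestamp": str(int(time.time() * 1000))}, body


def host_interceptor(event: Event, header: str = "host") -> Event:
    headers, body = event
    # Uses the hostname here; Flume's default (useIP = true) would store the IP.
    return {**headers, header: socket.gethostname()}, body


def make_static_interceptor(key: str, value: str) -> Interceptor:
    def apply(event: Event) -> Event:
        headers, body = event
        return {**headers, key: value}, body
    return apply


def apply_chain(chain: List[Interceptor], event: Event) -> Event:
    # Interceptors run in the order they are listed, as with "interceptors = i1 i2".
    for interceptor in chain:
        event = interceptor(event)
    return event


chain = [timestamp_interceptor, host_interceptor, make_static_interceptor("badou_flume", "so_easy")]
headers, body = apply_chain(chain, ({}, b"hello"))
```

Because each step only adds headers and passes the body through unchanged, reordering these three interceptors does not change the result; order matters once an interceptor drops events or overwrites a header another one set.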

7.Agent Selector

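• There are two types of channel selectors:

– Replicating Channel Selector (default): sends each event from the source to all channels

– Multiplexing Channel Selector: chooses which channel(s) each event is sent to

• When events must be routed selectively by origin, Multiplexing is clearly the distribution method to use.

• Caveat: Multiplexing chooses the target channel from the value of a specified header key. If demo1 and demo2 run on different servers, we can add a host interceptor to the source and route each event by its host header. On a single server, however, the host header cannot tell the logs apart, so we must find another way to add a header key that identifies where each log came from.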
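The routing decision of the two selectors can be sketched in a few lines of Python. The function names are hypothetical, and the fallback rules (missing, blank, or unmapped header value goes to the default channels; values are compared as exact strings) reflect my understanding of Flume's multiplexing selector rather than a copy of its code.

```python
from typing import Dict, List


def replicating_select(channels: List[str]) -> List[str]:
    # Replicating (the default): every event is written to every channel.
    return list(channels)


def multiplexing_select(headers: Dict[str, str], header_name: str,
                        mapping: Dict[str, List[str]], default: List[str]) -> List[str]:
    # Multiplexing: look up the value of one header; a missing, blank, or unmapped
    # value falls back to the default channels. The lookup is an exact string match.
    value = headers.get(header_name)
    if value is None or not value.strip():
        return list(default)
    return list(mapping.get(value, default))
```

For demo1 and demo2 on the same server, one option is a static interceptor on each source that sets a distinguishing header (for example a hypothetical `app` key), and a multiplexing selector keyed on that header.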

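8. Reliability

• Flume uses transactions to guarantee reliability across the whole delivery of an event. A sink may remove an event from its channel only after the event has been stored in the channel of the next agent or in the external destination. Because these transactions guarantee that events are stored successfully, events flowing within one agent or across several agents are delivered reliably. For example, Flume supports a file channel backed by the local file system, whereas the memory channel keeps events in an in-memory queue: it is fast, but lost events cannot be recovered.

• Sources and sinks wrap the storing and retrieving of events, respectively, in transactions provided by the channel. This guarantees that a set of events is reliably passed end to end through the flow.

• When a node fails, logs can be sent to other nodes without being lost. Flume provides three levels of reliability, from strongest to weakest: end-to-end (the receiving agent first writes the event to disk and deletes it only after it has been sent successfully; if sending fails, it can be resent), store on failure (the strategy Scribe also uses: when the receiver crashes, data is written locally and sending resumes after recovery), and best effort (data sent to the receiver is not acknowledged).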

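III. Building a Log Collection System with Flume

1. Installation and Configuration

• Install apache-flume-1.6.0-cdh5.7.0-bin

• Edit flume-env.sh: setting JAVA_HOME is all that is needed

• Set the environment variables: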

export FLUME_HOME=/usr/local/src/apache-flume-1.6.0-cdh5.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin

2. Practice 1: NetCat source

# Name the components on this agent
# The agent is named a1; its source, sink, and channel are named r1, k1, and c1
# Note: # starts a comment only at the beginning of a line; a trailing "# ..." would become part of the value
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# Read data via netcat, listening on port 44444 of host masteractive
a1.sources.r1.type = netcat
a1.sources.r1.bind = masteractive
a1.sources.r1.port = 44444

# Describe the sink
# The sink is a logger
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Store at most 1000 events
a1.channels.c1.capacity = 1000
# At most 100 events per transaction
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
# r1, k1, c1 are only defined on a1, but an agent can define many sources, channels, and sinks,
# so we must state which source feeds which channel and which channel feeds which sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

First, install Telnet:

# rpm -qa | grep telnet    # check whether telnet and telnet-server are installed

# yum install -y telnet-server
# yum install -y telnet

• Run flume-ng

flume-ng agent --conf conf --conf-file ./conf/flume_netcat.conf --name a1 -Dflume.root.logger=INFO,console

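--conf conf specifies the configuration directory, --conf-file ./conf/flume_netcat.conf the configuration file, --name a1 the agent name, and -Dflume.root.logger=INFO,console sends the logger output (including what the logger sink prints) directly to the console.

• Verify: connect with telnet and type a line of text; the logger sink should print it as an event on the agent's console: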

# telnet masteractive 44444

3. Practice 2: Exec source

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /usr/local/src/apache-flume-1.6.0-cdh5.7.0-bin/1.txt

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

• Run flume-ng

flume-ng agent --conf conf --conf-file ./conf/flume_exec.conf --name a1 -Dflume.root.logger=INFO,console

• Verify:

# echo 'ccc' >> 1.txt

4. Practice 3: Monitor a log file with Flume and write the appended lines to HDFS

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/src/apache-flume-1.6.0-cdh5.7.0-bin/1.log
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1

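## Where the HDFS sink writes files. The path is not hard-coded: its escape sequences are filled in from the event time, so the output directory name changes dynamically
a1.sinks.k1.hdfs.path =/flume/tailout/%y-%m-%d/%H%M/

## Prefix of the generated file names
a1.sinks.k1.hdfs.filePrefix = events-

## Whether to round the timestamp down before filling in the path, so a new directory is used only when the rounded time changes; true = yes
a1.sinks.k1.hdfs.round = true

## Round down to a multiple of 1 (in the unit below), i.e. the directory changes every minute
a1.sinks.k1.hdfs.roundValue = 1

## The rounding unit is minutes
a1.sinks.k1.hdfs.roundUnit = minute

## Roll to a new file every 3 seconds
a1.sinks.k1.hdfs.rollInterval = 3

## Roll to a new file once the current file reaches 20 bytes
a1.sinks.k1.hdfs.rollSize = 20

## Roll to a new file after 5 events have been written
a1.sinks.k1.hdfs.rollCount = 5

## Number of events written to the file before it is flushed to HDFS
a1.sinks.k1.hdfs.batchSize = 10

## Use the local time, rather than a timestamp header, when filling in the path
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Type of the generated file: the default is SequenceFile; DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
## Memory channel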
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
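The combined effect of the path escapes, the rounding settings, and the three roll limits can be sketched in Python. The helpers `hdfs_dir` and `should_roll` are hypothetical and only model the documented behaviour (round the time down, then format the path; roll when any non-zero limit is reached); they are not Flume's implementation.

```python
from datetime import datetime


def hdfs_dir(ts: datetime, round_value: int = 1, round_unit: str = "minute") -> str:
    """Fill in /flume/tailout/%y-%m-%d/%H%M/ after rounding ts down (hdfs.round = true)."""
    if round_unit == "second":
        ts = ts.replace(second=ts.second - ts.second % round_value, microsecond=0)
    elif round_unit == "minute":
        ts = ts.replace(minute=ts.minute - ts.minute % round_value, second=0, microsecond=0)
    elif round_unit == "hour":
        ts = ts.replace(hour=ts.hour - ts.hour % round_value, minute=0, second=0, microsecond=0)
    else:
        raise ValueError(f"unsupported round_unit: {round_unit}")
    return ts.strftime("/flume/tailout/%y-%m-%d/%H%M/")


def should_roll(elapsed_s: float, size_bytes: int, event_count: int,
                roll_interval: int = 3, roll_size: int = 20, roll_count: int = 5) -> bool:
    # Whichever limit is reached first triggers a roll; a value of 0 disables that limit.
    return ((roll_interval > 0 and elapsed_s >= roll_interval)
            or (roll_size > 0 and size_bytes >= roll_size)
            or (roll_count > 0 and event_count >= roll_count))


print(hdfs_dir(datetime(2018, 4, 26, 14, 37, 59)))  # prints /flume/tailout/18-04-26/1437/
```

With rollSize = 20 bytes, a file rolls after roughly one or two short log lines, so these values suit a quick demo rather than production, where many small HDFS files are undesirable.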

• Run flume-ng

./bin/flume-ng agent --conf conf --conf-file ./conf/flume.conf --name a1 -Dflume.root.logger=DEBUG,console

• Verify:

# echo "11113" >> 1.log

5. Practice 4: Avro

1) exec-memory-avro.conf

exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel
exec-memory-avro.sources.exec-source.type =exec
exec-memory-avro.sources.exec-source.command= tail -F /usr/local/src/apache-flume-1.6.0-cdh5.7.0-bin/1.log

exec-memory-avro.sources.exec-source.shell= /bin/sh -c

exec-memory-avro.sinks.avro-sink.type =avro
exec-memory-avro.sinks.avro-sink.hostname =masteractive
exec-memory-avro.sinks.avro-sink.port =44444
exec-memory-avro.channels.memory-channel.type= memory

exec-memory-avro.sources.exec-source.channels= memory-channel
exec-memory-avro.sinks.avro-sink.channel =memory-channel

2) avro-memory-logger.conf

avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels =memory-channel

avro-memory-logger.sources.avro-source.type= avro
avro-memory-logger.sources.avro-source.bind= masteractive
avro-memory-logger.sources.avro-source.port= 44444

avro-memory-logger.sinks.logger-sink.type =logger
avro-memory-logger.channels.memory-channel.type= memory

avro-memory-logger.sources.avro-source.channels= memory-channel
avro-memory-logger.sinks.logger-sink.channel= memory-channel

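Start avro-memory-logger first, then exec-memory-avro. The --name passed to flume-ng must match the property prefix of the file (avro-memory-logger or exec-memory-avro).

(If a port is already in use: install psmisc with yum install psmisc; then fuser -v -n tcp 8080 shows which process holds the port, 8080 in this example, and fuser -k -n tcp 8080 kills it to free the port.)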

3) avro-memory-kafka

avro-memory-kafka.sources = avro-source
avro-memory-kafka.sinks = kafka-sink
avro-memory-kafka.channels = memory-channel

avro-memory-kafka.sources.avro-source.type= avro
avro-memory-kafka.sources.avro-source.bind= masteractive
avro-memory-kafka.sources.avro-source.port= 44444

avro-memory-kafka.sinks.kafka-sink.type =org.apache.flume.sink.kafka.KafkaSink
avro-memory-kafka.sinks.kafka-sink.brokerList= masteractive:9092
avro-memory-kafka.sinks.kafka-sink.topic =hello_topic
avro-memory-kafka.sinks.kafka-sink.batchSize= 5
avro-memory-kafka.sinks.kafka-sink.requiredAcks=1

avro-memory-kafka.channels.memory-channel.type= memory
avro-memory-kafka.sources.avro-source.channels= memory-channel
avro-memory-kafka.sinks.kafka-sink.channel= memory-channel

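6. Practice 5: Interceptors

1) Timestamp

Records when the event was created; the value (the current time in milliseconds) is filled in automatically.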

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = http
a1.sources.r1.bind = masteractive
a1.sources.r1.port = 52020
a1.sources.r1.channels = c1

a1.sources.r1.interceptors = i1  
a1.sources.r1.interceptors.i1.preserveExisting= false  
a1.sources.r1.interceptors.i1.type = timestamp  

a1.sinks.k1.type = hdfs  
a1.sinks.k1.channel = c1  
a1.sinks.k1.hdfs.path =hdfs://masteractive:9000/flume/%Y-%m-%d/%H%M  
a1.sinks.k1.hdfs.filePrefix = timestamp.
a1.sinks.k1.hdfs.fileType=DataStream  

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

[root@master apache-flume-1.6.0-bin]# flume-ng agent -c conf -f conf/interceptor_test/flume_ts_interceptor.conf -n a1 -Dflume.root.logger=INFO,console # start the agent

[root@master interceptor_test]# curl -X POST -d '[{"headers":{"h1":"slave1 is header", "h2":"slave2 is header"}, "body":"1:2:3"}]' http://master:52020 # send data to the port
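The curl command posts a JSON array of events, each with "headers" and "body", which is the format the HTTP source's default JSON handler accepts. The same request can be built from Python with only the standard library; `build_payload` and `post_events` are hypothetical helper names, and actually posting requires a running agent at the given URL.

```python
import json
import urllib.request
from typing import Dict, List, Union

# One event as in the curl examples: {"headers": {...}, "body": "..."}
EventDict = Dict[str, Union[str, Dict[str, str]]]


def build_payload(events: List[EventDict]) -> bytes:
    # The request body is a JSON array of events, like the curl -d argument.
    return json.dumps(events).encode("utf-8")


def post_events(url: str, events: List[EventDict], timeout: float = 10.0) -> int:
    """POST events to a Flume HTTP source and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(events),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage, against the agent above: `post_events("http://master:52020", [{"headers": {"h1": "slave1 is header"}, "body": "1:2:3"}])`. Header values should be strings, since Flume headers are a string-to-string map.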

2) Host

Records where the event came from.

flume_hostname_interceptor.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = syslogtcp
a1.sources.r1.host = master
a1.sources.r1.port = 52020
a1.sources.r1.channels = c1

a1.sources.r1.interceptors = i1 i2  
a1.sources.r1.interceptors.i1.preserveExisting= false  
a1.sources.r1.interceptors.i1.type = timestamp  
a1.sources.r1.interceptors.i2.type = host  
a1.sources.r1.interceptors.i2.hostHeader = hostname  
a1.sources.r1.interceptors.i2.useIP = false  

a1.sinks.k1.type = hdfs  
a1.sinks.k1.channel = c1  
a1.sinks.k1.hdfs.path =hdfs://master:9000/flume/%Y-%m-%d/%H%M  
a1.sinks.k1.hdfs.filePrefix = %{hostname}.
a1.sinks.k1.hdfs.fileType=DataStream  

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

[root@master apache-flume-1.6.0-bin]# flume-ng agent -c conf -f conf/interceptor_test/flume_hostname_interceptor.conf -n a1 -Dflume.root.logger=INFO,console # start the agent

[root@master interceptor_test]# echo "23132" | nc master 52020 # simulate a syslog message

3) Custom (static)

flume_static_interceptor.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = http
a1.sources.r1.bind = master
a1.sources.r1.port = 52020
a1.sources.r1.channels = c1

a1.sources.r1.interceptors = i1  
a1.sources.r1.interceptors.i1.type = static  
a1.sources.r1.interceptors.i1.key = badou_flume
a1.sources.r1.interceptors.i1.value = so_easy

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

[root@master apache-flume-1.6.0-bin]# flume-ng agent -c conf -f conf/interceptor_test/flume_static_interceptor.conf -n a1 -Dflume.root.logger=INFO,console # start the agent

[root@master interceptor_test]# curl -X POST -d '[{"headers":{"h1":"slave1 is header", "h2":"slave2 is header"}, "body":"1:2:3"}]' http://master:52020 # send data to the port; the custom header is added automatically

4) Regex filtering

flume_regex_interceptor.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = syslogtcp
a1.sources.r1.host = master
a1.sources.r1.port = 52020
a1.sources.r1.channels = c1

a1.sources.r1.interceptors = i1  
a1.sources.r1.interceptors.i1.type = regex_filter
# Events whose body consists only of digits are filtered out (excludeEvents = true drops matching events)
a1.sources.r1.interceptors.i1.regex = ^[0-9]*$
a1.sources.r1.interceptors.i1.excludeEvents = true

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

[root@master apache-flume-1.6.0-bin]# flume-ng agent -c conf -f conf/interceptor_test/flume_regex_interceptor.conf -n a1 -Dflume.root.logger=INFO,console # start the agent

[root@master interceptor_test]# echo "23132a" | nc master 52020 # received
[root@master interceptor_test]# echo "23132" | nc master 52020 # not received; an Invalid Syslog data error is reported
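The include/exclude logic of regex filtering fits in a few lines of Python. `regex_filter` is a hypothetical helper; it uses `re.search` (match anywhere), which is an assumption about how Flume applies the pattern, so the `^...$` anchors are what force a whole-body match. The last test case also shows why the pattern must sit alone on its line: in a Flume properties file a `#` after a value is not a comment, so text like `  #comment` would become part of the regex and digit-only bodies would no longer match.

```python
import re
from typing import List


def regex_filter(bodies: List[str], pattern: str, exclude_events: bool) -> List[str]:
    """Sketch of regex_filter: keep matching bodies, or drop them when exclude_events is True."""
    compiled = re.compile(pattern)
    kept = []
    for body in bodies:
        # search() finds a match anywhere; the anchors ^...$ force a whole-body match.
        matched = compiled.search(body) is not None
        if matched != exclude_events:
            kept.append(body)
    return kept


print(regex_filter(["23132a", "23132"], r"^[0-9]*$", exclude_events=True))  # prints ['23132a']
```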

5) Regex extraction

Adds the parts of the event body that match the regular expression to the header.

flume_regex_interceptor.conf_extractor

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = http
a1.sources.r1.bind = master
a1.sources.r1.port = 52020
a1.sources.r1.channels = c1

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
a1.sources.r1.interceptors.i1.serializers.s1.name = one
a1.sources.r1.interceptors.i1.serializers.s2.name = two
a1.sources.r1.interceptors.i1.serializers.s3.name = three

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

[root@master apache-flume-1.6.0-bin]# flume-ng agent -c conf -f conf/interceptor_test/flume_regex_interceptor.conf_extractor -n a1 -Dflume.root.logger=INFO,console

[root@master interceptor_test]# curl -X POST -d '[{"headers":{"h1":"slave1 is header", "h2":"slave2 is header"}, "body":"hhhh1:2:3h"}]' http://master:52020

The output is as follows; the header now contains the matched digits:

2018-05-22 20:15:39,000 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{two=2, one=1, three=3, h1=slave1 is header, h2=slave2 is header} body: 68 68 68 68 31 3A 32 3A 33 68                   hhhh1:2:3h }
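The extractor maps capture group i to the header name of serializer i (s1 → one, s2 → two, s3 → three). A hypothetical Python equivalent is below. Note that the config writes `(\\d)` because a properties file treats a backslash as an escape character, so `\\d` is read as `\d`; in Python a raw string `r"(\d)"` expresses the same pattern.

```python
import re
from typing import Dict, List


def regex_extract(body: str, pattern: str, names: List[str]) -> Dict[str, str]:
    """Map capture group i of the first match to header names[i] (like serializers s1 s2 s3)."""
    match = re.search(pattern, body)
    if match is None:
        return {}
    return {name: value for name, value in zip(names, match.groups())}


def intercept(headers: Dict[str, str], body: str, pattern: str, names: List[str]) -> Dict[str, str]:
    # Extracted values are merged into the existing headers; the body is left untouched.
    return {**headers, **regex_extract(body, pattern, names)}
```

Applied to the body hhhh1:2:3h with the names one, two, three, this yields the same three extra headers as the log line above.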

7. Practice 6: Selectors

1) replicating

Every downstream agent receives the same events, like a broadcast.

flume_client_replicating.conf

# Name the components on this agent  
a1.sources = r1  
a1.sinks = k1 k2  
a1.channels = c1 c2  
   
# Describe/configure the source  
a1.sources.r1.type = syslogtcp  
a1.sources.r1.port = 50000  
a1.sources.r1.host = master
a1.sources.r1.selector.type = replicating  
a1.sources.r1.channels = c1 c2  
    
# Describe the sink  
a1.sinks.k1.type = avro  
a1.sinks.k1.channel = c1  
a1.sinks.k1.hostname = slave1
a1.sinks.k1.port = 50000
       
a1.sinks.k2.type = avro  
a1.sinks.k2.channel = c2  
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 50000

# Use a channel which buffers events in memory
a1.channels.c1.type = memory  
a1.channels.c1.capacity = 1000  
a1.channels.c1.transactionCapacity = 100  
          
a1.channels.c2.type = memory  
a1.channels.c2.capacity = 1000  
a1.channels.c2.transactionCapacity = 100

2) multiplexing

Sends each event to a specific downstream agent.

# Name the components on this agent  
a1.sources = r1  
a1.sinks = k1 k2  
a1.channels = c1 c2  
   
# Describe/configure the source  
a1.sources.r1.type= org.apache.flume.source.http.HTTPSource  
a1.sources.r1.port= 50000  
a1.sources.r1.bind = master
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.channels = c1 c2

# Route on the value of header areyouok: OK goes to c1, NO goes to c2, anything else (or a missing header) goes to the default c1
a1.sources.r1.selector.header = areyouok
a1.sources.r1.selector.mapping.OK = c1
a1.sources.r1.selector.mapping.NO = c2
a1.sources.r1.selector.default = c1

# Describe the sink  
a1.sinks.k1.type = avro  
a1.sinks.k1.channel = c1  
a1.sinks.k1.hostname = slave1
a1.sinks.k1.port = 50000
       
a1.sinks.k2.type = avro  
a1.sinks.k2.channel = c2  
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 50000

# Use a channel which buffers events in memory
a1.channels.c1.type = memory  
a1.channels.c1.capacity = 1000  
a1.channels.c1.transactionCapacity = 100  
          
a1.channels.c2.type = memory  
a1.channels.c2.capacity = 1000  
a1.channels.c2.transactionCapacity = 100

[root@master apache-flume-1.6.0-bin]# flume-ng agent -c conf -f conf/selector_test/flume_client_multiplexing.conf -n a1 -Dflume.root.logger=INFO,console

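Test:

[root@master interceptor_test]# curl -X POST -d '[{"headers":{"areyouok":"OK","h1":"slave1 is header", "h2":"slave2 is header"}, "body":"hhhh1:2:3h"}]' http://master:50000   # slave1 receives the event (the source listens on port 50000; header values are matched exactly against the mapping)

8. Practice 7: Failover

Scenario: one upstream (master) agent and two downstream (slave) agents. In normal operation only one downstream agent receives data from the master agent; the other is a standby and receives nothing. If the active downstream agent goes down, Flume switches to the other one automatically. Both the master agent and the downstream agents need configuring; the two downstream configurations are essentially identical and differ only in the hostname they listen on.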

flume-client.properties

# agent1 name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2

#set group
agent1.sinkgroups = g1

#set channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100

agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /usr/local/src/apache-flume-1.6.0-cdh5.7.0-bin/1.log


# set sink1
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = slave1
agent1.sinks.k1.port = 52020

# set sink2
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = slave2
agent1.sinks.k2.port = 52020

# set sink group: bind k1 and k2 to g1 to get automatic failover
agent1.sinkgroups.g1.sinks = k1 k2 

# set failover
agent1.sinkgroups.g1.processor.type = failover
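# The larger the number, the higher the priority
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1
# Maximum back-off for a failed sink, in milliseconds
agent1.sinkgroups.g1.processor.maxpenalty = 10000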

flume-server.properties

# agent name: a1
a1.channels = c1
a1.sources = r1
a1.sinks = k1

#set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# other node, slave to master
a1.sources.r1.type = avro
# On the slave2 machine, bind to slave2 instead
a1.sources.r1.bind = slave1
a1.sources.r1.port = 52020


a1.sources.r1.channels = c1

# set sink (a logger here; the hdfs.filePrefix line below only takes effect if the type is changed to hdfs)
a1.sinks.k1.type = logger

a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d

Start the two downstream agents first, then the master agent.
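The failover processor always uses the highest-priority sink that is currently available and benches a failed sink for a back-off period capped by maxpenalty (milliseconds). The class below is a hypothetical toy model of that behaviour; the doubling back-off that starts at 1 second is an assumption chosen for illustration, not a copy of Flume's code.

```python
from typing import Callable, Dict


class FailoverSketch:
    """Toy failover sink processor: use the highest-priority healthy sink.

    A failed sink is benched for a back-off that doubles per consecutive failure,
    capped at max_penalty_ms (the exact formula is an assumption).
    """

    def __init__(self, priorities: Dict[str, int], max_penalty_ms: int = 10000,
                 base_penalty_ms: int = 1000) -> None:
        self.priorities = priorities
        self.max_penalty_ms = max_penalty_ms
        self.base_penalty_ms = base_penalty_ms
        self.failures: Dict[str, int] = {name: 0 for name in priorities}
        self.benched_until: Dict[str, int] = {name: 0 for name in priorities}

    def process(self, now_ms: int, send: Callable[[str], None]) -> str:
        # Try sinks from highest to lowest priority, skipping benched ones.
        for name in sorted(self.priorities, key=self.priorities.get, reverse=True):
            if self.benched_until[name] > now_ms:
                continue
            try:
                send(name)
            except Exception:
                self.failures[name] += 1
                penalty = min(self.max_penalty_ms, self.base_penalty_ms * 2 ** self.failures[name])
                self.benched_until[name] = now_ms + penalty
                continue
            # Success resets the failure count and returns the sink that was used.
            self.failures[name] = 0
            return name
        raise RuntimeError("all sinks failed")
```

With priorities k1 = 10 and k2 = 1, events go to k1 (slave1); while slave1 is down they go to k2 (slave2), and k1 is retried once its back-off expires.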

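9. Practice 8: Load balancing

Spreads events fairly evenly across the downstream agents. The two downstream agents use the same configuration as in Practice 7; only the master agent's configuration file changes.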

flume-client.properties_loadbalance

# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /usr/local/src/apache-flume-1.6.0-cdh5.7.0-bin/1.log


# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = slave1
a1.sinks.k1.port = 52020

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 52020

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
# round_robin: the sinks take turns, each handling the next batch of events before the next sink's turn
a1.sinkgroups.g1.processor.selector = round_robin

# a1.sinkgroups.g1.processor.type = failover
# a1.sinkgroups.g1.processor.priority.k1 = 10
# a1.sinkgroups.g1.processor.priority.k2 = 1
# a1.sinkgroups.g1.processor.maxpenalty = 10000
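Round-robin selection simply hands successive batches to k1, k2, k1, k2, and so on. The helper below is a hypothetical illustration of that ordering only; Flume's load_balance processor also offers a `random` selector and an optional `backoff` setting for temporarily skipping failed sinks, which this sketch does not model.

```python
from itertools import cycle
from typing import Dict, List


def round_robin_assign(batches: List[str], sinks: List[str]) -> Dict[str, List[str]]:
    """Hand each batch to the next sink in turn: k1, k2, k1, k2, ..."""
    assignment: Dict[str, List[str]] = {sink: [] for sink in sinks}
    for batch, sink in zip(batches, cycle(sinks)):
        assignment[sink].append(batch)
    return assignment
```

For five batches and sinks k1 and k2, k1 (slave1) gets batches 1, 3, 5 and k2 (slave2) gets batches 2 and 4.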
