Flume

Posted by syc0616 on 2020-10-09

Original URL: https://blog.csdn.net/syc0616/article/details/108988417

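Flume Overview

Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting large volumes of log data. It was originally developed at Cloudera and is now an Apache project. Flume is built on a streaming architecture and is flexible and simple to use.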

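Version differences
       Releases up to and including 0.9.x are known as Flume OG (original generation)
       Releases from 1.x onward are known as Flume NG (next generation)

       Flume NG is what everyone uses today.

       Releases before 1.7 have no Taildir Source; 1.7 and later include it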

Using Flume
Start an agent: flume-ng agent -n <agent name> -f <agent config file> -c <directory containing the other config files> -Dproperty=value

Flume Basic Architecture

Figure 1-1 Flume architecture

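1 Agent

An Agent is a JVM process that moves data from its origin to its destination in the form of events.

An Agent consists of three main components: Source, Channel, and Sink.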

1. Source

A Source is the component that receives data into the Flume Agent. Sources can handle log data of many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, and legacy.

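2. Sink

A Sink continuously polls the Channel for events, removes them in batches, and either writes those batches to a storage or indexing system or sends them on to another Flume Agent.

Sink destinations include hdfs, logger, avro, thrift, ipc, file, HBase, solr, and custom sinks.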

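3. Channel

A Channel is the buffer that sits between a Source and a Sink, which lets the Source and the Sink run at different rates. Channels are thread-safe and can handle writes from several Sources and reads from several Sinks at the same time.

The two most commonly used Channels that ship with Flume are Memory Channel and File Channel.

Memory Channel is an in-memory queue. It is suitable when losing data is acceptable. If data loss matters, do not use Memory Channel, because a process crash, a machine failure, or a restart will lose whatever events are still buffered.

File Channel writes every event to disk, so no data is lost when the process shuts down or the machine goes down.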
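As a rough illustration of the File Channel described above, a channel could be switched from memory to file with a fragment like the following. The agent name a1, the channel name c1, and both directory paths are assumptions made for this sketch; the property names come from the Flume user guide.

```properties
# Sketch only: a1, c1, and both paths are hypothetical.
a1.channels = c1
a1.channels.c1.type = file
# Directory where the channel keeps its checkpoint
a1.channels.c1.checkpointDir = /data/flume/checkpoint
# Comma-separated list of directories that hold the event data files
a1.channels.c1.dataDirs = /data/flume/data
```

The trade-off is throughput: events survive a restart because they are on disk, but every transaction now involves disk I/O.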

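Event

The Event is Flume's basic unit of data transfer: data travels from origin to destination as Events. An Event consists of a Header and a Body. The Header holds the event's attributes as key-value pairs, and the Body holds the data itself as a byte array.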

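Interceptors

Before a source puts an event into a channel, interceptors are called to intercept and process the event.

Flume lets you use interceptors to intercept and process events in flight. An interceptor must implement the org.apache.flume.interceptor.Interceptor interface, and it can modify or even drop events according to the developer's settings. Flume also supports interceptor chains, which combine several interceptors. Events pass through the interceptors in the order in which the chain lists them.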
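An interceptor chain might be configured roughly as follows. This sketch assumes an agent named a1 with a source named r1, and it uses two of Flume's built-in interceptors (timestamp and host) rather than a custom class; the aliases i1 and i2 are arbitrary.

```properties
# Sketch only: a1, r1, i1, and i2 are assumed names.
# The order of this list is the order in which the interceptors run.
a1.sources.r1.interceptors = i1 i2
# i1 adds a "timestamp" header holding the processing time in milliseconds
a1.sources.r1.interceptors.i1.type = timestamp
# i2 adds a "host" header holding the agent's hostname or IP address
a1.sources.r1.interceptors.i2.type = host
```

A custom interceptor would be wired in the same way, with type set to the fully qualified name of its Builder class.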

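Channel Selectors

When one source feeds several channels, the Channel Selector decides which channel or channels each event is written to.

Channel Selectors apply whenever a source passes events to more than one channel. The two common types are replicating (the default) and multiplexing. A replicating selector copies every event to all of the channels, while a multiplexing selector matches an attribute of the event against the configured mappings and sends the event to the channel that matches.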
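A multiplexing selector could look roughly like the fragment below. The agent, source, and channel names, the header name logtype, and the values error and access are all assumptions made for illustration; the selector property names come from the Flume user guide.

```properties
# Sketch only: a1, r1, c1, c2, the "logtype" header, and its values are hypothetical.
a1.sources = r1
a1.channels = c1 c2
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
# Route on the value of this event header
a1.sources.r1.selector.header = logtype
a1.sources.r1.selector.mapping.error = c1
a1.sources.r1.selector.mapping.access = c2
# Events whose header matches no mapping go here
a1.sources.r1.selector.default = c2
```

The header being routed on has to be set somewhere upstream, typically by an interceptor or by the client that sends the event.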

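Sink Processors

When several sinks read from the same channel, a sink processor picks one sink from the group to do the work at any given moment.

You can combine several sinks into a single unit (a sink group). A Sink Processor can then load-balance across all the sinks in the group, or fail over from one sink to another when a sink fails.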
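A failover sink group might be declared roughly as follows. The agent name a1, the group name g1, the sink names k1 and k2, and the priority values are assumptions for this sketch; the sinks themselves would still need their own type and channel settings.

```properties
# Sketch only: a1, g1, k1, k2, and the numeric values are hypothetical.
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# The sink with the larger priority value is used first
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Upper bound, in milliseconds, on the back-off applied to a failed sink
a1.sinkgroups.g1.processor.maxpenalty = 10000
```

Setting processor.type to load_balance instead spreads events across the sinks in the group rather than keeping one sink on standby.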

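How to write an agent configuration file
       An agent configuration file is essentially a Properties file, written as property name=property value.

       The configuration file needs to:
       ① Define the name of the agent in this file, then define aliases for its sources, sinks, and channels
       ② Specify the type of each source, channel, and sink
       ③ Specify the settings of each source, channel, and sink; take the parameter names and values from the official Flume user guide
       ④ Specify which channels each source writes to and which channel each sink reads from. This is what connects the components.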

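Official example: monitoring port data

Use Flume to listen on a port, collect the data sent to that port, and print it to the console.

Steps:

1. Install the netcat tool

sudo yum install -y nc

2. Check whether port 44444 is already in use

sudo netstat -tunlp | grep 44444

3. Create the Flume Agent configuration file flume-netcat-logger.conf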

From the official Flume documentation:

Component types used
① netcat source: listens on a TCP port and wraps each line of data it receives in an event.
       It works much like nc -l <port>.
   Required properties:
   type   –   The component type name, needs to be netcat
   bind   –   Host name or IP address to bind to
   port   –   Port # to bind to

② logger sink: writes events to a file or the console through the logger, at INFO level.
   Required properties:
   type   –   The component type name, needs to be logger
   Optional properties:
   maxBytesToLog   16   Maximum number of bytes of the Event body to log

③ memory channel
   Required properties:
   type   –   The component type name, needs to be memory
   Optional properties:
   capacity   100   The maximum number of events stored in the channel
   transactionCapacity   100   The maximum number of events the channel will take from a source or give to a sink per transaction
Create the Flume Agent configuration file flume-netcat-logger.conf in the job folder.

vim flume-netcat-logger.conf

Add the following content to flume-netcat-logger.conf.

# Name the components on this agent
# r1: the name of a1's source
a1.sources = r1
# k1: the name of a1's sink
a1.sinks = k1
# c1: the name of a1's channel
a1.channels = c1

# Describe/configure the source
# a1's source is of type netcat (listens on a port)
a1.sources.r1.type = netcat
# Host that a1's source binds to
a1.sources.r1.bind = kylin3
# Port that a1's source listens on
a1.sources.r1.port = 44444

# Describe the sink
# a1's output destination is the logger sink (prints to the console)
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
# a1's channel is of type memory
a1.channels.c1.type = memory
# The channel holds at most 10000 events
a1.channels.c1.capacity = 10000
# Each transaction moves at most 1000 events into or out of the channel
a1.channels.c1.transactionCapacity = 1000

# Bind the source and sink to the channel
# Connect r1 to c1
a1.sources.r1.channels = c1
# Connect k1 to c1
a1.sinks.k1.channel = c1

Note: this configuration file is taken from the official user guide, http://flume.apache.org/FlumeUserGuide.html

Start Flume first so that it is listening on the port

First form:

bin/flume-ng agent --conf conf/ --name a1 --conf-file job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

Second form:

bin/flume-ng agent -c conf/ -n a1 -f job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

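Parameters:

--conf/-c: the configuration files are stored in the conf/ directory

--name/-n: the name of the agent, here a1; it must match the agent name used in the configuration file

--conf-file/-f: the configuration file Flume reads for this run, here flume-netcat-logger.conf in the job folder

-Dflume.root.logger=INFO,console: -D overrides the flume.root.logger property at run time, so that logging goes to the console at INFO level. The log levels include debug, info, warn, and error.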

Use netcat to send content to port 44444 on this host

nc kylin3 44444

hello

Watch the terminal where Flume is running to see the data arrive
