Installing, Configuring, and Using Flume on Linux

Posted by YBCarry on 2019-03-07

Original URL: https://juejin.im/post/5c80874df265da2d8a55ddc5

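1. Getting to Know Flume

(1) Introduction to Flume

Flume is a log collection system.
Official site: http://flume.apache.org/
Overview: Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications.
Stages of a big data pipeline:
- <1>. Data collection (web crawlers / log data / Flume)
- <2>. Data storage (HDFS / Hive / HBase (NoSQL))
- <3>. Data computation (MapReduce / Hive / Spark SQL / Spark Streaming / Flink)
- <4>. Data visualization (ECharts / Quick BI)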

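(2) Flume Roles

<1>. source
- The data source. It collects data from a data generator, produces a data stream, and delivers that stream in Flume's event format to one or more channels.
<2>. channel
- The transport channel, a short-lived storage container. It queues the events received from the source until sinks consume them, acting as a bridge between source and sink. Puts and takes on a channel are transactional, which keeps data consistent while it is sent and received. A channel can be connected to any number of sources and sinks.
<3>. sink
- The sink consumes the data delivered by the channel and passes it on to its destination. The destination may be the source of another Flume agent, or a store such as HDFS or HBase, so that the data ends up in centralized storage.
<4>. event
- The event is the basic unit of data transfer in Flume.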
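
The four roles can be illustrated with a toy in-memory model. This is a minimal sketch and not Flume's API: the names `Event`, `MemoryChannel`, `put_batch`, `take_batch`, and `run_pipeline` are hypothetical, and the channel transaction is reduced to an all-or-nothing batch put.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Basic unit of transfer: a byte payload plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Toy queue-backed channel with a fixed capacity (not Flume's class)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._queue = deque()

    def put_batch(self, events):
        """All-or-nothing put: reject the whole batch if it would overflow."""
        if len(self._queue) + len(events) > self.capacity:
            return False
        self._queue.extend(events)
        return True

    def take_batch(self, max_events):
        """Remove and return up to max_events events in arrival order."""
        count = min(max_events, len(self._queue))
        return [self._queue.popleft() for _ in range(count)]

    def __len__(self):
        return len(self._queue)


def run_pipeline(lines, capacity=1000, batch_size=100):
    """Source wraps lines as events, the channel buffers them, the sink drains them."""
    channel = MemoryChannel(capacity)
    delivered = []
    for start in range(0, len(lines), batch_size):
        batch = [Event(line.encode("utf-8")) for line in lines[start:start + batch_size]]
        if not channel.put_batch(batch):
            break  # channel full: a real source would retry later
        # Sink side: consume what the channel currently holds.
        delivered.extend(e.body.decode("utf-8") for e in channel.take_batch(batch_size))
    return delivered
```

The defaults of 1000 and 100 mirror the `capacity` and `transactionCapacity` values used in the configuration files of this article; the sketch only shows why a batch larger than the remaining capacity cannot be accepted.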

(3) Using Flume

Flume is simple to use: you only need to write a configuration file.

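2. Installing and Configuring Flume 1.6.0

(1) Prerequisite:

A Java runtime environment

(2) Extract the apache-flume-1.6.0-bin.tar.gz package into the target directory:

tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /path/to/target

(3) Rename the Flume directory for convenience later:

mv apache-flume-1.6.0-bin/ flume-1.6.0

(4) Edit the configuration file:

Go to flume-1.6.0/conf and rename the configuration template:
- mv flume-env.sh.template flume-env.sh

Edit flume-env.sh:

vi flume-env.sh

# Enviroment variables can be set here.
export JAVA_HOME=/path/to/jdk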

(5) Configure environment variables:

Edit the profile:
- vi /etc/profile
Add the following lines:
- export FLUME_HOME=/path/to/flume-1.6.0
- export PATH=$PATH:$FLUME_HOME/bin
Apply the environment variables:
- source /etc/profile

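(6) Start

flume-ng agent --conf YYYY/ --name a1 --conf-file YYYY/XXXXXX -Dflume.root.logger=INFO,console &
- flume-ng agent: start an agent with the ng launcher
- --conf YYYY/: the directory that holds the configuration
- --name a1: the name of the agent
- --conf-file YYYY/XXXXXX: the configuration file to load
- -Dflume.root.logger=INFO,console: optional, sets the log level and prints the log to the console
- &: optional, runs Flume in the background
Old version: flume-og (check the bin directory to see which launcher is present)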
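
When the agent is launched from a script, the same options can be assembled programmatically. The following is a small sketch: the helper name `build_flume_command` is hypothetical, and it uses only the flags listed above.

```python
def build_flume_command(conf_dir, agent_name, conf_file, console_log=False):
    """Assemble the flume-ng agent command as an argument list."""
    args = [
        "flume-ng", "agent",
        "--conf", conf_dir,
        "--name", agent_name,
        "--conf-file", conf_file,
    ]
    if console_log:
        # Optional: log at INFO level to the console.
        args.append("-Dflume.root.logger=INFO,console")
    return args
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues; this assumes `flume-ng` is on the `PATH` as configured in step (5).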

3. Listening on a Port with Flume

(1) Edit the configuration file:

In the flume/conf directory, create the configuration file flumejob_telnet.conf:

vi flumejob_telnet.conf

# Flume port listener: configuration file


# Name the components on this agent. These names are referenced below;
# the plural keys (sources, sinks, channels) accept several names.
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# This is a TCP source, so the type must be netcat
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink: write events to the log
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory (a file channel is the alternative)
# Capacity: 1000 events in total, at most 100 events per transaction
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
# A source can be bound to several channels; a sink is bound to exactly one
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Save and exit:
- :wq
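
Before starting an agent, it can help to check that every source and sink is bound to a channel the agent actually declares. The following is a minimal sketch under stated assumptions: `parse_properties` and `check_bindings` are hypothetical helpers, and the parser handles only simple `key = value` lines rather than the full Java properties syntax.

```python
def parse_properties(text):
    """Parse `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_bindings(props, agent):
    """Return a list of channel-binding problems for one agent name."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not bound:
            problems.append(f"source {source} has no channels")
        problems.extend(
            f"source {source}: unknown channel {name!r}"
            for name in bound if name not in declared
        )
    for sink in props.get(f"{agent}.sinks", "").split():
        channel = props.get(f"{agent}.sinks.{sink}.channel", "")
        if channel not in declared:
            problems.append(f"sink {sink}: unknown channel {channel!r}")
    return problems
```

An empty result means the bindings are consistent for that agent. A check like this also surfaces keys written under the wrong agent prefix, since they are simply not found under the agent being checked.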

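(2) NetCat

Introduction: netcat is the "Swiss Army knife" of network tools. It reads and writes data across the network over TCP and UDP, and combined with other tools and redirection it can be used in scripts in many ways. In essence, netcat opens a connection between two machines and hands back two data streams.
Install:
- yum install nc

(3) Telnet

Introduction: the telnet protocol is a member of the TCP/IP protocol suite and is the standard protocol and main method for remote login on the Internet. It lets users do work on a remote host from their local computer. The user runs a telnet client locally and connects to the server; commands typed into the client are executed on the server as if they had been entered at the server's console, so the server can be controlled from the local machine.
Install:
- yum install telnet.x86_64

(4) Run:

Start Flume:
- flume-ng agent --conf conf/ --name a1 --conf-file conf/flumejob_telnet.conf -Dflume.root.logger=INFO,console
Open a second terminal session:
- telnet localhost 44444
Type data into the second session; the terminal running Flume prints the events it receives.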
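
The telnet step can also be scripted. The sketch below sends one line over TCP and reads one reply line; the helper name `send_line` is hypothetical, and it assumes the netcat source is left at its default of acknowledging each received line with a short reply such as `OK`.

```python
import socket


def send_line(host, port, text, timeout=5.0):
    """Send one newline-terminated line over TCP and return the first reply line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")
        # Read a single reply line from the server.
        reply = conn.makefile("r", encoding="utf-8").readline()
    return reply.strip()
```

With the agent from this section running, `send_line("localhost", 44444, "hello flume")` plays the role of the telnet session.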

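4. Collecting the Local Hive Log File into HDFS with Flume

(1) Edit the configuration file:

In the flume/conf directory, create the configuration file flumejob_hdfs.conf:

vi flumejob_hdfs.conf

# Flume tails the local Hive log file on Linux and collects it into HDFS: configuration file


# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source: watch a local file
# exec runs a command to read the file; tail -F follows it in real time
a1.sources.r1.type = exec
# The command to run; tail -F starts from the last 10 lines by default (see man tail)
# Follow the Hive operation log
a1.sources.r1.command = tail -F /tmp/root/hive.log
# The shell used to run the command; -c makes the shell read the command from the string that follows
# whereis bash
# bash: /usr/bin/bash /usr/share/man/man1/bash.1.gz
a1.sources.r1.shell = /usr/bin/bash -c

# Describe the sink
# Sink type
a1.sinks.k1.type = hdfs
# HDFS path; escapes such as %Y%m%d/%H%M%S expand to the date and time. Adjust this for your cluster.
a1.sinks.k1.hdfs.path = hdfs://bigdata01:9000/flume/%Y%m%d/%H-%M
# Prefix for uploaded files
a1.sinks.k1.hdfs.filePrefix = logs-
# Whether to round the timestamp down so that directories roll over by time
a1.sinks.k1.hdfs.round = true
# Round down to a multiple of this many time units (default 1)
a1.sinks.k1.hdfs.roundValue = 1
# Time unit for rounding (default second); here a new directory every minute
a1.sinks.k1.hdfs.roundUnit = minute
# Whether to use the local timestamp
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a1.sinks.k1.hdfs.batchSize = 500
# File type (compression is supported); DataStream writes plain, uncompressed output
a1.sinks.k1.hdfs.fileType = DataStream
# How often to roll to a new file, in seconds
a1.sinks.k1.hdfs.rollInterval = 30
# File size that triggers a roll, in bytes (ideally about 128 MB)
a1.sinks.k1.hdfs.rollSize = 134217700
# 0 disables rolling based on the number of events
a1.sinks.k1.hdfs.rollCount = 0
# Minimum number of block replicas: 1 copy, no redundancy (Hadoop handles replication itself, so this usually needs no tuning; it affects when files roll)
a1.sinks.k1.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory: capacity 1000 events, at most 100 per transaction
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1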

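Save and exit:
- :wq

(2) Flume+Hive

Import the Hive-related Hadoop dependency JARs into Flume:
- Go to the Flume lib directory:
  - cd /XXXX/flume/lib
- Upload the required JAR files

(3) Run:

Start Flume:
- flume-ng agent --conf conf/ --name a1 --conf-file conf/flumejob_hdfs.conf
Then run some Hive operations; the resulting log entries are uploaded to the configured HDFS directory.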
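
How `hdfs.path`, `hdfs.round`, `hdfs.roundValue`, and `hdfs.roundUnit` combine into a target directory can be sketched as follows. This is an illustration, not Flume's code: the function names are hypothetical, and it assumes the escapes used in this article (`%Y`, `%m`, `%d`, `%H`, `%M`) behave like their `strftime` counterparts.

```python
from datetime import datetime


def round_down(ts, value, unit):
    """Round a timestamp down to a multiple of `value` in the given unit."""
    if unit == "second":
        return ts.replace(second=ts.second - ts.second % value, microsecond=0)
    if unit == "minute":
        return ts.replace(minute=ts.minute - ts.minute % value, second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(hour=ts.hour - ts.hour % value, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


def bucket_path(template, ts, round_value=1, round_unit="second", do_round=False):
    """Expand the date and time escapes in an HDFS path template."""
    if do_round:
        ts = round_down(ts, round_value, round_unit)
    return ts.strftime(template)
```

For an event stamped 2019-03-07 14:37:52, rounding by 1 minute yields the directory `20190307/14-37`, while rounding by 1 hour yields `20190307/14-00`, so every event of that hour lands in the same directory.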

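5. Monitoring a Local Directory with Flume

(1) Edit the configuration file:

In the flume/conf directory, create the configuration file flumejob_dir.conf:

vi flumejob_dir.conf

# Flume directory monitor


# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source: watch a local directory
a1.sources.r1.type = spooldir
# The directory to monitor
a1.sources.r1.spoolDir = /root/testdir
# Suffix appended to a file once it has been uploaded successfully
a1.sources.r1.fileSuffix = .COMPLETED
# Whether to add a header holding the file's absolute path (default false)
a1.sources.r1.fileHeader = true
# Ignore every file ending in .tmp (still being written)
# [^ ]* matches any run of non-space characters, followed by a literal .tmp
a1.sources.r1.ignorePattern = ([^ ]*\.tmp)

# Describe the sink: write to HDFS
# Sink type
a1.sinks.k1.type = hdfs
# HDFS path; escapes such as %Y%m%d/%H%M%S expand to the date and time. Adjust this for your cluster.
a1.sinks.k1.hdfs.path = hdfs://bigdata01:9000/flume/testdir/%Y%m%d/%H-%M
# Prefix for uploaded files
a1.sinks.k1.hdfs.filePrefix = testdir-
# Whether to round the timestamp down so that directories roll over by time
a1.sinks.k1.hdfs.round = true
# Round down to a multiple of this many time units (default 1)
a1.sinks.k1.hdfs.roundValue = 1
# Time unit for rounding (default second); here a new directory every hour
a1.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a1.sinks.k1.hdfs.batchSize = 100
# File type (compression is supported); DataStream writes plain, uncompressed output
a1.sinks.k1.hdfs.fileType = DataStream
# How often to roll to a new file, in seconds
a1.sinks.k1.hdfs.rollInterval = 600
# File size that triggers a roll, in bytes (ideally about 128 MB)
a1.sinks.k1.hdfs.rollSize = 134217700
# 0 disables rolling based on the number of events
a1.sinks.k1.hdfs.rollCount = 0
# Minimum number of block replicas: 1 copy, no redundancy (Hadoop handles replication itself, so this usually needs no tuning; it affects when files roll)
a1.sinks.k1.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory: capacity 1000 events, at most 100 per transaction
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1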

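Save and exit:
- :wq

(2) Run:

Start Flume:
- flume-ng agent --conf conf/ --name a1 --conf-file conf/flumejob_dir.conf
Then add files to the monitored directory; their contents are uploaded to the configured HDFS directory.
Note that the monitored directory may contain only files, not subdirectories.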
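
The effect of the `ignorePattern` and `fileSuffix` settings can be checked with a short sketch. The helper names are hypothetical, and it assumes the pattern has to match the whole file name; the regular expression itself is copied from the configuration above.

```python
import re

# Same expression as a1.sources.r1.ignorePattern in flumejob_dir.conf.
IGNORE_PATTERN = re.compile(r"([^ ]*\.tmp)")


def is_ignored(file_name):
    """Return True when the whole file name matches the ignore pattern."""
    return IGNORE_PATTERN.fullmatch(file_name) is not None


def files_to_ingest(file_names, completed_suffix=".COMPLETED"):
    """Keep names that are neither ignored nor already marked as completed."""
    return [
        name for name in file_names
        if not is_ignored(name) and not name.endswith(completed_suffix)
    ]
```

Because `[^ ]` excludes spaces, a name such as `my upload.tmp` is not covered by this pattern, which is worth knowing if temporary files may contain spaces.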

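6. Configuring a Multi-Channel Flume Topology

(1) Goal

Flume agent a1 processes the data from the data source and sends it out on two ports. Agent a2 receives the data, processes it, and sinks it to HDFS; agent a3 receives the data, processes it, and sinks it to the local file system.
[Figure: flow diagram of the a1 to a2/a3 topology]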

(2) Edit configuration file 1:

In the flume/conf directory, create the configuration file flumejob_a1.conf:

vi flumejob_a1.conf

# Flume multi-channel topology: agent a1


# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Replicate the data flow to every channel
a1.sources.r1.selector.type = replicating

# Describe/configure the source: watch a local file
# exec runs a command to read the file; tail -F follows it in real time
a1.sources.r1.type = exec
# The command to run; tail -F starts from the last 10 lines by default (see man tail)
# Follow the Hive operation log
a1.sources.r1.command = tail -F /tmp/root/hive.log
# The shell used to run the command; -c makes the shell read the command from the string that follows
# whereis bash
# bash: /usr/bin/bash /usr/share/man/man1/bash.1.gz
a1.sources.r1.shell = /usr/bin/bash -c

# Describe the sinks
# Send the data out on two ports
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = bigdata01
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = bigdata01
a1.sinks.k2.port = 4142

# Use channels which buffer events in memory: capacity 1000 events, at most 100 per transaction
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sinks to the channels
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Save and exit:
- :wq

(3) Edit configuration file 2:

In the flume/conf directory, create the configuration file flumejob_a2.conf:

vi flumejob_a2.conf

# Flume multi-channel topology: agent a2
# Receives data from a1 and sinks it to HDFS


# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source: an Avro source that receives data from a1
a2.sources.r1.type = avro
# Address and port to listen on
a2.sources.r1.bind = bigdata01
a2.sources.r1.port = 4141

# Describe the sink
# Sink type
a2.sinks.k1.type = hdfs
# HDFS path; escapes such as %Y%m%d/%H%M%S expand to the date and time. Adjust this for your cluster.
a2.sinks.k1.hdfs.path = hdfs://bigdata01:9000/flume1/%Y%m%d/%H-%M
# Prefix for uploaded files
a2.sinks.k1.hdfs.filePrefix = flume1-
# Whether to round the timestamp down so that directories roll over by time
a2.sinks.k1.hdfs.round = true
# Round down to a multiple of this many time units (default 1)
a2.sinks.k1.hdfs.roundValue = 1
# Time unit for rounding (default second); here a new directory every hour
a2.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a2.sinks.k1.hdfs.batchSize = 100
# File type (compression is supported); DataStream writes plain, uncompressed output
a2.sinks.k1.hdfs.fileType = DataStream
# How often to roll to a new file, in seconds
a2.sinks.k1.hdfs.rollInterval = 600
# File size that triggers a roll, in bytes (ideally about 128 MB)
a2.sinks.k1.hdfs.rollSize = 134217700
# 0 disables rolling based on the number of events
a2.sinks.k1.hdfs.rollCount = 0
# Minimum number of block replicas: 1 copy, no redundancy (Hadoop handles replication itself, so this usually needs no tuning; it affects when files roll)
a2.sinks.k1.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory: capacity 1000 events, at most 100 per transaction
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1


Save and exit:
- :wq

(4) Edit configuration file 3:

In the flume/conf directory, create the configuration file flumejob_a3.conf:

vi flumejob_a3.conf

# Flume multi-channel topology: agent a3
# Receives data from a1 and sinks it to the local file system


# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source: an Avro source that receives data from a1
a3.sources.r1.type = avro
# Address and port to listen on
a3.sources.r1.bind = bigdata01
a3.sources.r1.port = 4142

# Describe the sink
# Sink type: file_roll writes events to files in a local directory
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /root/flume1

# Use a channel which buffers events in memory: capacity 1000 events, at most 100 per transaction
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1


Save and exit:
- :wq
**Note:** the local directory must be created in advance.

(5) Run:

Start Flume agent a1:
- flume-ng agent --conf conf/ --name a1 --conf-file conf/flumejob_a1.conf
Start Flume agent a2:
- flume-ng agent --conf conf/ --name a2 --conf-file conf/flumejob_a2.conf
Start Flume agent a3:
- flume-ng agent --conf conf/ --name a3 --conf-file conf/flumejob_a3.conf
Start a1 first; a2 and a3 can then be started in either order.
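
The fan-out performed by a1 can be pictured with a toy model. This is only a sketch of the replicating idea, not Flume's implementation: `replicate` and `drain` are hypothetical names, while the channel names and ports are the ones used in the configuration files above.

```python
from collections import deque


def replicate(events, channel_names):
    """Copy every event into each named channel, as a replicating selector would."""
    channels = {name: deque() for name in channel_names}
    for event in events:
        for queue in channels.values():
            queue.append(event)
    return channels


def drain(channels, routes):
    """Deliver each channel's events to the destination its sink points at."""
    delivered = {}
    for channel_name, destination in routes.items():
        delivered[destination] = list(channels[channel_name])
        channels[channel_name].clear()
    return delivered
```

Calling `drain(replicate(lines, ["c1", "c2"]), {"c1": "bigdata01:4141", "c2": "bigdata01:4142"})` gives both destinations the full list of lines, which is why a2 and a3 each see every Hive log entry.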
