An Introduction to Kafka's Management Tools

Posted by 五柳-先生 on 2016-04-05

  Kafka ships with a number of management scripts, all located in the $KAFKA_HOME/bin directory; the classes behind them live in the source tree under kafka/core/src/main/scala/kafka/tools/.

Consumer Offset Checker

  The Consumer Offset Checker runs the kafka.tools.ConsumerOffsetChecker class through the kafka-consumer-offset-checker.sh script. For each consumer it reports the Group, Topic, partition ID, the Offset consumed so far in that partition, the partition's logSize, the Lag, and the Owner.

Running the kafka-consumer-offset-checker.sh script without any arguments prints the following usage information:

[iteblog@www.iteblog.com /]$ bin/kafka-consumer-offset-checker.sh
Check the offset of your consumers.
Option                                  Description                            
------                                  -----------                            
--broker-info                           Print broker info                      
--group                                 Consumer group.                        
--help                                  Print this message.                    
--retry.backoff.ms <Integer>            Retry back-off to use for failed       
                                          offset queries. (default: 3000)      
--socket.timeout.ms <Integer>           Socket timeout to use when querying    
                                          for offsets. (default: 6000)         
--topic                                 Comma-separated list of consumer       
                                          topics (all topics if absent).       
--zookeeper                             ZooKeeper connect string. (default:    
                                          localhost:2181)

Following those hints, we run the command like this:

[iteblog@www.iteblog.com /]$ bin/kafka-consumer-offset-checker.sh --zookeeper www.iteblog.com:2181 --topic test --group spark --broker-info
Group           Topic      Pid Offset          logSize         Lag             Owner
spark           test       0   34666914        34674392        7478            none
spark           test       1   34670481        34678029        7548            none
spark           test       2   34670547        34678002        7455            none
spark           test       3   34664512        34671961        7449            none
spark           test       4   34680143        34687562        7419            none
spark           test       5   34672309        34679823        7514            none
spark           test       6   34674660        34682220        7560            none
BROKER INFO
2 -> www.iteblog.com:9092
5 -> www.iteblog.com:9093
4 -> www.iteblog.com:9094
7 -> www.iteblog.com:9095
1 -> www.iteblog.com:9096
3 -> www.iteblog.com:9097
6 -> www.iteblog.com:9098
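
The Lag column is just logSize minus the consumed Offset: for partition 0 above, 34674392 - 34666914 = 7478 messages remain to be consumed.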

Dump Log Segment

  Sometimes we want to verify that a log's index is correct, or simply to print the messages in a log file directly. The kafka.tools.DumpLogSegments class does both. Let's first look at the options it takes:

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments 
Parse a log file and dump its contents to the console, useful for debugging a seemingly corrupt log segment.
Option                                  Description                            
------                                  -----------                            
--deep-iteration                        if set, uses deep instead of shallow   
                                          iteration                            
--files <file1, file2, ...>             REQUIRED: The comma separated list of  
                                          data and index log files to be dumped
--key-decoder-class                     if set, used to deserialize the keys.  
                                          This class should implement kafka.   
                                          serializer.Decoder trait. Custom jar 
                                          should be available in kafka/libs    
                                          directory. (default: kafka.          
                                          serializer.StringDecoder)            
--max-message-size <Integer: size>      Size of largest message. (default:     
                                          5242880)                             
--print-data-log                        if set, printing the messages content  
                                          when dumping data logs               
--value-decoder-class                   if set, used to deserialize the        
                                          messages. This class should          
                                          implement kafka.serializer.Decoder   
                                          trait. Custom jar should be          
                                          available in kafka/libs directory.   
                                          (default: kafka.serializer.          
                                          StringDecoder)                       
--verify-index-only                     if set, just verify the index log      
                                          without printing its content

  Clearly, --files is mandatory when running kafka.tools.DumpLogSegments: it takes the absolute paths of the log files inside a topic partition's directory. Where those partition directories live is determined by the log.dirs property in config/server.properties. For example, to inspect the log file /home/q/kafka/kafka_2.10-0.8.2.1/data/test-4/00000000000034245135.log, we can run the following command:

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /home/q/kafka/kafka_2.10-0.8.2.1/data/test-4/00000000000034245135.log
Dumping /home/q/kafka/kafka_2.10-0.8.2.1/data/test-4/00000000000034245135.log
Starting offset: 34245135
offset: 34245135 position: 0 isvalid: true payloadsize: 4213 magic: 0 compresscodec: NoCompressionCodec crc: 865449274 keysize: 4213
offset: 34245136 position: 8452 isvalid: true payloadsize: 4657 magic: 0 compresscodec: NoCompressionCodec crc: 4123037760 keysize: 4657
offset: 34245137 position: 17792 isvalid: true payloadsize: 3921 magic: 0 compresscodec: NoCompressionCodec crc: 541297511 keysize: 3921
offset: 34245138 position: 25660 isvalid: true payloadsize: 2290 magic: 0 compresscodec: NoCompressionCodec crc: 1346104996 keysize: 2290
offset: 34245139 position: 30266 isvalid: true payloadsize: 2284 magic: 0 compresscodec: NoCompressionCodec crc: 1930558677 keysize: 2284
offset: 34245140 position: 34860 isvalid: true payloadsize: 268 magic: 0 compresscodec: NoCompressionCodec crc: 57847488 keysize: 268
offset: 34245141 position: 35422 isvalid: true payloadsize: 263 magic: 0 compresscodec: NoCompressionCodec crc: 2964399224 keysize: 263
offset: 34245142 position: 35974 isvalid: true payloadsize: 1875 magic: 0 compresscodec: NoCompressionCodec crc: 647039113 keysize: 1875
offset: 34245143 position: 39750 isvalid: true payloadsize: 648 magic: 0 compresscodec: NoCompressionCodec crc: 865445580 keysize: 648
offset: 34245144 position: 41072 isvalid: true payloadsize: 556 magic: 0 compresscodec: NoCompressionCodec crc: 1174686061 keysize: 556
offset: 34245145 position: 42210 isvalid: true payloadsize: 4211 magic: 0 compresscodec: NoCompressionCodec crc: 3691302513 keysize: 4211
offset: 34245146 position: 50658 isvalid: true payloadsize: 2299 magic: 0 compresscodec: NoCompressionCodec crc: 2367114411 keysize: 2299
offset: 34245147 position: 55282 isvalid: true payloadsize: 642 magic: 0 compresscodec: NoCompressionCodec crc: 4122061921 keysize: 642
offset: 34245148 position: 56592 isvalid: true payloadsize: 4211 magic: 0 compresscodec: NoCompressionCodec crc: 3257991653 keysize: 4211
offset: 34245149 position: 65040 isvalid: true payloadsize: 2278 magic: 0 compresscodec: NoCompressionCodec crc: 2103489307 keysize: 2278
offset: 34245150 position: 69622 isvalid: true payloadsize: 269 magic: 0 compresscodec: NoCompressionCodec crc: 792857391 keysize: 269
offset: 34245151 position: 70186 isvalid: true payloadsize: 640 magic: 0 compresscodec: NoCompressionCodec crc: 791599616 keysize: 640

As you can see, the command prints each message's header fields and offset, but not the payload itself; pass --print-data-log to include it. To examine several log files at once, separate their paths with commas.
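
For example, to dump the same segment again with its payloads included, we simply add the flag:

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments --print-data-log --files /home/q/kafka/kafka_2.10-0.8.2.1/data/test-4/00000000000034245135.log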

Exporting a Group's Offsets from ZooKeeper

  Sometimes we need to export a consumer group's offsets for every partition. Kafka's kafka.tools.ExportZkOffsets class does exactly that. Let's look at the options it takes:

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.ExportZkOffsets
Export consumer offsets to an output file.
Option                                  Description                            
------                                  -----------                            
--group                                 Consumer group.                        
--help                                  Print this message.                    
--output-file                           Output file                            
--zkconnect                             ZooKeeper connect string. (default:    
                                          localhost:2181)

We supply the consumer group, the ZooKeeper connect string, and the path of the file to write:

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.ExportZkOffsets --group spark --zkconnect www.iteblog.com:2181 --output-file ~/offset

[iteblog@www.iteblog.com /]$ cat ~/offset
/consumers/spark/offsets/test/3:34846274
/consumers/spark/offsets/test/2:34852378
/consumers/spark/offsets/test/1:34852360
/consumers/spark/offsets/test/0:34848170
/consumers/spark/offsets/test/6:34857010
/consumers/spark/offsets/test/5:34854268
/consumers/spark/offsets/test/4:34861572

Note that --output-file must be specified, otherwise the tool fails. Each line of the output has the form /consumers/<group>/offsets/<topic>/<partition>:<offset>.

Fetching Metrics via JMX

  The kafka.tools.JmxTool class prints Kafka's metrics to standard output:

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.JmxTool
Dump JMX values to standard output.
Option                                  Description                            
------                                  -----------                            
--attributes <name>                     The whitelist of attributes to query.  
                                          This is a comma-separated list. If   
                                          no attributes are specified all      
                                          objects will be queried.             
--date-format <format>                  The date format to use for formatting  
                                          the time field. See java.text.       
                                          SimpleDateFormat for options.        
--help                                  Print usage information.               
--jmx-url <service-url>                 The url to connect to to poll JMX      
                                          data. See Oracle javadoc for         
                                          JMXServiceURL for details. (default: 
                                          service:jmx:rmi:///jndi/rmi://:      
                                          9999/jmxrmi)                         
--object-name <name>                    A JMX object name to use as a query.   
                                          This can contain wild cards, and     
                                          this option can be given multiple    
                                          times to specify more than one       
                                          query. If no objects are specified   
                                          all objects will be queried.         
--reporting-interval <Integer: ms>      Interval in MS with which to poll jmx  
                                          stats. (default: 2000) 

It can be used like this:

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://www.iteblog.com:1099/jmxrmi

The command above only works if JMX was enabled when the Kafka cluster was started, by running export JMX_PORT= with a port number before launching each broker. Once JMX is on, the command prints all of Kafka's metrics.
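
As a minimal sketch, to enable JMX on port 1099 (the port assumed by the --jmx-url in the command above), export the variable before starting each broker:

[iteblog@www.iteblog.com /]$ export JMX_PORT=1099
[iteblog@www.iteblog.com /]$ bin/kafka-server-start.sh config/server.properties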

Kafka Data Migration Tools

  There are two tools here: kafka.tools.KafkaMigrationTool and kafka.tools.MirrorMaker. The former migrates data from Kafka 0.7 to Kafka 0.8 (https://cwiki.apache.org/confluence/display/KAFKA/Migrating+from+0.7+to+0.8); the latter mirrors data between two Kafka clusters (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330). Both consume messages from the source cluster and publish them to the target.

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.KafkaMigrationTool --kafka.07.jar kafka-0.7.19.jar --zkclient.01.jar zkclient-0.2.0.jar --num.producers 16 --consumer.config=sourceCluster2Consumer.config --producer.config=targetClusterProducer.config --whitelist=.*

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config sourceCluster1Consumer.config --consumer.config sourceCluster2Consumer.config --num.streams 2 --producer.config targetClusterProducer.config --whitelist=".*"
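
Neither command runs without its config files. As a minimal sketch for the MirrorMaker case (the host names and group id below are illustrative, not from the original setup), the 0.8-era configs might contain:

# sourceCluster1Consumer.config: old-consumer settings pointing at the source cluster
zookeeper.connect=source.iteblog.com:2181
group.id=mirror-maker

# targetClusterProducer.config: old-producer settings pointing at the target cluster
metadata.broker.list=target.iteblog.com:9092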

Log Replay Tool

  This tool reads messages of a given topic from one Kafka cluster and sends them to a specified topic in another cluster. Running the script without arguments lists its options (a usage sketch follows the list):

[iteblog@www.iteblog.com /]$ bin/kafka-replay-log-producer.sh 
Missing required argument "[broker-list]"
Option                                  Description                            
------                                  -----------                            
--broker-list <hostname:port>           REQUIRED: the broker list must be      
                                          specified.                           
--inputtopic <input-topic>              REQUIRED: The topic to consume from.   
--messages <Integer: count>             The number of messages to send.        
                                          (default: -1)                        
--outputtopic <output-topic>            REQUIRED: The topic to produce to      
--property <producer properties>        A mechanism to pass properties in the  
                                          form key=value to the producer. This 
                                          allows the user to override producer 
                                          properties that are not exposed by   
                                          the existing command line arguments  
--reporting-interval <Integer: size>    Interval at which to print progress    
                                          info. (default: 5000)                
--sync                                  If set message send requests to the    
                                          brokers are synchronously, one at a  
                                          time as they arrive.                 
--threads <Integer: threads>            Number of sending threads. (default: 1)
--zookeeper <zookeeper url>             REQUIRED: The connection string for    
                                          the zookeeper connection in the form 
                                          host:port. Multiple URLS can be      
                                          given to allow fail-over. (default:  
                                          127.0.0.1:2181)
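
Putting the required options together, a usage sketch might look like the following (the output topic name here is illustrative):

[iteblog@www.iteblog.com /]$ bin/kafka-replay-log-producer.sh --zookeeper www.iteblog.com:2181 --broker-list www.iteblog.com:9092 --inputtopic test --outputtopic test-replay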

Simple Consumer Script

  The kafka-simple-consumer-shell.sh tool uses the Simple Consumer API to read data from a given partition of a topic and print it to the terminal:

[iteblog@www.iteblog.com /]$ bin/kafka-simple-consumer-shell.sh --broker-list www.iteblog.com:9092 --topic test --partition 0

Updating Offsets in ZooKeeper

  The kafka.tools.UpdateOffsetsInZK tool updates the offsets of every partition of a given topic in ZooKeeper, setting them to either earliest or latest:

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.UpdateOffsetsInZK
USAGE: kafka.tools.UpdateOffsetsInZK$ [earliest | latest] consumer.properties topic

We must specify whether to reset to earliest or latest, the path of a consumer.properties file, and the topic name.
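
For example, a sketch that rewinds the test topic's offsets to the earliest available position (the group acted on comes from the group.id inside the properties file; config/consumer.properties is the sample file shipped with Kafka):

[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.UpdateOffsetsInZK earliest config/consumer.properties test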
