Kafka ships with a number of management scripts, all located under $KAFKA_HOME/bin. The classes that implement them live in the source tree under kafka/core/src/main/scala/kafka/tools/.
Consumer Offset Checker
Consumer Offset Checker runs the kafka.tools.ConsumerOffsetChecker class through the kafka-consumer-offset-checker.sh script. It displays the consumer's group, topic, partition ID, the offset already consumed for each partition, the logSize, the lag, and the owner. Running kafka-consumer-offset-checker.sh without any arguments prints the following usage information:
[iteblog@www.iteblog.com /]$ bin/kafka-consumer-offset-checker.sh
Check the offset of your consumers.
Option Description
------ -----------
--broker-info Print broker info
--group Consumer group.
--help Print this message.
--retry.backoff.ms <Integer> Retry back-off to use for failed
offset queries. (default: 3000)
--socket.timeout.ms <Integer> Socket timeout to use when querying
for offsets. (default: 6000)
--topic Comma-separated list of consumer
topics (all topics if absent).
--zookeeper ZooKeeper connect string. (default:
localhost:2181)
Following the usage hints, we run the command like this:
[iteblog@www.iteblog.com /]$ bin/kafka-consumer-offset-checker.sh --zookeeper www.iteblog.com:2181 --topic test --group spark --broker-info
Group Topic Pid Offset logSize Lag Owner
spark test 0 34666914 34674392 7478 none
spark test 1 34670481 34678029 7548 none
spark test 2 34670547 34678002 7455 none
spark test 3 34664512 34671961 7449 none
spark test 4 34680143 34687562 7419 none
spark test 5 34672309 34679823 7514 none
spark test 6 34674660 34682220 7560 none
BROKER INFO
2 -> www.iteblog.com:9092
5 -> www.iteblog.com:9093
4 -> www.iteblog.com:9094
7 -> www.iteblog.com:9095
1 -> www.iteblog.com:9096
3 -> www.iteblog.com:9097
6 -> www.iteblog.com:9098
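Since the checker reports Lag per partition, a quick way to get the group's total backlog is to sum the sixth column of the table. A minimal sketch with awk, fed with a few rows captured from the output above (when piping live checker output, skip the header row first, e.g. with NR>1):

```shell
# Sum the per-partition Lag column (field 6) of kafka-consumer-offset-checker
# output to get the consumer group's total backlog.
sample='spark test 0 34666914 34674392 7478 none
spark test 1 34670481 34678029 7548 none
spark test 2 34670547 34678002 7455 none'

echo "$sample" | awk '{ total += $6 } END { print total }'   # prints 22481
```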
Dump Log Segment
Sometimes we need to verify that a log's index is correct, or simply want to print messages straight from a log file. The kafka.tools.DumpLogSegments class does both; let's look at the parameters it takes:
[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments
Parse a log file and dump its contents to the console, useful for debugging a seemingly corrupt log segment.
Option Description
------ -----------
--deep-iteration if set, uses deep instead of shallow
iteration
--files <file1, file2, ...> REQUIRED: The comma separated list of
data and index log files to be dumped
--key-decoder-class if set, used to deserialize the keys.
This class should implement kafka.
serializer.Decoder trait. Custom jar
should be available in kafka/libs
directory. (default: kafka.
serializer.StringDecoder)
--max-message-size <Integer: size> Size of largest message. (default:
5242880)
--print-data-log if set, printing the messages content
when dumping data logs
--value-decoder-class if set, used to deserialize the
messages. This class should
implement kafka.serializer.Decoder
trait. Custom jar should be
available in kafka/libs directory.
(default: kafka.serializer.
StringDecoder)
--verify-index-only if set, just verify the index log
without printing its content
Clearly, --files is mandatory when using kafka.tools.DumpLogSegments: it takes the absolute paths of the log files inside a topic's partition directory. The partition directories are determined by the log.dirs setting in config/server.properties. For example, to inspect the log file /home/q/kafka/kafka_2.10-0.8.2.1/data/test-4/00000000000034245135.log, we can use the following command:
[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /home/q/kafka/kafka_2.10-0.8.2.1/data/test-4/00000000000034245135.log
Dumping /home/q/kafka/kafka_2.10-0.8.2.1/data/test-4/00000000000034245135.log
Starting offset: 34245135
offset: 34245135 position: 0 isvalid: true payloadsize: 4213 magic: 0 compresscodec: NoCompressionCodec crc: 865449274 keysize: 4213
offset: 34245136 position: 8452 isvalid: true payloadsize: 4657 magic: 0 compresscodec: NoCompressionCodec crc: 4123037760 keysize: 4657
offset: 34245137 position: 17792 isvalid: true payloadsize: 3921 magic: 0 compresscodec: NoCompressionCodec crc: 541297511 keysize: 3921
offset: 34245138 position: 25660 isvalid: true payloadsize: 2290 magic: 0 compresscodec: NoCompressionCodec crc: 1346104996 keysize: 2290
offset: 34245139 position: 30266 isvalid: true payloadsize: 2284 magic: 0 compresscodec: NoCompressionCodec crc: 1930558677 keysize: 2284
offset: 34245140 position: 34860 isvalid: true payloadsize: 268 magic: 0 compresscodec: NoCompressionCodec crc: 57847488 keysize: 268
offset: 34245141 position: 35422 isvalid: true payloadsize: 263 magic: 0 compresscodec: NoCompressionCodec crc: 2964399224 keysize: 263
offset: 34245142 position: 35974 isvalid: true payloadsize: 1875 magic: 0 compresscodec: NoCompressionCodec crc: 647039113 keysize: 1875
offset: 34245143 position: 39750 isvalid: true payloadsize: 648 magic: 0 compresscodec: NoCompressionCodec crc: 865445580 keysize: 648
offset: 34245144 position: 41072 isvalid: true payloadsize: 556 magic: 0 compresscodec: NoCompressionCodec crc: 1174686061 keysize: 556
offset: 34245145 position: 42210 isvalid: true payloadsize: 4211 magic: 0 compresscodec: NoCompressionCodec crc: 3691302513 keysize: 4211
offset: 34245146 position: 50658 isvalid: true payloadsize: 2299 magic: 0 compresscodec: NoCompressionCodec crc: 2367114411 keysize: 2299
offset: 34245147 position: 55282 isvalid: true payloadsize: 642 magic: 0 compresscodec: NoCompressionCodec crc: 4122061921 keysize: 642
offset: 34245148 position: 56592 isvalid: true payloadsize: 4211 magic: 0 compresscodec: NoCompressionCodec crc: 3257991653 keysize: 4211
offset: 34245149 position: 65040 isvalid: true payloadsize: 2278 magic: 0 compresscodec: NoCompressionCodec crc: 2103489307 keysize: 2278
offset: 34245150 position: 69622 isvalid: true payloadsize: 269 magic: 0 compresscodec: NoCompressionCodec crc: 792857391 keysize: 269
offset: 34245151 position: 70186 isvalid: true payloadsize: 640 magic: 0 compresscodec: NoCompressionCodec crc: 791599616 keysize: 640
As you can see, this command prints each message's header fields and offset, but not the message payload itself; add --print-data-log to include it. To inspect several log files at once, separate them with commas.
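Since --files needs the segment's full path, you first have to find the segment inside the partition directory under log.dirs. A small helper sketch that picks the highest-base-offset .log file in a partition directory (the directory path in the usage comment is illustrative):

```shell
# Return the .log segment with the highest base offset in a partition
# directory. Segment file names are zero-padded base offsets, so a lexical
# sort is also a numeric sort.
latest_segment() {
  dir="$1"
  ls "$dir"/*.log | sort | tail -n 1
}

# Usage sketch (path is illustrative):
#   bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
#     --files "$(latest_segment /home/q/kafka/kafka_2.10-0.8.2.1/data/test-4)" \
#     --print-data-log
```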
Exporting Group Offsets from ZooKeeper
Sometimes we need to export the per-partition offsets of a consumer group. Kafka's kafka.tools.ExportZkOffsets class does exactly that; here are its parameters:
[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.ExportZkOffsets
Export consumer offsets to an output file.
Option Description
------ -----------
--group Consumer group.
--help Print this message.
--output-file Output file
--zkconnect ZooKeeper connect string. (default:
localhost:2181)
We supply the consumer group, the ZooKeeper address, and a path for the output file:
[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.ExportZkOffsets --group spark --zkconnect www.iteblog.com:2181 --output-file ~/offset
[iteblog@www.iteblog.com /]$ vim ~/offset
/consumers/spark/offsets/test/3:34846274
/consumers/spark/offsets/test/2:34852378
/consumers/spark/offsets/test/1:34852360
/consumers/spark/offsets/test/0:34848170
/consumers/spark/offsets/test/6:34857010
/consumers/spark/offsets/test/5:34854268
/consumers/spark/offsets/test/4:34861572
Note that --output-file must be specified, or the tool exits with an error.
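Each line of the export file has the form /consumers/&lt;group&gt;/offsets/&lt;topic&gt;/&lt;partition&gt;:&lt;offset&gt;, so it is easy to post-process. A sketch that turns the lines into "partition offset" pairs sorted by partition (sample lines copied from the file above):

```shell
# Convert ExportZkOffsets output into "partition offset" pairs sorted by
# partition, splitting each path on '/' and ':'.
sample='/consumers/spark/offsets/test/3:34846274
/consumers/spark/offsets/test/0:34848170
/consumers/spark/offsets/test/1:34852360'

echo "$sample" | awk -F'[/:]' '{ print $(NF-1), $NF }' | sort -n
# prints:
# 0 34848170
# 1 34852360
# 3 34846274
```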
Fetching Metrics via JMX
We can print Kafka's metrics with the kafka.tools.JmxTool class.
[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.JmxTool
Dump JMX values to standard output.
Option Description
------ -----------
--attributes <name> The whitelist of attributes to query.
This is a comma-separated list. If
no attributes are specified all
objects will be queried.
--date-format <format> The date format to use for formatting
the time field. See java.text.
SimpleDateFormat for options.
--help Print usage information.
--jmx-url <service-url> The url to connect to to poll JMX
data. See Oracle javadoc for
JMXServiceURL for details. (default:
service:jmx:rmi:///jndi/rmi://:
9999/jmxrmi)
--object-name <name> A JMX object name to use as a query.
This can contain wild cards, and
this option can be given multiple
times to specify more than one
query. If no objects are specified
all objects will be queried.
--reporting-interval <Integer: ms> Interval in MS with which to poll jmx
stats. (default: 2000)
It can be used like this:
[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://www.iteblog.com:1099/jmxrmi
Running the command above requires that the Kafka brokers were started with the JMX_PORT environment variable set (e.g. export JMX_PORT=1099, matching the port in the URL above), which is what enables JMX. With that in place, the command prints all of Kafka's metrics.
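The --jmx-url value follows the standard JMXServiceURL format; a minimal sketch of composing it from a broker host and the JMX port you exported (host and port values are illustrative, matching the example above):

```shell
# Compose the service URL that JmxTool's --jmx-url option expects.
host=www.iteblog.com
port=1099   # should match the JMX_PORT exported before starting the broker
url="service:jmx:rmi:///jndi/rmi://${host}:${port}/jmxrmi"
echo "$url"   # prints service:jmx:rmi:///jndi/rmi://www.iteblog.com:1099/jmxrmi
```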
Kafka Data Migration Tools
There are two main tools here: kafka.tools.KafkaMigrationTool and kafka.tools.MirrorMaker. The first migrates data from Kafka 0.7 to Kafka 0.8 (https://cwiki.apache.org/confluence/display/KAFKA/Migrating+from+0.7+to+0.8); the second mirrors data between two Kafka clusters (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330). Both consume messages from the source side and publish them to the target side.
[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.KafkaMigrationTool --kafka.07.jar kafka-0.7.19.jar --zkclient.01.jar zkclient-0.2.0.jar --num.producers 16 --consumer.config=sourceCluster2Consumer.config --producer.config=targetClusterProducer.config --whitelist=.*
[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config sourceCluster1Consumer.config --consumer.config sourceCluster2Consumer.config --num.streams 2 --producer.config targetClusterProducer.config --whitelist=".*"
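Both commands read plain property files for their consumer and producer settings. A minimal sketch of what the MirrorMaker configs above might contain, assuming 0.8-era property names (the hostnames and group name are illustrative):

```properties
# sourceCluster1Consumer.config -- consumes from the source cluster
zookeeper.connect=source-zk:2181
group.id=mirror-maker

# targetClusterProducer.config -- produces to the target cluster
metadata.broker.list=target-broker:9092
```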
Log Replay Tool
This tool reads messages from a given topic in one Kafka cluster and sends them to a specified topic in another cluster:
[iteblog@www.iteblog.com /]$ bin/kafka-replay-log-producer.sh
Missing required argument "[broker-list]"
Option Description
------ -----------
--broker-list <hostname:port> REQUIRED: the broker list must be
specified.
--inputtopic <input-topic> REQUIRED: The topic to consume from.
--messages <Integer: count> The number of messages to send.
(default: -1)
--outputtopic <output-topic> REQUIRED: The topic to produce to
--property <producer properties> A mechanism to pass properties in the
form key=value to the producer. This
allows the user to override producer
properties that are not exposed by
the existing command line arguments
--reporting-interval <Integer: size> Interval at which to print progress
info. (default: 5000)
--sync If set message send requests to the
brokers are synchronously, one at a
time as they arrive.
--threads <Integer: threads> Number of sending threads. (default: 1)
--zookeeper <zookeeper url> REQUIRED: The connection string for
the zookeeper connection in the form
host:port. Multiple URLS can be
given to allow fail-over. (default:
127.0.0.1:2181)
Simple Consumer Script
The kafka-simple-consumer-shell.sh tool uses the Simple Consumer API to read data from a specified partition of a topic and print it to the terminal:
bin/kafka-simple-consumer-shell.sh --broker-list www.iteblog.com:9092 --topic test --partition 0
Updating Offsets in ZooKeeper
The kafka.tools.UpdateOffsetsInZK tool updates the offsets of all partitions of a specified topic in ZooKeeper; they can be set to either earliest or latest:
[iteblog@www.iteblog.com /]$ bin/kafka-run-class.sh kafka.tools.UpdateOffsetsInZK
USAGE: kafka.tools.UpdateOffsetsInZK$ [earliest | latest] consumer.properties topic
You must specify whether to update to earliest or latest, the path to a consumer.properties file, and the topic name.
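Since the first argument is restricted to earliest or latest, a small wrapper can validate it up front. A dry-run sketch (it echoes the command instead of executing it; the function name and file paths are illustrative):

```shell
# Validate the reset mode, then print the UpdateOffsetsInZK command that
# would be run (dry run -- replace 'echo' with the real invocation).
reset_offsets() {
  mode="$1"; props="$2"; topic="$3"
  case "$mode" in
    earliest|latest) ;;
    *) echo "mode must be 'earliest' or 'latest'" >&2; return 1 ;;
  esac
  echo "bin/kafka-run-class.sh kafka.tools.UpdateOffsetsInZK $mode $props $topic"
}

reset_offsets earliest config/consumer.properties test
```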
Unless otherwise stated, every post on this blog is original work!
Please respect the original work and credit the source when reprinting: 過往記憶 (http://www.iteblog.com/)
Permalink: An Introduction to Kafka Management Tools (http://www.iteblog.com/archives/1605)