Message Middleware: Kafka Study Notes (Part 1)

Posted by 石頭城程式猿 on 2020-11-02

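I. Feature overview:

Kafka was originally developed at LinkedIn. It is a distributed messaging system that supports partitions and multiple replicas and is coordinated through ZooKeeper. Its most notable feature is processing large volumes of data in real time, which suits a range of scenarios:

  • Hadoop-based batch processing systems;
  • low-latency real-time systems;
  • Storm/Spark stream processing engines, web/nginx logs, access logs, messaging services, and so on.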

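Kafka is written in Scala and Java. LinkedIn open-sourced it in early 2011 and contributed it to the Apache Software Foundation, where it became a top-level open source project in 2012. Javadoc: http://kafka.apache.org/25/javadoc/index.html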

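II. Producer + broker + consumer:

     1. The producer sends messages and supports sending them in batches;

     2. The producer sends messages and supports sending batches on a timer;

     3. The producer sends messages and supports committing batches in a transaction;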
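
As far as I know, in the Java client these three modes are driven by the producer settings `batch.size`, `linger.ms`, and `transactional.id`. To illustrate the first two, here is a minimal Python sketch of a buffer that is flushed either when it is full or when a linger timeout expires. The class `BatchAccumulator`, its parameters, and the explicit clock argument are hypothetical and do not belong to any Kafka client. It also counts records, whereas the real `batch.size` is measured in bytes.

```python
class BatchAccumulator:
    """Toy model of producer batching: flush on size or on linger timeout."""

    def __init__(self, max_records, linger_ms):
        self.max_records = max_records   # flush when this many records are buffered
        self.linger_ms = linger_ms       # flush when the oldest record has waited this long
        self.buffer = []
        self.first_append_ms = None      # arrival time of the oldest buffered record
        self.sent_batches = []           # stands in for network sends to the broker

    def append(self, record, now_ms):
        if not self.buffer:
            self.first_append_ms = now_ms
        self.buffer.append(record)
        if len(self.buffer) >= self.max_records:
            self.flush()

    def poll(self, now_ms):
        """Called periodically; flushes a non-empty batch whose linger time has expired."""
        if self.buffer and now_ms - self.first_append_ms >= self.linger_ms:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sent_batches.append(list(self.buffer))
            self.buffer.clear()
            self.first_append_ms = None


acc = BatchAccumulator(max_records=3, linger_ms=10)
for i in range(4):
    acc.append(f"msg-{i}", now_ms=0)   # the third append fills the batch
acc.poll(now_ms=5)                      # too early: msg-3 keeps waiting
acc.poll(now_ms=10)                     # linger expired: msg-3 is sent alone
print(acc.sent_batches)
```

The size trigger favours throughput and the timer bounds how long a lone message can wait, which is the trade-off between items 1 and 2.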

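Kafka topics are consumed per consumer group: each group that subscribes to a topic receives that topic's messages. A group can contain multiple consumers. Within a group, consumption is divided by partition: one consumer can consume one or more partitions, but each partition is consumed by only one consumer in that group.

Rules

1. If a consumer group has more consumers than the topic has partitions, the extra consumers sit idle, so simply adding consumers does not necessarily increase consumption capacity.

2. If a consumer group has fewer consumers than the topic has partitions, some consumers will consume more than one partition. The ideal case is a consumer count equal to the partition count.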
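
Both rules can be seen in a small Python sketch that spreads partitions over the consumers of one group. The function `assign_partitions` and the consumer names are hypothetical. The logic is a simplified range-style split, not the code of Kafka's actual assignors.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers; each partition gets exactly one owner."""
    assignment = {c: [] for c in consumers}
    if not consumers:
        return assignment
    base, extra = divmod(len(partitions), len(consumers))
    start = 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)   # first `extra` consumers get one more
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


partitions = [0, 1, 2]   # e.g. a topic with three partitions

print(assign_partitions(partitions, ["c1"]))
print(assign_partitions(partitions, ["c1", "c2"]))
print(assign_partitions(partitions, ["c1", "c2", "c3", "c4"]))
```

With four consumers and three partitions, the fourth consumer ends up with an empty list, which is the idle case from rule 1.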

III. Producer + broker (multiple partitions) + consumer:

 

 

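IV. Producer + broker (multiple partitions + multiple replicas) + consumer:

     See section V: a single consumer consuming data from three partitions.

 

V. Producer + broker (multiple partitions + multiple replicas) + consumer groups (multiple):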

 

Offset information for each consumer in a consumer group:

The __consumer_offsets topic is created automatically by Kafka and has 50 partitions by default.

[root@hadoop03 kafka-logs]# ll
total 224
-rw-r--r--. 1 root root    0 Mar 24 13:19 cleaner-offset-checkpoint
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-0
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-1
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-10
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-11
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-12
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-13
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-14
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-15
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-16
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-17
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-18
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-19
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-2
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-20
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-21
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-22
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-23
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-24
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-25
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-26
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-27
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-28
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-29
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-3
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-30
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-31
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-32
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-33
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-34
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-35
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-36
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-37
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-38
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-39
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-4
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-40
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-41
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-42
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-43
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-44
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-45
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-46
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-47
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-48
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-49
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-5
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-6
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-7
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-8
drwxr-xr-x. 2 root root 4096 Mar 24 13:46 __consumer_offsets-9
drwxr-xr-x. 2 root root 4096 Mar 24 13:54 friend-0
drwxr-xr-x. 2 root root 4096 Mar 24 13:54 friend-1
drwxr-xr-x. 2 root root 4096 Mar 24 13:54 friend-2
-rw-r--r--. 1 root root   54 Mar 24 13:19 meta.properties
-rw-r--r--. 1 root root 1228 Mar 24 13:54 recovery-point-offset-checkpoint
-rw-r--r--. 1 root root 1228 Mar 24 13:54 replication-offset-checkpoint
[root@hadoop03 kafka-logs]# /opt/kafka/bin/kafka-topics.sh --describe --topic __consumer_offsets --zookeeper localhost:2181
Topic:__consumer_offsets        PartitionCount:50       ReplicationFactor:3     Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
        Topic: __consumer_offsets       Partition: 0    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 1    Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: __consumer_offsets       Partition: 2    Leader: 0       Replicas: 0,1,2 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 3    Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 4    Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 5    Leader: 0       Replicas: 0,2,1 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 6    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 7    Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: __consumer_offsets       Partition: 8    Leader: 0       Replicas: 0,1,2 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 9    Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 10   Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 11   Leader: 0       Replicas: 0,2,1 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 12   Leader: 1       Replicas: 1,2,0 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 13   Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: __consumer_offsets       Partition: 14   Leader: 0       Replicas: 0,1,2 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 15   Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 16   Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 17   Leader: 0       Replicas: 0,2,1 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 18   Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 19   Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: __consumer_offsets       Partition: 20   Leader: 0       Replicas: 0,1,2 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 21   Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 22   Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 23   Leader: 0       Replicas: 0,2,1 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 24   Leader: 1       Replicas: 1,2,0 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 25   Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: __consumer_offsets       Partition: 26   Leader: 0       Replicas: 0,1,2 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 27   Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 28   Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 29   Leader: 0       Replicas: 0,2,1 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 30   Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 31   Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: __consumer_offsets       Partition: 32   Leader: 0       Replicas: 0,1,2 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 33   Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 34   Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 35   Leader: 0       Replicas: 0,2,1 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 36   Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 37   Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: __consumer_offsets       Partition: 38   Leader: 0       Replicas: 0,1,2 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 39   Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 40   Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 41   Leader: 0       Replicas: 0,2,1 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 42   Leader: 1       Replicas: 1,2,0 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 43   Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: __consumer_offsets       Partition: 44   Leader: 0       Replicas: 0,1,2 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 45   Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
        Topic: __consumer_offsets       Partition: 46   Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 47   Leader: 0       Replicas: 0,2,1 Isr: 2,1,0
        Topic: __consumer_offsets       Partition: 48   Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
        Topic: __consumer_offsets       Partition: 49   Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
[root@hadoop03 kafka-logs]# 
[root@hadoop03 kafka-logs]# /opt/kafka/bin/kafka-topics.sh --describe --topic friend --zookeeper localhost:2181                  
Topic:friend    PartitionCount:3        ReplicationFactor:3     Configs:
        Topic: friend   Partition: 0    Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: friend   Partition: 1    Leader: 0       Replicas: 0,1,2 Isr: 0,1,2
        Topic: friend   Partition: 2    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
[root@hadoop03 kafka-logs]#
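
Which of the 50 partitions listed above holds a given group's offsets? To my understanding, the broker takes the absolute value of the Java hash code of the group id modulo the partition count. The Python sketch below reproduces that calculation under this assumption. The helper names and the group id "ab" are hypothetical, and the hash emulation is only meant for group ids made of ordinary (BMP) characters.

```python
def java_string_hashcode(s):
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id, num_partitions=50):
    """Pick the __consumer_offsets partition for a group: abs(hash) % partition count."""
    h = java_string_hashcode(group_id)
    # Java's Math.abs(Integer.MIN_VALUE) is still negative, so map that case to 0.
    h = 0 if h == -0x80000000 else abs(h)
    return h % num_partitions


print(offsets_partition_for("ab"))   # "ab".hashCode() is 3105, and 3105 % 50 == 5
```

All offset commits of one group therefore land in a single partition, served by that partition's leader broker.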

 

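VI. Producer + broker (multiple partitions, each made of multiple segments, + multiple replicas) + consumer groups (multiple):

 Kafka splits a partition into many segment files. Each segment has two parts:
- index file: the index, which maps message offsets to positions in the log file
- log file: the file that actually stores the data

A partition is made up of multiple segment files of roughly equal size.

What is a segment file?

Each partition behaves like one huge file made up of many smaller segment files. The segments share the same size limit, but they do not necessarily hold the same number of messages. This design makes it quick to delete old segment files.

A partition only needs to support sequential reads and writes. The lifetime of a segment is determined by server-side configuration parameters.

A segment file has two main parts: an index file ending in .index and a data file ending in .log. The two files share the same name and differ only in their extension.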
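
The shared file name is what makes lookups cheap. The notes above do not say how the name is chosen; the sketch below assumes the usual convention that a segment is named after the offset of its first message, zero-padded to 20 digits. With that assumption, finding the segment for an offset is a binary search over the sorted base offsets. The function names and the sample offsets are hypothetical.

```python
import bisect


def segment_file_names(base_offset):
    """Return the (.index, .log) file names for a segment, zero-padded to 20 digits."""
    stem = f"{base_offset:020d}"
    return f"{stem}.index", f"{stem}.log"


def find_segment(base_offsets, target_offset):
    """Binary-search the sorted base offsets for the segment holding target_offset."""
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset is older than the first retained segment")
    return base_offsets[i]


base_offsets = [0, 368769, 737337]          # hypothetical segments of one partition
base = find_segment(base_offsets, 400000)   # falls into the segment starting at 368769
print(segment_file_names(base))
```

Once the segment is known, its .index file narrows the search to a position inside the matching .log file.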

 

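VII. How does Kafka let multiple consumer groups each consume the same messages?

  __consumer_offsets is a topic that Kafka creates automatically. It records the offset that each consumer group has reached in every partition it consumes. Because offsets are tracked per group, each group reads the topic independently of the others.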
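
A toy Python model of this bookkeeping: the log is shared, while the read position is stored per (group, topic, partition). The `poll` function, the group names, and the in-memory dictionary are hypothetical stand-ins for what the brokers persist in __consumer_offsets.

```python
log = ["m0", "m1", "m2", "m3"]   # one partition of a hypothetical topic "friend"
committed = {}                    # (group, topic, partition) -> next offset to read


def poll(group, topic, partition, max_records):
    """Read from the group's committed offset, then commit the new position."""
    key = (group, topic, partition)
    start = committed.get(key, 0)
    records = log[start:start + max_records]
    committed[key] = start + len(records)
    return records


print(poll("group-a", "friend", 0, 3))   # group-a reads m0..m2
print(poll("group-b", "friend", 0, 2))   # group-b starts from its own offset 0
print(poll("group-a", "friend", 0, 3))   # group-a continues with m3
print(committed)
```

Reading never removes anything from the log, so a second group simply starts from its own offset and sees the same messages again.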

 

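VIII. Kafka's ACK mechanism:

request.required.acks takes one of three values: 0, 1, or -1. (In the newer Java producer the equivalent setting is named acks.)

0: The producer does not wait for an ack from the broker. This gives the lowest latency but the weakest durability guarantee: data is lost if the server goes down.

1: The producer waits for an ack that the leader sends once the leader replica has written the message. If the leader fails before the followers have copied the message, the new leader will not have it, so data can still be lost.

-1: Building on 1, the leader sends its ack only after all in-sync follower replicas have received the message, so an acknowledged message is not lost as long as at least one in-sync replica survives.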
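
The three levels can be compared with a deliberately simplified Python model that counts how many replicas hold a record at the moment the producer considers it sent. Both functions are hypothetical, and the model ignores retries, unclean leader election, and minimum in-sync replica settings.

```python
def acked_after(acks, isr_size):
    """Number of replicas that must hold a record before the producer sees an ack."""
    if acks == 0:
        return 0            # fire and forget: no broker confirmation at all
    if acks == 1:
        return 1            # only the leader has written the record
    if acks == -1:
        return isr_size     # leader plus every other in-sync replica
    raise ValueError("acks must be 0, 1 or -1")


def survives_leader_failure(acks, isr_size):
    """True if an acknowledged record is guaranteed to exist on some surviving replica."""
    return acked_after(acks, isr_size) >= 2


for acks in (0, 1, -1):
    print(acks, survives_leader_failure(acks, isr_size=3))
print(-1, survives_leader_failure(-1, isr_size=1))   # ISR shrunk to the leader only
```

The last line shows the caveat for -1: if the in-sync set has shrunk to the leader alone, the guarantee is no stronger than with 1.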
