A Super-Detailed Kafka Tutorial

Posted by 女友在高考 on 2021-09-05

Original article: https://www.cnblogs.com/javammc/p/15230299.html

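I. Kafka Concepts and Getting Started

Kafka is a messaging system, originally developed at LinkedIn and open-sourced in 2011.

Kafka is a distributed messaging system based on the publish/subscribe model. Its main design goals are:

Provide message persistence with O(1) time complexity, so that access stays constant-time even for terabytes of data or more.
High throughput: even on very cheap commodity hardware, a single machine can handle more than 100K messages per second.
Support message partitioning across Kafka servers and distributed consumption, while guaranteeing in-order delivery within each partition.
Support both offline and real-time data processing.
Support online horizontal scaling.

Consumers subscribe to messages from the broker in pull mode.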

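Mode	Advantages	Disadvantages
pull	The consumer decides its fetch strategy according to its own processing capacity	Polls in an empty loop when there are no messages (to avoid this, Kafka has a parameter that makes the fetch block until new messages arrive)
push	Low latency	When consumers are much slower than producers, messages pile up on the client and can even crash it. The server must also track the state of every delivery so that it can retry failed ones.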
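
The blocking pull described in the table can be pictured with a small in-memory sketch. This is not Kafka code: `PullDemo` and its methods are hypothetical stand-ins for a broker, and a `BlockingQueue` plays the role of a partition. It only shows how a pull with a timeout waits for data instead of spinning.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for a broker partition; not a Kafka API.
final class PullDemo {
    private final BlockingQueue<String> log = new LinkedBlockingQueue<>();

    void publish(String message) {
        log.add(message);
    }

    // Long-poll style pull: blocks for up to timeoutMs instead of busy-polling,
    // and returns null if nothing arrived in time.
    String pull(long timeoutMs) {
        try {
            return log.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        PullDemo broker = new PullDemo();
        broker.publish("m1");
        System.out.println(broker.pull(100)); // a message is already available
        System.out.println(broker.pull(100)); // waits about 100 ms, then gives up
    }
}
```

The consumer sets the pace here: it asks for data when it is ready, and the timeout bounds how long an idle call waits.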

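Basic Kafka Concepts

Broker: a Kafka cluster consists of one or more servers, each of which is called a broker.
Topic: every message published to a Kafka cluster belongs to a category, and that category is called a topic.
Partition: a partition is a physical concept; each topic contains one or more partitions.
Producer: publishes messages to Kafka brokers.
Consumer: the message consumer, a client that reads messages from Kafka brokers.
Consumer Group: every consumer belongs to a specific consumer group. A group name can be set for each consumer; older clients fell back to a default group when none was given, while the current Java client requires group.id before subscribe() can be used.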

Single-node deployment layout

Cluster deployment layout

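Topic and Partition

A topic can contain one or more partitions. Because the storage and performance of a single machine are limited, multiple partitions exist to support horizontal scaling and parallel processing.

Partition and Replica

Splitting a topic into partitions spreads the message load across several machines, but if the data of one partition is lost, a piece of the overall data is gone. This is why the concept of a replica was introduced.

Each partition can be given several replicas through the replication factor. That way, even if one machine fails, other machines still hold a copy of the data.

3 partitions with 3 replicas in a cluster: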
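
One way to picture such a layout is the simplified placement rule sketched below. It is an illustration only: `ReplicaLayout` is a hypothetical helper, and the rule "replica r of partition p goes to broker (p + r) % brokers" is an assumption chosen for clarity, not Kafka's actual assignment algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the placement rule is a simplification, not Kafka's algorithm.
final class ReplicaLayout {
    // For each partition, returns the broker ids holding its replicas.
    // The first id in each list is treated as the leader.
    static List<List<Integer>> assign(int partitions, int replicationFactor, int brokers) {
        if (replicationFactor > brokers) {
            throw new IllegalArgumentException("replication factor cannot exceed broker count");
        }
        List<List<Integer>> layout = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                replicas.add((p + r) % brokers);
            }
            layout.add(replicas);
        }
        return layout;
    }

    public static void main(String[] args) {
        List<List<Integer>> layout = assign(3, 3, 3);
        for (int p = 0; p < layout.size(); p++) {
            System.out.println("partition " + p + " -> brokers " + layout.get(p));
        }
    }
}
```

With 3 brokers, every broker ends up holding one copy of every partition, and each partition has a different leader, so losing one broker leaves two copies of everything.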

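II. Installation and Deployment

1. Download Kafka from http://kafka.apache.org/downloads

I used version 2.7.0. Extract the archive after downloading.

Note: choose one of the Binary downloads, not the Source download.

2. Open config/server.properties and uncomment the following setting

listeners=PLAINTEXT://localhost:9092

3. Start ZooKeeper

Install it yourself; I used ZooKeeper 3.7.

4. Start Kafka

bin/kafka-server-start.sh config/server.properties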

Kafka Command Line

List topics (the --zookeeper flag works on 2.7.0 but was removed in Kafka 3.0; on newer versions pass --bootstrap-server localhost:9092 instead)

bin/kafka-topics.sh --zookeeper localhost:2181 --list

Create a topic

bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test1 --partitions 4 --replication-factor 1

Describe a topic

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test1

Consume messages

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic test1

Produce messages

bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test1

Simple performance test:

bin/kafka-producer-perf-test.sh --topic test1 --num-records 100000 --record-size 1000 --throughput 2000 --producer-props bootstrap.servers=localhost:9092

bin/kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 --topic test1 --fetch-size 1048576 --messages 100000 --threads 1

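Java Client

Producer:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleKafkaProducer {

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("bootstrap.servers", "192.168.157.200:9092");

        // try-with-resources closes the producer, which also flushes buffered records
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(properties)) {
            // A record without a key, sent to topic test1
            ProducerRecord<String, String> record = new ProducerRecord<>("test1", "This is a message");
            producer.send(record);
        }
    }
}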

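Consumer:

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleKafkaConsumer {

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("bootstrap.servers", "192.168.157.200:9092");
        // Consumer group
        properties.setProperty("group.id", "group1");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        // Subscribe to the topic
        consumer.subscribe(Arrays.asList("test1"));

        while (true) {
            // Pull records, waiting up to 100 ms when none are available
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}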

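III. Advanced Features

Producer Features

Producer: Acknowledgement Modes

acks=0: the producer sends and does not wait to learn whether the message was written to the broker
acks=1: the send succeeds as soon as the leader has written the message
acks=-1/all: the send succeeds only after every replica in the ISR (in-sync replica) list has the message

Does setting acks to -1 guarantee that no message is ever lost?

Answer: No. If a partition has only one replica, that is, a leader with no followers, messages are still lost when that machine goes down. So the topic needs a replication factor of at least 2.

In addition, to improve reliability, set min.insync.replicas (the minimum number of in-sync replicas, default 1) together with acks=-1. With the default of 1, a write is still accepted when the ISR has shrunk to the leader alone, so a value of 2 or more is needed for acks=-1 to mean that at least two copies exist.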
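
The interaction between acks and min.insync.replicas can be sketched as a tiny decision function. This is a simplified model, not broker code: `AckPolicy` is a hypothetical name, and the model assumes a leader is always available and ignores timeouts and retries.

```java
// Hypothetical model of when a producer write is reported as successful.
final class AckPolicy {
    // acks: 0, 1, or -1 (all). isrSize counts in-sync replicas including the leader.
    // Assumes a leader exists (isrSize >= 1).
    static boolean acknowledged(int acks, int isrSize, int minInsyncReplicas) {
        if (acks == 0 || acks == 1) {
            // min.insync.replicas is only enforced for acks=-1/all
            return true;
        }
        // acks=-1: the write is rejected when too few replicas are in sync
        return isrSize >= minInsyncReplicas;
    }

    public static void main(String[] args) {
        // ISR has shrunk to the leader alone
        System.out.println(acknowledged(-1, 1, 1)); // accepted with a single copy
        System.out.println(acknowledged(-1, 1, 2)); // rejected instead of stored once
    }
}
```

The second call is the point of raising min.insync.replicas: the producer gets an error it can retry, rather than a success for a message that exists on only one machine.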

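Producer: Synchronous Send

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SyncSendExample {
    public void syncSend() throws ExecutionException, InterruptedException {
        Properties properties = new Properties();
        properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("bootstrap.servers", "192.168.157.200:9092");

        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        ProducerRecord<String, String> record = new ProducerRecord<>("test1", "This is a message");

        // Synchronous send, option 1: block on the returned Future;
        // get() throws if the send failed
        Future<RecordMetadata> future = producer.send(record);
        RecordMetadata metadata = future.get();

        // Synchronous send, option 2: flush() blocks until all buffered records
        // have been sent, but it does not report per-record failures
        producer.send(record);
        producer.flush();

        producer.close();
    }
}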

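Producer: Asynchronous Send

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AsyncSendExample {
    public void asyncSend() {
        Properties properties = new Properties();
        properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("bootstrap.servers", "192.168.157.200:9092");

        // How long the producer waits for more records to join a batch before sending it
        properties.setProperty("linger.ms", "1");
        // Maximum batch size in bytes
        properties.setProperty("batch.size", "20240");

        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        ProducerRecord<String, String> record = new ProducerRecord<>("test1", "This is a message");
        // Asynchronous send, option 1: fire and forget
        producer.send(record);
        // Asynchronous send, option 2: pass a callback that runs when the send completes
        producer.send(record, (metadata, exception) -> {
            if (exception == null) {
                System.out.println("record=" + record.value());
            } else {
                System.err.println("send failed: " + exception.getMessage());
            }
        });
        // close() waits for outstanding sends, so the callback runs before the method returns
        producer.close();
    }
}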

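Producer: Ordering Guarantee

Synchronous sends, plus a connection that lets the broker receive only one request at a time

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SequenceGuaranteeExample {
    public void sequenceGuarantee() {
        Properties properties = new Properties();
        properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("bootstrap.servers", "192.168.157.200:9092");
        // How many requests the producer may send before it receives a response;
        // 1 means requests go out strictly one at a time
        properties.setProperty("max.in.flight.requests.per.connection", "1");
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        ProducerRecord<String, String> record = new ProducerRecord<>("test1", "This is a message");
        // Synchronous send
        producer.send(record);
        producer.flush();

        producer.close();
    }
}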
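
Why the in-flight limit matters for ordering can be shown with a toy simulation. Everything in it is hypothetical (`InFlightOrdering` is not a Kafka class): batches are plain integers, a batch listed in `failOnce` fails on its first attempt and is retried, and idempotence is assumed to be off.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of a producer with retries and no idempotence; not Kafka code.
final class InFlightOrdering {
    // Returns the order in which batches end up in the partition log.
    static List<Integer> deliver(List<Integer> batches, Set<Integer> failOnce, int maxInFlight) {
        Deque<Integer> pending = new ArrayDeque<>(batches);
        Set<Integer> alreadyFailed = new HashSet<>();
        List<Integer> log = new ArrayList<>();
        while (!pending.isEmpty()) {
            // Send up to maxInFlight batches without waiting for responses
            List<Integer> window = new ArrayList<>();
            while (window.size() < maxInFlight && !pending.isEmpty()) {
                window.add(pending.pollFirst());
            }
            List<Integer> retry = new ArrayList<>();
            for (int batch : window) {
                if (failOnce.contains(batch) && alreadyFailed.add(batch)) {
                    retry.add(batch); // first attempt fails
                } else {
                    log.add(batch);   // written to the log
                }
            }
            // Failed batches go back to the front of the queue for the next round
            for (int i = retry.size() - 1; i >= 0; i--) {
                pending.addFirst(retry.get(i));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<Integer> batches = Arrays.asList(1, 2, 3);
        Set<Integer> failOnce = Collections.singleton(1);
        System.out.println(deliver(batches, failOnce, 2)); // [2, 1, 3]
        System.out.println(deliver(batches, failOnce, 1)); // [1, 2, 3]
    }
}
```

With two requests in flight, batch 2 lands before the retried batch 1. Limiting the connection to one in-flight request removes the reordering at the cost of throughput. With enable.idempotence=true, Kafka is documented to preserve order with up to 5 in-flight requests, which is usually the better trade-off.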

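Producer: Reliable Message Delivery

Transactions + idempotence

A transaction here means: send 100 messages, and if an error occurs along the way, none of them can be read by consumers. This holds for consumers that set isolation.level=read_committed; with the default read_uncommitted, messages from aborted transactions are still returned.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class TransactionExample {
    public static void transaction() {
        Properties properties = new Properties();
        properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Number of retries
        properties.setProperty("retries", "3");
        properties.setProperty("bootstrap.servers", "192.168.157.200:9092");
        // Idempotent producer; acks is then set to all by default
        properties.setProperty("enable.idempotence", "true");
        // Transaction id
        properties.setProperty("transactional.id", "tx0001");
        ProducerRecord<String, String> record = new ProducerRecord<>("test1", "This is a message");
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            for (int i = 0; i < 100; i++) {
                // No callback needed: a failed send makes commitTransaction() throw
                producer.send(record);
            }
            producer.commitTransaction();
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
            // Fatal errors: the producer cannot abort or continue, it can only be closed
            e.printStackTrace();
        } catch (KafkaException e) {
            // Any other error: abort so that none of the 100 messages is committed
            producer.abortTransaction();
            e.printStackTrace();
        }
        producer.close();
    }
}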
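
For the all-or-nothing behaviour to be visible on the reading side, the consumer has to opt in. Below is a minimal sketch of the consumer settings, built with plain `java.util.Properties`. The helper name `ReadCommittedConfig` and the group id are assumptions for illustration; the property keys are standard Kafka consumer configs.

```java
import java.util.Properties;

// Hypothetical helper that builds consumer settings for reading transactional data.
final class ReadCommittedConfig {
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // The default is read_uncommitted, which also returns messages from aborted transactions
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("192.168.157.200:9092", "group1");
        System.out.println(props.getProperty("isolation.level"));
    }
}
```

The resulting `Properties` object can be passed to the `KafkaConsumer` constructor in the same way as in the consumer examples of this tutorial.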

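Consumer Features

Consumer: Consumer Groups

Each consumer group keeps its own offset for every partition. Within one group, a partition is consumed by only one consumer; different groups consume the same partition independently of each other.

As in the figure, a topic has 4 partitions, spread across two brokers.

Consumer group A has two consumers, so each of its consumers reads 2 partitions. Consumer group B has 4 consumers, so each of its consumers reads 1 partition.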
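
The split described above can be reproduced with a short sketch. It uses a simple round-robin rule as an assumption; `GroupAssignment` is a hypothetical helper and does not reproduce Kafka's built-in assignors, which may group the partitions differently while giving the same counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin assignment of partitions to the consumers of one group.
final class GroupAssignment {
    static Map<String, List<Integer>> assign(int partitions, List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            // Each partition goes to exactly one consumer in the group
            result.get(consumers.get(p % consumers.size())).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, Arrays.asList("a1", "a2")));
        System.out.println(assign(4, Arrays.asList("b1", "b2", "b3", "b4")));
        // A fifth consumer would get no partition and sit idle
        System.out.println(assign(4, Arrays.asList("c1", "c2", "c3", "c4", "c5")));
    }
}
```

The last call illustrates a practical consequence: adding more consumers than partitions to a group does not increase parallelism.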

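Consumer: Synchronous Offset Commit

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommitSyncExample {
    // Assumed topic name; the original snippet uses TOPIC without defining it
    private static final String TOPIC = "test1";

    void commitSyncReceive() throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "49.234.77.60:9092");
        props.put("group.id", "group_id");
        // Turn off auto commit
        props.put("enable.auto.commit", "false");
        // Has no effect while auto commit is off
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("max.poll.records", 1000);
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(TOPIC));

        while (true) {
            ConsumerRecords<String, String> msgList = consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : msgList) {
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
            }
            // Synchronous commit: the current thread blocks until the offsets are committed
            consumer.commitSync();
        }
    }
}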
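
Committing only after a polled batch has been processed gives at-least-once delivery: if the consumer crashes between processing and commit, the batch is read again after the restart. The toy simulation below illustrates this with a list standing in for a partition. `AtLeastOnceDemo` and its parameters are hypothetical and no Kafka API is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of "process, then commit" with one crash before a commit; not Kafka code.
final class AtLeastOnceDemo {
    // crashInBatch: index of the batch whose commit is lost (use -1 for no crash).
    static List<String> consume(List<String> log, int batchSize, int crashInBatch) {
        List<String> processed = new ArrayList<>();
        int committed = 0;   // next offset to read, as stored by the last commit
        int batchNo = 0;
        boolean crashed = false;
        while (committed < log.size()) {
            int end = Math.min(committed + batchSize, log.size());
            for (int i = committed; i < end; i++) {
                processed.add(log.get(i)); // process the polled batch
            }
            if (!crashed && batchNo == crashInBatch) {
                crashed = true;            // crash before commit: the offset is not advanced
                continue;                  // the restart resumes from the last committed offset
            }
            committed = end;               // the equivalent of commitSync()
            batchNo++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "c", "d");
        System.out.println(consume(log, 2, -1)); // [a, b, c, d]
        System.out.println(consume(log, 2, 1));  // [a, b, c, d, c, d]
    }
}
```

Nothing is lost, but "c" and "d" are processed twice, so the processing logic should be idempotent or deduplicate on its own.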

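Consumer: Asynchronous Commit

import java.time.Duration;
import java.util.Arrays;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetCommitCallback;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommitAsyncExample {
    // Assumed topic name; the original snippet uses TOPIC without defining it
    private static final String TOPIC = "test1";

    void commitAsyncReceive() throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "49.234.77.60:9092");
        props.put("group.id", "group_id");
        props.put("enable.auto.commit", "false");
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("max.poll.records", 1000);
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(TOPIC));

        while (true) {
            ConsumerRecords<String, String> msgList = consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : msgList) {
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
            }
            // Asynchronous commit: returns immediately, the callback reports the result
            consumer.commitAsync(new OffsetCommitCallback() {
                @Override
                public void onComplete(Map<TopicPartition, OffsetAndMetadata> map, Exception e) {
                    if (e != null) {
                        System.err.println("commit failed for " + map);
                    }
                }
            });
        }
    }
}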

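Consumer: Custom Offset Storage

import java.time.Duration;
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetCommitCallback;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CustomOffsetExample {
    // Assumed topic name; the original snippet uses TOPIC without defining it
    private static final String TOPIC = "test1";

    void commitCustomSaveOffset() throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "49.234.77.60:9092");
        props.put("group.id", "group_id");
        props.put("enable.auto.commit", "false");
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("max.poll.records", 1000);
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(TOPIC), new ConsumerRebalanceListener() {
            // Called after the consumer stops fetching and before the rebalance starts;
            // offsets can be committed or saved manually here
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> collection) {

            }

            // Called after the rebalance and before the consumer resumes fetching;
            // offsets can be adjusted here, for example with consumer.seek(...)
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> collection) {

            }
        });

        while (true) {
            ConsumerRecords<String, String> msgList = consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : msgList) {
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
            }
            consumer.commitAsync(new OffsetCommitCallback() {
                @Override
                public void onComplete(Map<TopicPartition, OffsetAndMetadata> map, Exception e) {
                    if (e != null) {
                        System.err.println("commit failed for " + map);
                    }
                }
            });
        }
    }
}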
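
The two listener callbacks above are left empty, so it may help to see what a store behind them could look like. The sketch below is an assumption for illustration only: `InMemoryOffsetStore` is a hypothetical class, and a real setup would persist the offsets in a database or similar, ideally in the same transaction as the processing results.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical offset store; an in-memory map stands in for an external database.
final class InMemoryOffsetStore {
    private final Map<String, Long> nextOffsets = new ConcurrentHashMap<>();

    // Record that everything up to and including lastProcessedOffset is done.
    // Older values never overwrite newer ones.
    void save(String topic, int partition, long lastProcessedOffset) {
        nextOffsets.merge(key(topic, partition), lastProcessedOffset + 1, Long::max);
    }

    // Offset to resume from; 0 when nothing has been stored for the partition.
    long nextOffset(String topic, int partition) {
        return nextOffsets.getOrDefault(key(topic, partition), 0L);
    }

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    public static void main(String[] args) {
        InMemoryOffsetStore store = new InMemoryOffsetStore();
        store.save("test1", 0, 41);
        System.out.println(store.nextOffset("test1", 0)); // 42
        System.out.println(store.nextOffset("test1", 1)); // 0
    }
}
```

In this scheme, `save` would be called after each record is processed (and once more in `onPartitionsRevoked`), while `onPartitionsAssigned` would pass `nextOffset(...)` to `consumer.seek(partition, offset)` for every assigned partition.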

IV. Integrating Kafka with Spring Boot

Add the dependency

<dependency>
		<groupId>org.springframework.kafka</groupId>
		<artifactId>spring-kafka</artifactId>
</dependency>

Configuration

# kafka
spring.kafka.bootstrap-servers=192.168.157.200:9092
# Number of times a message is resent after an error
spring.kafka.producer.retries=0
spring.kafka.producer.batch-size=16384
# Size of the producer's memory buffer
spring.kafka.producer.buffer-memory=33554432
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.acks=1

# Consumer
# Interval between automatic offset commits
spring.kafka.consumer.auto-commit-interval=1S
# What the consumer does when it reads a partition that has no committed offset or whose offset is invalid:
# latest (default): start from the newest records (those produced after the consumer started)
# earliest: start from the beginning of the partition
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer
# Number of threads to run in the listener container
spring.kafka.listener.concurrency=5
# The listener is responsible for acks; each acknowledge() call commits immediately
spring.kafka.listener.ack-mode=manual_immediate
spring.kafka.listener.missing-topics-fatal=false

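Producer

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Component;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.ListenableFutureCallback;

@Component
public class MyKafkaProducer {

    @Autowired
    private KafkaTemplate<String, Object> kafkaTemplate;

    public void send(String topic, Object object) {
        // spring-kafka 2.x returns a ListenableFuture; in 3.x send() returns a CompletableFuture instead
        ListenableFuture<SendResult<String, Object>> future = kafkaTemplate.send(topic, object);
        future.addCallback(new ListenableFutureCallback<SendResult<String, Object>>() {
            @Override
            public void onFailure(Throwable ex) {
                System.out.println("Failed to send message: " + ex.getMessage());
            }

            @Override
            public void onSuccess(SendResult<String, Object> result) {
                System.out.println("Message sent: " + result);
            }
        });
    }
}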

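Consumer

import java.util.Optional;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.stereotype.Component;

@Component
public class MyKafkaConsumer {

    // Two listeners in different groups: each group receives every message on test1
    @KafkaListener(topics = "test1", groupId = "group_test")
    public void consumer(ConsumerRecord<?, ?> record, Acknowledgment ack, @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
        Optional<?> message = Optional.ofNullable(record.value());
        if (message.isPresent()) {
            Object msg = message.get();
            System.out.println("group_test consumed: Topic:" + topic + ",Message:" + msg);
            // Manual ack; with ack-mode=manual_immediate the offset is committed right away
            ack.acknowledge();
        }
    }

    @KafkaListener(topics = "test1", groupId = "group_test2")
    public void consumer2(ConsumerRecord<?, ?> record, Acknowledgment ack, @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
        Optional<?> message = Optional.ofNullable(record.value());
        if (message.isPresent()) {
            Object msg = message.get();
            System.out.println("group_test2 consumed: Topic:" + topic + ",Message:" + msg);
            ack.acknowledge();
        }
    }

}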

Testing
