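What is Kafka

Apache Kafka® is a distributed streaming platform.

That is the description from the official website. Compared with a typical messaging system, Kafka differs in the following ways:

Kafka is a distributed system and is easy to scale out.
It provides high throughput for both publishing and subscribing.
It supports multiple subscribers and automatically rebalances consumers when one of them fails.
It persists messages.

Comparison with other messaging systems: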

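Metric	Kafka	ActiveMQ	RabbitMQ	RocketMQ
Background	Kafka is a high-performance, distributed messaging system developed at LinkedIn, widely used for log collection, stream processing, and online and offline message distribution.	ActiveMQ is an open-source, message-oriented middleware (MOM) that implements the JMS 1.1 specification and provides applications with efficient, scalable, stable and secure enterprise messaging.	RabbitMQ is an open-source implementation of AMQP (Advanced Message Queuing Protocol), written in Erlang.	RocketMQ is a distributed messaging middleware open-sourced by Alibaba in 2012. It has since been donated to the Apache Software Foundation and became an Apache incubator project in November 2016.
Implementation language	Java, Scala	Java	Erlang	Java
Protocol	Its own binary protocol over TCP	JMS	AMQP	JMS, MQTT
Persistence	Supported	Supported	Supported	Supported
Producer fault tolerance	Controlled by the acks setting. With acks=0 the producer does not wait for any response from the server before treating a write as successful. With acks=1 the producer gets a success response as soon as the partition leader has received the message. With acks=all the producer gets a success response only after all in-sync replicas have received the message; this is the safest mode.	Can retry after a failed send.	Offers an ack model and a transaction model: acks can lead to duplicate messages, while transactions guarantee full consistency.	Similar to Kafka.
Throughput	High. Messages are batched internally, zero-copy is used, and data is written and read with sequential, batched operations on local disk at O(1) cost, so message handling is very efficient.		Somewhat lower than Kafka, because the two have different goals: RabbitMQ focuses on reliable delivery and supports transactions but not batch operations. Depending on reliability requirements, messages can be stored in memory or on disk.	Kafka outperforms RocketMQ when there are few topics; RocketMQ outperforms Kafka when there are many topics.
Load balancing	Kafka uses ZooKeeper to manage the brokers and consumers in the cluster, and topics can be registered in ZooKeeper. Through ZooKeeper's coordination, the producer keeps the broker information for each topic and can send to brokers randomly or in round-robin fashion. The producer can also choose a partition based on the message's semantics (for example its key), so the message lands on a specific partition of a broker. (This describes older releases: newer clients obtain cluster metadata from the brokers, and KRaft mode removes ZooKeeper entirely.)		RabbitMQ needs a separate load balancer.	Load balancing is handled by the NameServer.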
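The acks row (and the batching mentioned under throughput) maps onto a handful of producer settings. The sketch below only assembles that configuration with `java.util.Properties`, so it runs without a broker. The keys `bootstrap.servers`, `acks`, `batch.size` and `linger.ms` are real Kafka producer settings; the broker address `localhost:9092`, the batching values and the class name `AcksConfigDemo` are assumptions for illustration.

```java
import java.util.Properties;

// Sketch: producer settings for the three acks levels described in the table.
// Only java.util.Properties is used. "bootstrap.servers", "acks", "batch.size"
// and "linger.ms" are real Kafka producer keys; the address and numeric values
// are illustrative assumptions, and AcksConfigDemo is a hypothetical name.
class AcksConfigDemo {

    static Properties producerProps(String acks) {
        if (!acks.equals("0") && !acks.equals("1") && !acks.equals("all")) {
            throw new IllegalArgumentException("acks must be 0, 1 or all: " + acks);
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        // 0   = do not wait for any broker response (fastest, may lose data)
        // 1   = wait until the partition leader has written the record
        // all = wait until all in-sync replicas have the record (safest)
        props.put("acks", acks);
        // Batching settings related to the throughput row; example values only
        props.put("batch.size", "16384");
        props.put("linger.ms", "5");
        return props;
    }

    public static void main(String[] args) {
        for (String level : new String[] {"0", "1", "all"}) {
            System.out.println("acks=" + producerProps(level).getProperty("acks"));
        }
        // prints acks=0, acks=1, acks=all on separate lines
    }
}
```

A `Properties` object built this way can be handed to `new KafkaProducer<>(props)` once the serializer settings are added, as the producer example does.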

[Figure: Kafka architecture diagram]

Usage example:

Producer:

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class Producer extends Thread {
    private final KafkaProducer<Integer, String> producer;
    private final String topic;
    private final Boolean isAsync;

    public Producer(String topic, Boolean isAsync) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaProperties.KAFKA_SERVER_URL + ":" + KafkaProperties.KAFKA_SERVER_PORT);
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "DemoProducer");
        // Keys are the message numbers (Integer), values are the message text (String)
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producer = new KafkaProducer<>(props);
        this.topic = topic;
        this.isAsync = isAsync;
    }

    @Override
    public void run() {
        int messageNo = 1;
        while (true) {
            String messageStr = "Message_" + messageNo;
            long startTime = System.currentTimeMillis();
            if (isAsync) {
                // Send asynchronously: send() returns at once, the callback runs when the broker responds
                producer.send(new ProducerRecord<>(topic, messageNo, messageStr),
                        new DemoCallBack(startTime, messageNo, messageStr));
            } else {
                // Send synchronously: block on the returned Future until the broker acknowledges
                try {
                    producer.send(new ProducerRecord<>(topic, messageNo, messageStr)).get();
                    System.out.println("Sent message: (" + messageNo + ", " + messageStr + ")");
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop the sending loop
                    Thread.currentThread().interrupt();
                    return;
                } catch (ExecutionException e) {
                    e.printStackTrace();
                }
            }
            ++messageNo;
        }
    }

    static class DemoCallBack implements Callback {
        private final long startTime;
        private final int key;
        private final String message;

        public DemoCallBack(long startTime, int key, String message) {
            this.startTime = startTime;
            this.key = key;
            this.message = message;
        }

        /**
         * Called once the record sent to the server has been acknowledged or has failed.
         *
         * @param metadata  The metadata for the record that was sent (i.e. the partition and offset).
         * @param exception The exception thrown during processing of this record. Null if no error occurred,
         *                  so it is checked first instead of relying on metadata being null.
         */
        @Override
        public void onCompletion(RecordMetadata metadata, Exception exception) {
            long elapsedTime = System.currentTimeMillis() - startTime;
            if (exception == null) {
                System.out.println(
                        "message(" + key + ", " + message + ") sent to partition(" + metadata.partition() +
                                "), " +
                                "offset(" + metadata.offset() + ") in " + elapsedTime + " ms");
            } else {
                exception.printStackTrace();
            }
        }
    }
}

Consumer:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Consumer extends Thread {
    private final KafkaConsumer<Integer, String> consumer;
    private final String topic;

    public Consumer(String topic) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaProperties.KAFKA_SERVER_URL + ":" + KafkaProperties.KAFKA_SERVER_PORT);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "DemoConsumer");
        // Commit consumed offsets automatically, once per second
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.IntegerDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(props);
        this.topic = topic;
    }

    @Override
    public void run() {
        // Subscribe once, before the loop; the group then assigns partitions to this consumer
        consumer.subscribe(Collections.singletonList(this.topic));
        while (true) {
            // Wait up to 1 second for records; poll(Duration.ofSeconds(1).getSeconds()) would mean 1 ms
            ConsumerRecords<Integer, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<Integer, String> record : records) {
                System.out.println("Received message: (" + record.key() + ", " + record.value() + ") at offset " + record.offset());
            }
        }
    }
}

Properties:

public class KafkaProperties {
    public static final String TOPIC = "topic1";
    public static final String KAFKA_SERVER_URL = "localhost";
    public static final int KAFKA_SERVER_PORT = 9092;
    public static final int KAFKA_PRODUCER_BUFFER_SIZE = 64 * 1024;
    public static final int CONNECTION_TIMEOUT = 100000;
    public static final String TOPIC2 = "topic2";
    public static final String TOPIC3 = "topic3";
    public static final String CLIENT_ID = "SimpleConsumerDemoClient";

    private KafkaProperties() {
    }
}

Terminology:

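Producer: the message producer, a client that sends messages to brokers.
Consumer: the message consumer, a client that reads messages from brokers. Within a consumer group, the number of consumers should be less than or equal to the number of partitions.
Broker: a message-middleware node. Each Kafka node is a broker, and one or more brokers form a Kafka cluster.
Topic: Kafka classifies messages by topic; every message published to a Kafka cluster must specify a topic.
Partition: a physical concept. A topic can be split into multiple partitions, and each partition is ordered internally. By default Kafka picks the partition for a keyed message as hash(key) % number of partitions.
ConsumerGroup: every consumer belongs to a specific consumer group. A message can be delivered to several different consumer groups, but within one consumer group only one consumer consumes it.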
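To make the default routing rule concrete, here is a simplified partitioner. It is a sketch, not Kafka's implementation: the class name `SimplePartitioner` is hypothetical, it uses Java's `String.hashCode()` where Kafka's default partitioner applies a murmur2 hash to the serialized key, and it does not model records without a key. What it does show is the property that matters: the same key always maps to the same partition, so ordering per key is preserved.

```java
import java.util.Objects;

// Simplified key-based partitioner (hypothetical name).
// ASSUMPTION: uses String.hashCode() where Kafka's default partitioner uses a
// murmur2 hash, so partition numbers will not match a real cluster; only the
// "same key -> same partition" property is illustrated.
class SimplePartitioner {

    static int partitionFor(String key, int numPartitions) {
        Objects.requireNonNull(key, "records without a key are not modeled here");
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result in [0, numPartitions) even for negative hash codes
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 3;
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            // "user-1" is printed with the same partition both times
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```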

Topic and Partition

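Messages in a topic are sent to one of its partitions according to a rule (by default the hash of the key modulo the number of partitions; you can also supply your own).
Each partition is an ordered, immutable sequence of messages that is continually appended to. Each message in a partition is assigned a sequential number called its offset, which is unique within that partition.
The only metadata a consumer holds is this offset, that is, the consumer's position in the log (partition). The offset is controlled by the consumer: normally it advances linearly as the consumer consumes messages, but the consumer can also move it, for example to reprocess older messages.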
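The log-and-offset idea can be modeled in a few lines. The classes below (`PartitionLog`, `ConsumerPosition`, `PartitionLogDemo`) are hypothetical in-memory stand-ins, not Kafka's storage format or client API: appending returns the new record's offset, and the consumer's only state is the next offset to read, which advances as it polls and can be moved back with `seek`. The real `KafkaConsumer` offers a comparable `seek(TopicPartition, long)` method.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one partition: an append-only list whose index is the offset.
// Hypothetical classes for illustration; not Kafka's storage or client API.
class PartitionLog {
    private final List<String> records = new ArrayList<>();

    /** Appends a record and returns the offset assigned to it. */
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    /** Returns up to maxRecords records starting at fromOffset. */
    List<String> read(long fromOffset, int maxRecords) {
        int from = (int) Math.min(fromOffset, records.size());
        int to = Math.min(from + maxRecords, records.size());
        return new ArrayList<>(records.subList(from, to));
    }
}

// The consumer's only state: the next offset it will read.
class ConsumerPosition {
    private long position = 0;

    List<String> poll(PartitionLog log, int maxRecords) {
        List<String> batch = log.read(position, maxRecords);
        position += batch.size(); // advances linearly as records are consumed
        return batch;
    }

    long position() {
        return position;
    }

    void seek(long offset) {
        position = offset; // the consumer may rewind or skip ahead
    }
}

class PartitionLogDemo {
    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        for (int i = 1; i <= 5; i++) {
            log.append("Message_" + i);
        }
        ConsumerPosition consumer = new ConsumerPosition();
        System.out.println(consumer.poll(log, 2)); // [Message_1, Message_2]
        System.out.println(consumer.poll(log, 2)); // [Message_3, Message_4]
        consumer.seek(0);                          // replay from the beginning
        System.out.println(consumer.poll(log, 1)); // [Message_1]
    }
}
```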

Consumer and Partition

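Messaging models generally fall into two kinds: queuing and publish-subscribe. In the queuing model, a group of consumers pulls data from one end of the queue, and once a message is consumed it is gone. In the publish-subscribe model, each message is broadcast to all consumers, and every consumer that receives it can process it. Kafka generalizes both with one abstraction: the consumer group.
Consumer group: each group contains some number of consumers. If all consumers are in the same group, Kafka behaves as a queue; if the consumers are in different groups, it behaves as publish-subscribe.
The data in a partition is processed by only one consumer within a group; other consumers in the same group do not process it again.
The number of consumers in a group should be less than or equal to the number of partitions. If there are more consumers than partitions, the extra consumers receive no messages, which wastes resources.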
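The last point is easiest to see with a toy assignment. The sketch below assumes a simple round-robin strategy and a hypothetical class `GroupAssignment`; real Kafka delegates this to a configurable assignor (range, round-robin, sticky and others), so actual assignments may differ. With 3 partitions and 4 consumers in one group, one consumer ends up with nothing, while a separate group receives its own complete assignment and therefore sees every message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified round-robin assignment of one topic's partitions to the consumers
// of a single group. ASSUMPTION: real Kafka uses a pluggable assignor; this only
// illustrates "one owner per partition" and idle consumers beyond the partition count.
class GroupAssignment {

    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String c : consumers) {
            result.put(c, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            // each partition goes to exactly one consumer in the group
            String owner = consumers.get(p % consumers.size());
            result.get(owner).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 partitions, 4 consumers in one group: c4 gets no partition
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3)); // {c1=[0], c2=[1], c3=[2], c4=[]}
        // A different group is assigned independently, so it also reads every partition
        System.out.println(assign(List.of("g2-a"), 3)); // {g2-a=[0, 1, 2]}
    }
}
```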


Source: https://juejin.im/post/5b8f85a1e51d450e4d2f8eca#comment
