一、Kafka: Concepts and Getting Started
Kafka is a messaging system, designed and developed at LinkedIn in 2011.
Kafka is a distributed, publish/subscribe-based messaging system. Its main design goals are:
- Provide message persistence in O(1) time, keeping constant-time access performance even for TB-scale data and beyond.
- High throughput: a single node can handle 100K+ messages per second even on cheap commodity hardware.
- Support message partitioning across Kafka servers and distributed consumption, while guaranteeing in-order delivery within each Partition.
- Support both offline and real-time data processing.
- Support online horizontal scaling.
Consumers subscribe to messages from the Broker using a pull model.
Mode | Pros | Cons |
---|---|---|
pull | Consumers fetch at a pace matched to their own consumption capacity | With no messages available, the loop polls emptily (Kafka avoids this with a parameter that blocks the fetch until new messages arrive) |
push | Low latency | When consumption capacity falls far below production capacity, messages pile up on the client and can even crash it; the server must also track the state of every delivery so that failures can be retried |
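The "block until new messages arrive" behavior in the table can be mimicked with a timed blocking poll. A toy sketch (the queue here merely stands in for the broker; real consumers tune this with `fetch.max.wait.ms` and `fetch.min.bytes`):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ToyPullConsumer {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> broker = new LinkedBlockingQueue<>();
        broker.put("hello");
        // pull loop: poll blocks up to 500 ms instead of spinning when the
        // queue is empty -- this is what saves the consumer from busy-polling
        String msg = broker.poll(500, TimeUnit.MILLISECONDS);
        System.out.println(msg);
        // empty queue: returns null only after waiting out the timeout
        String none = broker.poll(100, TimeUnit.MILLISECONDS);
        System.out.println(none == null);
    }
}
```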
Basic Kafka Concepts
- Broker: a Kafka cluster consists of one or more servers, each called a broker.
- Topic: every message published to a Kafka cluster has a category, called its Topic.
- Partition: a physical concept; each Topic consists of one or more Partitions.
- Producer: publishes messages to Kafka brokers.
- Consumer: a client that reads messages from Kafka brokers.
- Consumer Group: each Consumer belongs to a specific Consumer Group (a group name can be assigned per Consumer; without one, the Consumer joins the default group).
Standalone deployment structure
Cluster deployment structure
Topic and Partition
A Topic can contain one or more Partitions. Since a single machine's storage and throughput are limited, multiple Partitions are what enable horizontal scaling and parallel processing.
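For the load to spread, the producer must decide which partition each record goes to. A simplified sketch of key-based partition selection (Kafka's default partitioner actually uses a murmur2 hash of the serialized key, and sticky/round-robin assignment for null keys; the `hashCode` here is only illustrative):

```java
public class ToyPartitioner {
    // Picks a partition for a record key: the same key always lands on the same
    // partition, so per-key ordering is preserved while load spreads out.
    static int partitionFor(String key, int numPartitions) {
        // mask the sign bit so the result is non-negative
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // the chosen partition is stable across calls for the same key
        System.out.println(partitionFor("order-42", numPartitions)
                == partitionFor("order-42", numPartitions));
    }
}
```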
Partition and Replica
Splitting a Topic into multiple Partitions spreads the message load across machines, but if one partition's data is lost, a chunk of the overall data is gone. This is why Kafka introduces the Replica.
Each partition can be given multiple replicas via the replication factor, so even if one machine fails, other machines still hold a copy of the data.
3 partitions with 3 replicas in a cluster:
Source diagram:
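How replicas end up on different brokers can be sketched as a round-robin layout. This is only the core idea; Kafka's real assignment also randomizes the starting broker and can be rack-aware:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyReplicaAssignment {
    // For each partition, place replicationFactor replicas on consecutive
    // brokers (mod brokerCount), so no two replicas of a partition share a broker.
    static Map<Integer, List<Integer>> assign(int partitions, int replicationFactor, int brokerCount) {
        Map<Integer, List<Integer>> layout = new LinkedHashMap<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> brokers = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                brokers.add((p + r) % brokerCount); // leader first, then followers
            }
            layout.put(p, brokers);
        }
        return layout;
    }

    public static void main(String[] args) {
        // 3 partitions, 3 replicas, on a 3-broker cluster
        System.out.println(assign(3, 3, 3));
    }
}
```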
二、Installation and Deployment
1. I'm using version 2.7.0; download it and extract it.
Note: choose Binary downloads, not Source download.
2. Open the config/server.properties file and enable the following setting:
listeners=PLAINTEXT://localhost:9092
3. Start ZooKeeper. Install it separately; I'm using ZooKeeper 3.7.
4. Start Kafka:
bin/kafka-server-start.sh config/server.properties
Kafka Command Line
List topics:
bin/kafka-topics.sh --zookeeper localhost:2181 --list
Create a topic:
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test1 --partitions 4 --replication-factor 1
Describe a topic:
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test1
Consume messages:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic test1
Produce messages:
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test1
Simple performance tests:
bin/kafka-producer-perf-test.sh --topic test1 --num-records 100000 --record-size 1000 --throughput 2000 --producer-props bootstrap.servers=localhost:9092
bin/kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 --topic test1 --fetch-size 1048576 --messages 100000 --threads 1
Java Client
Producer:
public class SimpleKafkaProducer {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.setProperty("bootstrap.servers", "192.168.157.200:9092");
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        ProducerRecord<String, String> record = new ProducerRecord<>("test1", "this is a message");
        producer.send(record);
        producer.close();
    }
}
Consumer:
public class SimpleKafkaConsumer {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("bootstrap.servers", "192.168.157.200:9092");
        // consumer group
        properties.setProperty("group.id", "group1");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        // subscribe to the topic
        consumer.subscribe(Arrays.asList("test1"));
        while (true) {
            // pull records
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            records.forEach(data -> System.out.println(data.value()));
        }
    }
}
三、Advanced Features
Producer features
Producer: acknowledgment mode (acks)
- acks=0: fire and forget; don't wait for the broker to confirm the write.
- acks=1: success as soon as the leader has written the message.
- acks=-1/all: success only after every follower in the ISR list has replicated the message.
Does setting acks to -1 guarantee that no message is ever lost?
No. If the partition has only one replica (a leader with no followers), a crash still loses messages, so you need a replication factor of at least 2.
Also, for higher durability, combine acks=-1 with min.insync.replicas (the minimum in-sync replica count, default 1).
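Putting those settings together, a minimal sketch of a durability-oriented producer configuration. Note that min.insync.replicas is a broker/topic-level setting (configured on the topic, e.g. via kafka-configs.sh), not a producer property:

```java
import java.util.Properties;

public class DurableProducerConfig {
    // Builds producer properties aimed at not losing messages, as described above.
    static Properties durableProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // wait for all ISR replicas to acknowledge the write
        props.setProperty("acks", "all");
        // retry transient failures instead of dropping the record
        props.setProperty("retries", "3");
        return props;
    }

    public static void main(String[] args) {
        Properties props = durableProps("localhost:9092");
        System.out.println(props.getProperty("acks"));
    }
}
```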
Producer: synchronous send
public void syncSend() throws ExecutionException, InterruptedException {
    Properties properties = new Properties();
    properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.setProperty("bootstrap.servers", "192.168.157.200:9092");
    KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
    ProducerRecord<String, String> record = new ProducerRecord<>("test1", "this is a message");
    // synchronous send, option 1: block on the returned Future
    Future<RecordMetadata> future = producer.send(record);
    RecordMetadata metadata = future.get();
    // synchronous send, option 2: send, then flush to force delivery
    producer.send(record);
    producer.flush();
    producer.close();
}
Producer: asynchronous send
public void asyncSend() {
    Properties properties = new Properties();
    properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.setProperty("bootstrap.servers", "192.168.157.200:9092");
    // how long the producer waits for more messages to join a batch before sending it
    properties.setProperty("linger.ms", "1");
    properties.setProperty("batch.size", "20240");
    KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
    ProducerRecord<String, String> record = new ProducerRecord<>("test1", "this is a message");
    // asynchronous send, option 1: fire and forget
    producer.send(record);
    // asynchronous send, option 2: attach a callback
    producer.send(record, (metadata, exception) -> {
        if (exception == null) {
            System.out.println("record=" + record.value());
        }
    });
}
Producer: ordering guarantee
Send synchronously, and let the broker accept only one request at a time from this producer.
public void sequenceGuarantee() {
    Properties properties = new Properties();
    properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.setProperty("bootstrap.servers", "192.168.157.200:9092");
    // how many requests the producer may have in flight before getting a response;
    // 1 forces messages to be sent one at a time
    properties.setProperty("max.in.flight.requests.per.connection", "1");
    KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
    ProducerRecord<String, String> record = new ProducerRecord<>("test1", "this is a message");
    // synchronous send
    producer.send(record);
    producer.flush();
    producer.close();
}
Producer: reliable message delivery
Transactions + idempotence
A transaction here means: if you send 100 messages and any of them fails, none of the 100 become visible to consumers reading with isolation.level=read_committed.
public static void transaction() {
    Properties properties = new Properties();
    properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.setProperty("bootstrap.servers", "192.168.157.200:9092");
    // retry count
    properties.setProperty("retries", "3");
    // idempotent producer; this implicitly sets acks=all
    properties.setProperty("enable.idempotence", "true");
    // transaction id
    properties.setProperty("transactional.id", "tx0001");
    KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
    ProducerRecord<String, String> record = new ProducerRecord<>("test1", "this is a message");
    producer.initTransactions();
    try {
        producer.beginTransaction();
        for (int i = 0; i < 100; i++) {
            producer.send(record);
        }
        producer.commitTransaction();
    } catch (ProducerFencedException e) {
        // fenced by another producer using the same transactional.id; this instance must close
        producer.close();
        return;
    } catch (KafkaException e) {
        // any other failure: abort, so read_committed consumers never see any message of this batch
        producer.abortTransaction();
    }
    producer.close();
}
Consumer features
Consumer: consumer groups
Each consumer group tracks its own offset per partition, and within a group each partition is consumed by exactly one consumer (different groups consume the same partition independently).
In the figure, one Topic has 4 partitions spread across two brokers.
Consumer group A has two consumers, so each of them consumes 2 partitions; consumer group B has four consumers, so each consumes 1 partition.
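The even split above falls out of the group's partition assignor. A toy sketch of range-style assignment (Kafka's actual behavior is chosen by partition.assignment.strategy, e.g. RangeAssignor or CooperativeStickyAssignor; this only divides partition numbers into contiguous ranges per topic):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyRangeAssignor {
    // Assigns numPartitions partitions to consumers in contiguous ranges,
    // the way Kafka's RangeAssignor does for a single topic (simplified).
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        int per = numPartitions / consumers.size();   // base partitions per consumer
        int extra = numPartitions % consumers.size(); // first `extra` consumers get one more
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = per + (i < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int j = 0; j < count; j++) parts.add(next++);
            assignment.put(consumers.get(i), parts);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // group A: 2 consumers over 4 partitions -> 2 partitions each
        System.out.println(assign(Arrays.asList("A1", "A2"), 4));
        // group B: 4 consumers over 4 partitions -> 1 partition each
        System.out.println(assign(Arrays.asList("B1", "B2", "B3", "B4"), 4));
    }
}
```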
Consumer: synchronous offset commit
void commitSyncReceive() throws InterruptedException {
Properties props = new Properties();
props.put("bootstrap.servers", "49.234.77.60:9092");
props.put("group.id", "group_id");
//disable auto commit
props.put("enable.auto.commit", "false");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("max.poll.records", 1000);
props.put("auto.offset.reset", "earliest");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(TOPIC));
while (true){
ConsumerRecords<String, String> msgList = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String,String> record:msgList){
System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
//synchronous commit: the current thread blocks until the offset commit succeeds
consumer.commitSync();
}
}
Consumer: asynchronous commit
void commitAsyncReceive() throws InterruptedException {
Properties props = new Properties();
props.put("bootstrap.servers", "49.234.77.60:9092");
props.put("group.id", "group_id");
props.put("enable.auto.commit", "false");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("max.poll.records", 1000);
props.put("auto.offset.reset", "earliest");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(TOPIC));
while (true){
ConsumerRecords<String, String> msgList = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String,String> record:msgList){
System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
//asynchronous commit
consumer.commitAsync(new OffsetCommitCallback() {
@Override
public void onComplete(Map<TopicPartition, OffsetAndMetadata> map, Exception e) {
if(e!=null){
System.err.println("commit failed for "+map);
}
}
});
}
}
Consumer: custom offset storage
void commitCustomSaveOffest() throws InterruptedException {
Properties props = new Properties();
props.put("bootstrap.servers", "49.234.77.60:9092");
props.put("group.id", "group_id");
props.put("enable.auto.commit", "false");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("max.poll.records", 1000);
props.put("auto.offset.reset", "earliest");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(TOPIC), new ConsumerRebalanceListener() {
//called after the consumer stops fetching and before the rebalance starts; commit offsets manually here
@Override
public void onPartitionsRevoked(Collection<TopicPartition> collection) {
}
//called after the rebalance and before the consumer resumes fetching; seek to saved offsets here
@Override
public void onPartitionsAssigned(Collection<TopicPartition> collection) {
}
});
while (true){
ConsumerRecords<String, String> msgList = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String,String> record:msgList){
System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
consumer.commitAsync(new OffsetCommitCallback() {
@Override
public void onComplete(Map<TopicPartition, OffsetAndMetadata> map, Exception e) {
if(e!=null){
System.err.println("commit failed for "+map);
}
}
});
}
}
四、Integrating Kafka with Spring Boot
- Add the dependency
<dependency>
<groupId>org.springframework.kafka</groupId>
<artifactId>spring-kafka</artifactId>
</dependency>
- Configuration
# kafka
spring.kafka.bootstrap-servers=192.168.157.200:9092
# number of times to resend a message after an error
spring.kafka.producer.retries=0
spring.kafka.producer.batch-size=16384
# size of the producer's memory buffer
spring.kafka.producer.buffer-memory=33554432
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.acks=1
# consumer
# auto-commit interval
spring.kafka.consumer.auto-commit-interval=1s
# what the consumer does when reading a partition with no committed offset, or an invalid one:
# latest (default): start from the newest records (those produced after the consumer started)
# earliest: start from the beginning of the partition
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer
# number of threads running in the listener container
spring.kafka.listener.concurrency=5
# the listener acks manually; each acknowledge() call commits immediately
spring.kafka.listener.ack-mode=manual_immediate
spring.kafka.listener.missing-topics-fatal=false
- producer
@Component
public class MyKafkaProducer {
    @Autowired
    private KafkaTemplate<String, Object> kafkaTemplate;
    public void send(String topic, Object object) {
        ListenableFuture<SendResult<String, Object>> future = kafkaTemplate.send(topic, object);
        future.addCallback(new ListenableFutureCallback<SendResult<String, Object>>() {
            @Override
            public void onFailure(Throwable ex) {
                System.out.println("send failed: " + ex.getMessage());
            }
            @Override
            public void onSuccess(SendResult<String, Object> result) {
                System.out.println("send succeeded: " + result);
            }
        });
    }
}
- consumer
@Component
public class MyKafkaConsumer {
    @KafkaListener(topics = "test1", groupId = "group_test")
    public void consumer(ConsumerRecord<?, ?> record, Acknowledgment ack, @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
        Optional<?> message = Optional.ofNullable(record.value());
        if (message.isPresent()) {
            Object msg = message.get();
            System.out.println("group_test consumed: Topic:" + topic + ", Message:" + msg);
            ack.acknowledge();
        }
    }
    @KafkaListener(topics = "test1", groupId = "group_test2")
    public void consumer2(ConsumerRecord<?, ?> record, Acknowledgment ack, @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
        Optional<?> message = Optional.ofNullable(record.value());
        if (message.isPresent()) {
            Object msg = message.get();
            System.out.println("group_test2 consumed: Topic:" + topic + ", Message:" + msg);
            ack.acknowledge();
        }
    }
}
- Test: call send("test1", ...) on MyKafkaProducer; since the two listeners belong to different consumer groups (group_test and group_test2), each of them consumes and logs the same message.