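Kafka idempotence: the question big-tech interviews always ask
This article was contributed by 靳剛; please message privately for permission before reposting.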
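01 Why idempotence matters so much
Kafka is a distributed MQ that is widely used in distributed systems such as message-push systems and business platforms (a settlement platform, for example). Take settlement: business teams sit upstream and push data into the settlement platform, and if one piece of data is calculated or processed more than once, the consequences are severe.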
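02 What affects idempotence
Using Kafka in such systems calls for exactly-once semantics. Network partitions are unavoidable in a distributed system. If a network fault or a full GC causes an ack timeout while a Kafka broker is replying, the producer resends the data: how do we make sure the retry introduces neither duplicates nor reordering? Or suppose the producer crashes and a new producer starts without the old producer's state: how is idempotence preserved then? And even when sending is idempotent, the consumer hands the messages it has polled to a pool of worker threads whose handling may include asynchronous operations, so the following cases can still arise: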
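- Commit first, then run the business logic: the commit succeeds but processing fails. The message is lost.
- Run the business logic first, then commit: processing succeeds but the commit fails. The message is processed again.
- Run the business logic first, then commit: the commit succeeds but the asynchronous part of the processing fails. The message is lost.
This article discusses each of these problems.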
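03 How Kafka makes sending idempotent
To address these problems, Kafka 0.11 added the idempotent producer and the transactional producer. The former provides idempotence within a single session; the latter extends it across sessions.
Single-session idempotence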
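As a minimal sketch of how the two producer types are switched on, the configuration could look like the following. The class name, method names, broker address and transactional id are hypothetical; only the property keys are Kafka's own. The Kafka client library is deliberately left out so the snippet stays self-contained, so a real producer would additionally need key.serializer and value.serializer before these properties are passed to a KafkaProducer.

```java
import java.util.Properties;

final class ProducerConfigSketch {
    // Settings for an idempotent producer (single-session idempotence).
    static Properties idempotentProps(String bootstrapServers) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("enable.idempotence", "true");
        // Idempotence requires acks=all and at most 5 in-flight requests per connection.
        p.setProperty("acks", "all");
        p.setProperty("max.in.flight.requests.per.connection", "5");
        return p;
    }

    // Settings for a transactional producer (multi-session idempotence).
    // The transactional.id has to stay the same across restarts of the same logical producer.
    static Properties transactionalProps(String bootstrapServers, String transactionalId) {
        Properties p = idempotentProps(bootstrapServers);
        p.setProperty("transactional.id", transactionalId);
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical broker address and transactional id, for illustration only.
        System.out.println(transactionalProps("localhost:9092", "settlement-tx-1"));
    }
}
```

The transactional settings build on the idempotent ones because setting transactional.id implies an idempotent producer.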
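To handle the reordering and duplication caused by producer retries, Kafka introduced a producer id (pid) and a sequence number (seq). On the producer, every RecordBatch carries a monotonically increasing seq. On the broker, every topic partition (tp) maintains a pid-to-seq mapping and updates lastSeq on each commit. When a RecordBatch arrives, the broker checks it before storing the data: if the batch's baseSeq (the seq of its first message) is exactly 1 greater than the lastSeq held by the broker, the data is stored; otherwise it is not (see the inSequence method).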
ProducerStateManager.scala
private def maybeValidateAppend(producerEpoch: Short, firstSeq: Int, offset: Long): Unit = {
  validationType match {
    case ValidationType.None =>
    case ValidationType.EpochOnly =>
      checkProducerEpoch(producerEpoch, offset)
    case ValidationType.Full =>
      checkProducerEpoch(producerEpoch, offset)
      checkSequence(producerEpoch, firstSeq, offset)
  }
}

private def checkSequence(producerEpoch: Short, appendFirstSeq: Int, offset: Long): Unit = {
  if (producerEpoch != updatedEntry.producerEpoch) {
    if (appendFirstSeq != 0) {
      if (updatedEntry.producerEpoch != RecordBatch.NO_PRODUCER_EPOCH) {
        throw new OutOfOrderSequenceException(s"Invalid sequence number for new epoch at offset $offset in " +
          s"partition $topicPartition: $producerEpoch (request epoch), $appendFirstSeq (seq. number)")
      } else {
        throw new UnknownProducerIdException(s"Found no record of producerId=$producerId on the broker at offset $offset" +
          s"in partition $topicPartition. It is possible that the last message with the producerId=$producerId has " +
          "been removed due to hitting the retention limit.")
      }
    }
  } else {
    val currentLastSeq = if (!updatedEntry.isEmpty)
      updatedEntry.lastSeq
    else if (producerEpoch == currentEntry.producerEpoch)
      currentEntry.lastSeq
    else
      RecordBatch.NO_SEQUENCE

    if (currentLastSeq == RecordBatch.NO_SEQUENCE && appendFirstSeq != 0) {
      // We have a matching epoch, but we do not know the next sequence number. This case can happen if
      // only a transaction marker is left in the log for this producer. We treat this as an unknown
      // producer id error, so that the producer can check the log start offset for truncation and reset
      // the sequence number. Note that this check follows the fencing check, so the marker still fences
      // old producers even if it cannot determine our next expected sequence number.
      throw new UnknownProducerIdException(s"Local producer state matches expected epoch $producerEpoch " +
        s"for producerId=$producerId at offset $offset in partition $topicPartition, but the next expected " +
        "sequence number is not known.")
    } else if (!inSequence(currentLastSeq, appendFirstSeq)) {
      throw new OutOfOrderSequenceException(s"Out of order sequence number for producerId $producerId at " +
        s"offset $offset in partition $topicPartition: $appendFirstSeq (incoming seq. number), " +
        s"$currentLastSeq (current end sequence number)")
    }
  }
}

private def inSequence(lastSeq: Int, nextSeq: Int): Boolean = {
  nextSeq == lastSeq + 1L || (nextSeq == 0 && lastSeq == Int.MaxValue)
}
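The check above can be modelled in a few lines of Java. This is a simplified sketch, not the broker's implementation: the class SequenceCheckSketch and the method tryAppend are hypothetical, epochs are ignored, and a rejected batch simply returns false, whereas the Scala code above throws an exception for an out-of-sequence batch. Only inSequence mirrors the Scala method directly.

```java
import java.util.HashMap;
import java.util.Map;

final class SequenceCheckSketch {
    // Same meaning as RecordBatch.NO_SEQUENCE in the excerpt: no sequence known yet.
    static final int NO_SEQUENCE = -1;

    // Last appended sequence number per producer id, for a single topic partition.
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();

    // Java port of inSequence: the next batch must start right after lastSeq,
    // or wrap around to 0 after Integer.MAX_VALUE.
    static boolean inSequence(int lastSeq, int nextSeq) {
        return nextSeq == lastSeq + 1L || (nextSeq == 0 && lastSeq == Integer.MAX_VALUE);
    }

    // Stores the batch and returns true only when baseSeq directly follows lastSeq.
    // Wraparound of the stored last sequence is not handled in this sketch.
    boolean tryAppend(long pid, int baseSeq, int recordCount) {
        int lastSeq = lastSeqByPid.getOrDefault(pid, NO_SEQUENCE);
        if (!inSequence(lastSeq, baseSeq)) {
            return false; // a retried duplicate or a gap: nothing is stored
        }
        lastSeqByPid.put(pid, baseSeq + recordCount - 1);
        return true;
    }

    public static void main(String[] args) {
        SequenceCheckSketch partition = new SequenceCheckSketch();
        System.out.println(partition.tryAppend(7L, 0, 3)); // first batch, seq 0..2
        System.out.println(partition.tryAppend(7L, 0, 3)); // the same batch retried
        System.out.println(partition.tryAppend(7L, 3, 2)); // next batch, seq 3..4
    }
}
```

Because NO_SEQUENCE is -1, a producer the partition has never seen is only accepted when its first batch starts at seq 0, which matches the appendFirstSeq != 0 checks in the excerpt.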
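Aside: what the Kafka producer does to preserve ordering
Suppose we have five requests: batch1, batch2, batch3, batch4 and batch5. If only the ack for batch2 fails while 3, 4 and 5 are stored, batch2 is resent later, which can produce a duplicate (if the broker had in fact stored it) or leave it behind batches 3 to 5 (if it had not). Setting max.in.flight.requests.per.connection=1 (the maximum number of unacknowledged requests the client may have outstanding on a single connection) removes the reordering, but it lowers throughput.
In newer versions of Kafka, setting enable.idempotence=true lets the producer adjust the number of in-flight requests dynamically. Under normal conditions max.in.flight.requests.per.connection is greater than 1. When a batch has to be retried, it is put back into the queue at the position given by its seq and the in-flight limit is effectively reduced to 1. Every batch ahead of it then has a smaller sequence number, and it can only be sent once all of them have been sent.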
private void insertInSequenceOrder(Deque<ProducerBatch> deque, ProducerBatch batch) {
    // When we are requeing and have enabled idempotence, the reenqueued batch must always have a sequence.
    if (batch.baseSequence() == RecordBatch.NO_SEQUENCE)
        throw new IllegalStateException("Trying to re-enqueue a batch which doesn't have a sequence even " +
            "though idempotency is enabled.");

    if (transactionManager.nextBatchBySequence(batch.topicPartition) == null)
        throw new IllegalStateException("We are re-enqueueing a batch which is not tracked as part of the in flight " +
            "requests. batch.topicPartition: " + batch.topicPartition + "; batch.baseSequence: " + batch.baseSequence());

    ProducerBatch firstBatchInQueue = deque.peekFirst();
    if (firstBatchInQueue != null && firstBatchInQueue.hasSequence() && firstBatchInQueue.baseSequence() < batch.baseSequence()) {
        List<ProducerBatch> orderedBatches = new ArrayList<>();
        while (deque.peekFirst() != null && deque.peekFirst().hasSequence() && deque.peekFirst().baseSequence() < batch.baseSequence())
            orderedBatches.add(deque.pollFirst());

        log.debug("Reordered incoming batch with sequence {} for partition {}. It was placed in the queue at " +
            "position {}", batch.baseSequence(), batch.topicPartition, orderedBatches.size());
        // Either we have reached a point where there are batches without a sequence (ie. never been drained
        // and are hence in order by default), or the batch at the front of the queue has a sequence greater
        // than the incoming batch. This is the right place to add the incoming batch.
        deque.addFirst(batch);

        // Now we have to re insert the previously queued batches in the right order.
        for (int i = orderedBatches.size() - 1; i >= 0; --i) {
            deque.addFirst(orderedBatches.get(i));
        }

        // At this point, the incoming batch has been queued in the correct place according to its sequence.
    } else {
        deque.addFirst(batch);
    }
}
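The re-enqueue logic is easier to follow on a stripped-down model. The sketch below assumes each batch is represented only by its base sequence number (an Integer) and that every queued batch already has a sequence, so the hasSequence() and transactionManager checks of the real method are left out. ReenqueueSketch is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class ReenqueueSketch {
    // Puts a retried batch back so that every batch ahead of it has a smaller sequence.
    static void insertInSequenceOrder(Deque<Integer> deque, int retriedSeq) {
        // Temporarily pull off the batches at the head that must stay ahead of the retried one.
        List<Integer> smaller = new ArrayList<>();
        while (deque.peekFirst() != null && deque.peekFirst() < retriedSeq) {
            smaller.add(deque.pollFirst());
        }
        deque.addFirst(retriedSeq);
        // Re-insert the smaller sequences in reverse so their original order is kept.
        for (int i = smaller.size() - 1; i >= 0; --i) {
            deque.addFirst(smaller.get(i));
        }
    }

    public static void main(String[] args) {
        // Batch 1 is already waiting for a retry; 4 and 5 have not been sent yet.
        Deque<Integer> queue = new ArrayDeque<>(Arrays.asList(1, 4, 5));
        insertInSequenceOrder(queue, 2); // batch 2 failed and is queued again
        System.out.println(queue);
    }
}
```

In this model the retried batch 2 ends up between 1 and 4, so draining from the head sends the batches in sequence order again.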
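Multi-session idempotence
As described under single-session idempotence, Kafka relies on pid and seq within one session. But precisely because the pid was introduced, when the application restarts, the new producer has none of the old producer's state, so data may be stored twice.
Kafka transactions achieve multi-session idempotence through an isolation (fencing) mechanism
Kafka transactions introduce a transactionId and an Epoch. Once transactional.id is set, one transactionId corresponds to exactly one pid, and the server records the latest Epoch value. When a new producer initializes, it sends an InitPIDRequest to the TransactionCoordinator. The TransactionCoordinator already holds the metadata for this transactionId, so it returns the previously assigned PID together with the Epoch incremented by 1. When the old producer recovers and sends a request, it is treated as an invalid producer and an exception is thrown. Without transactions, the TransactionCoordinator returns a new pid to the new producer, so there is no fencing and multi-session idempotence cannot be achieved.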
private def maybeValidateAppend(producerEpoch: Short, firstSeq: Int, offset: Long): Unit = {
  validationType match {
    case ValidationType.None =>
    case ValidationType.EpochOnly =>
      checkProducerEpoch(producerEpoch, offset)
    case ValidationType.Full => // with transactions enabled, this branch is executed
      checkProducerEpoch(producerEpoch, offset)
      checkSequence(producerEpoch, firstSeq, offset)
  }
}

private def checkProducerEpoch(producerEpoch: Short, offset: Long): Unit = {
  if (producerEpoch < updatedEntry.producerEpoch) {
    throw new ProducerFencedException(s"Producer's epoch at offset $offset is no longer valid in " +
      s"partition $topicPartition: $producerEpoch (request epoch), ${updatedEntry.producerEpoch} (current epoch)")
  }
}
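The fencing idea can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not the TransactionCoordinator's code: FencingSketch, Identity, initProducerId and isFenced are hypothetical names, and the map stands in for the metadata the coordinator keeps per transactionId.

```java
import java.util.HashMap;
import java.util.Map;

final class FencingSketch {
    // The (pid, epoch) pair handed to a producer when it initializes.
    static final class Identity {
        final long pid;
        final short epoch;

        Identity(long pid, short epoch) {
            this.pid = pid;
            this.epoch = epoch;
        }
    }

    private final Map<String, Identity> byTransactionalId = new HashMap<>();
    private long nextPid = 0;

    // Known transactionalId: keep the pid and bump the epoch. Unknown one: allocate a new pid.
    Identity initProducerId(String transactionalId) {
        Identity old = byTransactionalId.get(transactionalId);
        Identity fresh = (old == null)
                ? new Identity(nextPid++, (short) 0)
                : new Identity(old.pid, (short) (old.epoch + 1));
        byTransactionalId.put(transactionalId, fresh);
        return fresh;
    }

    // Same comparison as checkProducerEpoch: a request carrying an older epoch is fenced.
    boolean isFenced(String transactionalId, Identity requester) {
        Identity current = byTransactionalId.get(transactionalId);
        return current != null && requester.epoch < current.epoch;
    }

    public static void main(String[] args) {
        FencingSketch coordinator = new FencingSketch();
        Identity oldProducer = coordinator.initProducerId("settlement-tx-1");
        Identity newProducer = coordinator.initProducerId("settlement-tx-1"); // restart
        System.out.println(coordinator.isFenced("settlement-tx-1", oldProducer));
        System.out.println(coordinator.isFenced("settlement-tx-1", newProducer));
    }
}
```

The restarted producer keeps the pid, so the broker's existing sequence state still applies to it, while the older instance is rejected by its stale epoch.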
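04 Consumer-side idempotence
As described above, the consumer hands the messages it has polled to a pool of worker threads, and the workers' handling of a message may include asynchronous operations, so the following cases can arise:
- Commit first, then run the business logic: the commit succeeds but processing fails. The message is lost.
- Run the business logic first, then commit: processing succeeds but the commit fails. The message is processed again.
- Run the business logic first, then commit: the commit succeeds but the asynchronous part of the processing fails. The message is lost.
A common approach is for each worker to run the following code first when it picks up a message: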
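lock.lock();
try {
    if (cache.containsKey(msgId)) {
        // msgId is already in the cache, so the message has been handled: skip it
        continue;
    }
    // Check and insert under the same lock so two workers cannot both claim msgId
    cache.put(msgId, timeout);
    commitSync();
} finally {
    lock.unlock();
}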
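// Once all subsequent operations have finished, delete msgId from the cache. As long as msgId is in the cache, the message is treated as already handled. Note: cache entries need an expiry (the timeout above).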
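A self-contained variant of this idea is sketched below, assuming every message carries a unique msgId and that an in-process map is an acceptable cache (deduplicating across several consumer processes would need a shared store instead). DedupSketch and handleOnce are hypothetical names, the offset commit is left to the caller, and instead of an explicit lock the sketch uses the atomic putIfAbsent of ConcurrentHashMap.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class DedupSketch {
    // msgId -> expiry timestamp in milliseconds
    private final ConcurrentMap<String, Long> seen = new ConcurrentHashMap<>();
    private final long ttlMillis;

    DedupSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Runs the handler at most once per msgId within the TTL.
    // Returns false when the message is skipped as a duplicate.
    boolean handleOnce(String msgId, long nowMillis, Runnable handler) {
        Long expiry = nowMillis + ttlMillis;
        Long previous = seen.putIfAbsent(msgId, expiry); // atomic claim of msgId
        if (previous != null) {
            if (previous > nowMillis) {
                return false; // live entry: already handled or being handled
            }
            // Expired entry: take it over, unless another worker did so first.
            if (!seen.replace(msgId, previous, expiry)) {
                return false;
            }
        }
        try {
            handler.run();
            return true;
        } catch (RuntimeException e) {
            // Processing failed: release the claim so a redelivery can retry.
            seen.remove(msgId, expiry);
            throw e;
        }
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch(60_000L);
        long now = System.currentTimeMillis();
        System.out.println(dedup.handleOnce("msg-1", now, () -> System.out.println("handling msg-1")));
        System.out.println(dedup.handleOnce("msg-1", now + 1, () -> System.out.println("handling msg-1 again")));
    }
}
```

Releasing the claim on failure is a design choice of this sketch: it trades the "lost message" case for a possible retry, which only helps if the message is actually redelivered.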
Source: "ITPUB部落格" (ITPUB Blog), link: http://blog.itpub.net/4729/viewspace-2823489/. If you repost this article, please credit the source; otherwise legal liability will be pursued.