Kafka Source Code Analysis (Part 3): Log Management - LogManager

Published by 全面攻城 on 2018-03-19

1 Entry Point

/* start log manager */
        // Start the log management module
        logManager = LogManager(config, zkUtils, brokerState, kafkaScheduler, time, brokerTopicStats)
        logManager.startup()


2 Startup Code

/**
   *  Start the background threads to flush logs and do log cleanup;
   *  this periodic work relies on the scheduler's background thread pool
   */
  def startup() {
    /* Schedule the cleanup task to delete old logs */
    if(scheduler != null) {
      info("Starting log cleanup with a period of %d ms.".format(retentionCheckMs))
      scheduler.schedule("kafka-log-retention",
                         cleanupLogs _,
                         delay = InitialTaskDelayMs,
                         period = retentionCheckMs,
                         TimeUnit.MILLISECONDS)
      info("Starting log flusher with a default period of %d ms.".format(flushCheckMs))
      scheduler.schedule("kafka-log-flusher",
                         flushDirtyLogs _,
                         delay = InitialTaskDelayMs,
                         period = flushCheckMs,
                         TimeUnit.MILLISECONDS)
      scheduler.schedule("kafka-recovery-point-checkpoint",
                         checkpointRecoveryPointOffsets _,
                         delay = InitialTaskDelayMs,
                         period = flushRecoveryOffsetCheckpointMs,
                         TimeUnit.MILLISECONDS)
      scheduler.schedule("kafka-log-start-offset-checkpoint",
                         checkpointLogStartOffsets _,
                         delay = InitialTaskDelayMs,
                         period = flushStartOffsetCheckpointMs,
                         TimeUnit.MILLISECONDS)
      scheduler.schedule("kafka-delete-logs",
                         deleteLogs _,
                         delay = InitialTaskDelayMs,
                         period = defaultConfig.fileDeleteDelayMs,
                         TimeUnit.MILLISECONDS)
    }
    if(cleanerConfig.enableCleaner)
      cleaner.startup()
  }

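All five tasks above share the same KafkaScheduler handed in by the broker. As a rough mental model (a simplified sketch, not the actual kafka.utils.KafkaScheduler implementation), each schedule(name, fun, delay, period) call amounts to registering a fixed-rate task on a ScheduledThreadPoolExecutor:

import java.util.concurrent.{ScheduledThreadPoolExecutor, TimeUnit}

// Sketch only: the real KafkaScheduler also handles thread naming, daemon flags and shutdown.
val executor = new ScheduledThreadPoolExecutor(1)

def schedule(name: String, fun: () => Unit, delay: Long, period: Long): Unit = {
  executor.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = fun()
  }, delay, period, TimeUnit.MILLISECONDS)
}

// Usage mirroring the calls above, e.g.:
// schedule("kafka-log-retention", () => cleanupLogs(), delay = 30 * 1000L, period = retentionCheckMs)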

3 Core Code


3.1 Related Configuration Options

  • log.cleaner.threads, default 1: the number of threads used to clean (compact) logs.

  • log.cleaner.dedupe.buffer.size, default 128MB: the memory buffer used for deduplication during log compaction, i.e. the sorting memory used to eliminate duplicate keys.

  • log.cleaner.io.buffer.load.factor, default 0.9: the load factor of the dedupe hash buffer; the smaller the factor, the lower the chance of bucket collisions, but the higher the memory usage. Used for log compaction.

  • log.cleaner.io.buffer.size, default 512KB: the I/O buffer size used during log compaction.

  • message.max.bytes, default 1000012 bytes: the maximum size of a single message.

  • log.cleaner.io.max.bytes.per.second: throttles the cleaner's I/O during log compaction; unthrottled by default.

  • log.cleaner.backoff.ms, default 15 seconds: the interval at which the cleaner checks whether any log needs cleaning (used during log compaction).

  • log.cleaner.enable, default true: whether the log cleaner is enabled.

  • num.recovery.threads.per.data.dir, default 1: the number of threads per data directory used for log recovery at startup.

  • log.flush.scheduler.interval.ms: the interval at which logs are checked to see whether they need to be flushed to disk; by default no check is performed.

  • log.flush.offset.checkpoint.interval.ms, default 60000 ms: the interval at which each partition's recovery-point offset is checkpointed.

  • log.retention.check.interval.ms, default 5 minutes: the interval at which logs are checked against the retention policies.
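
All of these are ordinary broker-side settings. A server.properties sketch using the defaults listed above (purely illustrative; omit any line to keep the built-in default):

# log cleaner (compaction)
log.cleaner.enable=true
log.cleaner.threads=1
log.cleaner.dedupe.buffer.size=134217728
log.cleaner.io.buffer.load.factor=0.9
log.cleaner.io.buffer.size=524288
log.cleaner.backoff.ms=15000
# log.cleaner.io.max.bytes.per.second is unthrottled by default

# recovery, flush and retention checks
num.recovery.threads.per.data.dir=1
log.flush.offset.checkpoint.interval.ms=60000
log.retention.check.interval.ms=300000

message.max.bytes=1000012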

3.2 Startup Step: Building the LogManager (topic configs from ZooKeeper)

// The topic-level configs are first read from ZooKeeper (not covered in detail here)
   val cleanerConfig = CleanerConfig(numThreads = config.logCleanerThreads,
     dedupeBufferSize = config.logCleanerDedupeBufferSize,
     dedupeBufferLoadFactor = config.logCleanerDedupeBufferLoadFactor,
     ioBufferSize = config.logCleanerIoBufferSize,
     maxMessageSize = config.messageMaxBytes,
     maxIoBytesPerSecond = config.logCleanerIoMaxBytesPerSecond,
     backOffMs = config.logCleanerBackoffMs,
     enableCleaner = config.logCleanerEnable)

   new LogManager(logDirs = config.logDirs.map(new File(_)).toArray,
     topicConfigs = topicConfigs,
     defaultConfig = defaultLogConfig,
     cleanerConfig = cleanerConfig,
     ioThreads = config.numRecoveryThreadsPerDataDir,
     flushCheckMs = config.logFlushSchedulerIntervalMs,
     flushRecoveryOffsetCheckpointMs = config.logFlushOffsetCheckpointIntervalMs,
     flushStartOffsetCheckpointMs = config.logFlushStartOffsetCheckpointIntervalMs,
     retentionCheckMs = config.logCleanupIntervalMs,
     maxPidExpirationMs = config.transactionIdExpirationMs,
     scheduler = kafkaScheduler,
     brokerState = brokerState,
     time = time,
     brokerTopicStats = brokerTopicStats)
 }

3.3 Initialization Flow

@threadsafe
class LogManager(val logDirs: Array[File],
                 val topicConfigs: Map[String, LogConfig], // note that this doesn't get updated after creation
                 val defaultConfig: LogConfig,
                 val cleanerConfig: CleanerConfig,
                 ioThreads: Int,
                 val flushCheckMs: Long,
                 val flushRecoveryOffsetCheckpointMs: Long,
                 val flushStartOffsetCheckpointMs: Long,
                 val retentionCheckMs: Long,
                 val maxPidExpirationMs: Int,
                 scheduler: Scheduler,
                 val brokerState: BrokerState,
                 brokerTopicStats: BrokerTopicStats,
                 time: Time) extends Logging {
  val RecoveryPointCheckpointFile = "recovery-point-offset-checkpoint"
  val LogStartOffsetCheckpointFile = "log-start-offset-checkpoint"
  val LockFile = ".lock"
  val InitialTaskDelayMs = 30*1000

  private val logCreationOrDeletionLock = new Object
  private val logs = new Pool[TopicPartition, Log]()
  private val logsToBeDeleted = new LinkedBlockingQueue[Log]()
//  Check that the log directories exist, create them if necessary, and verify that they are readable and writable.
  createAndValidateLogDirs(logDirs)
//  Create a .lock file in each directory and use it to lock the directory.
  private val dirLocks = lockLogDirs(logDirs)
//  Build a checkpoint map from each directory's recovery-point-offset-checkpoint file; this is used to periodically record each partition's recovery offset.
  private val recoveryPointCheckpoints = logDirs.map(dir => (dir, new OffsetCheckpointFile(new File(dir, RecoveryPointCheckpointFile)))).toMap
  private val logStartOffsetCheckpoints = logDirs.map(dir => (dir, new OffsetCheckpointFile(new File(dir, LogStartOffsetCheckpointFile)))).toMap
//  For each directory, create a thread pool whose size is the value of num.recovery.threads.per.data.dir.
//  Read each topic-partition subdirectory and, using the topic's config from ZK (or the default config) plus the offset recorded for the partition in the offset checkpoint file, create a Log instance. The thread pool then loads each Log instance, i.e. performs log recovery.

  loadLogs()

  // public, so we can access this from kafka.admin.DeleteTopicTest
  val cleaner: LogCleaner =
    if(cleanerConfig.enableCleaner)
      new LogCleaner(cleanerConfig, logDirs, logs, time = time)
    else
      null
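
loadLogs() is where most of the startup time goes on a broker with a lot of data. A simplified sketch of the per-directory recovery pattern described in the comments above, using hypothetical helper names (the real loadLogs additionally handles the clean-shutdown marker file and constructs the actual Log instances):

import java.io.File
import java.util.concurrent.{Executors, TimeUnit}

// Sketch: one fixed-size thread pool per data directory, sized by num.recovery.threads.per.data.dir,
// with one task per topic-partition subdirectory that rebuilds a Log instance.
def loadLogsSketch(logDirs: Seq[File], ioThreads: Int)(recover: File => Unit): Unit = {
  val pools = logDirs.map { dir =>
    val pool = Executors.newFixedThreadPool(ioThreads)
    Option(dir.listFiles).getOrElse(Array.empty[File])
      .filter(_.isDirectory)                               // each topic-partition lives in its own subdirectory
      .foreach(partitionDir => pool.submit(new Runnable {
        override def run(): Unit = recover(partitionDir)   // e.g. rebuild a Log from the checkpointed recovery offset
      }))
    pool
  }
  // Wait for all recovery work to finish before the broker continues starting up.
  pools.foreach { pool =>
    pool.shutdown()
    pool.awaitTermination(Long.MaxValue, TimeUnit.MILLISECONDS)
  }
}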

3.4 Cleaning Up Expired Log Segments

/**
  * Runs through the log removing segments older than a certain age
  */
 private def cleanupExpiredSegments(log: Log): Int = {
   if (log.config.retentionMs < 0)
     return 0
   val startMs = time.milliseconds
   log.deleteOldSegments(startMs - _.lastModified > log.config.retentionMs)
 }

This involves another config, retention.ms, which specifies how long log segments are retained. If it is less than 0, segments never expire by age, so there is nothing to delete here.

If the difference between the current time and a segment file's last-modified time exceeds the configured retention time, the segment is deleted. The deletion method is:

  /**
   * Delete any log segments matching the given predicate function,
   * starting with the oldest segment and moving forward until a segment doesn't match.
   * @param predicate A function that takes in a single log segment and returns true iff it is deletable
   * @return The number of segments deleted
   */
  def deleteOldSegments(predicate: LogSegment => Boolean): Int = {
    lock synchronized {
      //find any segments that match the user-supplied predicate UNLESS it is the final segment
      //and it is empty (since we would just end up re-creating it)
      val lastEntry = segments.lastEntry
      val deletable =
        if (lastEntry == null) Seq.empty
        else logSegments.takeWhile(s => predicate(s) && (s.baseOffset != lastEntry.getValue.baseOffset || s.size > 0))
      val numToDelete = deletable.size
      if (numToDelete > 0) {
        // we must always have at least one segment, so if we are going to delete all the segments, create a new one first
        if (segments.size == numToDelete)
          roll()
        // remove the segments for lookups
        deletable.foreach(deleteSegment(_))
      }
      numToDelete
    }
  }

The logic here is: use the supplied predicate to decide which segments are eligible for deletion, collect them into deletable, then iterate over deletable and delete each one.

private def deleteSegment(segment: LogSegment) {
  info("Scheduling log segment %d for log %s for deletion.".format(segment.baseOffset, name))
  lock synchronized {
    segments.remove(segment.baseOffset)
    asyncDeleteSegment(segment)
  }
}

private def asyncDeleteSegment(segment: LogSegment) {
  segment.changeFileSuffixes("", Log.DeletedFileSuffix)
  def deleteSeg() {
    info("Deleting segment %d from log %s.".format(segment.baseOffset, name))
    segment.delete()
  }
  scheduler.schedule("delete-file", deleteSeg, delay = config.fileDeleteDelayMs)
}


This is an asynchronous file-deletion process governed by the config file.delete.delay.ms, the delay before a segment's files are physically removed. The segment's files are first renamed with the .deleted suffix, and a scheduled task deletes them after the delay.

3.5 Cleaning Up Oversized Logs

/**
  *  Runs through the log removing segments until the size of the log
  *  is at least logRetentionSize bytes in size
  */
 private def cleanupSegmentsToMaintainSize(log: Log): Int = {
   if(log.config.retentionSize < 0 || log.size < log.config.retentionSize)
     return 0
   var diff = log.size - log.config.retentionSize
   def shouldDelete(segment: LogSegment) = {
     if(diff - segment.size >= 0) {
       diff -= segment.size
       true
     } else {
       false
     }
   }
   log.deleteOldSegments(shouldDelete)
 }

This code is fairly straightforward: if the log size exceeds retention.bytes, the oldest segments are marked for deletion, and the same deleteOldSegments method is called, so it will not be repeated here.
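
As a quick worked example with hypothetical numbers (not taken from the source): with retention.bytes set to 1 GiB and five 300 MiB segments, only the oldest segment is deleted, because a segment is removed only while the remaining excess still covers its entire size:

// Hypothetical numbers mirroring the shouldDelete logic above.
val retentionSize = 1024L * 1024 * 1024                 // retention.bytes = 1 GiB
val segmentSizes  = Seq.fill(5)(300L * 1024 * 1024)     // five segments, log.size = 1500 MiB

var diff = segmentSizes.sum - retentionSize             // 476 MiB of excess
val deleted = segmentSizes.takeWhile { size =>
  if (diff - size >= 0) { diff -= size; true } else false
}
// deleted.size == 1: 476 - 300 >= 0 deletes the oldest segment, but 176 - 300 < 0 stops there,
// leaving the log at 1200 MiB, still above the limit until segments age out or roll.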

3.6 Periodically Flushing Log Buffers to Disk

The background scheduler periodically executes LogManager's flushDirtyLogs function.

This function iterates over every partition's log and flushes it: using the current end offset, it finds the segments between the last checkpointed (recovery point) offset and the current offset, and flushes each segment's log and index files. For the log file it calls force on the file channel; for the index files it calls force on the memory-mapped buffer.

private def flushDirtyLogs() = {
  debug("Checking for dirty logs to flush...")

  for ((topicAndPartition, log) <- logs) {
    try {
      val timeSinceLastFlush = time.milliseconds - log.lastFlushTime
      debug("Checking if flush is needed on " + topicAndPartition.topic 

           + " flush interval  " + log.config.flushMs +
            " last flushed " + log.lastFlushTime + " time since last flush: " 

           + timeSinceLastFlush)
      if(timeSinceLastFlush >= log.config.flushMs)
        log.flush
    } catch {
      case e: Throwable =>
        error("Error flushing topic " + topicAndPartition.topic, e)
    }
  }
}

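log.flush itself is not shown in this post; it flushes every segment between the last recovery point and the current log end offset. A minimal sketch of what flushing a single segment involves, assuming illustrative parameter names rather than the actual LogSegment fields:

import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

// Sketch: flushing a segment forces the log file channel and the memory-mapped index to disk.
def flushSegmentSketch(logChannel: FileChannel, indexBuffer: MappedByteBuffer): Unit = {
  logChannel.force(true)   // fsync the log file's data (and metadata)
  indexBuffer.force()      // msync the mapped offset index
}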

3.7 Periodically Checkpointing Partition Offsets

The background scheduler periodically executes LogManager's checkpointRecoveryPointOffsets function:

def checkpointRecoveryPointOffsets() {
  this.logDirs.foreach(checkpointLogsInDir)
}

This checkpoints the latest recovery offset of every partition stored in each log directory.

The function iterates over each directory and writes the recovery offsets of its partitions to the checkpoint file in that directory.

The first line holds a 0, the version of the checkpoint file format.

The second line holds the number of partitions, i.e. how many partitions currently have data in this directory.

It is then followed by that many lines, each recording a topic, partition, and offset.
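
For example, a recovery-point-offset-checkpoint file for a directory holding two partitions of a hypothetical topic would look like this:

0
2
my-topic 0 123456
my-topic 1 98765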

private def checkpointLogsInDir(dir: File): Unit = {
  val recoveryPoints = this.logsByDir.get(dir.toString)
  if (recoveryPoints.isDefined) {
    this.recoveryPointCheckpoints(dir).write(recoveryPoints.get.mapValues(_.recoveryPoint))
  }
}



Periodic log compaction is carried out by the LogCleaner instance, which does its work on CleanerThread threads. The relevant configuration options are listed below, followed by a short sketch of the compaction idea:

  1. log.cleaner.io.max.bytes.per.second: throttles the I/O of these threads; unthrottled by default.
  2. log.cleaner.dedupe.buffer.size, default 128MB: the memory buffer used for deduplication, i.e. the sorting memory used to eliminate duplicate keys during compaction.
  3. log.cleaner.threads, default 1: the number of log cleaner threads.
  4. log.cleaner.backoff.ms, default 15 seconds: the interval at which the cleaner checks whether any log needs cleaning.
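
At its core the compaction pass is key-based deduplication: the cleaner first builds an offset map from message key to the latest offset seen (this is what the buffer sized by log.cleaner.dedupe.buffer.size holds), then rewrites the segments keeping only the latest record for each key. A simplified in-memory sketch of that idea (not the actual LogCleaner code):

// Simplified sketch of key-based compaction over (offset, key, value) records in offset order.
def compactSketch(records: Seq[(Long, String, String)]): Seq[(Long, String, String)] = {
  // Pass 1: remember the latest offset for every key (the role of the dedupe buffer).
  val latestOffsetByKey = records.map { case (offset, key, _) => key -> offset }.toMap
  // Pass 2: keep only the records that are still the latest for their key.
  records.filter { case (offset, key, _) => latestOffsetByKey(key) == offset }
}

// compactSketch(Seq((0L, "k1", "v1"), (1L, "k2", "v1"), (2L, "k1", "v2")))
// => Seq((1L, "k2", "v1"), (2L, "k1", "v2"))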
