Spark Source Code Analysis: DiskBlockManager

Posted by happy19870612 on 2017-11-10

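DiskBlockManager creates and maintains the mapping between logical blocks and their physical locations on disk. By default, one block maps to one file, and the file name is generally the name of the blockId. It is also possible, however, for several blocks to map to segments of a single file. Block files are spread by a hash of the file name across subdirectories of the directories configured in spark.local.dir.

The number of subdirectories per local directory can be set with spark.diskStore.subDirectories.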

1. Core properties

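BlockManager blockManager: mainly responsible for adding, deleting, and updating blocks

Int subDirsPerLocalDir: number of subdirectories per local directory, 64 by default

Array[File] localDirs: the top-level local directories used to store block files

Array[Array[File]] subDirs: the subdirectories, indexed first by top-level directory and then by subdirectory id; an entry is null until that subdirectory has been created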

2. Key methods

2.1 getFile: get the file for a given name

def getFile(filename: String): File = {
  // Hash the file name
  val hash = Utils.nonNegativeHash(filename)
  // Use the hash to pick the top-level directory this file belongs to
  val dirId = hash % localDirs.length
  // Then pick the subdirectory within that top-level directory
  val subDirId = (hash / localDirs.length) % subDirsPerLocalDir

  // Create the subdirectory if it does not exist yet
  val subDir = subDirs(dirId).synchronized {
    val old = subDirs(dirId)(subDirId)
    if (old != null) {
      old
    } else {
      val newDir = new File(localDirs(dirId), "%02x".format(subDirId))
      if (!newDir.exists() && !newDir.mkdir()) {
        throw new IOException(s"Failed to create local dir in $newDir.")
      }
      subDirs(dirId)(subDirId) = newDir
      newDir
    }
  }
  // Build the File handle from the subdirectory and the file name
  // (this does not create the file on disk)
  new File(subDir, filename)
}

// Use the blockId's name as the file name
def getFile(blockId: BlockId): File = getFile(blockId.name)
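
To make the two-level hashing easier to follow outside of Spark, here is a minimal standalone sketch. The object BlockPathSketch and its methods are hypothetical names made up for illustration, and nonNegativeHash is a stand-in assumed to behave like Spark's Utils.nonNegativeHash, not the real thing.

```scala
import java.io.File

object BlockPathSketch {
  // Stand-in for Utils.nonNegativeHash (an assumption for this sketch):
  // maps hashCode to a non-negative Int, treating Int.MinValue as 0.
  def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h == Int.MinValue) 0 else math.abs(h)
  }

  // Returns (dirId, subDirId) for a file name, using the same arithmetic as getFile.
  def locate(filename: String, numLocalDirs: Int, subDirsPerLocalDir: Int): (Int, Int) = {
    val hash = nonNegativeHash(filename)
    val dirId = hash % numLocalDirs
    val subDirId = (hash / numLocalDirs) % subDirsPerLocalDir
    (dirId, subDirId)
  }

  // Builds the target path only; nothing is created on disk here.
  def pathFor(filename: String, localDirs: Seq[File], subDirsPerLocalDir: Int): File = {
    val (dirId, subDirId) = locate(filename, localDirs.length, subDirsPerLocalDir)
    new File(new File(localDirs(dirId), "%02x".format(subDirId)), filename)
  }
}
```

Because the location depends only on the file name and the directory counts, the same name always resolves to the same path, which is why no separate index of block locations is needed.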

2.2 containsBlock: check whether a block with the given blockId exists on disk

This simply checks whether a file named after the blockId exists on disk, since one file corresponds to one block.

def containsBlock(blockId: BlockId): Boolean = {
  getFile(blockId.name).exists()
}

2.3 getAllFiles: list all files currently on disk

def getAllFiles(): Seq[File] = {
  // Get all the files inside the array of array of directories
  subDirs.flatMap { dir =>
    dir.synchronized {
      // Copy the content of dir because it may be modified in other threads
      dir.clone()
    }
  }.filter(_ != null).flatMap { dir =>
    val files = dir.listFiles()
    if (files != null) files else Seq.empty
  }
}
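
The pattern in getAllFiles (copy each inner array while holding its lock, then do the slow directory listing outside the lock) can be tried in isolation. The following is a rough sketch under that reading; ListFilesSketch and listAll are hypothetical names, and the slots parameter plays the role of subDirs.

```scala
import java.io.File

object ListFilesSketch {
  // slots: one inner array per top-level directory; a null entry means
  // the subdirectory has not been created yet.
  def listAll(slots: Array[Array[File]]): Seq[File] = {
    slots.toSeq.flatMap { dir =>
      // Snapshot under the lock so concurrent writers cannot change it mid-iteration
      dir.synchronized { dir.clone().toSeq }
    }.filter(_ != null).flatMap { d =>
      // listFiles() returns null if d is not a directory or an I/O error occurs
      val files = d.listFiles()
      if (files != null) files.toSeq else Seq.empty[File]
    }
  }
}
```

Holding the lock only for the clone keeps the critical section short; the file system calls run on the snapshot.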

2.4 getAllBlocks: list all blocks stored on disk, which simply means listing the files on disk that correspond to blocks

def getAllBlocks(): Seq[BlockId] = {
  getAllFiles().map(f => BlockId(f.getName))
}
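
getAllBlocks relies on BlockId(name) turning a file name back into a block id. As a simplified illustration of that idea, the sketch below parses two name shapes, rdd_<rddId>_<splitIndex> and shuffle_<shuffleId>_<mapId>_<reduceId>. The case classes and the parse function are hypothetical and cover far fewer cases than Spark's real BlockId; treat them as an assumption-laden toy, not the actual parser.

```scala
object BlockNameSketch {
  // Hypothetical result types for this sketch only
  sealed trait ParsedName
  case class RddName(rddId: Int, splitIndex: Int) extends ParsedName
  case class ShuffleName(shuffleId: Int, mapId: Int, reduceId: Int) extends ParsedName
  case class Unknown(name: String) extends ParsedName

  private val Rdd = "rdd_([0-9]+)_([0-9]+)".r
  private val Shuffle = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r

  // A Regex used as a pattern must match the whole string, so names with
  // extra suffixes fall through to Unknown.
  def parse(name: String): ParsedName = name match {
    case Rdd(r, s) => RddName(r.toInt, s.toInt)
    case Shuffle(s, m, r) => ShuffleName(s.toInt, m.toInt, r.toInt)
    case other => Unknown(other)
  }
}
```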

2.5 createTempLocalBlock: create a temporary local block

def createTempLocalBlock(): (TempLocalBlockId, File) = {
  var blockId = new TempLocalBlockId(UUID.randomUUID())
  while (getFile(blockId).exists()) {
    blockId = new TempLocalBlockId(UUID.randomUUID())
  }
  (blockId, getFile(blockId))
}

2.6 createTempShuffleBlock: create a temporary local shuffle block

def createTempShuffleBlock(): (TempShuffleBlockId, File) = {
  var blockId = new TempShuffleBlockId(UUID.randomUUID())
  while (getFile(blockId).exists()) {
    blockId = new TempShuffleBlockId(UUID.randomUUID())
  }
  (blockId, getFile(blockId))
}
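
Both temp-block methods share one idea: keep drawing random UUIDs until the resulting file name is not already taken. A minimal standalone sketch of that loop follows. TempNameSketch and freshFile are hypothetical names, and the prefix argument is assumed to stand in for the temp_local_ or temp_shuffle_ part of the block name.

```scala
import java.io.File
import java.util.UUID

object TempNameSketch {
  // Returns a name that does not currently exist in dir, plus its File handle.
  // The file itself is not created here.
  def freshFile(dir: File, prefix: String): (String, File) = {
    var name = prefix + UUID.randomUUID()
    // Retry on the (very unlikely) event of a collision with an existing file
    while (new File(dir, name).exists()) {
      name = prefix + UUID.randomUUID()
    }
    (name, new File(dir, name))
  }
}
```

Note that the existence check and any later creation of the file are separate steps, so the loop guards against stale names rather than providing an atomic reservation.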
