【Spark】 Spark作業執行原理--獲取執行結果
一、執行結果並序列化
任務執行完成後,是在 TaskRunner 的 run 方法的後半部分返回結果給 Driver 的:
override def run(): Unit = {
...
// 執行任務
val value = try {
val res = task.run(
taskAttemptId = taskId,
attemptNumber = attemptNumber,
metricsSystem = env.metricsSystem)
threwException = false
res
}
...
val taskFinish = System.currentTimeMillis()
val taskFinishCpu = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
threadMXBean.getCurrentThreadCpuTime
} else 0L
// If the task has been killed, let's fail it.
if (task.killed) {
throw new TaskKilledException
}
// 序列化結果
val resultSer = env.serializer.newInstance()
val beforeSerialization = System.currentTimeMillis()
val valueBytes = resultSer.serialize(value)
val afterSerialization = System.currentTimeMillis()
// Deserialization happens in two parts: first, we deserialize a Task object, which
// includes the Partition. Second, Task.run() deserializes the RDD and function to be run.
task.metrics.setExecutorDeserializeTime(
(taskStart - deserializeStartTime) + task.executorDeserializeTime)
task.metrics.setExecutorDeserializeCpuTime(
(taskStartCpu - deserializeStartCpuTime) + task.executorDeserializeCpuTime)
// We need to subtract Task.run()'s deserialization time to avoid double-counting
task.metrics.setExecutorRunTime((taskFinish - taskStart) - task.executorDeserializeTime)
task.metrics.setExecutorCpuTime(
(taskFinishCpu - taskStartCpu) - task.executorDeserializeCpuTime)
task.metrics.setJvmGCTime(computeTotalGcTime() - startGCTime)
task.metrics.setResultSerializationTime(afterSerialization - beforeSerialization)
// 序列化後的結果封裝成 DirectTaskResult
// Note: accumulator updates must be collected after TaskMetrics is updated
val accumUpdates = task.collectAccumulatorUpdates()
// TODO: do not serialize value twice
val directResult = new DirectTaskResult(valueBytes, accumUpdates)
val serializedDirectResult = ser.serialize(directResult)
val resultSize = serializedDirectResult.limit
// directSend = sending directly back to the driver
val serializedResult: ByteBuffer = {
// 生成結果大於最大值(預設1GB)直接丟棄
if (maxResultSize > 0 && resultSize > maxResultSize) {
logWarning(s"Finished $taskName (TID $taskId). Result is larger than maxResultSize " +
s"(${Utils.bytesToString(resultSize)} > ${Utils.bytesToString(maxResultSize)}), " +
s"dropping it.")
ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
// 生成結果設定的 maxDirectResultSize 且小於 最大值,則存放到 BlockManager 中,然後返回 BlockManager 的編號
} else if (resultSize > maxDirectResultSize) {
val blockId = TaskResultBlockId(taskId)
env.blockManager.putBytes(
blockId,
new ChunkedByteBuffer(serializedDirectResult.duplicate()),
StorageLevel.MEMORY_AND_DISK_SER)
logInfo(
s"Finished $taskName (TID $taskId). $resultSize bytes result sent via BlockManager)")
ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))
// 其他結果直接返回
} else {
logInfo(s"Finished $taskName (TID $taskId). $resultSize bytes result sent to driver")
serializedDirectResult
}
}
// 向 Driver 終端傳送任務執行完畢的訊息
execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
從上面可以看出,對於 Executor 的計算結果,會根據結果大小不同有不同策略。
(1)生成結果大於maxResultSize( 預設 1GB),結果直接丟棄,可以通過 spark.driver.maxResultSize 進行設定;
(2)生成結果大小大於 maxDirectResultSize(預設128M),小於 maxResultSize( 預設 1GB),將結果存入 BlockManager,並返回其編號,通過 Netty 傳送給 Driver,maxDirectResultSize 由 spark.task.maxDirectResultSiz 和 spark.rpc.message.maxSize 控制,取兩個中的最小值。
(3)生成結果小於 maxDirectResultSize(預設128M),則直接傳送給 Driver。
二、傳送執行結果
任務執行後,TaskRunner 將執行結果傳送給 DriverEndpoint 終端:
override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) {
val msg = StatusUpdate(executorId, taskId, state, data)
driver match {
case Some(driverRef) => driverRef.send(msg)
case None => logWarning(s"Drop $msg because has not yet connected to driver")
}
}
三、獲取執行結果
在 statusUpdate 中,將轉給 TaskScheduler 處理:
case StatusUpdate(executorId, taskId, state, data) =>
scheduler.statusUpdate(taskId, state, data.value)
if (TaskState.isFinished(state)) {
executorDataMap.get(executorId) match {
case Some(executorInfo) =>
executorInfo.freeCores += scheduler.CPUS_PER_TASK
makeOffers(executorId)
case None =>
// Ignoring the update since we don't know about the executor.
logWarning(s"Ignored task status update ($taskId state $state) " +
s"from unknown executor with ID $executorId")
}
}
TaskScheduler 中對任務的不同狀態有不同處理:
case Some(taskSet) =>
if (state == TaskState.LOST) {
// TaskState.LOST is only used by the deprecated Mesos fine-grained scheduling mode,
// where each executor corresponds to a single task, so mark the executor as failed.
val execId = taskIdToExecutorId.getOrElse(tid, throw new IllegalStateException(
"taskIdToTaskSetManager.contains(tid) <=> taskIdToExecutorId.contains(tid)"))
if (executorIdToRunningTaskIds.contains(execId)) {
reason = Some(
SlaveLost(s"Task $tid was lost, so marking the executor as lost as well."))
removeExecutor(execId, reason.get)
failedExecutor = Some(execId)
}
}
if (TaskState.isFinished(state)) {
cleanupTaskState(tid)
taskSet.removeRunningTask(tid)
if (state == TaskState.FINISHED) {
taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
} else if (Set(TaskState.FAILED, TaskState.KILLED, TaskState.LOST).contains(state)) {
taskResultGetter.enqueueFailedTask(taskSet, tid, state, serializedData)
}
}
3.1、TaskState.FINISHED
如果 TaskState.FINISHED,則進入 taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData):
def enqueueSuccessfulTask(
taskSetManager: TaskSetManager,
tid: Long,
serializedData: ByteBuffer): Unit = {
getTaskResultExecutor.execute(new Runnable {
override def run(): Unit = Utils.logUncaughtExceptions {
try {
val (result, size) = serializer.get().deserialize[TaskResult[_]](serializedData) match {
case directResult: DirectTaskResult[_] =>
if (!taskSetManager.canFetchMoreResults(serializedData.limit())) {
return
}
// deserialize "value" without holding any lock so that it won't block other threads.
// We should call it here, so that when it's called again in
// "TaskSetManager.handleSuccessfulTask", it does not need to deserialize the value.
directResult.value(taskResultSerializer.get())
(directResult, serializedData.limit())
case IndirectTaskResult(blockId, size) =>
if (!taskSetManager.canFetchMoreResults(size)) {
// dropped by executor if size is larger than maxResultSize
sparkEnv.blockManager.master.removeBlock(blockId)
return
}
logDebug("Fetching indirect task result for TID %s".format(tid))
scheduler.handleTaskGettingResult(taskSetManager, tid)
val serializedTaskResult = sparkEnv.blockManager.getRemoteBytes(blockId)
if (!serializedTaskResult.isDefined) {
/* We won't be able to get the task result if the machine that ran the task failed
* between when the task ended and when we tried to fetch the result, or if the
* block manager had to flush the result. */
scheduler.handleFailedTask(
taskSetManager, tid, TaskState.FINISHED, TaskResultLost)
return
}
val deserializedResult = serializer.get().deserialize[DirectTaskResult[_]](
serializedTaskResult.get.toByteBuffer)
// force deserialization of referenced value
deserializedResult.value(taskResultSerializer.get())
sparkEnv.blockManager.master.removeBlock(blockId)
(deserializedResult, size)
}
// Set the task result size in the accumulator updates received from the executors.
// We need to do this here on the driver because if we did this on the executors then
// we would have to serialize the result again after updating the size.
result.accumUpdates = result.accumUpdates.map { a =>
if (a.name == Some(InternalAccumulator.RESULT_SIZE)) {
val acc = a.asInstanceOf[LongAccumulator]
assert(acc.sum == 0L, "task result size should not have been set on the executors")
acc.setValue(size.toLong)
acc
} else {
a
}
}
scheduler.handleSuccessfulTask(taskSetManager, tid, result)
} catch {
case cnf: ClassNotFoundException =>
val loader = Thread.currentThread.getContextClassLoader
taskSetManager.abort("ClassNotFound with classloader: " + loader)
// Matching NonFatal so we don't catch the ControlThrowable from the "return" above.
case NonFatal(ex) =>
logError("Exception while getting task result", ex)
taskSetManager.abort("Exception while getting task result: %s".format(ex))
}
}
})
}
enqueueSuccessfulTask 方法中判斷如果結果是 DirectTaskResult 型別,就直接獲取,如果是 IndirectTaskResult 型別,則根據 blockId 遠端呼叫 sparkEnv.blockManager.getRemoteBytes(blockId) 獲取;
接著呼叫 scheduler.handleSuccessfulTask:
def handleSuccessfulTask(
taskSetManager: TaskSetManager,
tid: Long,
taskResult: DirectTaskResult[_]): Unit = synchronized {
taskSetManager.handleSuccessfulTask(tid, taskResult)
}
最終經過呼叫鏈會來到 DAGScheduler # handleTaskCompletion 中,在該方法中,如果 Task 是 ResultTask,判斷作業 是否完成,如果完成,標記完成,並清理作業依賴的資源,傳送訊息給訊息匯流排。
case Success =>
stage.pendingPartitions -= task.partitionId
task match {
case rt: ResultTask[_, _] =>
// Cast to ResultStage here because it's part of the ResultTask
// TODO Refactor this out to a function that accepts a ResultStage
val resultStage = stage.asInstanceOf[ResultStage]
resultStage.activeJob match {
case Some(job) =>
if (!job.finished(rt.outputId)) {
updateAccumulators(event)
job.finished(rt.outputId) = true
job.numFinished += 1
// If the whole job has finished, remove it
if (job.numFinished == job.numPartitions) {
markStageAsFinished(resultStage)
cleanupStateForJobAndIndependentStages(job)
listenerBus.post(
SparkListenerJobEnd(job.jobId, clock.getTimeMillis(), JobSucceeded))
}
// taskSucceeded runs some user code that might throw an exception. Make sure
// we are resilient against that.
try {
job.listener.taskSucceeded(rt.outputId, event.result)
} catch {
case e: Exception =>
// TODO: Perhaps we want to mark the resultStage as failed?
job.listener.jobFailed(new SparkDriverExecutionException(e))
}
}
case None =>
logInfo("Ignoring result from " + rt + " because its job has finished")
}
如果是 ShuffleMapTask,則將結果(MapStatus)序列化後存入 DirectTaskResult 或者 IndirectTaskResult 中,DAGScheduler 的 handleTaskCompletion 獲取這個結果,並註冊到 MapOutputTrackerMaster 中:
case smt: ShuffleMapTask =>
val shuffleStage = stage.asInstanceOf[ShuffleMapStage]
updateAccumulators(event)
val status = event.result.asInstanceOf[MapStatus]
val execId = status.location.executorId
logDebug("ShuffleMapTask finished on " + execId)
if (failedEpoch.contains(execId) && smt.epoch <= failedEpoch(execId)) {
logInfo(s"Ignoring possibly bogus $smt completion from executor $execId")
} else {
shuffleStage.addOutputLoc(smt.partitionId, status)
}
if (runningStages.contains(shuffleStage) && shuffleStage.pendingPartitions.isEmpty) {
markStageAsFinished(shuffleStage)
logInfo("looking for newly runnable stages")
logInfo("running: " + runningStages)
logInfo("waiting: " + waitingStages)
logInfo("failed: " + failedStages)
// We supply true to increment the epoch number here in case this is a
// recomputation of the map outputs. In that case, some nodes may have cached
// locations with holes (from when we detected the error) and will need the
// epoch incremented to refetch them.
// TODO: Only increment the epoch number if this is not the first time
// we registered these map outputs.
mapOutputTracker.registerMapOutputs(
shuffleStage.shuffleDep.shuffleId,
shuffleStage.outputLocInMapOutputTrackerFormat(),
changeEpoch = true)
clearCacheLocs()
if (!shuffleStage.isAvailable) {
// Some tasks had failed; let's resubmit this shuffleStage
// TODO: Lower-level scheduler should also deal with this
logInfo("Resubmitting " + shuffleStage + " (" + shuffleStage.name +
") because some of its tasks had failed: " +
shuffleStage.findMissingPartitions().mkString(", "))
submitStage(shuffleStage)
} else {
// Mark any map-stage jobs waiting on this stage as finished
if (shuffleStage.mapStageJobs.nonEmpty) {
val stats = mapOutputTracker.getStatistics(shuffleStage.shuffleDep)
for (job <- shuffleStage.mapStageJobs) {
markMapStageJobAsFinished(job, stats)
}
}
submitWaitingChildStages(shuffleStage)
}
}
}
3.2、TaskState.FAILED, TaskState.KILLED, TaskState.LOST
如果結果型別 TaskState.FAILED, TaskState.KILLED, TaskState.LOST,則進入 taskResultGetter.enqueueFailedTask(taskSet, tid, state, serializedData):
def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
serializedData: ByteBuffer) {
var reason : TaskFailedReason = UnknownReason
try {
getTaskResultExecutor.execute(new Runnable {
override def run(): Unit = Utils.logUncaughtExceptions {
val loader = Utils.getContextOrSparkClassLoader
try {
if (serializedData != null && serializedData.limit() > 0) {
reason = serializer.get().deserialize[TaskFailedReason](
serializedData, loader)
}
} catch {
case cnd: ClassNotFoundException =>
// Log an error but keep going here -- the task failed, so not catastrophic
// if we can't deserialize the reason.
logError(
"Could not deserialize TaskEndReason: ClassNotFound with classloader " + loader)
case ex: Exception => // No-op
}
scheduler.handleFailedTask(taskSetManager, tid, taskState, reason)
}
})
} catch {
case e: RejectedExecutionException if sparkEnv.isStopped =>
// ignore it
}
}
然後再呼叫 scheduler.handleFailedTask 重新分配資源重試:
def handleFailedTask(
taskSetManager: TaskSetManager,
tid: Long,
taskState: TaskState,
reason: TaskFailedReason): Unit = synchronized {
taskSetManager.handleFailedTask(tid, taskState, reason)
if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
// Need to revive offers again now that the task set manager state has been updated to
// reflect failed tasks that need to be re-run.
backend.reviveOffers()
}
}
相關文章
- Spark的執行原理Spark
- spark執行原理、模型Spark模型
- Spark開發-spark執行原理和RDDSpark
- Spark原理-物理執行圖Spark
- Spark開發-Spark執行模式及原理一Spark模式
- Spark Task 的執行流程④ - task 結果的處理Spark
- 獲取任務的執行結果
- spark基礎之spark sql執行原理和架構SparkSQL架構
- python執行shell並獲取結果Python
- Spark資料收藏--------Spark執行架構Spark架構
- 獲取多臺主機命令執行結果
- Java獲取多執行緒執行結果方式的歸納與總結Java執行緒
- spark學習筆記--叢集執行SparkSpark筆記
- Spark修煉之道(進階篇)——Spark入門到精通:第七節 Spark執行原理Spark
- Spark學習(一)——執行模式與執行流程Spark模式
- easyexcel多sheet多執行緒匯入示例,獲取所以執行緒執行結果後返回Excel執行緒
- 執行結果
- 多執行緒的補充 獲取一定時間的執行結果執行緒
- 《深入理解Spark》之Spark的整體執行流程Spark
- 檢視spark程式執行狀態以及安裝sparkSpark
- 自適應查詢執行:在執行時提升Spark SQL執行效能SparkSQL
- Spark叢集和任務執行Spark
- spark job執行引數優化Spark優化
- Spark2 jar包執行完成,退出spark,釋放資源SparkJAR
- php中對MYSQL操作之批量執行,與獲取批量結果PHPMySql
- 【java】【多執行緒】獲取和設定執行緒名字、獲取執行緒物件(3)Java執行緒物件
- 多執行緒並行執行,然後彙總結果執行緒並行
- 執行計劃-1:獲取執行計劃
- Spark Task 的執行流程② - 建立、分發 TaskSpark
- Spark閉包 | driver & executor程式程式碼執行Spark
- spark streaming執行kafka資料來源SparkKafka
- Spark開發-執行架構基本概念Spark架構
- Ubuntu 16.04 + PyCharm + spark 執行環境配置UbuntuPyCharmSpark
- 更高階的技術可用於獲取使用QThreadPool和QRunnable啟動執行緒的執行結果QTthread執行緒
- 除了訊息佇列,以下這些高階技術也可用於獲取執行緒執行結果佇列執行緒
- Spark入門(二)--如何用Idea執行我們的Spark專案SparkIdea
- springboot:使用非同步註解@Async獲取執行結果的坑Spring Boot非同步
- Spark報錯(二):關於Spark-Streaming官方示例wordcount執行異常Spark