Spark Stage
A stage is a set of parallel tasks ① all computing the same function that need to run as part of a Spark job, where all the tasks have the same shuffle dependencies. Each DAG of tasks run by the scheduler is split up into stages at the boundaries where shuffle occurs, and then the DAGScheduler runs these stages in topological order.
Each Stage can ② either be a shuffle map stage, in which case its tasks' results are input for other stage(s), or a result stage, in which case its tasks directly compute a Spark action (e.g. count(), save(), etc.) by running a function on an RDD. ③ For shuffle map stages, we also track the nodes that each output partition is on.
Each Stage also has a firstJobId, identifying the job that first submitted the stage. When FIFO scheduling is used, this ④ allows Stages from earlier jobs to be computed first or recovered faster on failure. Finally, a single stage can be re-executed in multiple attempts due to fault recovery. In that case, the Stage object will track multiple StageInfo objects to pass to listeners or the web UI. The latest one will be accessible through latestInfo.
[DAGScheduler]->private[scheduler] def handleJobSubmitted
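{
  var finalStage: ResultStage = null
  try {
    // ② There are only two kinds of stage: a shuffle map stage and a result stage.
    // The result stage is always the stage that contains the RDD on which the
    // action was called.
    // Parameter: func is the operation applied to each partition, and it depends
    // on the action. For count, for example, func computes the size of each
    // partition; the per-partition results are collected by the JobWaiter
    // (covered in the submitJob method) and added up to give the returned value.
    finalStage = newResultStage(finalRDD, func, partitions, jobId, callSite)
  } catch {
    case e: Exception =>
      logWarning("Creating new stage failed due to exception - job: " + jobId, e)
      listener.jobFailed(e)
      return
  }
  . . .
  // [1] The first stage submitted is always finalStage, i.e. the ResultStage.
  // The scheduler then recursively looks up that stage's dependencies until it
  // reaches a stage with no missing dependencies; only then is a TaskSet
  // generated and submitted.
  submitStage(finalStage)
  // [2] During that recursive search, a stage that still has missing
  // dependencies is put into the waiting queue so it can be scheduled later.
  submitWaitingStages()
}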
[DAGScheduler]->private def submitStage(stage: Stage)
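{
  ...
  // [1] Find the parent stages whose output is not yet available.
  val missing = getMissingParentStages(stage).sortBy(_.id)
  logDebug("missing: " + missing)
  if (missing.isEmpty) {
    logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
    // [1] No missing parents: the tasks of this stage can be submitted now.
    submitMissingTasks(stage, jobId.get)
  } else {
    for (parent <- missing) {
      submitStage(parent)
    }
    // [2] Parents are still missing: park this stage in the waiting queue.
    waitingStages += stage
  }
  ...
}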
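The recursion in [1] and [2] can be easier to follow on a toy model. The sketch below is not Spark code: ToyStage, ToyScheduler and the submitted buffer are hypothetical names introduced only for illustration, and the guard at the top of submitStage is an assumption standing in for the elided part of the real method.

```scala
import scala.collection.mutable

object SubmitStageSketch {
  // Hypothetical stand-in for a stage: an id plus the parent stages
  // whose output is not available yet.
  final case class ToyStage(id: Int, missingParents: List[ToyStage] = Nil)

  final class ToyScheduler {
    // Stages parked until their parents have finished ([2] in the notes).
    val waitingStages = mutable.LinkedHashSet.empty[ToyStage]
    // Ids of stages whose tasks were handed off; stands in for submitMissingTasks.
    val submitted = mutable.ArrayBuffer.empty[Int]

    def submitStage(stage: ToyStage): Unit = {
      // Assumed guard: do not handle a stage that is already waiting or submitted.
      if (!waitingStages(stage) && !submitted.contains(stage.id)) {
        val missing = stage.missingParents.sortBy(_.id)
        if (missing.isEmpty) {
          submitted += stage.id          // [1] no missing parents: run it now
        } else {
          missing.foreach(submitStage)   // recurse towards the roots first
          waitingStages += stage         // [2] wait for the parents
        }
      }
    }
  }
}
```

With a chain of three stages (0 feeds 1, 1 feeds 2), submitting stage 2 hands off only stage 0 and leaves stages 1 and 2 in the waiting set, which matches the behaviour described in [1] and [2].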
[DAGScheduler]->getMissingParentStages(stage: Stage): List[Stage]
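Stages are split according to whether each dependency is a shuffle dependency (wide) or a narrow dependency.
{
  . . .
  for (dep <- rdd.dependencies) {
    dep match {
      case shufDep: ShuffleDependency[_, _, _] =>
        // A shuffle dependency marks a stage boundary: the parent is a separate map stage.
        val mapStage = getShuffleMapStage(shufDep, stage.firstJobId)
        if (!mapStage.isAvailable) {
          missing += mapStage
        }
      case narrowDep: NarrowDependency[_] =>
        // [2] A narrow dependency stays in the current stage: keep walking up the lineage.
        waitingForVisit.push(narrowDep.rdd)
    }
  }
  . . .
}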
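To see the splitting rule in isolation, here is a minimal sketch on a toy lineage graph. Node, Narrow, Shuffle and parentStageRoots are hypothetical names, not Spark classes, and the sketch leaves out the isAvailable check, so it returns every parent stage rather than only the missing ones.

```scala
import scala.collection.mutable

object StageSplitSketch {
  // Hypothetical toy lineage; these are NOT Spark's RDD or Dependency classes.
  sealed trait Dep { def parent: Node }
  final case class Narrow(parent: Node) extends Dep
  final case class Shuffle(parent: Node) extends Dep
  final case class Node(name: String, deps: List[Dep] = Nil)

  // Walk backwards from `root` through narrow dependencies only. Every shuffle
  // dependency reached marks the last RDD of a parent stage.
  def parentStageRoots(root: Node): List[Node] = {
    val parents = mutable.LinkedHashSet.empty[Node]
    val visited = mutable.HashSet.empty[Node]
    var waitingForVisit: List[Node] = List(root)   // a list used as a stack
    while (waitingForVisit.nonEmpty) {
      val node = waitingForVisit.head
      waitingForVisit = waitingForVisit.tail
      if (visited.add(node)) {
        node.deps.foreach {
          case Shuffle(p) => parents += p                            // stage boundary
          case Narrow(p)  => waitingForVisit = p :: waitingForVisit  // same stage
        }
      }
    }
    parents.toList
  }

  // Example lineage: textFile -> map -> (shuffle) -> reduceByKey -> filter
  val textFile = Node("textFile")
  val mapped   = Node("map", List(Narrow(textFile)))
  val reduced  = Node("reduceByKey", List(Shuffle(mapped)))
  val filtered = Node("filter", List(Narrow(reduced)))
}
```

Calling parentStageRoots(filtered) walks through filter and reduceByKey, which belong to the same stage, and stops at the shuffle, so the only parent stage found is the one that ends at the map node.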
ShuffleMapStages are intermediate stages in the execution DAG that produce data for a shuffle. ⑤ They occur right before each shuffle operation, and might contain multiple pipelined operations before that (e.g. map and filter). When executed, ⑥ they save map output files that can later be fetched by reduce tasks. ⑦ The shuffleDep field describes the shuffle each stage is part of, and ⑧ variables like outputLocs track how many map outputs are ready. ShuffleMapStages can also be submitted independently as jobs with DAGScheduler.submitMapStage. For such stages, the ActiveJobs that submitted them are tracked in mapStageJobs. ⑨ Note that there can be multiple ActiveJobs trying to compute the same shuffle map stage.
⑤ When stages are split, a shuffle map stage contains all the non-shuffle operations that follow the previous shuffle, such as map and filter.
⑥ Output information is maintained for every partition.
/**
 * List of [[MapStatus]] for each partition. The index of the array is the map partition id,
 * and each value in the array is the list of possible [[MapStatus]] for a partition
 * (a single task might run multiple times).
 *
 * ③⑧ Location and status information for the current RDD: which executor each partition
 * runs on and produces its output on. The DAGScheduler uses this information when
 * scheduling tasks.
 */
private[this] val outputLocs = Array.fill[List[MapStatus]](numPartitions)(Nil)
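The bookkeeping around outputLocs can be sketched as follows. This is a simplified model under stated assumptions, not the Spark class: ToyMapStatus and ToyShuffleMapStage are hypothetical, and a map status is reduced to the id of the executor that holds the output.

```scala
object OutputLocsSketch {
  // Hypothetical stand-in for MapStatus: only the executor holding the output.
  final case class ToyMapStatus(executorId: String)

  final class ToyShuffleMapStage(val numPartitions: Int) {
    // Index = map partition id, value = all known outputs for that partition.
    private[this] val outputLocs =
      Array.fill[List[ToyMapStatus]](numPartitions)(Nil)
    private[this] var available = 0

    def numAvailableOutputs: Int = available

    // The stage is usable by its children once every partition has an output.
    def isAvailable: Boolean = available == numPartitions

    def addOutputLoc(partition: Int, status: ToyMapStatus): Unit = {
      val prev = outputLocs(partition)
      outputLocs(partition) = status :: prev
      if (prev.isEmpty) available += 1   // first output for this partition
    }

    // Drop every output that lived on a lost executor.
    def removeOutputsOnExecutor(executorId: String): Unit =
      for (p <- 0 until numPartitions) {
        val prev = outputLocs(p)
        val kept = prev.filterNot(_.executorId == executorId)
        outputLocs(p) = kept
        if (prev.nonEmpty && kept.isEmpty) available -= 1
      }

    // Partitions that must be (re)computed before the stage is complete.
    def findMissingPartitions(): Seq[Int] =
      (0 until numPartitions).filter(p => outputLocs(p).isEmpty)
  }
}
```

In this model, losing an executor makes its partitions missing again, which is the situation in which a stage is re-executed in a new attempt and only the missing partitions are recomputed.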
[DAGScheduler]->submitMissingTasks(stage: Stage, jobId: Int)
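...
// Only the partitions whose output is missing need to be computed.
val partitionsToCompute: Seq[Int] = stage.findMissingPartitions()
...
// Map each task id to the preferred locations of the partition it computes.
val taskIdToLocations: Map[Int, Seq[TaskLocation]] = try {
  stage match {
    case s: ShuffleMapStage =>
      partitionsToCompute.map { id => (id, getPreferredLocs(stage.rdd, id)) }.toMap
    case s: ResultStage =>
      val job = s.activeJob.get
      partitionsToCompute.map { id =>
        val p = s.partitions(id)
        (id, getPreferredLocs(stage.rdd, p))
      }.toMap
  }
} ...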
class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag](
    @transient private val _rdd: RDD[_ <: Product2[K, V]],
    val partitioner: Partitioner,
    val serializer: Serializer = SparkEnv.get.serializer,
    val keyOrdering: Option[Ordering[K]] = None,
    val aggregator: Option[Aggregator[K, V, C]] = None,
    val mapSideCombine: Boolean = false) {
  ...
}
