Spark Source Code Analysis: Yarn Deployment Flow (ApplicationMaster)

Published by 知了小巷 on 2020-10-13

Previous article: Spark Source Code Analysis: Yarn Deployment Flow (SparkSubmit)

createContainerLaunchContext
This is used to launch the ApplicationMaster.
The main call site is:

yarnClient.submitApplication(appContext)

RM: ResourceManager
At this point, the RM will have accepted the application and in the background, will go through the process of allocating a container with the required specifications and then eventually setting up and launching the AM on the allocated container.

RM server-side implementation:
resourcemanager#ClientRMService.java#submitApplication

// RMAppManager.java#submitApplication
rmAppManager.submitApplication(submissionContext, System.currentTimeMillis(), user);

RMAppManager.java#submitApplication

// 1. In the RMAppManager.java#submitApplication method
RMAppImpl application = createAndPopulateNewRMApp(
        submissionContext, submitTime, user, false, -1, null);

// 2. In the createAndPopulateNewRMApp method
// Create RMApp
RMAppImpl application =
    new RMAppImpl(applicationId, rmContext, this.conf,
        submissionContext.getApplicationName(), user,
        submissionContext.getQueue(),
        submissionContext, this.scheduler, this.masterService,
        submitTime, submissionContext.getApplicationType(),
        submissionContext.getApplicationTags(), amReqs, placementContext,
        startTime); 

if (UserGroupInformation.isSecurityEnabled()) {
	// ...
} else {
	// Send the START event to RMAppImpl
	this.rmContext.getDispatcher().getEventHandler()
	    .handle(new RMAppEvent(applicationId, RMAppEventType.START));
}        

// 3. In the RMAppImpl constructor (RMAppImpl.java)
this.stateMachine = stateMachineFactory.make(this);       

// Later this leads to RMAppImpl's handle method being invoked
this.stateMachine.doTransition(event.getType(), event); 

This drives the state-transition process:
RMAppImpl.java#StateMachineFactory

// On receiving the START event, RMAppNewlySavingTransition#doTransition is invoked and the RMAppImpl state moves from NEW to NEW_SAVING
.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
        RMAppEventType.START, new RMAppNewlySavingTransition())
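
To make the table-driven mechanism concrete, here is a minimal, self-contained Scala sketch of the idea (this is not YARN's actual StateMachineFactory API; the state and event names are borrowed from the snippet above): transitions are registered as (fromState, event) -> (toState, action), and doTransition looks up the entry, fires the action, and advances the state.

```scala
// Minimal sketch of a table-driven state machine in the style of RMAppImpl.
object StateMachineSketch {
  sealed trait State
  case object NEW extends State
  case object NEW_SAVING extends State

  sealed trait Event
  case object START extends Event

  // Transition table: (current state, event) -> (next state, action to run)
  private var table = Map.empty[(State, Event), (State, () => Unit)]
  private var current: State = NEW

  def addTransition(from: State, to: State, on: Event)(action: () => Unit): Unit =
    table = table.updated((from, on), (to, action))

  def doTransition(event: Event): Unit =
    table.get((current, event)) match {
      case Some((next, action)) =>
        action()       // run the registered transition, e.g. RMAppNewlySavingTransition
        current = next // then advance the state
      case None =>
        throw new IllegalStateException(s"Invalid event $event in state $current")
    }

  def main(args: Array[String]): Unit = {
    // Mirrors: .addTransition(RMAppState.NEW, RMAppState.NEW_SAVING, RMAppEventType.START, ...)
    addTransition(NEW, NEW_SAVING, START)(() => println("RMAppNewlySavingTransition fired"))
    doTransition(START)
    println(s"state is now $current") // state is now NEW_SAVING
  }
}
```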

A series of further event-handling steps follows (omitted); the key state transition is to SCHEDULED:
RMAppAttemptImpl#ScheduleTransition
Here the scheduler allocates resources via allocate.

// AM resource has been checked when submission
Allocation amContainerAllocation =
    appAttempt.scheduler.allocate(...);

The CapacityScheduler scheduling process:
CapacityScheduler.java#allocate

// The CapacityScheduler.java#allocate method
@Override
@Lock(Lock.NoLock.class)
public Allocation allocate(ApplicationAttemptId applicationAttemptId,
  List<ResourceRequest> ask, List<SchedulingRequest> schedulingRequests,
  List<ContainerId> release, List<String> blacklistAdditions,
  List<String> blacklistRemovals, ContainerUpdates updateRequests) {

  	// ...
  	// Update application requests
  	// The key call here is updateResourceRequests
    if (application.updateResourceRequests(ask) || application
        .updateSchedulingRequests(schedulingRequests)) {
      updateDemandForQueue = (LeafQueue) application.getQueue();
    }
    // ...
}      

// This later calls into the AppSchedulingInfo.java#updateResourceRequests method
// The ApplicationMaster is updating resource requirements for the
// application, by asking for more resources and releasing resources acquired
// by the application.

// Next is the LocalityAppPlacementAllocator.java#updatePendingAsk method
// Update resource requests
for (ResourceRequest request : requests) {
	// Update asks: put the request into the map to await a NodeManager heartbeat
    resourceRequestMap.put(resourceName, request);
}
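
A hedged sketch of this "record the ask, satisfy it on the next heartbeat" pattern (the names are simplified stand-ins, not the real scheduler code): allocate() only records demand in a map, and a later NODE_UPDATE heartbeat pulls pending requests off that map.

```scala
import scala.collection.mutable

object PendingAskSketch {
  case class ResourceRequest(resourceName: String, containers: Int)

  // Asks are parked here until a NodeManager heartbeat arrives
  private val resourceRequestMap = mutable.Map.empty[String, ResourceRequest]

  // Called from the scheduler's allocate(): just record the demand
  def updateResourceRequests(requests: Seq[ResourceRequest]): Unit =
    requests.foreach(r => resourceRequestMap.put(r.resourceName, r))

  // Called on a NodeManager heartbeat (NODE_UPDATE): try to satisfy pending asks
  def nodeUpdate(nodeId: String): Unit =
    resourceRequestMap.remove("*") // ANY request first, then node-local
      .orElse(resourceRequestMap.remove(nodeId))
      .foreach(req => println(s"assigning ${req.containers} container(s) on $nodeId"))

  def main(args: Array[String]): Unit = {
    updateResourceRequests(Seq(ResourceRequest("*", 1)))
    nodeUpdate("nm-01") // heartbeat arrives, container assigned
  }
}
```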

After a NodeManager heartbeat arrives, CapacityScheduler's handle method:

@Override
public void handle(SchedulerEvent event) {
	// ...
	case NODE_UPDATE:
    {
      NodeUpdateSchedulerEvent nodeUpdatedEvent = (NodeUpdateSchedulerEvent)event;
      nodeUpdate(nodeUpdatedEvent.getRMNode());
    }
    break;
    // ...
}

// In the nodeUpdate method, allocateContainersToNode(rmNode.getNodeID(), true) is called

// Inside allocateContainersToNode, this calls allocateContainersToNode(candidates, withNodeHeartbeat)

// ... then allocateContainerOnSingleNode(candidates, node, withNodeHeartbeat)

// ... then submitResourceCommitRequest(getClusterResource(), assignment)

// In the submitResourceCommitRequest method, the commit is done asynchronously or synchronously
tryCommit(cluster, request, true);

// In the tryCommit method
app.apply(cluster, request, updatePending)

// In the apply method, an RMContainerEvent with RMContainerEventType.START is sent to RMContainerImpl
rmContainer.handle(
              new RMContainerEvent(containerId, RMContainerEventType.START));

// Yet another long chain of event handling follows... exhausting...
// ApplicationMasterLauncher.java#handle
case LAUNCH:
      launch(application);
      break;     
// AMLauncher#launch, the method that starts the AM: it talks to the corresponding NodeManager to launch the AM
public class AMLauncher implements Runnable {...}      

// The ContainerManagerImpl.java#startContainers method launches the container
// Initialize the AMRMProxy service instance only if the container is of
// type AM and if the AMRMProxy service is enabled
if (amrmProxyEnabled && containerTokenIdentifier.getContainerType()
    .equals(ContainerType.APPLICATION_MASTER)) {
  this.getAMRMProxyService().processApplicationStartRequest(request);
}
performContainerPreStartChecks(nmTokenIdentifier, request,
    containerTokenIdentifier);
// startContainerInternal later actually executes the command that starts the
// ApplicationMaster; the container launch process is covered at the end of this article
startContainerInternal(containerTokenIdentifier, request);
succeededContainers.add(containerId);

main class:
org.apache.spark.deploy.yarn.ApplicationMaster

2. Yarn Deployment Flow (ApplicationMaster)

2.1 Running the ApplicationMaster

Source location:
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala

def main(args: Array[String]): Unit = {
	SignalUtils.registerLogger(log)
	// 1. Wrap the various arguments
	val amArgs = new ApplicationMasterArguments(args)
	// ...
	// 2. Create an ApplicationMaster
	master = new ApplicationMaster(amArgs, sparkConf, yarnConf)

	val ugi = ...

	ugi.doAs(new PrivilegedExceptionAction[Unit]() {
	  // 3. Call master.run(), i.e. actually run the ApplicationMaster logic
	  override def run(): Unit = System.exit(master.run())
	})
}
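
The doAs wrapper simply runs the AM logic under the resolved user's credentials. A minimal sketch of the pattern, assuming hadoop-common is on the classpath (the real code builds the UGI from the submitting user and delegation tokens):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

object DoAsSketch {
  def main(args: Array[String]): Unit = {
    val ugi = UserGroupInformation.getCurrentUser
    val exitCode = ugi.doAs(new PrivilegedExceptionAction[Int]() {
      // Everything inside run() executes with ugi's credentials
      override def run(): Int = 0 // stand-in for master.run()
    })
    System.exit(exitCode)
  }
}
```
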
2.1.1 Wrapping arguments: ApplicationMasterArguments

new ApplicationMasterArguments(args)
Source location:
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterArguments.scala
The parseArgs method is self-explanatory (e.g. --jar and --class):

private def parseArgs(inputArgs: List[String]): Unit = {
	// ...
    while (!args.isEmpty) {
      // ...
      args match {
      	// --jar
        case ("--jar") :: value :: tail =>
          userJar = value
          args = tail

        // --class
        case ("--class") :: value :: tail =>
          userClass = value
          args = tail

        // ...

        case _ =>
          printUsageAndExit(1, args)
      }
    }

    // ...

    userArgs = userArgsBuffer.toList
}
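
For illustration, here is a standalone, runnable sketch of the same List pattern-matching style (the real code mutates fields on ApplicationMasterArguments instead of local vars):

```scala
object ParseArgsSketch {
  def main(argv: Array[String]): Unit = {
    var userJar: String = null
    var userClass: String = null
    var args = argv.toList
    while (args.nonEmpty) {
      args match {
        // --jar: consume the flag and its value, continue with the tail
        case "--jar" :: value :: tail =>
          userJar = value
          args = tail
        // --class
        case "--class" :: value :: tail =>
          userClass = value
          args = tail
        case unknown =>
          sys.error(s"Unknown or incomplete arguments: $unknown")
      }
    }
    // e.g. run with: --jar app.jar --class com.example.Main
    println(s"userJar=$userJar, userClass=$userClass")
  }
}
```
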
2.1.2 Executing master.run

master.run is a final method.
In cluster mode it runs the user's Driver program.

final def run(): Int = {
    try {
      val attemptID = ...

      new CallerContext(
        "APPMASTER", sparkConf.get(APP_CALLER_CONTEXT),
        Option(appAttemptId.getApplicationId.toString), attemptID).setCurrentContext()

      logInfo("ApplicationAttemptId: " + appAttemptId)

      // This shutdown hook should run *after* the SparkContext is shut down.
      // ...

      if (isClusterMode) {
      	// Run the Driver-side program
        runDriver()
      } else {
        runExecutorLauncher()
      }
    } catch {
      // ...
    } finally {
      // ...
    }

    exitCode
}

runDriver()

private def runDriver(): Unit = {
    addAmIpFilter(None, System.getenv(ApplicationConstants.APPLICATION_WEB_PROXY_BASE_ENV))
    // 1. Start the user application (the main class in the uploaded jar)
    // This starts the Driver thread (already running by the time this returns)
    userClassThread = startUserApplication()

    // ... Spark context initialization
    logInfo("Waiting for spark context initialization...")
    val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
    try {
      val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
        Duration(totalWaitTime, TimeUnit.MILLISECONDS))
      if (sc != null) {
      	// Set up the Driver's RPC environment
        val rpcEnv = sc.env.rpcEnv

        val userConf = sc.getConf
        // Host of the node where the ApplicationMaster is running
        val host = userConf.get(DRIVER_HOST_ADDRESS)
        // RPC port the ApplicationMaster exposes for this launch
        val port = userConf.get(DRIVER_PORT)
        // 2. Register the ApplicationMaster (host + port) with the ResourceManager
        // sc.ui.map(_.webUrl)
        // The ApplicationMaster exposes a trackable web URL through which users can check the application's status

        registerAM(host, port, userConf, sc.ui.map(_.webUrl), appAttemptId)

        // RPC Endpoint
        val driverRef = rpcEnv.setupEndpointRef(
          RpcAddress(host, port),
          YarnSchedulerBackend.ENDPOINT_NAME)

       	// 3. Allocate containers for the executors managed by the ApplicationMaster
        createAllocator(driverRef, userConf, rpcEnv, appAttemptId, distCacheConf)
      } else {
        // ...
        throw new IllegalStateException("User did not initialize spark context!")
      }
      resumeDriver()
      // The main thread waits on the Driver thread
      userClassThread.join()
    } catch {
      // ...
    } finally {
      resumeDriver()
    }
}

The resumeDriver method:

private def resumeDriver(): Unit = {
	// When initialization in runDriver happened the user class thread has to be resumed.
	sparkContextPromise.synchronized {
	  sparkContextPromise.notify()
	}
}
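
This notify pairs with a wait on the same sparkContextPromise monitor in the user (Driver) thread, which parks right after publishing the SparkContext. Below is a minimal, hedged sketch of that handshake, with a plain String standing in for the SparkContext:

```scala
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Promise}

object HandshakeSketch {
  private val sparkContextPromise = Promise[String]()

  def main(args: Array[String]): Unit = {
    val driver = new Thread(() => {
      val sc = "SparkContext" // stand-in for the real SparkContext
      sparkContextPromise.synchronized {
        sparkContextPromise.success(sc) // "context is initialized"
        sparkContextPromise.wait()      // park until the AM resumes us
      }
      println("driver: resumed, running user code to completion")
    }, "Driver")
    driver.start()

    // AM main thread: wait for the context, then register the AM / create the allocator
    val sc = Await.result(sparkContextPromise.future, Duration.Inf)
    println(s"AM: got $sc, registering AM and creating the allocator...")
    sparkContextPromise.synchronized { sparkContextPromise.notify() } // resumeDriver
    driver.join()
  }
}
```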

Starting the user thread: startUserApplication
The Driver is a thread: a new thread is started to run the main method of the Driver class we wrote ourselves.

/**
 * Start the user class, which contains the spark driver, in a separate Thread.
 * If the main routine exits cleanly or exits with System.exit(N) for any N
 * we assume it was successful, for all other cases we assume failure.
 *
 * Returns the user thread that was started.
 */
private def startUserApplication(): Thread = {
	logInfo("Starting the user application in a separate Thread")

	var userArgs = args.userArgs
	// ...
	// Obtain the user Driver class's main method via reflection; this does not start a new JVM process
	val mainMethod = userClassLoader.loadClass(args.userClass)
	  .getMethod("main", classOf[Array[String]])
	// Create a thread
	val userThread = new Thread {
	  override def run(): Unit = {
	    try {
	      if (!Modifier.isStatic(mainMethod.getModifiers)) {
	        logError(s"Could not find static main method in object ${args.userClass}")
	        finish(FinalApplicationStatus.FAILED, ApplicationMaster.EXIT_EXCEPTION_USER_CLASS)
	      } else {
	        // Invoke the Driver class's main method
	        mainMethod.invoke(null, userArgs.toArray)
	        finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
	        logDebug("Done running user class")
	      }
	    } catch {
	      // ...
	    } finally {
	      // ...
	      sparkContextPromise.trySuccess(null)
	    }
	  }
	}
	userThread.setContextClassLoader(userClassLoader)
	// Thread name: Driver
	userThread.setName("Driver")
	// Start the thread immediately
	userThread.start()
	// Return the running thread
	userThread
}
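
A self-contained sketch of this reflective launch (DemoApp is a made-up stand-in for the user's --class; note that everything stays inside one JVM):

```scala
import java.lang.reflect.Modifier

object DemoApp { // stand-in for the user's Driver class (--class)
  def main(args: Array[String]): Unit =
    println("user main running, args=" + args.mkString(","))
}

object ReflectiveMainSketch {
  def main(args: Array[String]): Unit = {
    val userArgs = List("--input", "/tmp/data")
    val userThread = new Thread(() => {
      // Load the class by name and locate its static main(Array[String])
      val mainMethod = Class.forName("DemoApp")
        .getMethod("main", classOf[Array[String]])
      if (!Modifier.isStatic(mainMethod.getModifiers))
        sys.error("Could not find static main method in DemoApp")
      else
        mainMethod.invoke(null, userArgs.toArray) // same JVM, new thread only
    }, "Driver")
    userThread.start()
    userThread.join() // like the AM main thread's userClassThread.join()
  }
}
```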

registerAM(host, port, userConf, sc.ui.map(_.webUrl))

private def registerAM(
      host: String,
      port: Int,
      _sparkConf: SparkConf,
      uiAddress: Option[String],
      appAttempt: ApplicationAttemptId): Unit = {
    // ...

    // The client here is: private val client = new YarnRMClient()
    // spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala
    client.register(host, port, yarnConf, _sparkConf, uiAddress, historyAddress)
    registered = true
}

YarnRMClient#register
The doc comment and parameter descriptions are quite clear.

/**
 * Registers the application master with the RM.
 *
 * @param driverHost Host name where driver is running.
 * @param driverPort Port where driver is listening.
 * @param conf The Yarn configuration.
 * @param sparkConf The Spark configuration.
 * @param uiAddress Address of the SparkUI.
 * @param uiHistoryAddress Address of the application on the History Server.
 */
def register(
	  driverHost: String,
	  driverPort: Int,
	  conf: YarnConfiguration,
	  sparkConf: SparkConf,
	  uiAddress: Option[String],
	  uiHistoryAddress: String): Unit = {
	amClient = AMRMClient.createAMRMClient()
	amClient.init(conf)
	amClient.start()
	this.uiHistoryAddress = uiHistoryAddress

	val trackingUrl = uiAddress.getOrElse {
	  if (sparkConf.get(ALLOW_HISTORY_SERVER_TRACKING_URL)) uiHistoryAddress else ""
	}

	logInfo("Registering the ApplicationMaster")
	synchronized {
	  // Register the ApplicationMaster with the ResourceManager
	  // private var amClient: AMRMClient[ContainerRequest] = _
	  // AMRMClientImpl
	  amClient.registerApplicationMaster(driverHost, driverPort, trackingUrl)
	  registered = true
	}
}

That completes AM registration.

Next, containers are allocated for the executors managed by the ApplicationMaster:
createAllocator(driverRef, userConf)

private def createAllocator(
      driverRef: RpcEndpointRef,
      _sparkConf: SparkConf,
      rpcEnv: RpcEnv,
      appAttemptId: ApplicationAttemptId,
      distCacheConf: SparkConf): Unit = {
    // Acquire resources
    // private val client = new YarnRMClient()
    allocator = client.createAllocator(
      yarnConf,
      _sparkConf,
      appAttemptId,
      driverUrl,
      driverRef,
      securityMgr,
      localResources)

    // Allocate resources
    allocator.allocateResources()
    // ...
}

allocateResources
The number of containers is at most equal to maxExecutors.
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala

/**
 * Request resources such that, if YARN gives us all we ask for, we'll have a number of containers
 * equal to maxExecutors.
 *
 * Deal with any containers YARN has granted to us by possibly launching executors in them.
 *
 * This must be synchronized because variables read in this method are mutated by other methods.
 */
def allocateResources(): Unit = synchronized {
    updateResourceRequests()

    val progressIndicator = 0.1f
    // Poll the ResourceManager. This doubles as a heartbeat if there are no pending container requests.
    // The ApplicationMaster requests resources from the ResourceManager via RPC (resources are granted in units of containers)
    val allocateResponse = amClient.allocate(progressIndicator)
    // Get the containers (resources) that were allocated
    val allocatedContainers = allocateResponse.getAllocatedContainers()
    allocatorBlacklistTracker.setNumClusterNodes(allocateResponse.getNumClusterNodes)
    // If any resources were allocated
    if (allocatedContainers.size > 0) {
      logDebug(("Allocated containers: %d. Current executor count: %d. " +
        "Launching executor count: %d. Cluster resources: %s.")
        .format(
          allocatedContainers.size,
          getNumExecutorsRunning,
          getNumExecutorsStarting,
          allocateResponse.getAvailableResources))
      // Process the allocated containers: launch executors in a loop; they then wait for tasks to be assigned
      handleAllocatedContainers(allocatedContainers.asScala)
    }

    // ...
}

handleAllocatedContainers
Run the granted containers.

/**
 * Handle containers granted by the RM by launching executors on them.
 *
 * Due to the way the YARN allocation protocol works, certain healthy race conditions can result
 * in YARN granting containers that we no longer need. In this case, we release them.
 *
 * Visible for testing.
 */
def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
    val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)

    // ...
    // Run the allocated containers; this is where executors are actually launched "remotely"
    runAllocatedContainers(containersToUse)

    logInfo("Received %d containers from YARN, launching executors on %d of them."
      .format(allocatedContainers.size, containersToUse.size))
}

runAllocatedContainers
Launch executors in the allocated containers.

/**
 * Launches executors in the allocated containers.
 */
private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = synchronized {
	// Outer for loop over the containers
    for (container <- containersToUse) {
      // ...
      if (rpRunningExecs < getOrUpdateTargetNumExecutorsForRPId(rpId)) {
        getOrUpdateNumExecutorsStartingForRPId(rpId).incrementAndGet()
        if (launchContainers) {
          // private val launcherPool = ThreadUtils.newDaemonCachedThreadPool("ContainerLauncher", sparkConf.get(CONTAINER_LAUNCH_MAX_THREADS))
          // Thread pool: newDaemonCachedThreadPool
          launcherPool.execute(() => {
            try {
              new ExecutorRunnable(
                Some(container),
                conf,
                sparkConf,
                driverUrl,
                executorId,
                executorHostname,
                containerMem,
                containerCores,
                appAttemptId.getApplicationId.toString,
                securityMgr,
                localResources,
                rp.id
              ).run()
              updateInternalState()
            } catch {
              // ...
            }
          })
        } else {
          // For test only
          updateInternalState()
        }
      } else {
        // ...
      }
    }
}

new ExecutorRunnable(xxx).run
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala

private[yarn] class ExecutorRunnable(
    container: Option[Container],
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    masterAddress: String,
    executorId: String,
    hostname: String,
    executorMemory: Int,
    executorCores: Int,
    appId: String,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource],
    resourceProfileId: Int) extends Logging {

  var rpc: YarnRPC = YarnRPC.create(conf)
  // YARN NodeManager client
  var nmClient: NMClient = _

  def run(): Unit = {
    logDebug("Starting Executor Container")
    nmClient = NMClient.createNMClient()
    nmClient.init(conf)
    nmClient.start()
    // Have the NodeManager start the container
    startContainer()
  }
  // ...
}  

startContainer

def startContainer(): java.util.Map[String, ByteBuffer] = {
    // 1. Prepare the launch command
    val commands = prepareCommand()

    // 2. Send the start request to the ContainerManager
    try {
      nmClient.startContainer(container.get, ctx)
    } catch {
      // ...
    }
}

Prepare the command script that runs the executor.
prepareCommand
This is what produces the (Yarn)CoarseGrainedExecutorBackend process you see in jps.

private def prepareCommand(): List[String] = {
    // Extra options for the JVM
    // The final command-line script
    val commands = prefixEnv ++
      Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
      javaOpts ++
      Seq("org.apache.spark.executor.YarnCoarseGrainedExecutorBackend",
        "--driver-url", masterAddress,
        "--executor-id", executorId,
        "--hostname", hostname,
        "--cores", executorCores.toString,
        "--app-id", appId,
        "--resourceProfileId", resourceProfileId.toString) ++
      userClassPath ++
      Seq(
        s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
        s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")

    // TODO: it would be nicer to just make sure there are no null commands here
    commands.map(s => if (s == null) "null" else s).toList
}
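
Assembled by hand, the final command looks roughly like the following (illustrative values only; `{{JAVA_HOME}}` and `<LOG_DIR>` are placeholders expanded by YARN): `{{JAVA_HOME}}/bin/java -server <javaOpts> org.apache.spark.executor.YarnCoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@<driver-host>:<driver-port> --executor-id 1 --hostname <nm-host> --cores 2 --app-id application_1600000000000_0001 --resourceProfileId 0 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr`.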

Actually running the container (AM (Driver) or Executor)

The container is actually executed at the following location in the Hadoop code base:
org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
The relevant parameters, such as the ContainerLaunchContext object, are obtained from the StartContainerRequest.
First, in the ContainerManagerImpl constructor we can see:

// ContainerManager level dispatcher.
dispatcher = new AsyncDispatcher();

containersLauncher = createContainersLauncher(context, exec);

dispatcher.register(ContainersLauncherEventType.class, containersLauncher);
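
The AsyncDispatcher pattern itself is simple: a handler registry keyed by event type, an event queue, and a single dispatch thread. A reduced Scala sketch of the shape (YARN's real AsyncDispatcher in hadoop-yarn-common adds service lifecycle and error handling):

```scala
import java.util.concurrent.LinkedBlockingQueue

object AsyncDispatcherSketch {
  trait Event { def eventType: String }
  final case class ContainersLauncherEvent(eventType: String) extends Event

  private val eventQueue = new LinkedBlockingQueue[Event]()
  private var handlers = Map.empty[String, Event => Unit]

  // cf. dispatcher.register(ContainersLauncherEventType.class, containersLauncher)
  def register(eventType: String)(handler: Event => Unit): Unit =
    handlers = handlers.updated(eventType, handler)

  // cf. dispatcher.getEventHandler().handle(event): just enqueue
  def handle(event: Event): Unit = eventQueue.put(event)

  def main(args: Array[String]): Unit = {
    register("LAUNCH_CONTAINER")(e => println(s"ContainersLauncher handles $e"))
    val dispatchThread = new Thread(() => {
      while (true) { // one dispatch thread drains the queue
        val e = eventQueue.take()
        handlers.get(e.eventType).foreach(_.apply(e))
      }
    })
    dispatchThread.setDaemon(true)
    dispatchThread.start()
    handle(ContainersLauncherEvent("LAUNCH_CONTAINER"))
    Thread.sleep(200) // give the daemon thread time to process before exit
  }
}
```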

Then look at the startContainerInternal method:

LOG.info("Start request for " + containerIdStr + " by user " + user);
// 1. Get the ContainerLaunchContext object from the request
ContainerLaunchContext launchContext = request.getContainerLaunchContext();
// Security credentials
Credentials credentials = parseCredentials(launchContext);

// 2. Create the Container
Container container =
    new ContainerImpl(getConfig(), this.dispatcher,
        launchContext, credentials, metrics, containerTokenIdentifier,
        context);

// 3. Initialize the Container
// This drives the container's state transitions
// ContainerImpl.java#StateMachineFactory#.addTransition#ContainerEventType.INIT_CONTAINER, new RequestResourcesTransition()
// Execution reaches ContainerImpl.java#RequestResourcesTransition#transition#container.sendLaunchEvent();
// Finally, ContainersLauncher#handle processes the ContainersLauncherEvent
dispatcher.getEventHandler().handle(
          new ApplicationContainerInitEvent(container))

ContainersLauncher#handle

switch (event.getType()) {
  // Launch the container
  case LAUNCH_CONTAINER:
    Application app =
      context.getApplications().get(
          containerId.getApplicationAttemptId().getApplicationId());

    // public class ContainerLaunch implements Callable<Integer> {...}     
    ContainerLaunch launch =
        new ContainerLaunch(context, getConfig(), dispatcher, exec, app,
          event.getContainer(), dirsHandler, containerManager);
    // launch is a Callable
    // containerLauncher is an Executors.newCachedThreadPool
    containerLauncher.submit(launch);

    running.put(containerId, launch);
    break;

Finally, the ContainerLaunch#call method is executed:

// Write out the environment; this includes the command that launches the actual Executor process
exec.writeLaunchEnv(containerScriptOutStream, environment, localResources,
    launchContext.getCommands());

The writeLaunchEnv method:

public void writeLaunchEnv(OutputStream out, Map<String, String> environment,
    Map<Path, List<String>> resources, List<String> command) throws IOException {
	// Create the shell script builder
	// return Shell.WINDOWS ? new WindowsShellScriptBuilder() : new UnixShellScriptBuilder();
    ContainerLaunch.ShellScriptBuilder sb = ContainerLaunch.ShellScriptBuilder.create();
    if (environment != null) {
      for (Map.Entry<String,String> env : environment.entrySet()) {
        sb.env(env.getKey().toString(), env.getValue().toString());
      }
    }
    if (resources != null) {
      for (Map.Entry<Path,List<String>> entry : resources.entrySet()) {
        for (String linkName : entry.getValue()) {
          sb.symlink(entry.getKey(), new Path(linkName));
        }
      }
    }

    // Emit the actual shell command
    // UnixShellScriptBuilder:
    // line("exec /bin/bash -c \"", StringUtils.join(" ", command), "\"");
    // e.g. $JAVA_HOME/bin/java -server xxxx
    sb.command(command);

    PrintStream pout = null;
    try {
      pout = new PrintStream(out, false, "UTF-8");
      sb.write(pout);
    } finally {
      if (out != null) {
        out.close();
      }
    }
}
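
Putting it together: the generated launch script (launch_container.sh on Linux) consists, roughly, of one export line per environment variable, a symlink per localized resource, and finally the `exec /bin/bash -c "..."` line wrapping the java command built in prepareCommand.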

Appendix: ContainerState

public enum ContainerState {
  NEW, LOCALIZING, LOCALIZATION_FAILED, LOCALIZED, RUNNING, EXITED_WITH_SUCCESS,
  EXITED_WITH_FAILURE, KILLING, CONTAINER_CLEANEDUP_AFTER_KILL,
  CONTAINER_RESOURCES_CLEANINGUP, DONE
}
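
Roughly speaking, the happy path is NEW → LOCALIZING → LOCALIZED → RUNNING → EXITED_WITH_SUCCESS → DONE; the remaining states cover localization failures and the kill/cleanup branches.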

After that, execution moves on to:
org.apache.spark.executor.YarnCoarseGrainedExecutorBackend

Notes:

  1. Obtaining a class via reflection and calling its main method runs the program inside the original process; no new process is created.
  2. Running an executable class via java -server creates a new process; jps -l shows that process's fully-qualified java main class name, while the original parent process may simply finish (see the sketch below).
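
To make these two notes concrete, here is a small self-contained sketch (UserMain is a made-up stand-in): point 1 stays inside one JVM, while point 2 forks a child JVM that jps could list while it runs.

```scala
object UserMain { // runs in-process when invoked reflectively
  def main(args: Array[String]): Unit =
    println("UserMain: running inside the parent JVM")
}

object LaunchStyles {
  def main(args: Array[String]): Unit = {
    // 1. Reflective static-main invocation: same process, no new JVM
    Class.forName("UserMain")
      .getMethod("main", classOf[Array[String]])
      .invoke(null, Array.empty[String])

    // 2. New JVM process (analogous to the NM's generated "exec ... java -server ...")
    val javaBin = sys.props("java.home") + "/bin/java"
    new ProcessBuilder(javaBin, "-version").inheritIO().start().waitFor()
  }
}
```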
