How Samza Starts Up on YARN, Part 2: submitApplication

Posted by devos on 2014-05-10

First, let's look at how an org.apache.hadoop.yarn.client.api.YarnClient is constructed.

class ClientHelper(conf: Configuration) extends Logging {
  val yarnClient = YarnClient.createYarnClient
  info("trying to connect to RM %s" format conf.get(YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS))
  yarnClient.init(conf);
  yarnClient.start

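Note that this client also has a start method, so it evidently has a lot to say to the RM. Indeed, it implements the Service interface, so it is a service. In YarnJobFactory we built a YarnConfiguration object from yarn-site.xml; now we use it to initialize the YarnClient, because at the very least we need to know where the RM is, right?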

The implementation of the submitApplication method is examined below in several parts.

First call to YarnClient - getting information

def submitApplication(packagePath: Path, memoryMb: Int, cpuCore: Int, cmds: List[String], env: Option[Map[String, String]], name: Option[String]): Option[ApplicationId] = {
    val app = yarnClient.createApplication
    val newAppResponse = app.getNewApplicationResponse
    var mem = memoryMb
    var cpu = cpuCore

    // If we are asking for memory more than the max allowed, shout out
    if (mem > newAppResponse.getMaximumResourceCapability().getMemory()) {
      throw new SamzaException("You're asking for more memory (%s) than is allowed by YARN: %s" format
        (mem, newAppResponse.getMaximumResourceCapability().getMemory()))
    }

    // If we are asking for cpu more than the max allowed, shout out
    if (cpu > newAppResponse.getMaximumResourceCapability().getVirtualCores()) {
      throw new SamzaException("You're asking for more CPU (%s) than is allowed by YARN: %s" format
        (cpu, newAppResponse.getMaximumResourceCapability().getVirtualCores()))
    }

    appId = Some(newAppResponse.getApplicationId)

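First, the createApplication method of yarnClient returns a YarnClientApplication object. This is the first request to the RM, so what information does it bring back?

The YarnClientApplication object obtained from this request has two methods:

  1. getApplicationSubmissionContext(), which returns an ApplicationSubmissionContext object. “ApplicationSubmissionContext represents all of the information needed by the ResourceManager to launch the ApplicationMaster for an application.”
  2. getNewApplicationResponse(), which returns a GetNewApplicationResponse object.

Because createApplication takes no arguments, and the only user-specified part of YarnClient's own state is the content of the YarnConfiguration, createApplication cannot tell YARN anything about the client's resource needs. The app object it returns therefore carries only information from the YARN RM itself.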

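After obtaining the app object, submitApplication uses

 val newAppResponse = app.getNewApplicationResponse

to take the newAppResponse object out of it, and then reads from it the maximum memory and CPU count that YARN supports (TODO: is this the amount currently available, or the overall maximum? The method name, getMaximumResourceCapability, suggests a ceiling on a single allocation rather than a live availability figure, but that is not verified here). It then compares these limits with the memory and CPU wanted for the AM's container, and throws an exception if the request exceeds the maximum YARN supports.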

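Otherwise, the applicationId obtained from newAppResponse is assigned to appId. So YARN already hands out an appId on the first request, but at that point the appId is not tied to any resources.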
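The limit check described above can be pictured without any YARN dependency. The following is a minimal sketch in which ResourceSpec and ResourceCheck.validate are hypothetical names invented for illustration; only the comparison logic follows the Samza excerpt, and it returns an Either rather than throwing a SamzaException.

```scala
// Hypothetical stand-in for YARN's Resource record: memory in MB plus virtual cores.
final case class ResourceSpec(memoryMb: Int, virtualCores: Int)

object ResourceCheck {
  // Returns Right(requested) when the request fits under the ceiling reported
  // by the first RM call, otherwise Left with an explanatory message.
  def validate(requested: ResourceSpec, max: ResourceSpec): Either[String, ResourceSpec] =
    if (requested.memoryMb > max.memoryMb)
      Left(s"asking for more memory (${requested.memoryMb}) than is allowed: ${max.memoryMb}")
    else if (requested.virtualCores > max.virtualCores)
      Left(s"asking for more CPU (${requested.virtualCores}) than is allowed: ${max.virtualCores}")
    else
      Right(requested)
}
```

A request exactly equal to the ceiling passes, because the excerpt uses a strict greater-than comparison.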

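Second call to YarnClient - submitting the job

If the request fits within those limits, the AM can be submitted, so the next step is to fill in the resources the AM needs to run. Concretely, that means assembling an object of the ApplicationSubmissionContext class.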

    name match {
      case Some(name) => { appCtx.setApplicationName(name) }
      case None => { appCtx.setApplicationName(appId.toString) }
    }

    env match {
      case Some(env) => {
        containerCtx.setEnvironment(env)
        info("set environment variables to %s for %s" format (env, appId.get))
      }
      case None => None
    }

    // set the local package so that the containers and app master are provisioned with it
    val packageUrl = ConverterUtils.getYarnUrlFromPath(packagePath)
    val fileStatus = packagePath.getFileSystem(conf).getFileStatus(packagePath)

    packageResource.setResource(packageUrl)
    info("set package url to %s for %s" format (packageUrl, appId.get))
    packageResource.setSize(fileStatus.getLen)
    info("set package size to %s for %s" format (fileStatus.getLen, appId.get))
    packageResource.setTimestamp(fileStatus.getModificationTime)
    packageResource.setType(LocalResourceType.ARCHIVE)
    packageResource.setVisibility(LocalResourceVisibility.APPLICATION)

    resource.setMemory(mem)
    info("set memory request to %s for %s" format (mem, appId.get))
    resource.setVirtualCores(cpu)
    info("set cpu core request to %s for %s" format (cpu, appId.get))
    appCtx.setResource(resource)
    containerCtx.setCommands(cmds.toList)
    info("set command to %s for %s" format (cmds, appId.get))
    containerCtx.setLocalResources(Collections.singletonMap("__package", packageResource))
    appCtx.setApplicationId(appId.get)
    info("set app ID to %s" format appId.get)
    appCtx.setAMContainerSpec(containerCtx)
    appCtx.setApplicationType(ClientHelper.applicationType)
    info("submitting application request for %s" format appId.get)
    yarnClient.submitApplication(appCtx)

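This code populates an ApplicationSubmissionContext object and then submits it through yarnClient. That is how a YARN job gets submitted.

So YarnClient is used twice in total: the first request obtains the appID and YARN's resource ceiling, and the second request actually submits the job.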
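The two-request pattern can be simulated in a few lines. This is only a sketch against an in-memory stand-in: FakeResourceManager, NewAppReply, Submission, and TwoStepSubmit are all hypothetical names, not part of YARN or Samza, and the ID format is made up.

```scala
// Hypothetical in-memory stand-in for the ResourceManager; not the YARN API.
final case class Capability(memoryMb: Int, virtualCores: Int)
final case class NewAppReply(appId: String, maxCapability: Capability)
final case class Submission(appId: String, name: String, amResource: Capability, commands: List[String])

class FakeResourceManager(max: Capability) {
  private var nextId = 0
  private var accepted = Vector.empty[Submission]

  // Request 1: hands out an ID and the ceiling; nothing is reserved yet.
  def newApplication(): NewAppReply = {
    nextId += 1
    NewAppReply(s"application_$nextId", max)
  }

  // Request 2: the filled-in submission arrives and is recorded.
  def submit(s: Submission): Unit = { accepted = accepted :+ s }

  def submissions: Vector[Submission] = accepted
}

object TwoStepSubmit {
  // Mirrors the shape of submitApplication: ask for an ID, check limits, then submit.
  def run(rm: FakeResourceManager, name: String, want: Capability, cmds: List[String]): Either[String, String] = {
    val reply = rm.newApplication()
    val max = reply.maxCapability
    if (want.memoryMb > max.memoryMb || want.virtualCores > max.virtualCores)
      Left(s"request $want exceeds ceiling $max")
    else {
      rm.submit(Submission(reply.appId, name, want, cmds))
      Right(reply.appId)
    }
  }
}
```

In this model an oversized request still consumes an ID from the first call but never reaches the second, which matches the observation that the ID is issued before any resources are involved.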

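The code that fills in appCtx puzzles me a little. appCtx falls roughly into two parts: one is information about the job, such as the application type and application ID; the other concerns the AM. The AM-related part can itself be split in two: (1) the amount of CPU and memory, which is packed into a Resource object and set on appCtx with setResource; (2) the commands, files, and environment variables needed to run the container, which are set on a ContainerLaunchContext object that is in turn set on appCtx. What puzzles me is this: why are the AM's requirements split in two? Shouldn't CPU and memory be part of the container request itself?

The API documentation for the ContainerLaunchContext class only makes this more confusing: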

ContainerLaunchContext represents all of the information needed by the NodeManager to launch a container.

It includes details such as:

  • ContainerId of the container.
  • Resource allocated to the container.
  • User to whom the container is allocated.
  • Security tokens (if security is enabled).
  • LocalResource necessary for running the container such as binaries, jar, shared-objects, side-files etc.
  • Optional, application-specific binary service data.
  • Environment variables for the launched process.
  • Command to launch the container.

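Well, “Resource allocated to the container.” is not reflected in ContainerLaunchContext at all: none of the methods it provides can set a Resource. Isn't that misleading?

appCtx, by contrast, has a separate setResource method for the Resource, next to setAMContainerSpec for the ContainerLaunchContext. So when containers are requested for running tasks, how are their resource needs stated? Evidently not through this ContainerLaunchContext object.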
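To make the split concrete, here is a minimal sketch using plain case classes. AmResource, LaunchSpec, SubmissionCtx, and SubmissionCtxExample.build are hypothetical names that only mirror the shape of the records discussed above; they are not the YARN classes, and the package URL is represented as a plain string for simplicity.

```scala
// Hypothetical case classes that mirror the shape of the two records; not the YARN API.
final case class AmResource(memoryMb: Int, virtualCores: Int)

// What is needed to start the process: note there are no CPU or memory fields here.
final case class LaunchSpec(
  commands: List[String],
  environment: Map[String, String],
  localResources: Map[String, String] // resource name -> package URL
)

// Job metadata, the AM's resource ask, and the launch spec sit side by side.
final case class SubmissionCtx(
  applicationId: String,
  applicationName: String,
  applicationType: String,
  resource: AmResource,
  amContainerSpec: LaunchSpec
)

object SubmissionCtxExample {
  def build(appId: String, name: Option[String], applicationType: String,
            memMb: Int, cores: Int, cmds: List[String],
            env: Option[Map[String, String]], packageUrl: String): SubmissionCtx = {
    // Commands, environment, and the package go into the launch spec.
    val launch = LaunchSpec(cmds, env.getOrElse(Map.empty[String, String]), Map("__package" -> packageUrl))
    // CPU and memory are attached to the submission context, not the launch spec.
    SubmissionCtx(appId, name.getOrElse(appId), applicationType, AmResource(memMb, cores), launch)
  }
}
```

Seen this way, the resource ask is addressed to whoever allocates the container, while the launch spec is addressed to whoever starts the process inside it.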

 

Two different protocols

 

The code with which the Samza AM requests containers for tasks is in the SamzaAppMasterTaskManager class:

  protected def requestContainers(memMb: Int, cpuCores: Int, containers: Int) {
    info("Requesting %d container(s) with %dmb of memory" format (containers, memMb))
    val capability = Records.newRecord(classOf[Resource])
    val priority = Records.newRecord(classOf[Priority])
    priority.setPriority(0)
    capability.setMemory(memMb)
    capability.setVirtualCores(cpuCores)
    (0 until containers).foreach(idx => amClient.addContainerRequest(new ContainerRequest(capability, null, null, priority)))
  }

Here amClient is an object of the class org.apache.hadoop.yarn.client.api.async.AMRMClientAsync. It talks to the RM and handles container-related matters. When the AM requests containers, it does not repeat the steps submitApplication used to set up the AM's container resources; it uses the ContainerRequest class instead. And the ContainerRequest constructor

public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, boolean relaxLocality) 

takes a Resource as a parameter.

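So requesting a container for the AM and requesting containers for tasks really do follow different paths. After all, requesting the container in which the AM runs is part of submitting the job. It turns out that the two use different protocols. Job submission uses this one: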

message ApplicationSubmissionContextProto {
    optional ApplicationIdProto application_id = 1;
    optional string application_name = 2 [default = "N/A"];
    optional string queue = 3 [default = "default"];
    optional PriorityProto priority = 4;
    optional ContainerLaunchContextProto am_container_spec = 5;
    optional bool cancel_tokens_when_complete = 6 [default = true];
    optional bool unmanaged_am = 7 [default = false];
    optional int32 maxAppAttempts = 8 [default = 0];
    optional ResourceProto resource = 9;
    optional string applicationType = 10 [default = "YARN"];
}

message ContainerLaunchContextProto {
    repeated StringLocalResourceMapProto localResources = 1;
    optional bytes tokens = 2;
    repeated StringBytesMapProto service_data = 3;
    repeated StringStringMapProto environment = 4;
    repeated string command = 5;
    repeated ApplicationACLMapProto application_ACLs = 6;
}

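ContainerLaunchContextProto contains no ResourceProto for CPU and memory at all; that field lives in ApplicationSubmissionContextProto. Set against the documentation of the ContainerLaunchContext class, this does look odd.

Container requests, on the other hand, go through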

message ResourceRequestProto {
  optional PriorityProto priority = 1;
  optional string resource_name = 2;
  optional ResourceProto capability = 3;
  optional int32 num_containers = 4;
  optional bool relax_locality = 5 [default = true];
}

message ResourceProto {
  optional int32 memory = 1;
  optional int32 virtual_cores = 2;
}
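The request side can be sketched the same way. Ask, ContainerAsk, and ContainerAsks below are hypothetical names shaped after ResourceRequestProto; the "*" resource name stands for "any host" and the folding of identical asks into one entry with a count is an assumption suggested by the num_containers field, not something this post verifies against the client library.

```scala
// Hypothetical case classes shaped after ResourceRequestProto; not the YARN API.
final case class Ask(memoryMb: Int, virtualCores: Int)
final case class ContainerAsk(
  priority: Int,
  resourceName: String,
  capability: Ask,
  numContainers: Int,
  relaxLocality: Boolean = true // same default as the proto field
)

object ContainerAsks {
  // Assumed marker for "no host or rack preference".
  val AnyHost = "*"

  // One ask per container at priority 0, like the loop in requestContainers.
  def requestsFor(memMb: Int, cpuCores: Int, containers: Int): List[ContainerAsk] =
    List.fill(containers)(ContainerAsk(0, AnyHost, Ask(memMb, cpuCores), 1))

  // Assumption: identical asks collapse into a single entry whose count is the sum.
  def fold(asks: List[ContainerAsk]): List[ContainerAsk] =
    asks
      .groupBy(a => (a.priority, a.resourceName, a.capability, a.relaxLocality))
      .values
      .map(group => group.head.copy(numContainers = group.map(_.numContainers).sum))
      .toList
}
```

Unlike the submission context, here the capability travels inside the request itself, which is the asymmetry the two sets of messages above show.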

 
