Samza's startup process on YARN => Part 2: submitApplication

Posted by devos on 2014-05-10

Original URL: https://www.cnblogs.com/devos/p/3720174.html

First, let's see how an org.apache.hadoop.yarn.client.api.YarnClient is constructed:

class ClientHelper(conf: Configuration) extends Logging {
  val yarnClient = YarnClient.createYarnClient
  info("trying to connect to RM %s" format conf.get(YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS))
  yarnClient.init(conf);
  yarnClient.start

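Note that this client also has a start method, so it evidently has a lot to say to the RM. Indeed, it implements the Service interface: it is a service. In YarnJobFactory we built a YarnConfiguration object from yarn-site.xml, and now we use it to initialize the YarnClient, because at the very least the client needs to know where the RM is.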

The implementation of the submitApplication method is examined below in several parts.

First call to YarnClient - getting information

def submitApplication(packagePath: Path, memoryMb: Int, cpuCore: Int, cmds: List[String], env: Option[Map[String, String]], name: Option[String]): Option[ApplicationId] = {
    val app = yarnClient.createApplication
    val newAppResponse = app.getNewApplicationResponse
    var mem = memoryMb
    var cpu = cpuCore

    // If we are asking for memory more than the max allowed, shout out
    if (mem > newAppResponse.getMaximumResourceCapability().getMemory()) {
      throw new SamzaException("You're asking for more memory (%s) than is allowed by YARN: %s" format
        (mem, newAppResponse.getMaximumResourceCapability().getMemory()))
    }

    // If we are asking for cpu more than the max allowed, shout out
    if (cpu > newAppResponse.getMaximumResourceCapability().getVirtualCores()) {
      throw new SamzaException("You're asking for more CPU (%s) than is allowed by YARN: %s" format
        (cpu, newAppResponse.getMaximumResourceCapability().getVirtualCores()))
    }

    appId = Some(newAppResponse.getApplicationId)

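First, yarnClient's createApplication method returns a YarnClientApplication object. This is the first request to the RM, so what information does it bring back?

The YarnClientApplication object obtained from this request has two methods:

getApplicationSubmissionContext(), which returns an ApplicationSubmissionContext object. “ApplicationSubmissionContext represents all of the information needed by the ResourceManager to launch the ApplicationMaster for an application.”
getNewApplicationResponse(), which returns a GetNewApplicationResponse object.

YarnClient's createApplication method takes no arguments, and the only user-specified state inside YarnClient is the content of the YarnConfiguration. So createApplication cannot tell YARN anything about the client's resource needs, and the app object it returns carries only information about the RM itself.

Having obtained the app object, submitApplication uses

 val newAppResponse = app.getNewApplicationResponse

to take out newAppResponse, and from it reads the maximum memory and number of CPU cores that YARN will grant (the API documentation describes getMaximumResourceCapability as the maximum capability for any Resource allocated by the ResourceManager in the cluster, which suggests a configured ceiling rather than the currently free capacity). It then compares these with the memory and CPU wanted for the AM's container, and throws an exception if the request exceeds the maximum YARN supports.

Otherwise, the applicationId taken from newAppResponse is assigned to appId. So YARN already allocates an application ID on the first request, but at that point the ID is not yet associated with any resources.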
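The ceiling check described above can be sketched without a cluster. This is a minimal illustration in plain Scala, not Samza's code: Capability and ResourceCheck are hypothetical stand-ins for YARN's Resource record and for the two if-blocks in submitApplication, and the sketch returns an Either where the real method throws a SamzaException.

```scala
// Hypothetical stand-in for YARN's Resource record (memory in MB, virtual cores).
final case class Capability(memoryMb: Int, virtualCores: Int)

object ResourceCheck {
  // Compares the request against the ceiling reported by the first RM call.
  // Left(message) means the request is too large; Right(request) means it fits.
  def validate(requested: Capability, max: Capability): Either[String, Capability] =
    if (requested.memoryMb > max.memoryMb)
      Left(s"asking for more memory (${requested.memoryMb}) than is allowed by YARN: ${max.memoryMb}")
    else if (requested.virtualCores > max.virtualCores)
      Left(s"asking for more CPU (${requested.virtualCores}) than is allowed by YARN: ${max.virtualCores}")
    else
      Right(requested)
}
```

A request that exactly equals the ceiling passes, because the comparison is strict, just as in the excerpt.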

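Second call to YarnClient - submitting the job

If the requested resources are within the limits, the AM can be submitted, so the next step is to fill in what the AM needs in order to run. Concretely, that means assembling an ApplicationSubmissionContext object: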

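    // This excerpt omits the declarations of appCtx, containerCtx, resource and
    // packageResource. Presumably (an assumption, not shown in this post) they
    // are obtained along these lines:
    //   val appCtx = app.getApplicationSubmissionContext
    //   val containerCtx = Records.newRecord(classOf[ContainerLaunchContext])
    //   val resource = Records.newRecord(classOf[Resource])
    //   val packageResource = Records.newRecord(classOf[LocalResource])

    name match {
      case Some(name) => { appCtx.setApplicationName(name) }
      // appId is an Option, so unwrap it; appId.toString would yield "Some(...)"
      case None => { appCtx.setApplicationName(appId.get.toString) }
    }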

    env match {
      case Some(env) => {
        containerCtx.setEnvironment(env)
        info("set environment variables to %s for %s" format (env, appId.get))
      }
      case None => None
    }

    // set the local package so that the containers and app master are provisioned with it
    val packageUrl = ConverterUtils.getYarnUrlFromPath(packagePath)
    val fileStatus = packagePath.getFileSystem(conf).getFileStatus(packagePath)

    packageResource.setResource(packageUrl)
    info("set package url to %s for %s" format (packageUrl, appId.get))
    packageResource.setSize(fileStatus.getLen)
    info("set package size to %s for %s" format (fileStatus.getLen, appId.get))
    packageResource.setTimestamp(fileStatus.getModificationTime)
    packageResource.setType(LocalResourceType.ARCHIVE)
    packageResource.setVisibility(LocalResourceVisibility.APPLICATION)

    resource.setMemory(mem)
    info("set memory request to %s for %s" format (mem, appId.get))
    resource.setVirtualCores(cpu)
    info("set cpu core request to %s for %s" format (cpu, appId.get))
    appCtx.setResource(resource)
    containerCtx.setCommands(cmds.toList)
    info("set command to %s for %s" format (cmds, appId.get))
    containerCtx.setLocalResources(Collections.singletonMap("__package", packageResource))
    appCtx.setApplicationId(appId.get)
    info("set app ID to %s" format appId.get)
    appCtx.setAMContainerSpec(containerCtx)
    appCtx.setApplicationType(ClientHelper.applicationType)
    info("submitting application request for %s" format appId.get)
    yarnClient.submitApplication(appCtx)

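This code populates an ApplicationSubmissionContext object and then submits it through yarnClient. That is how a YARN job gets submitted.

So YarnClient is used twice in total: the first request obtains the app ID and YARN's resource ceilings, and the second request actually submits the job.

This code puzzles me a little. appCtx falls roughly into two parts: one is information about the job, such as the application type and application ID; the other concerns the AM. The AM-related part in turn splits into two pieces: (1) the CPU and memory sizes, which are packed into a Resource object and set on appCtx via setResource; (2) the commands, files and environment variables needed to run the container, which are set on a ContainerLaunchContext object that is then set on appCtx. What puzzles me is this: why are the AM's requirements split in two? Shouldn't CPU and memory be part of the container request itself?

The API documentation for the ContainerLaunchContext class only adds to the confusion: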

ContainerLaunchContext represents all of the information needed by the NodeManager to launch a container.

It includes details such as:

ContainerId of the container.

Resource allocated to the container.

User to whom the container is allocated.

Security tokens (if security is enabled).

LocalResource necessary for running the container such as binaries, jar, shared-objects, side-files etc.

Optional, application-specific binary service data.

Environment variables for the launched process.

Command to launch the container.

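Well, “Resource allocated to the container.” is not reflected in ContainerLaunchContext at all: none of the methods it offers lets you set a Resource. Isn't that misleading?

appCtx, on the other hand, has a separate setResource method for setting the Resource (setAMContainerSpec only takes the ContainerLaunchContext). So when requesting the containers that tasks run in, how are their resource needs specified? Evidently not through this ContainerLaunchContext object.

Two different protocols

The code with which the Samza AM requests containers for tasks is in the SamzaAppMasterTaskManager class: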
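The split discussed above can be pictured with a small model. The following is a sketch in plain Scala under stated assumptions, not the YARN API: AmResource, LaunchContext, SubmissionContext and SubmissionSketch are hypothetical names that mirror only the fields this post touches, and the appType parameter stands in for ClientHelper.applicationType, whose value is not shown here.

```scala
// Hypothetical, simplified mirrors of the YARN records discussed above.
final case class AmResource(memoryMb: Int, virtualCores: Int)

// What is needed to start the process in a container. Note: no resource field.
final case class LaunchContext(
  commands: List[String],
  environment: Map[String, String],
  localResources: Map[String, String])

// What is handed to the RM on submission: job info, AM size, AM launch spec.
final case class SubmissionContext(
  applicationId: String,
  applicationName: String,
  applicationType: String,
  resource: AmResource,            // corresponds to appCtx.setResource
  amContainerSpec: LaunchContext)  // corresponds to appCtx.setAMContainerSpec

object SubmissionSketch {
  def build(appId: String, name: Option[String], mem: Int, cpu: Int,
            cmds: List[String], env: Option[Map[String, String]],
            packageUrl: String, appType: String): SubmissionContext = {
    // Commands, environment and the package archive go into the launch context.
    val launch = LaunchContext(cmds, env.getOrElse(Map.empty), Map("__package" -> packageUrl))
    // Memory and CPU sit next to the launch context, not inside it.
    SubmissionContext(appId, name.getOrElse(appId), appType, AmResource(mem, cpu), launch)
  }
}
```

In this model the only way to size the AM is through the outer SubmissionContext, which is exactly the asymmetry the post is puzzled by.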

  protected def requestContainers(memMb: Int, cpuCores: Int, containers: Int) {
    info("Requesting %d container(s) with %dmb of memory" format (containers, memMb))
    val capability = Records.newRecord(classOf[Resource])
    val priority = Records.newRecord(classOf[Priority])
    priority.setPriority(0)
    capability.setMemory(memMb)
    capability.setVirtualCores(cpuCores)
    (0 until containers).foreach(idx => amClient.addContainerRequest(new ContainerRequest(capability, null, null, priority)))
  }

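Here amClient is an instance of org.apache.hadoop.yarn.client.api.async.AMRMClientAsync. It talks to the RM and handles container-related matters. When the AM requests containers, it does not go through the steps that submitApplication used to set up the AM's container resources; it uses the ContainerRequest class instead. The ContainerRequest constructor

public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, boolean relaxLocality)

takes a Resource as a parameter. (The code above calls the four-argument overload, which omits relaxLocality.)

So requesting a container for the AM and requesting containers for tasks really do follow different paths. After all, the container for the AM is requested as part of submitting the job. In the end it turns out that the two use different protocol messages. Job submission uses this one: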

message ApplicationSubmissionContextProto {
    optional ApplicationIdProto application_id = 1;
    optional string application_name = 2 [default = "N/A"];
    optional string queue = 3 [default = "default"];
    optional PriorityProto priority = 4;
    optional ContainerLaunchContextProto am_container_spec = 5;
    optional bool cancel_tokens_when_complete = 6 [default = true];
    optional bool unmanaged_am = 7 [default = false];
    optional int32 maxAppAttempts = 8 [default = 0];
    optional ResourceProto resource = 9;
    optional string applicationType = 10 [default = "YARN"];
}

message ContainerLaunchContextProto {
    repeated StringLocalResourceMapProto localResources = 1;
    optional bytes tokens = 2;
    repeated StringBytesMapProto service_data = 3;
    repeated StringStringMapProto environment = 4;
    repeated string command = 5;
    repeated ApplicationACLMapProto application_ACLs = 6;
}

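ContainerLaunchContextProto contains no ResourceProto representing CPU and memory at all; that field lives in ApplicationSubmissionContextProto (field 9, resource). Set against the documentation of the ContainerLaunchContext class, this does look odd.

Requests for containers, by contrast, use: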

message ResourceRequestProto {
  optional PriorityProto priority = 1;
  optional string resource_name = 2;
  optional ResourceProto capability = 3;
  optional int32 num_containers = 4;
  optional bool relax_locality = 5 [default = true];
}

message ResourceProto {
  optional int32 memory = 1;
  optional int32 virtual_cores = 2;
}
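To make the second path concrete, here is a sketch of how the individual container asks issued by requestContainers could map onto a record shaped like ResourceRequestProto. It is plain Scala with hypothetical names (TaskResource, ContainerAsk, ResourceRequest, RequestSketch). How AMRMClientAsync actually batches requests is not shown in this post, so the grouping into num_containers and the use of "*" as the resource name for asks with no node or rack preference are assumptions.

```scala
// Hypothetical mirror of ResourceProto.
final case class TaskResource(memoryMb: Int, virtualCores: Int)

// One ask, as created per container in requestContainers (nodes and racks are null there).
final case class ContainerAsk(capability: TaskResource, priority: Int)

// Hypothetical mirror of ResourceRequestProto; relaxLocality defaults to true as in the proto.
final case class ResourceRequest(priority: Int, resourceName: String, capability: TaskResource,
                                 numContainers: Int, relaxLocality: Boolean = true)

object RequestSketch {
  // Assumed resource name for "any host".
  val AnyHost = "*"

  // Mirrors the loop in requestContainers: one ask per container, all at priority 0.
  def requestContainers(memMb: Int, cpuCores: Int, containers: Int): List[ContainerAsk] =
    List.fill(containers)(ContainerAsk(TaskResource(memMb, cpuCores), priority = 0))

  // Assumed batching: asks with the same priority and capability become one
  // request whose numContainers is the size of the group.
  def toResourceRequests(asks: List[ContainerAsk]): List[ResourceRequest] =
    asks.groupBy(a => (a.priority, a.capability)).toList.map {
      case ((priority, capability), group) =>
        ResourceRequest(priority, AnyHost, capability, group.size)
    }
}
```

Unlike the submission path, the capability here travels inside the request itself, which matches the capability field of ResourceRequestProto.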
