Samza's ApplicationMaster

Posted by devos on 2014-04-26

When the Samza ApplicationMaster starts, it does the following:

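  1. Receives its configuration from YARN via the STREAMING_CONFIG environment variable
  2. Starts a JMX server on a random port
  3. Instantiates a metrics registry and reporter to track metrics
  4. Registers the AM with YARN's RM
  5. Uses each stream's PartitionManager to get the total number of partitions
  6. Gets the total number of containers from the Samza job configuration
  7. Assigns the partitions to containers (called Task Groups in the Samza AM dashboard)
  8. Sends a ResourceRequest to YARN for each container
  9. Polls the YARN RM once per second to check for allocated and released containers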
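
Steps 5 to 7 can be illustrated with a small sketch. This is not Samza's code: the class TaskGroupSketch and its assign method are hypothetical names, and the modulo rule (partition p goes to container p % containerCount) is an assumption made for illustration, so the real assignment strategy may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Samza: groups partition ids into
// one task group per container.
class TaskGroupSketch {

    // Assumption: partition p is assigned to container (p % containerCount).
    static Map<Integer, List<Integer>> assign(int partitionCount, int containerCount) {
        if (containerCount <= 0) {
            throw new IllegalArgumentException("containerCount must be positive");
        }
        // TreeMap keeps container ids in ascending order.
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int c = 0; c < containerCount; c++) {
            groups.put(c, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            groups.get(p % containerCount).add(p);
        }
        return groups;
    }

    public static void main(String[] args) {
        // 5 partitions spread over 2 containers.
        System.out.println(assign(5, 2)); // prints {0=[0, 2, 4], 1=[1, 3]}
    }
}
```

With this rule every container gets a task group whose size differs from the others by at most one partition, which is why a simple modulo is a common first choice.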

Implementation of SamzaAppMaster

This step does not submit the AppMaster; it only registers the AppMaster with the RM, because the AppMaster is already running at this point.
In the onInit() method of the SamzaAppMasterLifecycle object, amClient.registerApplicationMaster is called:
 val response = amClient.registerApplicationMaster(host, state.rpcPort, "%s:%d" format (host, state.trackingPort))
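
The third argument of that call is the tracking URL, built with Scala's infix format method, which delegates to java.lang.String.format. A minimal Java equivalent is sketched below; TrackingUrlSketch and the host and port values are made up for illustration.

```java
// Hypothetical helper showing how the "host:port" tracking URL string is built.
class TrackingUrlSketch {

    // Same format string as the Scala call: "%s:%d" format (host, trackingPort).
    static String trackingUrl(String host, int trackingPort) {
        return String.format("%s:%d", host, trackingPort);
    }

    public static void main(String[] args) {
        // Example values only; the real host and port come from the AM's state.
        System.out.println(trackingUrl("node1", 8080));
    }
}
```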
 
The class of the amClient object:
 
org.apache.hadoop.yarn.client.api.async.AMRMClientAsync<T extends org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest>
 

AMRMClientAsync handles communication with the ResourceManager and provides asynchronous updates on events such as container allocations and completions. It contains a thread that sends periodic heartbeats to the ResourceManager. It should be used by implementing a CallbackHandler:

 class MyCallbackHandler implements AMRMClientAsync.CallbackHandler {
   public void onContainersAllocated(List<Container> containers) {
     // run tasks on the containers
   }
   
   public void onContainersCompleted(List<ContainerStatus> statuses) {
     // update progress, check whether app is done
   }
   
   public void onNodesUpdated(List<NodeReport> updated) {}
   
   public void onReboot() {}
 }
 
 
The client's lifecycle should be managed similarly to the following:
 AMRMClientAsync asyncClient = 
     createAMRMClientAsync(appAttId, 1000, new MyCallbackHandler());
 asyncClient.init(conf);
 asyncClient.start();
 RegisterApplicationMasterResponse response = asyncClient
    .registerApplicationMaster(appMasterHostname, appMasterRpcPort,
       appMasterTrackingUrl);
 asyncClient.addContainerRequest(containerRequest);
 // ... wait for application to complete
 asyncClient.unregisterApplicationMaster(status, appMsg, trackingUrl);
 asyncClient.stop();
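This class acts as a client that communicates with the RM, and it registers a callback object that handles container allocation and completion events. It starts a thread that periodically sends heartbeats to the ResourceManager.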
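
To make the callback flow concrete, here is a rough sketch of the bookkeeping a callback handler might do when allocation and completion events arrive. It deliberately avoids the Hadoop classes so that it runs on its own: ContainerTracker, onAllocated, onCompleted and isDone are hypothetical names, container ids are plain strings, and treating exit code 0 as success is an assumption. The methods are synchronized on the assumption that callbacks arrive on a thread other than the one checking for completion.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical bookkeeping for an AM callback handler; not Samza or Hadoop code.
class ContainerTracker {
    private final int needed;   // total containers the job wants
    private int running = 0;    // containers currently in use
    private int succeeded = 0;  // containers that exited with code 0

    ContainerTracker(int needed) {
        this.needed = needed;
    }

    // Analogous to onContainersAllocated: returns how many of the offered
    // containers are accepted; any surplus should be released by the caller.
    synchronized int onAllocated(List<String> containerIds) {
        int wanted = Math.max(needed - succeeded - running, 0);
        int accepted = Math.min(wanted, containerIds.size());
        running += accepted;
        return accepted;
    }

    // Analogous to onContainersCompleted: returns how many replacement
    // containers should be requested because of failures.
    synchronized int onCompleted(List<Integer> exitCodes) {
        int failed = 0;
        for (int code : exitCodes) {
            running--;
            if (code == 0) {
                succeeded++;
            } else {
                failed++;
            }
        }
        return failed;
    }

    synchronized boolean isDone() {
        return succeeded == needed;
    }

    public static void main(String[] args) {
        ContainerTracker tracker = new ContainerTracker(2);
        tracker.onAllocated(Arrays.asList("c1", "c2", "c3")); // accepts 2 of 3
        int toRetry = tracker.onCompleted(Arrays.asList(0, 1)); // one failure
        System.out.println("replacements needed: " + toRetry);
    }
}
```

In a real handler, the value returned by onCompleted would translate into new container requests, and isDone would decide when to unregister the ApplicationMaster.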
