How Hadoop obtains the JobID: an analysis of JobSubmitter's submitJobInternal

Posted by bgpydpyr on 2016-03-02

When a job is submitted, the Job class calls JobSubmitter's submitJobInternal. Here is its source code:

JobStatus submitJobInternal(Job job, Cluster cluster) throws ClassNotFoundException, InterruptedException, IOException {

// validate the job's output specs
checkSpecs(job);

Configuration conf = job.getConfiguration();
addMRFrameworkToDistributedCache(conf);

Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
// Added by the author of this post (not in the Hadoop source): log the staging directory
LOG.info("---->jobStagingArea:  " + jobStagingArea);
//configure the command line options correctly on the submitting dfs
InetAddress ip = InetAddress.getLocalHost();
if (ip != null) {
  submitHostAddress = ip.getHostAddress();
  submitHostName = ip.getHostName();
  conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
  conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
}
JobID jobId = submitClient.getNewJobID();
job.setJobID(jobId);
Path submitJobDir = new Path(jobStagingArea, jobId.toString());
JobStatus status = null;
try {
  conf.set(MRJobConfig.USER_NAME,
      UserGroupInformation.getCurrentUser().getShortUserName());
  conf.set("hadoop.http.filter.initializers", 
      "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
  conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
  LOG.debug("Configuring job " + jobId + " with " + submitJobDir 
      + " as the submit dir");
  // get delegation token for the dir
  TokenCache.obtainTokensForNamenodes(job.getCredentials(),
      new Path[] { submitJobDir }, conf);

  populateTokenCache(conf, job.getCredentials());

  // generate a secret to authenticate shuffle transfers
  if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
    KeyGenerator keyGen;
    try {

      int keyLen = CryptoUtils.isShuffleEncrypted(conf) 
          ? conf.getInt(MRJobConfig.MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS, 
              MRJobConfig.DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS)
          : SHUFFLE_KEY_LENGTH;
      keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
      keyGen.init(keyLen);
    } catch (NoSuchAlgorithmException e) {
      throw new IOException("Error generating shuffle secret key", e);
    }
    SecretKey shuffleKey = keyGen.generateKey();
    TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
        job.getCredentials());
  }

  copyAndConfigureFiles(job, submitJobDir);




  Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);

  // Create the splits for the job
  LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
  int maps = writeSplits(job, submitJobDir);
  conf.setInt(MRJobConfig.NUM_MAPS, maps);
  LOG.info("number of splits:" + maps);

  // write "queue admins of the queue to which job is being submitted"
  // to job file.
  String queue = conf.get(MRJobConfig.QUEUE_NAME,
      JobConf.DEFAULT_QUEUE_NAME);
  AccessControlList acl = submitClient.getQueueAdmins(queue);
  conf.set(toFullPropertyName(queue,
      QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

  // removing jobtoken referrals before copying the jobconf to HDFS
  // as the tasks don't need this setting, actually they may break
  // because of it if present as the referral will point to a
  // different job.
  TokenCache.cleanUpTokenReferral(conf);

  if (conf.getBoolean(
      MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
      MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
    // Add HDFS tracking ids
    ArrayList<String> trackingIds = new ArrayList<String>();
    for (Token<? extends TokenIdentifier> t :
        job.getCredentials().getAllTokens()) {
      trackingIds.add(t.decodeIdentifier().getTrackingId());
    }
    conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
        trackingIds.toArray(new String[trackingIds.size()]));
  }

  // Set reservation info if it exists
  ReservationId reservationId = job.getReservationId();
  if (reservationId != null) {
    conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
  }

  // Write job file to submit dir
  writeConf(conf, submitJobFile);

  //
  // Now, actually submit the job (using the submit name)
  //
  printTokens(jobId, job.getCredentials());
  status = submitClient.submitJob(
      jobId, submitJobDir.toString(), job.getCredentials());
  if (status != null) {
    return status;
  } else {
    throw new IOException("Could not launch job");
  }
} finally {
  if (status == null) {
    LOG.info("Cleaning up the staging area " + submitJobDir);
    if (jtFs != null && submitJobDir != null)
      jtFs.delete(submitJobDir, true);

  }
}
}
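The try/finally at the end of submitJobInternal removes the staging directory whenever no JobStatus came back. Below is a minimal sketch of that cleanup pattern, using java.nio.file on the local disk instead of Hadoop's FileSystem. The class StagingCleanupSketch, its submit method and the Supplier argument are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StagingCleanupSketch {

    // Creates <stagingArea>/<jobId>, runs the (hypothetical) submission, and
    // deletes the directory again if the submission produced no status.
    static String submit(Path stagingArea, String jobId, Supplier<String> submission)
            throws IOException {
        Path submitJobDir = stagingArea.resolve(jobId);
        Files.createDirectories(submitJobDir);
        String status = null;
        try {
            status = submission.get();
            if (status == null) {
                throw new IOException("Could not launch job");
            }
            return status;
        } finally {
            // Same condition as in submitJobInternal: clean up only on failure.
            if (status == null) {
                deleteRecursively(submitJobDir);
            }
        }
    }

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path area = Files.createTempDirectory("staging");
        System.out.println(submit(area, "job_1456900000000_0001", () -> "RUNNING"));
    }
}
```

Because status stays null when the submission throws or returns nothing, a single check in the finally block covers both failure cases.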

The line that obtains the JobID is JobID jobId = submitClient.getNewJobID(); When the job runs on YARN, submitClient is a YARNRunner, and its getNewJobID() delegates to resMgrDelegate.getNewJobID(). In the ResourceMgrDelegate class:

public JobID getNewJobID() throws IOException, InterruptedException {
  try {
    // ask YARN for a new application and keep its submission context
    this.application = client.createApplication().getApplicationSubmissionContext();
    this.applicationId = this.application.getApplicationId();
    // convert the YARN ApplicationId into a MapReduce JobID
    return TypeConverter.fromYarn(applicationId);
  } catch (YarnException e) {
    throw new IOException(e);
  }
}
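TypeConverter.fromYarn is the step that turns the YARN ApplicationId into a MapReduce JobID. As far as can be told from the ID strings Hadoop prints, both components (the cluster timestamp and the sequence number) are carried over, so application_1456900000000_0007 corresponds to job_1456900000000_0007. The sketch below mimics that mapping on plain strings. It does not use Hadoop classes, and JobIdMappingSketch and toJobId are hypothetical names.

```java
import java.util.Locale;

class JobIdMappingSketch {
    private static final String APP_PREFIX = "application_";
    private static final String JOB_PREFIX = "job_";

    // Maps "application_<clusterTimestamp>_<sequence>" to
    // "job_<clusterTimestamp>_<sequence>", padding the sequence to 4 digits.
    static String toJobId(String applicationId) {
        if (!applicationId.startsWith(APP_PREFIX)) {
            throw new IllegalArgumentException("Not an application id: " + applicationId);
        }
        String[] parts = applicationId.substring(APP_PREFIX.length()).split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed application id: " + applicationId);
        }
        long clusterTimestamp = Long.parseLong(parts[0]);
        int sequence = Integer.parseInt(parts[1]);
        return String.format(Locale.ROOT, "%s%d_%04d", JOB_PREFIX, clusterTimestamp, sequence);
    }

    public static void main(String[] args) {
        // prints "job_1456900000000_0007"
        System.out.println(toJobId("application_1456900000000_0007"));
    }
}
```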

The client member of ResourceMgrDelegate is a YarnClientImpl. The source of createApplication() in the YarnClientImpl class is as follows:

@Override public YarnClientApplication createApplication() throws YarnException, IOException {

ApplicationSubmissionContext context = Records.newRecord
    (ApplicationSubmissionContext.class);
GetNewApplicationResponse newApp = getNewApplication();
ApplicationId appId = newApp.getApplicationId();
context.setApplicationId(appId);
return new YarnClientApplication(newApp, context);

}

The source of getNewApplication() is as follows:

private GetNewApplicationResponse getNewApplication() throws YarnException, IOException {

GetNewApplicationRequest request =
    Records.newRecord(GetNewApplicationRequest.class);
//The interface used by clients to obtain a new ApplicationId for submitting new applications.
//The ResourceManager responds with a new, monotonically increasing, ApplicationId which is used by the client to submit a new application.
return rmClient.getNewApplication(request);

}
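The comment in getNewApplication says the ResourceManager answers with a new, monotonically increasing ApplicationId. Here is a toy model of that idea, assuming an ID consists of the cluster start timestamp plus an atomically incremented counter. This is not the ResourceManager's code; AppIdAllocatorSketch and its method are hypothetical.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

class AppIdAllocatorSketch {
    // Assumed to be fixed when the (toy) ResourceManager starts.
    private final long clusterTimestamp;
    // Thread-safe counter, so concurrent clients never receive the same id.
    private final AtomicInteger counter = new AtomicInteger(0);

    AppIdAllocatorSketch(long clusterTimestamp) {
        this.clusterTimestamp = clusterTimestamp;
    }

    // Returns the next id; each call yields a strictly larger sequence number.
    String newApplicationId() {
        int sequence = counter.incrementAndGet();
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, sequence);
    }

    public static void main(String[] args) {
        AppIdAllocatorSketch rm = new AppIdAllocatorSketch(1456900000000L);
        System.out.println(rm.newApplicationId()); // prints "application_1456900000000_0001"
        System.out.println(rm.newApplicationId()); // prints "application_1456900000000_0002"
    }
}
```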

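Summary:

The rough flow for obtaining the JobID is as follows:

1. The submitting client, YARNRunner, calls its getNewJobID method, which delegates to ResourceMgrDelegate's getNewJobID.

2. ResourceMgrDelegate calls createApplication on its member client (actually a YarnClientImpl) to create a YarnClientApplication.

3. createApplication builds the YarnClientApplication like this: (a) it constructs an ApplicationSubmissionContext object, context; (b) getNewApplication constructs a GetNewApplicationRequest, request; (c) rmClient.getNewApplication(request) returns a GetNewApplicationResponse object, newApp, which carries the ApplicationId allocated by the ResourceManager.

4. createApplication calls context.setApplicationId to store that ApplicationId in context, and ResourceMgrDelegate sets its member application to this context.

5. ResourceMgrDelegate sets its member applicationId to the ApplicationId held by context.

6. Finally, TypeConverter.fromYarn(applicationId) converts the ApplicationId into the JobID that is returned to JobSubmitter.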
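The delegation chain in this summary can be sketched with three small stand-in classes. Everything here is hypothetical and simplified: RunnerSketch, DelegateSketch and ClientSketch stand in for YARNRunner, ResourceMgrDelegate and YarnClientImpl, IDs are plain strings, and the ResourceManager is reduced to a Supplier.

```java
import java.util.function.Supplier;

class JobIdFlowSketch {

    // Stand-in for YarnClientImpl: asks the "ResourceManager" for a new id.
    static class ClientSketch {
        private final Supplier<String> resourceManager;

        ClientSketch(Supplier<String> resourceManager) {
            this.resourceManager = resourceManager;
        }

        String createApplication() {
            return resourceManager.get();
        }
    }

    // Stand-in for ResourceMgrDelegate: remembers the application id and
    // hands back the corresponding job id.
    static class DelegateSketch {
        private final ClientSketch client;
        private String applicationId;

        DelegateSketch(ClientSketch client) {
            this.client = client;
        }

        String getNewJobID() {
            this.applicationId = client.createApplication();
            // Simplified conversion: only the prefix changes.
            return applicationId.replaceFirst("^application_", "job_");
        }

        String getApplicationId() {
            return applicationId;
        }
    }

    // Stand-in for YARNRunner: pure delegation.
    static class RunnerSketch {
        private final DelegateSketch delegate;

        RunnerSketch(DelegateSketch delegate) {
            this.delegate = delegate;
        }

        String getNewJobID() {
            return delegate.getNewJobID();
        }
    }

    public static void main(String[] args) {
        ClientSketch client = new ClientSketch(() -> "application_1456900000000_0003");
        RunnerSketch runner = new RunnerSketch(new DelegateSketch(client));
        System.out.println(runner.getNewJobID()); // prints "job_1456900000000_0003"
    }
}
```

The point of the sketch is that the runner itself never talks to the ResourceManager; the ID is created one level down and only converted on the way back up.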
