How Hadoop obtains the JobID: an analysis of JobSubmitter's submitJobInternal

Posted by bgpydpyr on 2016-03-02

When the Job class submits a job, it calls JobSubmitter's submitJobInternal. Let's look at its source code:

JobStatus submitJobInternal(Job job, Cluster cluster) throws ClassNotFoundException, InterruptedException, IOException {

//validate the jobs output specs 
checkSpecs(job);

Configuration conf = job.getConfiguration();
addMRFrameworkToDistributedCache(conf);

Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
// Added by the author of this post for debugging: log the staging area
LOG.info("---->jobStagingArea:  "+jobStagingArea);
//configure the command line options correctly on the submitting dfs
InetAddress ip = InetAddress.getLocalHost();
if (ip != null) {
  submitHostAddress = ip.getHostAddress();
  submitHostName = ip.getHostName();
  conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
  conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
}
JobID jobId = submitClient.getNewJobID();
job.setJobID(jobId);
Path submitJobDir = new Path(jobStagingArea, jobId.toString());
JobStatus status = null;
try {
  conf.set(MRJobConfig.USER_NAME,
      UserGroupInformation.getCurrentUser().getShortUserName());
  conf.set("hadoop.http.filter.initializers", 
      "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
  conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
  LOG.debug("Configuring job " + jobId + " with " + submitJobDir 
      + " as the submit dir");
  // get delegation token for the dir
  TokenCache.obtainTokensForNamenodes(job.getCredentials(),
      new Path[] { submitJobDir }, conf);

  populateTokenCache(conf, job.getCredentials());

  // generate a secret to authenticate shuffle transfers
  if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
    KeyGenerator keyGen;
    try {

      int keyLen = CryptoUtils.isShuffleEncrypted(conf) 
          ? conf.getInt(MRJobConfig.MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS, 
              MRJobConfig.DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS)
          : SHUFFLE_KEY_LENGTH;
      keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
      keyGen.init(keyLen);
    } catch (NoSuchAlgorithmException e) {
      throw new IOException("Error generating shuffle secret key", e);
    }
    SecretKey shuffleKey = keyGen.generateKey();
    TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
        job.getCredentials());
  }

  copyAndConfigureFiles(job, submitJobDir);




  Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);

  // Create the splits for the job
  LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
  int maps = writeSplits(job, submitJobDir);
  conf.setInt(MRJobConfig.NUM_MAPS, maps);
  LOG.info("number of splits:" + maps);

  // write "queue admins of the queue to which job is being submitted"
  // to job file.
  String queue = conf.get(MRJobConfig.QUEUE_NAME,
      JobConf.DEFAULT_QUEUE_NAME);
  AccessControlList acl = submitClient.getQueueAdmins(queue);
  conf.set(toFullPropertyName(queue,
      QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

  // removing jobtoken referrals before copying the jobconf to HDFS
  // as the tasks don't need this setting, actually they may break
  // because of it if present as the referral will point to a
  // different job.
  TokenCache.cleanUpTokenReferral(conf);

  if (conf.getBoolean(
      MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
      MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
    // Add HDFS tracking ids
    ArrayList<String> trackingIds = new ArrayList<String>();
    for (Token<? extends TokenIdentifier> t :
        job.getCredentials().getAllTokens()) {
      trackingIds.add(t.decodeIdentifier().getTrackingId());
    }
    conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
        trackingIds.toArray(new String[trackingIds.size()]));
  }

  // Set reservation info if it exists
  ReservationId reservationId = job.getReservationId();
  if (reservationId != null) {
    conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
  }

  // Write job file to submit dir
  writeConf(conf, submitJobFile);

  //
  // Now, actually submit the job (using the submit name)
  //
  printTokens(jobId, job.getCredentials());
  status = submitClient.submitJob(
      jobId, submitJobDir.toString(), job.getCredentials());
  if (status != null) {
    return status;
  } else {
    throw new IOException("Could not launch job");
  }
} finally {
  if (status == null) {
    LOG.info("Cleaning up the staging area " + submitJobDir);
    if (jtFs != null && submitJobDir != null)
      jtFs.delete(submitJobDir, true);
  }
}
}
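The shuffle-secret step in the listing above can be tried in isolation. The following is a minimal sketch, not Hadoop code: the class name ShuffleKeySketch and the method generateShuffleSecret are hypothetical, and the two constant values ("HmacSHA1" and 64 bits) are assumptions about what JobSubmitter defines for SHUFFLE_KEYGEN_ALGORITHM and SHUFFLE_KEY_LENGTH, since the listing does not show them.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical stand-alone illustration of the "generate a secret to
// authenticate shuffle transfers" step; it does not touch Hadoop classes.
final class ShuffleKeySketch {
    // Assumed values; the listing only shows the constant names.
    static final String SHUFFLE_KEYGEN_ALGORITHM = "HmacSHA1";
    static final int SHUFFLE_KEY_LENGTH = 64; // bits

    /** Generates a random secret of the requested size and returns its raw bytes. */
    static byte[] generateShuffleSecret(int keyLenBits) {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
            keyGen.init(keyLenBits);
            SecretKey shuffleKey = keyGen.generateKey();
            return shuffleKey.getEncoded();
        } catch (NoSuchAlgorithmException e) {
            // submitJobInternal wraps this in an IOException; an unchecked
            // exception keeps the sketch short.
            throw new IllegalStateException("Error generating shuffle secret key", e);
        }
    }

    public static void main(String[] args) {
        byte[] secret = generateShuffleSecret(SHUFFLE_KEY_LENGTH);
        System.out.println("secret length in bytes: " + secret.length);
    }
}
```

In the real method the encoded bytes are stored in the job's credentials through TokenCache.setShuffleSecretKey, so the secret travels with the job rather than being printed.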

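The code that obtains the JobID is JobID jobId = submitClient.getNewJobID(); When the job is submitted to YARN, submitClient is a YARNRunner; calling its getNewJobID() in turn calls its internal resMgrDelegate.getNewJobID(). In the ResourceMgrDelegate class: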

public JobID getNewJobID() throws IOException, InterruptedException {
  try {
    this.application = client.createApplication().getApplicationSubmissionContext();
    this.applicationId = this.application.getApplicationId();
    // Convert the YARN ApplicationId into a MapReduce JobID
    return TypeConverter.fromYarn(applicationId);
  } catch (YarnException e) {
    throw new IOException(e);
  }
}
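The last line of the try block is where the YARN ApplicationId becomes a MapReduce JobID. As a rough illustration, and assuming (as I understand the conversion) that the JobID reuses the ApplicationId's cluster timestamp and sequence number, the mapping on the string forms can be sketched like this. JobIdSketch and its methods are hypothetical names, not Hadoop APIs.

```java
import java.util.Locale;

// Hypothetical stand-in for the conversion done by TypeConverter.fromYarn.
// It only mimics the string forms of the two IDs.
final class JobIdSketch {

    /** String form of a YARN application id: application_<clusterTimestamp>_<4-digit id>. */
    static String applicationId(long clusterTimestamp, int id) {
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, id);
    }

    /** String form of the job id that reuses the same two fields. */
    static String jobId(long clusterTimestamp, int id) {
        return String.format(Locale.ROOT, "job_%d_%04d", clusterTimestamp, id);
    }

    /** Converts "application_<ts>_<id>" to "job_<ts>_<id>" by parsing the two fields. */
    static String fromYarn(String applicationId) {
        String[] parts = applicationId.split("_");
        if (parts.length != 3 || !parts[0].equals("application")) {
            throw new IllegalArgumentException("Not an application id: " + applicationId);
        }
        long clusterTimestamp = Long.parseLong(parts[1]);
        int id = Integer.parseInt(parts[2]);
        return jobId(clusterTimestamp, id);
    }

    public static void main(String[] args) {
        String appId = applicationId(1456900000000L, 7);
        // prints: application_1456900000000_0007 -> job_1456900000000_0007
        System.out.println(appId + " -> " + fromYarn(appId));
    }
}
```

This is why a job and its YARN application can usually be matched by eye: only the prefix differs.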

The client in ResourceMgrDelegate is a YarnClientImpl. The source of createApplication() in the YarnClientImpl class is as follows:

@Override
public YarnClientApplication createApplication() throws YarnException, IOException {
  ApplicationSubmissionContext context = Records.newRecord
      (ApplicationSubmissionContext.class);
  GetNewApplicationResponse newApp = getNewApplication();
  ApplicationId appId = newApp.getApplicationId();
  context.setApplicationId(appId);
  return new YarnClientApplication(newApp, context);
}

The source of getNewApplication() is as follows:

private GetNewApplicationResponse getNewApplication() throws YarnException, IOException {
  GetNewApplicationRequest request =
      Records.newRecord(GetNewApplicationRequest.class);
  // The interface used by clients to obtain a new ApplicationId for submitting new applications.
  // The ResourceManager responds with a new, monotonically increasing ApplicationId,
  // which is used by the client to submit a new application.
  return rmClient.getNewApplication(request);
}
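The comment in the listing says the ResourceManager answers with a new, monotonically increasing ApplicationId. The sketch below shows one simple way such an allocator could work: a fixed cluster timestamp plus an atomic counter. It is an illustration under that assumption, not the ResourceManager's actual code, and AppIdAllocatorSketch is a hypothetical name.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical allocator illustrating "monotonically increasing" ids.
final class AppIdAllocatorSketch {
    // Fixed for the lifetime of this allocator (think: cluster start time).
    private final long clusterTimestamp;
    // Thread-safe sequence number; each request gets the next value.
    private final AtomicInteger counter = new AtomicInteger(0);

    AppIdAllocatorSketch(long clusterTimestamp) {
        this.clusterTimestamp = clusterTimestamp;
    }

    /** Returns the next id in the form application_<clusterTimestamp>_<4-digit sequence>. */
    String getNewApplicationId() {
        int id = counter.incrementAndGet();
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, id);
    }

    public static void main(String[] args) {
        AppIdAllocatorSketch rm = new AppIdAllocatorSketch(1456900000000L);
        System.out.println(rm.getNewApplicationId()); // application_1456900000000_0001
        System.out.println(rm.getNewApplicationId()); // application_1456900000000_0002
    }
}
```

Using an atomic counter means concurrent clients never receive the same id, which matters because each client later submits its application under the id it was given.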

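Summary:

The rough flow for obtaining the JobID is as follows:

1. The job-submitting client, YARNRunner, calls its getNewJobID method, which internally calls ResourceMgrDelegate's getNewJobID.

2. ResourceMgrDelegate calls the createApplication method of its internal member client (actually a YarnClientImpl) to create a YarnClientApplication.

3. The YarnClientApplication is created as follows: ① construct an ApplicationSubmissionContext object, context; ② construct a GetNewApplicationRequest, request; ③ call rmClient.getNewApplication(request) to obtain a GetNewApplicationResponse object, newApp, which contains the ApplicationId allocated by the ResourceManager.

4. Call context's setApplicationId to set the ApplicationId, then set ResourceMgrDelegate's internal member application to context.

5. Set ResourceMgrDelegate's internal member applicationId to context's ApplicationId, then convert it with TypeConverter.fromYarn and return the result as the JobID.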
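The whole delegation chain in the summary can be sketched with plain stand-in classes. Everything here is hypothetical (the Fake* names, the long[] used as an id, the fixed timestamp); the sketch only mirrors who calls whom, not the real Hadoop or YARN types.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for the ResourceManager: hands out increasing sequence numbers. */
final class FakeResourceManager {
    private final long clusterTimestamp = 1456900000000L; // arbitrary example value
    private final AtomicInteger counter = new AtomicInteger();

    long[] getNewApplication() { // {clusterTimestamp, sequence number}
        return new long[] { clusterTimestamp, counter.incrementAndGet() };
    }
}

/** Stand-in for ApplicationSubmissionContext: only carries the application id. */
final class FakeSubmissionContext {
    long[] applicationId;
}

/** Stand-in for YarnClientImpl. */
final class FakeYarnClient {
    private final FakeResourceManager rm;
    FakeYarnClient(FakeResourceManager rm) { this.rm = rm; }

    FakeSubmissionContext createApplication() {
        FakeSubmissionContext context = new FakeSubmissionContext(); // step 3, part 1
        long[] newApp = rm.getNewApplication();                      // step 3, parts 2 and 3
        context.applicationId = newApp;                              // step 4
        return context;
    }
}

/** Stand-in for ResourceMgrDelegate. */
final class FakeResourceMgrDelegate {
    private final FakeYarnClient client;
    FakeSubmissionContext application;
    long[] applicationId;

    FakeResourceMgrDelegate(FakeYarnClient client) { this.client = client; }

    String getNewJobID() {
        application = client.createApplication();  // steps 2 to 4
        applicationId = application.applicationId; // step 5
        // Same two fields, "job_" prefix: the role TypeConverter.fromYarn plays.
        return String.format(Locale.ROOT, "job_%d_%04d", applicationId[0], applicationId[1]);
    }
}

/** Stand-in for the submitting client (step 1). */
final class FakeYarnRunner {
    private final FakeResourceMgrDelegate resMgrDelegate;
    FakeYarnRunner(FakeResourceMgrDelegate d) { this.resMgrDelegate = d; }

    String getNewJobID() { return resMgrDelegate.getNewJobID(); }

    public static void main(String[] args) {
        FakeYarnRunner runner = new FakeYarnRunner(
            new FakeResourceMgrDelegate(new FakeYarnClient(new FakeResourceManager())));
        System.out.println(runner.getNewJobID()); // job_1456900000000_0001
    }
}
```

The point of the layering is that the submitting client never invents an id itself: every JobID originates from a number the ResourceManager handed out.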
