Hadoop Study Notes, Day 4: The MapReduce Submission Process

Posted by shmil on 2024-08-10

Original URL: https://www.cnblogs.com/shmil/p/18352316

The MapReduce Submission Process

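In Xshell, running a script with bash -x (lowercase x) turns on Bash's debug (xtrace) mode: each command is printed, after expansion, just before it is executed, so you can follow the execution step by step.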

Submitting and Running with Hadoop
Execution starts with a Java command: java org.apache.hadoop.util.RunJar hadoop-1.0-SNAPSHOT.jar com.shujia.mr.worcount.WordCount

The main method of the RunJar class runs first

public static void main(String[] args) throws Throwable {
    // args holds the arguments passed on the java command line:
    //   hadoop-1.0-SNAPSHOT.jar com.shujia.mr.worcount.WordCount
    new RunJar().run(args);
}

Next, the run method is called

public void run(String[] args) throws Throwable {
    String usage = "RunJar jarFile [mainClass] args...";
	
    if (args.length < 1) {
      System.err.println(usage);
      System.exit(-1);
    }

    int firstArg = 0;
    // fileName = hadoop-1.0-SNAPSHOT.jar
    String fileName = args[firstArg++];
    File file = new File(fileName);
    if (!file.exists() || !file.isFile()) {
      System.err.println("JAR does not exist or is not a normal file: " +
          file.getCanonicalPath());
      System.exit(-1);
    }
    // mainClassName: the main class, i.e. the specific class in the Hadoop jar to run
    String mainClassName = null;

    JarFile jarFile;
    try {
      jarFile = new JarFile(fileName);
    } catch (IOException io) {
      throw new IOException("Error opening job jar: " + fileName)
        .initCause(io);
    }
    // Read the main class declared in the jar's manifest (not used in this example)
    Manifest manifest = jarFile.getManifest();
    if (manifest != null) {
      mainClassName = manifest.getMainAttributes().getValue("Main-Class");
    }
    jarFile.close();
	
    // No Main-Class is defined in the jar (here it was packaged with Maven),
    // so the class name is taken from the command line instead
    if (mainClassName == null) {
      if (args.length < 2) {
        System.err.println(usage);
        System.exit(-1);
      }
      // firstArg = 1 => args[1] is com.shujia.mr.worcount.WordCount
      mainClassName = args[firstArg++];
    }
    // Convert a path-style name (slashes) into a fully qualified class name (dots)
    mainClassName = mainClassName.replaceAll("/", ".");
    // java.io.tmpdir is the JVM's temporary directory
    File tmpDir = new File(System.getProperty("java.io.tmpdir"));
    ensureDirectory(tmpDir);

   ...
	
    // createClassLoader builds the class loader for the job jar
    ClassLoader loader = createClassLoader(file, workDir);

    // With this loader set as the context class loader, classes inside the given jar can be loaded
    Thread.currentThread().setContextClassLoader(loader);
    // Class.forName builds the Class object for WordCount
    Class<?> mainClass = Class.forName(mainClassName, true, loader);
    // getMethod uses reflection to look up the main method on the WordCount Class object
    Method main = mainClass.getMethod("main", String[].class);
    // Wrap the remaining arguments so they can be passed to that main method
    List<String> newArgsSubList = Arrays.asList(args)
        .subList(firstArg, args.length);
    String[] newArgs = newArgsSubList
        .toArray(new String[newArgsSubList.size()]);
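    // invoke runs the method represented by this Method object (null target, since main is static)
    // new Object[] {newArgs} passes newArgs as main's single String[] parameter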
    try {
      main.invoke(null, new Object[] {newArgs});
    } catch (InvocationTargetException e) {
      throw e.getTargetException();
    }
  }
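
The reflective launch at the end of run can be tried in isolation. The following is a minimal sketch, not the RunJar code itself: the class ReflectiveMainDemo and its nested Target class are hypothetical stand-ins (Target plays the role of WordCount), and the class is loaded with the demo's own class loader rather than one built from a jar.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;

class ReflectiveMainDemo {

    /** Hypothetical stand-in for a user job class such as WordCount. */
    public static class Target {
        public static String[] received;

        public static void main(String[] args) {
            received = args;
            System.out.println("Target.main got " + Arrays.toString(args));
        }
    }

    /**
     * Treats args[0] as the main class name and the rest as that class's
     * arguments, then calls its static main method through reflection.
     */
    public static void launch(String[] args) throws Throwable {
        int firstArg = 0;
        String mainClassName = args[firstArg++].replaceAll("/", ".");

        // Assumption: the demo's own loader is enough here; RunJar builds one from the jar.
        ClassLoader loader = ReflectiveMainDemo.class.getClassLoader();
        Class<?> mainClass = Class.forName(mainClassName, true, loader);
        Method main = mainClass.getMethod("main", String[].class);

        // Everything after the class name is handed on to the target's main.
        String[] newArgs = Arrays.copyOfRange(args, firstArg, args.length);
        try {
            // The Object[] wrapper makes newArgs a single String[] parameter.
            main.invoke(null, new Object[] {newArgs});
        } catch (InvocationTargetException e) {
            // Surface the exception thrown by the target, not the reflection wrapper.
            throw e.getTargetException();
        }
    }

    public static void main(String[] args) throws Throwable {
        launch(new String[] {Target.class.getName(), "/input", "/output"});
    }
}
```

Running the demo prints "Target.main got [/input, /output]", which shows that the class name is consumed by the launcher and only the remaining arguments reach the job's main method.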

Next, the main method of the user-defined class runs => it loads the configuration
```
job.waitForCompletion(true); // submit the job and start running it
```

Stepping into the waitForCompletion method

public boolean waitForCompletion(boolean verbose
                                   ) throws IOException, InterruptedException,
                                            ClassNotFoundException {
    if (state == JobState.DEFINE) {
      submit(); // submit the job; on a Hadoop cluster it is handed to YARN for execution
      //  mapreduce.JobSubmitter: Submitting tokens for job: job_1716520379305_0009
    }
    return isSuccessful();
  }

The submit method

public void submit() 
         throws IOException, InterruptedException, ClassNotFoundException {
    ensureState(JobState.DEFINE);
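    setUseNewAPI(); // select the new API => Hadoop still carries some old APIs, so this has to be set
    connect();  // create the cluster object, which holds the cluster connection info => cluster acts as the client for the YARN cluster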
    final JobSubmitter submitter = 
        getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
    status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
      public JobStatus run() throws IOException, InterruptedException, 
      ClassNotFoundException {
          // The actual job submission starts here
        return submitter.submitJobInternal(Job.this, cluster);
      }
    });
    state = JobState.RUNNING;
    LOG.info("The url to track the job: " + getTrackingURL());
   }
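
The interplay between waitForCompletion and submit comes down to a small state check. The toy model below is only a sketch of that idea and does not use Hadoop classes: JobStateSketch, its submissions counter, and the accessor methods are hypothetical names, and the real submission work is replaced by a counter increment.

```java
class JobStateSketch {

    enum JobState { DEFINE, RUNNING }

    private JobState state = JobState.DEFINE;
    private int submissions = 0;

    /** Fails fast when the job is not in the expected state. */
    private void ensureState(JobState expected) {
        if (state != expected) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + expected);
        }
    }

    /** Only legal while the job is still being defined. */
    public void submit() {
        ensureState(JobState.DEFINE);
        submissions++;              // placeholder for the real submission work
        state = JobState.RUNNING;
    }

    /** Submits on first use; later calls skip submission. */
    public boolean waitForCompletion() {
        if (state == JobState.DEFINE) {
            submit();
        }
        return true;                // placeholder for the real success check
    }

    public JobState getState() { return state; }

    public int getSubmissions() { return submissions; }

    public static void main(String[] args) {
        JobStateSketch job = new JobStateSketch();
        job.waitForCompletion();
        job.waitForCompletion();    // no second submission
        System.out.println(job.getState() + " after " + job.getSubmissions() + " submission(s)");
    }
}
```

In this model, calling submit a second time throws IllegalStateException, which is the reason waitForCompletion checks for the DEFINE state before submitting.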

Notes:
- LocalJobRunner runs the job locally
- YARNRunner submits the job to YARN for execution
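
As far as I know, which of the two runners is used depends on the mapreduce.framework.name configuration property, with local as the default. The post itself does not show this, so treat the following as an assumption-laden sketch: RunnerSelectionSketch and selectRunner are hypothetical names, a plain Map stands in for Hadoop's Configuration, and the method returns the runner's name instead of creating it.

```java
import java.util.HashMap;
import java.util.Map;

class RunnerSelectionSketch {

    /** Maps the configured framework name to the runner that would handle the job. */
    public static String selectRunner(Map<String, String> conf) {
        // Assumption: "local" is the default when the property is not set.
        String framework = conf.getOrDefault("mapreduce.framework.name", "local");
        switch (framework) {
            case "local":
                return "LocalJobRunner";
            case "yarn":
                return "YARNRunner";
            default:
                throw new IllegalArgumentException(
                    "Unsupported mapreduce.framework.name: " + framework);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(selectRunner(conf));   // property not set

        conf.put("mapreduce.framework.name", "yarn");
        System.out.println(selectRunner(conf));   // cluster configuration
    }
}
```

This matches the usual experience that a job started from an IDE without cluster configuration runs locally, while the same jar started on a configured cluster goes to YARN.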
