[Source Code Analysis] Oozie Inside and Out: Internal Execution

Published by 羅西的思考 on 2020-07-08

Original URL: https://www.cnblogs.com/rossiXYZ/p/13269263.html


0x00 Abstract

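Oozie is an open-source framework built on a workflow engine and contributed to Apache by Cloudera. It is an open-source workflow scheduling engine for the Hadoop platform, used to manage Hadoop jobs. This article is the second in the series and covers Oozie's internal execution phase.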

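The previous article, [Source Code Analysis] Oozie Inside and Out --- (1) Job Submission Phase, showed what happens after a user submits an Oozie job. This article follows the execution of a Workflow to examine what Oozie does next.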

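The outline is roughly as follows:

How Oozie prepares the Yarn Application Master
The differences between the old and new Yarn Application Master implementations
Hive on Yarn
How Tez enters this flow
How Java on Yarn runs
How control returns to Oozie after the Yarn job ends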

0x01 The Oozie phase

1.1 ActionStartXCommand

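We assume that, after start, the Workflow enters a Hive action.

The main job of ActionStartXCommand is to interact with Yarn and, ultimately, to submit a Yarn Application Master.

ActionStartXCommand extends ActionXCommand, which is in turn a subclass of WorkflowXCommand. The key functions are again loadState and execute.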

public class ActionStartXCommand extends ActionXCommand<org.apache.oozie.command.wf.ActionXCommand.ActionExecutorContext> {
    private String jobId = null;
    protected String actionId = null;
    protected WorkflowJobBean wfJob = null;
    protected WorkflowActionBean wfAction = null;
    private JPAService jpaService = null;
    private ActionExecutor executor = null;
    private List<UpdateEntry> updateList = new ArrayList<UpdateEntry>();
    private List<JsonBean> insertList = new ArrayList<JsonBean>();
    protected ActionExecutorContext context = null;  
}

loadState fetches the WorkflowJobBean and WorkflowActionBean from the database (the excerpt is abridged; its catch clause is omitted):

protected void loadState() throws CommandException {
    try {
        jpaService = Services.get().get(JPAService.class);
        if (jpaService != null) {
            if (wfJob == null) {
                this.wfJob = WorkflowJobQueryExecutor.getInstance().get(WorkflowJobQuery.GET_WORKFLOW, jobId);
            }
            this.wfAction = WorkflowActionQueryExecutor.getInstance().get(WorkflowActionQuery.GET_ACTION, actionId);
        }
    }
}

The execute function is shown below. Its core is executor.start(context, wfAction); here the executor is a HiveActionExecutor.

@Override
protected ActionExecutorContext execute() throws CommandException {
    Configuration conf = wfJob.getWorkflowInstance().getConf();
    try {
        if(!caught) {
            // this is the core step: start the task
            executor.start(context, wfAction);
          
            if (wfAction.isExecutionComplete()) {
                if (!context.isExecuted()) {
                    failJob(context);
                } else {
                    wfAction.setPending();
                    if (!(executor instanceof ControlNodeActionExecutor)) {
                        queue(new ActionEndXCommand(wfAction.getId(), wfAction.getType()));
                    }
                    else {
                        execSynchronous = true;
                    }
                }
            }
            updateList.add(new UpdateEntry<WorkflowActionQuery>(WorkflowActionQuery.UPDATE_ACTION_START, wfAction));
        }
    }
    finally {
            BatchQueryExecutor.getInstance().executeBatchInsertUpdateDelete(insertList, updateList, null);
            ......
            if (execSynchronous) {
                // Changing to synchronous call from asynchronous queuing to prevent
                // undue delay from ::start:: to action due to queuing
                callActionEnd();
            }
        }
    }
    return null;
}

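ActionExecutor.start is asynchronous, so the state of the action still has to be checked to move the workflow forward. Oozie checks whether a task has finished in two ways:

Callback: when a task or computation is started, Oozie gives it a callback URL; when the task finishes, it invokes that URL to notify Oozie.
Polling: if the callback fails for any reason, Oozie can still query the task's status by polling.

Oozie uses these two mechanisms to track tasks. We will come back to them later.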
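
The interplay of the two mechanisms can be sketched in plain Java. This is only an illustration under stated assumptions: CompletionWatcher, watch, and demo are hypothetical names rather than Oozie classes, and the "callback" is modelled as someone completing a CompletableFuture. Whichever path observes the result first wins, and polling stops once the result is known.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical illustration: none of these names are Oozie classes.
class CompletionWatcher {

    // Returns a future that is completed by whichever comes first:
    // an explicit callback (someone calls complete on it) or a periodic poll.
    static CompletableFuture<String> watch(Supplier<String> pollStatus,
                                           ScheduledExecutorService scheduler,
                                           long pollIntervalMs) {
        CompletableFuture<String> done = new CompletableFuture<>();
        ScheduledFuture<?> poller = scheduler.scheduleAtFixedRate(() -> {
            String status = pollStatus.get();   // null means "still running"
            if (status != null) {
                done.complete(status);          // no-op if the callback already won
            }
        }, pollIntervalMs, pollIntervalMs, TimeUnit.MILLISECONDS);
        // Stop polling once the outcome is known, however it was learned.
        done.whenComplete((status, error) -> poller.cancel(false));
        return done;
    }

    // Runs one scenario and reports the status the watcher ended up with.
    static String demo(boolean callbackArrives) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // The poll source only reports a result when the callback was lost.
            CompletableFuture<String> done =
                    watch(() -> callbackArrives ? null : "FAILED", scheduler, 20);
            if (callbackArrives) {
                done.complete("SUCCEEDED");     // what a callback handler would do
            }
            return done.get(2, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(true));   // callback path
        System.out.println(demo(false));  // polling fallback path
    }
}
```

Keeping polling as a fallback means a lost HTTP callback delays the workflow by at most one polling interval instead of stalling it.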

1.2 HiveActionExecutor

In the code above, executor.start(context, wfAction); is what starts the task.

HiveActionExecutor extends ScriptLanguageActionExecutor, which in turn extends JavaActionExecutor, so many of the functions executed later are actually those of JavaActionExecutor.

public class HiveActionExecutor extends ScriptLanguageActionExecutor {}

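ActionExecutor.start therefore resolves to JavaActionExecutor.start().

It checks the file system (for example, whether HDFS is supported) and whether the action directory is ready, and then calls submitLauncher.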

public void start(Context context, WorkflowAction action) throws ActionExecutorException {
        FileSystem actionFs = context.getAppFileSystem();
        prepareActionDir(actionFs, context);
        submitLauncher(actionFs, context, action); // this is the core step
        check(context, action);
}

The main steps of submitLauncher are:

1) For certain job types, call injectActionCallback to configure the action callback
2) Configure the action job
3) Call createLauncherConf to configure LauncherAM, i.e. the Application Master
- 3.1) Configure the callback: conf.set(LauncherAMCallbackNotifier.OOZIE_LAUNCHER_CALLBACK_URL, callback);
- 3.2) Set the "launcher Main Class": LauncherHelper.setupMainClass(launcherJobConf, getLauncherMain(launcherJobConf, actionXml));
4) Call HadoopAccessorService.createYarnClient to create a YarnClient
5) Use UserGroupInformation to continue the configuration
6) Call yarnClient.createApplication to create a YarnClientApplication
7) Record the ApplicationId
8) Call createAppSubmissionContext to build the execution environment of the Yarn application
- 8.1) appContext.setApplicationType("Oozie Launcher");
- 8.2) Set the container information, ContainerLaunchContext
- 8.3) vargs.add(LauncherAM.class.getCanonicalName()); which sets the AM launch class
- 8.4) return appContext;
9) Submit the application with yarnClient.submitApplication(appContext); appContext is the one returned in the previous step.

The code is as follows:

public void submitLauncher(final FileSystem actionFs, final Context context, final WorkflowAction action)throws ActionExecutorException {
    YarnClient yarnClient = null;
    try {
        // action job configuration
        Configuration actionConf = loadHadoopDefaultResources(context, actionXml);
        setupActionConf(actionConf, context, actionXml, appPathRoot);
        addAppNameContext(context, action);
        setLibFilesArchives(context, actionXml, appPathRoot, actionConf);
        // configure the action callback
        injectActionCallback(context, actionConf);

        Configuration launcherConf = createLauncherConf(actionFs, context, action, actionXml, actionConf);
        yarnClient = createYarnClient(context, launcherConf);
      
        // continue configuring the various Credentials
        if (UserGroupInformation.isSecurityEnabled()) {
           ......
        }

        if (alreadyRunning && !isUserRetry) {
          ......
        }
        else {
            YarnClientApplication newApp = yarnClient.createApplication();
            ApplicationId appId = newApp.getNewApplicationResponse().getApplicationId();
            ApplicationSubmissionContext appContext =
                    createAppSubmissionContext(appId, launcherConf, context, actionConf, action, credentials, actionXml);
            // this is where Oozie actually interacts with Yarn
            yarnClient.submitApplication(appContext);

            launcherId = appId.toString();
            ApplicationReport appReport = yarnClient.getApplicationReport(appId);
            consoleUrl = appReport.getTrackingUrl();
        }

        String jobTracker = launcherConf.get(HADOOP_YARN_RM);
        context.setStartData(launcherId, jobTracker, consoleUrl);
    }
}

protected YarnClient createYarnClient(Context context, Configuration jobConf) throws HadoopAccessorException {
        String user = context.getWorkflow().getUser();
        return Services.get().get(HadoopAccessorService.class).createYarnClient(user, jobConf);
}

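0x02 The old version: LauncherMapper

It is worth mentioning the implementation used in older versions: LauncherMapper.

Many online articles about Oozie are based on older versions, so they almost always mention LauncherMapper. For example:

Oozie is essentially a job coordination tool (under the hood it converts XML into a MapReduce program, but does all the work on the map side to avoid the shuffle phase).

When Oozie executes an action, it goes through an ActionExecutor (the most important subclass is JavaActionExecutor; the hive, spark, and other actions are subclasses of it). JavaActionExecutor first submits a LauncherMapper (a map task) to Yarn, which runs a LauncherMain (each concrete action has its own subclass, such as JavaMain or SparkMain). A spark task runs SparkMain, which calls org.apache.spark.deploy.SparkSubmit to submit the job. In effect, the map task identifies what kind of task you have (hive, shell, spark, and so on), sets up the environment that task needs, and submits it. It provides the entry point for submitting the task (for a hive task, for instance, it starts the hive client or beeline).

According to the documentation, LauncherMapper was removed in OOZIE-2918 Delete LauncherMapper and its test (asasvari via pbacsko).

Let us take a brief look at how LauncherMapper was implemented in the old code.

LauncherMapper implements org.apache.hadoop.mapred.Mapper and its map function, which internally calls the main function of the user code.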

import org.apache.hadoop.mapred.Mapper;

public class LauncherMapper<K1, V1, K2, V2> implements Mapper<K1, V1, K2, V2>, Runnable {
   @Override
    public void map(K1 key, V1 value, OutputCollector<K2, V2> collector, Reporter reporter) throws IOException {
        SecurityManager initialSecurityManager = System.getSecurityManager();
        try {
            else {
                String mainClass = getJobConf().get(CONF_OOZIE_ACTION_MAIN_CLASS);

                    new LauncherSecurityManager();
                    setupHeartBeater(reporter);
                    setupMainConfiguration();
                    // Propagating the conf to use by child job.
                    propagateToHadoopConf();

                    executePrepare();
                    Class klass = getJobConf().getClass(CONF_OOZIE_ACTION_MAIN_CLASS, Object.class);
                    Method mainMethod = klass.getMethod("main", String[].class);
                    mainMethod.invoke(null, (Object) args);
             }
        }
    }
}
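
The reflective call at the heart of map (and, later, of LauncherAM.runActionMain and JavaMain) can be tried in isolation. The following is a minimal sketch, assuming a stand-in target class; ReflectiveLauncher, SampleMain, launch, and demo are hypothetical names and are not part of Oozie.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical illustration of launching a configured main class by name.
class ReflectiveLauncher {

    // Stand-in for a user-supplied action class; not part of Oozie.
    public static class SampleMain {
        static String lastArgs;

        public static void main(String[] args) {
            lastArgs = String.join(",", args);
        }
    }

    // Looks up "public static void main(String[])" on the named class and calls it.
    static void launch(String mainClassName, String[] args) throws Exception {
        Class<?> klass = Class.forName(mainClassName);
        Method mainMethod = klass.getMethod("main", String[].class);
        try {
            // The cast to Object keeps the array from being spread as varargs.
            mainMethod.invoke(null, (Object) args);
        } catch (InvocationTargetException e) {
            // Surface the exception thrown by the user code, not the reflection wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    // Launches the stand-in class and returns what it saw as arguments.
    static String demo(String... args) {
        try {
            launch(SampleMain.class.getName(), args);
            return SampleMain.lastArgs;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("--input", "/tmp/in"));
    }
}
```

Because the target is resolved by name at run time, the launcher needs no compile-time dependency on the action it runs; only the class name has to travel in the job configuration.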

LauncherMapperHelper sets LauncherMapper as the mapper class of the launcher job.

public static void setupLauncherInfo(JobConf launcherConf, String jobId, String actionId, Path actionDir, String recoveryId, Configuration actionConf, String prepareXML) throws IOException, HadoopAccessorException {
        launcherConf.setMapperClass(LauncherMapper.class);
}

JavaActionExecutor uses org.apache.hadoop.mapred.JobClient:

import org.apache.hadoop.mapred.JobClient;

public void submitLauncher(FileSystem actionFs, Context context, WorkflowAction action) throws ActionExecutorException {
            jobClient = createJobClient(context, launcherJobConf);
            LauncherMapperHelper.setupLauncherInfo(launcherJobConf, jobId, actionId, actionDir, recoveryId, actionConf, prepareXML);

            // Set the launcher Main Class
            LauncherMapperHelper.setupMainClass(launcherJobConf, getLauncherMain(launcherJobConf, actionXml)); 
            LauncherMapperHelper.setupMainArguments(launcherJobConf, args);
            ......
  
            runningJob = jobClient.submitJob(launcherJobConf);  // the submission happens here
}

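In summary, the old LauncherMapper implements org.apache.hadoop.mapred.Mapper, and org.apache.hadoop.mapred.JobClient is responsible for interacting with Hadoop.

0x03 The new version: Yarn Application Master

Newer versions of Oozie are tightly bound to Yarn, so we first need to introduce Yarn.

3.1 Introduction to YARN

YARN is the resource management system in Hadoop 2.0. Its basic design idea is to split the JobTracker of MRv1 into two independent services: a global ResourceManager and a per-application ApplicationMaster. The ResourceManager is responsible for resource management and allocation across the whole system, while the ApplicationMaster manages a single application.

Overall, YARN still has a master/slave structure. Within the resource management framework, the ResourceManager is the master and the NodeManagers are the slaves; the ResourceManager manages and schedules the resources on all NodeManagers in a unified way.

When a user submits an application, they must provide an ApplicationMaster to track and manage it. The ApplicationMaster requests resources from the ResourceManager and asks NodeManagers to start tasks that occupy a given amount of resources. Because different ApplicationMasters are distributed across different nodes, they do not affect one another.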

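3.2 ApplicationMaster

Every application a user submits includes one AM, whose main functions are:

Negotiating with the RM scheduler to obtain resources (represented as Containers);
Further allocating the obtained resources to its internal tasks;
Communicating with NMs to start and stop tasks;
Monitoring the state of all tasks, and requesting resources again to restart a task when it fails.

After a user submits an application to YARN, YARN runs it in two phases:

The first phase starts the ApplicationMaster;
In the second phase, the ApplicationMaster creates the application, requests resources for it, and monitors its entire run until it completes.

The workflow consists of the following steps:

1) The user submits an application to YARN, including the ApplicationMaster program, the command that starts the ApplicationMaster, the user program, and so on.
2) The ResourceManager allocates the first Container for the application and communicates with the corresponding NodeManager, asking it to start the application's ApplicationMaster in that Container.
3) The ApplicationMaster first registers with the ResourceManager, so that the user can view the application's status directly through the ResourceManager. It then requests resources for each task and monitors the tasks until they finish, that is, it repeats steps 4 to 7.
4) The ApplicationMaster requests and receives resources from the ResourceManager by polling over an RPC protocol.
5) Once the ApplicationMaster has obtained resources, it communicates with the corresponding NodeManager and asks it to start the task.
6) The NodeManager sets up the task's runtime environment (environment variables, JAR files, binaries, and so on), writes the task launch command into a script, and starts the task by running that script.
7) Each task reports its status and progress to the ApplicationMaster over an RPC protocol, so that the ApplicationMaster always knows the state of every task and can restart a task when it fails. While the application is running, the user can query the ApplicationMaster over RPC at any time for the application's current status.
8) When the application finishes, the ApplicationMaster unregisters from the ResourceManager and shuts itself down.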

3.3 LauncherAM

LauncherAM is Oozie's ApplicationMaster implementation. LauncherAM.main is the entry point that Yarn invokes.

public class LauncherAM {
  
    public static void main(String[] args) throws Exception {
        final LocalFsOperations localFsOperations = new LocalFsOperations();
        final Configuration launcherConf = readLauncherConfiguration(localFsOperations);
        UserGroupInformation.setConfiguration(launcherConf);
        // MRAppMaster adds this call as well, but it's included only in Hadoop 2.9+
        // SecurityUtil.setConfiguration(launcherConf);
        UserGroupInformation ugi = getUserGroupInformation(launcherConf);
        // Executing code inside a doAs with an ugi equipped with correct tokens.
        ugi.doAs(new PrivilegedExceptionAction<Object>() {
            @Override
            public Object run() throws Exception {
                  LauncherAM launcher = new LauncherAM(new AMRMClientAsyncFactory(),
                        new AMRMCallBackHandler(),
                        new HdfsOperations(new SequenceFileWriterFactory()),
                        new LocalFsOperations(),
                        new PrepareActionsHandler(new LauncherURIHandlerFactory(null)),
                        new LauncherAMCallbackNotifierFactory(),
                        new LauncherSecurityManager(),
                        sysenv.getenv(ApplicationConstants.Environment.CONTAINER_ID.name()),
                        launcherConf);
                    launcher.run();
                    return null;
            }
        });
    }  
}

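launcher.run mainly does the following:

Registers with the Resource Manager through registerWithRM, which uses AMRMClientAsync
Performs initialization, preparation, and configuration through executePrepare / setupMainConfiguration
Calls runActionMain, which invokes the configured main function, for example HiveMain
- Class<?> klass = launcherConf.getClass(CONF_OOZIE_ACTION_MAIN_CLASS, null);
- Method mainMethod = klass.getMethod("main", String[].class);
- mainMethod.invoke(null, (Object) mainArgs);
Calls uploadActionDataToHDFS to write the action data to HDFS
Calls unregisterWithRM to unregister from the RM
Calls LauncherAMCallbackNotifier.notifyURL to notify Oozie

The code is as follows: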

public void run() throws Exception {
    try {
        actionDir = new Path(launcherConf.get(OOZIE_ACTION_DIR_PATH));
        registerWithRM(amrmCallBackHandler);
        // Run user code without the AM_RM_TOKEN so users can't request containers
        UserGroupInformation ugi = getUserGroupInformation(launcherConf, AMRMTokenIdentifier.KIND_NAME);

        ugi.doAs(new PrivilegedExceptionAction<Object>() {
            @Override
            public Object run() throws Exception {
                executePrepare(errorHolder);
                setupMainConfiguration();
                runActionMain(errorHolder); // invokes the configured main function, for example HiveMain
                return null;
            }
        });
    } 
    finally {
        try {
            actionData.put(ACTION_DATA_FINAL_STATUS, actionResult.toString());
            hdfsOperations.uploadActionDataToHDFS(launcherConf, actionDir, actionData);
        } finally {
            try {
                unregisterWithRM(actionResult, errorHolder.getErrorMessage());
            } finally {
                LauncherAMCallbackNotifier cn = callbackNotifierFactory.createCallbackNotifier(launcherConf);
                cn.notifyURL(actionResult);
            }
        }
    }
}
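
The nested finally blocks in run are what guarantee that the three cleanup steps always execute, in order, even when an earlier step throws. The sketch below reproduces only that control-flow shape; CleanupOrder and its step labels are hypothetical and no Hadoop or Oozie API is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the nested try/finally shape; not Oozie code.
class CleanupOrder {

    // Records which steps ran, optionally failing the main step or the upload step.
    static List<String> run(boolean mainFails, boolean uploadFails) {
        List<String> log = new ArrayList<>();
        try {
            try {
                log.add("runActionMain");
                if (mainFails) {
                    throw new IllegalStateException("action failed");
                }
            } finally {
                try {
                    log.add("uploadActionData");
                    if (uploadFails) {
                        throw new IllegalStateException("upload failed");
                    }
                } finally {
                    try {
                        log.add("unregisterWithRM");
                    } finally {
                        log.add("notifyURL");   // innermost finally: always the last step
                    }
                }
            }
        } catch (IllegalStateException e) {
            // Only the most recent exception survives nested finally blocks.
            log.add("error: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false, false));
        System.out.println(run(true, true));
    }
}
```

One consequence of this shape is that an exception thrown inside a finally block replaces the one that was already propagating, so a failed upload would mask the original action failure unless the error is recorded separately, which is presumably one reason the real code carries an errorHolder.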

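You will notice, however, that compared with what an ApplicationMaster is supposed to do, as described above, LauncherAM does rather little. This is a puzzle that we will resolve later in this article.

0x04 Hive on Yarn

As mentioned above, runActionMain invokes the configured main function. Assuming a hive action, that is HiveMain.

The entry class of a Hive job is configured through HIVE_MAIN_CLASS_NAME.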

public class HiveActionExecutor extends ScriptLanguageActionExecutor {
    private static final String HIVE_MAIN_CLASS_NAME = "org.apache.oozie.action.hadoop.HiveMain";

    @Override
    public List<Class<?>> getLauncherClasses() {
        List<Class<?>> classes = new ArrayList<Class<?>>();
        classes.add(Class.forName(HIVE_MAIN_CLASS_NAME)); // HiveMain is configured here
        return classes;
    }  
}

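HiveMain's subsequent call chain is:

HiveMain.main ----> run ----> runHive ----> CliDriver.main(args);

In the end it calls org.apache.hadoop.hive.cli.CliDriver to carry out the Hive operation. Roughly:

Set the parameters;
If there is a script, set the script path;
If there are earlier Yarn child jobs, kill them;
Run Hive;
Write the log;

In detail: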

public class HiveMain extends LauncherMain {
    public static void main(String[] args) throws Exception {
        run(HiveMain.class, args);
    }
  
   @Override
    protected void run(String[] args) throws Exception {
        Configuration hiveConf = setUpHiveSite();
        List<String> arguments = new ArrayList<String>();

        String logFile = setUpHiveLog4J(hiveConf);
        arguments.add("--hiveconf");
        arguments.add("hive.log4j.file=" + new File(HIVE_L4J_PROPS).getAbsolutePath());
        arguments.add("--hiveconf");
        arguments.add("hive.exec.log4j.file=" + new File(HIVE_EXEC_L4J_PROPS).getAbsolutePath());

        //setting oozie workflow id as caller context id for hive
        String callerId = "oozie:" + System.getProperty(LauncherAM.OOZIE_JOB_ID);
        arguments.add("--hiveconf");
        arguments.add("hive.log.trace.id=" + callerId);

        String scriptPath = hiveConf.get(HiveActionExecutor.HIVE_SCRIPT);
        String query = hiveConf.get(HiveActionExecutor.HIVE_QUERY);
        if (scriptPath != null) {
            ......
            // print out current directory & its contents
            File localDir = new File("dummy").getAbsoluteFile().getParentFile();
            String[] files = localDir.list();

            // Prepare the Hive Script
            String script = readStringFromFile(scriptPath);
            arguments.add("-f");
            arguments.add(scriptPath);
        } else if (query != null) {
            String filename = createScriptFile(query);
            arguments.add("-f");
            arguments.add(filename);
        } 

        // Pass any parameters to Hive via arguments
        ......
        String[] hiveArgs = ActionUtils.getStrings(hiveConf, HiveActionExecutor.HIVE_ARGS);
        for (String hiveArg : hiveArgs) {
            arguments.add(hiveArg);
        }
        LauncherMain.killChildYarnJobs(hiveConf);

        try {
            runHive(arguments.toArray(new String[arguments.size()]));
        }
        finally {
            writeExternalChildIDs(logFile, HIVE_JOB_IDS_PATTERNS, "Hive");
        }
    }  
}

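So we can see that once Yarn has started the Oozie ApplicationMaster, it simply sends commands to Hive through org.apache.hadoop.hive.cli.CliDriver. There is no further interaction with the ResourceManager or the NodeManagers, which seems odd. Tez, described next, explains why.

0x05 The Tez computing framework

Tez is an Apache open-source computing framework that supports DAG jobs. It is derived directly from the MapReduce framework. Its core idea is to split the Map and Reduce operations further: Map is split into Input, Processor, Sort, Merge, and Output, and Reduce into Input, Shuffle, Sort, Merge, Processor, and Output. These elementary operations can then be combined flexibly to produce new operations, which, once assembled by control programs, form one large DAG job.

Tez has the following characteristics:

An Apache open-source project (a top-level project since 2014)
Runs on top of YARN
Suited to DAG (directed acyclic graph) applications (like Impala, Dremel, and Drill, it can be used as a replacement for Hive/Pig and similar tools)

As we can see, Tez is also tightly bound to Yarn.

5.1 DAGAppMaster

First, we find the Application Master that belongs to Tez: the Tez DAG Application Master.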

public class DAGAppMaster extends AbstractService {
  public String submitDAGToAppMaster(DAGPlan dagPlan,
      Map<String, LocalResource> additionalResources) throws TezException {
      startDAG(dagPlan, additionalResources);
    }
  }  
}

We can also see the code that submits the Application Master:

public class TezYarnClient extends FrameworkClient {
  @Override
  public ApplicationId submitApplication(ApplicationSubmissionContext appSubmissionContext)
      throws YarnException, IOException, TezException {
    ApplicationId appId = yarnClient.submitApplication(appSubmissionContext);
    ApplicationReport appReport = getApplicationReport(appId);
    return appId;
  }
}

Here is the code that creates the Application Master context; it sets the Application Master class and the Container.

  public static ApplicationSubmissionContext createApplicationSubmissionContext(
      ApplicationId appId, DAG dag, String amName,
      AMConfiguration amConfig, Map<String, LocalResource> tezJarResources,
      Credentials sessionCreds, boolean tezLrsAsArchive,
      TezApiVersionInfo apiVersionInfo,
      ServicePluginsDescriptor servicePluginsDescriptor, JavaOptsChecker javaOptsChecker)
      throws IOException, YarnException {

    // Setup the command to run the AM
    List<String> vargs = new ArrayList<String>(8);
    vargs.add(Environment.JAVA_HOME.$() + "/bin/java");

    String amOpts = constructAMLaunchOpts(amConfig.getTezConfiguration(), capability);
    vargs.add(amOpts);

    // The Application Master class is set here
    vargs.add(TezConstants.TEZ_APPLICATION_MASTER_CLASS);

    // The command-line arguments are merged here
    Vector<String> vargsFinal = new Vector<String>(8);
    // Final command
    StringBuilder mergedCommand = new StringBuilder();
    for (CharSequence str : vargs) {
      mergedCommand.append(str).append(" ");
    }
    vargsFinal.add(mergedCommand.toString());

    // The container is set up here
    // Setup ContainerLaunchContext for AM container
    ContainerLaunchContext amContainer =
        ContainerLaunchContext.newInstance(amLocalResources, environment,
            vargsFinal, serviceData, securityTokens, acls);

    // Set up the ApplicationSubmissionContext
    ApplicationSubmissionContext appContext = Records
        .newRecord(ApplicationSubmissionContext.class);

    appContext.setAMContainerSpec(amContainer);

    return appContext;
}
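
The command-assembly part of the excerpt above boils down to collecting launch arguments and merging them into one command string for the AM container. A minimal sketch of that step follows; AmCommandBuilder is a hypothetical helper, and the "$JAVA_HOME" placeholder, the JVM option, and the class name com.example.MyAppMaster are made-up values rather than anything Tez or YARN defines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Tez, Oozie, or YARN.
class AmCommandBuilder {

    // Joins the launch arguments into the single shell command string that
    // a container launch context would carry.
    static String build(String javaHome, String jvmOpts, String amClassName) {
        List<String> vargs = new ArrayList<>();
        vargs.add(javaHome + "/bin/java");
        if (jvmOpts != null && !jvmOpts.isEmpty()) {
            vargs.add(jvmOpts);
        }
        vargs.add(amClassName);   // the class whose main() the container runs
        return String.join(" ", vargs);
    }

    public static void main(String[] args) {
        System.out.println(build("$JAVA_HOME", "-Xmx512m", "com.example.MyAppMaster"));
    }
}
```

The point of the design is that YARN itself knows nothing about the framework being launched: the AM is just a command line executed inside the first container.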

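5.2 Interacting with the Resource Manager

Only part of the code is excerpted here, but it shows that Tez implements the interaction with the Yarn Resource Manager.

YarnTaskSchedulerService implements AMRMClientAsync.CallbackHandler, whose role is to handle messages received from the Resource Manager. It implements the following methods: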

import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class YarnTaskSchedulerService extends TaskScheduler
                             implements AMRMClientAsync.CallbackHandler {
  @Override
  public void onContainersAllocated(List<Container> containers) {
      if (!shouldReuseContainers) {
        List<Container> modifiableContainerList = Lists.newLinkedList(containers);
        assignedContainers = assignNewlyAllocatedContainers(
            modifiableContainerList);
      } 
    }
    // upcall to app must be outside locks
    informAppAboutAssignments(assignedContainers);
  }

  @Override
  public void onContainersCompleted(List<ContainerStatus> statuses) {
    synchronized (this) {
      for(ContainerStatus containerStatus : statuses) {
        ContainerId completedId = containerStatus.getContainerId();
        HeldContainer delayedContainer = heldContainers.get(completedId);

        Object task = releasedContainers.remove(completedId);
        appContainerStatus.put(task, containerStatus);
        continue;
       }

        // not found in released containers. check currently allocated containers
        // no need to release this container as the RM has already completed it
        task = unAssignContainer(completedId, false);
        if (delayedContainer != null) {
          heldContainers.remove(completedId);
          Resources.subtract(allocatedResources, delayedContainer.getContainer().getResource());
        } 
        if(task != null) {
          // completion of a container we have allocated currently
          // an allocated container completed. notify app. This will cause attempt to get killed
          appContainerStatus.put(task, containerStatus);
          continue;
        }
      }
    }

    // upcall to app must be outside locks
    for (Entry<Object, ContainerStatus> entry : appContainerStatus.entrySet()) {
      getContext().containerCompleted(entry.getKey(), entry.getValue());
    }
  }
}

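onContainersAllocated: called when new Containers become available. This is where the code that launches containers is triggered.
onContainersCompleted: called when Containers finish running. For failed Containers, new Containers have to be requested and started; successful ones only need to be recorded.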
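
The bookkeeping behind these two callbacks can be modelled without any YARN dependency. The toy model below is a sketch under stated assumptions: ContainerTracker and its methods are hypothetical names, containers and tasks are plain strings, and an exit status of 0 is taken to mean success.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an AM-side scheduler; the names are hypothetical, not Tez or YARN APIs.
class ContainerTracker {
    private final Deque<String> pendingTasks = new ArrayDeque<>();
    private final Map<String, String> taskByContainer = new HashMap<>();
    private final List<String> finishedTasks = new ArrayList<>();

    void submit(String task) {
        pendingTasks.add(task);
    }

    // Analogous to onContainersAllocated: bind each granted container to a waiting task.
    void onContainersAllocated(List<String> containerIds) {
        for (String containerId : containerIds) {
            String task = pendingTasks.poll();
            if (task == null) {
                break;                      // nothing left to run; extra containers go unused
            }
            taskByContainer.put(containerId, task);
        }
    }

    // Analogous to onContainersCompleted: record successes, re-queue failures.
    void onContainerCompleted(String containerId, int exitStatus) {
        String task = taskByContainer.remove(containerId);
        if (task == null) {
            return;                         // unknown or already released container
        }
        if (exitStatus == 0) {
            finishedTasks.add(task);
        } else {
            pendingTasks.add(task);         // the task will need a new container
        }
    }

    List<String> finished() {
        return new ArrayList<>(finishedTasks);
    }

    List<String> pending() {
        return new ArrayList<>(pendingTasks);
    }

    public static void main(String[] args) {
        ContainerTracker tracker = new ContainerTracker();
        tracker.submit("taskA");
        tracker.submit("taskB");
        tracker.onContainersAllocated(Arrays.asList("c1", "c2"));
        tracker.onContainerCompleted("c1", 0);   // taskA succeeds
        tracker.onContainerCompleted("c2", 1);   // taskB fails and is re-queued
        System.out.println("finished=" + tracker.finished() + " pending=" + tracker.pending());
    }
}
```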

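From this we can see that Oozie takes a hands-off approach: it only launches Hive, and all subsequent interaction with the RM is handled entirely by Tez. This resolves the puzzle raised earlier.

Finally, to summarize the new flow:

Oozie submits LauncherAM to Yarn;
LauncherAM runs HiveMain, which calls CliDriver.main to submit the task to Hive;
Hive runs on Tez, so Tez prepares a DAGAppMaster;
Yarn and Tez interact: Tez submits the DAGAppMaster to Yarn and parses and executes the Hive command;
After Hive finishes, the callback URL is invoked to notify Oozie;

Forgive me for drawing the diagram this way; I hate finding a good article only to discover that its images are gone......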

+---------+                       +----------+                       +-----------+
|         | 1-submit LauncherAM   |          | 2.CliDriver.main      |           |  
|         |---------------------->| HiveMain |---------------------> |           |
|         |                       |          |                       |           |--+
| [Oozie] |                       |  [Yarn]  |                       |   [Hive]  |  | 3.Run 
|         |                       |          |                       |           |  | Hive     
|         | 5-notifyURL of Oozie  |          | 4-submit DAGAppMaster |           |<-+
|         |<----------------------|          | <-------------------->|    Tez    |
|         |                       |          |                       |           |
+---------+                       +----------+                       +-----------+

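0x06 Java on Yarn

Next, let us see how things work when Oozie runs a Java program.

The main class for a Java program is JavaMain. This case is much simpler: it directly calls the main function of the user's Java class.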

public class JavaMain extends LauncherMain {
    public static final String JAVA_MAIN_CLASS = "oozie.action.java.main";

   /**
    * @param args Invoked from LauncherAM:run()
    * @throws Exception in case of error when running the application
    */
    public static void main(String[] args) throws Exception {
        run(JavaMain.class, args);
    }

    @Override
    protected void run(String[] args) throws Exception {

        Configuration actionConf = loadActionConf();
        setYarnTag(actionConf);
        setApplicationTags(actionConf, TEZ_APPLICATION_TAGS);
        setApplicationTags(actionConf, SPARK_YARN_TAGS);

        LauncherMain.killChildYarnJobs(actionConf);

        Class<?> klass = actionConf.getClass(JAVA_MAIN_CLASS, Object.class);
        Method mainMethod = klass.getMethod("main", String[].class);
        mainMethod.invoke(null, (Object) args);
    }
}

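0x07 After the Yarn job ends

7.1 How tasks are checked

As mentioned earlier, ActionExecutor.start is asynchronous, so the state of the action still has to be checked to move the workflow forward. Oozie checks whether a task has finished in two ways:

Callback: when a task or computation is started, Oozie gives it a callback URL; when the task finishes, it invokes that URL to notify Oozie.
Polling: if the callback fails for any reason, Oozie can still query the task's status by polling.

Oozie uses these two mechanisms to track tasks.

7.2 The callback mechanism

After the user program finishes, LauncherAM makes the following call to notify Oozie. This is the "callback" mechanism.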

LauncherAMCallbackNotifier cn = callbackNotifierFactory.createCallbackNotifier(launcherConf);
                cn.notifyURL(actionResult);

Oozie's CallbackServlet responds to this call. As the code shows, DagEngine.processCallback is where Oozie handles the end of the program.

public class CallbackServlet extends JsonRestServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        String queryString = request.getQueryString();
        CallbackService callbackService = Services.get().get(CallbackService.class);

        String actionId = callbackService.getActionId(queryString);

        DagEngine dagEngine = Services.get().get(DagEngineService.class).getSystemDagEngine();

        dagEngine.processCallback(actionId, callbackService.getExternalStatus(queryString), null);
        }
    }
}

DagEngine.processCallback mainly works through CompletedActionXCommand. This command is placed on the queue of CallableQueueService, so CallableQueueService is what we need to look at next.

 public void processCallback(String actionId, String externalStatus, Properties actionData)
          throws DagEngineException {
      XCallable<Void> command = new CompletedActionXCommand(actionId, externalStatus,
      actionData, HIGH_PRIORITY);
      if (!Services.get().get(CallableQueueService.class).queue(command)) {
          LOG.warn(XLog.OPS, "queue is full or system is in SAFEMODE, ignoring callback");
      }
}

7.3 Asynchronous execution

7.3.1 CallableQueueService

Oozie uses CallableQueueService to run operations asynchronously:

public class CallableQueueService implements Service, Instrumentable {
    private final Map<String, AtomicInteger> activeCallables = new HashMap<String, AtomicInteger>();
    private final Map<String, Date> uniqueCallables = new ConcurrentHashMap<String, Date>();
    private final ConcurrentHashMap<String, Set<XCallable<?>>> interruptCommandsMap = new ConcurrentHashMap<>();
    private Set<String> interruptTypes;
    private int interruptMapMaxSize;
    private int maxCallableConcurrency;
    private int queueAwaitTerminationTimeoutSeconds;
    private int queueSize;
    private PriorityDelayQueue<CallableWrapper<?>> queue;
    private ThreadPoolExecutor executor;
    private Instrumentation instrumentation;
    private boolean newImpl = false;
    private AsyncXCommandExecutor asyncXCommandExecutor; 
  
    public void init(Services services) {
          queue = new PollablePriorityDelayQueue<CallableWrapper<?>>(PRIORITIES,
                    MAX_CALLABLE_WAITTIME_MS,
                    TimeUnit.MILLISECONDS,
                    queueSize) {
                @Override
                protected boolean eligibleToPoll(QueueElement<?> element) {
                    if (element != null) {
                        CallableWrapper wrapper = (CallableWrapper) element;
                        if (element.getElement() != null) {
                            return callableReachMaxConcurrency(wrapper.getElement());
                        }
                    }
                    return false;
                }
            };  
    }
}

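Characteristics:

A task added to the execution queue may be runnable immediately, or may only be triggered at some future time.
The thread pool picks tasks to run according to their execution time and priority.
The size of the thread pool's task queue is configurable; once the queue reaches its maximum size, the pool accepts no more tasks.

7.3.2 PriorityDelayQueue

The queue used by the thread pool is Oozie's custom PriorityDelayQueue.

Characteristics:

Elements leave the queue according to their delay and their execution priority.

Implementation strategy:

PriorityDelayQueue keeps one DelayQueue for each priority level.
Because it uses the JDK's own DelayQueue, a task whose delay has expired can be obtained simply by calling poll(); if poll() returns null, no task in that queue has reached its time yet.

Ordering across the priority queues:

Each time a task is taken from PriorityDelayQueue, the highest-priority queue is polled first. If it has no eligible task, the next-highest queue is polled, and so on down the priority levels.
Starvation: if the high-priority queue has an eligible task every time, eligible tasks in lower-priority queues could starve. To prevent this, the lower-priority queues are scanned before each selection; if a task became eligible long ago and has waited longer than the configured maximum, its priority is raised by one and it is moved to the next-higher queue.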
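
The strategy described above can be condensed into a small, simplified sketch. It is not Oozie's PriorityDelayQueue: SimplePriorityDelayQueue, Entry, and demo are hypothetical names, a higher index is assumed to mean a higher priority, and the clock is injected so that the behaviour can be driven deterministically instead of depending on wall-clock time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Simplified sketch; not Oozie's PriorityDelayQueue. Higher index = higher priority.
class SimplePriorityDelayQueue<E> {

    static final class Entry<E> implements Delayed {
        final E value;
        final long readyAtMs;
        private final LongSupplier clock;

        Entry(E value, long readyAtMs, LongSupplier clock) {
            this.value = value;
            this.readyAtMs = readyAtMs;
            this.clock = clock;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtMs - clock.getAsLong(), TimeUnit.MILLISECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    private final List<DelayQueue<Entry<E>>> levels = new ArrayList<>();
    private final long maxWaitMs;
    private final int maxSize;
    private final LongSupplier clock;

    SimplePriorityDelayQueue(int priorities, long maxWaitMs, int maxSize, LongSupplier clock) {
        for (int i = 0; i < priorities; i++) {
            levels.add(new DelayQueue<Entry<E>>());
        }
        this.maxWaitMs = maxWaitMs;
        this.maxSize = maxSize;
        this.clock = clock;
    }

    // Returns false instead of blocking when the queue is full.
    synchronized boolean offer(E value, int priority, long delayMs) {
        int size = 0;
        for (DelayQueue<Entry<E>> level : levels) {
            size += level.size();
        }
        if (size >= maxSize) {
            return false;
        }
        return levels.get(priority).offer(new Entry<E>(value, clock.getAsLong() + delayMs, clock));
    }

    // Highest priority first; null when no entry at any level is ready yet.
    synchronized E poll() {
        promoteStarved();
        for (int p = levels.size() - 1; p >= 0; p--) {
            Entry<E> entry = levels.get(p).poll();   // null if nothing here has expired
            if (entry != null) {
                return entry.value;
            }
        }
        return null;
    }

    // Moves entries that have been ready for longer than maxWaitMs up one level.
    private void promoteStarved() {
        long now = clock.getAsLong();
        for (int p = levels.size() - 2; p >= 0; p--) {
            DelayQueue<Entry<E>> level = levels.get(p);
            Entry<E> head;
            while ((head = level.peek()) != null && now - head.readyAtMs > maxWaitMs) {
                level.remove(head);
                levels.get(p + 1).offer(head);
            }
        }
    }

    // Drives the queue with a fake clock so the outcome does not depend on timing.
    static List<String> demo() {
        long[] now = {0};
        SimplePriorityDelayQueue<String> q =
                new SimplePriorityDelayQueue<String>(2, 100, 10, () -> now[0]);
        q.offer("old-low", 0, 0);
        now[0] = 200;                 // "old-low" has now been ready for 200 ms
        q.offer("fresh-high", 1, 0);
        List<String> order = new ArrayList<>();
        order.add(q.poll());
        order.add(q.poll());
        return order;
    }
}
```

In demo, the low-priority entry has waited past the 100 ms limit by the time the high-priority entry arrives, so it is promoted and, being the older of the two at that level, is served first.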

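7.3.3 PollablePriorityDelayQueue

Characteristics:

When selecting a task from the queue, it first checks whether a task whose time has come also satisfies the concurrency and other limits, and only then removes it from the queue. PriorityDelayQueue, by contrast, removes the task first and puts it back if it does not satisfy those limits.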
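
The difference amounts to checking eligibility before removal rather than after. A minimal sketch of the check-first variant follows; EligibilityQueue and pollEligible are hypothetical names, delays and priorities are left out, and the concurrency limit is reduced to a simple predicate.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Predicate;

// Hypothetical sketch of "check eligibility before removing"; not Oozie code.
class EligibilityQueue<E> {
    private final Deque<E> ready = new ArrayDeque<>();
    private final Predicate<E> eligible;

    EligibilityQueue(Predicate<E> eligible) {
        this.eligible = eligible;
    }

    synchronized void add(E element) {
        ready.addLast(element);
    }

    // Removes and returns the first element that passes the check.
    // Elements that fail the check are never taken out, so their order is preserved.
    synchronized E pollEligible() {
        Iterator<E> it = ready.iterator();
        while (it.hasNext()) {
            E candidate = it.next();
            if (eligible.test(candidate)) {
                it.remove();
                return candidate;
            }
        }
        return null;
    }

    synchronized int size() {
        return ready.size();
    }

    public static void main(String[] args) {
        // Assume the concurrency limit for "hive" commands is already reached.
        EligibilityQueue<String> queue =
                new EligibilityQueue<String>(name -> !name.startsWith("hive"));
        queue.add("hive-1");
        queue.add("java-1");
        queue.add("hive-2");
        System.out.println(queue.pollEligible() + ", remaining=" + queue.size());
    }
}
```

Checking first avoids the remove-and-reinsert churn, and it also keeps a blocked task from losing its position in the queue.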

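Task types:

Tasks run asynchronously on a thread pool have no ordering among them, yet in some business scenarios the units of work must run serially. Oozie wraps two task types: CompositeCallable and the ordinary XCallable. The former is a collection of XCallables and guarantees that the XCallables in the collection are executed in order.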
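
The idea of a composite task can be sketched with the JDK's own Callable. This is an illustration only: SequentialComposite and demo are hypothetical names, and Oozie's CompositeCallable differs in detail (it works with XCallable and Oozie's own queue).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; Oozie's CompositeCallable differs in detail.
class SequentialComposite<T> implements Callable<List<T>> {
    private final List<Callable<T>> parts;

    SequentialComposite(List<Callable<T>> parts) {
        this.parts = new ArrayList<>(parts);
    }

    // Runs the parts one after another on whichever thread runs the composite.
    @Override
    public List<T> call() throws Exception {
        List<T> results = new ArrayList<>();
        for (Callable<T> part : parts) {
            results.add(part.call());
        }
        return results;
    }

    // Submits one composite to a multi-threaded pool and returns the execution order.
    static List<String> demo() {
        List<String> executionLog = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> steps = new ArrayList<>();
            for (String name : Arrays.asList("first", "second", "third")) {
                steps.add(() -> {
                    executionLog.add(name);
                    return name;
                });
            }
            pool.submit(new SequentialComposite<String>(steps)).get();
            return new ArrayList<>(executionLog);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the pool sees the composite as a single task, its parts cannot be interleaved or reordered by other worker threads, even though the pool itself has several.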

7.4 Moving on to the next operation

CompletedActionXCommand runs when a Workflow command finishes, and runs only once. When the program has ended, it adds an ActionCheckXCommand to the asynchronous queue.

public class CompletedActionXCommand extends WorkflowXCommand<Void> {
    @Override
    protected Void execute() throws CommandException {
        if (this.wfactionBean.getStatus() == WorkflowActionBean.Status.PREP) {
           .....
        } else {    // RUNNING
            ActionExecutor executor = Services.get().get(ActionService.class).getExecutor(this.wfactionBean.getType());
            // this is done because oozie notifications (of sub-wfs) is send
            // every status change, not only on completion.
            if (executor.isCompleted(externalStatus)) {
                queue(new ActionCheckXCommand(this.wfactionBean.getId(), getPriority(), -1));
            }
        }
        return null;
    }  
}

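ActionCheckXCommand is then invoked asynchronously. Its main tasks are:

Configure retry handling if a retry mechanism applies
Call executor.check(context, wfAction); to check the state of the launched job
Update the task information in the database
Because the action has finished, use ActionEndXCommand to end it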

public class ActionCheckXCommand extends ActionXCommand<Void> {
    @Override
    protected Void execute() throws CommandException {

        ActionExecutorContext context = null;
        boolean execSynchronous = false;
        try {
            boolean isRetry = false; // configure retry handling if retries apply
            if (wfAction.getRetries() > 0) {
                isRetry = true;
            }
            boolean isUserRetry = false;
            context = new ActionXCommand.ActionExecutorContext(wfJob, wfAction, isRetry, isUserRetry);
          
            executor.check(context, wfAction); // check the state of the launched job

            if (wfAction.isExecutionComplete()) {
                if (!context.isExecuted()) {
                    failJob(context);
                    generateEvent = true;
                } else {
                    wfAction.setPending();
                    execSynchronous = true;
                }
            }
            updateList.add(new UpdateEntry<WorkflowActionQuery>(WorkflowActionQuery.UPDATE_ACTION_CHECK, wfAction));
            updateList.add(new UpdateEntry<WorkflowJobQuery> (WorkflowJobQuery.UPDATE_WORKFLOW_STATUS_INSTANCE_MODIFIED,
                    wfJob));
        }
        finally {
                // update the task information in the database
                BatchQueryExecutor.getInstance().executeBatchInsertUpdateDelete(null, updateList, null);
                if (generateEvent && EventHandlerService.isEnabled()) {
                    generateEvent(wfAction, wfJob.getUser());
                }
                if (execSynchronous) {
                    // use ActionEndXCommand to end the action
                    new ActionEndXCommand(wfAction.getId(), wfAction.getType()).call();
                }
        }
        return null;
    }
}
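The branch taken after executor.check can be summarized as a small decision function. The following is a minimal sketch, not Oozie code: the class CheckOutcomeSketch, the enum Next, and the method decide are hypothetical names, and the two booleans stand in for wfAction.isExecutionComplete() and context.isExecuted(). The KEEP_WAITING case assumes that a later check re-evaluates the action, since the command itself only persists the state in that case.

```java
// Hypothetical, simplified model of the branch in ActionCheckXCommand.execute().
final class CheckOutcomeSketch {
    enum Next { FAIL_JOB, END_SYNCHRONOUSLY, KEEP_WAITING }

    /**
     * @param executionComplete stands in for wfAction.isExecutionComplete()
     * @param contextExecuted   stands in for context.isExecuted()
     */
    static Next decide(boolean executionComplete, boolean contextExecuted) {
        if (!executionComplete) {
            // The external job is still running: only the database update happens.
            return Next.KEEP_WAITING;
        }
        // Complete but never marked as executed is treated as a failure;
        // otherwise the action is set to pending and ended synchronously.
        return contextExecuted ? Next.END_SYNCHRONOUSLY : Next.FAIL_JOB;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false));
        System.out.println(decide(true, false));
        System.out.println(decide(true, true));
    }
}
```

Only the END_SYNCHRONOUSLY case corresponds to execSynchronous = true in the excerpt above, which is what triggers ActionEndXCommand in the finally block.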

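The call lands in JavaActionExecutor.check, which does the following:

Creates a YARN client from the configuration: yarnClient = createYarnClient(context, jobConf);
Fetches the application report: ApplicationReport appReport = yarnClient.getApplicationReport(applicationId);
Loads the action data written by the launcher: Map<String, String> actionData = LauncherHelper.getActionData(actionFs, actionDir, jobConf);
Records the outcome (execution data, child IDs, stats) on the context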

@Override
public void check(Context context, WorkflowAction action) throws ActionExecutorException {
    boolean fallback = false;
    YarnClient yarnClient = null;
    try {
        Element actionXml = XmlUtils.parseXml(action.getConf());
        Configuration jobConf = createBaseHadoopConf(context, actionXml);
        FileSystem actionFs = context.getAppFileSystem();
        yarnClient = createYarnClient(context, jobConf); // create the YARN client from the configuration
        FinalApplicationStatus appStatus = null;
        try {
            final String effectiveApplicationId = findYarnApplicationId(context, action);
            final ApplicationId applicationId = ConverterUtils.toApplicationId(effectiveApplicationId);
            final ApplicationReport appReport = yarnClient.getApplicationReport(applicationId); // fetch the application report
            final YarnApplicationState appState = appReport.getYarnApplicationState();
            if (appState == YarnApplicationState.FAILED || appState == YarnApplicationState.FINISHED
                    || appState == YarnApplicationState.KILLED) {
                appStatus = appReport.getFinalApplicationStatus();
            }
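        } catch (Exception ye) {
            // Abridged in this excerpt: if the YARN lookup fails, fall back to the action data file
            fallback = true;
        }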
        if (appStatus != null || fallback) {
            Path actionDir = context.getActionDir();
            // load sequence file into object
            Map<String, String> actionData = LauncherHelper.getActionData(actionFs, actionDir, jobConf);   // load the action data
            if (fallback) {
                String finalStatus = actionData.get(LauncherAM.ACTION_DATA_FINAL_STATUS);
                if (finalStatus != null) {
                    appStatus = FinalApplicationStatus.valueOf(finalStatus);
                } else {
                    context.setExecutionData(FAILED, null);
                }
            }

            String externalID = actionData.get(LauncherAM.ACTION_DATA_NEW_ID);  // MapReduce was launched
            if (externalID != null) {
                context.setExternalChildIDs(externalID);
            }

            // Multiple child IDs - Pig or Hive action
            String externalIDs = actionData.get(LauncherAM.ACTION_DATA_EXTERNAL_CHILD_IDS);
            if (externalIDs != null) {
                context.setExternalChildIDs(externalIDs);
            }

            // Record the final status and any captured output or stats on the context
            context.setExecutionData(appStatus.toString(), null);
            if (appStatus == FinalApplicationStatus.SUCCEEDED) {
                if (getCaptureOutput(action) && LauncherHelper.hasOutputData(actionData)) {
                    context.setExecutionData(SUCCEEDED, PropertiesUtils.stringToProperties(actionData
                            .get(LauncherAM.ACTION_DATA_OUTPUT_PROPS)));
                }
                else {
                    context.setExecutionData(SUCCEEDED, null);
                }
                if (LauncherHelper.hasStatsData(actionData)) {
                    context.setExecutionStats(actionData.get(LauncherAM.ACTION_DATA_STATS));
                }
                getActionData(actionFs, action, context);
            }
            else {
                ......
                context.setExecutionData(FAILED_KILLED, null);
            }
        }
    }
    finally {
        if (yarnClient != null) {
            IOUtils.closeQuietly(yarnClient);
        }
    }
}
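The core of check is a two-step mapping: a final status is read only once the YARN application reaches a terminal state, and that status is then collapsed into the label stored as execution data. The following is a minimal sketch of that mapping without any Hadoop dependency. The enums AppState and FinalStatus are hypothetical stand-ins for YarnApplicationState and FinalApplicationStatus, and the two string values are assumptions about what the SUCCEEDED and FAILED_KILLED constants hold.

```java
import java.util.Optional;

// Hypothetical, dependency-free model of the status handling in JavaActionExecutor.check.
final class YarnStatusSketch {
    // Stand-ins for YARN's YarnApplicationState and FinalApplicationStatus (subset of values).
    enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }
    enum FinalStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

    // Assumed values of the SUCCEEDED and FAILED_KILLED constants used in the excerpt.
    static final String SUCCEEDED = "SUCCEEDED";
    static final String FAILED_KILLED = "FAILED/KILLED";

    /** Returns the reported final status only when the application is in a terminal state. */
    static Optional<FinalStatus> finalStatusIfTerminal(AppState state, FinalStatus reported) {
        switch (state) {
            case FINISHED:
            case FAILED:
            case KILLED:
                return Optional.of(reported);
            default:
                return Optional.empty(); // still in progress: nothing to record yet
        }
    }

    /** Collapses the final status into the label stored as execution data. */
    static String toExecutionData(FinalStatus status) {
        return status == FinalStatus.SUCCEEDED ? SUCCEEDED : FAILED_KILLED;
    }

    public static void main(String[] args) {
        System.out.println(finalStatusIfTerminal(AppState.RUNNING, FinalStatus.UNDEFINED).isPresent());
        finalStatusIfTerminal(AppState.FINISHED, FinalStatus.SUCCEEDED)
                .map(YarnStatusSketch::toExecutionData)
                .ifPresent(System.out::println);
    }
}
```

Note that a FINISHED application is not necessarily a successful one, which is why the final status is consulted separately from the application state.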

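ActionEndXCommand finishes the action and triggers the transition:

Calls the executor to complete the action: executor.end(context, wfAction);
Updates the job information in the database: BatchQueryExecutor.getInstance().executeBatchInsertUpdateDelete
Uses SignalXCommand to transition to the next action and run it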

public class ActionEndXCommand extends ActionXCommand<Void> {
    @Override
    protected Void execute() throws CommandException {

        Configuration conf = wfJob.getWorkflowInstance().getConf();

        if (!(executor instanceof ControlNodeActionExecutor)) {
            maxRetries = conf.getInt(OozieClient.ACTION_MAX_RETRIES, executor.getMaxRetries());
            retryInterval = conf.getLong(OozieClient.ACTION_RETRY_INTERVAL, executor.getRetryInterval());
        }

        executor.setMaxRetries(maxRetries);
        executor.setRetryInterval(retryInterval);

        boolean isRetry = false;
        if (wfAction.getStatus() == WorkflowActionBean.Status.END_RETRY
                || wfAction.getStatus() == WorkflowActionBean.Status.END_MANUAL) {
            isRetry = true;
        }
        boolean isUserRetry = false;
        ActionExecutorContext context = new ActionXCommand.ActionExecutorContext(wfJob, wfAction, isRetry, isUserRetry);
        try {
          
            executor.end(context, wfAction); // call the executor to complete the action

            if (!context.isEnded()) {
                failJob(context);
            } else {
                wfAction.setRetries(0);
                wfAction.setEndTime(new Date());

                boolean shouldHandleUserRetry = false;
                Status slaStatus = null;
                switch (wfAction.getStatus()) {
                    case OK:
                        slaStatus = Status.SUCCEEDED;
                        break;
                    ......
                }
                if (!shouldHandleUserRetry || !handleUserRetry(context, wfAction)) {
                    SLAEventBean slaEvent = SLADbXOperations.createStatusEvent(wfAction.getSlaXml(), wfAction.getId(), slaStatus,
                            SlaAppType.WORKFLOW_ACTION);
                    if(slaEvent != null) {
                        insertList.add(slaEvent);
                    }
                }
            }
            WorkflowInstance wfInstance = wfJob.getWorkflowInstance();
            DagELFunctions.setActionInfo(wfInstance, wfAction);
            wfJob.setWorkflowInstance(wfInstance);

            updateList.add(new UpdateEntry<WorkflowActionQuery>(WorkflowActionQuery.UPDATE_ACTION_END,wfAction));
            wfJob.setLastModifiedTime(new Date());
            updateList.add(new UpdateEntry<WorkflowJobQuery>(WorkflowJobQuery.UPDATE_WORKFLOW_STATUS_INSTANCE_MODIFIED, wfJob));
        }
        finally {
            try {
                // Persist the updated job and action records
                BatchQueryExecutor.getInstance().executeBatchInsertUpdateDelete(insertList, updateList, null);
            }
            catch (JPAExecutorException e) {
                throw new CommandException(e);
            }
            if (!(executor instanceof ControlNodeActionExecutor) && EventHandlerService.isEnabled()) {
                generateEvent(wfAction, wfJob.getUser());
            }
            new SignalXCommand(jobId, actionId).call(); // transition to the next action and run it
        }
        return null;
    }  
}
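The retry bookkeeping at the top of ActionEndXCommand.execute follows a common pattern: a value from the workflow configuration overrides the executor's default, and the retry flag is derived from the action status. The following is a minimal sketch under stated assumptions: a plain Map stands in for the Hadoop Configuration, the Status enum is a hypothetical subset of WorkflowActionBean.Status, and the key string is an assumed value for OozieClient.ACTION_MAX_RETRIES.

```java
import java.util.Map;

// Hypothetical model of the retry setup in ActionEndXCommand.execute().
final class EndRetrySketch {
    // Stand-in for a subset of WorkflowActionBean.Status.
    enum Status { OK, ERROR, END_RETRY, END_MANUAL }

    // Assumed key name, standing in for OozieClient.ACTION_MAX_RETRIES.
    static final String MAX_RETRIES_KEY = "oozie.wf.action.max.retries";

    /** Workflow configuration wins; the executor's own default is the fallback. */
    static int resolveMaxRetries(Map<String, String> conf, int executorDefault) {
        String value = conf.get(MAX_RETRIES_KEY);
        return value == null ? executorDefault : Integer.parseInt(value.trim());
    }

    /** The end phase counts as a retry only for the two END_* retry statuses. */
    static boolean isRetry(Status status) {
        return status == Status.END_RETRY || status == Status.END_MANUAL;
    }

    public static void main(String[] args) {
        System.out.println(resolveMaxRetries(Map.of(MAX_RETRIES_KEY, "5"), 3));
        System.out.println(resolveMaxRetries(Map.of(), 3));
        System.out.println(isRetry(Status.END_RETRY));
        System.out.println(isRetry(Status.OK));
    }
}
```

In the excerpt the override is skipped for ControlNodeActionExecutor, so control nodes keep the command's existing values; the sketch leaves that case out.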