Drill-on-YARN: A Source Code Walkthrough

288735804232677072, published 2018-09-20

1. Overview

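The previous article described how to deploy Drill on YARN and then use the Drill-on-YARN client to start, stop, resize, and clean up Drill. But what actually happens behind these commands? This article walks through the Drill-on-YARN source code in detail, focusing on the startup process and covering the other commands briefly.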

Note: the code below is based on Drill 1.14.0 and has been trimmed for brevity.

2. Drill-on-YARN start

2.1 drill-on-yarn.sh

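Looking at the drill-on-yarn.sh script, it is easy to see that the Java class it finally runs comes from:

CLIENT_CMD="$JAVA $VM_OPTS -cp $CP org.apache.drill.yarn.client.DrillOnYarn ${args[@]}"

org.apache.drill.yarn.client.DrillOnYarn is therefore the entry point of Drill-on-YARN. Here is an overview of the class: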

public class DrillOnYarn {
  public static void main(String argv[]) {
    BasicConfigurator.configure();
    ClientContext.init();
    run(argv);
  }
  public static void run(String argv[]) {
    ClientContext context = ClientContext.instance();
    CommandLineOptions opts = new CommandLineOptions();
    if (!opts.parse(argv)) {
      opts.usage();
      context.exit(-1);
    }
    if (opts.getCommand() == null) {
      opts.usage();
      context.exit(-1);
    }
    try {
      DrillOnYarnConfig.load().setClientPaths();
    } catch (DoyConfigException e) {
      ClientContext.err.println(e.getMessage());
      context.exit(-1);
    }
    ClientCommand cmd;
    switch (opts.getCommand()) {
    case UPLOAD:
      cmd = new StartCommand(true, false);
      break;
    case START:
      cmd = new StartCommand(true, true);
      break;
    case DESCRIBE:
      cmd = new PrintConfigCommand();
      break;
    case STATUS:
      cmd = new StatusCommand();
      break;
    case STOP:
      cmd = new StopCommand();
      break;
    case CLEAN:
      cmd = new CleanCommand();
      break;
    case RESIZE:
      cmd = new ResizeCommand();
      break;
    default:
      cmd = new HelpCommand();
    }
    cmd.setOpts(opts);
    try {
      cmd.run();
    } catch (ClientException e) {
      displayError(opts, e);
      context.exit(1);
    }
  }
}
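
In the switch above, UPLOAD and START both build a StartCommand and differ only in the second constructor argument, which (judging from StartCommand.run() below, where launch(uploader) runs only if launch is set) controls whether the ApplicationMaster is launched after uploading. As a hypothetical restructuring rather than Drill's code, the same dispatch can be kept in an enum-keyed table. Verb, ClientCmd, StartCmd, NamedCmd, and CommandTable are invented names, and reading the two flags as (upload, launch) is an assumption:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: the verb-to-command mapping of DrillOnYarn.run()
// rewritten as an EnumMap lookup. None of these types exist in Drill.
enum Verb { UPLOAD, START, DESCRIBE, STATUS, STOP, CLEAN, RESIZE, HELP }

interface ClientCmd {
  String describe();
}

// Stands in for StartCommand; assumes the flags mean (upload, launch).
final class StartCmd implements ClientCmd {
  final boolean upload;
  final boolean launch;

  StartCmd(boolean upload, boolean launch) {
    this.upload = upload;
    this.launch = launch;
  }

  @Override
  public String describe() {
    return "start(upload=" + upload + ", launch=" + launch + ")";
  }
}

// Stands in for the other commands, which only need a name here.
final class NamedCmd implements ClientCmd {
  private final String name;

  NamedCmd(String name) {
    this.name = name;
  }

  @Override
  public String describe() {
    return name;
  }
}

final class CommandTable {
  private static final Map<Verb, Supplier<ClientCmd>> TABLE = new EnumMap<>(Verb.class);

  static {
    TABLE.put(Verb.UPLOAD, () -> new StartCmd(true, false)); // copy archives to DFS only
    TABLE.put(Verb.START, () -> new StartCmd(true, true));   // upload, then launch the AM
    TABLE.put(Verb.DESCRIBE, () -> new NamedCmd("describe"));
    TABLE.put(Verb.STATUS, () -> new NamedCmd("status"));
    TABLE.put(Verb.STOP, () -> new NamedCmd("stop"));
    TABLE.put(Verb.CLEAN, () -> new NamedCmd("clean"));
    TABLE.put(Verb.RESIZE, () -> new NamedCmd("resize"));
  }

  // Unmapped verbs (HELP, or null) fall back to help, like the switch's default branch.
  static ClientCmd forVerb(Verb verb) {
    Supplier<ClientCmd> factory = (verb == null) ? null : TABLE.get(verb);
    return (factory == null) ? new NamedCmd("help") : factory.get();
  }
}
```

A table keeps every verb-to-command pairing in one place and makes the fallback to help explicit; for eight fixed commands the original switch is just as reasonable.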

The main method is the entry point, and the key part is run(), which handles many commands. We focus on the start command; StartCommand.run() is:

public void run() throws ClientException {
    checkExistingApp();

    dryRun = opts.dryRun;
    config = DrillOnYarnConfig.config();
    FileUploader uploader = upload();
    if (launch) {
      launch(uploader);
    }
}
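
StartCommand.run() begins with checkExistingApp(). As the steps below explain, that check relies on a local file holding the YARN application ID written by a successful start. A minimal sketch of such a guard, assuming the file contains nothing but the ID (AppIdFile and its methods are hypothetical; Drill's actual file name and format are not shown here):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch of the "one Drill cluster at a time" guard behind
// checkExistingApp(). The file location and plain-text format are assumptions.
final class AppIdFile {
  private final Path file;

  AppIdFile(Path file) {
    this.file = file;
  }

  // Returns the recorded YARN application ID, if the file exists and is non-empty.
  Optional<String> read() {
    try {
      if (!Files.exists(file)) {
        return Optional.empty();
      }
      String id = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
      return id.isEmpty() ? Optional.empty() : Optional.of(id);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Refuse to start if an application ID is already on record.
  void checkNoExistingApp() {
    Optional<String> existing = read();
    if (existing.isPresent()) {
      throw new IllegalStateException("Drill is already running as " + existing.get());
    }
  }

  // Record the ID once the ApplicationMaster has been submitted.
  void record(String appId) {
    try {
      Files.write(file, appId.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Forget the ID once the application is known to have ended.
  void clear() {
    try {
      Files.deleteIfExists(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the guard is only a file, a file left behind after the application has ended would block new starts until it is cleared; how Drill handles that case is not covered by the excerpt.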

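In outline, it does the following:

  1. Check whether the application already exists. If it does, the start is refused; otherwise the start proceeds. (The application checked here is the YARN application: after a successful start, the YARN applicationId is written to a file on local disk, and this check reads that file.)
  2. Upload the Drill archive and the contents of the site directory to the DFS; the site directory is first packed into site.tar.gz: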

    public void run() throws ClientException {
      setup();
      uploadDrillArchive();
      if (hasSiteDir()) {
        uploadSite();
      }
    }
  3. Launch the ApplicationMaster. The main steps are:

    • Create the YARN client and start it

      // AMRunner#connectToYarn
      private void connectToYarn() {
          System.out.print("Loading YARN Config...");
          client = new YarnRMClient();
          System.out.println(" Loaded.");
      }
    • Create the ApplicationMaster (allocate a YARN application and obtain its ID)

      // AMRunner#createApp
      private void createApp() throws ClientException {
          try {
            appResponse = client.createAppMaster();
          } catch (YarnClientException e) {
            throw new ClientException("Failed to allocate Drill application master",
                e);
          }
          appId = appResponse.getApplicationId();
          System.out.println("Application ID: " + appId.toString());
      }
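    • Set up the ApplicationMaster context: heap memory, class path, the launch command (drill-am.sh), and the resources for the AM container (memory, vCores, disks)
    • Validate the resources, mainly checking that what the ApplicationMaster requests does not exceed YARN's limits
    • Submit the ApplicationMaster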

      private void launchApp(AppSpec master) throws ClientException {
          try {
            client.submitAppMaster(master);
          } catch (YarnClientException e) {
            throw new ClientException("Failed to start Drill application master", e);
          }
      }
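    • Wait for the ApplicationMaster to start, printing its startup progress
    • Write the ApplicationMaster's application ID to a file (this is the file that step 1 uses to check whether the application already exists)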
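
Step 3 validates the AM's resources against YARN's limits before submitting. A minimal sketch of that kind of check, comparing the requested memory and vCores with the cluster maximum. In YARN the maximum is what the RM reports for a new application (GetNewApplicationResponse#getMaximumResourceCapability), normally bounded by yarn.scheduler.maximum-allocation-mb and yarn.scheduler.maximum-allocation-vcores; whether Drill reads its limit from exactly there is not shown in the excerpt, and AmResourceCheck is a hypothetical class:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the AM resource validation step. Plain ints stand in
// for YARN's Resource objects so the example runs without Hadoop.
final class AmResourceCheck {
  private AmResourceCheck() {}

  // Returns one message per limit the requested AM container would exceed;
  // an empty list means the request fits.
  static List<String> validate(int requestedMb, int requestedVcores, int maxMb, int maxVcores) {
    List<String> problems = new ArrayList<>();
    if (requestedMb > maxMb) {
      problems.add("AM memory " + requestedMb + " MB exceeds YARN maximum " + maxMb + " MB");
    }
    if (requestedVcores > maxVcores) {
      problems.add("AM vcores " + requestedVcores + " exceeds YARN maximum " + maxVcores);
    }
    return problems;
  }
}
```

Collecting every violation, instead of stopping at the first, lets the client report all misconfigured limits in one run.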

Once the ApplicationMaster is up, it requests resources from the RM and launches the Drillbits. The next section looks at what the ApplicationMaster does after it starts.

2.2 drill-am.sh

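Looking at the drill-am.sh script, it is easy to see that the Java class it finally runs comes from:

AMCMD="$JAVA $AM_JAVA_OPTS ${args[@]} -cp $CP org.apache.drill.yarn.appMaster.DrillApplicationMaster"

org.apache.drill.yarn.appMaster.DrillApplicationMaster is the entry point of the ApplicationMaster. Here is an overview of the class: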

public class DrillApplicationMaster {
  public static void main(String[] args) {
    LOG.trace("Drill Application Master starting.");
    try {
      DrillOnYarnConfig.load().setAmDrillHome();
    } catch (DoyConfigException e) {
      System.err.println(e.getMessage());
      System.exit(-1);
    }
    Dispatcher dispatcher;
    try {
      dispatcher = (new DrillControllerFactory()).build();
    } catch (ControllerFactoryException e) {
      LOG.error("Setup failed, exiting: " + e.getMessage(), e);
      System.exit(-1);
      return;
    }
    try {
      if (!dispatcher.start()) {
        return;
      }
    } catch (Throwable e) {
      LOG.error("Fatal error, exiting: " + e.getMessage(), e);
      System.exit(-1);
    }
    WebServer webServer = new WebServer(dispatcher);
    try {
      webServer.start();
    } catch (Exception e) {
      LOG.error("Web server setup failed, exiting: " + e.getMessage(), e);
      System.exit(-1);
    }
    try {
      dispatcher.run();
    } catch (Throwable e) {
      LOG.error("Fatal error, exiting: " + e.getMessage(), e);
      System.exit(-1);
    } finally {
      try {
        webServer.close();
      } catch (Exception e) {
      }
    }
  }
}
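
main() builds a Dispatcher, starts it, starts the web server, and then blocks in dispatcher.run(). The walkthrough below describes the Dispatcher as lightweight and multi-threaded: events of one type are handled in order, while RM, NM, and timer events need not be synchronized with each other. One way to get exactly that property is a single-threaded executor per event source. This is a hypothetical illustration (PerTypeDispatcher is not a Drill class), and Drill's own Dispatcher may achieve it differently:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one single-threaded "lane" per event source, so events
// from the same source run strictly in order while different sources overlap.
final class PerTypeDispatcher implements AutoCloseable {
  enum Source { RM, NM, TIMER }

  private final Map<Source, ExecutorService> lanes = new ConcurrentHashMap<>();

  // Queue a handler on the lane for its source; the lane is created lazily.
  void post(Source source, Runnable handler) {
    lanes.computeIfAbsent(source, s -> Executors.newSingleThreadExecutor()).execute(handler);
  }

  // Let queued events finish, then stop every lane.
  @Override
  public void close() {
    for (ExecutorService lane : lanes.values()) {
      lane.shutdown();
    }
    try {
      for (ExecutorService lane : lanes.values()) {
        lane.awaitTermination(5, TimeUnit.SECONDS);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Handlers on one lane never run concurrently, so they can update per-source state without locks; state shared across sources (such as the cluster controller) would still need its own synchronization.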

In outline, it does the following:

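  1. Load the Drill-on-YARN configuration and set the AM's Drill home, for example /home/admin/tmp2/hadoop/nm-local-dir/usercache/admin/appcache/application_1534698866098_0022/container_1534698866098_0022_01_000001/drill/apache-drill-1.14.0
  2. Build the Dispatcher. The Dispatcher delivers YARN, timer, and ZooKeeper events to the cluster controller. It is lightweight and multi-threaded, responding to events from the RM, the NM, and timer threads. Events of any one type are handled sequentially and therefore need synchronization, but events of different types do not need to be synchronized with each other. The construction steps are: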

    • Prepare resources: the DFS paths of the Drill archive and the site archive

      private Map<String, LocalResource> prepareResources() {
          ...
          drillArchivePath = drillConfig.getDrillArchiveDfsPath();
          siteArchivePath = drillConfig.getSiteArchiveDfsPath();
          ...
      }
    • Define the task launch specification (TaskSpec): the runtime environment, the YARN container spec, and the Drillbit spec

      private TaskSpec buildDrillTaskSpec(Map<String, LocalResource> resources) throws DoyConfigException {
          ...
          ContainerRequestSpec containerSpec = new ContainerRequestSpec();
          containerSpec.memoryMb = config.getInt(DrillOnYarnConfig.DRILLBIT_MEMORY);
          ...
          LaunchSpec drillbitSpec = new LaunchSpec();
          ...
          TaskSpec taskSpec = new TaskSpec();
          taskSpec.name = "Drillbit";
          taskSpec.containerSpec = containerSpec;
          taskSpec.launchSpec = drillbitSpec;
          return taskSpec;
      }
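    • Set the Dispatcher's controller. The implementation is ClusterControllerImpl, which drives the Drill cluster through states, adjusts the cluster's tasks (starting and stopping Drillbits, and so on), and handles container callbacks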

      public void setYarn(AMYarnFacade yarn) throws YarnFacadeException {
          this.yarn = yarn;
          controller = new ClusterControllerImpl(yarn);
      }
    • Register schedulers with the controller, such as DrillbitScheduler. The scheduler configuration comes from drill-on-yarn.conf, described earlier:

      cluster: [
          {
            name: "drill-group1"
            type: "basic"
            count: 1
          }
      ]
      ...
      ClusterDef.ClusterGroup pool = ClusterDef.getCluster(config, 0);
      Scheduler testGroup = new DrillbitScheduler(pool.getName(), taskSpec,
          pool.getCount(), requestTimeoutSecs, maxExtraNodes);
      dispatcher.getController().registerScheduler(testGroup);
      ...
    • Create the ZooKeeper cluster coordinator

      String zkConnect = config.getString(DrillOnYarnConfig.ZK_CONNECT);
      String zkRoot = config.getString(DrillOnYarnConfig.ZK_ROOT);
      String clusterId = config.getString(DrillOnYarnConfig.CLUSTER_ID);
  3. Start the Dispatcher, which mainly starts AMRMClientAsync, NMClientAsync, and YarnClient:

    ...
    yarn.start(new ResourceCallback(), new NodeCallback());
    String url = trackingUrl.replace("<port>", Integer.toString(httpPort));
    if (DrillOnYarnConfig.config().getBoolean(DrillOnYarnConfig.HTTP_ENABLE_SSL)) {
      url = url.replace("http:", "https:");
    }
    yarn.register(url);
    controller.started();
    ...
    ...
    resourceMgr = AMRMClientAsync.createAMRMClientAsync(pollPeriodMs, resourceCallback);
    resourceMgr.init(conf);
    resourceMgr.start();
    ...
    nodeMgr = NMClientAsync.createNMClientAsync(nodeCallback);
    nodeMgr.init(conf);
    nodeMgr.start();
    ...
    client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    ...
  4. Start the web UI used to operate Drill

    WebServer webServer = new WebServer(dispatcher);
    webServer.start();
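  5. Run the Dispatcher. This mainly starts a thread that keeps polling the current task queue for tasks such as start, stop, and resize, and then performs the corresponding actions. Taking start as an example: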

    • Add a start task and put it in the pending-task queue

      if (state == State.LIVE) {
        adjustTasks(curTime);
        requestContainers();
      }
    • Request a container from the RM by creating a ContainerRequest

      ContainerRequest request = containerSpec.makeRequest();
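      // Add the same object that is returned, so that a later
      // removeContainerRequest() on it refers to the request actually registered.
      resourceMgr.addContainerRequest(request);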
      return request;
    • ResourceCallback listens for container allocations and then launches the containers

      private class ResourceCallback implements AMRMClientAsync.CallbackHandler {
          @Override
          public void onContainersAllocated(List<Container> containers) {
            controller.containersAllocated(containers);
          }
      }
      public void containerAllocated(EventContext context, Container container) {
        Task task = context.task;
        LOG.info(task.getLabel() + " - Received container: "
            + DoYUtil.describeContainer(container));
        context.group.dequeueAllocatingTask(task);
      
        // No matter what happens below, we don't want to ask for this
        // container again. The RM async API is a bit bizarre in this
        // regard: it will keep asking for container over and over until
        // we tell it to stop.
      
        context.yarn.removeContainerRequest(task.containerRequest);
      
        // The container is needed both in the normal and in the cancellation
        // path, so set it here.
      
        task.container = container;
        if (task.cancelled) {
          context.yarn.releaseContainer(container);
          taskStartFailed(context, Disposition.CANCELLED);
          return;
        }
        task.error = null;
        task.completionStatus = null;
        transition(context, LAUNCHING);
      
        // The pool that manages this task wants to know that we have
        // a container. The task manager may want to do some task-
        // specific setup.
      
        context.group.containerAllocated(context.task);
        context.getTaskManager().allocated(context);
      
        // Go ahead and launch a task in the container using the launch
        // specification provided by the task group (pool).
      
        try {
          context.yarn.launchContainer(container, task.getLaunchSpec());
          task.launchTime = System.currentTimeMillis();
        } catch (YarnFacadeException e) {
          LOG.error("Container launch failed: " + task.getContainerId(), e);
      
          // This may not be the right response. RM may still think
          // we have the container if the above is a local failure.
      
          task.error = e;
          context.group.containerReleased(task);
          task.container = null;
          taskStartFailed(context, Disposition.LAUNCH_FAILED);
        }
      }
    • NodeCallback listens for container start (and stop) events from the NodeManager

      public class NodeCallback implements NMClientAsync.CallbackHandler {
          @Override
          public void onStartContainerError(ContainerId containerId, Throwable t) {
            controller.taskStartFailed(containerId, t);
          }
      
          @Override
          public void onContainerStarted(ContainerId containerId, Map<String, ByteBuffer> allServiceResponse) {
            controller.containerStarted(containerId);
          }
      
          @Override
          public void onContainerStatusReceived(ContainerId containerId, ContainerStatus containerStatus) {
          }
      
          @Override
          public void onGetContainerStatusError(ContainerId containerId, Throwable t) {
          }
      
          @Override
          public void onStopContainerError(ContainerId containerId, Throwable t) {
            controller.stopTaskFailed(containerId, t);
          }
      
          @Override
          public void onContainerStopped(ContainerId containerId) {
            controller.containerStopped(containerId);
          }
      }
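
The allocation callback above does three things in a deliberate order: it withdraws the outstanding request first (the comments explain that the async RM client otherwise keeps asking), hands the container straight back if the task was cancelled in the meantime, and only then launches. The following YARN-free model reproduces that ordering so it can be exercised with a fake. The Yarn interface, the string IDs, and the three-value State enum are hypothetical simplifications, not Drill's actual types or task states:

```java
// Hypothetical, YARN-free model of containerAllocated(). Strings stand in for
// ContainerRequest / Container, and the states are a simplified subset.
final class AllocationSketch {
  private AllocationSketch() {}

  enum State { REQUESTING, LAUNCHING, ENDED }

  enum Disposition { NONE, CANCELLED, LAUNCH_FAILED }

  // The three YARN operations the callback needs.
  interface Yarn {
    void removeContainerRequest(String requestId);

    void releaseContainer(String containerId);

    void launchContainer(String containerId) throws Exception;
  }

  static final class Task {
    final String requestId;
    boolean cancelled;
    String container;
    State state = State.REQUESTING;
    Disposition disposition = Disposition.NONE;

    Task(String requestId) {
      this.requestId = requestId;
    }
  }

  static void containerAllocated(Yarn yarn, Task task, String containerId) {
    // Withdraw the request first so the RM does not allocate for it again.
    yarn.removeContainerRequest(task.requestId);
    task.container = containerId;
    if (task.cancelled) {
      // Cancelled while waiting: give the container straight back.
      yarn.releaseContainer(containerId);
      task.state = State.ENDED;
      task.disposition = Disposition.CANCELLED;
      return;
    }
    task.state = State.LAUNCHING;
    try {
      yarn.launchContainer(containerId);
    } catch (Exception e) {
      // As in the original, the container is only forgotten locally here.
      task.container = null;
      task.state = State.ENDED;
      task.disposition = Disposition.LAUNCH_FAILED;
    }
  }
}
```

Removing the request before anything else means that neither the cancellation branch nor a failed launch can leave a stale request behind for the RM to keep satisfying.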

2.3 Failover

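Besides start, stop, and resize, Drill-on-YARN also provides failover: when a Drillbit goes down, Drill-on-YARN tries to start it again, and the number of retries is currently 2. In addition, if the node hosting a Drillbit fails frequently, that node is put on a blacklist.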
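
A minimal sketch of the blacklisting idea: count Drillbit failures per host and stop using a host once it reaches a threshold. The class NodeBlacklist, the threshold parameter, and the absence of any expiry are assumptions for illustration; the article does not say how Drill counts failures or whether a host is ever removed from the blacklist.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-host failure counting with a fixed threshold.
final class NodeBlacklist {
  private final int maxFailuresPerHost;
  private final Map<String, Integer> failures = new HashMap<>();
  private final Set<String> blacklisted = new HashSet<>();

  NodeBlacklist(int maxFailuresPerHost) {
    this.maxFailuresPerHost = maxFailuresPerHost;
  }

  // Records one failure; returns true only when this failure newly blacklists the host.
  boolean recordFailure(String host) {
    int count = failures.merge(host, 1, Integer::sum);
    return count >= maxFailuresPerHost && blacklisted.add(host);
  }

  // A scheduler would skip hosts for which this returns false.
  boolean isUsable(String host) {
    return !blacklisted.contains(host);
  }
}
```

Returning true only on the transition lets a caller log or act on a host once, instead of on every later failure there.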

You can simulate a Drillbit failure by killing the Drillbit process manually; after a short wait, the Drillbit process starts again. Let us follow the code path:

  1. The Drillbit dies and its container ends
private class ResourceCallback implements AMRMClientAsync.CallbackHandler {
    @Override
    public void onContainersCompleted(List<ContainerStatus> statuses) {
      controller.containersCompleted(statuses);
    }
}
  2. Retry the task: the task is put back into pendingTasks; the polling thread sees that pendingTasks is not empty and starts it again
protected void taskTerminated(EventContext context) {
    Task task = context.task;
    context.getTaskManager().completed(context);
    context.group.containerReleased(task);
    assert task.completionStatus != null;
    // A non-zero exit status means the container did not exit normally
    if (task.completionStatus.getExitStatus() == 0) {
      taskEnded(context, Disposition.COMPLETED);
      context.group.taskEnded(context.task);
    } else {
      taskEnded(context, Disposition.RUN_FAILED);
      retryTask(context);
    }
}
private void retryTask(EventContext context) {
    Task task = context.task;
    assert task.state == END;
    if (!context.controller.isLive() || !task.retryable()) {
      context.group.taskEnded(task);
      return;
    }
    if (task.tryCount > task.taskGroup.getMaxRetries()) {
      LOG.error(task.getLabel() + " - Too many retries: " + task.tryCount);
      task.disposition = Disposition.TOO_MANY_RETRIES;
      context.group.taskEnded(task);
      return;
    }
    LOG.info(task.getLabel() + " - Retrying task, try " + task.tryCount);
    context.group.taskRetried(task);
    task.reset();
    transition(context, START);
    context.group.enqueuePendingRequest(task);
}
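
The decision made in taskTerminated() and retryTask() can be isolated into a small pure function, which makes the retry budget easy to reason about. In this hypothetical sketch RetryPolicy and its names are invented, task.retryable() is left out, and tryCount is assumed to be incremented once per launch attempt (the excerpt does not show where that happens). Under that assumption, a limit of 2 allows one initial launch plus two retries:

```java
// Hypothetical sketch of the failover decision in taskTerminated()/retryTask().
final class RetryPolicy {
  private RetryPolicy() {}

  enum Outcome { COMPLETED, NOT_RETRIED, TOO_MANY_RETRIES, RETRY }

  // exitStatus: the container's exit code; tryCount: launches so far (assumed).
  static Outcome onTerminated(int exitStatus, int tryCount, int maxRetries, boolean clusterLive) {
    if (exitStatus == 0) {
      return Outcome.COMPLETED;          // normal exit: the task simply ends
    }
    if (!clusterLive) {
      return Outcome.NOT_RETRIED;        // e.g. the cluster is shutting down
    }
    if (tryCount > maxRetries) {
      return Outcome.TOO_MANY_RETRIES;   // give up on this task
    }
    return Outcome.RETRY;                // reset the task and queue it again
  }

  // Simulates a Drillbit that crashes every time and counts the launches made.
  static int launchesUntilGiveUp(int maxRetries) {
    int tryCount = 0;
    Outcome outcome;
    do {
      tryCount++;
      outcome = onTerminated(1, tryCount, maxRetries, true);
    } while (outcome == Outcome.RETRY);
    return tryCount;
  }
}
```

With `tryCount > maxRetries` as the stopping test, an always-crashing Drillbit is launched maxRetries + 1 times; a `>=` test would allow only maxRetries launches in total.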

3. stop

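Besides the start command covered in detail above, Drill-on-YARN also provides a stop command, which comes in two forms:

  1. Forced stop: call the YARN client's killApplication API directly: yarnClient.killApplication(appId);
  2. Graceful stop: first cancel all tasks, both pending and running, then use the YARN API to kill the containers, shut down the controller, and finally unregister the AM to report that it has finished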
...
for (Task task : getStartingTasks()) {
  context.setTask(task);
  context.getState().cancel(context);
}
for (Task task : getActiveTasks()) {
  context.setTask(task);
  context.getState().cancel(context);
}
...
...
context.yarn.killContainer(task.container);
...
public void run() throws YarnFacadeException {
    ...
    boolean success = controller.waitForCompletion();
    ...
    ...
    finish(success, null);
    ...
  }
public boolean waitForCompletion() {
    start();
    synchronized (completionMutex) {
      try {
        completionMutex.wait();
      } catch (InterruptedException e) {
        
      }
    }
    return succeeded();
}
public void finish(boolean succeeded, String msg) throws YarnFacadeException {
    nodeMgr.stop();
    String appMsg = "Drill Cluster Shut-Down";
    FinalApplicationStatus status = FinalApplicationStatus.SUCCEEDED;
    if (!succeeded) {
      appMsg = "Drill Cluster Fatal Error - check logs";
      status = FinalApplicationStatus.FAILED;
    }
    if (msg != null) {
      appMsg = msg;
    }
    try {
      resourceMgr.unregisterApplicationMaster(status, appMsg, "");
    } catch (YarnException | IOException e) {
      throw new YarnFacadeException("Deregister AM failed", e);
    }
    resourceMgr.stop();
  }
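
In the graceful path, the AM's main thread blocks in waitForCompletion() until the controller signals that all tasks have ended, and then calls finish() to unregister from the RM. The excerpt waits on completionMutex with a bare wait(). In general, a bare Object.wait() can wake up spuriously and can miss a notify() issued before it starts waiting, so such code usually waits in a loop on a condition flag; the surrounding code is elided here, so Drill may guard against this elsewhere. A CountDownLatch gives the same hand-off without either pitfall. CompletionSignal below is a hypothetical sketch, not Drill's code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical replacement for the completionMutex wait/notify pair.
final class CompletionSignal {
  private final CountDownLatch done = new CountDownLatch(1);
  private volatile boolean succeeded;

  // Called by the controller once all tasks have ended.
  void complete(boolean success) {
    succeeded = success;
    done.countDown(); // a signal sent before anyone waits is not lost
  }

  // Called by the AM's main thread; false on timeout, interrupt, or failed shutdown.
  boolean await(long timeout, TimeUnit unit) {
    try {
      return done.await(timeout, unit) && succeeded;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

The boolean returned by await() plays the role of the success flag that run() passes to finish(), which maps it to FinalApplicationStatus.SUCCEEDED or FAILED.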

4. resize

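The resize flow is: adjust quantity (the number of containers to keep); the polling thread then adjusts the tasks according to quantity, which carries out the resize. The requested level is capped at the current quantity plus the free node count plus maxExtraNodes: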

public int resize(int level) {
    int limit = quantity + state.getController().getFreeNodeCount() + maxExtraNodes;
    return super.resize(Math.min(limit, level));
}
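
The cap is plain arithmetic: a larger request is clamped to quantity + free nodes + maxExtraNodes, while a request below that limit passes through unchanged. For example, with quantity = 2, one free node, and maxExtraNodes = 0, a request for 10 Drillbits becomes min(3, 10) = 3. A standalone restatement (ResizeCap is hypothetical; the free-node count is taken as an input instead of being read from the controller, and its meaning is assumed from the method name):

```java
// Hypothetical restatement of the resize cap in the method above.
final class ResizeCap {
  private ResizeCap() {}

  // quantity: Drillbits currently wanted; freeNodes: nodes assumed able to take
  // another Drillbit; maxExtraNodes: configured allowance beyond that.
  static int cappedTarget(int quantity, int freeNodes, int maxExtraNodes, int requested) {
    int limit = quantity + freeNodes + maxExtraNodes;
    return Math.min(limit, requested);
  }
}
```

Note that the cap only limits growth; whatever super.resize() does with the result is outside this sketch.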

5. Summary

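Overall, Drill-on-YARN consists of two main parts, drill-on-yarn.sh and drill-am.sh. drill-on-yarn.sh launches the ApplicationMaster, and drill-am.sh (the ApplicationMaster) requests resources from the ResourceManager and starts the Drill cluster. Starting, stopping, shrinking, and expanding Drill are all modeled as tasks: running one of these commands builds a task and puts it on a task queue. A thread continually polls this queue and performs the appropriate operation for each task, which is how the Drill cluster is started, stopped, shrunk, and expanded. In addition, compared with a standalone deployment, the failover provided by Drill-on-YARN makes Drill more robust.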
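
The task-queue design can be reduced to a reconciliation loop: commands such as resize only change a target, and a periodically scheduled tick() compares the target with running and queued Drillbits, queues start tasks or stops Drillbits, and then launches one queued task. This is a hypothetical single-process sketch; ReconcileLoop and its methods are invented, and real launches go through container requests and callbacks as described in section 2.2:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical reconciliation loop: commands change the target, the polling
// thread turns the difference into queued start tasks or stops.
final class ReconcileLoop {
  private int target;
  private int running;
  private int nextId;
  private final Deque<String> pendingStarts = new ArrayDeque<>();

  // What a resize (or start/stop) command does: record the desired size only.
  synchronized void resize(int newTarget) {
    target = Math.max(0, newTarget);
  }

  // One polling step.
  synchronized void tick() {
    int delta = target - (running + pendingStarts.size());
    for (int i = 0; i < delta; i++) {
      pendingStarts.addLast("drillbit-" + (++nextId)); // grow: queue start tasks
    }
    int excess = -delta;
    while (excess > 0 && !pendingStarts.isEmpty()) {
      pendingStarts.removeLast();                        // shrink: drop queued starts first
      excess--;
    }
    if (excess > 0) {
      running -= excess;                                 // then stop running Drillbits
    }
    if (pendingStarts.pollFirst() != null) {
      running++;                                         // "launch" one queued task per tick
    }
  }

  synchronized int running() {
    return running;
  }

  synchronized int pending() {
    return pendingStarts.size();
  }

  // Run tick() on a background thread with a fixed delay between runs.
  ScheduledFuture<?> startPolling(ScheduledExecutorService timer, long periodMs) {
    return timer.scheduleWithFixedDelay(this::tick, 0, periodMs, TimeUnit.MILLISECONDS);
  }
}
```

Because commands only set a target, a resize that arrives while starts are still queued is absorbed by trimming the queue instead of starting Drillbits that would immediately be stopped.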

