ACL implementation for the capacity scheduler on EMR clusters

Posted by 梅熙 on 2017-05-15

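Background

The previous article introduced how the YARN capacity scheduler works and experimented with using it on an EMR cluster to isolate cluster resources and enforce quotas. This article covers the ACL implementation of the capacity scheduler on an EMR cluster.

Why do this? Earlier, the cluster's resources were divided into several queues, each with its own capacity share and scheduling priority. If every tenant follows the agreement and submits jobs only to its own queue, there is no problem. But a user who knows how the capacity scheduler works can still submit to another group's queue and take its resources. That is why the capacity scheduler has ACL settings.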

Key parameters

  • yarn.scheduler.capacity.queue-mappings

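    • Specifies the mapping between users and queues. A mapped user's jobs go straight to the corresponding queue by default, without the user passing a queue parameter, which is convenient. The format is: [u|g]:[name]:[queue_name][,next mapping]*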
  • yarn.scheduler.capacity.root.{queue-path}.acl_administer_queue

    • Specifies who can administer the jobs in this queue. The English description is "The ACL of who can administer jobs on the default queue." An asterisk * means all users; a single space means none.
  • yarn.scheduler.capacity.root.{queue-path}.acl_submit_applications

    • Specifies who can submit jobs to this queue. The English description is "The ACL of who can submit jobs to the queue." An asterisk * means all users; a single space means none.
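To make the queue-mappings format concrete, here is a minimal sketch that splits a value such as u:hadoop:b,u:root:a into its entries. The class QueueMappingSketch and its Mapping type are hypothetical names made up for illustration; this is not Hadoop's own parser, only a reading of the format string quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): parses the
// [u|g]:[name]:[queue_name][,next mapping]* format.
class QueueMappingSketch {

    // One entry of the form [u|g]:[name]:[queue_name].
    static final class Mapping {
        final boolean isUser; // true for "u" (user), false for "g" (group)
        final String name;
        final String queue;

        Mapping(boolean isUser, String name, String queue) {
            this.isUser = isUser;
            this.name = name;
            this.queue = queue;
        }
    }

    static List<Mapping> parse(String value) {
        List<Mapping> result = new ArrayList<>();
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split(":");
            boolean validType = parts.length == 3
                    && (parts[0].equals("u") || parts[0].equals("g"));
            if (!validType) {
                throw new IllegalArgumentException("Bad mapping: " + entry);
            }
            result.add(new Mapping(parts[0].equals("u"), parts[1], parts[2]));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Mapping m : parse("u:hadoop:b,u:root:a")) {
            System.out.println((m.isUser ? "user " : "group ") + m.name + " -> " + m.queue);
        }
    }
}
```

Run against the value used later in this article, it prints one line per mapping: user hadoop goes to queue b and user root goes to queue a.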

Steps on an EMR cluster

  • Create an EMR cluster
  • Modify the following configuration to enable queue ACLs

    • yarn-site: yarn.acl.enable=true
    • mapred-site: mapreduce.cluster.acls.enabled=true
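    • hdfs-site: dfs.permissions.enabled=true. This has nothing to do with capacity scheduler queue ACLs; it controls HDFS permission checking and is simply set here at the same time.
    • hdfs-site: mapreduce.job.acl-view-job=*. If dfs.permissions.enabled=true is configured, this needs to be set as well; otherwise job information cannot be viewed in the Hadoop UI. (In stock Hadoop this is a MapReduce property documented in mapred-default.xml.)
  • Restart YARN and HDFS so the configuration takes effect (as the root account)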

    • su -l hdfs -c '/usr/lib/hadoop-current/sbin/stop-dfs.sh'
    • su -l hadoop -c '/usr/lib/hadoop-current/sbin/stop-yarn.sh'
    • su -l hdfs -c '/usr/lib/hadoop-current/sbin/start-dfs.sh'
    • su -l hadoop -c '/usr/lib/hadoop-current/sbin/start-yarn.sh'
    • su -l hadoop -c '/usr/lib/hadoop-current/sbin/yarn-daemon.sh start proxyserver'
  • Modify the capacity scheduler configuration
    Full configuration:
<configuration>
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
      Maximum number of applications that can be pending and running.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.25</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run
      application masters i.e. controls number of concurrent running
      applications.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description>
      The ResourceCalculator implementation to be used to compare
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare
      multi-dimensional resources such as Memory, CPU etc.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>a,b,default</value>
    <description>
      The queues at this level (root is the root queue).
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>20</value>
    <description>Default queue target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.a.capacity</name>
    <value>30</value>
    <description>Queue a target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.b.capacity</name>
    <value>50</value>
    <description>Queue b target capacity.</description>
  </property>


  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
    <description>
      The maximum capacity of the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.a.state</name>
    <value>RUNNING</value>
    <description>
      The state of queue a. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.b.state</name>
    <value>RUNNING</value>
    <description>
      The state of queue b. State can be one of RUNNING or STOPPED.
    </description>
  </property>


  <property>
    <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
    <value> </value>
    <description>
      The ACL of who can submit jobs to the root queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.a.acl_submit_applications</name>
    <value>root</value>
    <description>
      The ACL of who can submit jobs to queue a.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.b.acl_submit_applications</name>
    <value>hadoop</value>
    <description>
      The ACL of who can submit jobs to queue b.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>root</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
    <value> </value>
    <description>
      The ACL of who can administer jobs on the root queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>root</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.a.acl_administer_queue</name>
    <value>root</value>
    <description>
      The ACL of who can administer jobs on queue a.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.b.acl_administer_queue</name>
    <value>root</value>
    <description>
      The ACL of who can administer jobs on queue b.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler
      attempts to schedule rack-local containers.
      Typically this should be set to number of nodes in the cluster, By default is setting
      approximately number of nodes in one rack which is 40.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value>u:hadoop:b,u:root:a</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
  </property>

</configuration>

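The configuration above defines three queues and their capacity shares. User hadoop submits to queue b by default (when no queue is specified), and root submits to queue a by default. In addition, hadoop may submit jobs only to queue b, while root may submit to every queue. No other user has permission to submit jobs.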
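The default placement described above can be sketched as a small lookup. This is a simplified model under stated assumptions, not the scheduler's code: the class QueuePlacementSketch and the method resolveQueue are hypothetical, a null requestedQueue stands for "no queue specified", and only user mappings are modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model (not Hadoop code) of where a job lands under the
// queue-mappings and queue-mappings-override.enable settings shown above.
class QueuePlacementSketch {

    // user -> queue, copied from the value u:hadoop:b,u:root:a
    static final Map<String, String> USER_MAPPINGS = new LinkedHashMap<>();
    static {
        USER_MAPPINGS.put("hadoop", "b");
        USER_MAPPINGS.put("root", "a");
    }

    // requestedQueue == null means the user did not name a queue (assumption of this sketch).
    static String resolveQueue(String user, String requestedQueue, boolean overrideEnabled) {
        String mapped = USER_MAPPINGS.get(user);
        // The mapping applies when no queue was requested, or when override is enabled.
        if (mapped != null && (requestedQueue == null || overrideEnabled)) {
            return mapped;
        }
        return requestedQueue != null ? requestedQueue : "default";
    }

    public static void main(String[] args) {
        System.out.println(resolveQueue("hadoop", null, false)); // b
        System.out.println(resolveQueue("hadoop", "a", false));  // a
        System.out.println(resolveQueue("hadoop", "a", true));   // b
        System.out.println(resolveQueue("alice", null, false));  // default
    }
}
```

Placement and authorization are separate steps: with override disabled, hadoop can still name queue a explicitly, and it is the ACL check on that queue that rejects the submission.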

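Pitfalls

  • Configuring acl_administer_queue

    • The configuration supports ACLs for two operations: acl_administer_queue and acl_submit_applications. Semantically, controlling who can submit jobs should only require setting the queue's acl_submit_applications property, and the documentation reads that way too. In practice that is not the case: anyone with the administer permission can also submit jobs. This took a long time to track down and was only found by reading the source code: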
  @Override
  public void submitApplication(ApplicationId applicationId, String userName,
      String queue)  throws AccessControlException {
    // Careful! Locking order is important!

    // Check queue ACLs
    UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(userName);
    if (!hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi)
        && !hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
      throw new AccessControlException("User " + userName + " cannot submit" +
          " applications to queue " + getQueuePath());
    }
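  • Configuring the root queue

    • To restrict users' permissions on queues, the root queue must be configured too; setting only the leaf queues is not enough. A permission granted on a parent queue also applies to its children, as a comment in the code says: // recursively look up the queue to see if parent queue has the permission. This also differs from what most people would expect. So the root queue's permissions must be locked down first, otherwise the permissions configured on the child queues have no effect at all.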
<property>
  <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
  <value> </value>
  <description>
    The ACL of who can submit jobs to the root queue.
  </description>
</property>
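Both pitfalls can be reproduced with a small model of the check. The sketch below is an illustration, not Hadoop's implementation: the class QueueAclSketch and its methods are hypothetical, only the user part of an ACL value is interpreted (group lists are ignored), and a queue with no ACL entry is treated as a single space, which is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not Hadoop code) of hierarchical queue ACL checks.
class QueueAclSketch {
    final Map<String, String> submitAcl = new HashMap<>();
    final Map<String, String> adminAcl = new HashMap<>();

    // "*" allows everyone, " " allows no one, otherwise a comma-separated user list.
    static boolean aclAllows(String acl, String user) {
        if (acl.trim().equals("*")) {
            return true;
        }
        String users = acl.split(" ", 2)[0]; // text before the first space is the user list
        return Arrays.asList(users.split(",")).contains(user);
    }

    // Checks the queue itself, then walks up to its parents (pitfall 2).
    static boolean hasAccess(Map<String, String> acls, String queuePath, String user) {
        if (aclAllows(acls.getOrDefault(queuePath, " "), user)) {
            return true;
        }
        int dot = queuePath.lastIndexOf('.');
        return dot > 0 && hasAccess(acls, queuePath.substring(0, dot), user);
    }

    // Submission succeeds with either the submit or the administer permission (pitfall 1).
    boolean canSubmit(String queuePath, String user) {
        return hasAccess(submitAcl, queuePath, user) || hasAccess(adminAcl, queuePath, user);
    }

    // ACL values from the full configuration above, with the root submit ACL as a parameter.
    static QueueAclSketch articleConfig(String rootSubmitAcl) {
        QueueAclSketch s = new QueueAclSketch();
        s.submitAcl.put("root", rootSubmitAcl);
        s.submitAcl.put("root.a", "root");
        s.submitAcl.put("root.b", "hadoop");
        s.submitAcl.put("root.default", "root");
        s.adminAcl.put("root", " ");
        s.adminAcl.put("root.a", "root");
        s.adminAcl.put("root.b", "root");
        s.adminAcl.put("root.default", "root");
        return s;
    }

    public static void main(String[] args) {
        QueueAclSketch locked = articleConfig(" ");
        System.out.println(locked.canSubmit("root.b", "hadoop")); // true
        System.out.println(locked.canSubmit("root.a", "hadoop")); // false
        System.out.println(locked.canSubmit("root.b", "root"));   // true, via the administer ACL
        QueueAclSketch open = articleConfig("*");
        System.out.println(open.canSubmit("root.a", "hadoop"));   // true, inherited from root
    }
}
```

With the root submit ACL set to a single space, hadoop is confined to queue b. Changing only that one value to * lets hadoop into queue a even though the leaf ACL lists root alone, which is why the root queue has to be restricted first.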

