ACL implementation for the capacity scheduler on EMR clusters

Posted by 梅熙 on 2017-05-15

Original URL: https://flycode.co/archives/112557

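Background

The previous article introduced how YARN's capacity scheduler works and experimented with using it on an EMR cluster to isolate cluster resources and enforce quotas. This article covers how to set up ACLs for the capacity scheduler on an EMR cluster.

Why do this? In the previous article, the cluster's resources were divided into several queues, each with its own resource share and job scheduling priority. If every tenant follows the agreement and submits jobs only to its own queue, there is no problem. But a user who knows how the capacity scheduler works can still take over the queue resources of another group. That is why the capacity scheduler has ACL settings.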

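Key parameters

yarn.scheduler.capacity.queue-mappings
- Maps users and groups to queues. With a mapping in place, a user's job goes to the mapped queue by default, without having to pass a queue parameter, which is convenient. The format is: [u|g]:[name]:[queue_name][,next mapping]*
yarn.scheduler.capacity.root.{queue-path}.acl_administer_queue
- Specifies who can administer the jobs in this queue. The English description is "The ACL of who can administer jobs on the default queue." An asterisk (*) means everyone, and a single space means no one.
yarn.scheduler.capacity.root.{queue-path}.acl_submit_applications
- Specifies who can submit jobs to this queue. The English description is "The ACL of who can submit jobs to the default queue." An asterisk (*) means everyone, and a single space means no one.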
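To make the queue-mappings format concrete, here is a minimal Java sketch that parses a value such as u:hadoop:b,u:root:a and picks a queue for a user. The class QueueMappingSketch and its methods are hypothetical names invented for this illustration and are not part of Hadoop. The sketch assumes that entries are evaluated left to right with the first match winning, that only literal user and group names appear, and that the caller supplies the fallback queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Hadoop.
class QueueMappingSketch {

    /** One entry of the form [u|g]:[name]:[queue_name]. */
    static final class Mapping {
        final String type;   // "u" for a user mapping, "g" for a group mapping
        final String name;
        final String queue;

        Mapping(String type, String name, String queue) {
            this.type = type;
            this.name = name;
            this.queue = queue;
        }
    }

    /** Splits the comma-separated value into entries and validates each one. */
    static List<Mapping> parse(String value) {
        List<Mapping> mappings = new ArrayList<>();
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split(":");
            boolean validType = parts.length == 3
                    && (parts[0].equals("u") || parts[0].equals("g"));
            if (!validType || parts[1].isEmpty() || parts[2].isEmpty()) {
                throw new IllegalArgumentException("Bad mapping entry: " + entry);
            }
            mappings.add(new Mapping(parts[0], parts[1], parts[2]));
        }
        return mappings;
    }

    /** Returns the queue of the first matching entry, or fallbackQueue if none matches. */
    static String resolve(List<Mapping> mappings, String user,
                          Set<String> groups, String fallbackQueue) {
        for (Mapping m : mappings) {
            boolean userMatch = m.type.equals("u") && m.name.equals(user);
            boolean groupMatch = m.type.equals("g") && groups.contains(m.name);
            if (userMatch || groupMatch) {
                return m.queue;
            }
        }
        return fallbackQueue;
    }

    public static void main(String[] args) {
        List<Mapping> mappings = parse("u:hadoop:b,u:root:a");
        Set<String> noGroups = Collections.emptySet();
        System.out.println(resolve(mappings, "hadoop", noGroups, "default")); // b
        System.out.println(resolve(mappings, "root", noGroups, "default"));   // a
        System.out.println(resolve(mappings, "alice", noGroups, "default"));  // default
    }
}
```

Note that a mapping only decides where a job lands when no queue is given. Whether the user is allowed to submit there is decided separately by the two ACL properties.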

Steps on the EMR cluster

Create the EMR cluster
Modify the relevant configuration to support queue ACLs
- yarn-site: yarn.acl.enable=true
- mapred-site: mapreduce.cluster.acls.enabled=true
- hdfs-site: dfs.permissions.enabled=true. This has nothing to do with capacity scheduler queue ACLs; it controls HDFS permissions and is set here along with the others
- hdfs-site: mapreduce.job.acl-view-job=*. If dfs.permissions.enabled=true is configured, this needs to be set as well, otherwise job information cannot be viewed in the Hadoop UI
Restart YARN and HDFS for the configuration to take effect (as the root account)
- su -l hdfs -c '/usr/lib/hadoop-current/sbin/stop-dfs.sh'
- su -l hadoop -c '/usr/lib/hadoop-current/sbin/stop-yarn.sh'
- su -l hdfs -c '/usr/lib/hadoop-current/sbin/start-dfs.sh'
- su -l hadoop -c '/usr/lib/hadoop-current/sbin/start-yarn.sh'
- su -l hadoop -c '/usr/lib/hadoop-current/sbin/yarn-daemon.sh start proxyserver'
Modify the capacity scheduler configuration
Full configuration

<configuration>
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
      Maximum number of applications that can be pending and running.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.25</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run
      application masters i.e. controls number of concurrent running
      applications.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description>
      The ResourceCalculator implementation to be used to compare
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare
      multi-dimensional resources such as Memory, CPU etc.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>a,b,default</value>
    <description>
      The queues at this level (root is the root queue).
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>20</value>
    <description>Default queue target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.a.capacity</name>
    <value>30</value>
    <description>Queue a target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.b.capacity</name>
    <value>50</value>
    <description>Queue b target capacity.</description>
  </property>


  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
    <description>
      The maximum capacity of the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.a.state</name>
    <value>RUNNING</value>
    <description>
      The state of queue a. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.b.state</name>
    <value>RUNNING</value>
    <description>
      The state of queue b. State can be one of RUNNING or STOPPED.
    </description>
  </property>


  <property>
    <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
    <value> </value>
    <description>
      The ACL of who can submit jobs to the root queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.a.acl_submit_applications</name>
    <value>root</value>
    <description>
      The ACL of who can submit jobs to queue a.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.b.acl_submit_applications</name>
    <value>hadoop</value>
    <description>
      The ACL of who can submit jobs to queue b.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>root</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
    <value> </value>
    <description>
      The ACL of who can administer jobs on the root queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>root</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.a.acl_administer_queue</name>
    <value>root</value>
    <description>
      The ACL of who can administer jobs on queue a.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.b.acl_administer_queue</name>
    <value>root</value>
    <description>
      The ACL of who can administer jobs on queue b.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler
      attempts to schedule rack-local containers.
      Typically this should be set to the number of nodes in the cluster. By default it is set to
      approximately the number of nodes in one rack, which is 40.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value>u:hadoop:b,u:root:a</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
  </property>

</configuration>

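The configuration above defines three queues with their resource shares. By default (when no queue is specified), user hadoop submits to queue b and root submits to queue a. In addition, hadoop can submit jobs only to queue b, root can submit jobs to every queue, and other users have no permission to submit jobs.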

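Pitfalls

Configuring acl_administer_queue
- The configuration supports ACLs for two operations, acl_administer_queue and acl_submit_applications. Going by the names, controlling who can submit jobs should only require setting the queue's acl_submit_applications property, and the documentation reads the same way. In practice that is not the case: anyone with administer permission on a queue can also submit jobs to it. This took a long time to track down, and the answer only turned up in the source code.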

  @Override
  public void submitApplication(ApplicationId applicationId, String userName,
      String queue)  throws AccessControlException {
    // Careful! Locking order is important!

    // Check queue ACLs
    UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(userName);
    if (!hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi)
        && !hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
      throw new AccessControlException("User " + userName + " cannot submit" +
          " applications to queue " + getQueuePath());
    }
    // ... (remainder of submitApplication omitted)
  }

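Configuring the root queue
- To restrict users' permissions on queues, the root queue must be configured; setting only the leaf queues is not enough. Permissions granted on the root queue take precedence: a comment in the code says "// recursively look up the queue to see if parent queue has the permission". This differs from what most people would expect. So the root queue's permissions have to be locked down first, otherwise the permissions configured on the child queues have no effect at all.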

<property>
  <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
  <value> </value>
  <description>
    The ACL of who can submit jobs to the root queue.
  </description>
</property>
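The two pitfalls can be reproduced with a small model. The Java sketch below is a simplified illustration, not the Hadoop implementation: QueueAclSketch and its methods are hypothetical names, only the user part of an ACL string is evaluated (groups are ignored), and an ACL that is not set is assumed to behave like "*". Under those assumptions it shows that a grant on any ancestor queue is enough, and that administer permission also allows submitting.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Simplified model for illustration; not the Hadoop implementation.
class QueueAclSketch {
    // Keys look like "root.a#submit" or "root.a#admin"; values are ACL strings.
    private final Map<String, String> acls = new HashMap<>();

    void set(String queuePath, String op, String acl) {
        acls.put(queuePath + "#" + op, acl);
    }

    /**
     * "*" allows everyone and " " allows no one. Otherwise the text before the
     * first space is read as a comma-separated user list (group part ignored).
     */
    static boolean aclAllows(String acl, String user) {
        if (acl.trim().equals("*")) return true;
        String userPart = acl.split(" ", 2)[0];
        return !userPart.isEmpty() && Arrays.asList(userPart.split(",")).contains(user);
    }

    /** Walks from the queue up to root; a grant at any level is enough. */
    boolean hasAccess(String queuePath, String op, String user) {
        String path = queuePath;
        while (true) {
            // Assumption of this sketch: an ACL that is not set behaves like "*".
            String acl = acls.getOrDefault(path + "#" + op, "*");
            if (aclAllows(acl, user)) return true;
            int dot = path.lastIndexOf('.');
            if (dot < 0) return false; // reached root without finding a grant
            path = path.substring(0, dot);
        }
    }

    /** Submit is allowed with either the submit ACL or the administer ACL. */
    boolean canSubmit(String queuePath, String user) {
        return hasAccess(queuePath, "submit", user) || hasAccess(queuePath, "admin", user);
    }

    /** Builds the leaf-queue ACLs from the article; optionally locks the root queue. */
    static QueueAclSketch articleConfig(boolean lockRoot) {
        QueueAclSketch s = new QueueAclSketch();
        if (lockRoot) {
            s.set("root", "submit", " ");
            s.set("root", "admin", " ");
        }
        s.set("root.a", "submit", "root");
        s.set("root.b", "submit", "hadoop");
        s.set("root.default", "submit", "root");
        for (String q : new String[] {"root.a", "root.b", "root.default"}) {
            s.set(q, "admin", "root");
        }
        return s;
    }

    public static void main(String[] args) {
        QueueAclSketch locked = articleConfig(true);
        System.out.println(locked.canSubmit("root.b", "hadoop")); // true
        System.out.println(locked.canSubmit("root.a", "hadoop")); // false
        System.out.println(locked.canSubmit("root.b", "root"));   // true, via administer
        QueueAclSketch open = articleConfig(false);
        System.out.println(open.canSubmit("root.a", "hadoop"));   // true, root is open
    }
}
```

With the root queue locked, hadoop is confined to queue b while root still reaches queue b through its administer ACL. With the root ACLs left unset, the leaf-queue restrictions stop mattering in this model, which matches the behavior described above.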
