Implementing Capacity Scheduler ACLs on an EMR Cluster
Background
The previous post covered how YARN's Capacity Scheduler works and experimented with using it on an EMR cluster to isolate cluster resources and enforce quotas. This post describes how to set up Capacity Scheduler ACLs on an EMR cluster.
Why do this? The previous setup split the cluster's resources into several queues, each with its own capacity share and scheduling priority. If every tenant plays by the rules and submits jobs only to its own queue, there is no problem. But a user who understands how the Capacity Scheduler works can just as easily submit into another team's queue and take over its resources. That is what the Capacity Scheduler ACL settings are for.
Key parameters
- yarn.scheduler.capacity.queue-mappings
Specifies the mapping between users and queues. With a mapping in place, a user's jobs go to the mapped queue by default, without the queue having to be passed as a parameter, which is convenient. The format is:
[u|g]:[name]:[queue_name][,next mapping]*
- yarn.scheduler.capacity.root.{queue-path}.acl_administer_queue
Specifies who can administer the jobs in this queue. The upstream description reads "The ACL of who can administer jobs on the queue." An asterisk (*) means everyone; a single space means nobody.
- yarn.scheduler.capacity.root.{queue-path}.acl_submit_applications
Specifies who can submit jobs to this queue. The upstream description reads "The ACL of who can submit jobs to the queue." An asterisk (*) means everyone; a single space means nobody. The value format is illustrated in the sketch right after this list.
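For reference, an ACL value is a comma-separated list of users, optionally followed by a space and a comma-separated list of groups. A minimal sketch (the users alice and bob and the group analysts are hypothetical names, not part of the configuration used later in this post):
<property>
<name>yarn.scheduler.capacity.root.a.acl_submit_applications</name>
<!-- hypothetical example: alice, bob, and members of the analysts group may submit to queue a -->
<value>alice,bob analysts</value>
</property>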
Step-by-step on an EMR cluster
- Create the EMR cluster.
- Modify the related configuration to enable queue ACLs (the corresponding XML entries are sketched after this list):
- yarn-site: yarn.acl.enable=true
- mapred-site: mapreduce.cluster.acls.enabled=true
- hdfs-site: dfs.permissions.enabled=true
This one has nothing to do with the Capacity Scheduler queue ACLs; it controls HDFS permissions and is simply set here along the way.
- mapred-site: mapreduce.job.acl-view-job=*
If dfs.permissions.enabled=true is set, this one is needed as well, otherwise job details cannot be viewed in the Hadoop web UI.
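For reference, if these settings are written straight into the site files rather than through the EMR console, the entries would look roughly like this (same values as in the list above; placing mapreduce.job.acl-view-job in mapred-site.xml is an assumption, since it is a MapReduce job property):
<!-- yarn-site.xml -->
<property>
<name>yarn.acl.enable</name>
<value>true</value>
</property>
<!-- mapred-site.xml -->
<property>
<name>mapreduce.cluster.acls.enabled</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.acl-view-job</name>
<value>*</value>
</property>
<!-- hdfs-site.xml -->
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>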
- Restart YARN and HDFS so the configuration takes effect (run as root):
su -l hdfs -c '/usr/lib/hadoop-current/sbin/stop-dfs.sh'
su -l hadoop -c '/usr/lib/hadoop-current/sbin/stop-yarn.sh'
su -l hdfs -c '/usr/lib/hadoop-current/sbin/start-dfs.sh'
su -l hadoop -c '/usr/lib/hadoop-current/sbin/start-yarn.sh'
su -l hadoop -c '/usr/lib/hadoop-current/sbin/yarn-daemon.sh start proxyserver'
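A quick sanity check afterwards (illustrative; assumes jps is on each account's PATH and that the daemons listed run under these accounts on this node):
su -l hdfs -c 'jps'    # expect NameNode here, plus DataNode on nodes that host one
su -l hadoop -c 'jps'  # expect ResourceManager and WebAppProxyServer, plus NodeManager on worker nodes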
- Modify the Capacity Scheduler configuration.
The full configuration:
<configuration>
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>10000</value>
<description>
Maximum number of applications that can be pending and running.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.25</value>
<description>
Maximum percent of resources in the cluster which can be used to run
application masters i.e. controls number of concurrent running
applications.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
<description>
The ResourceCalculator implementation to be used to compare
Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare
multi-dimensional resources such as Memory, CPU etc.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>a,b,default</value>
<description>
The queues at this level (root is the root queue).
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>20</value>
<description>Default queue target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.capacity</name>
<value>30</value>
<description>Queue a target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.b.capacity</name>
<value>50</value>
<description>Queue b target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
<value>1</value>
<description>
Default queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>100</value>
<description>
The maximum capacity of the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.state</name>
<value>RUNNING</value>
<description>
The state of the default queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.state</name>
<value>RUNNING</value>
<description>
The state of queue a. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.b.state</name>
<value>RUNNING</value>
<description>
The state of queue b. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.acl_submit_applications</name>
<value> </value>
<description>
The ACL of who can submit jobs to the root queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.acl_submit_applications</name>
<value>root</value>
<description>
The ACL of who can submit jobs to queue a.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.b.acl_submit_applications</name>
<value>hadoop</value>
<description>
The ACL of who can submit jobs to queue b.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
<value>root</value>
<description>
The ACL of who can submit jobs to the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.acl_administer_queue</name>
<value> </value>
<description>
The ACL of who can administer jobs on the root queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
<value>root</value>
<description>
The ACL of who can administer jobs on the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.acl_administer_queue</name>
<value>root</value>
<description>
The ACL of who can administer jobs on queue a.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.b.acl_administer_queue</name>
<value>root</value>
<description>
The ACL of who can administer jobs on queue b.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.node-locality-delay</name>
<value>40</value>
<description>
Number of missed scheduling opportunities after which the CapacityScheduler
attempts to schedule rack-local containers.
Typically this should be set to the number of nodes in the cluster. By default it is set to 40, approximately the number of nodes in one rack.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.queue-mappings</name>
<value>u:hadoop:b,u:root:a</value>
</property>
<property>
<name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
<value>false</value>
<description>
If a queue mapping is present, will it override the value specified
by the user? This can be used by administrators to place jobs in queues
that are different than the one specified by the user.
The default is false.
</description>
</property>
</configuration>
The configuration above defines three queues with their capacity shares. User hadoop submits to queue b by default (when no queue is specified) and root submits to queue a by default. In addition, hadoop can only submit jobs to queue b, root can submit jobs to every queue, and no other user has permission to submit jobs at all.
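To double-check this on a live cluster, the modified capacity-scheduler.xml can be picked up without another full restart, and the resulting ACLs inspected and exercised. A sketch using standard Hadoop CLI commands (the examples jar path is an assumption about the EMR image and may differ):
# reload the queue configuration, run as a YARN admin user
su -l hadoop -c 'yarn rmadmin -refreshQueues'
# show, per queue, whether the current user may submit or administer
su -l root -c 'mapred queue -showacls'
su -l hadoop -c 'mapred queue -showacls'
# as hadoop, an explicit submit to queue a should fail with an AccessControlException
su -l hadoop -c 'hadoop jar /usr/lib/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi -Dmapreduce.job.queuename=a 2 10'
# as hadoop, with no queue given, the job should land in queue b via the u:hadoop:b mapping
su -l hadoop -c 'hadoop jar /usr/lib/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10'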
Pitfalls encountered
- The acl_administer_queue setting
The configuration supports ACLs for two operations: acl_administer_queue and acl_submit_applications. Going by the semantics, controlling who may submit jobs should only require setting the queue's acl_submit_applications property, and the documentation reads that way too. In practice that is not the case: anyone with administer permission on the queue can also submit jobs to it. This took a long time to track down and only became clear from the source code:
@Override
public void submitApplication(ApplicationId applicationId, String userName,
    String queue) throws AccessControlException {
  // Careful! Locking order is important!
  // Check queue ACLs
  UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(userName);
  if (!hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi)
      && !hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
    throw new AccessControlException("User " + userName + " cannot submit" +
        " applications to queue " + getQueuePath());
  }
  // ... (rest of the method omitted)
}
- The root queue configuration
To restrict users' permissions on a queue, the root queue must be configured as well; configuring only the leaf queues is not enough. The permission check walks up the hierarchy, and a permission granted on a parent queue takes precedence, as the code comment puts it:
// recursively look up the queue to see if parent queue has the permission
This is also not what most people would expect, so the root queue's permissions have to be locked down first, otherwise the ACLs configured on the child queues have no effect at all:
<property>
<name>yarn.scheduler.capacity.root.acl_submit_applications</name>
<value> </value>
<description>
The ACL of who can submit jobs to the root queue.
</description>
</property>