Hadoop 2.7 in Practice v1.0: YARN HA

Posted by hackeruncle on 2016-03-06

YARN HA in Practice v1.0

Current environment: hadoop + zookeeper (namenode and resourcemanager HA)

resourcemanager       serviceId   init status
sht-sgmhadoopnn-01    rm1         active
sht-sgmhadoopnn-02    rm2         standby

References:
http://blog.csdn.net/u011414200/article/details/50336735

http://blog.csdn.net/u011414200/article/details/50276257

I. Check whether a resourcemanager is active or standby

1. Open the web page

 

2. Check the resourcemanager logs

    [root@sht-sgmhadoopnn-01 logs]# more yarn-root-resourcemanager-sht-sgmhadoopnn-01.telenav.cn.log
    …………………..
    2016-03-03 18:10:01,289 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state

    [root@sht-sgmhadoopnn-02 logs]# more yarn-root-resourcemanager-sht-sgmhadoopnn-02.telenav.cn.log
    …………………..
    2016-03-03 18:10:34,250 INFO org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher: YARN system metrics publishing service is not enabled
    2016-03-03 18:10:34,250 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state

3. yarn rmadmin -getServiceState


    ### $HADOOP_HOME/etc/hadoop/yarn-site.xml
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    [root@sht-sgmhadoopnn-02 logs]# yarn rmadmin -getServiceState rm1
    active
    [root@sht-sgmhadoopnn-02 logs]# yarn rmadmin -getServiceState rm2
    standby
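
The two state checks above can be wrapped in a small loop. The following is only a minimal sketch: it assumes the `yarn` command is on the PATH and that the service IDs are the `rm1,rm2` values from `yarn.resourcemanager.ha.rm-ids`. The function name `rm_states` and the `unknown` label are made up for illustration and are not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print "<serviceId> <state>"
# for every ID in a comma-separated list. Assumes `yarn` is on the PATH.
rm_states() {
    ids="${1:-rm1,rm2}"   # default: the rm-ids configured in yarn-site.xml
    for id in $(echo "$ids" | tr ',' ' '); do
        # Keep only a line that is exactly "active" or "standby"; log
        # warnings and connection errors are filtered out by grep.
        state="$(yarn rmadmin -getServiceState "$id" 2>&1 \
            | grep -x -E 'active|standby' | tail -n 1)"
        # An empty result (for example, RM not reachable) is shown as "unknown".
        echo "$id ${state:-unknown}"
    done
}
```

Calling `rm_states` with no argument queries rm1 and rm2; a different list can be passed as `rm_states rm1,rm2,rm3`.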

II. Commands

    [root@sht-sgmhadoopnn-01 bin]# yarn --help
    Usage: yarn [--config confdir] [COMMAND | CLASSNAME]
      CLASSNAME                             run the class named CLASSNAME
     or
      where COMMAND is one of:
      resourcemanager -format-state-store   deletes the RMStateStore
      resourcemanager                       run the ResourceManager
      nodemanager                           run a nodemanager on each slave
      timelineserver                        run the timeline server
      rmadmin                               admin tools
      sharedcachemanager                    run the SharedCacheManager daemon
      scmadmin                              SharedCacheManager admin tools
      version                               print the version
      jar <jar>                             run a jar file
      application                           prints application(s)
                                            report/kill application
      applicationattempt                    prints applicationattempt(s)
                                            report
      container                             prints container(s) report
      node                                  prints node report(s)
      queue                                 prints queue information
      logs                                  dump container logs
      classpath                             prints the class path needed to
                                            get the Hadoop jar and the
                                            required libraries
      cluster                               prints cluster information
      daemonlog                             get/set the log level for each
                                            daemon

    ###########################################################################
    [root@sht-sgmhadoopnn-01 bin]# yarn rmadmin --help
    -help: Unknown command
    Usage: yarn rmadmin
       -refreshQueues
       -refreshNodes
       -refreshSuperUserGroupsConfiguration
       -refreshUserToGroupsMappings
       -refreshAdminAcls
       -refreshServiceAcl
       -getGroups [username]
       -addToClusterNodeLabels [label1,label2,label3] (label splitted by ",")
       -removeFromClusterNodeLabels [label1,label2,label3] (label splitted by ",")
       -replaceLabelsOnNode [node1[:port]=label1,label2 node2[:port]=label1,label2]
       -directlyAccessNodeLabelStore
       -transitionToActive [--forceactive] <serviceId>
       -transitionToStandby <serviceId>
       -failover [--forcefence] [--forceactive] <serviceId> <serviceId>
       -getServiceState <serviceId>
       -checkHealth <serviceId>
       -help [cmd]
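
transitionToActive and transitionToStandby switch a resourcemanager between the active and standby states.

failover initiates a failover. The command switches from a failed resourcemanager to the other one (it is not supported in an environment with automatic failover enabled).

getServiceState returns the current state of a resourcemanager.

checkHealth checks the health of a resourcemanager. It returns 0 if the resourcemanager is healthy and a non-zero value otherwise.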
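
Because checkHealth reports its result through the exit status, it fits naturally into a shell condition. The sketch below assumes `yarn` is on the PATH; the function name `rm_health` and the "healthy"/"unhealthy" labels are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): run -checkHealth for each
# service ID given as an argument and report the result.
# Returns 0 only if every resourcemanager passed the check.
rm_health() {
    rc=0
    for id in "$@"; do
        # Only the exit status matters here, so all output is discarded.
        if yarn rmadmin -checkHealth "$id" >/dev/null 2>&1; then
            echo "$id healthy"
        else
            echo "$id unhealthy"
            rc=1
        fi
    done
    return "$rc"
}
```

For the cluster in this post it could be called as `rm_health rm1 rm2`, for example from a cron job that sends an alert when the function returns non-zero.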

III. Experiments

1. Test manual failover of YARN (failed)

    [root@sht-sgmhadoopnn-01 ~]# yarn rmadmin -failover --forceactive rm1 rm2
    forcefence and forceactive flags not supported with auto-failover enabled.

# yarn.resourcemanager.ha.automatic-failover.enabled is set to true in yarn-site.xml, so the command reports that a manual failover is not possible.
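
Before attempting a manual failover, the setting can be read straight from yarn-site.xml. This is a rough sketch, not a real XML parser: it assumes each `<value>` sits on the same line as its `<name>` or on the line directly after it, as in the yarn-site.xml excerpt earlier in this post. `get_prop` is a hypothetical helper, not a Hadoop tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print the <value> of a property
# from a Hadoop XML config file.
# Usage: get_prop <file> <property-name>
get_prop() {
    # -F: match the name literally; -A 1: also print the following line,
    # where the <value> element is assumed to be.
    grep -F -A 1 "<name>$2</name>" "$1" \
        | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
        | head -n 1
}
```

A call such as `get_prop "$HADOOP_HOME/etc/hadoop/yarn-site.xml" yarn.resourcemanager.ha.automatic-failover.enabled` prints the configured value. An empty result only means the file does not set the property, in which case Hadoop's default applies.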

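2. Test automatic failover of YARN (succeeded)

  a. On the machine running the active resourcemanager, find the resourcemanager process ID with jps and kill the process with kill -9. Then check whether the other resourcemanager node changes from standby to active.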

    [root@sht-sgmhadoopnn-01 ~]# yarn rmadmin -getServiceState rm1
    active
    [root@sht-sgmhadoopnn-01 ~]# yarn rmadmin -getServiceState rm2
    standby

    [root@sht-sgmhadoopnn-01 bin]# jps
    2583 Jps
    10162 DFSZKFailoverController
    28432 ResourceManager
    21679 NameNode

    [root@sht-sgmhadoopnn-01 ~]# kill -9 28432

    [root@sht-sgmhadoopnn-02 bin]# jps
    19147 ResourceManager
    17837 NameNode
    17970 DFSZKFailoverController
    27330 Jps

    [root@sht-sgmhadoopnn-01 bin]# yarn rmadmin -getServiceState rm1
    16/03/03 19:23:39 INFO ipc.Client: Retrying connect to server: sht-sgmhadoopnn-01/172.16.101.55:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
    Operation failed: Call From sht-sgmhadoopnn-01.telenav.cn/172.16.101.55 to sht-sgmhadoopnn-01:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

    [root@sht-sgmhadoopnn-01 bin]# yarn rmadmin -getServiceState rm2
    active
    [root@sht-sgmhadoopnn-01 bin]#
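
Instead of re-running getServiceState by hand after the kill, the check can be polled. The following is a minimal sketch under the same assumptions as the session above (`yarn` on the PATH, service ID `rm2` for the standby node); `wait_for_active`, its default attempt count and its default delay are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): poll a resourcemanager until it
# reports "active" or the attempts run out.
# Usage: wait_for_active <serviceId> [max_attempts] [sleep_seconds]
wait_for_active() {
    id="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Keep only a line that is exactly "active" or "standby".
        state="$(yarn rmadmin -getServiceState "$id" 2>&1 \
            | grep -x -E 'active|standby' | tail -n 1)"
        if [ "$state" = "active" ]; then
            echo "$id became active after $i check(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "$id did not become active after $attempts check(s)" >&2
    return 1
}
```

In the experiment above, `wait_for_active rm2` would be started right after the `kill -9` on sht-sgmhadoopnn-01. The number of checks it takes gives a rough idea of how long the automatic failover needed.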

#### The resourcemanager process on sht-sgmhadoopnn-01 has been started again and is now in standby state

  c. Switch again

    [root@sht-sgmhadoopnn-01 sbin]# yarn rmadmin -transitionToStandby rm2
    Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@11f69937
    Refusing to manually manage HA state, since it may cause
    a split-brain scenario or other incorrect state.
    If you are very sure you know what you are doing, please
    specify the --forcemanual flag.

    [root@sht-sgmhadoopnn-01 sbin]# yarn rmadmin -transitionToStandby --forcemanual rm2
    You have specified the --forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.

    It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.

    You may abort safely by answering 'n' or hitting ^C now.

    Are you sure you want to continue? (Y or N) Y
    16/03/03 19:29:36 WARN ha.HAAdmin: Proceeding with manual HA state management even though
    automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@4e33967b

    [root@sht-sgmhadoopnn-01 sbin]# yarn rmadmin -getServiceState rm1
    standby
    [root@sht-sgmhadoopnn-01 sbin]# yarn rmadmin -getServiceState rm2
    standby

    [root@sht-sgmhadoopnn-01 sbin]# yarn rmadmin -transitionToActive rm1
    Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@54c4f317
    Refusing to manually manage HA state, since it may cause
    a split-brain scenario or other incorrect state.
    If you are very sure you know what you are doing, please
    specify the --forcemanual flag.

    [root@sht-sgmhadoopnn-01 sbin]# yarn rmadmin -transitionToActive --forcemanual rm1
    You have specified the --forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.

    It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.

    You may abort safely by answering 'n' or hitting ^C now.

    Are you sure you want to continue? (Y or N) Y
    16/03/03 19:32:46 WARN ha.HAAdmin: Proceeding with manual HA state management even though
    automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@54c4f317

    [root@sht-sgmhadoopnn-01 sbin]# yarn rmadmin -getServiceState rm1
    16/03/03 19:32:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    active
    [root@sht-sgmhadoopnn-01 sbin]# yarn rmadmin -getServiceState rm2
    16/03/03 19:33:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    standby
    [root@sht-sgmhadoopnn-01 sbin]#

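# This differs from the HDFS HA switchover experiment. There, -transitionToStandby automatically swaps both roles (standby -> active, active -> standby).

YARN HA behaves differently: after -transitionToStandby, both resourcemanagers are in standby, and -transitionToActive has to be run manually as a second step.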

From the "ITPUB blog", link: http://blog.itpub.net/30089851/viewspace-2047918/. If you repost this article, please credit the source; otherwise legal action may be taken.
