Hadoop 2.7 in Practice v1.0: HDFS HA

Posted by hackeruncle on 2016-03-06

HDFS HA in Practice v1.0


Current environment: hadoop + zookeeper (NameNode and ResourceManager HA)

  NameNode            serviceId  initial state
  sht-sgmhadoopnn-01  nn1        active
  sht-sgmhadoopnn-02  nn2        standby

Reference: http://blog.csdn.net/u011414200/article/details/50336735

I. Check whether a NameNode is active or standby

1. Open the web UI

(Screenshots: the web UIs of the two NameNodes, showing their active/standby status.)
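Besides the browser, the same state is exposed over each NameNode's JMX HTTP endpoint, which is handy for scripting. A minimal sketch, assuming the default web port 50070 and that `curl` is available; `fetch_jmx` and `nn_state` are made-up helper names, not Hadoop commands:

```shell
# Fetch the NameNodeStatus MBean from a NameNode's built-in web server.
# Assumes the default dfs.namenode.http-address port 50070; adjust if yours differs.
fetch_jmx() {
    curl -s "http://$1:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
}

# Extract the "State" field ("active" or "standby") from the JSON reply.
nn_state() {
    fetch_jmx "$1" | grep -o '"State"[[:space:]]*:[[:space:]]*"[a-z]*"' \
                   | grep -o 'active\|standby'
}

# Example: nn_state sht-sgmhadoopnn-01
```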


2. Check the zkfc logs


[root@sht-sgmhadoopnn-01 logs]# more hadoop-root-zkfc-sht-sgmhadoopnn-01.telenav.cn.log
…………………..
2016-02-28 00:24:00,692 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at sht-sgmhadoopnn-01/172.16.101.55:8020 active...
2016-02-28 00:24:01,762 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at sht-sgmhadoopnn-01/172.16.101.55:8020 to active state

[root@sht-sgmhadoopnn-02 logs]# more hadoop-root-zkfc-sht-sgmhadoopnn-02.telenav.cn.log
…………………..
2016-02-28 00:24:01,186 INFO org.apache.hadoop.ha.ZKFailoverController: ZK Election indicated that NameNode at sht-sgmhadoopnn-02/172.16.101.56:8020 should become standby
2016-02-28 00:24:01,209 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at sht-sgmhadoopnn-02/172.16.101.56:8020 to standby state
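Rather than paging through the whole file with `more`, the transition lines can be pulled out directly. A small sketch; the helper name and the `tail` count are arbitrary choices, and the log path follows this cluster's naming:

```shell
# Print the most recent HA transition messages from a zkfc log file.
last_transitions() {
    grep 'Successfully transitioned' "$1" | tail -n 2
}

# Example: last_transitions /hadoop/hadoop-2.7.2/logs/hadoop-root-zkfc-sht-sgmhadoopnn-01.telenav.cn.log
```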


3. Via the command hdfs haadmin -getServiceState

### $HADOOP_HOME/etc/hadoop/hdfs-site.xml, dfs.ha.namenodes.[dfs.nameservices]

    <!-- Set the NameNode IDs; this version supports at most two NameNodes -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>


 


[root@sht-sgmhadoopnn-02 logs]# hdfs haadmin -getServiceState nn1
active
[root@sht-sgmhadoopnn-02 logs]# hdfs haadmin -getServiceState nn2
standby
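The two calls above can be combined into a loop that discovers the NameNode IDs from the configuration itself, using `hdfs getconf -confKey` (a real subcommand for reading a single property). A sketch; the `run_hdfs` wrapper and `list_nn_states` helper are assumptions introduced here, not Hadoop commands:

```shell
# Thin wrapper so the real CLI can be swapped out in tests.
run_hdfs() { hdfs "$@"; }

# Print "<id>: <state>" for every NameNode in the given nameservice.
list_nn_states() {
    local ns=$1 ids id
    # dfs.ha.namenodes.<ns> holds a comma-separated list such as "nn1,nn2"
    ids=$(run_hdfs getconf -confKey "dfs.ha.namenodes.$ns")
    for id in ${ids//,/ }; do
        echo "$id: $(run_hdfs haadmin -getServiceState "$id")"
    done
}

# Example: list_nn_states mycluster
```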
 

II. Basic commands


[root@sht-sgmhadoopnn-02 logs]# hdfs --help
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  classpath            prints the classpath
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version
###########################################################################
[root@sht-sgmhadoopnn-02 logs]# hdfs namenode --help
Usage: java NameNode [-backup] |
        [-checkpoint] |
        [-format [-clusterid cid ] [-force] [-nonInteractive] ] |
        [-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] |
        [-upgradeOnly [-clusterid cid] [-renameReserved<k-v pairs>] ] |
        [-rollback] |
        [-rollingUpgrade <rollback|downgrade|started> ] |
        [-finalize] |
        [-importCheckpoint] |
        [-initializeSharedEdits] |
        [-bootstrapStandby] |
        [-recover [ -force] ] |
        [-metadataVersion ] ]
###########################################################################
[root@sht-sgmhadoopnn-02 logs]# hdfs haadmin --help
-help: Unknown command
Usage: haadmin
    [-transitionToActive [--forceactive] <serviceId>]
    [-transitionToStandby <serviceId>]
    [-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]
transitionToActive and transitionToStandby switch a NameNode between the two HA states.

failover initiates a failover: it switches service from the first given NameNode to the second. (Not supported when automatic failover is enabled.)


getServiceState reports the current state of the given NameNode.

checkHealth checks the health of the given NameNode: it returns 0 when the NameNode is healthy and a non-zero value otherwise.
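Because checkHealth reports through its exit status, it slots naturally into a monitoring script. A minimal sketch; the `check_nn` helper is a hypothetical name, not part of Hadoop:

```shell
# Report on one NameNode via `hdfs haadmin -checkHealth` (exit 0 = healthy).
check_nn() {
    if hdfs haadmin -checkHealth "$1" >/dev/null 2>&1; then
        echo "$1 is healthy"
    else
        echo "$1 is UNHEALTHY"
    fi
}

# Example: for id in nn1 nn2; do check_nn "$id"; done
```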

 

III. Experiments

1. Test manual HDFS failover (fails)


[root@sht-sgmhadoopnn-01 ~]# hdfs haadmin -failover --forceactive nn1 nn2
forcefence and forceactive flags not supported with auto-failover enabled.

# Because dfs.ha.automatic-failover.enabled is set to true in hdfs-site.xml, manual failover is refused.
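A script can check that setting first instead of letting the command fail. A sketch; the two helper names are assumptions, while `hdfs getconf -confKey` is the real interface for reading a single configuration property:

```shell
# True (exit 0) when automatic failover is enabled for the cluster.
auto_failover_enabled() {
    [ "$(hdfs getconf -confKey dfs.ha.automatic-failover.enabled)" = true ]
}

# Only attempt a manual failover when ZKFC is not already in charge.
safe_failover() {
    if auto_failover_enabled; then
        echo "automatic failover is enabled; let ZKFC handle transitions" >&2
        return 1
    fi
    hdfs haadmin -failover "$1" "$2"
}
```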

 

2. Test automatic HDFS failover (succeeds)

   a. On the active NameNode host, find the namenode process ID with jps, kill it with kill -9, then watch whether the other NameNode moves from standby to active.


[root@sht-sgmhadoopnn-01 ~]# hdfs haadmin -getServiceState nn1
active
[root@sht-sgmhadoopnn-01 ~]# hdfs haadmin -getServiceState nn2
standby

[root@sht-sgmhadoopnn-01 ~]# jps
10327 ResourceManager
10162 DFSZKFailoverController
9821 NameNode
20064 Jps
[root@sht-sgmhadoopnn-01 ~]# kill -9 9821
[root@sht-sgmhadoopnn-01 ~]# jps
10327 ResourceManager
10162 DFSZKFailoverController
20121 Jps

[root@sht-sgmhadoopnn-01 ~]# hdfs haadmin -getServiceState nn1
16/02/28 22:01:37 INFO ipc.Client: Retrying connect to server: sht-sgmhadoopnn-01/172.16.101.55:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From sht-sgmhadoopnn-01.telenav.cn/172.16.101.55 to sht-sgmhadoopnn-01:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[root@sht-sgmhadoopnn-01 ~]# hdfs haadmin -getServiceState nn2
active
[root@sht-sgmhadoopnn-01 ~]#

## You can also verify that the failover succeeded by watching the zkfc logs or the web UI.
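The kill-and-check drill above can be automated with a short polling loop that waits for the standby to take over. A sketch; `wait_active` is a made-up helper, and POLL_INTERVAL exists only so the loop speed can be tuned:

```shell
# Poll `hdfs haadmin -getServiceState` until the given NameNode reports
# "active", or give up after $2 attempts (default 30).
wait_active() {
    local id=$1 tries=${2:-30} i=0
    while [ "$i" -lt "$tries" ]; do
        if [ "$(hdfs haadmin -getServiceState "$id" 2>/dev/null)" = active ]; then
            echo "$id is active"
            return 0
        fi
        sleep "${POLL_INTERVAL:-1}"
        i=$((i + 1))
    done
    echo "$id did not become active" >&2
    return 1
}

# Example: kill -9 the active namenode process, then: wait_active nn2
```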
 

   b. Manually start the NameNode on sht-sgmhadoopnn-01


[root@sht-sgmhadoopnn-01 ~]# cd /hadoop/hadoop-2.7.2/sbin
[root@sht-sgmhadoopnn-01 sbin]# jps
10327 ResourceManager
10162 DFSZKFailoverController
21640 Jps
[root@sht-sgmhadoopnn-01 sbin]# hadoop-daemon.sh start namenode
starting namenode, logging to /hadoop/hadoop-2.7.2/logs/hadoop-root-namenode-sht-sgmhadoopnn-01.telenav.cn.out
[root@sht-sgmhadoopnn-01 sbin]# jps
10327 ResourceManager
10162 DFSZKFailoverController
21784 Jps
21679 NameNode
[root@sht-sgmhadoopnn-01 sbin]# hdfs haadmin -getServiceState nn1
standby
[root@sht-sgmhadoopnn-01 sbin]#

#### The namenode process on sht-sgmhadoopnn-01 is up again, and its state is standby.

   c. Fail over again


[root@sht-sgmhadoopnn-01 sbin]# hdfs haadmin -transitionToStandby nn2
Automatic failover is enabled for NameNode at sht-sgmhadoopnn-02/172.16.101.56:8020
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the --forcemanual flag.

[root@sht-sgmhadoopnn-01 sbin]# hdfs haadmin -transitionToStandby --forcemanual nn2
You have specified the --forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.
It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.
You may abort safely by answering 'n' or hitting ^C now.
Are you sure you want to continue? (Y or N) Y
16/02/28 22:30:22 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at sht-sgmhadoopnn-02/172.16.101.56:8020

[root@sht-sgmhadoopnn-01 sbin]# hdfs haadmin -getServiceState nn1
active
[root@sht-sgmhadoopnn-01 sbin]# hdfs haadmin -getServiceState nn2
standby
[root@sht-sgmhadoopnn-01 sbin]#


From the ITPUB blog, link: http://blog.itpub.net/30089851/viewspace-2047907/. If you reproduce this article, please credit the source; otherwise legal liability may be pursued.
