Hadoop 2.7 in Practice v1.0: HDFS HA

Posted by hackeruncle on 2016-03-06

HDFS HA in Practice v1.0


Current environment: hadoop + zookeeper (NameNode and ResourceManager HA)

  NameNode            serviceId  initial state
  sht-sgmhadoopnn-01  nn1        active
  sht-sgmhadoopnn-02  nn2        standby

Reference: http://blog.csdn.net/u011414200/article/details/50336735

I. Check whether a NameNode is active or standby

1. Open the web UI

(Screenshots: the web UIs of the two NameNodes, showing their active/standby status.)
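Besides the browser, the same state is exposed over each NameNode's JMX HTTP endpoint, which is handy for scripting. A minimal sketch, assuming the default web port 50070 and that `curl` is available; `fetch_jmx` and `nn_state` are made-up helper names, not Hadoop commands:

```shell
# Fetch the NameNodeStatus MBean from a NameNode's built-in web server.
# Assumes the default dfs.namenode.http-address port 50070; adjust if yours differs.
fetch_jmx() {
    curl -s "http://$1:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
}

# Extract the "State" field ("active" or "standby") from the JSON reply.
nn_state() {
    fetch_jmx "$1" | grep -o '"State"[[:space:]]*:[[:space:]]*"[a-z]*"' \
                   | grep -o 'active\|standby'
}

# Example: nn_state sht-sgmhadoopnn-01
```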


2. Check the zkfc logs


[root@sht-sgmhadoopnn-01 logs]# more hadoop-root-zkfc-sht-sgmhadoopnn-01.telenav.cn.log
…………………..
2016-02-28 00:24:00,692 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at sht-sgmhadoopnn-01/172.16.101.55:8020 active...
2016-02-28 00:24:01,762 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at sht-sgmhadoopnn-01/172.16.101.55:8020 to active state

[root@sht-sgmhadoopnn-02 logs]# more hadoop-root-zkfc-sht-sgmhadoopnn-02.telenav.cn.log
…………………..
2016-02-28 00:24:01,186 INFO org.apache.hadoop.ha.ZKFailoverController: ZK Election indicated that NameNode at sht-sgmhadoopnn-02/172.16.101.56:8020 should become standby
2016-02-28 00:24:01,209 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at sht-sgmhadoopnn-02/172.16.101.56:8020 to standby state
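Rather than paging through the whole file with `more`, the transition lines can be pulled out directly. A small sketch; the helper name and the `tail` count are arbitrary choices, and the log path follows this cluster's naming:

```shell
# Print the most recent HA transition messages from a zkfc log file.
last_transitions() {
    grep 'Successfully transitioned' "$1" | tail -n 2
}

# Example: last_transitions /hadoop/hadoop-2.7.2/logs/hadoop-root-zkfc-sht-sgmhadoopnn-01.telenav.cn.log
```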


3. Via the command hdfs haadmin -getServiceState

### $HADOOP_HOME/etc/hadoop/hdfs-site.xml, dfs.ha.namenodes.[dfs.nameservices]

    <!-- Set the NameNode IDs; this version supports at most two NameNodes -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>


 


[root@sht-sgmhadoopnn-02 logs]# hdfs haadmin -getServiceState nn1
active
[root@sht-sgmhadoopnn-02 logs]# hdfs haadmin -getServiceState nn2
standby
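The two calls above can be combined into a loop that discovers the NameNode IDs from the configuration itself, using `hdfs getconf -confKey` (a real subcommand for reading a single property). A sketch; the `run_hdfs` wrapper and `list_nn_states` helper are assumptions introduced here, not Hadoop commands:

```shell
# Thin wrapper so the real CLI can be swapped out in tests.
run_hdfs() { hdfs "$@"; }

# Print "<id>: <state>" for every NameNode in the given nameservice.
list_nn_states() {
    local ns=$1 ids id
    # dfs.ha.namenodes.<ns> holds a comma-separated list such as "nn1,nn2"
    ids=$(run_hdfs getconf -confKey "dfs.ha.namenodes.$ns")
    for id in ${ids//,/ }; do
        echo "$id: $(run_hdfs haadmin -getServiceState "$id")"
    done
}

# Example: list_nn_states mycluster
```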
 

II. Basic commands


[root@sht-sgmhadoopnn-02 logs]# hdfs --help
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  classpath            prints the classpath
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version
###########################################################################
[root@sht-sgmhadoopnn-02 logs]# hdfs namenode --help
Usage: java NameNode [-backup] |
        [-checkpoint] |
        [-format [-clusterid cid ] [-force] [-nonInteractive] ] |
        [-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] |
        [-upgradeOnly [-clusterid cid] [-renameReserved<k-v pairs>] ] |
        [-rollback] |
        [-rollingUpgrade <rollback|downgrade|started> ] |
        [-finalize] |
        [-importCheckpoint] |
        [-initializeSharedEdits] |
        [-bootstrapStandby] |
        [-recover [ -force] ] |
        [-metadataVersion ] ]
###########################################################################
[root@sht-sgmhadoopnn-02 logs]# hdfs haadmin --help
-help: Unknown command
Usage: haadmin
    [-transitionToActive [--forceactive] <serviceId>]
    [-transitionToStandby <serviceId>]
    [-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]
transitionToActive and transitionToStandby switch a NameNode between the two HA states.

failover initiates a failover: it switches service from the first given NameNode to the second. (Not supported when automatic failover is enabled.)


getServiceState reports the current state of the given NameNode.

checkHealth checks the health of the given NameNode: it returns 0 when the NameNode is healthy and a non-zero value otherwise.
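Because checkHealth reports through its exit status, it slots naturally into a monitoring script. A minimal sketch; the `check_nn` helper is a hypothetical name, not part of Hadoop:

```shell
# Report on one NameNode via `hdfs haadmin -checkHealth` (exit 0 = healthy).
check_nn() {
    if hdfs haadmin -checkHealth "$1" >/dev/null 2>&1; then
        echo "$1 is healthy"
    else
        echo "$1 is UNHEALTHY"
    fi
}

# Example: for id in nn1 nn2; do check_nn "$id"; done
```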

 

III. Experiments

1. Test manual HDFS failover (fails)


[root@sht-sgmhadoopnn-01 ~]# hdfs haadmin -failover --forceactive nn1 nn2
forcefence and forceactive flags not supported with auto-failover enabled.

# Because dfs.ha.automatic-failover.enabled is set to true in hdfs-site.xml, manual failover is refused.
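A script can check that setting first instead of letting the command fail. A sketch; the two helper names are assumptions, while `hdfs getconf -confKey` is the real interface for reading a single configuration property:

```shell
# True (exit 0) when automatic failover is enabled for the cluster.
auto_failover_enabled() {
    [ "$(hdfs getconf -confKey dfs.ha.automatic-failover.enabled)" = true ]
}

# Only attempt a manual failover when ZKFC is not already in charge.
safe_failover() {
    if auto_failover_enabled; then
        echo "automatic failover is enabled; let ZKFC handle transitions" >&2
        return 1
    fi
    hdfs haadmin -failover "$1" "$2"
}
```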

 

2. Test automatic HDFS failover (succeeds)

   a. On the active NameNode host, find the namenode process ID with jps, kill it with kill -9, then watch whether the other NameNode moves from standby to active.


[root@sht-sgmhadoopnn-01 ~]# hdfs haadmin -getServiceState nn1
active
[root@sht-sgmhadoopnn-01 ~]# hdfs haadmin -getServiceState nn2
standby

[root@sht-sgmhadoopnn-01 ~]# jps
10327 ResourceManager
10162 DFSZKFailoverController
9821 NameNode
20064 Jps
[root@sht-sgmhadoopnn-01 ~]# kill -9 9821
[root@sht-sgmhadoopnn-01 ~]# jps
10327 ResourceManager
10162 DFSZKFailoverController
20121 Jps

[root@sht-sgmhadoopnn-01 ~]# hdfs haadmin -getServiceState nn1
16/02/28 22:01:37 INFO ipc.Client: Retrying connect to server: sht-sgmhadoopnn-01/172.16.101.55:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From sht-sgmhadoopnn-01.telenav.cn/172.16.101.55 to sht-sgmhadoopnn-01:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[root@sht-sgmhadoopnn-01 ~]# hdfs haadmin -getServiceState nn2
active
[root@sht-sgmhadoopnn-01 ~]#

## You can also verify that the failover succeeded by watching the zkfc logs or the web UI.
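The kill-and-check drill above can be automated with a short polling loop that waits for the standby to take over. A sketch; `wait_active` is a made-up helper, and POLL_INTERVAL exists only so the loop speed can be tuned:

```shell
# Poll `hdfs haadmin -getServiceState` until the given NameNode reports
# "active", or give up after $2 attempts (default 30).
wait_active() {
    local id=$1 tries=${2:-30} i=0
    while [ "$i" -lt "$tries" ]; do
        if [ "$(hdfs haadmin -getServiceState "$id" 2>/dev/null)" = active ]; then
            echo "$id is active"
            return 0
        fi
        sleep "${POLL_INTERVAL:-1}"
        i=$((i + 1))
    done
    echo "$id did not become active" >&2
    return 1
}

# Example: kill -9 the active namenode process, then: wait_active nn2
```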
 

   b. Manually start the NameNode on sht-sgmhadoopnn-01


[root@sht-sgmhadoopnn-01 ~]# cd /hadoop/hadoop-2.7.2/sbin
[root@sht-sgmhadoopnn-01 sbin]# jps
10327 ResourceManager
10162 DFSZKFailoverController
21640 Jps
[root@sht-sgmhadoopnn-01 sbin]# hadoop-daemon.sh start namenode
starting namenode, logging to /hadoop/hadoop-2.7.2/logs/hadoop-root-namenode-sht-sgmhadoopnn-01.telenav.cn.out
[root@sht-sgmhadoopnn-01 sbin]# jps
10327 ResourceManager
10162 DFSZKFailoverController
21784 Jps
21679 NameNode
[root@sht-sgmhadoopnn-01 sbin]# hdfs haadmin -getServiceState nn1
standby
[root@sht-sgmhadoopnn-01 sbin]#

#### The namenode process on sht-sgmhadoopnn-01 is up again, and its state is standby.

   c. Fail over again


[root@sht-sgmhadoopnn-01 sbin]# hdfs haadmin -transitionToStandby nn2
Automatic failover is enabled for NameNode at sht-sgmhadoopnn-02/172.16.101.56:8020
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the --forcemanual flag.

[root@sht-sgmhadoopnn-01 sbin]# hdfs haadmin -transitionToStandby --forcemanual nn2
You have specified the --forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.
It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.
You may abort safely by answering 'n' or hitting ^C now.
Are you sure you want to continue? (Y or N) Y
16/02/28 22:30:22 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at sht-sgmhadoopnn-02/172.16.101.56:8020

[root@sht-sgmhadoopnn-01 sbin]# hdfs haadmin -getServiceState nn1
active
[root@sht-sgmhadoopnn-01 sbin]# hdfs haadmin -getServiceState nn2
standby
[root@sht-sgmhadoopnn-01 sbin]#


From the ITPUB blog, link: http://blog.itpub.net/30089851/viewspace-2047907/. If you reproduce this article, please credit the source; otherwise legal liability may be pursued.
