Dynamically Removing a DataNode (with Its NodeManager) While Changing dfs.replication [Final Version]

1. Edit hdfs-site.xml on the active NameNode

[root@sht-sgmhadoopnn-01 hadoop]# vi hdfs-site.xml
<property>
<name>dfs.hosts</name>
<value>/hadoop/hadoop-2.7.2/etc/hadoop/include_datanode</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/hadoop/hadoop-2.7.2/etc/hadoop/exclude_datanode</value>
</property>

### Copying the file to the standby NameNode is optional; I chose to keep both NameNodes in sync.

[root@sht-sgmhadoopnn-01 hadoop]# scp hdfs-site.xml root@sht-sgmhadoopnn-02:/hadoop/hadoop-2.7.2/etc/hadoop/
hdfs-site.xml 100% 4711 4.6KB/s 00:00

2. Create the include_datanode and exclude_datanode files

[root@sht-sgmhadoopnn-01 hadoop]# vi /hadoop/hadoop-2.7.2/etc/hadoop/include_datanode
sht-sgmhadoopdn-01
sht-sgmhadoopdn-02
sht-sgmhadoopdn-03
sht-sgmhadoopdn-04

# This file lists every DataNode that is allowed to connect to the NameNode.

[root@sht-sgmhadoopnn-01 hadoop]# vi /hadoop/hadoop-2.7.2/etc/hadoop/exclude_datanode
sht-sgmhadoopdn-04

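# This file lists every DataNode that is not permitted to connect to the NameNode, that is, the nodes to be decommissioned.

### Copying the files to the standby NameNode is optional; I chose to keep both NameNodes in sync.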

[root@sht-sgmhadoopnn-01 hadoop]# scp include_datanode exclude_datanode root@sht-sgmhadoopnn-02:/hadoop/hadoop-2.7.2/etc/hadoop/
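
Before refreshing the NameNode, it may be worth confirming that every host in the exclude file also appears in the include file, since a host missing from a non-empty include file is not allowed to register at all. The following is a minimal sketch, not part of the original procedure: the function name excluded_not_included is hypothetical, and it assumes one hostname per line with # starting a comment line.

```shell
# Print hosts that appear in the exclude file but not in the include file.
# Usage: excluded_not_included INCLUDE_FILE EXCLUDE_FILE
# Assumption: one hostname per line; blank lines and '#' lines are ignored.
excluded_not_included() {
    awk '
        /^[ \t]*(#|$)/ { next }                          # skip comments and blank lines
        FILENAME == ARGV[1] { included[$1] = 1; next }   # first file: remember included hosts
        !($1 in included) { print $1 }                   # second file: report missing hosts
    ' "$1" "$2"
}
```

Run against the two files from this step, for example excluded_not_included include_datanode exclude_datanode, it should print nothing when the lists are consistent.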

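3. Check the current replication factor

My test environment currently has 4 DataNodes and a replication factor of 4. I will lower the replication factor from 4 to 3.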

[root@sht-sgmhadoopnn-01 hadoop]# more hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>4</value>
</property>
[root@sht-sgmhadoopnn-01 hadoop]# hdfs fsck /
16/03/06 21:49:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://sht-sgmhadoopnn-01:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from /172.16.101.55 for path / at Sun Mar 06 21:49:12 CST 2016
...............Status: HEALTHY
Total size: 580152025 B
Total dirs: 17
Total files: 15
Total symlinks: 0
Total blocks (validated): 14 (avg. block size 41439430 B)
Minimally replicated blocks: 14 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 4.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 4
Number of racks: 1
FSCK ended at Sun Mar 06 21:49:12 CST 2016 in 8 milliseconds

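### fsck reports a Default replication factor of 3, while dfs.replication in hdfs-site.xml is 4. This shows that the value was set in the file but never took effect, because the cluster was not restarted.

So in this experiment it is enough to edit hdfs-site.xml without restarting the cluster, and to lower the Average block replication of the existing files from 4 to 3 (hdfs dfs -setrep -w 3 -R /).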
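
To compare the two fsck figures without reading the whole report, a small filter can pull a single summary field out of the output. This is a sketch under assumptions: the function name fsck_field is hypothetical, and it relies on the "name: value" layout of the summary lines shown in the fsck output above.

```shell
# Print the value of one summary field from `hdfs fsck` output read on stdin.
# Usage (assumes the hdfs CLI is on PATH):
#   hdfs fsck / | fsck_field "Average block replication"
fsck_field() {
    awk -F ':' -v key="$1" '
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim the field name
        }
        name == key {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)    # trim the value
            print value
            exit
        }
    '
}
```

Asking for "Default replication factor" and "Average block replication" in turn gives the two numbers discussed in this step.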

4. Change the parameter

[root@sht-sgmhadoopnn-01 hadoop]# more hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
[root@sht-sgmhadoopnn-01 hadoop]# scp hdfs-site.xml root@sht-sgmhadoopnn-02:/hadoop/hadoop-2.7.2/etc/hadoop/
[root@sht-sgmhadoopnn-01 hadoop]# hdfs dfs -setrep -w 3 -R /

### If the file system is very large, run this command during off-peak hours, because it takes a long time.

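A problem I ran into:

When lowering the replication factor, the "Replication set" phase finishes quickly, but the "Waiting for" phase did not complete even after a very long time.

In the end I had to interrupt it manually with Ctrl+C. My guess is that during this phase HDFS is trying to operate on the data files, deleting one replica's worth of data.

We should therefore plan the value of dfs.replication carefully and try to avoid situations where it has to be lowered.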
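
One possible way around the long wait, offered as an untested sketch rather than the procedure used here, is to run setrep without -w and then poll fsck until it reports no over-replicated or under-replicated blocks. The function name replication_settled is hypothetical; it only inspects the two summary lines that appear in the fsck output shown earlier.

```shell
# Succeed (exit 0) when the fsck summary on stdin reports zero over-replicated
# and zero under-replicated blocks; fail (exit 1) otherwise.
# Usage (assumes the hdfs CLI is on PATH):
#   hdfs dfs -setrep -R 3 /        # no -w, so the command returns immediately
#   until hdfs fsck / | replication_settled; do sleep 30; done
replication_settled() {
    awk -F ':' '
        /Over-replicated blocks|Under-replicated blocks/ {
            seen++
            split($2, parts, " ")              # parts[1] is the block count
            if (parts[1] + 0 != 0) pending = 1
        }
        END {
            if (seen == 2 && !pending) exit 0
            exit 1
        }
    '
}
```

The function fails when either line is missing, so a truncated or failed fsck run is not mistaken for a settled file system.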

### Step 4 caused the data blocks on datanode1 to be deleted.

5. Run hdfs fsck / again

[root@sht-sgmhadoopnn-01 hadoop]# hdfs fsck /
16/03/06 22:45:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://sht-sgmhadoopnn-01:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from /172.16.101.55 for path / at Sun Mar 06 22:45:47 CST 2016
................Status: HEALTHY
Total size: 580152087 B
Total dirs: 17
Total files: 16
Total symlinks: 0
Total blocks (validated): 15 (avg. block size 38676805 B)
Minimally replicated blocks: 15 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 4
Number of racks: 1
FSCK ended at Sun Mar 06 22:45:47 CST 2016 in 7 milliseconds
The filesystem under path '/' is HEALTHY
You have mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 hadoop]#

### Average block replication is now 3.0.

6. First dynamic refresh of the configuration: hdfs dfsadmin -refreshNodes

[root@sht-sgmhadoopnn-01 hadoop]# hdfs dfsadmin -refreshNodes
Refresh nodes successful for sht-sgmhadoopnn-01/172.16.101.55:8020
Refresh nodes successful for sht-sgmhadoopnn-02/172.16.101.56:8020

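7. Check the node status with hdfs dfsadmin -report or

### At first the state is Decommission In Progress and the data is rebalanced (datanode1 currently shows used: 138.88 KB, blocks: 0, so blocks will be copied to it).

### After a while the state becomes Decommissioned.

Note:

When removing a node, stop all Hadoop jobs first. Otherwise programs keep writing data to the node being removed, which can prevent the decommissioning from ever completing.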
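
Rather than scanning the full report by eye, the state of a single node can be extracted with a small filter. This is a sketch under an assumption that should be checked against your own cluster: each node section of the report is expected to contain a "Hostname: <host>" line followed by a "Decommission Status : <state>" line, as Hadoop 2.x reports do. The function name decommission_status is hypothetical.

```shell
# Print the Decommission Status of one DataNode from `hdfs dfsadmin -report`
# output read on stdin.
# Usage (assumes the hdfs CLI is on PATH):
#   hdfs dfsadmin -report | decommission_status sht-sgmhadoopdn-04
# Assumption: each node section has a "Hostname:" line before its
# "Decommission Status :" line.
decommission_status() {
    awk -v host="$1" '
        $1 == "Hostname:" { current = $2 }
        /^Decommission Status/ && current == host {
            sub(/^[^:]*:[ \t]*/, "")     # drop everything up to the colon
            print
            exit
        }
    '
}
```

Polling this in a loop until it prints Decommissioned gives a simple signal for when step 8 can begin.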

8. Once the state is Decommissioned, run hadoop-daemon.sh stop datanode, or simply kill -9 the DataNode process

[root@sht-sgmhadoopdn-04 sbin]# jps
14508 DataNode
11025 Jps
15517 NodeManager
[root@sht-sgmhadoopdn-04 sbin]# ./hadoop-daemon.sh stop datanode
stopping datanode
[root@sht-sgmhadoopdn-04 sbin]# jps
11056 Jps
15517 NodeManager
[root@sht-sgmhadoopdn-04 sbin]#

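9. Hadoop 2.x introduced the YARN framework, so every compute node is managed through its NodeManager, and a node joins the cluster as soon as its NodeManager process starts. To add a node, run sbin/yarn-daemon.sh start nodemanager; to remove one, run sbin/yarn-daemon.sh stop nodemanager manually. On the ResourceManager, use yarn node -list to check the state of the cluster.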

[root@sht-sgmhadoopdn-04 sbin]# ./yarn-daemon.sh stop nodemanager
stopping nodemanager
[root@sht-sgmhadoopdn-01 ~]# yarn node -list
16/03/06 23:39:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Total Nodes:4
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
sht-sgmhadoopdn-04.telenav.cn:54705 RUNNING sht-sgmhadoopdn-04.telenav.cn:23999 0
sht-sgmhadoopdn-03.telenav.cn:7573 RUNNING sht-sgmhadoopdn-03.telenav.cn:23999 0
sht-sgmhadoopdn-02.telenav.cn:38316 RUNNING sht-sgmhadoopdn-02.telenav.cn:23999 0
sht-sgmhadoopdn-01.telenav.cn:43903 RUNNING sht-sgmhadoopdn-01.telenav.cn:23999 0
[root@sht-sgmhadoopdn-04 sbin]# jps
11158 Jps
[root@sht-sgmhadoopdn-04 sbin]#
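
In the listing above, sht-sgmhadoopdn-04 still appears as RUNNING right after its NodeManager was stopped, presumably because the ResourceManager had not yet noticed the loss. To watch one node's state change, a filter like the following can help. It is a sketch: the function name yarn_node_state is hypothetical, and it assumes the column layout shown in the yarn node -list output above.

```shell
# Print the Node-State column for one host from `yarn node -list` output on stdin.
# Usage (assumes the yarn CLI is on PATH):
#   yarn node -list | yarn_node_state sht-sgmhadoopdn-04.telenav.cn
# Matches on the hostname part of Node-Id, ignoring the port.
yarn_node_state() {
    awk -v host="$1" '
        {
            split($1, id, ":")                       # Node-Id looks like host:port
            if (id[1] == host) { print $2; exit }
        }
    '
}
```

An empty result means the host is not in the listing at all.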

10. [Comment out] the DataNode machine to be removed from the cluster

[root@sht-sgmhadoopnn-01 hadoop]# vi include_datanode
sht-sgmhadoopdn-01
sht-sgmhadoopdn-02
sht-sgmhadoopdn-03
#sht-sgmhadoopdn-04
[root@sht-sgmhadoopnn-01 hadoop]# vi exclude_datanode
#sht-sgmhadoopdn-04
[root@sht-sgmhadoopnn-01 hadoop]# scp include_datanode exclude_datanode root@sht-sgmhadoopnn-02:/hadoop/hadoop-2.7.2/etc/hadoop/

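### I did not need to comment the host out in the slaves file as well, because dfs.hosts takes precedence over the slaves file (which is only read by the start and stop scripts). Commenting out sht-sgmhadoopdn-04 in slaves too is perfectly reasonable, though.
### Since the earlier changes on nn-01 were synced to nn-02, this step should be kept consistent on both machines as well.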
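
The manual vi edits above could also be scripted. The sketch below comments out a single hostname in a hosts file; the function name comment_out_host is hypothetical, and it assumes one hostname per line.

```shell
# Comment out one hostname in a hosts file, leaving all other lines untouched.
# Usage: comment_out_host FILE HOST
# Assumption: one hostname per line. Lines already commented out are left alone.
comment_out_host() {
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temporary file, then overwrite FILE's contents
    # so that its owner and permissions stay the same.
    awk -v host="$2" '$1 == host { print "#" $0; next } { print }' "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For this step that would be comment_out_host include_datanode sht-sgmhadoopdn-04, followed by the same call on exclude_datanode.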

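Question: how do you clear the Decommissioned DataNode information? By running hdfs dfsadmin -refreshNodes, or by restarting the cluster?

11. Second dynamic refresh of the configuration: hdfs dfsadmin -refreshNodes (the right approach: no cluster restart needed, suitable for production)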

[root@sht-sgmhadoopnn-01 hadoop]# hdfs dfsadmin -refreshNodes

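### The Decommissioned DataNode information is cleared completely!

12. Test by restarting the cluster (this also works, but it requires a cluster restart and is therefore unsuitable for production; the machines that should no longer connect to the NameNode must also be commented out in the slaves file)

Stop the cluster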

[root@sht-sgmhadoopnn-01 sbin]# stop-yarn.sh

[root@sht-sgmhadoopnn-02 sbin]# yarn-daemon.sh stop resourcemanager

[root@sht-sgmhadoopnn-01 sbin]# stop-dfs.sh

Restart the cluster

[root@sht-sgmhadoopnn-01 sbin]# start-dfs.sh

[root@sht-sgmhadoopnn-01 sbin]# start-yarn.sh

[root@sht-sgmhadoopnn-02 sbin]# yarn-daemon.sh start resourcemanager

### The Decommissioned DataNode information is cleared completely!

13. Run yarn rmadmin -refreshNodes to clear the sht-sgmhadoopdn-04 NodeManager information

Check from the command line or the web UI:

yarn node -list

[root@sht-sgmhadoopnn-01 bin]# yarn rmadmin -refreshNodes

### Refresh the web page.

14. Official documentation for the parameters

【dfs.hosts】 :

Names a file that contains a list of hosts that are permitted to connect to the namenode.

The full pathname of the file must be specified. If the value is empty, all hosts are permitted.

【dfs.hosts.exclude】 :

Names a file that contains a list of hosts that are not permitted to connect to the namenode.

The full pathname of the file must be specified. If the value is empty, no hosts are excluded.

[root@sht-sgmhadoopnn-01 hadoop]# hadoop dfsadmin -help

-refreshNodes: Updates the namenode with the set of datanodes allowed to connect to the namenode.
Namenode re-reads datanode hostnames from the file defined by
dfs.hosts, dfs.hosts.exclude configuration parameters.
Hosts defined in dfs.hosts are the datanodes that are part of
the cluster. If there are entries in dfs.hosts, only the hosts
in it are allowed to register with the namenode.
Entries in dfs.hosts.exclude are datanodes that need to be
decommissioned. Datanodes complete decommissioning when
all the replicas from them are replicated to other datanodes.
Decommissioned nodes are not automatically shutdown and
are not chosen for writing new replicas.
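
The rules in the help text above can be summarized as a small decision function. This is only an illustration of my reading of that text, not Hadoop's own logic: the function name host_disposition and its three output labels are hypothetical, and it assumes one hostname per line with # starting a comment line.

```shell
# Classify a host according to the include (dfs.hosts) and exclude
# (dfs.hosts.exclude) files.
# Usage: host_disposition INCLUDE_FILE EXCLUDE_FILE HOST
# Prints one of: allowed, decommission, refused.
host_disposition() {
    awk -v host="$3" '
        /^[ \t]*(#|$)/ { next }                          # skip comments and blank lines
        FILENAME == ARGV[1] { n_inc++; if ($1 == host) in_inc = 1; next }
        $1 == host { in_exc = 1 }
        END {
            if (n_inc == 0) in_inc = 1                   # empty include list permits all hosts
            if (!in_inc)     print "refused"
            else if (in_exc) print "decommission"
            else             print "allowed"
        }
    ' "$1" "$2"
}
```

With the files from step 2, sht-sgmhadoopdn-04 is classified as decommission and the other three DataNodes as allowed. After step 10 comments the host out of both files, it becomes refused.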
