Hadoop 2.7 in Action v1.0: Data balancing with start-balancer.sh and hdfs balancer
Applicable scenarios:
a. When DataNodes are dynamically added to or removed from the cluster, the data across the nodes inevitably becomes unbalanced.
b. During routine maintenance.
1. Set the balancer bandwidth for HDFS. The default data transfer bandwidth is quite low, so it can be raised to 64 MB/s:
hdfs dfsadmin -setBalancerBandwidth 67108864
[root@sht-sgmhadoopnn-01 ~]# cd /hadoop/hadoop-2.7.2/bin
[root@sht-sgmhadoopdn-01 bin]# ./hdfs dfsadmin -setBalancerBandwidth 67108864
Balancer bandwidth is set to 67108864 for sht-sgmhadoopnn-01/172.16.101.55:8020
Balancer bandwidth is set to 67108864 for sht-sgmhadoopnn-02/172.16.101.56:8020
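The value 67108864 is just 64 MB expressed in bytes per second. Below is a minimal sketch of the arithmetic and of how to make the limit permanent; this is my own addition, not from the original post, and the property name dfs.datanode.balance.bandwidthPerSec is the standard HDFS setting for this limit:
# 64 MB/s in bytes: 64 * 1024 * 1024 = 67108864
BW=$((64 * 1024 * 1024))
hdfs dfsadmin -setBalancerBandwidth ${BW}
# To persist the limit across restarts, set dfs.datanode.balance.bandwidthPerSec
# in hdfs-site.xml on the DataNodes (it takes effect after the DataNodes restart).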
2. The balancer's default threshold is 10%, i.e. each node's storage utilization may deviate from the cluster-wide utilization by at most 10%. We can tighten it to 5%, then start the balancer with sbin/start-balancer.sh -threshold 5 and wait for the cluster to rebalance itself.
[root@sht-sgmhadoopdn-01 bin]# cd ../sbin
[root@sht-sgmhadoopnn-01 sbin]# ./start-balancer.sh -threshold 5
starting balancer, logging to /hadoop/hadoop-2.7.2/logs/hadoop-root-balancer-sht-sgmhadoopnn-01.telenav.cn.out
### Running start-balancer.sh -threshold 5 is equivalent to running hdfs balancer -threshold 5.
#### Usage: hdfs balancer
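Besides -threshold, the 2.7 balancer accepts a handful of other options; the parameters line in the output below ("BalancingPolicy.Node", "number of nodes to be excluded/included") hints at them. The list here is from memory, so verify it against the usage text printed by hdfs balancer on your own cluster:
hdfs balancer -threshold 5                 # allowed per-node deviation, in percent
hdfs balancer -policy datanode             # balancing policy: datanode (default) or blockpool
hdfs balancer -include host1,host2         # only balance the listed DataNodes
hdfs balancer -exclude host1,host2         # leave the listed DataNodes untouched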
[root@sht-sgmhadoopnn-01 bin]# ./hdfs balancer -threshold 5
16/03/05 18:57:33 INFO balancer.Balancer: Using a threshold of 1.0
16/03/05 18:57:33 INFO balancer.Balancer: namenodes = [hdfs://mycluster]
16/03/05 18:57:33 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
16/03/05 18:57:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/05 18:57:35 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
16/03/05 18:57:35 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
16/03/05 18:57:35 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
16/03/05 18:57:35 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
16/03/05 18:57:35 INFO balancer.Balancer: 0 over-utilized: []
16/03/05 18:57:35 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Mar 5, 2016 6:57:35 PM    0    0 B    0 B    -1 B
Mar 5, 2016 6:57:35 PM    Balancing took 2.66 seconds
A. 1) Why does running hdfs balancer -threshold 5 appear to do nothing?
Here 5 means 5%.
Cluster-wide storage utilization: 1.74%
sht-sgmhadoopdn-01: 1.74%
sht-sgmhadoopdn-02: 1.74%
sht-sgmhadoopdn-03: 1.74%
sht-sgmhadoopdn-04: 0%
Running with -threshold 5 means that each DataNode's storage utilization should differ from the cluster-wide utilization by less than the 5% threshold; only if the difference exceeds 5% does the balancer actually move data. Since no node here deviates by more than 5%, the run exits immediately.
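The per-node utilization figures above can be read from the DataNode report (or from the NameNode web UI); a quick way to pull them out is shown below. This is my own convenience one-liner, not part of the original post:
# The summary block at the top prints the cluster-wide DFS Used%,
# and each live DataNode section prints its own DFS Used% line.
hdfs dfsadmin -report | grep -E '^Name:|DFS Used%'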
B. 2) How do you tell whether running the command will actually trigger data movement?
if abs(cluster-wide storage utilization - a DataNode's storage utilization) > threshold (5%)
    # data balancing will run for that node
else
    # the command has no effect
end if
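As a rough illustration of this rule (my own sketch, not part of the original post), the same check can be approximated by parsing hdfs dfsadmin -report and comparing each DataNode's DFS Used% with the cluster-wide value:
THRESHOLD=5
hdfs dfsadmin -report | awk -v t="$THRESHOLD" '
  /^Name:/    { node = $2 }                    # start of a DataNode section
  /DFS Used%/ { gsub(/%/, "", $3)
                if (node == "") cluster = $3   # the first match is the cluster summary
                else { used[node] = $3; node = "" } }
  END {
    for (n in used) {
      dev = used[n] - cluster; if (dev < 0) dev = -dev
      printf "%-22s used=%6.2f%%  deviation=%5.2f%%  %s\n", n, used[n], dev, (dev > t ? "would be rebalanced" : "within threshold")
    }
  }'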
C. 3) The threshold must lie in the range [1.0, 100.0], so the smallest value you can set is -threshold 1.
D. 4) The balancer can be run on a NameNode or on a DataNode; it is best run on a newly added or otherwise idle DataNode. A typical end-to-end sequence is sketched below.
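Putting steps 1 and 2 together (paths and the log file name are taken from the output above), a typical sequence after adding a DataNode looks like this:
# run on the newly added or an idle DataNode
cd /hadoop/hadoop-2.7.2/bin
./hdfs dfsadmin -setBalancerBandwidth 67108864        # raise the copy bandwidth first
cd ../sbin
./start-balancer.sh -threshold 5                      # start the balancer daemon
tail -f /hadoop/hadoop-2.7.2/logs/hadoop-root-balancer-*.out    # follow its progress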
3. Run hdfs balancer -threshold 1
[root@sht-sgmhadoopnn-01 hadoop]# hdfs balancer -threshold 1
……………..
……………..
16/03/08 16:08:09 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
16/03/08 16:08:09 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
16/03/08 16:08:09 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
16/03/08 16:08:09 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
16/03/08 16:08:09 INFO balancer.Balancer: 0 over-utilized: []
16/03/08 16:08:09 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Mar 8, 2016 4:08:09 PM    1    382.22 MB    0 B    -1 B
Mar 8, 2016 4:08:09 PM    Balancing took 6.7001 minutes
### The newly added DataNode now holds 411.7 MB of data; its deviation from the cluster average is below 1%.
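To confirm that the new DataNode really received data, one option (my own suggestion, reusing the hostname from this cluster) is to look at its section of the DataNode report:
# the DFS Used / DFS Used% lines in this section should now be non-zero
hdfs dfsadmin -report | grep -A 8 'sht-sgmhadoopdn-04'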
Source code analysis:
[root@sht-sgmhadoopnn-01 sbin]# more start-balancer.sh
#!/usr/bin/env bash

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

# Start balancer daemon.

"$HADOOP_PREFIX"/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer $@
Analysis: start-balancer.sh ultimately just invokes "hdfs start balancer $@" through hadoop-daemon.sh, which runs the balancer as a daemon; $@ is the list of arguments passed to the script, typically -threshold 5.
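A tiny standalone illustration (not from the post; the script name is made up) of how $@ forwards whatever arguments the wrapper script received:
cat > /tmp/forward-demo.sh <<'EOF'
#!/usr/bin/env bash
# Echo the arguments exactly as they were passed in, much like
# start-balancer.sh hands its arguments on to the balancer.
echo "forwarding: $@"
EOF
chmod +x /tmp/forward-demo.sh
/tmp/forward-demo.sh -threshold 5        # prints: forwarding: -threshold 5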
[root@sht-sgmhadoopnn-01 sbin]# more stop-balancer.sh
#!/usr/bin/env bash

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

# Stop balancer daemon.
# Run this on the machine where the balancer is running

"$HADOOP_PREFIX"/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs stop balancer
Analysis: likewise, stop-balancer.sh simply invokes "hdfs stop balancer" through hadoop-daemon.sh; no extra arguments are forwarded.
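So stopping a balancer started with start-balancer.sh comes down to running the stop script on the same machine:
cd /hadoop/hadoop-2.7.2/sbin
./stop-balancer.sh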
From the ITPUB blog. Original link: http://blog.itpub.net/30089851/viewspace-2052138/. Please credit the source when reprinting.