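Steps for building a Hadoop 3.2.0 distributed cluster on CentOS 7
I. Environment overview
1. Role distribution across four CentOS 7 Linux virtual machines:
192.168.0.1 runs NameNode, ResourceManager, and SecondaryNameNode
192.168.0.2 runs NodeManager and DataNode
192.168.0.3 runs NodeManager and DataNode
192.168.0.4 runs NodeManager and DataNode
2. Configure hostname resolution (every node)
Edit the hosts file and add the IP-to-hostname mappings for the master node and the worker nodes.
# vi /etc/hosts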
192.168.0.1 mdw2 hadoop01
192.168.0.2 mdw3 hadoop02
192.168.0.3 mdw4 hadoop03
192.168.0.4 mdw5 hadoop04
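To confirm that the mappings actually landed on a node, a small check like the one below may help. It is only a sketch: check_hosts_file is a hypothetical helper name, and it merely inspects a hosts-format file for the four address/hostname pairs listed above; it does not test real name resolution.

```shell
# Hypothetical helper: verify that a hosts-format file maps each expected
# IP address to its hadoopNN hostname (pairs taken from the listing above).
check_hosts_file() {
  local hosts_file="${1:-/etc/hosts}"
  local missing=0 ip name
  while read -r ip name; do
    # A line matches when its first field is the IP and any later field
    # equals the hostname exactly.
    if ! awk -v ip="$ip" -v name="$name" '
          $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
          END { exit (found ? 0 : 1) }' "$hosts_file"; then
      echo "missing: ${ip} ${name}"
      missing=1
    fi
  done <<'EOF'
192.168.0.1 hadoop01
192.168.0.2 hadoop02
192.168.0.3 hadoop03
192.168.0.4 hadoop04
EOF
  return "$missing"
}

# Example: check_hosts_file /etc/hosts && echo "hosts file looks complete"
```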
3. Disable the firewall (every node)
# systemctl stop firewalld
#Disable start on boot
# systemctl disable firewalld
4. Configure passwordless SSH login
For how to configure passwordless login, see
https://www.cnblogs.com/shireenlee4testing/p/10366061.html
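For readers who do not want to follow the link, the usual approach can be sketched as below. This is an assumption about what the linked article describes, not a copy of it: it uses the standard ssh-keygen and ssh-copy-id tools, assumes the root account on every node, and the function names run and setup_passwordless_ssh are made up for this example. With DRY_RUN=1 the commands are printed instead of executed.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper, meant to be run as root on hadoop01.
setup_passwordless_ssh() {
  local key="$HOME/.ssh/id_rsa" host
  # Generate a key pair without a passphrase if none exists yet.
  if [ ! -f "$key" ]; then
    run ssh-keygen -t rsa -N "" -f "$key"
  fi
  # Install the public key on every node passed as an argument.
  for host in "$@"; do
    run ssh-copy-id -i "$key.pub" "root@$host"
  done
}

# Example (review first, then drop DRY_RUN=1):
#   DRY_RUN=1 setup_passwordless_ssh hadoop01 hadoop02 hadoop03 hadoop04
```

Including hadoop01 itself in the list matters because the start scripts also use SSH to reach the local node.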
5. Configure the Java environment (every node)
For how to configure the Java environment, see
https://www.cnblogs.com/shireenlee4testing/p/10368961.html
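II. Build the fully distributed Hadoop cluster
1. Download the Hadoop package, extract it, and set the Hadoop environment variables
# wget
#Extract into /opt
# tar -zxvf hadoop-3.2.0.tar.gz -C /opt
#Link /opt/hadoop-3.2.0 to /opt/hadoop to simplify later configuration
# ln -s /opt/hadoop-3.2.0 /opt/hadoop
#Set the Hadoop and Java environment variables
# vi /etc/profile
#hadoop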
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
#jdk
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin
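After saving /etc/profile, the new variables only apply to shells that have re-read the file. The sketch below is one way to confirm that; check_hadoop_env is a hypothetical name, and the check covers only the variables set above.

```shell
# Hypothetical check: confirm the variables from /etc/profile are visible
# in the current shell and that the Hadoop bin directory is on PATH.
check_hadoop_env() {
  [ -n "${HADOOP_HOME:-}" ] || { echo "HADOOP_HOME is not set"; return 1; }
  [ -n "${JAVA_HOME:-}" ] || { echo "JAVA_HOME is not set"; return 1; }
  case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) ;;
    *) echo "$HADOOP_HOME/bin is not on PATH"; return 1 ;;
  esac
  echo "ok"
}

# Typical use after editing the file:
#   source /etc/profile && check_hadoop_env
```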
2. Set JAVA_HOME in the Hadoop environment scripts
#Change into etc/hadoop under the Hadoop installation directory
# cd /opt/hadoop/etc/hadoop
#Add or modify the following parameter in hadoop-env.sh, mapred-env.sh, and yarn-env.sh:
# vi hadoop-env.sh
............................................................
............................................................
# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=/opt/jdk
# vi mapred-env.sh
............................................................
............................................................
# Specify the log4j settings for the JobHistoryServer
# Java property: hadoop.root.logger
#export HADOOP_JHS_LOGGER=INFO,RFA
export JAVA_HOME=/opt/jdk
# vi yarn-env.sh
............................................................
............................................................
# YARN Services parameters
###
# Directory containing service examples
# export YARN_SERVICE_EXAMPLES_DIR = $HADOOP_YARN_HOME/share/hadoop/yarn/yarn-service-examples
# export YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true
export JAVA_HOME=/opt/jdk
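Editing three files by hand is easy to get wrong, so the same change can also be scripted. The following is a sketch under the assumption that appending the export at the end of each file is acceptable; ensure_java_home is a hypothetical helper, not part of Hadoop.

```shell
# Hypothetical helper: append "export JAVA_HOME=..." to each env script
# unless an uncommented export is already present.
ensure_java_home() {
  local conf_dir="$1" java_home="$2" f
  for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
    if grep -Eq '^[[:space:]]*export[[:space:]]+JAVA_HOME=' "$conf_dir/$f"; then
      echo "$f: already set"
    else
      echo "export JAVA_HOME=$java_home" >> "$conf_dir/$f"
      echo "$f: added"
    fi
  done
}

# Example: ensure_java_home /opt/hadoop/etc/hadoop /opt/jdk
```

Commented-out lines such as "# export JAVA_HOME=" are deliberately not counted as set.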
#Verify that the Hadoop configuration has taken effect
# hadoop version
Hadoop 3.2.0
Source code repository -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /opt/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
3. Edit the Hadoop configuration files
In etc/hadoop under the Hadoop installation directory, edit core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the workers file, adjusting the values to your environment.
# cat /opt/hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<!-- HDFS address -->
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:9000</value>
</property>
<property>
<!-- Directory for temporary files; create it before starting (the value below uses /home/hadoop/tmp) -->
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
</configuration>
# cat /opt/hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<!-- NameNode web UI address on the master node -->
<name>dfs.namenode.http-address</name>
<value>hadoop01:50070</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
<!-- Replication factor; 3 is the default -->
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>When set to false, files can be created on DFS without permission checks. This is convenient, but you need to guard against accidental deletion.</description>
</property>
</configuration>
# cat /opt/hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value> <!-- Run MapReduce on YARN -->
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
# cat /opt/hadoop/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop01:8088</value>
<description>For external access, replace the host with the real external IP; if this property is unset, the default is 0.0.0.0:8088</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>Maximum memory that can be allocated to a single container request, in MB; the default is 8192 MB</description>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Skip the virtual memory check. This is very useful when installing on virtual machines and helps avoid problems in later steps.</description>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop01:8031</value>
</property>
</configuration>
# cat /opt/hadoop/etc/hadoop/workers
hadoop02
hadoop03
hadoop04
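With four XML files edited by hand, it can be worth reading a few key values back before copying the directory to the workers. The sketch below uses a hypothetical get_prop helper. It is a line-oriented awk scan, not a real XML parser, and it assumes each name and value element sits on a single line, as in the listings above.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file.
get_prop() {
  # $1 = file, $2 = property name
  awk -v key="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>/, "", n)
      sub(/<\/name>.*/, "", n)
      found = (n == key)
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}

# Example:
#   get_prop /opt/hadoop/etc/hadoop/core-site.xml fs.defaultFS
#   get_prop /opt/hadoop/etc/hadoop/yarn-site.xml yarn.resourcemanager.resource-tracker.address
```

On a node where the Hadoop commands are already on PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves, which is the more authoritative check.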
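4. Configure the start scripts: add the HDFS and YARN user definitions
HDFS: edit the following scripts and add the HDFS user definitions on the blank second line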
# vi /opt/hadoop/sbin/start-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
# vi /opt/hadoop/sbin/stop-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
YARN: edit the following scripts and add the YARN user definitions on the blank second line
# vi /opt/hadoop/sbin/start-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HDFS_DATANODE_SECURE_USER=root
YARN_NODEMANAGER_USER=root
# vi /opt/hadoop/sbin/stop-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HDFS_DATANODE_SECURE_USER=root
YARN_NODEMANAGER_USER=root
Note: without these definitions, the start scripts abort with an error because the user each daemon should run as is not defined.
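The four edits above follow the same pattern, so they can be scripted. This is a minimal sketch with a hypothetical insert_user_defs helper. It assumes the first line of each script is the shebang and inserts the definitions directly after it.

```shell
# Hypothetical helper: insert VAR=value lines right after the first line
# of a start/stop script. The file is left alone if the first variable
# is already defined in it.
insert_user_defs() {
  local script="$1"
  shift
  local first_var="${1%%=*}" tmp
  if grep -q "^${first_var}=" "$script"; then
    echo "$script: already configured"
    return 0
  fi
  tmp="$(mktemp)"
  {
    head -n 1 "$script"
    printf '%s\n' "$@"
    tail -n +2 "$script"
  } > "$tmp"
  # Write back through cat so the script keeps its mode and owner.
  cat "$tmp" > "$script"
  rm -f "$tmp"
}

# Example for the two HDFS scripts:
#   for s in start-dfs.sh stop-dfs.sh; do
#     insert_user_defs "/opt/hadoop/sbin/$s" \
#       HDFS_DATANODE_USER=root HDFS_DATANODE_SECURE_USER=root \
#       HDFS_NAMENODE_USER=root HDFS_SECONDARYNAMENODE_USER=root
#   done
```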
5. Copy the configured directories to the worker nodes
# scp -r /opt/hadoop-3.2.0 root@hadoop02:/opt/
# scp -r /opt/hadoop-3.2.0 root@hadoop03:/opt/
# scp -r /opt/hadoop-3.2.0 root@hadoop04:/opt/
# scp -r /opt/hadoop root@hadoop02:/opt/
# scp -r /opt/hadoop root@hadoop03:/opt/
# scp -r /opt/hadoop root@hadoop04:/opt/
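The six commands above can be driven from the workers file instead of being typed per host. The sketch below is an assumption about how one might do that; run and copy_to_workers are hypothetical names, and DRY_RUN=1 prints the commands without executing them.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: copy a directory into /opt on every host listed
# in a workers file (one hostname per line).
copy_to_workers() {
  local workers_file="$1" src_dir="$2" host
  # Read the host list on file descriptor 3 so that scp cannot consume it
  # through standard input.
  while read -r host <&3; do
    # Skip blank lines and comments.
    case "$host" in
      ''|'#'*) continue ;;
    esac
    run scp -r "$src_dir" "root@$host:/opt/"
  done 3< "$workers_file"
}

# Example:
#   DRY_RUN=1 copy_to_workers /opt/hadoop/etc/hadoop/workers /opt/hadoop-3.2.0
```

Because scp -r follows symbolic links, copying /opt/hadoop transfers a second full copy of the installation rather than a link. Recreating the link with ln -s on each worker is a possible alternative.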
6. Format & start
#Format the NameNode
[root@hadoop01 hadoop-3.2.0]# /opt/hadoop/bin/hdfs namenode -format
#Start
[root@hadoop01 hadoop-3.2.0]# /opt/hadoop/sbin/start-all.sh
Starting namenodes on [hadoop01]
Last login: Mon Oct 12 16:22:06 CST 2020 on pts/1
Starting datanodes
Last login: Mon Oct 12 16:22:32 CST 2020 on pts/1
Starting secondary namenodes [mdw2]
Last login: Mon Oct 12 16:22:34 CST 2020 on pts/1
Starting resourcemanager
Last login: Mon Oct 12 16:22:40 CST 2020 on pts/1
Starting nodemanagers
Last login: Mon Oct 12 16:22:47 CST 2020 on pts/1
7. Verify that Hadoop started successfully
#Master node
[root@mdw2 ~]# jps
5089 NameNode
5625 ResourceManager
99770 Jps
5372 SecondaryNameNode
#Worker node
# jps
56978 NodeManager
80172 Jps
56862 DataNode
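Comparing the jps listings by eye works for four nodes, but the comparison is easy to automate. The sketch below reads jps output on standard input; check_daemons is a hypothetical helper, and the daemon names are the ones shown in the listings above.

```shell
# Hypothetical helper: read `jps` output on stdin and report every
# expected daemon that is not listed.
check_daemons() {
  local jps_output status=0 daemon
  jps_output="$(cat)"
  for daemon in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the second column exactly so
    # that NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" |
         awk -v d="$daemon" '$2 == d { ok = 1 } END { exit (ok ? 0 : 1) }'; then
      echo "not running: $daemon"
      status=1
    fi
  done
  return "$status"
}

# Master:  jps | check_daemons NameNode SecondaryNameNode ResourceManager
# Worker:  jps | check_daemons DataNode NodeManager
```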
Check the Hadoop cluster status
[root@mdw2 ~]# hadoop dfsadmin -report
WARNING: Use of this script to execute dfsadmin is deprecated.
WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.
Configured Capacity: 160982630400 (149.93 GB)
Present Capacity: 131017445376 (122.02 GB)
DFS Remaining: 131017408512 (122.02 GB)
DFS Used: 36864 (36 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.0.2:9866 (mdw3)
Hostname: mdw3
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 10945437696 (10.19 GB)
DFS Remaining: 42715426816 (39.78 GB)
DFS Used%: 0.00%
DFS Remaining%: 79.60%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Oct 14 13:55:20 CST 2020
Last Block Report: Wed Oct 14 11:53:54 CST 2020
Num of Blocks: 0
Name: 192.168.0.3:9866 (mdw4)
Hostname: mdw4
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 10945388544 (10.19 GB)
DFS Remaining: 42715475968 (39.78 GB)
DFS Used%: 0.00%
DFS Remaining%: 79.60%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Oct 14 13:55:21 CST 2020
Last Block Report: Wed Oct 14 12:57:21 CST 2020
Num of Blocks: 0
Name: 192.168.0.4:9866 (mdw5)
Hostname: mdw5
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 8074358784 (7.52 GB)
DFS Remaining: 45586505728 (42.46 GB)
DFS Used%: 0.00%
DFS Remaining%: 84.95%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Oct 14 13:55:20 CST 2020
Last Block Report: Wed Oct 14 12:17:55 CST 2020
Num of Blocks: 0
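The line that matters most in this report is "Live datanodes (3):", since the cluster has three workers. A sketch for checking that number automatically follows; expect_live_datanodes is a hypothetical helper that reads the report text on standard input.

```shell
# Hypothetical helper: extract the live DataNode count from the report
# text on stdin and compare it with the expected number.
expect_live_datanodes() {
  local expected="$1" live
  live="$(sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p' | head -n 1)"
  if [ "$live" = "$expected" ]; then
    echo "ok: $live live datanodes"
  else
    echo "expected $expected live datanodes, found ${live:-none}"
    return 1
  fi
}

# Example (uses the replacement command named in the warning above):
#   hdfs dfsadmin -report | expect_live_datanodes 3
```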
Start the ResourceManager on its own:
[root@mdw2 hadoop]# yarn-daemon.sh start resourcemanager
WARNING: Use of this script to start YARN daemons is deprecated.
WARNING: Attempting to execute replacement "yarn --daemon start" instead.
[root@mdw2 hadoop]# jps
35411 NameNode
35691 SecondaryNameNode
38558 Jps
38319 ResourceManager
8. Web UI access
http://192.168.0.1:50070/
http://192.168.0.1:8088/
Fix for the NodeManager process failing to start on worker nodes:
The NodeManager log shows the following error:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
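Cause: the yarn-site.xml configuration of the Hadoop cluster is wrong.
By default, the addresses of the YARN ResourceManager services point to 0.0.0.0.
On a server, 0.0.0.0 refers to the local machine, so the NodeManager looks for the ResourceManager services locally. The worker nodes do not run these services; they run on the ResourceManager master node.
For a Hadoop cluster, therefore, some settings in yarn-site.xml cannot be left at their defaults.
Fix:
Edit yarn-site.xml on every node of the cluster and set the address of the ResourceManager master node. The detailed settings are as follows:
# vi /opt/hadoop/etc/hadoop/yarn-site.xml
Add the following between <configuration> and </configuration>: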
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop01:8031</value>
</property>
After the NodeManager process starts normally, its log looks like this:
2020-10-13 14:15:53,762 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@14b030a0{/static,jar:file:/opt/hadoop-3.2.0/share/hadoop/yarn/hadoop-yarn-common-3.2.0.jar!/webapps/static,AVAILABLE}
2020-10-13 14:15:55,165 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.w.WebAppContext@2b5183ec{/,file:///tmp/jetty-0.0.0.0-8042-node-_-any-5774776794028847658.dir/webapp/,AVAILABLE}{/node}
2020-10-13 14:15:55,186 INFO org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@5eb2172{HTTP/1.1,[http/1.1]}{0.0.0.0:8042}
2020-10-13 14:15:55,186 INFO org.eclipse.jetty.server.Server: Started @5011ms
2020-10-13 14:15:55,186 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app node started at 8042
2020-10-13 14:15:55,210 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node ID assigned is : mdw3:24558
2020-10-13 14:15:55,218 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.0.1:8031
2020-10-13 14:15:55,223 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2020-10-13 14:15:55,323 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2020-10-13 14:15:55,349 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2020-10-13 14:15:55,520 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -955208939
2020-10-13 14:15:55,521 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -1467324462
2020-10-13 14:15:55,522 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as mdw3:24558 with total resource of <memory:8192, vCores:8>
Source: ITPUB blog, link: http://blog.itpub.net/15498/viewspace-2726822/. If you repost this article, please credit the source; otherwise legal action may be taken.