Setting up a Hadoop 3.2.0 distributed cluster on CentOS 7, step by step

Posted by chenfeng on 2020-10-14

I. Environment

1. Role layout across the four CentOS 7 Linux virtual machines:

192.168.0.1  NameNode, ResourceManager and SecondaryNameNode

192.168.0.2  NodeManager and DataNode

192.168.0.3  NodeManager and DataNode

192.168.0.4  NodeManager and DataNode


2. Configure host name resolution (every node)

Edit /etc/hosts and add the name mappings for the master and worker nodes.

# vi /etc/hosts

192.168.0.1   mdw2  hadoop01

192.168.0.2   mdw3  hadoop02

192.168.0.3   mdw4  hadoop03

192.168.0.4   mdw5  hadoop04
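The four mappings above can also be generated in one pass and appended to /etc/hosts on each node. A minimal sketch; the loop simply restates the table above (review the output before appending as root):

```shell
# Emit the /etc/hosts entries for the cluster; mdw2..mdw5 and
# hadoop01..hadoop04 follow the numbering used in the table above.
gen_hosts() {
  local i=1
  for ip in 192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4; do
    printf '%s   mdw%d  hadoop0%d\n' "$ip" $((i + 1)) "$i"
    i=$((i + 1))
  done
}

gen_hosts                    # review first
# gen_hosts >> /etc/hosts    # then append for real (as root)
```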


3. Disable the firewall (every node)

# systemctl stop firewalld

#also disable it at boot

# systemctl disable firewalld


4. Configure passwordless SSH login

For how to set up passwordless SSH, see

https://www.cnblogs.com/shireenlee4testing/p/10366061.html


5. Configure the Java environment (every node)

For how to set up the Java environment, see

https://www.cnblogs.com/shireenlee4testing/p/10368961.html



II. Building a fully distributed Hadoop cluster

1. Download the Hadoop tarball, extract it, and set the Hadoop environment variables

# wget


#extract under /opt

# tar -zxvf hadoop-3.2.0.tar.gz

#symlink /opt/hadoop-3.2.0 to /opt/hadoop to keep later configuration simple

# ln -s hadoop-3.2.0 hadoop


#add the Hadoop and Java environment variables

# vi /etc/profile

#Hadoop

export HADOOP_HOME=/opt/hadoop

export PATH=$PATH:$HADOOP_HOME/bin

export PATH=$PATH:$HADOOP_HOME/sbin

export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop


#jdk

export JAVA_HOME=/opt/jdk

export PATH=$PATH:$JAVA_HOME/bin

#reload the profile so the new variables take effect in the current shell

# source /etc/profile


2. Set the JAVA_HOME parameter in the Hadoop environment scripts

#change into the etc/hadoop directory under the Hadoop install

# cd /opt/hadoop/etc/hadoop


#add or update the following parameter in each of hadoop-env.sh, mapred-env.sh and yarn-env.sh:

# vi hadoop-env.sh 

............................................................

............................................................

# The java implementation to use. By default, this environment

# variable is REQUIRED on ALL platforms except OS X!

 export JAVA_HOME=/opt/jdk

 

# vi mapred-env.sh

............................................................

............................................................

 # Specify the log4j settings for the JobHistoryServer

# Java property: hadoop.root.logger

#export HADOOP_JHS_LOGGER=INFO,RFA


export JAVA_HOME=/opt/jdk



# vi yarn-env.sh

............................................................

............................................................

# YARN Services parameters

###

# Directory containing service examples

# export YARN_SERVICE_EXAMPLES_DIR = $HADOOP_YARN_HOME/share/hadoop/yarn/yarn-service-examples

# export YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true


export JAVA_HOME=/opt/jdk
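Since the same one-line edit goes into all three scripts, it can also be applied in a loop instead of editing each file by hand. A sketch; the scratch directory stands in for /opt/hadoop/etc/hadoop (on a real node set confdir to that path):

```shell
# Append the JAVA_HOME export to each Hadoop env script.
# confdir is a scratch stand-in here; on a real node use
# confdir=/opt/hadoop/etc/hadoop instead.
confdir="$(mktemp -d)"
touch "$confdir/hadoop-env.sh" "$confdir/mapred-env.sh" "$confdir/yarn-env.sh"

for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME=/opt/jdk' >> "$confdir/$f"
done

grep -l 'JAVA_HOME' "$confdir"/*.sh    # lists the three files that now carry the export
```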



#verify that the Hadoop setup took effect

# hadoop version

Hadoop 3.2.0

Source code repository -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf

Compiled by sunilg on 2019-01-08T06:08Z

Compiled with protoc 2.5.0

From source with checksum d3f0795ed0d9dc378e2c785d3668f39

This command was run using /opt/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar


3. Edit the Hadoop configuration files

In the etc/hadoop directory under the Hadoop install, edit core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and workers, adjusting the values to your environment.

# cat /opt/hadoop/etc/hadoop/core-site.xml 

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

  Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at


   


  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.

-->


<!-- Put site-specific property overrides in this file. -->




<configuration>

  <property>

      <!-- HDFS NameNode address -->

      <name>fs.defaultFS</name>

      <value>hdfs://hadoop01:9000</value>

  </property>

  <property>

      <!-- base directory for temporary files; create it beforehand (here /home/hadoop/tmp) -->

      <name>hadoop.tmp.dir</name>

     <value>/home/hadoop/tmp</value>

 </property>

 </configuration>


# cat /opt/hadoop/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<!-- Put site-specific property overrides in this file. -->


<configuration>

      <property>

         <!-- NameNode web UI address -->

          <name>dfs.namenode.http-address</name>

          <value>hadoop01:50070</value>

      </property>

      <property>

          <name>dfs.namenode.name.dir</name>

          <value>file:/opt/hadoop/dfs/name</value>

     </property>

     <property>

         <name>dfs.datanode.data.dir</name>

         <value>file:/opt/hadoop/dfs/data</value>

     </property>

     <property>

        <!-- replication factor; 3 is the default -->

        <name>dfs.replication</name>

         <value>3</value>

     </property>



    <property>

      <name>dfs.webhdfs.enabled</name>

      <value>true</value>

     </property>


     <property>

      <name>dfs.permissions</name>

      <value>false</value>

      <description>Setting this to false lets files be created on DFS without permission checks. Convenient, but take care to prevent accidental deletions.</description>

  </property>


</configuration>



# cat /opt/hadoop/etc/hadoop/mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<!-- Put site-specific property overrides in this file. -->




<configuration>

      <property>

          <name>mapreduce.framework.name</name>

          <value>yarn</value> <!-- run MapReduce on YARN -->

      </property>

      <property>

          <name>mapreduce.jobhistory.address</name>

          <value>hadoop01:10020</value>

      </property>

     <property>

         <name>mapreduce.jobhistory.webapp.address</name>

         <value>hadoop01:19888</value>

     </property>

    <property>

        <name>mapreduce.application.classpath</name>

        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>

     </property>

</configuration>


# cat /opt/hadoop/etc/hadoop/yarn-site.xml

<?xml version="1.0"?>




<configuration>

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>hadoop01:8088</value>

<description>For access from outside, just replace this with the real external IP; otherwise it defaults to localhost:8088</description>

</property>

<property>

<name>yarn.scheduler.maximum-allocation-mb</name>

<value>2048</value>

<description>Maximum scheduler allocation per container, in MB; the default is 8192 MB</description>

</property>

<property>

<name>yarn.nodemanager.vmem-check-enabled</name>

<value>false</value>

<description>Skip the virtual-memory check. Very useful when running on virtual machines; with it disabled, later operations are much less likely to fail.</description>

</property>


<property>

<name>yarn.nodemanager.env-whitelist</name>

<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>

</property>



<property>  

    <name>yarn.resourcemanager.address</name>  

    <value>hadoop01:8032</value>  

</property> 

<property>

    <name>yarn.resourcemanager.scheduler.address</name>  

    <value>hadoop01:8030</value>  

</property>

<property>

    <name>yarn.resourcemanager.resource-tracker.address</name>  

    <value>hadoop01:8031</value>  

</property>



</configuration>


# cat /opt/hadoop/etc/hadoop/workers 

hadoop02

hadoop03

hadoop04



4. Edit the start/stop scripts to add the HDFS and YARN user definitions

Add the HDFS users: edit the following scripts and add these lines on the blank second line (right after the shebang)

# vi /opt/hadoop/sbin/start-dfs.sh 


HDFS_DATANODE_USER=root

HDFS_DATANODE_SECURE_USER=root

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root


# vi /opt/hadoop/sbin/stop-dfs.sh


HDFS_DATANODE_USER=root

HDFS_DATANODE_SECURE_USER=root

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root


Add the YARN users: edit the following scripts and add these lines on the blank second line

# vi /opt/hadoop/sbin/start-yarn.sh 


YARN_RESOURCEMANAGER_USER=root

HDFS_DATANODE_SECURE_USER=root

YARN_NODEMANAGER_USER=root


# vi /opt/hadoop/sbin/stop-yarn.sh


YARN_RESOURCEMANAGER_USER=root

HDFS_DATANODE_SECURE_USER=root

YARN_NODEMANAGER_USER=root


Note: without the definitions above, the start scripts abort with errors about the missing user definitions.
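The four user lines can also be inserted right after the shebang non-interactively. A sketch against a mock script; on a real master, point script at /opt/hadoop/sbin/start-dfs.sh (and back it up first):

```shell
# Insert the HDFS_* user definitions right after the shebang line.
script="$(mktemp)"                              # mock stand-in for start-dfs.sh
printf '#!/usr/bin/env bash\necho "rest of start-dfs.sh"\n' > "$script"

{
  head -n 1 "$script"                           # keep the shebang first
  printf '%s\n' \
    'HDFS_DATANODE_USER=root' \
    'HDFS_DATANODE_SECURE_USER=root' \
    'HDFS_NAMENODE_USER=root' \
    'HDFS_SECONDARYNAMENODE_USER=root'
  tail -n +2 "$script"                          # then the rest of the script
} > "$script.new" && mv "$script.new" "$script"

sed -n '2p' "$script"                           # -> HDFS_DATANODE_USER=root
```

The same pattern applies to stop-dfs.sh, and to start-yarn.sh/stop-yarn.sh with the YARN variables.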


5. Copy the configured directory to the worker nodes


# scp -r /opt/hadoop-3.2.0 root@hadoop02:/opt/

# scp -r /opt/hadoop-3.2.0 root@hadoop03:/opt/

# scp -r /opt/hadoop-3.2.0 root@hadoop04:/opt/
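The per-host scp commands can also be driven from the workers file instead of being typed one by one. A dry-run sketch that only echoes each command; the temp file stands in for /opt/hadoop/etc/hadoop/workers:

```shell
# Dry run: print one scp per worker, reading host names from a workers list.
# On a real master, use workers=/opt/hadoop/etc/hadoop/workers instead.
workers="$(mktemp)"
printf 'hadoop02\nhadoop03\nhadoop04\n' > "$workers"

while read -r host; do
  echo scp -r /opt/hadoop-3.2.0 "root@$host:/opt/"   # drop 'echo' to copy for real
done < "$workers"
```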


#recreate the /opt/hadoop symlink on each node (scp -r would follow the link and copy the whole tree a second time)

# ssh root@hadoop02 "ln -s /opt/hadoop-3.2.0 /opt/hadoop"

# ssh root@hadoop03 "ln -s /opt/hadoop-3.2.0 /opt/hadoop"

# ssh root@hadoop04 "ln -s /opt/hadoop-3.2.0 /opt/hadoop"


6. Initialize & start


#format the NameNode (once, on the master)

[root@hadoop01 hadoop-3.2.0]# /opt/hadoop/bin/hdfs namenode -format


#start the cluster

[root@hadoop01 hadoop-3.2.0]# /opt/hadoop/sbin/start-all.sh

Starting namenodes on [hadoop01]

Last login: Mon Oct 12 16:22:06 CST 2020 on pts/1

Starting datanodes

Last login: Mon Oct 12 16:22:32 CST 2020 on pts/1

Starting secondary namenodes [mdw2]

Last login: Mon Oct 12 16:22:34 CST 2020 on pts/1

Starting resourcemanager

Last login: Mon Oct 12 16:22:40 CST 2020 on pts/1

Starting nodemanagers

Last login: Mon Oct 12 16:22:47 CST 2020 on pts/1


7. Verify that Hadoop started successfully

#master node

[root@mdw2 ~]# jps

5089 NameNode

5625 ResourceManager

99770 Jps

5372 SecondaryNameNode


#worker node

# jps

56978 NodeManager

80172 Jps

56862 DataNode
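A quick way to confirm the right daemons came up on each node is to check jps output against an expected list. A sketch; the sample is the master-node jps output captured above, and on a live node you would pass "$(jps)" instead:

```shell
# Check that every expected daemon name appears in `jps` output.
check_daemons() {     # usage: check_daemons "<jps output>" daemon...
  local out="$1"; shift
  for d in "$@"; do
    case "$out" in
      *" $d"*) ;;                        # found, keep going
      *) echo "MISSING: $d"; return 1 ;;
    esac
  done
  echo "all daemons up"
}

# Sample taken from the master-node jps output above.
sample='5089 NameNode
5625 ResourceManager
99770 Jps
5372 SecondaryNameNode'

check_daemons "$sample" NameNode ResourceManager SecondaryNameNode
# On a live master: check_daemons "$(jps)" NameNode ResourceManager SecondaryNameNode
# On a worker:      check_daemons "$(jps)" NodeManager DataNode
```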


Check the Hadoop cluster status

[root@mdw2 ~]# hadoop dfsadmin -report

WARNING: Use of this script to execute dfsadmin is deprecated.

WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.


Configured Capacity: 160982630400 (149.93 GB)

Present Capacity: 131017445376 (122.02 GB)

DFS Remaining: 131017408512 (122.02 GB)

DFS Used: 36864 (36 KB)

DFS Used%: 0.00%

Replicated Blocks:

        Under replicated blocks: 0

        Blocks with corrupt replicas: 0

        Missing blocks: 0

        Missing blocks (with replication factor 1): 0

        Low redundancy blocks with highest priority to recover: 0

        Pending deletion blocks: 0

Erasure Coded Block Groups: 

        Low redundancy block groups: 0

        Block groups with corrupt internal blocks: 0

        Missing block groups: 0

        Low redundancy blocks with highest priority to recover: 0

        Pending deletion blocks: 0


-------------------------------------------------

Live datanodes (3):


Name: 192.168.0.2:9866 (mdw3)

Hostname: mdw3

Decommission Status : Normal

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 12288 (12 KB)

Non DFS Used: 10945437696 (10.19 GB)

DFS Remaining: 42715426816 (39.78 GB)

DFS Used%: 0.00%

DFS Remaining%: 79.60%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Oct 14 13:55:20 CST 2020

Last Block Report: Wed Oct 14 11:53:54 CST 2020

Num of Blocks: 0



Name: 192.168.0.3:9866 (mdw4)

Hostname: mdw4

Decommission Status : Normal

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 12288 (12 KB)

Non DFS Used: 10945388544 (10.19 GB)

DFS Remaining: 42715475968 (39.78 GB)

DFS Used%: 0.00%

DFS Remaining%: 79.60%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Oct 14 13:55:21 CST 2020

Last Block Report: Wed Oct 14 12:57:21 CST 2020

Num of Blocks: 0



Name: 192.168.0.4:9866 (mdw5)

Hostname: mdw5

Decommission Status : Normal

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 12288 (12 KB)

Non DFS Used: 8074358784 (7.52 GB)

DFS Remaining: 45586505728 (42.46 GB)

DFS Used%: 0.00%

DFS Remaining%: 84.95%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Oct 14 13:55:20 CST 2020

Last Block Report: Wed Oct 14 12:17:55 CST 2020

Num of Blocks: 0
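With three DataNodes the per-node details get long; an awk one-liner condenses a report like the one above into one "host remaining%" line per DataNode. The sketch below runs against a hand-trimmed excerpt of the report; on a live cluster you would pipe `hdfs dfsadmin -report` into the same awk:

```shell
# Summarize dfsadmin report output: one "host remaining%" line per DataNode.
# `report` is an excerpt of the report shown above.
report='Hostname: mdw3
DFS Remaining%: 79.60%
Hostname: mdw4
DFS Remaining%: 79.60%
Hostname: mdw5
DFS Remaining%: 84.95%'

printf '%s\n' "$report" |
  awk '/^Hostname:/ { host = $2 }
       /^DFS Remaining%:/ { print host, $3 }'
```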



Starting the resourcemanager on its own:


[root@mdw2 hadoop]# yarn-daemon.sh start resourcemanager

WARNING: Use of this script to start YARN daemons is deprecated.

WARNING: Attempting to execute replacement "yarn --daemon start" instead.

[root@mdw2 hadoop]# jps

35411 NameNode

35691 SecondaryNameNode

38558 Jps

38319 ResourceManager


8. Web UI access

http://192.168.0.1:50070/

http://192.168.0.1:8088/



How to fix a NodeManager that will not start on a worker node:

The NodeManager log shows this error:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)


Cause: a misconfigured yarn-site.xml.


By default, the YARN ResourceManager service addresses point at 0.0.0.0.


On a server, 0.0.0.0 refers to the local machine, so each NodeManager looks for the ResourceManager services on its own host; the worker nodes do not run those services, only the ResourceManager master does.

So in a Hadoop cluster, certain yarn-site.xml settings must not be left at their defaults.


Fix:

Edit yarn-site.xml on every node of the cluster and set the address of the ResourceManager master. The details:

# vi /opt/hadoop/etc/hadoop/yarn-site.xml, and add the following between <configuration> and </configuration>:


<property>

    <name>yarn.resourcemanager.address</name>

    <value>hadoop01:8032</value>

</property>

<property>

    <name>yarn.resourcemanager.scheduler.address</name>

    <value>hadoop01:8030</value>

</property>

<property>

    <name>yarn.resourcemanager.resource-tracker.address</name>

    <value>hadoop01:8031</value>

</property>
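To check that the fix actually landed on a node, pull the value back out of its yarn-site.xml. A naive extraction sketch, good enough for eyeballing but not a real XML parser; the scratch file stands in for /opt/hadoop/etc/hadoop/yarn-site.xml:

```shell
# Extract yarn.resourcemanager.address from a yarn-site.xml-style file.
# `conf` is a scratch copy here; on a node, set it to the real path.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
<configuration>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop01:8032</value>
</property>
</configuration>
EOF

# Find the <name> line, then print the contents of the next <value> line.
rm_addr="$(awk '/yarn.resourcemanager.address/ { want = 1; next }
                want && /<value>/ {
                  gsub(/.*<value>|<\/value>.*/, ""); print; exit
                }' "$conf")"
echo "$rm_addr"        # -> hadoop01:8032
```

If the output is empty or still points at 0.0.0.0, that node has not picked up the fix.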


After the fix, a healthy NodeManager startup logs the following:

2020-10-13 14:15:53,762 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@14b030a0{/static,jar:file:/opt/hadoop-3.2.0/share/hadoop/yarn/hadoop-yarn-common-3.2.0.jar!/webapps/static,AVAILABLE}

2020-10-13 14:15:55,165 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.w.WebAppContext@2b5183ec{/,file:///tmp/jetty-0.0.0.0-8042-node-_-any-5774776794028847658.dir/webapp/,AVAILABLE}{/node}

2020-10-13 14:15:55,186 INFO org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@5eb2172{HTTP/1.1,[http/1.1]}{0.0.0.0:8042}

2020-10-13 14:15:55,186 INFO org.eclipse.jetty.server.Server: Started @5011ms

2020-10-13 14:15:55,186 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app node started at 8042

2020-10-13 14:15:55,210 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node ID assigned is : mdw3:24558

2020-10-13 14:15:55,218 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.0.1:8031

2020-10-13 14:15:55,223 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor

2020-10-13 14:15:55,323 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []

2020-10-13 14:15:55,349 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]

2020-10-13 14:15:55,520 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -955208939

2020-10-13 14:15:55,521 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -1467324462

2020-10-13 14:15:55,522 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as mdw3:24558 with total resource of <memory:8192, vCores:8>










From the "ITPUB blog". Link: http://blog.itpub.net/15498/viewspace-2726822/. Please credit the source when reposting; otherwise legal action may be taken.
