Node role planning
Operating system: CentOS 7.2 (1511)
Java JDK version: jdk-8u65-linux-x64.tar.gz
Hadoop version: hadoop-2.8.3.tar.gz
Download:
Link (Baidu Netdisk): https://pan.baidu.com/s/1iQfjO-d2ojA6mAeOOKb6CA
Extraction code: l0qp
node1 | node2 | node3 |
---|---|---|
NameNode | ResourceManager | |
DataNode | DataNode | DataNode |
NodeManager | NodeManager | NodeManager |
HistoryServer | SecondaryNameNode | |
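Configure host IP addresses and hostnames
The three hosts are named node1, node2 and node3. Their hostnames and IP addresses map as follows:
No. | Hostname | IP address | Role |
---|---|---|---|
1 | node1 | 192.168.100.11 | Master node |
2 | node2 | 192.168.100.12 | Worker node |
3 | node3 | 192.168.100.13 | Worker node |
Change the hostname
Run the hostname command on each of the three nodes: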
node1:
[root@localhost ~]# hostnamectl set-hostname node1
node2:
[root@localhost ~]# hostnamectl set-hostname node2
node3:
[root@localhost ~]# hostnamectl set-hostname node3
Press Ctrl+D or type exit to close the terminal, log in again, and check the hostname, as shown in the figure below:
Change the IP address
Using node1 as the example, change the IP address on all three nodes (the NIC name may differ between machines; on node1 it is eno16777736):
[root@node1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eno16777736
Set the IP addresses of node1, node2 and node3 to 192.168.100.11, 192.168.100.12 and 192.168.100.13 respectively.
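The exact file contents depend on your network. As a minimal sketch, assuming a static IPv4 address on a /24 network, the function below writes a typical CentOS 7 ifcfg file. The helper name write_ifcfg and the gateway/DNS address 192.168.100.2 are assumptions for illustration only; adjust them to your environment. On CentOS 7 the change is applied with `systemctl restart network`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a static-IP ifcfg file for one NIC.
# Assumptions: /24 network; gateway and DNS default to 192.168.100.2.
write_ifcfg() {
  local file=$1 device=$2 ipaddr=$3
  local gateway=${4:-192.168.100.2}
  cat > "$file" <<EOF
TYPE=Ethernet
BOOTPROTO=static
NAME=$device
DEVICE=$device
ONBOOT=yes
IPADDR=$ipaddr
PREFIX=24
GATEWAY=$gateway
DNS1=$gateway
EOF
}

# Example for node1 (as root), then apply with: systemctl restart network
# write_ifcfg /etc/sysconfig/network-scripts/ifcfg-eno16777736 eno16777736 192.168.100.11
```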
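Edit the host mappings
On all three nodes, add the hostname-to-IP mappings:
[root@node1 ~]# vi /etc/hosts
Add the following lines:
192.168.100.11 node1
192.168.100.12 node2
192.168.100.13 node3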
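Each node needs one line per host in /etc/hosts, in the form `IP hostname` (for example `192.168.100.11 node1`). To avoid editing the file by hand on every node, a small script can append the mappings. This is a minimal sketch. The function name add_host_mappings and its `ip:hostname` argument format are hypothetical. A mapping is appended only when the hostname is not already listed, so running the script twice creates no duplicates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP hostname" lines to a hosts file,
# skipping any hostname that already has an entry (idempotent).
add_host_mappings() {
  local hosts_file=$1; shift
  local entry ip name
  for entry in "$@"; do
    ip=${entry%%:*}     # part before the colon
    name=${entry#*:}    # part after the colon
    # Exit status 0 if the hostname appears on any non-comment line.
    if ! awk -v n="$name" '!/^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                           END { exit !found }' "$hosts_file"; then
      printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
  done
}

# Usage on each node (as root):
# add_host_mappings /etc/hosts 192.168.100.11:node1 192.168.100.12:node2 192.168.100.13:node3
```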
Configure passwordless SSH login between nodes
Generate the local key pair
On node1, node2 and node3, run the key-generation command (press Enter at every prompt):
[root@node1 ~]# ssh-keygen
Change to the .ssh directory and list the generated key files:
[root@node1 ~]# cd ~/.ssh/
[root@node1 .ssh]# ls
id_rsa id_rsa.pub
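Pressing Enter at every prompt accepts the defaults: an RSA key in ~/.ssh/id_rsa with an empty passphrase. A non-interactive equivalent passes these choices as ssh-keygen options. The sketch below wraps that in a function whose name, gen_ssh_key, is hypothetical. It also skips generation if the key already exists, so an existing key is never overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an RSA key pair without prompts.
#   -t rsa  key type    -N ''  empty passphrase
#   -f      key file    -q     quiet
gen_ssh_key() {
  local key=${1:-$HOME/.ssh/id_rsa}
  if [ -f "$key" ]; then
    echo "key already exists: $key"
    return 0
  fi
  mkdir -p -m 700 "$(dirname "$key")"
  ssh-keygen -t rsa -N '' -f "$key" -q
}

# Usage on each node: gen_ssh_key
```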
Copy the public key
Copy the generated public key to every node, including the node itself:
On node1:
[root@node1 .ssh]# ssh-copy-id -i id_rsa.pub root@node1
The authenticity of host 'node1 (192.168.100.11)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@node1'"
and check to make sure that only the key(s) you wanted were added.
[root@node1 .ssh]# ssh-copy-id -i id_rsa.pub root@node2
The authenticity of host 'node2 (192.168.100.12)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@node2'"
and check to make sure that only the key(s) you wanted were added.
[root@node1 .ssh]# ssh-copy-id -i id_rsa.pub root@node3
The authenticity of host 'node3 (192.168.100.13)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node3's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@node3'"
and check to make sure that only the key(s) you wanted were added.
On node2:
[root@node2 .ssh]# ssh-copy-id -i id_rsa.pub root@node1
The authenticity of host 'node1 (192.168.100.11)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@node1'"
and check to make sure that only the key(s) you wanted were added.
[root@node2 .ssh]# ssh-copy-id -i id_rsa.pub root@node2
The authenticity of host 'node2 (192.168.100.12)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@node2'"
and check to make sure that only the key(s) you wanted were added.
[root@node2 .ssh]# ssh-copy-id -i id_rsa.pub root@node3
The authenticity of host 'node3 (192.168.100.13)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node3's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@node3'"
and check to make sure that only the key(s) you wanted were added.
On node3:
[root@node3 .ssh]# ssh-copy-id -i id_rsa.pub root@node1
The authenticity of host 'node1 (192.168.100.11)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@node1'"
and check to make sure that only the key(s) you wanted were added.
[root@node3 .ssh]# ssh-copy-id -i id_rsa.pub root@node2
The authenticity of host 'node2 (192.168.100.12)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@node2'"
and check to make sure that only the key(s) you wanted were added.
[root@node3 .ssh]# ssh-copy-id -i id_rsa.pub root@node3
The authenticity of host 'node3 (192.168.100.13)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node3's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@node3'"
and check to make sure that only the key(s) you wanted were added.
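The nine ssh-copy-id commands above all follow one pattern: on each node, copy the key to every node. A loop run once per node does the same job, and you are still asked for each target's password once. The sketch below assumes the NODES list matches this guide's hostnames, and the function name distribute_key is hypothetical.

```shell
#!/usr/bin/env bash
# Copy this node's public key to every node in the cluster, itself included.
# Assumption: the hostnames resolve via /etc/hosts as configured earlier.
NODES="node1 node2 node3"

distribute_key() {
  local pub=${1:-$HOME/.ssh/id_rsa.pub} host
  for host in $NODES; do
    ssh-copy-id -i "$pub" "root@$host" || return 1
  done
}

# Usage on each node: distribute_key
```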
Test passwordless login
On each of the three nodes, ssh to every node, including itself. If no password is requested, the setup works (node3 shown):
[root@node3 .ssh]# ssh node1
Last login: Thu Jan 21 11:32:29 2021 from 192.168.100.1
[root@node1 ~]# exit
logout
Connection to node1 closed.
[root@node3 .ssh]# ssh node2
Last login: Thu Jan 21 16:01:47 2021 from node1
[root@node2 ~]# exit
logout
Connection to node2 closed.
[root@node3 .ssh]# ssh node3
Last login: Thu Jan 21 16:01:59 2021 from node1
[root@node3 ~]# exit
logout
Connection to node3 closed.
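You can also check all nodes in one pass without any prompts. ssh's `BatchMode=yes` option makes ssh fail instead of asking for a password, so any node that still needs a password is reported as failed. This is a sketch. The function name check_passwordless is hypothetical, and the 5-second `ConnectTimeout` is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Report whether each node accepts key-based login without a password prompt.
# BatchMode=yes: fail instead of prompting; ConnectTimeout bounds unreachable hosts.
NODES="node1 node2 node3"

check_passwordless() {
  local host failed=0
  for host in $NODES; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
      echo "$host: OK"
    else
      echo "$host: FAILED"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on each node: check_passwordless
```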
Disable the firewall
Run on all three nodes:
[root@node1 .ssh]# systemctl stop firewalld
[root@node1 .ssh]# systemctl disable firewalld
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
Configure SELinux
On all three nodes, set SELINUX=disabled in the SELinux configuration file:
[root@node1 ~]# vi /etc/selinux/config
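The same edit can be made without vi. The sed sketch below rewrites only the line that starts with `SELINUX=`, so the neighbouring `SELINUXTYPE=` line is left alone. The function name disable_selinux_config is hypothetical.

```shell
#!/usr/bin/env bash
# Set SELINUX=disabled in an SELinux config file (effective after a reboot).
# "^SELINUX=" does not match "SELINUXTYPE=", which therefore stays unchanged.
disable_selinux_config() {
  local conf=${1:-/etc/selinux/config}
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}

# Usage on each node (as root): disable_selinux_config
```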
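Setting SELinux to disabled takes effect only after a reboot. Alternatively, run the following commands to switch SELinux to permissive mode immediately (again on all three nodes):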
[root@node1 ~]# setenforce 0
[root@node1 ~]# getenforce
Permissive
Configure the Java environment
On node1, create the directory /opt/jdk and upload the JDK archive to it:
[root@node1 ~]# mkdir -p /opt/jdk
[root@node1 ~]# cd /opt/jdk
[root@node1 jdk]# ls
jdk-8u65-linux-x64.tar.gz
Extract jdk-8u65-linux-x64.tar.gz into the current directory, then delete the archive:
[root@node1 jdk]# tar zxvf jdk-8u65-linux-x64.tar.gz
[root@node1 jdk]# rm -f jdk-8u65-linux-x64.tar.gz
Edit /etc/profile and add the Java environment settings:
[root@node1 jdk]# vi /etc/profile
#Java Start
export JAVA_HOME=/opt/jdk/jdk1.8.0_65
export PATH=$PATH:${JAVA_HOME}/bin
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
#Java End
Load the Java settings and verify the installation:
[root@node1 jdk]# source /etc/profile
[root@node1 jdk]# java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
Configure the Hadoop environment
On node1, create the directory /opt/hadoop and upload the Hadoop archive to it:
[root@node1 ~]# mkdir -p /opt/hadoop
[root@node1 ~]# cd /opt/hadoop/
[root@node1 hadoop]# ls
hadoop-2.8.3.tar.gz
Extract hadoop-2.8.3.tar.gz into the current directory, then delete the archive:
[root@node1 hadoop]# tar zxvf hadoop-2.8.3.tar.gz
[root@node1 hadoop]# rm -f hadoop-2.8.3.tar.gz
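Set JAVA_HOME for Hadoop
Edit the JDK path in hadoop-env.sh, mapred-env.sh and yarn-env.sh (under etc/hadoop) in turn, pointing each to /opt/jdk/jdk1.8.0_65/. When editing, remove the leading "#" from any commented-out export JAVA_HOME line: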
[root@node1 ~]# cd /opt/hadoop/hadoop-2.8.3/etc/hadoop/
[root@node1 hadoop]# vi hadoop-env.sh
[root@node1 hadoop]# vi mapred-env.sh
[root@node1 hadoop]# vi yarn-env.sh
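The stock files typically differ. hadoop-env.sh usually has an active `export JAVA_HOME=${JAVA_HOME}` line, while mapred-env.sh and yarn-env.sh usually have a commented-out `# export JAVA_HOME=...` line. The sed sketch below handles both cases in one pass. It assumes each file already contains such a line; a file without one is left unchanged. The function name set_hadoop_java_home is hypothetical.

```shell
#!/usr/bin/env bash
# Point JAVA_HOME at the JDK in the three Hadoop env scripts, removing a
# leading "#" if the line was commented out. Files with no
# "export JAVA_HOME=" line are left as they are.
JDK_DIR=/opt/jdk/jdk1.8.0_65

set_hadoop_java_home() {
  local conf_dir=${1:-/opt/hadoop/hadoop-2.8.3/etc/hadoop} f
  for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
    sed -i "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=${JDK_DIR}|" "$conf_dir/$f" || return 1
  done
}

# Usage on node1: set_hadoop_java_home
```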
Configure core-site.xml
On all three nodes, create the Hadoop temporary directory /opt/datas/tmp:
[root@node1 ~]# mkdir -p /opt/datas/tmp
[root@node2 ~]# mkdir -p /opt/datas/tmp
[root@node3 ~]# mkdir -p /opt/datas/tmp
On node1, edit core-site.xml:
[root@node1 ~]# vi /opt/hadoop/hadoop-2.8.3/etc/hadoop/core-site.xml
Add the following:
<configuration>
<property>
<!-- NameNode host and port -->
<name>fs.defaultFS</name>
<value>hdfs://node1:8020</value>
</property>
<!-- Hadoop temporary directory (the /opt/datas/tmp directory created above) -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/datas/tmp</value>
</property>
</configuration>
Configure hdfs-site.xml
On all three nodes, create the NameNode data directory /opt/datas/dfs/namenode and the DataNode data directory /opt/datas/dfs/datanode (node1 shown; the commands on node2 and node3 are the same):
[root@node1 ~]# mkdir -p /opt/datas/dfs/namenode
[root@node1 ~]# mkdir -p /opt/datas/dfs/datanode
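Passwordless SSH is already in place, so node1 can create these directories on all three nodes in one loop, together with the /opt/datas/tmp directory used by core-site.xml. This is a sketch. The function name make_data_dirs and the NODES and DIRS variables are assumptions matching this guide's layout.

```shell
#!/usr/bin/env bash
# Create Hadoop's local directories on every node over SSH (run on node1).
NODES="node1 node2 node3"
DIRS="/opt/datas/tmp /opt/datas/dfs/namenode /opt/datas/dfs/datanode"

make_data_dirs() {
  local host
  for host in $NODES; do
    # $DIRS expands locally, so the remote command is: mkdir -p dir1 dir2 dir3
    ssh "$host" "mkdir -p $DIRS" || return 1
  done
}

# Usage on node1: make_data_dirs
```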
Edit hdfs-site.xml:
[root@node1 ~]# vi /opt/hadoop/hadoop-2.8.3/etc/hadoop/hdfs-site.xml
<configuration>
<!-- Number of block replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- HTTP address and port of the SecondaryNameNode; node2 acts as the SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node2:50090</value>
</property>
<!-- Storage path for NameNode data -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/datas/dfs/namenode</value>
</property>
<!-- Storage path for DataNode data -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/datas/dfs/datanode</value>
</property>
</configuration>
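Configure slaves
The slaves file lists the worker hosts on which the HDFS DataNodes (and the YARN NodeManagers) are started. Edit it:
[root@node1 ~]# vi /opt/hadoop/hadoop-2.8.3/etc/hadoop/slaves
Change the file contents to the three worker hosts:
node1
node2
node3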
Configure yarn-site.xml
Edit yarn-site.xml:
[root@node1 ~]# vi /opt/hadoop/hadoop-2.8.3/etc/hadoop/yarn-site.xml
Change the file contents to:
<configuration>
<!-- Auxiliary service run by the NodeManager; it must be mapreduce_shuffle for MapReduce programs to run -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node2</value>
</property>
<!-- Whether to enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Maximum time aggregated logs are kept on HDFS, in seconds (106800 s is about 29.7 hours) -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>
</configuration>
Configure mapred-site.xml
Create mapred-site.xml by copying the mapred-site.xml.template file:
[root@node1 ~]# cp /opt/hadoop/hadoop-2.8.3/etc/hadoop/mapred-site.xml.template /opt/hadoop/hadoop-2.8.3/etc/hadoop/mapred-site.xml
Edit mapred-site.xml:
[root@node1 ~]# vi /opt/hadoop/hadoop-2.8.3/etc/hadoop/mapred-site.xml
<configuration>
<!-- Run MapReduce jobs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- Address and port of the MapReduce job history server -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<!-- Address and port of the job history server web UI -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
</configuration>
Configure the Hadoop environment in /etc/profile
Edit the environment file /etc/profile:
[root@node1 ~]# vi /etc/profile
#Hadoop Start
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.3
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
#Hadoop End
Load the new settings:
[root@node1 ~]# source /etc/profile
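Distribute files to the other nodes
On node2 and node3, create the directories /opt/jdk and /opt/hadoop (node2 shown; run the same commands on node3):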
[root@node2 ~]# mkdir -p /opt/jdk
[root@node2 ~]# mkdir -p /opt/hadoop
Copy the JDK to node2 and node3:
[root@node1 ~]# scp -r /opt/jdk/jdk1.8.0_65/ node2:/opt/jdk
[root@node1 ~]# scp -r /opt/jdk/jdk1.8.0_65/ node3:/opt/jdk
Copy Hadoop to node2 and node3:
[root@node1 ~]# scp -r /opt/hadoop/hadoop-2.8.3/ node2:/opt/hadoop
[root@node1 ~]# scp -r /opt/hadoop/hadoop-2.8.3/ node3:/opt/hadoop
Copy /etc/profile to node2 and node3:
[root@node1 ~]# scp /etc/profile node2:/etc/profile
[root@node1 ~]# scp /etc/profile node3:/etc/profile
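The distribution steps above can be combined into one loop run on node1. For each worker, it creates the target directories over SSH and then copies the JDK, Hadoop and /etc/profile. This is a sketch. The function name distribute_to_workers and the WORKERS list are assumptions matching this guide.

```shell
#!/usr/bin/env bash
# Copy the JDK, Hadoop and /etc/profile from node1 to the other nodes.
WORKERS="node2 node3"

distribute_to_workers() {
  local host
  for host in $WORKERS; do
    ssh "$host" "mkdir -p /opt/jdk /opt/hadoop" || return 1
    scp -r /opt/jdk/jdk1.8.0_65/ "$host":/opt/jdk || return 1
    scp -r /opt/hadoop/hadoop-2.8.3/ "$host":/opt/hadoop || return 1
    scp /etc/profile "$host":/etc/profile || return 1
  done
}

# Usage on node1: distribute_to_workers
```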
On node2 and node3, load the new settings:
node2:
[root@node2 ~]# source /etc/profile
node3:
[root@node3 ~]# source /etc/profile
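Format the NameNode
To re-format the NameNode later, first delete all files in the existing NameNode and DataNode directories; otherwise errors occur. Each format generates a new cluster ID by default. That ID is recorded in the VERSION files of the NameNode and the DataNodes (in dfs/namenode/current and dfs/datanode/current). If the old directories are not deleted before re-formatting, the NameNode's VERSION file holds the new cluster ID while the DataNodes' VERSION files still hold the old one, and the mismatch causes errors. Alternatively, pass the old cluster ID as a parameter when formatting.
The NameNode and DataNode directories are the ones configured by dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml. Format the NameNode on node1: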
[root@node1 ~]# cd /opt/hadoop/hadoop-2.8.3/bin/
[root@node1 bin]# ./hdfs namenode -format
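A DataNode may refuse to start after a re-format. In that case, or before re-formatting, you can compare the cluster IDs directly, because each VERSION file contains a `clusterID=` line. The sketch below assumes the directories configured in hdfs-site.xml in this guide, and the function name same_cluster_id is hypothetical. For the second approach described above, Hadoop 2.x's `hdfs namenode -format` accepts a `-clusterid <id>` option.

```shell
#!/usr/bin/env bash
# Compare the clusterID recorded by the NameNode with the one a DataNode holds.
# Prints both IDs; returns 0 if they match, non-zero otherwise.
same_cluster_id() {
  local nn_version=${1:-/opt/datas/dfs/namenode/current/VERSION}
  local dn_version=${2:-/opt/datas/dfs/datanode/current/VERSION}
  local nn_id dn_id
  nn_id=$(sed -n 's/^clusterID=//p' "$nn_version")
  dn_id=$(sed -n 's/^clusterID=//p' "$dn_version")
  echo "NameNode clusterID: $nn_id"
  echo "DataNode clusterID: $dn_id"
  [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]
}

# Usage on node1: same_cluster_id
```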
Start the cluster
Start HDFS
[root@node1 ~]# cd /opt/hadoop/hadoop-2.8.3/sbin/
[root@node1 sbin]# ./start-dfs.sh
Starting namenodes on [node1]
node1: starting namenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-namenode-node1.out
node3: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-node3.out
node2: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-node2.out
node1: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-node1.out
Starting secondary namenodes [node2]
node2: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-secondarynamenode-node2.out
[root@node1 sbin]#
Use jps to check which processes are running; node1 has started the NameNode and DataNode processes:
[root@node1 sbin]# jps
1588 NameNode
1717 DataNode
1930 Jps
Start YARN
Run on node2:
[root@node2 ~]# cd /opt/hadoop/hadoop-2.8.3/sbin/
[root@node2 sbin]# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-resourcemanager-node2.out
node3: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-node3.out
node1: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-node1.out
node2: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-node2.out
[root@node2 sbin]#
Use jps to check which processes are running; node2 has started the ResourceManager process:
[root@node2 sbin]# jps
2629 NodeManager
2937 Jps
1434 DataNode
1531 SecondaryNameNode
2525 ResourceManager
[root@node2 sbin]#
Note: if $HADOOP_HOME/sbin/start-yarn.sh is not run on the ResourceManager host, the ResourceManager process will not be started. In that case, run ./yarn-daemon.sh start resourcemanager on the ResourceManager host to start it.
Start the job history server
On node1, start the MapReduce job history server:
[root@node1 sbin]# ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/hadoop/hadoop-2.8.3/logs/mapred-root-historyserver-node1.out
[root@node1 sbin]#
Run jps again; node1 has now started the JobHistoryServer process:
[root@node1 sbin]# jps
1588 NameNode
1717 DataNode
2502 Jps
2462 JobHistoryServer
2303 NodeManager
[root@node1 sbin]#
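Instead of running jps on each node by hand, a loop on node1 can compare each node's Java processes with the role plan at the top of this guide. This is a sketch. The function name check_roles is hypothetical, and the expected process names are taken from that planning table. The remote command sources /etc/profile because a non-interactive SSH command does not read it, so jps may otherwise be missing from the PATH.

```shell
#!/usr/bin/env bash
# Check that a node runs the Java processes planned for it (via jps).
# Returns non-zero and names each missing process if any are absent.
check_roles() {
  local host=$1; shift
  local procs name missing=0
  # A non-interactive SSH command does not read /etc/profile; load it for jps.
  procs=$(ssh "$host" "source /etc/profile; jps") || return 1
  for name in "$@"; do
    # jps prints "PID Name"; compare the second column exactly.
    if ! printf '%s\n' "$procs" | awk '{print $2}' | grep -qx "$name"; then
      echo "$host: $name is NOT running"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on node1, following the role planning table:
# check_roles node1 NameNode DataNode NodeManager JobHistoryServer
# check_roles node2 ResourceManager SecondaryNameNode DataNode NodeManager
# check_roles node3 DataNode NodeManager
```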
View the HDFS web UI
The address is the IP of the host running the NameNode, on port 50070 (http://192.168.100.11:50070).
View the YARN web UI
The address is node2's IP, on port 8088 (http://192.168.100.12:8088).
View the JobHistory web UI
The address is node1's IP, on port 19888 (http://192.168.100.11:19888/jobhistory).
Test case: counting word frequencies in a sample file
Prepare a sample file on node1
[root@node1 ~]# vi example.txt
Add the following content to example.txt:
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
Create the input directory /datas/input in HDFS
[root@node1 ~]# hadoop fs -mkdir -p /datas/input
Upload example.txt to that HDFS directory
[root@node1 ~]# hadoop fs -put ~/example.txt /datas/input
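Run the WordCount MapReduce example bundled with Hadoop (the output directory /datas/output must not already exist, otherwise the job fails)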
[root@node1 ~]# hadoop jar /opt/hadoop/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount /datas/input/example.txt /datas/output
View the output file
[root@node1 ~]# hadoop fs -cat /datas/output/part-r-00000
hadoop 3
hbase 1
hive 2
mapreduce 1
spark 2
sqoop 1
storm 1
[root@node1 ~]#
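The sample is tiny, so the result can be cross-checked locally with standard text tools. This sketch is not how MapReduce computes the counts. It only reproduces the layout of WordCount's output: word, tab, count, sorted by word in byte order. The function name local_wordcount is hypothetical.

```shell
#!/usr/bin/env bash
# Count words in a local file and print "word<TAB>count" sorted by word,
# mirroring the layout of WordCount's part-r-00000 output.
local_wordcount() {
  tr -s ' \t' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Usage on node1 (bash), comparing with the HDFS result:
# diff <(local_wordcount ~/example.txt) <(hadoop fs -cat /datas/output/part-r-00000)
```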