Hadoop HA Cluster: A Simple Setup Guide
1. Cluster planning
2. Server planning
3. Installation files
4. System preparation for HA
4.1. OS installation: this environment uses a CentOS minimal install, version CentOS Linux release 7.8.2003 (Core).
mkdir /soft
Upload the installation files to the /soft directory (hadoop-2.7.7-centos7.tar.gz, jdk-8u151-linux-x64.tar.gz, zookeeper-3.4.14.tar.gz).
4.2. Disable the firewall
systemctl stop firewalld.service && systemctl disable firewalld.service
4.3. Disable SELinux
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
4.4. Install Java
tar -zxvf jdk-8u151-linux-x64.tar.gz -C /usr/local/
Append the environment variables (single-quoted so that $JAVA_HOME and friends are expanded at login, not at echo time):
echo 'export JAVA_HOME=/usr/local/jdk1.8.0_151' >> /etc/profile
echo 'export JRE_HOME=/usr/local/jdk1.8.0_151/jre' >> /etc/profile
echo 'export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib' >> /etc/profile
echo 'export ZOOKEEPER_HOME=/opt/hadoop/zookeeper' >> /etc/profile
echo 'export HADOOP_HOME=/opt/hadoop/hadoop' >> /etc/profile
echo 'export PATH=$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$JRE_HOME/bin:$PATH' >> /etc/profile
source /etc/profile
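A note on the quoting above: if the export lines are written with double quotes, the shell expands variables such as $HADOOP_HOME at echo time (usually to an empty string on a fresh host), so the wrong value ends up in /etc/profile. A minimal, runnable illustration:

```shell
# Double quotes expand $HADOOP_HOME immediately (empty on a fresh host);
# single quotes write the literal text, so expansion happens at login.
unset HADOOP_HOME
wrong=$(echo "export PATH=$HADOOP_HOME/bin:\$PATH")
right=$(echo 'export PATH=$HADOOP_HOME/bin:$PATH')
echo "$wrong"   # the bin segment has lost its prefix
echo "$right"   # the literal line, expanded later by the login shell
```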
4.5. Add the hadoop user
groupadd hadoop && useradd -g hadoop -d /home/hadoop hadoop && echo 'hadoop' | passwd hadoop --stdin
4.6. Create the hadoop installation directories and set ownership
mkdir -p /opt/hadoop && chown -R hadoop:hadoop /opt/hadoop
mkdir -p /opt/data/hadoop/hdfs && chown -R hadoop:hadoop /opt/data/hadoop/hdfs
mkdir -p /opt/data/hadoop/tmp && chown -R hadoop:hadoop /opt/data/hadoop/tmp
4.7. Update the hosts file
echo "192.168.32.11 node1" >>/etc/hosts
echo "192.168.32.12 node2" >>/etc/hosts
echo "192.168.32.13 node3" >>/etc/hosts
echo "192.168.32.14 node4" >>/etc/hosts
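Re-running the `echo >>` lines above appends duplicate entries. A sketch of an idempotent variant (HOSTS_FILE is a stand-in default here; set it to /etc/hosts on the real nodes):

```shell
# Append each cluster entry only if it is not already present, so the
# setup can be re-run safely. HOSTS_FILE defaults to a demo file;
# use HOSTS_FILE=/etc/hosts on the real nodes.
HOSTS_FILE=${HOSTS_FILE:-hosts.demo}
touch "$HOSTS_FILE"
while read -r entry; do
  grep -qxF "$entry" "$HOSTS_FILE" || echo "$entry" >> "$HOSTS_FILE"
done <<'EOF'
192.168.32.11 node1
192.168.32.12 node2
192.168.32.13 node3
192.168.32.14 node4
EOF
```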
4.8. Configure passwordless SSH login (run on every relevant node)
ssh-keygen -t rsa
[root@node1 ~]# ssh-copy-id node1
[root@node1 ~]# ssh-copy-id node2
[root@node1 ~]# ssh-copy-id node3
[root@node1 ~]# ssh-copy-id node4
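After copying the keys, it is worth confirming that each hop really is passwordless; BatchMode makes ssh fail immediately instead of hanging at a password prompt. A small check loop over the hostnames used above:

```shell
# Print the remote hostname for each node; any node still requiring a
# password shows up as "ssh failed" instead of blocking at a prompt.
for h in node1 node2 node3 node4; do
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" hostname 2>/dev/null \
    || echo "$h: ssh failed"
done
```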
5. Hadoop HA installation
5.1. Install ZooKeeper
[root@node1 ~]# cd /soft/
tar -zxvf zookeeper-3.4.14.tar.gz -C /opt/hadoop/
cd /opt/hadoop && mv zookeeper-3.4.14/ zookeeper
Create zoo.cfg and the related directories:
[root@node1 ~]# cd /opt/hadoop/zookeeper/conf
[root@node1 conf]# cp zoo_sample.cfg zoo.cfg
[root@node1 conf]# vim zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/hadoop/zookeeper/data
dataLogDir=/opt/hadoop/zookeeper/dataLog
clientPort=2181
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
maxClientCnxns=60
[root@node1 ~]# mkdir -p /opt/hadoop/zookeeper/data
[root@node1 ~]# mkdir -p /opt/hadoop/zookeeper/dataLog
Set the myid file
[root@node1 conf]# cd /opt/hadoop/zookeeper/data
[root@node1 data]# touch myid && echo 1 > myid
Copy the whole zookeeper directory to the same path on the other ensemble nodes (zoo.cfg only lists node1–node3, so node4 does not need a copy):
[root@node1 hadoop]# scp -r zookeeper/ node2:$PWD
[root@node1 hadoop]# scp -r zookeeper/ node3:$PWD
On the other nodes, set myid so that the number matches that host's server.N entry in zoo.cfg:
[root@node2 ~]# echo 2 > /opt/hadoop/zookeeper/data/myid
[root@node3 ~]# echo 3 > /opt/hadoop/zookeeper/data/myid
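Editing myid by hand on each node is easy to get wrong. A hypothetical helper (not part of the original steps) that derives the id from this host's server.N line in zoo.cfg, so the two can never disagree:

```shell
# Look up this host's server.N entry in zoo.cfg and write N to myid.
# ZOOCFG and DATADIR default to the paths used in this guide.
ZOOCFG=${ZOOCFG:-/opt/hadoop/zookeeper/conf/zoo.cfg}
DATADIR=${DATADIR:-/opt/hadoop/zookeeper/data}
if [ -f "$ZOOCFG" ]; then
  id=$(awk -F= -v h="$(hostname -s)" \
    '/^server\./ {split($1,a,"."); split($2,b,":"); if (b[1]==h) print a[2]}' \
    "$ZOOCFG")
  if [ -n "$id" ]; then
    echo "$id" > "$DATADIR/myid"
  else
    echo "no server.N entry for $(hostname -s) in $ZOOCFG"
  fi
fi
```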
Start ZooKeeper on all three nodes and check the status:
[root@node3 hadoop]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/hadoop/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@node3 hadoop]# jps
1949 QuorumPeerMain
1983 Jps
[root@node3 hadoop]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/hadoop/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[root@node3 hadoop]#
5.2. Install the HDFS components
[root@node1 conf]# cd /soft
[root@node1 soft]# ls
hadoop-2.7.7-centos7.tar.gz jdk-8u151-linux-x64.tar.gz zookeeper-3.4.14.tar.gz
[root@node1 soft]# tar -zxvf hadoop-2.7.7-centos7.tar.gz -C /opt/hadoop/
[root@node1 hadoop]# mv hadoop-2.7.7/ hadoop
Add JAVA_HOME to the Hadoop configuration.
Edit hadoop-env.sh:
export JAVA_HOME=/usr/local/jdk1.8.0_151
Edit hdfs-site.xml:
<configuration>
<property>
<!-- Logical nameservice ID for the NameNode cluster -->
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<!-- NameNodes that belong to this nameservice, one ID per NameNode -->
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<!-- RPC address and port of NameNode nn1; RPC is used to talk to the DataNodes -->
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node1:8020</value>
</property>
<property>
<!-- RPC address and port of NameNode nn2; RPC is used to talk to the DataNodes -->
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node2:8020</value>
</property>
<property>
<!-- HTTP (web UI) address and port of NameNode nn1 -->
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node1:50070</value>
</property>
<property>
<!-- HTTP (web UI) address and port of NameNode nn2 -->
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node2:50070</value>
</property>
<property>
<!-- JournalNode list used by the NameNodes to share edit logs -->
<!-- (the shared storage location for the NameNode edits metadata) -->
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
<!-- Directory where the JournalNodes store edit logs -->
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop/hadoop/tmp/data/dfs/jn</value>
</property>
<property>
<!-- Proxy class clients use to find the active NameNode -->
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<!-- Fencing methods; list multiple methods one per line -->
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- The sshfence method requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- Timeout for starting a JournalNode log segment -->
<property>
<name>dfs.qjournal.start-segment.timeout.ms</name>
<value>60000</value>
</property>
<!-- Replication factor -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- NameNode data path -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/data/hadoop/hdfs/nn</value>
</property>
<!-- DataNode data path -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/data/hadoop/hdfs/dn</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Enable WebHDFS -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<!-- Connect timeout for the sshfence method -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
<value>60000</value>
</property>
</configuration>
Edit core-site.xml:
<configuration>
<property>
<!-- HDFS address; with HA, clients connect to the nameservice -->
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<!-- Base directory for Hadoop temporary files -->
<name>hadoop.tmp.dir</name>
<value>/opt/data/hadoop/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
<!-- Timeout for Hadoop's ZooKeeper connections -->
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>30000</value>
<description>ms</description>
</property>
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
</configuration>
Edit yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>
<property>
<!-- Enable ResourceManager HA -->
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<!-- Cluster ID for the ResourceManager HA pair -->
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<property>
<!-- IDs of the ResourceManager HA nodes -->
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<!-- Host running the first ResourceManager -->
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node2</value>
</property>
<property>
<!-- Host running the second ResourceManager -->
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node3</value>
</property>
<property>
<!-- ZooKeeper nodes used for ResourceManager HA -->
<name>yarn.resourcemanager.zk-address</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
<property>
<!-- Enable ResourceManager state recovery -->
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Store the ResourceManager state in the ZooKeeper cluster -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://node2:19888/jobhistory/logs/</value>
</property>
</configuration>
Edit mapred-site.xml:
<configuration>
<!-- Run MapReduce on the YARN framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory server address and port -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
</configuration>
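The guide configures Hadoop on node1 only, so the configured directory still has to reach the other nodes before anything is started. A dry-run sketch (COPY defaults to echo so the loop only prints what it would do; set COPY="scp -r" on the real cluster):

```shell
# Push the configured hadoop directory to the other nodes.
# COPY=echo makes this a dry run; use COPY="scp -r" to actually copy.
COPY=${COPY:-echo}
for h in node2 node3 node4; do
  $COPY /opt/hadoop/hadoop "$h":/opt/hadoop/
done
```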
6. Starting Hadoop
6.1. Start the JournalNodes
[root@node1 hadoop]# hadoop-daemon.sh start journalnode
[root@node2 hadoop]# hadoop-daemon.sh start journalnode
[root@node3 hadoop]# hadoop-daemon.sh start journalnode
6.2. Format and start the first NameNode
[root@node1 hadoop]# hdfs namenode -format
[root@node1 hadoop]# hadoop-daemon.sh start namenode
6.3. Bootstrap and start the second NameNode
[root@node2 hadoop]# hdfs namenode -bootstrapStandby
[root@node2 hadoop]# hadoop-daemon.sh start namenode
6.4. Transition nn1 (node1) to the active state
[root@node1 hadoop]# hdfs haadmin -transitionToActive --forcemanual nn1
6.5. Check the NameNode states
[root@node1 hadoop]# hdfs haadmin -getServiceState nn1
active
[root@node1 hadoop]# hdfs haadmin -getServiceState nn2
standby
6.6. Initialize the HA state in ZooKeeper
[root@node1 hadoop]# hdfs zkfc -formatZK
6.7. Start the remaining HDFS daemons (DataNodes and ZKFCs)
[root@node1 hadoop]# start-dfs.sh
6.8. Start YARN
[root@node3 hadoop]# start-yarn.sh
[root@node2 hadoop]# start-yarn.sh
[root@node1 hadoop]# start-yarn.sh
6.9. Check the YARN ResourceManager states
[root@node1 hadoop]# yarn rmadmin -getServiceState rm1
active
[root@node1 hadoop]# yarn rmadmin -getServiceState rm2
standby
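With both NameNodes and ResourceManagers up, a quick end-to-end check is to write a file through the hdfs://mycluster nameservice and read it back. A dry-run sketch (RUN defaults to echo; set RUN= on a real cluster node to execute the commands):

```shell
# Smoke-test HDFS through the nameservice configured in core-site.xml.
# RUN=echo makes this a dry run; set RUN= on the cluster to execute.
RUN=${RUN-echo}
$RUN hdfs dfs -mkdir -p /tmp/smoke
$RUN hdfs dfs -put -f /etc/hosts /tmp/smoke/
$RUN hdfs dfs -cat /tmp/smoke/hosts
```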