Hadoop Cluster Multi-Node Installation in Detail

Posted by flzhang on 2015-09-01

We usually rely on tools to build large cluster environments automatically, and even a small environment can have a dozen or so machines. Automated deployment is convenient, but it hides the work that actually happens on each machine. Building a small cluster by hand is still very helpful for understanding the setup details and the underlying principles. So today, to review how the various Hadoop daemons coordinate with one another, I set up a 3-node cluster and recorded the process below.
1. Setting Up the Cluster
Basic environment configuration

IP               Host       Daemons deployed
192.168.0.110    elephant   namenode, datanode, nodemanager
192.168.0.111    tiger      datanode, nodemanager
192.168.0.112    horse      resourcemanager, datanode, nodemanager, jobhistoryserver

(Example addresses; adjust the IPs and hostnames to your own environment.)

1.1 Install the CDH5 yum repository
Download the CDH5 repo file:
wget http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
sudo mv cloudera-cdh5.repo /etc/yum.repos.d/
1.2 Install the Components on Each Node
1. Install the namenode and datanodes
On elephant, install the namenode:
sudo yum install --assumeyes hadoop-hdfs-namenode
On elephant, tiger, and horse, install a datanode:
sudo yum install --assumeyes hadoop-hdfs-datanode
2. Install the resourcemanager and nodemanagers
On horse, install the resourcemanager:
sudo yum install --assumeyes hadoop-yarn-resourcemanager

On elephant, tiger, and horse, install a nodemanager:
sudo yum install --assumeyes hadoop-yarn-nodemanager
3. Install the MapReduce framework
On elephant, tiger, and horse, install mapreduce:
sudo yum install --assumeyes hadoop-mapreduce
4. Install the jobhistoryserver
On horse, install the jobhistoryserver:
sudo yum install --assumeyes hadoop-mapreduce-historyserver

1.3 Edit the Configuration Files
Make these edits on elephant.
1. Copy the template files into /etc/hadoop/conf:
sudo cp core-site.xml /etc/hadoop/conf/
sudo cp hdfs-site.xml /etc/hadoop/conf/
sudo cp yarn-site.xml /etc/hadoop/conf/
sudo cp mapred-site.xml /etc/hadoop/conf/
2. sudo vi /etc/hadoop/conf/core-site.xml
name          value
fs.defaultFS  hdfs://elephant:8020
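For reference, this corresponds to the following entry inside the <configuration> element of core-site.xml (a minimal sketch using the value above):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://elephant:8020</value>
</property>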

3. sudo vi /etc/hadoop/conf/hdfs-site.xml
dfs.namenode.name.dir   file:///disk1/dfs/nn,file:///disk2/dfs/nn
dfs.datanode.data.dir   file:///disk1/dfs/dn,file:///disk2/dfs/dn
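Sketched the same way, the matching hdfs-site.xml entries:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///disk1/dfs/nn,file:///disk2/dfs/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///disk1/dfs/dn,file:///disk2/dfs/dn</value>
</property>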


4. sudo vi /etc/hadoop/conf/yarn-site.xml
yarn.resourcemanager.hostname        horse
yarn.application.classpath           (keep the template's default value)
yarn.nodemanager.aux-services        mapreduce_shuffle
-- tells YARN to provide the shuffle service for the MapReduce framework
yarn.nodemanager.local-dirs          file:///disk1/nodemgr/local,file:///disk2/nodemgr/local
yarn.nodemanager.log-dirs            /var/log/hadoop-yarn/containers
yarn.nodemanager.remote-app-log-dir  /var/log/hadoop-yarn/apps
yarn.log-aggregation-enable          true
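As a sketch, the hostname and aux-services settings map to yarn-site.xml entries like these; the remaining properties follow the same <property>/<name>/<value> pattern:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>horse</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>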

5. sudo vi /etc/hadoop/conf/mapred-site.xml
mapreduce.framework.name             yarn
mapreduce.jobhistory.address         horse:10020
mapreduce.jobhistory.webapp.address  horse:19888
yarn.app.mapreduce.am.staging-dir    /user
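And the first two mapred-site.xml entries, sketched the same way:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>horse:10020</value>
</property>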

6. Reduce the JVM heap sizes. These exports typically go in /etc/hadoop/conf/hadoop-env.sh; on some layouts the YARN and historyserver variables live in yarn-env.sh and mapred-env.sh instead:
export HADOOP_NAMENODE_OPTS="-Xmx64m"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx64m"
export HADOOP_DATANODE_OPTS="-Xmx64m"
export YARN_RESOURCEMANAGER_OPTS="-Xmx64m"
export YARN_NODEMANAGER_OPTS="-Xmx64m"
export HADOOP_JOB_HISTORYSERVER_OPTS="-Xmx64m"
7. Copy all the configuration files to the tiger and horse hosts, for example with the loop sketched below.
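A minimal sketch of that copy, assuming passwordless SSH between the hosts and sudo rights on each one (files are staged through /tmp because scp cannot write to /etc/hadoop/conf directly):

for host in tiger horse; do
  scp /etc/hadoop/conf/*-site.xml /etc/hadoop/conf/hadoop-env.sh $host:/tmp/
  ssh -t $host 'sudo mv /tmp/*-site.xml /tmp/hadoop-env.sh /etc/hadoop/conf/'
done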

1.4 Create the Local Directories
1. On elephant, create the directories for the namenode, datanode, and nodemanager. Tiger and horse also run a datanode and a nodemanager, so create the dn and nodemgr directories on those hosts as well:
$ sudo mkdir -p /disk1/dfs/nn
$ sudo mkdir -p /disk2/dfs/nn
$ sudo mkdir -p /disk1/dfs/dn
$ sudo mkdir -p /disk2/dfs/dn
$ sudo mkdir -p /disk1/nodemgr/local
$ sudo mkdir -p /disk2/nodemgr/local
2. Set the directory ownership:
$ sudo chown -R hdfs:hadoop /disk1/dfs/nn
$ sudo chown -R hdfs:hadoop /disk2/dfs/nn
$ sudo chown -R hdfs:hadoop /disk1/dfs/dn
$ sudo chown -R hdfs:hadoop /disk2/dfs/dn
$ sudo chown -R yarn:yarn /disk1/nodemgr/local
$ sudo chown -R yarn:yarn /disk2/nodemgr/local
3. Verify the directories and their ownership:
$ ls -lR /disk1
$ ls -lR /disk2

1.5 Format HDFS and Start the HDFS Daemons
1. Start the namenode and check for errors
1) On elephant:
sudo -u hdfs hdfs namenode -format
If prompted to confirm a re-format, answer Y.
Start the namenode:
sudo service hadoop-hdfs-namenode start
2) Check the namenode log
By hand:
The startup message prints the path of a .out file; read the .log file alongside it:
less /var/log/hadoop-hdfs/hadoop-hdfs-namenode-elephant.log
Via the web UI:
Open the namenode web UI at http://elephant:50070
and choose Utilities -> Logs.
2. Start the datanodes and check for errors
1) On elephant, tiger, and horse:
sudo service hadoop-hdfs-datanode start
2) Check the datanode log
By hand:
less /var/log/hadoop-hdfs/hadoop-hdfs-datanode-tiger.log
Via the web UI:
Open the datanode web UI at http://tiger:50075 and select the datanode log.
The same methods work for checking the logs on horse and the other nodes.
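Beyond the logs, a quick way to confirm that all three datanodes registered with the namenode is the standard dfsadmin report:

sudo -u hdfs hdfs dfsadmin -report

The output should show three live datanodes.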


1.6 Create HDFS Directories for YARN and MapReduce
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
$ sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
$ sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
$ sudo -u hdfs hadoop fs -mkdir /user
$ sudo -u hdfs hadoop fs -mkdir /user/training
$ sudo -u hdfs hadoop fs -chown training /user/training
$ sudo -u hdfs hadoop fs -mkdir /user/history
$ sudo -u hdfs hadoop fs -chmod 1777 /user/history
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history
1.7 Start the YARN and MapReduce Daemons
1. On horse, start the resourcemanager:
sudo service hadoop-yarn-resourcemanager start
2. On all nodes, start the nodemanager:
sudo service hadoop-yarn-nodemanager start
3. On horse, start the historyserver:
sudo service hadoop-mapreduce-historyserver start
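Before testing, it is worth confirming that every daemon actually came up on each host. A quick check, assuming the JDK's jps tool is on root's PATH (the init-script status commands are an equivalent alternative):

sudo jps
sudo service hadoop-hdfs-datanode status
sudo service hadoop-yarn-nodemanager status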

1.8 Test the Cluster
1. Upload a test file to HDFS:
$ hadoop fs -mkdir -p elephant/shakespeare
$ hadoop fs -put shakespeare.txt elephant/shakespeare
2. Check the upload through the namenode web UI:
Choose Utilities -> "Browse the file system" and navigate to the directory.
3. Test MapReduce
On elephant:
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount elephant/shakespeare elephant/output
Use the resourcemanager web UI to see which hosts ran the ApplicationMaster, mapper, and reducer tasks.
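Once the job finishes, the word counts can be read back out of HDFS; a quick look, assuming the example's default single reducer (which writes part-r-00000):

$ hadoop fs -ls elephant/output
$ hadoop fs -cat elephant/output/part-r-00000 | head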

From the ITPUB blog; link: http://blog.itpub.net/750077/viewspace-1788595/. Please credit the source when republishing; otherwise legal liability may be pursued.
