Study Note 3: Installing the Hadoop-Based Cloudera CDH3 Platform

Published by yezhibin on 2012-04-12
      Cloudera is a vendor of complete solutions for open-source Hadoop, offering software and services built on Apache Hadoop. Last November it raised USD 40 million in venture funding, and Oracle, Dell, and others have announced partnerships with Cloudera. IBM, Amazon, Microsoft, and others have also joined the Hadoop club and released their own Hadoop-as-a-Service offerings. This ecosystem is expected to become a mainstream cloud-computing platform. The following are the steps for installing Hadoop on RedHat 5 using CDH3 (CDH3: Cloudera's Distribution including Apache Hadoop, Version 3).

1. The three hosts and their IP addresses are as follows:
#vi /etc/hosts
172.16.130.136  masternode
172.16.130.137 slavenode1
172.16.130.138 slavenode2

2. Configure SSH for the root user
masternode:
#ssh-keygen -t rsa
#cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
slavenode:
#ssh-keygen -t rsa
Copy authorized_keys from masternode to each slavenode (see the sketch below).
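One possible way to copy the key file (a sketch only; it assumes root password login is still enabled on the slavenodes and that overwriting their existing authorized_keys is acceptable):
#scp /root/.ssh/authorized_keys root@slavenode1:/root/.ssh/authorized_keys
#scp /root/.ssh/authorized_keys root@slavenode2:/root/.ssh/authorized_keys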

Test that passwordless login works, from masternode:
#ssh slavenode1
#ssh slavenode2

3. Download the Cloudera repository package from:
http://archive.cloudera.com/redhat/cdh/cdh3-repository-1.0-1.noarch.rpm
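For example, the package can be fetched on each node with wget (assuming the nodes have direct Internet access):
#wget http://archive.cloudera.com/redhat/cdh/cdh3-repository-1.0-1.noarch.rpm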

4. On each node, run:
#sudo yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm

5. Install the Hadoop core package on each node
# yum search hadoop
# sudo yum install hadoop-0.20

6. On masternode, install the namenode and jobtracker packages
#sudo yum install hadoop-0.20-namenode
#sudo yum install hadoop-0.20-jobtracker

7. On the slavenode nodes, install the datanode and tasktracker packages
#sudo yum install hadoop-0.20-datanode
#sudo yum install hadoop-0.20-tasktracker

8. Configure the cluster (on masternode)
#sudo cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.my_cluster
Edit the files under conf.my_cluster with your own settings, then register the directory with alternatives:
#sudo alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster 50
Set the defined configuration as the active one:
#sudo alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster
Display the current configuration:
#sudo alternatives --display hadoop-0.20-conf
To remove the configuration (if needed):
#sudo alternatives --remove hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster

9. Configure /etc/hadoop-0.20/conf/core-site.xml (default port 8020)

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://masternode/</value>
  </property>
</configuration>



10. Configure /etc/hadoop-0.20/conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>


11. Configure /etc/hadoop-0.20/conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>masternode:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map and
    reduce task.</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
  </property>
</configuration>


12. Configure /etc/hadoop-0.20/conf/masters and slaves
Add masternode to the masters file.
Add slavenode1 and slavenode2 to the slaves file, as shown below.
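With the configuration directory from step 8, the two files would then contain one hostname per line:
#cat /etc/hadoop-0.20/conf.my_cluster/masters
masternode
#cat /etc/hadoop-0.20/conf.my_cluster/slaves
slavenode1
slavenode2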

13. Create the local directories referenced in the configuration
masternode:
#sudo mkdir -p /data/1/dfs/nn /data/2/dfs/nn
#sudo chown -R hdfs:hadoop /data/1/dfs/nn /data/2/dfs/nn
#sudo chmod 700 /data/1/dfs/nn /data/2/dfs/nn
slavenode:
#sudo mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn
#sudo mkdir -p /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local
#sudo chown -R hdfs:hadoop /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn
#sudo chown -R mapred:hadoop /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local

14. Package the conf.my_cluster configuration directory and distribute it to each slavenode node (one possible way is sketched below).
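A sketch of one way to do this; the archive name and the use of root over SSH are assumptions, not part of the original:
masternode:
#cd /etc/hadoop-0.20
#tar czf /tmp/conf.my_cluster.tar.gz conf.my_cluster
#scp /tmp/conf.my_cluster.tar.gz root@slavenode1:/etc/hadoop-0.20/
#scp /tmp/conf.my_cluster.tar.gz root@slavenode2:/etc/hadoop-0.20/
each slavenode:
#cd /etc/hadoop-0.20 && tar xzf conf.my_cluster.tar.gz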

15. Following step 8, activate the configuration on each slavenode.
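That is, on each slavenode run the same alternatives commands as in step 8:
#sudo alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster 50
#sudo alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster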

16. Format the namenode, on masternode
#sudo -u hdfs hadoop namenode -format

17. Start the HDFS daemons
masternode:
#sudo service hadoop-0.20-namenode start
slavenode:
#sudo service hadoop-0.20-datanode start

18. Create the HDFS directories
#sudo -u hdfs hadoop fs -mkdir /tmp
#sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
#sudo -u hdfs hadoop fs -mkdir /mapred/system
#sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system

19. Start the MapReduce daemons
masternode:
#sudo service hadoop-0.20-jobtracker start
slavenode:
#sudo service hadoop-0.20-tasktracker start

20. Configure the daemons to start automatically at boot
masternode:
#sudo chkconfig hadoop-0.20-namenode on
#sudo chkconfig hadoop-0.20-jobtracker on
slavenode:
#sudo chkconfig hadoop-0.20-datanode on
#sudo chkconfig hadoop-0.20-tasktracker on
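
Not part of the original write-up, but a quick sanity check once everything is running is to request an HDFS cluster report and list the filesystem root (on masternode):
#sudo -u hdfs hadoop dfsadmin -report
#sudo -u hdfs hadoop fs -ls /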

From the ITPUB blog, link: http://blog.itpub.net/354732/viewspace-720985/. Please credit the source when reproducing.
