Hadoop 2.6.0 Cluster Environment Setup

Posted by LuckyJiang-2019 on 2016-12-25
Guiding questions
1. What preparation is needed to install Hadoop?
2. How do you verify that Hadoop is working?
3. How do you run wordcount?

I. Environment
1. Machines: one physical machine and one virtual machine
2. Linux version: [spark@S1PA11 ~]$ cat /etc/issue
Red Hat Enterprise Linux Server release 5.4 (Tikanga)

3. JDK: [spark@S1PA11 ~]$ java -version
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode)
4. Cluster nodes: two nodes, S1PA11 (Master) and S1PA222 (Slave)
II. Preparation
1. Install the Java JDK
2. Set up passwordless SSH between the nodes (a minimal sketch follows this list)
3. Download the Hadoop release
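The SSH setup itself is not shown in this article; a minimal sketch of one common way to do it, assuming the spark user exists on both hosts and ssh-copy-id is available:

# On the master (S1PA11), generate a key pair (empty passphrase):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# Push the public key to the slave, and to the master itself,
# since start-dfs.sh also logs in to the local node over SSH:
ssh-copy-id spark@S1PA222
ssh-copy-id spark@S1PA11
# Verify: this should print the hostname without asking for a password.
ssh spark@S1PA222 hostname
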
III. Installing Hadoop
This is the downloaded hadoop-2.6.0.tar.gz archive.

1. Extract it: tar -xzvf hadoop-2.6.0.tar.gz
2. Move it to the target directory: [spark@S1PA11 software]$ mv hadoop-2.6.0 ~/opt/
3. Enter the hadoop directory: [spark@S1PA11 opt]$ cd hadoop-2.6.0/
[spark@S1PA11 hadoop-2.6.0]$ ls
bin  dfs  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  tmp
Before configuring, create the tmp, dfs/name, and dfs/data directories on the local filesystem; they must match the paths used in the XML configs below, which place them under /home/spark/opt/hadoop-2.6.0 (a sketch follows the list of files). Seven configuration files are involved, all under etc/hadoop in the Hadoop install directory; they can be edited with the gedit command.


  1. ~/hadoop/etc/hadoop/hadoop-env.sh
  2. ~/hadoop/etc/hadoop/yarn-env.sh
  3. ~/hadoop/etc/hadoop/slaves
  4. ~/hadoop/etc/hadoop/core-site.xml
  5. ~/hadoop/etc/hadoop/hdfs-site.xml
  6. ~/hadoop/etc/hadoop/mapred-site.xml
  7. ~/hadoop/etc/hadoop/yarn-site.xml
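A minimal sketch of creating those directories, assuming the install path used throughout this article:

mkdir -p ~/opt/hadoop-2.6.0/tmp
mkdir -p ~/opt/hadoop-2.6.0/dfs/name
mkdir -p ~/opt/hadoop-2.6.0/dfs/data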


4. Enter the hadoop configuration directory:
[spark@S1PA11 hadoop-2.6.0]$ cd etc/hadoop/
[spark@S1PA11 hadoop]$ ls
capacity-scheduler.xml  hadoop-env.sh               httpfs-env.sh            kms-env.sh            mapred-env.sh               ssl-client.xml.example
configuration.xsl       hadoop-metrics2.properties  httpfs-log4j.properties  kms-log4j.properties  mapred-queues.xml.template  ssl-server.xml.example
container-executor.cfg  hadoop-metrics.properties   httpfs-signature.secret  kms-site.xml          mapred-site.xml             yarn-env.cmd
core-site.xml           hadoop-policy.xml           httpfs-site.xml          log4j.properties      mapred-site.xml.template    yarn-env.sh
hadoop-env.cmd          hdfs-site.xml               kms-acls.xml             mapred-env.cmd        slaves                      yarn-site.xml

4.1 Configure hadoop-env.sh: set JAVA_HOME
# The java implementation to use.
export JAVA_HOME=/home/spark/opt/java/jdk1.6.0_37

4.2 Configure yarn-env.sh: set JAVA_HOME
# some Java parameters
export JAVA_HOME=/home/spark/opt/java/jdk1.6.0_37

4.3 Configure the slaves file: add the slave node
S1PA222

4.4 Configure core-site.xml: add the Hadoop core settings (HDFS on port 9000; temp directory file:/home/spark/opt/hadoop-2.6.0/tmp)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://S1PA11:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/spark/opt/hadoop-2.6.0/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value>
  </property>
</configuration>

4.5 Configure hdfs-site.xml: add the HDFS settings (namenode and datanode addresses and directory locations). Note that dfs.replication is set to 3 even though this cluster has only one datanode, which is why the dfsadmin report later shows under-replicated blocks; with a single slave, a value of 1 avoids that.

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>S1PA11:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/spark/opt/hadoop-2.6.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/spark/opt/hadoop-2.6.0/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

4.6 Configure mapred-site.xml: add the MapReduce settings (use the YARN framework; JobHistory server address and web address)

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>S1PA11:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>S1PA11:19888</value>
  </property>
</configuration>

4.7 Configure yarn-site.xml: enable YARN

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>S1PA11:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>S1PA11:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>S1PA11:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>S1PA11:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>S1PA11:8088</value>
  </property>
</configuration>

5. Copy the configured hadoop directory to the slave machine:
[spark@S1PA11 opt]$ scp -r hadoop-2.6.0/ spark@10.126.34.43:~/opt/

IV. Verification

1. Format the namenode:
[spark@S1PA11 opt]$ cd hadoop-2.6.0/
[spark@S1PA11 hadoop-2.6.0]$ ls
bin  dfs  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  tmp
[spark@S1PA11 hadoop-2.6.0]$ ./bin/hdfs namenode -format
[spark@S1PA222 .ssh]$ cd ~/opt/hadoop-2.6.0
[spark@S1PA222 hadoop-2.6.0]$ ./bin/hdfs namenode -format

2. Start HDFS:
[spark@S1PA11 hadoop-2.6.0]$ ./sbin/start-dfs.sh
15/01/05 16:41:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [S1PA11]
S1PA11: starting namenode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-namenode-S1PA11.out
S1PA222: starting datanode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-datanode-S1PA222.out
Starting secondary namenodes [S1PA11]
S1PA11: starting secondarynamenode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-secondarynamenode-S1PA11.out
15/01/05 16:41:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[spark@S1PA11 hadoop-2.6.0]$ jps
22230 Master
30889 Jps
22478 Worker
30498 NameNode
30733 SecondaryNameNode
19781 ResourceManager

3. Stop HDFS:
[spark@S1PA11 hadoop-2.6.0]$ ./sbin/stop-dfs.sh
15/01/05 16:40:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [S1PA11]
S1PA11: stopping namenode
S1PA222: stopping datanode
Stopping secondary namenodes [S1PA11]
S1PA11: stopping secondarynamenode
15/01/05 16:40:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[spark@S1PA11 hadoop-2.6.0]$ jps
30336 Jps
22230 Master
22478 Worker
19781 ResourceManager

4. Start YARN:
[spark@S1PA11 hadoop-2.6.0]$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/spark/opt/hadoop-2.6.0/logs/yarn-spark-resourcemanager-S1PA11.out
S1PA222: starting nodemanager, logging to /home/spark/opt/hadoop-2.6.0/logs/yarn-spark-nodemanager-S1PA222.out
[spark@S1PA11 hadoop-2.6.0]$ jps
31233 ResourceManager
22230 Master
22478 Worker
30498 NameNode
30733 SecondaryNameNode
31503 Jps

5. Stop YARN:
[spark@S1PA11 hadoop-2.6.0]$ ./sbin/stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
S1PA222: stopping nodemanager
no proxyserver to stop
[spark@S1PA11 hadoop-2.6.0]$ jps
31167 Jps
22230 Master
22478 Worker
30498 NameNode
30733 SecondaryNameNode

6. Check the cluster status:
[spark@S1PA11 hadoop-2.6.0]$ ./bin/hdfs dfsadmin -report
15/01/05 16:44:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 52101857280 (48.52 GB)
Present Capacity: 45749510144 (42.61 GB)
DFS Remaining: 45748686848 (42.61 GB)
DFS Used: 823296 (804 KB)
DFS Used%: 0.00%
Under replicated blocks: 10
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (1):


Name: 10.126.45.56:50010 (S1PA222)
Hostname: S1PA209
Decommission Status : Normal
Configured Capacity: 52101857280 (48.52 GB)
DFS Used: 823296 (804 KB)
Non DFS Used: 6352347136 (5.92 GB)
DFS Remaining: 45748686848 (42.61 GB)
DFS Used%: 0.00%
DFS Remaining%: 87.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Jan 05 16:44:50 CST 2015


7. View the HDFS web UI: http://10.58.44.47:50070/


8. View the ResourceManager web UI: http://10.58.44.47:8088/

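The original screenshots of both web UIs are not reproduced here. If no browser is at hand, a quick sanity check from the shell (assuming the same host and the default ports shown above) might look like:

# NameNode web UI (port 50070 in Hadoop 2.x):
curl -s http://10.58.44.47:50070/ | head
# ResourceManager web UI:
curl -s http://10.58.44.47:8088/cluster | head
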

9. Run the wordcount program

9.1 Create the input directory: [spark@S1PA11 hadoop-2.6.0]$ mkdir input

9.2 Create f1 and f2 under input and write some content into them (one way to create them is sketched after the listing):
[spark@S1PA11 hadoop-2.6.0]$ cat input/f1 
Hello world  bye jj
[spark@S1PA11 hadoop-2.6.0]$ cat input/f2
Hello Hadoop  bye Hadoop
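A minimal sketch of creating those two files, matching the contents shown above:

echo "Hello world  bye jj" > input/f1
echo "Hello Hadoop  bye Hadoop" > input/f2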

9.3 Create the /tmp/input directory in HDFS:
[spark@S1PA11 hadoop-2.6.0]$ ./bin/hadoop fs  -mkdir /tmp
15/01/05 16:53:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

[spark@S1PA11 hadoop-2.6.0]$ ./bin/hadoop fs  -mkdir /tmp/input
15/01/05 16:54:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


9.4 Copy the f1 and f2 files into the HDFS /tmp/input directory:
[spark@S1PA11 hadoop-2.6.0]$ ./bin/hadoop fs  -put input/ /tmp
15/01/05 16:56:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


9.5 Check that f1 and f2 are present in HDFS:
[spark@S1PA11 hadoop-2.6.0]$ ./bin/hadoop fs -ls /tmp/input/
15/01/05 16:57:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 spark supergroup         20 2015-01-04 19:09 /tmp/input/f1
-rw-r--r--   3 spark supergroup         25 2015-01-04 19:09 /tmp/input/f2


9.6 Run the wordcount program:
[spark@S1PA11 hadoop-2.6.0]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/input /output
15/01/05 17:00:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/05 17:00:09 INFO client.RMProxy: Connecting to ResourceManager at S1PA11/10.58.44.47:8032
15/01/05 17:00:11 INFO input.FileInputFormat: Total input paths to process : 2
15/01/05 17:00:11 INFO mapreduce.JobSubmitter: number of splits:2
15/01/05 17:00:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1420447392452_0001
15/01/05 17:00:12 INFO impl.YarnClientImpl: Submitted application application_1420447392452_0001
15/01/05 17:00:12 INFO mapreduce.Job: The url to track the job: http://S1PA11:8088/proxy/application_1420447392452_0001/
15/01/05 17:00:12 INFO mapreduce.Job: Running job: job_1420447392452_0001


9.7 View the results:
[spark@S1PA11 hadoop-2.6.0]$ ./bin/hadoop fs -cat /output/part-r-00000
15/01/05 17:06:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
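The result lines themselves were not captured in the original post; given the contents of f1 and f2 above, the wordcount output should be:

Hadoop	2
Hello	2
bye	2
jj	1
world	1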
