CDH5 package download: http://archive.cloudera.com/cdh5/
Host plan:
| IP | Host | Deployed modules | Processes |
| --- | --- | --- | --- |
| 192.168.107.82 | Hadoop-NN-01 | NameNode, ResourceManager | NameNode, DFSZKFailoverController, ResourceManager |
| 192.168.107.83 | Hadoop-DN-01, Zookeeper-01 | DataNode, NodeManager, Zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
| 192.168.107.84 | Hadoop-DN-02, Zookeeper-02 | DataNode, NodeManager, Zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
Explanation of each process:
- NameNode
- ResourceManager
- DFSZKFC: DFS Zookeeper Failover Controller; activates the Standby NameNode
- DataNode
- NodeManager
- JournalNode: the node service that hosts the NameNode's shared edit log (if NFS is used for sharing instead, this process and all of its startup-related configuration can be omitted)
- QuorumPeerMain: the main Zookeeper process
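Once everything is running (passwordless SSH is set up in step V below), the plan above can be sanity-checked in one pass with jps. A minimal sketch, assuming the hostnames from the table:

for node in Hadoop-NN-01 Hadoop-DN-01 Hadoop-DN-02; do
  echo "== $node =="
  ssh $node jps   #jps must be on the remote PATH; add -p 6000 if you move sshd to a non-default port as in step V
done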
Directory plan:

| Name | Path |
| --- | --- |
| $HADOOP_HOME | /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 |
| Data | $HADOOP_HOME/data |
| Log | $HADOOP_HOME/logs |
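The daemons will usually create these two directories on demand, but creating them up front (as hadoopuser) avoids permission surprises. A minimal sketch:

export HADOOP_HOME=/home/hadoopuser/hadoop-2.6.0-cdh5.6.0
mkdir -p "$HADOOP_HOME/data" "$HADOOP_HOME/logs"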
Configuration:
I. Disable the firewall (it can be configured properly later)
II. Install the JDK (omitted)
III. Change the hostname and configure /etc/hosts (all 3 machines)
[root@Linux01 ~]# vim /etc/sysconfig/network
[root@Linux01 ~]# vim /etc/hosts
192.168.107.82 Hadoop-NN-01
192.168.107.83 Hadoop-DN-01 Zookeeper-01
192.168.107.84 Hadoop-DN-02 Zookeeper-02
IV. For security, create a dedicated login user for Hadoop (all 5 machines)
[root@Linux01 ~]# useradd hadoopuser
[root@Linux01 ~]# passwd hadoopuser
[root@Linux01 ~]# su - hadoopuser #switch to the new user
V. Configure passwordless SSH login (the 2 NameNodes)
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-keygen #generate the public/private key pair
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoopuser@Hadoop-NN-01
-i specifies the identity file whose public key is installed
~/.ssh/id_rsa.pub is that public key file
Or, in shorter form:
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id Hadoop-NN-01 #push the public key to the remote server (an IP such as 10.10.51.231 also works)
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id -p 6000 Hadoop-NN-01 #use this form if sshd listens on a non-default port
Note: if you use a non-default SSH port, also update the Hadoop configuration file hadoop-env.sh:
export HADOOP_SSH_OPTS="-p 6000"
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh Hadoop-NN-01 #verify (close the connection with: exit or logout)
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh Hadoop-NN-01 -p 6000 #use this form with a non-default port
VI. Configure environment variables: vi ~/.bashrc, then source ~/.bashrc (all 5 machines)
[hadoopuser@Linux01 ~]$ vi ~/.bashrc
# hadoop cdh5
export HADOOP_HOME=/home/hadoopuser/hadoop-2.6.0-cdh5.6.0
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
[hadoopuser@Linux01 ~]$ source ~/.bashrc #apply the changes
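To confirm the variables took effect, one quick check (assuming the tarball is already unpacked at that path):

[hadoopuser@Linux01 ~]$ which hadoop    #should print /home/hadoopuser/hadoop-2.6.0-cdh5.6.0/bin/hadoop
[hadoopuser@Linux01 ~]$ hadoop version  #should report 2.6.0-cdh5.6.0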
VII. Install Zookeeper (the 2 DataNodes)
1. Unpack the tarball
2. Configure environment variables: vi ~/.bashrc
[hadoopuser@Linux01 ~]$ vi ~/.bashrc
# zookeeper cdh5
export ZOOKEEPER_HOME=/home/hadoopuser/zookeeper-3.4.5-cdh5.6.0
export PATH=$PATH:$ZOOKEEPER_HOME/bin
[hadoopuser@Linux01 ~]$ source ~/.bashrc #apply the changes
3. Change the log output directory
[hadoopuser@Linux01 ~]$ vi $ZOOKEEPER_HOME/libexec/zkEnv.sh
Around line 56, change the statement to: ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"
4. Edit the configuration file
[hadoopuser@Linux01 ~]$ vi $ZOOKEEPER_HOME/conf/zoo.cfg
# zookeeper
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoopuser/zookeeper-3.4.5-cdh5.6.0/data
clientPort=2181
# cluster
server.1=Zookeeper-01:2888:3888
server.2=Zookeeper-02:2888:3888
5. Set the myid (its value must match the N of the corresponding server.N entry in zoo.cfg)
(1) Hadoop-DN-01:
mkdir $ZOOKEEPER_HOME/data
echo 1 > $ZOOKEEPER_HOME/data/myid
(2) Hadoop-DN-02:
mkdir $ZOOKEEPER_HOME/data
echo 2 > $ZOOKEEPER_HOME/data/myid
6. Start Zookeeper on each node:
[hadoopuser@Linux01 ~]$ zkServer.sh start
7. Verify
[hadoopuser@Linux01 ~]$ jps
3051 Jps
2829 QuorumPeerMain
8. Check the status
[hadoopuser@Linux01 ~]$ zkServer.sh status
JMX enabled by default
Using config: /home/zero/zookeeper/zookeeper-3.4.5-cdh5.0.1/bin/../conf/zoo.cfg
Mode: follower
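Beyond zkServer.sh status, you can confirm the ensemble actually serves requests by running a command through the bundled CLI. A minimal check (the listing should include at least the /zookeeper znode):

[hadoopuser@Linux01 ~]$ zkCli.sh -server Zookeeper-01:2181 ls /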
9. Appendix: zoo.cfg configuration options
| Property | Meaning |
| --- | --- |
| tickTime | The basic time unit (in milliseconds), used for heartbeats; the minimum session timeout is twice the tickTime. |
| dataDir | Where the data is kept: in-memory snapshots and the transaction update log. |
| clientPort | The port clients connect to. |
| initLimit | How many tickTime intervals Zookeeper tolerates while a client makes its initial connection. The "client" here is not a user client connecting to a Zookeeper server, but a Follower server in the ensemble connecting to the Leader. If the Leader has still heard nothing back after that many heartbeats (tickTime), the connection is considered failed; with the values above that is 10 × 2000 ms = 20 seconds in total. |
| syncLimit | The maximum number of tickTime intervals a message exchange (request and reply) between the Leader and a Follower may take; with the values above that is 5 × 2000 ms = 10 seconds in total. |
| server.A=B:C:D (server.id=host:port:port) | The list of ensemble nodes. A is a number saying which server this is; B is the server's IP address; C is the port this server uses to exchange information with the ensemble's Leader; D is the port used when the Leader goes down and a new Leader must be elected: the servers talk to each other on this port during the election. In a pseudo-cluster configuration B is the same for every instance, so the Zookeeper instances cannot share communication ports and must each be assigned different ones. |
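To illustrate the rule in the last row, a hypothetical zoo.cfg fragment for a pseudo-cluster (three instances on one host) has to vary the C and D ports per instance, because B is identical:

# hypothetical single-host pseudo-cluster; each instance also needs its own dataDir and clientPort
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890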
VIII. Install and configure Hadoop (install on 1 machine only; once configured, distribute it to the other nodes)
1. Unpack the tarball
2. Edit the configuration files
(1) Edit $HADOOP_HOME/etc/hadoop/masters
Hadoop-NN-01
(2) Edit $HADOOP_HOME/etc/hadoop/slaves
Hadoop-DN-01
Hadoop-DN-02
(3) Edit $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Hadoop-NN-01:9000</value>
    <description>URI and port of the Hadoop master</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>Read/write buffer size used when processing sequence files</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/tmp</value>
    <description>Directory for temporary data</description>
  </property>
</configuration>
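After saving, a quick way to confirm Hadoop picks the file up (using the environment variables from step VI) is to query a key back:

[hadoopuser@Linux01 ~]$ hdfs getconf -confKey fs.defaultFS  #should print hdfs://Hadoop-NN-01:9000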
(4) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/dfs/name</value>
    <description>Local directory where the NameNode stores the name table (fsimage); change as needed</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/dfs/data</value>
    <description>Local directory where the DataNode stores blocks; change as needed</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Number of file replicas; the default is 3</description>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
    <description>Block size, 128 MB</description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>Whether to enforce permissions on files in DFS (usually false for testing)</description>
  </property>
</configuration>
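HDFS normally creates these directories itself (formatting creates the name dir), but pre-creating them with the right owner avoids permission surprises. A minimal sketch, run as hadoopuser on each node:

mkdir -p $HADOOP_HOME/dfs/name $HADOOP_HOME/dfs/data $HADOOP_HOME/tmp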
(5) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Hadoop-NN-01:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Hadoop-NN-01:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Hadoop-NN-01:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Hadoop-NN-01:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Hadoop-NN-01:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
(6) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Hadoop-NN-01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Hadoop-NN-01:19888</value>
  </property>
</configuration>
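Note: in stock Hadoop 2.x tarballs this file often ships only as a template; if mapred-site.xml was missing when you opened it, copy the template first:

cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml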
(7) Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh
#--------------------Java Env------------------------------
export JAVA_HOME="/usr/java/jdk1.8.0_73"
#--------------------Hadoop Env----------------------------
#export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_PREFIX="/home/hadoopuser/hadoop-2.6.0-cdh5.6.0"
#--------------------Hadoop Daemon Options-----------------
# export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
# export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
#--------------------Hadoop Logs---------------------------
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
#--------------------SSH PORT-------------------------------
export HADOOP_SSH_OPTS="-p 6000" #if you changed the SSH login port, you must set this accordingly
(8) Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh
#Yarn Daemon Options
#export YARN_RESOURCEMANAGER_OPTS
#export YARN_NODEMANAGER_OPTS
#export YARN_PROXYSERVER_OPTS
#export HADOOP_JOB_HISTORYSERVER_OPTS
#Yarn Logs
export YARN_LOG_DIR="/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/logs"
3. Distribute the program to the other nodes
scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@Hadoop-DN-01:/home/hadoopuser
scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@Hadoop-DN-02:/home/hadoopuser
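With more nodes this is less error-prone as a loop; a minimal sketch using the node list from the host plan:

for node in Hadoop-DN-01 Hadoop-DN-02; do
  scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@$node:/home/hadoopuser  #add -P 6000 after scp if sshd uses a non-default port
done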
4. Format the NameNode
[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ hadoop namenode -format
5. Start the JournalNode:
[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoopuser/hadoop-2.6.0-cdh5.6.0/logs/hadoop-puppet-journalnode-BigData-03.out
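A JournalNode must run on every node that hosts one (here Hadoop-DN-01 and Hadoop-DN-02, per the host plan), so repeat the command on each. A minimal sketch driving it over SSH from one machine; the absolute path avoids relying on the remote shell sourcing ~/.bashrc:

for node in Hadoop-DN-01 Hadoop-DN-02; do
  ssh hadoopuser@$node /home/hadoopuser/hadoop-2.6.0-cdh5.6.0/sbin/hadoop-daemon.sh start journalnode  #add -p 6000 after ssh if needed
done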
Verify the JournalNode:
[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ jps
9076 Jps
9029 JournalNode
6. Start HDFS
Cluster startup, on Hadoop-NN-01: start-dfs.sh
[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ start-dfs.sh
Single-daemon startup:
<1> NameNode (Hadoop-NN-01, Hadoop-NN-02): hadoop-daemon.sh start namenode
<2> DataNode (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): hadoop-daemon.sh start datanode
<3> JournalNode (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): hadoop-daemon.sh start journalnode
7. Start YARN
<1> Cluster startup
Start YARN on Hadoop-NN-01; the command lives in $HADOOP_HOME/sbin:
[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ start-yarn.sh
<2> Single-daemon startup
ResourceManager (Hadoop-NN-01, Hadoop-NN-02): yarn-daemon.sh start resourcemanager
NodeManager (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): yarn-daemon.sh start nodemanager
Verification: see the minimal sketch below.
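A minimal verification sketch (jps output should match the process table at the top; 50070 is the default NameNode web port in Hadoop 2.6, and 8088 comes from yarn-site.xml above):

[hadoopuser@Linux01 ~]$ jps                    #run on every node and compare with the host plan
[hadoopuser@Linux01 ~]$ hdfs dfsadmin -report  #both DataNodes should be listed as live
[hadoopuser@Linux01 ~]$ yarn node -list        #both NodeManagers should be RUNNING
#Web UIs: http://Hadoop-NN-01:50070 (HDFS) and http://Hadoop-NN-01:8088 (YARN)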