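Setting up a Spark 3.0.0 cluster
Overview
Base environment
A Hadoop 3.2.1 cluster has already been installed
Hadoop 3.2.1 cluster
JDK 1.8
Download the Spark package
The official site is as follows:
Download the Scala package
The official site is as follows: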
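Install Scala
Note: the prebuilt Spark 3.0.0 package is built against Scala 2.12 and bundles its own Scala libraries, so the standalone Scala 2.13.3 installed here is not what Spark itself runs on.
Extract Scala to the target directory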
[root@hadoop1 hadoop]# mkdir /hadoop/scala
Upload scala-2.13.3.tgz to /hadoop/soft
tar -xvf scala-2.13.3.tgz -C /hadoop/scala/
Update the environment variables
Edit /etc/profile and add the corresponding settings
[root@hadoop1 scala-2.13.3]# vi /etc/profile
#set scala environment
export SCALA_HOME=/hadoop/scala/scala-2.13.3
export PATH=$PATH:${JAVA_PATH}:$SCALA_HOME/bin
Apply the environment variables
[root@hadoop1 scala-2.13.3]# source /etc/profile
Test that Scala is configured correctly
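The check itself is not shown in the original post. A minimal sketch, assuming Scala's bin directory is already on PATH after sourcing /etc/profile; the helper name tool_version is hypothetical and not part of Scala:

```shell
#!/bin/sh
# Hypothetical helper (not part of Scala): print a tool's version if it is
# on PATH, otherwise report that it could not be found.
tool_version() {
  if command -v "$1" >/dev/null 2>&1; then
    # Scala 2 prints its version banner on stderr, so merge the streams.
    "$1" -version 2>&1
  else
    echo "$1: not found on PATH"
    return 1
  fi
}

tool_version scala || echo "check SCALA_HOME and PATH in /etc/profile"
```

If the version banner does not mention 2.13.3, the shell is probably picking up a different Scala install earlier in PATH.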
Repeat the installation on each node in turn.
Install and configure Spark
Upload Spark and extract it to the Spark install directory
tar -xvf spark-3.0.0-bin-hadoop3.2.tgz -C
Update the environment variables
Edit /etc/profile and add the corresponding settings
[root@hadoop1 scala-2.13.3]# vi /etc/profile
#set spark environment
export SPARK_HOME=/hadoop/spark/spark-3.0.0-bin-hadoop3.2
export SPARK_EXAMPLES_JAR=$SPARK_HOME/examples/jars/spark-examples_2.12-3.0.0.jar
export PATH=$PATH:${JAVA_PATH}:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
Apply the environment variables
[root@hadoop1 scala-2.13.3]# source /etc/profile
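After sourcing /etc/profile it is worth confirming that the new variables actually took effect. A minimal sketch, where check_home is a hypothetical helper that takes a variable name and its value:

```shell
#!/bin/sh
# check_home NAME VALUE
# Hypothetical helper: report whether a *_HOME variable is set and whether
# its bin directory is on PATH.
check_home() {
  if [ -z "$2" ]; then
    echo "$1: not set"
    return 1
  fi
  case ":$PATH:" in
    *":$2/bin:"*) echo "$1: ok" ;;
    *) echo "$1: set, but $2/bin is not on PATH"; return 1 ;;
  esac
}

check_home SCALA_HOME "${SCALA_HOME:-}" || echo "re-check the Scala lines in /etc/profile"
check_home SPARK_HOME "${SPARK_HOME:-}" || echo "re-check the Spark lines in /etc/profile"
```

The sketch only inspects the current shell, so run it in a fresh login shell (or after source /etc/profile) on every node.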
Edit spark-env.sh
[root@hadoop1 ~]# cd /hadoop/spark/spark-3.0.0-bin-hadoop3.2/conf/
[root@hadoop1 conf]# cp spark-env.sh.template spark-env.sh
[root@hadoop1 conf]# vi spark-env.sh
export SCALA_HOME=/hadoop/scala/scala-2.13.3
export JAVA_HOME=/hadoop/jdk1.8/jdk1.8.0_211-amd64
export SPARK_MASTER_IP=hadoop1
# Port for the master web UI
export SPARK_MASTER_WEBUI_PORT=8888
# JVM options for the Spark master and worker daemons (default: none)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop1:2181,hadoop2:2181,hadoop3:2181 -Dspark.deploy.zookeeper.dir=/spark"
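The three -D properties above switch the masters to ZooKeeper-based recovery, list the ZooKeeper ensemble, and name the znode under which Spark keeps its recovery state. If the ensemble changes often, the string can be assembled instead of typed by hand. A sketch only; zk_recovery_opts is a hypothetical helper, not something Spark provides:

```shell
#!/bin/sh
# zk_recovery_opts "HOST1 HOST2 ..." PORT ZNODE_DIR
# Hypothetical helper: build the recovery options from a space-separated
# list of ZooKeeper hosts, a client port, and a znode directory.
zk_recovery_opts() {
  zk_url=""
  for zk_host in $1; do
    # Append "host:port", comma-separated after the first entry.
    zk_url="${zk_url:+$zk_url,}$zk_host:$2"
  done
  echo "-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=$zk_url -Dspark.deploy.zookeeper.dir=$3"
}

SPARK_DAEMON_JAVA_OPTS="$(zk_recovery_opts "hadoop1 hadoop2 hadoop3" 2181 /spark)"
export SPARK_DAEMON_JAVA_OPTS
echo "$SPARK_DAEMON_JAVA_OPTS"
```

With the hosts used in this article, the echoed value is the same string as the hand-written export above.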
Edit slaves
(Add the hostname or IP of each worker node in the cluster)
[root@hadoop1 conf]# cp slaves.template slaves
[root@hadoop1 conf]# vi slaves
hadoop2
hadoop3
Copy Spark to the other nodes
[root@hadoop1 hadoop]# scp -r /hadoop/spark hadoop2:/hadoop/
[root@hadoop1 hadoop]# scp -r /hadoop/spark hadoop3:/hadoop/
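With more than a couple of workers, the copy can be driven from the slaves file instead of being typed once per host. A minimal sketch, assuming passwordless SSH between the nodes; distribute_spark and the DRY_RUN switch are hypothetical names, and the sketch only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# distribute_spark SRC_DIR DEST_DIR HOSTS_FILE
# Hypothetical helper: print (DRY_RUN=1, the default) or run one scp per
# host listed in a slaves-style file, skipping blank lines and comments.
distribute_spark() {
  while IFS= read -r target_host || [ -n "$target_host" ]; do
    case "$target_host" in
      ''|'#'*) continue ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "scp -r $1 $target_host:$2"
    else
      scp -r "$1" "$target_host:$2" </dev/null
    fi
  done < "$3"
}

# Example, using the paths from this article:
# distribute_spark /hadoop/spark /hadoop/ /hadoop/spark/spark-3.0.0-bin-hadoop3.2/conf/slaves
```

Running it in dry-run mode first makes it easy to confirm that the host list is what you expect before anything is copied.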
Start Spark
First make sure the Hadoop cluster is up and running
[root@hadoop1 soft]# jps
23968 NameNode
23666 QuorumPeerMain
28147 Jps
24472 DFSZKFailoverController
24920 ResourceManager
24265 JournalNode
26301 HMaster
25327 JobHistoryServer
Start the master: start-master.sh
[root@hadoop1 soft]# start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop1.out
Check
[root@hadoop1 soft]# jps
23968 NameNode
23666 QuorumPeerMain
28293 Master
24472 DFSZKFailoverController
24920 ResourceManager
24265 JournalNode
28363 Jps
26301 HMaster
25327 JobHistoryServer
The Master process is now running
Start the workers: start-slaves.sh
[root@hadoop1 soft]# start-slaves.sh
hadoop3: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop3.out
hadoop2: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop2.out
Check hadoop2
[root@hadoop2 soft]# jps
15584 Worker
12210 ResourceManager
11731 NameNode
13155 HMaster
13077 HRegionServer
12038 DFSZKFailoverController
15638 Jps
11559 QuorumPeerMain
11931 JournalNode
11820 DataNode
12287 NodeManager
Check hadoop3
[root@hadoop3 soft]# jps
5088 JournalNode
4979 DataNode
5763 HRegionServer
4813 QuorumPeerMain
5245 NodeManager
7390 Worker
7455 Jps
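Scanning jps output by eye gets tedious as the process list grows. The check can be sketched as a small filter; check_daemons is a hypothetical helper that reads jps output on stdin and compares the second column exactly, so HMaster is not mistaken for Master:

```shell
#!/bin/sh
# check_daemons NAME...
# Hypothetical helper: read `jps` output on stdin and report whether each
# expected daemon name appears as an exact match in the second column.
check_daemons() {
  jps_output="$(cat)"
  missing=0
  for daemon in "$@"; do
    if printf '%s\n' "$jps_output" |
       awk -v n="$daemon" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$daemon: running"
    else
      echo "$daemon: MISSING"
      missing=1
    fi
  done
  return "$missing"
}

# Example on a worker node:
# jps | check_daemons Worker DataNode NodeManager
```

The function returns a non-zero status when anything is missing, so it can also gate later steps in a script.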
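Rename start-all.sh and stop-all.sh under SPARK_HOME/sbin
Above, the master and the workers were started separately. We can also use start-all.sh and stop-all.sh to start or stop the whole Spark cluster.
Before using these two commands, we can rename the two files, for example to start-spark-all.sh and stop-spark-all.sh.
Reason:
If HADOOP_HOME is also configured on the cluster, HADOOP_HOME/sbin contains its own start-all.sh and stop-all.sh. When you run either name, the shell picks whichever copy comes first in PATH, so you can end up operating on the Hadoop cluster when you meant the Spark cluster, or the other way round. Renaming the Spark scripts removes the conflict. If you do not rename them, change into the relevant sbin directory and run the script from there (as ./start-all.sh), which also avoids the conflict. We put SPARK_HOME on PATH mainly to make the other Spark commands convenient to run.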
[root@hadoop2 soft]# cd /hadoop/spark/spark-3.0.0-bin-hadoop3.2/sbin/
[root@hadoop2 sbin]# mv start-all.sh start-spark-all.sh
[root@hadoop2 sbin]# mv stop-all.sh stop-spark-all.sh
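To see the name clash for yourself, list every copy of a script that PATH can reach. A minimal sketch; find_on_path is a hypothetical helper that works much like bash's `type -a` but in plain sh:

```shell
#!/bin/sh
# find_on_path NAME
# Hypothetical helper: list every executable file called NAME on PATH, in
# lookup order. The first line is the copy the shell would actually run.
find_on_path() {
  old_ifs="$IFS"
  IFS=":"
  for path_dir in $PATH; do
    candidate="${path_dir:-.}/$1"
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      echo "$candidate"
    fi
  done
  IFS="$old_ifs"
}

find_on_path start-all.sh
find_on_path start-spark-all.sh
```

Before the rename, the first call prints two paths on a node that has both HADOOP_HOME/sbin and SPARK_HOME/sbin on PATH; after it, each name resolves to a single script.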
After renaming, the Spark cluster can be started and stopped as follows
[root@hadoop1 sbin]# start-spark-all.sh
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop1.out
hadoop3: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop3.out
hadoop2: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop2.out
[root@hadoop1 sbin]# jps
23968 NameNode
29168 Jps
23666 QuorumPeerMain
29107 Master
24472 DFSZKFailoverController
24920 ResourceManager
24265 JournalNode
26301 HMaster
25327 JobHistoryServer
[root@hadoop2 ~]# jps
16352 Worker
16433 Jps
12210 ResourceManager
11731 NameNode
13155 HMaster
13077 HRegionServer
12038 DFSZKFailoverController
11559 QuorumPeerMain
11931 JournalNode
11820 DataNode
12287 NodeManager
[root@hadoop3 sbin]# jps
5088 JournalNode
7936 Jps
4979 DataNode
5763 HRegionServer
7863 Worker
4813 QuorumPeerMain
5245 NodeManager
On hadoop2, start a second master to make the master highly available
[root@hadoop2 conf]# start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop2.out
[root@hadoop2 conf]# jps
17584 Jps
12210 ResourceManager
11731 NameNode
13155 HMaster
13077 HRegionServer
17541 Master
12038 DFSZKFailoverController
11559 QuorumPeerMain
11931 JournalNode
11820 DataNode
12287 NodeManager
17439 Worker
View the Spark web UI
http://192.168.242.81:8888/
http://192.168.242.82:8888/
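The same information can be read from the command line. The sketch below assumes that the standalone master web UI also serves a JSON view under /json/ containing a "status" field (ALIVE or STANDBY); treat both that path and the field name as assumptions to verify on your own cluster. master_status is a hypothetical helper that only parses text on stdin:

```shell
#!/bin/sh
# master_status
# Hypothetical helper: extract the first "status" value from JSON on stdin.
# Assumes the value is an upper-case word such as ALIVE or STANDBY.
master_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[A-Z_]*"' |
    head -n 1 |
    sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Example (requires curl and reachable masters; the /json/ path is an assumption):
# for url in http://192.168.242.81:8888 http://192.168.242.82:8888; do
#   printf '%s ' "$url"
#   curl -s "$url/json/" | master_status
# done
```

With both masters up, one URL should report ALIVE and the other STANDBY.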
Test automatic active/standby failover
On hadoop1, find the Master process and kill it
[root@hadoop1 conf]# jps
23968 NameNode
23666 QuorumPeerMain
24472 DFSZKFailoverController
24920 ResourceManager
29960 Master
24265 JournalNode
26301 HMaster
30125 Jps
25327 JobHistoryServer
[root@hadoop1 conf]# kill -9 29960
[root@hadoop1 conf]# jps
23968 NameNode
23666 QuorumPeerMain
30147 Jps
24472 DFSZKFailoverController
24920 ResourceManager
24265 JournalNode
26301 HMaster
25327 JobHistoryServer
hadoop2 automatically takes over as the ALIVE master
After the failover, jobs that were already running should not be affected; this is not demonstrated here.
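For applications to follow a failover, they need to know about every master. Spark's standalone mode accepts a comma-separated list of masters in a single spark:// URL. A minimal sketch of building that URL; ha_master_url is a hypothetical helper, and 7077 is the default master port, which this article does not change:

```shell
#!/bin/sh
# ha_master_url PORT HOST...
# Hypothetical helper: join the master hosts into one spark:// URL.
ha_master_url() {
  master_port="$1"
  shift
  master_list=""
  for master_host in "$@"; do
    master_list="${master_list:+$master_list,}$master_host:$master_port"
  done
  echo "spark://$master_list"
}

ha_master_url 7077 hadoop1 hadoop2

# Example submission with the bundled SparkPi example:
# spark-submit --master "$(ha_master_url 7077 hadoop1 hadoop2)" \
#   --class org.apache.spark.examples.SparkPi "$SPARK_EXAMPLES_JAR" 10
```

A client started with this URL registers with whichever master is currently the leader, so it does not need reconfiguring when the roles swap.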
Start the master on hadoop1 again
[root@hadoop1 conf]# start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop1.out
[root@hadoop1 conf]# jps
23968 NameNode
23666 QuorumPeerMain
30213 Master
24472 DFSZKFailoverController
24920 ResourceManager
30248 Jps
24265 JournalNode
26301 HMaster
25327 JobHistoryServer
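The restarted master on hadoop1 rejoins as the STANDBY node, while hadoop2 stays the ALIVE master; ZooKeeper-based leader election does not switch back on its own.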
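At this point, the Spark HA cluster setup is complete.
Source: "ITPUB Blog", link: http://blog.itpub.net/29956245/viewspace-2933090/. Please credit the source when reposting; otherwise legal liability will be pursued.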