Spark 3.0.0 Cluster Setup

Posted by 夢夕林1992 on 2023-01-28


Overview

Base environment

The Hadoop 3.2.1 cluster has already been installed:

 Hadoop 3.2.1 cluster

 JDK 1.8

 

Download the Spark package

The official download page is https://spark.apache.org/downloads.html (archived releases, including 3.0.0, are under https://archive.apache.org/dist/spark/).

 

Download the Scala package

The official download page is https://www.scala-lang.org/download/.
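If the nodes have internet access, one way to fetch the exact packages used below is directly from the archives (archive URLs assumed from the standard Apache/Lightbend layouts; verify against the official pages above):

mkdir -p /hadoop/soft && cd /hadoop/soft
wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
wget https://downloads.lightbend.com/scala/2.13.3/scala-2.13.3.tgz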

 

Install Scala

Extract Scala to the target directory

[root@hadoop1 hadoop]# mkdir /hadoop/scala

Upload scala-2.13.3.tgz to /hadoop/soft, then extract it:

tar -xvf scala-2.13.3.tgz -C /hadoop/scala/

 

Update the environment variables

Edit /etc/profile and append the following settings:

[root@hadoop1 scala-2.13.3]# vi /etc/profile

#set scala environment

export SCALA_HOME=/hadoop/scala/scala-2.13.3

export PATH=$PATH:${JAVA_PATH}:$SCALA_HOME/bin 

 

Apply the environment variables

[root@hadoop1 scala-2.13.3]# source /etc/profile

 

Verify that the Scala configuration works
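For example (the exact copyright line may differ; what matters is that version 2.13.3 is reported):

[root@hadoop1 scala-2.13.3]# scala -version
Scala code runner version 2.13.3 -- Copyright 2002-2020, LAMP/EPFL and Lightbend, Inc.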

 

 

 

Install Scala on each of the other nodes in the same way; one possible sketch follows.
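A minimal sketch of pushing the same installation to the other nodes (hypothetical loop; assumes the same directory layout and passwordless ssh between nodes):

for host in hadoop2 hadoop3; do
  ssh $host "mkdir -p /hadoop/scala"
  scp -r /hadoop/scala/scala-2.13.3 $host:/hadoop/scala/
done

The /etc/profile changes must be replicated as well; a combined sync is shown after the Spark installation below.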


Install and configure Spark

Upload spark-3.0.0-bin-hadoop3.2.tgz and extract it into the Spark installation directory (the target matches the SPARK_HOME used below):

[root@hadoop1 hadoop]# mkdir /hadoop/spark

tar -xvf spark-3.0.0-bin-hadoop3.2.tgz -C /hadoop/spark/

 

Update the environment variables

Edit /etc/profile and append the following settings:

[root@hadoop1 scala-2.13.3]# vi /etc/profile

#set spark environment

export SPARK_HOME=/hadoop/spark/spark-3.0.0-bin-hadoop3.2

export SPARK_EXAMPLES_JAR=$SPARK_HOME/examples/jars/spark-examples_2.12-3.0.0.jar

export PATH=$PATH:${JAVA_PATH}:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin

 

Apply the environment variables

[root@hadoop1 scala-2.13.3]# source /etc/profile
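A quick sanity check that the new variables took effect; spark-submit is now on the PATH, and its banner should report version 3.0.0 (no running cluster is needed for this):

[root@hadoop1 ~]# spark-submit --version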

 

Edit spark-env.sh

[root@hadoop1 ~]# cd /hadoop/spark/spark-3.0.0-bin-hadoop3.2/conf/

[root@hadoop1 conf]# cp spark-env.sh.template spark-env.sh

 

[root@hadoop1 conf]# vi spark-env.sh

export SCALA_HOME=/hadoop/scala/scala-2.13.3

export JAVA_HOME=/hadoop/jdk1.8/jdk1.8.0_211-amd64

export SPARK_MASTER_IP=hadoop1

# Set the web UI port

export SPARK_MASTER_WEBUI_PORT=8888

# JVM options for the Spark master and worker daemons (default: none)

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop1:2181,hadoop2:2181,hadoop3:2181 -Dspark.deploy.zookeeper.dir=/spark"

 

Edit the slaves file

(add the hostnames or IP addresses of the cluster's worker nodes)

[root@hadoop1 conf]# cp slaves.template slaves

[root@hadoop1 conf]# vi slaves

hadoop2

hadoop3

 

Copy Spark to the other nodes

[root@hadoop1 hadoop]# scp -r /hadoop/spark hadoop2:/hadoop/

[root@hadoop1 hadoop]# scp -r /hadoop/spark hadoop3:/hadoop/
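The environment variables have so far only been set on hadoop1. A minimal sync, assuming /etc/profile is otherwise identical across the nodes:

[root@hadoop1 hadoop]# scp /etc/profile hadoop2:/etc/profile
[root@hadoop1 hadoop]# scp /etc/profile hadoop3:/etc/profile

New login shells on hadoop2 and hadoop3 will pick the variables up (or run source /etc/profile there).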

 

Start Spark

First, make sure the Hadoop cluster is already up and running:

[root@hadoop1 soft]# jps

23968 NameNode

23666 QuorumPeerMain

28147 Jps

24472 DFSZKFailoverController

24920 ResourceManager

24265 JournalNode

26301 HMaster

25327 JobHistoryServer

 

Start the master with start-master.sh

[root@hadoop1 soft]#  start-master.sh

starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop1.out

 

Check:

[root@hadoop1 soft]# jps

23968 NameNode

23666 QuorumPeerMain

28293 Master

24472 DFSZKFailoverController

24920 ResourceManager

24265 JournalNode

28363 Jps

26301 HMaster

25327 JobHistoryServer

The Master process is now running.

 

Start the workers with start-slaves.sh

[root@hadoop1 soft]# start-slaves.sh 

hadoop3: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop3.out

hadoop2: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop2.out

 

Check hadoop2:

[root@hadoop2 soft]# jps

15584 Worker

12210 ResourceManager

11731 NameNode

13155 HMaster

13077 HRegionServer

12038 DFSZKFailoverController

15638 Jps

11559 QuorumPeerMain

11931 JournalNode

11820 DataNode

12287 NodeManager

 

Check hadoop3:

[root@hadoop3 soft]# jps

5088 JournalNode

4979 DataNode

5763 HRegionServer

4813 QuorumPeerMain

5245 NodeManager

7390 Worker

7455 Jps

 

Rename start-all.sh and stop-all.sh under $SPARK_HOME/sbin

Above, the master and workers were started separately; you can also use start-all.sh and stop-all.sh to start and stop the whole Spark cluster.

Before using these two commands, it is worth renaming the files, for example to start-spark-all.sh and stop-spark-all.sh.

Reason:

If HADOOP_HOME is also configured on the cluster, then $HADOOP_HOME/sbin contains its own start-all.sh and stop-all.sh, so when you run these scripts the system cannot tell whether you mean the Hadoop cluster or the Spark cluster. Renaming removes the conflict. Of course, without renaming you can simply run the scripts from inside their respective sbin directories, which also avoids any ambiguity; we configure SPARK_HOME mainly to make running other Spark commands convenient.

[root@hadoop2 soft]# cd /hadoop/spark/spark-3.0.0-bin-hadoop3.2/sbin/

[root@hadoop2 sbin]# mv start-all.sh start-spark-all.sh

[root@hadoop2 sbin]# mv stop-all.sh stop-spark-all.sh

 

After renaming, the Spark cluster can be started and stopped as follows:

[root@hadoop1 sbin]#  start-spark-all.sh 

starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop1.out

hadoop3: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop3.out

hadoop2: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop2.out

[root@hadoop1 sbin]# jps

23968 NameNode

29168 Jps

23666 QuorumPeerMain

29107 Master

24472 DFSZKFailoverController

24920 ResourceManager

24265 JournalNode

26301 HMaster

25327 JobHistoryServer

 

[root@hadoop2 ~]# jps

16352 Worker

16433 Jps

12210 ResourceManager

11731 NameNode

13155 HMaster

13077 HRegionServer

12038 DFSZKFailoverController

11559 QuorumPeerMain

11931 JournalNode

11820 DataNode

12287 NodeManager

 

[root@hadoop3 sbin]# jps

5088 JournalNode

7936 Jps

4979 DataNode

5763 HRegionServer

7863 Worker

4813 QuorumPeerMain

5245 NodeManager

 

On hadoop2, start a second Master to enable master HA:

[root@hadoop2 conf]# start-master.sh 

starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop2.out

[root@hadoop2 conf]# jps

17584 Jps

12210 ResourceManager

11731 NameNode

13155 HMaster

13077 HRegionServer

17541 Master

12038 DFSZKFailoverController

11559 QuorumPeerMain

11931 JournalNode

11820 DataNode

12287 NodeManager

17439 Worker

 

Open the Spark web UI on both masters; one should show status ALIVE and the other STANDBY:

http://192.168.242.81:8888/

http://192.168.242.82:8888/
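The master state can also be checked from the command line: the standalone master web UI serves a JSON view at /json. A minimal sketch (exact output formatting may vary):

[root@hadoop1 ~]# curl -s http://192.168.242.81:8888/json | grep status
  "status" : "ALIVE"
[root@hadoop1 ~]# curl -s http://192.168.242.82:8888/json | grep status
  "status" : "STANDBY"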

 

Test automatic master failover

On the hadoop1 node, find and kill the Master process:

[root@hadoop1 conf]# jps

23968 NameNode

23666 QuorumPeerMain

24472 DFSZKFailoverController

24920 ResourceManager

29960 Master

24265 JournalNode

26301 HMaster

30125 Jps

25327 JobHistoryServer

[root@hadoop1 conf]#  kill -9 29960

[root@hadoop1 conf]# jps

23968 NameNode

23666 QuorumPeerMain

30147 Jps

24472 DFSZKFailoverController

24920 ResourceManager

24265 JournalNode

26301 HMaster

25327 JobHistoryServer

 

hadoop2 automatically takes over as the ALIVE master node.

After the failover, jobs that were already running should continue unaffected; this is not demonstrated here.
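Although not demonstrated in this run, the standard way for clients to survive a failover is to list both masters in the master URL. For example, submitting the bundled SparkPi job (assuming the default master port 7077, which this setup does not override):

[root@hadoop1 ~]# spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://hadoop1:7077,hadoop2:7077 \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.0.jar 100

The client tries each listed master and follows whichever is currently ALIVE.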

 

Restart the Master on hadoop1:

[root@hadoop1 conf]# start-master.sh 

starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop1.out

[root@hadoop1 conf]# jps

23968 NameNode

23666 QuorumPeerMain

30213 Master

24472 DFSZKFailoverController

24920 ResourceManager

30248 Jps

24265 JournalNode

26301 HMaster

25327 JobHistoryServer

 

hadoop1 rejoins as the STANDBY master node, while hadoop2 remains ALIVE.

 

At this point, the Spark HA cluster setup is complete.
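As a final check, spark-shell can connect through the same HA master URL (again assuming the default port 7077) and run a trivial job:

[root@hadoop1 ~]# spark-shell --master spark://hadoop1:7077,hadoop2:7077
scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0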

