Spark 2.0.0 Install and Examples
1. Download and extract Scala 2.11.8
[root@sht-sgmhadoopnn-01 hadoop]# wget
[root@sht-sgmhadoopnn-01 hadoop]# tar xzvf scala-2.11.8.tgz
[root@sht-sgmhadoopnn-01 hadoop]# mv scala-2.11.8 scala
[root@sht-sgmhadoopnn-01 hadoop]#
2. Sync the scala directory to the other machines in the cluster
[root@sht-sgmhadoopnn-01 hadoop]# scp -r scala
[root@sht-sgmhadoopnn-01 hadoop]# scp -r scala
[root@sht-sgmhadoopnn-01 hadoop]# scp -r scala
[root@sht-sgmhadoopnn-01 hadoop]# scp -r scala
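The four copies above can be folded into one loop. The following is a minimal sketch, not the author's method: the host names are taken from the shell prompts used elsewhere in this post, while the remote user root and the target directory /hadoop are assumptions, since the original commands do not show the destination. sync_dir and DRY_RUN are hypothetical names introduced for illustration.

```shell
# Hosts as they appear in the prompts of this post; adjust to your cluster.
HOSTS="sht-sgmhadoopnn-02 sht-sgmhadoopdn-01 sht-sgmhadoopdn-02 sht-sgmhadoopdn-03"
# Assumption: the remote target directory. The original does not show it.
DEST_DIR="${DEST_DIR:-/hadoop}"

# sync_dir SRC: copy SRC recursively to DEST_DIR on every host in HOSTS.
# With DRY_RUN=1 the scp commands are only printed, not executed.
sync_dir() {
  src="$1"
  for host in $HOSTS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "scp -r $src root@$host:$DEST_DIR/"
    else
      scp -r "$src" "root@$host:$DEST_DIR/" || return 1
    fi
  done
}
```

Running DRY_RUN=1 sync_dir scala first prints the commands so the host list and destination can be checked before anything is copied.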
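3. Configure the environment variables on every machine in the cluster and apply them
### Append the following two lines to the end of the file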
[root@sht-sgmhadoopnn-01 hadoop]# vi /etc/profile
export SCALA_HOME=/hadoop/scala
export PATH=$SCALA_HOME/bin:$PATH
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# source /etc/profile
[root@sht-sgmhadoopnn-02 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-01 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-02 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-03 hadoop]# source /etc/profile
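Copying /etc/profile over scp replaces the whole file on each target machine. An alternative, sketched here under the assumption that you would rather append the two lines on each host, is a small helper that adds a line only when it is not already present. append_once is a hypothetical name, not part of Scala or Spark.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
append_once() {
  file="$1"
  line="$2"
  # -x: match the whole line, -F: treat LINE as a fixed string, -q: quiet
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run as root on each machine):
#   append_once /etc/profile 'export SCALA_HOME=/hadoop/scala'
#   append_once /etc/profile 'export PATH=$SCALA_HOME/bin:$PATH'
```

Because the helper is idempotent, running it twice leaves a single copy of each line in the file.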
---------------------------------------------------------------------------------------------------------------------
1. Download and extract Spark 2.0.0
[root@sht-sgmhadoopnn-01 hadoop]# wget
[root@sht-sgmhadoopnn-01 hadoop]# tar xzvf spark-2.0.0-bin-hadoop2.7.tgz
[root@sht-sgmhadoopnn-01 hadoop]# mv spark-2.0.0-bin-hadoop2.7 spark
2. Configure spark-env.sh
[root@sht-sgmhadoopnn-01 conf]# pwd
/hadoop/spark/conf
[root@sht-sgmhadoopnn-01 conf]# cp spark-env.sh.template spark-env.sh
[root@sht-sgmhadoopnn-01 conf]#
### Add the following 6 lines
[root@sht-sgmhadoopnn-01 conf]# vi spark-env.sh
export SCALA_HOME=/hadoop/scala
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export SPARK_MASTER_IP=172.16.101.55
export SPARK_WORKER_MEMORY=1g
export SPARK_PID_DIR=/hadoop/pid
export HADOOP_CONF_DIR=/hadoop/hadoop/etc/hadoop
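A wrong path in spark-env.sh usually only shows up when the daemons fail to start. The sketch below checks the path-valued settings before starting anything. It is only an illustration: check_spark_env is a hypothetical helper, and the three checks (a java binary under JAVA_HOME, existing SCALA_HOME and HADOOP_CONF_DIR directories) are assumptions about what is worth verifying, not something Spark itself requires in this form.

```shell
# check_spark_env FILE: source FILE in a subshell and report settings whose
# paths do not exist on this machine. Returns non-zero if any check fails.
check_spark_env() (
  . "$1" || exit 1
  status=0
  [ -x "$JAVA_HOME/bin/java" ] || { echo "JAVA_HOME has no bin/java: $JAVA_HOME"; status=1; }
  [ -d "$SCALA_HOME" ]         || { echo "SCALA_HOME is not a directory: $SCALA_HOME"; status=1; }
  [ -d "$HADOOP_CONF_DIR" ]    || { echo "HADOOP_CONF_DIR is not a directory: $HADOOP_CONF_DIR"; status=1; }
  exit "$status"
)

# Example:
#   check_spark_env /hadoop/spark/conf/spark-env.sh && echo "spark-env.sh paths look fine"
```

The function body runs in a subshell, so the variables from spark-env.sh do not leak into the calling shell.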
3. Configure the slaves file
[root@sht-sgmhadoopnn-01 conf]# cp slaves.template slaves
[root@sht-sgmhadoopnn-01 conf]# vi slaves
sht-sgmhadoopdn-01
sht-sgmhadoopdn-02
sht-sgmhadoopdn-03
4. Copy the spark directory to the machines listed in the slaves file
[root@sht-sgmhadoopnn-01 hadoop]# scp -r spark
[root@sht-sgmhadoopnn-01 hadoop]# scp -r spark
[root@sht-sgmhadoopnn-01 hadoop]# scp -r spark
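Since the targets of this step are exactly the hosts in the slaves file, the host list can be read from that file instead of being typed again. A minimal sketch, assuming the file holds one host name per line with optional # comments; list_workers is a hypothetical helper, and the /hadoop destination in the usage comment is an assumption because the original commands do not show it.

```shell
# list_workers FILE: print the host names in a slaves file, one per line,
# dropping comments (from '#' to end of line), trailing blanks and empty lines.
list_workers() {
  sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Example (destination /hadoop is an assumption):
#   for host in $(list_workers /hadoop/spark/conf/slaves); do
#     scp -r spark "root@$host:/hadoop/"
#   done
```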
5. Configure the environment variables on every machine in the cluster and apply them
[root@sht-sgmhadoopnn-01 hadoop]# vi /etc/profile
export SPARK_HOME=/hadoop/spark
export PATH=$SPARK_HOME/bin:$PATH
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# source /etc/profile
[root@sht-sgmhadoopnn-02 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-01 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-02 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-03 hadoop]# source /etc/profile
6. Start Spark
[root@sht-sgmhadoopnn-01 sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-sht-sgmhadoopnn-01.out
sht-sgmhadoopdn-01: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-sht-sgmhadoopdn-01.telenav.cn.out
sht-sgmhadoopdn-02: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-sht-sgmhadoopdn-02.telenav.cn.out
sht-sgmhadoopdn-03: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-sht-sgmhadoopdn-03.telenav.cn.out
[root@sht-sgmhadoopnn-01 sbin]#
7. Check via the web UI (the standalone master UI listens on port 8080 by default) and jps
[root@sht-sgmhadoopnn-01 sbin]# jps
27169 HMaster
26233 NameNode
26641 ResourceManager
2312 Jps
26542 DFSZKFailoverController
2092 Master
27303 RunJar
26989 JobHistoryServer
[root@sht-sgmhadoopdn-01 ~]# jps
19907 Worker
2086 jar
17265 DataNode
17486 NodeManager
20055 Jps
17377 JournalNode
17697 HRegionServer
3671 QuorumPeerMain
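Reading the jps listings by eye works, but the check can also be scripted. The sketch below tests whether a given daemon appears in jps output. has_daemon is a hypothetical helper. It compares the class name exactly, so the HMaster entry in the listing above is not mistaken for the Spark Master.

```shell
# has_daemon NAME: read `jps` output on stdin and succeed if a JVM whose
# main class is exactly NAME is listed.
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example (on the master node, then on each worker node):
#   jps | has_daemon Master && echo "Master is running"
#   jps | has_daemon Worker && echo "Worker is running"
```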
8. Run the WordCount example
[root@sht-sgmhadoopnn-01 hadoop]# vi wordcount.txt
hello abc 123
abc hadoop hello hdfs
spark yarn
123 abc hello hdfs spark
wjp wjp abc hello
[root@sht-sgmhadoopnn-01 bin]# spark-shell
scala>
scala> val textfile = sc.textFile("")
textfile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at :24
scala> val count=textfile.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at :26
scala> count.collect()
res0: Array[(String, Int)] = Array((hello,4), (123,2), (yarn,1), (abc,4), (wjp,2), (spark,2), (hadoop,1), (hdfs,2))
scala>
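The result in res0 can be cross-checked without Spark. The sketch below counts the same space-separated words with standard shell tools and follows the same split, pair and sum steps as the flatMap, map and reduceByKey chain. word_counts is a hypothetical helper. Unlike split(" ") it squeezes repeated spaces, which makes no difference for the sample file above.

```shell
# word_counts FILE: print "word count" pairs for the space-separated words
# in FILE, sorted by word.
word_counts() {
  tr -s ' ' '\n' < "$1" | grep -v '^$' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example:
#   word_counts wordcount.txt
```

For the wordcount.txt shown above this yields eight lines, for instance hello 4, abc 4 and yarn 1, which agrees with the Array returned by count.collect().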
###val file=sc.textFile("hadoop fs -ls hdfs://172.16.101.56:8020/wordcount.txt")
val file = sc.textFile("hdfs://namenode:8020/path/to/input")
val counts = file.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://namenode:8020/output")
--------------------------------------------------------------------------------------------------------------------------------------------------------
a. Run in local mode with two threads
#[root@sht-sgmhadoopdn-01 ~]# ./bin/run-example SparkPi 2>&1 | grep "Pi is roughly"
[root@sht-sgmhadoopnn-01 spark]# ./bin/run-example --master local[2] SparkPi 10
b. Run in Spark Standalone cluster mode
[root@sht-sgmhadoopnn-01 spark]# ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://sht-sgmhadoopnn-01:7077 \
examples/jars/spark-examples_2.11-2.0.0.jar \
100
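c. Note: Spark on YARN supports two run modes, yarn-cluster and yarn-client. The difference is where the driver runs: in yarn-cluster mode it runs inside the YARN ApplicationMaster on the cluster, in yarn-client mode it runs in the client process that submitted the job.
Broadly speaking, yarn-cluster suits production environments, while yarn-client suits interactive use and debugging, that is, when you want to see the application's output quickly. Since Spark 2.0 the mode is selected with --master yarn plus --deploy-mode cluster or --deploy-mode client; the older --master yarn-cluster and --master yarn-client forms still work but print a deprecation warning.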
# Run on a Spark on YARN cluster in yarn-cluster mode (--master yarn --deploy-mode cluster)
[root@sht-sgmhadoopnn-01 spark]# ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
./examples/jars/spark-examples_2.11-2.0.0.jar \
10
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
$SPARK_HOME/examples/jars/spark-examples_2.11-2.0.0.jar \
10
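In cluster mode the driver runs on the cluster, so the "Pi is roughly" line does not appear on the submitting console; it ends up in the YARN container logs. A minimal sketch for pulling the value out of whatever output you have. extract_pi is a hypothetical helper, and the yarn logs usage assumes that log aggregation is enabled on the cluster and that you substitute the application id printed by spark-submit.

```shell
# extract_pi: read driver output on stdin and print the value from the
# "Pi is roughly ..." line that SparkPi writes.
extract_pi() {
  sed -n 's/^Pi is roughly //p'
}

# Example:
#   local mode:        ./bin/run-example SparkPi 10 2>&1 | extract_pi
#   yarn cluster mode: yarn logs -applicationId <application id> | extract_pi
```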
Source: ITPUB blog, link: http://blog.itpub.net/30089851/viewspace-2122819/. If you repost this article, please credit the source; otherwise legal liability will be pursued.