Big Data Learning Roadmap: the Master's jps

Posted by 好程式設計師IT on 2019-08-19

SparkSubmit

   The service process started once the SparkSubmit class is launched; it is used to submit tasks.

   Whichever end submits the task is the end on which SparkSubmit starts, and that end acts as the Driver.

 

Task submission flow


1. The Driver submits the task to the Master (this starts the SparkSubmit process).

2. The Master generates the task information and puts it into a queue.

3. The Master notifies Workers to start Executors (the Master filters out the live Workers and assigns the task to the Workers with the most idle resources).

4. The Workers' Executors register with the Driver (only Executors actually take part in the computation) -> the Workers fetch task information from the Driver.

5. The Driver divides the job into stages, splits each stage into small tasks, and broadcasts them to the corresponding Workers for execution.

6. The Workers return the results of completed tasks to the Driver (a simplified sketch of this sequence follows below).
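To make the ordering easier to follow, here is a toy, self-contained Scala sketch of the sequence above. It is not Spark internals; every name in it (Task, Worker, the scheduling rule) is purely illustrative:

object SubmissionFlowSketch {
  case class Task(id: Int)
  case class Worker(name: String, freeCores: Int, alive: Boolean)

  def main(args: Array[String]): Unit = {
    // 1./2. The "Driver" submits a job; the "Master" keeps it as a queue of tasks.
    val taskQueue = (1 to 4).map(i => Task(i)).toList

    // 3. The "Master" filters the live workers and prefers the one with the most idle resources.
    val workers = List(Worker("worker-1", 4, alive = true), Worker("worker-2", 2, alive = true))
    val chosen  = workers.filter(_.alive).maxBy(_.freeCores)

    // 4./5. The chosen worker's "executor" registers back and receives the small tasks.
    val results = taskQueue.map(t => s"${chosen.name} executed task ${t.id}")

    // 6. Finished results go back to the "Driver".
    results.foreach(println)
  }
}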

 

 

Range is, in effect, a subtype of the Scala collections.

scala> 1.to(10)

res0: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> 1 to 10

res1: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
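A few more Range operations, for reference (a quick sketch; these can be tried in the same REPL):

val evens  = 2 to 10 by 2      // Range(2, 4, 6, 8, 10)
val upTo   = 1 until 5         // excludes the upper bound: Range(1, 2, 3, 4)
val asList = (1 to 5).toList   // List(1, 2, 3, 4, 5)
val total  = (1 to 10).sum     // 55 -- Range supports the usual collection methods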

 

Entry classes for submitting tasks to the cluster (the spark-shell prints these when it starts):

Spark context available as sc

SQL context available as sqlContext

Both can be called directly:
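For example, a minimal sketch run inside the spark-shell (the numbers are arbitrary sample data):

// `sc` already exists in the spark-shell, so no SparkContext needs to be created
val nums = sc.parallelize(1 to 100)   // distribute a local range as an RDD
println(nums.reduce(_ + _))           // prints 5050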


Spark WordCount

Building the template code:

SparkConf: the configuration class; settings made here take precedence over the cluster configuration files.

setAppName: specifies the application name; if omitted, a UUID-like name is generated automatically.

setMaster: specifies the run mode: local - simulate cluster execution with 1 thread;

local[2] - simulate cluster execution with 2 threads; local[*] - use as many threads as are currently idle to run the task.

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Word count implemented with Spark
  */
object SparkWordCount {

  def main(args: Array[String]): Unit = {

    /**
      * Build the template code
      */
    val conf: SparkConf = new SparkConf()
      .setAppName("SparkWordCount")
//      .setMaster("local[2]")

    // Create the entry point (context object) for submitting tasks to the cluster
    val sc: SparkContext = new SparkContext(conf)

    // Read the data from HDFS
    val lines: RDD[String] = sc.textFile(args(0))

    // Split the data into individual words
    val words: RDD[String] = lines.flatMap(_.split(" "))

    // Turn each word into a tuple
    val tuples: RDD[(String, Int)] = words.map((_, 1))

    // Aggregate by key
//    tuples.reduceByKey((x, y) => x + y)
    val sumed: RDD[(String, Int)] = tuples.reduceByKey(_ + _)

    // Sort in descending order by the number of occurrences
    val sorted: RDD[(String, Int)] = sumed.sortBy(_._2, false)

    // Print to the console
//    println(sorted.collect.toBuffer)
//    sorted.foreach(x => println(x))
//    sorted.foreach(println)

    // Save the result to HDFS
    sorted.saveAsTextFile(args(1))

    // Release resources
    sc.stop()
  }
}
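The same pipeline can also be written as a single chained expression, which mirrors the comments above (a sketch, assuming the same `sc` and `args`):

sc.textFile(args(0))
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
  .sortBy(_._2, false)
  .saveAsTextFile(args(1))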

After packaging, upload the jar to Linux.

1. First start the ZooKeeper, HDFS, and Spark clusters

Start HDFS:

/usr/local/hadoop-2.6.1/sbin/start-dfs.sh

Start Spark:

/usr/local/spark-1.6.1-bin-hadoop2.6/sbin/start-all.sh

 

2. Submit the Spark application with the spark-submit command (note the order of the arguments)

/usr/local/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \

--class com.qf.spark.WordCount \

--master spark://node01:7077 \

--executor-memory 2G \

--total-executor-cores 4 \

/root/spark-mvn-1.0-SNAPSHOT.jar \

hdfs://node01:9000/words.txt \

hdfs://node01:9000/out

 

3. Check the program's output

hdfs dfs -cat hdfs://node01:9000/out/part-00000

 

JavaSparkWC

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;

public class JavaSparkWC {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("JavaSparkWC").setMaster("local[1]");

        // Entry point for submitting tasks
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Read the data
        JavaRDD<String> lines = jsc.textFile("hdfs://hadoop01:9000/wordcount/input/a.txt");

        // Split the data
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String s) throws Exception {
                List<String> splited = Arrays.asList(s.split(" ")); // build a list of words
                return splited;
            }
        });

        // Build tuples: one pair per word (input word -> output (word, 1))
        JavaPairRDD<String, Integer> tuples = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) throws Exception {
                return new Tuple2<String, Integer>(s, 1);
            }
        });

        // Aggregate: combine the two values that share the same key
        JavaPairRDD<String, Integer> sumed = tuples.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        });

        // At this point the key is a String, so we cannot sort by the count.
        // The Java API provides no sortBy operator, so swap key and value,
        // sort, and then swap back afterwards.
        final JavaPairRDD<Integer, String> swaped = sumed.mapToPair(
            new PairFunction<Tuple2<String, Integer>, Integer, String>() {
                @Override
                public Tuple2<Integer, String> call(Tuple2<String, Integer> tup) throws Exception {
//                  return new Tuple2<Integer, String>(tup._2, tup._1);
                    return tup.swap(); // swap() exchanges the two elements of the tuple
                }
            });

        // Sort in descending order
        JavaPairRDD<Integer, String> sorted = swaped.sortByKey(false);

        // Swap back again
        JavaPairRDD<String, Integer> res = sorted.mapToPair(
            new PairFunction<Tuple2<Integer, String>, String, Integer>() {
                @Override
                public Tuple2<String, Integer> call(Tuple2<Integer, String> tup) throws Exception {
                    return tup.swap();
                }
            });

        System.out.println(res.collect());

        jsc.stop(); // release resources
    }
}


