The Way of Spark (Advanced): Spark from Beginner to Expert, Section 7: How Spark Runs

Posted by 五柳-先生 on 2015-11-14

Main topics in this section

  1. Spark run modes
  2. How Spark execution works

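The content and some of the figures in this section come from: 
http://blog.csdn.net/book_mmicky/article/details/25714419 
http://blog.csdn.net/yirenboy/article/details/47441465 
These two articles explain Spark's runtime architecture in considerable depth and are very well written. I recommend reading them carefully, and I thank their authors.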

1. Spark Run Modes

After writing a Spark application, you submit it to the cluster with the spark-submit script. spark-submit accepts many options, which you can list with the following command:

```shell
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# ./spark-submit --help
```

As the help output shows, a spark-submit invocation takes the following general form:

```shell
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
```
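
To make the ordering in the template above concrete, here is a minimal Scala sketch that assembles a spark-submit argument list from its parts. `SubmitSpec` and `toCommand` are hypothetical names introduced only for this illustration and are not part of Spark; the jar name and paths in `main` are placeholders, and the flags are the ones shown in the template.

```scala
object SubmitCommand {
  // Hypothetical container for the pieces of a spark-submit invocation.
  final case class SubmitSpec(
      mainClass: String,
      master: String,
      deployMode: Option[String] = None,
      conf: Seq[(String, String)] = Nil,
      appJar: String,
      appArgs: Seq[String] = Nil)

  // Options come first, then the application jar, then the application's own arguments.
  def toCommand(spec: SubmitSpec): Seq[String] = {
    val deploy = spec.deployMode.toSeq.flatMap(m => Seq("--deploy-mode", m))
    val confs = spec.conf.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
    Seq("./bin/spark-submit", "--class", spec.mainClass, "--master", spec.master) ++
      deploy ++ confs ++ Seq(spec.appJar) ++ spec.appArgs
  }

  def main(args: Array[String]): Unit = {
    val spec = SubmitSpec(
      mainClass = "SparkWordCount",
      master = "spark://sparkmaster:7077",
      deployMode = Some("client"),
      appJar = "SparkWordCount.jar",
      appArgs = Seq("file:/README.md", "file:/SparkWordCountResult"))
    println(toCommand(spec).mkString(" "))
  }
}
```

Everything placed after the application jar is passed through to the application's `main` method rather than interpreted by spark-submit, which is why the sketch appends `appArgs` last.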

Several common ways of submitting a Spark application are described below.

(1) Local mode: --master local

```shell
# --master local: run locally. Input can be read from the local file system or from HDFS; this example uses the local file system.
# Input is read from the local file system: file:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
# The result is also written to the local file system: file:/SparkWordCountResult
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# ./spark-submit --master local \
  --class SparkWordCount --executor-memory 1g \
  /root/IdeaProjects/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar \
  file:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md \
  file:/SparkWordCountResult
```
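
The source of the SparkWordCount class submitted above is not listed in this section. The following is a minimal sketch of what such a program could look like, assuming it takes the input path as its first argument and the output path as its second, matching the two file: URIs on the command line. It is an illustration under those assumptions, not the author's actual code, and it needs Spark on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed shape of the submitted application: args(0) = input path, args(1) = output path.
object SparkWordCount {
  def main(args: Array[String]): Unit = {
    require(args.length >= 2, "usage: SparkWordCount <input> <output>")
    // The master is deliberately not set here, so the value passed to
    // spark-submit via --master decides where the application runs.
    val conf = new SparkConf().setAppName("SparkWordCount")
    val sc = new SparkContext(conf)
    try {
      sc.textFile(args(0))
        .flatMap(line => line.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .saveAsTextFile(args(1))
    } finally {
      sc.stop()
    }
  }
}
```

Leaving the master out of the code is what lets the same jar be submitted unchanged in local, Standalone, and YARN modes.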


(2) Standalone mode: --master spark://sparkmaster:7077

Cluster resources are managed by Spark's built-in resource manager.

```shell
# Standalone mode: specify --master spark://sparkmaster:7077
# This example uses the local file system; HDFS works as well
# No deploy-mode is given, so the default client deploy mode is used
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# ./spark-submit --master spark://sparkmaster:7077 \
  --class SparkWordCount --executor-memory 1g \
  /root/IdeaProjects/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar \
  file:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md \
  file:/SparkWordCountResult2
```

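[Figure: Standalone run mode] 
Image source: http://blog.csdn.net/book_mmicky/article/details/25714419

While the application is running, you can check job status at http://192.168.1.103:4040, where 192.168.1.103 is the IP address of sparkmaster: 
[Figure: job status page on port 4040]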

You can also specify cluster deploy mode, for example:

```shell
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# ./spark-submit \
  --master spark://sparkmaster:7077 \
  --deploy-mode cluster \
  --supervise --class SparkWordCount --executor-memory 1g \
  /root/IdeaProjects/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar \
  file:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md \
  file:/SparkWordCountResult3
```

Unlike client deploy mode, in cluster deploy mode the SparkContext is created inside the cluster.

(3) YARN mode

YARN serves as the underlying resource manager.

```shell
# YARN cluster mode
# The class given to --class must be packaged in the application jar
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# ./spark-submit --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  --executor-memory 1g \
  /root/IdeaProjects/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar
```

[Figure: YARN cluster mode] 
Image source: http://blog.csdn.net/yirenboy/article/details/47441465

```shell
# YARN client mode
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# ./spark-submit --master yarn-client \
  --class org.apache.spark.examples.SparkPi \
  --executor-memory 1g \
  /root/IdeaProjects/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar
```

[Figure: YARN client mode] 
Image source: http://blog.csdn.net/yirenboy/article/details/47441465

[Figure: screenshot of the YARN client run]

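2. How Spark Execution Works

(1) Narrow and wide dependencies

In the earlier sections on the Spark programming model, we covered the common RDD transformation and action functions and noted that applying a transformation to an RDD produces a new RDD. The parent RDD and the RDD produced by the transformation form a lineage; that is, the child RDD depends on its parent. Based on how partitions of the child RDD map to partitions of the parent RDD, dependencies fall into two kinds: wide dependencies and narrow dependencies, as shown in the figure below: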

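[Figure: narrow and wide dependencies]

In the figure, each hollow solid-line rectangle represents an RDD, and the small shaded rectangles inside it represent partitions. As the figure shows, after transformations such as map, filter, and union, each partition of the resulting RDD depends only on fixed partitions of the parent RDD, so these are narrow dependencies. After groupByKey, each partition of the resulting RDD depends on all partitions of the parent RDD, so this is a wide dependency. join has two cases: if each resulting partition depends on only one particular partition of a parent RDD, the dependency is narrow; otherwise it is wide.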
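
One way to check this distinction yourself is to inspect `RDD.dependencies`, whose elements are `NarrowDependency` or `ShuffleDependency` objects. The sketch below assumes Spark is on the classpath and uses a local master with made-up sample data; `DependencyKinds` and `describe` are illustrative names, not part of Spark.

```scala
import org.apache.spark.{NarrowDependency, ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DependencyKinds {
  // Labels each direct dependency of an RDD as narrow or wide.
  def describe(rdd: RDD[_]): String =
    rdd.dependencies.map {
      case _: ShuffleDependency[_, _, _] => "wide"
      case _: NarrowDependency[_]        => "narrow"
      case other                         => other.getClass.getSimpleName
    }.mkString(", ")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DependencyKinds")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data split into two partitions.
      val pairs = sc.parallelize(Seq("a", "b", "a", "c"), 2).map(w => (w, 1))
      val grouped = pairs.groupByKey()
      println("map:        " + describe(pairs))   // map keeps a one-to-one (narrow) dependency
      println("groupByKey: " + describe(grouped)) // groupByKey shuffles, a wide dependency
    } finally {
      sc.stop()
    }
  }
}
```

Only the direct parents are inspected here; walking `dependencies` recursively through each parent RDD would show the whole lineage.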

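(2) How a Spark job runs

After a Spark application is submitted with spark-submit, it runs as follows: 
1. A SparkContext object is created, and SparkContext requests resources from the Cluster Manager, for example YARN, Standalone, or Mesos. 
2. The cluster manager creates executors on worker nodes and allocates resources (CPU, memory, and so on) to them; from then on, the executors periodically send heartbeats to the cluster manager. 
3. SparkContext starts DAGScheduler, which converts the submitted job into a number of stages that together form a DAG (Directed Acyclic Graph). Each stage contains a number of tasks, and the set of tasks in a stage is called a TaskSet. 
4. The TaskSet is handed to TaskScheduler, which sends the tasks to the appropriate executors; SparkContext also ships the application code to the executors, and task execution begins. 
5. The executors run the tasks and release the corresponding resources when they finish.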

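The figure below shows how DAGScheduler works:

[Figure: how DAGScheduler works]

When an action (such as collect) is triggered on RDD G, DAGScheduler builds the DAG from the transformations in the program and derives the stages: consecutive narrow dependencies are grouped into a single stage, while each wide dependency marks a stage boundary and produces a separate stage. The black rectangles in the figure indicate RDDs that have already been cached, so only stage 2 and stage 3 need to be computed.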
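
The stage-splitting rule can be illustrated with a small model in plain Scala (no Spark dependency). This is a simplified sketch, not DAGScheduler's actual implementation: `Node`, `Narrow`, `Wide`, and `stages` are hypothetical names, the lineage in `main` is made up, and caching and parents shared by several children are ignored.

```scala
object StageSplit {
  // Hypothetical lineage model: an RDD node and its dependencies on parent nodes.
  sealed trait Dep { def parent: Node }
  final case class Narrow(parent: Node) extends Dep
  final case class Wide(parent: Node) extends Dep
  final case class Node(name: String, deps: List[Dep] = Nil)

  // Collects the RDDs that share a stage with `root` by following narrow
  // dependencies, and returns the parents found behind wide dependencies.
  private def collectStage(root: Node): (List[String], List[Node]) =
    root.deps.foldLeft((List(root.name), List.empty[Node])) {
      case ((members, shuffleParents), Narrow(p)) =>
        val (parentMembers, parentShuffles) = collectStage(p)
        (parentMembers ++ members, shuffleParents ++ parentShuffles)
      case ((members, shuffleParents), Wide(p)) =>
        (members, shuffleParents :+ p)
    }

  // Returns the stages in execution order: parent stages first, final stage last.
  def stages(finalRdd: Node): List[List[String]] = {
    val (members, shuffleParents) = collectStage(finalRdd)
    shuffleParents.flatMap(stages) :+ members
  }

  def main(args: Array[String]): Unit = {
    // Made-up word-count lineage: one wide dependency between pairs and counts.
    val lines = Node("lines")
    val words = Node("words", List(Narrow(lines)))
    val pairs = Node("pairs", List(Narrow(words)))
    val counts = Node("counts", List(Wide(pairs)))
    val formatted = Node("formatted", List(Narrow(counts)))
    stages(formatted).zipWithIndex.foreach { case (members, i) =>
      println(s"stage $i: ${members.mkString(" -> ")}")
    }
  }
}
```

Walking backward from the final RDD mirrors the description above: the walk stays in the same stage across narrow dependencies and starts a new stage each time it crosses a wide one.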

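As mentioned above, each stage consists of a number of tasks, and these tasks form a TaskSet. The TaskSet is handed to TaskScheduler for scheduling, and the tasks are finally sent to executors to run, as shown in the figure below.

[Figure: TaskScheduler sending tasks to executors]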
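
The relationship between a stage, its TaskSet, and the executors can be sketched in plain Scala as follows. This is a toy model under stated assumptions: one task per partition, and a simple round-robin assignment that stands in for the real scheduling policy, which also has to take things like data locality and free cores into account. `Task`, `TaskSet`, `taskSetFor`, and `assign` are hypothetical names, not Spark classes.

```scala
object TaskSetModel {
  // Hypothetical model: one task per partition of a stage.
  final case class Task(stageId: Int, partition: Int)
  final case class TaskSet(stageId: Int, tasks: Vector[Task])

  // Builds the TaskSet for a stage with the given number of partitions.
  def taskSetFor(stageId: Int, numPartitions: Int): TaskSet =
    TaskSet(stageId, Vector.tabulate(numPartitions)(p => Task(stageId, p)))

  // Assumption: plain round-robin over the executors, for illustration only.
  def assign(taskSet: TaskSet, executors: Vector[String]): Map[String, Vector[Task]] = {
    require(executors.nonEmpty, "at least one executor is required")
    taskSet.tasks.zipWithIndex
      .groupBy { case (_, i) => executors(i % executors.size) }
      .map { case (executor, indexed) => executor -> indexed.map(_._1) }
  }

  def main(args: Array[String]): Unit = {
    val assignment = assign(taskSetFor(stageId = 1, numPartitions = 4), Vector("exec-0", "exec-1"))
    assignment.toSeq.sortBy(_._1).foreach { case (executor, tasks) =>
      println(s"$executor runs partitions ${tasks.map(_.partition).mkString(", ")}")
    }
  }
}
```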

(3) Job scheduling demo in spark-shell

On sparkmaster, start spark-shell:

```shell
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# ./spark-shell --master spark://sparkmaster:7077 \
  --executor-memory 1g
```

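Open a browser, go to http://sparkmaster:4040/, and click Executors to see all executors in the cluster, as shown below: 
[Figure: Executors page of the web UI] 
The figure shows that sparkmaster not only hosts an executor but is also the driver, that is, this is standalone client mode.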

```scala
// Word count over /README.md. reduceByKey sums the counts per word and
// introduces a shuffle (wide dependency), which splits the job into stages.
val rdd1 = sc.textFile("/README.md")
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey((a, b) => a + b)
// Transformations are lazy: run an action such as rdd1.collect() to submit the job.
```


Click the map entry for stage 1 to see that stage's tasks and how they ran on their executors: 
[Figure: task details for stage 1]

Reposted from: http://blog.csdn.net/lovehuangjiaju/article/details/48634607
