The Way of Spark (Advanced): Spark from Beginner to Master, Section 3: Setting Up a Spark Development Environment in IntelliJ IDEA

Posted by 五柳-先生 on 2015-11-14

Contents of this section

  1. IntelliJ IDEA 14.1.4 development environment setup
  2. Spark application development

1. IntelliJ IDEA 14.1.4 development environment setup

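IntelliJ IDEA is a powerful IDE for developing Java, Scala, and other applications, and it is particularly strong in dependency management and code completion. It can be downloaded from http://www.jetbrains.com/idea/download/ in two editions: Ultimate Edition (free 30-day trial) and Community Edition (free). Ultimate is commercial software that requires a paid license; Community is free and is sufficient for everyday development. The latest version at the time of writing is IntelliJ IDEA 14.1.4. IDEA does not ship with the Scala plugin, so it has to be installed manually. In my own tests, installing the plugin from inside IDEA rarely succeeded (for reasons readers will understand), so I have packaged a copy of IntelliJ IDEA with the Scala plugin already installed. It can be downloaded and used directly for Scala development and, later, Spark application development. Download link: http://pan.baidu.com/s/1sjmS3jJ password: rcsy
Note that the package provided above is the Linux build of IntelliJ IDEA.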

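After the download completes, extract the archive to the /hadoopLearning directory.
Then add IDEA to the environment variables: open the profile with vim /etc/profile and add the IDEA bin directory to PATH, so that idea.sh can be started from any directory.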

Then run:

```shell
root@sparkmaster:/hadoopLearning# idea.sh
```

This starts IntelliJ IDEA (on this machine a Scala project had already been created).

(1) Create a Scala project

Choose File -> New -> Project.
Select Scala, then click Next.
Project SDK specifies the installed JDK, and Scala SDK specifies the installed Scala (here, the Scala SDK bundled with IDEA is used). Name the project SparkWordCount, then click Finish.

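When developing applications in IDEA, source code is usually organized into a conventional directory layout, for example a source directory and a test source directory. The following shows how to create a main/scala source directory under src in IntelliJ IDEA.
Press F4, or right-click the project and choose Open Module Settings, to open the project configuration. Click the src directory, right-click to create the main/scala folders, and then mark the scala folder as Sources.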

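### (2) Import the Spark 1.5.0 dependency
Press F4 to open Project Structure, then select Libraries.
Click + to add an external dependency, choose "Java", and then select spark-assembly-1.5.0-hadoop2.4.0.jar.
After the import succeeds, the jar is listed under Libraries.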

The Spark development environment is now configured.

2. Spark application development

(1) Running the Spark WordCount program in local mode

In the src/main/scala source directory, create a SparkWordCount application object with the following content:

<code class="hljs avrasm has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">import org<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.apache</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.spark</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.SparkContext</span>._
import org<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.apache</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.spark</span>.{SparkConf, SparkContext}

object SparkWordCount{
  def main(args: Array[String]) {
    //輸入檔案既可以是本地linux系統檔案,也可以是其它來原始檔,例如HDFS
    if (args<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.length</span> == <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>) {
      System<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.err</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.println</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Usage: SparkWordCount <inputfile>"</span>)
      System<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.exit</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>)
    }
    //以本地執行緒方式執行,可以指定執行緒個數,
    //如<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.setMaster</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"local[2]"</span>),兩個執行緒執行
    //下面給出的是單執行緒執行
    val conf = new SparkConf()<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.setAppName</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"SparkWordCount"</span>)<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.setMaster</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"local"</span>)
    val sc = new SparkContext(conf)

    //wordcount操作,計算檔案中包含Spark的行數
    val count=sc<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.textFile</span>(args(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>))<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.filter</span>(line => line<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.contains</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Spark"</span>))<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.count</span>()
    //列印結果
     println(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"count="</span>+count)
    sc<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.stop</span>()
  }
}</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li></ul>
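The core of the program is a filter followed by a count. As a rough sketch of the same logic without a Spark dependency, the snippet below applies it to an ordinary Scala collection. The names LineFilterSketch and countLinesContaining and the sample lines are hypothetical, introduced here only for illustration; they are not part of the original program.

```scala
object LineFilterSketch {
  // Hypothetical helper: counts the lines that contain the keyword,
  // mirroring filter(...).count() from SparkWordCount on a local Seq
  def countLinesContaining(lines: Seq[String], keyword: String): Long =
    lines.count(_.contains(keyword)).toLong

  def main(args: Array[String]): Unit = {
    // Made-up sample input standing in for the lines of a text file
    val sample = Seq(
      "Apache Spark is a cluster computing system",
      "It provides high-level APIs",
      "Spark runs on Hadoop, standalone, or locally"
    )
    println("count=" + countLinesContaining(sample, "Spark"))
  }
}
```

The match is case-sensitive, and a line that mentions the keyword several times is still counted once, because the unit being counted is the line rather than the word.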

Compile the code with Build -> Make Project.
Then configure the run arguments under Run -> Edit Configurations:
Main Class: SparkWordCount
Program arguments: /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md

When this is done, run the program with Run -> Run or Alt+Shift+F10. The result is printed to the console in the form count=N.
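The comments in the program mention the master strings local and local[2]. The sketch below illustrates how such strings map to a thread count, following the meaning documented for Spark master URLs: local is one thread, local[K] is K threads, and local[*] is one thread per logical core. MasterStringSketch and localThreads are hypothetical names used for illustration, and this is not Spark's own parsing code.

```scala
object MasterStringSketch {
  // Matches the whole string "local[K]" where K is a run of digits
  private val LocalN = """local\[(\d+)\]""".r

  // Hypothetical helper: the number of worker threads implied by a local
  // master string, or None if the string is not one of the local forms
  def localThreads(master: String): Option[Int] = master match {
    case "local"    => Some(1)
    case "local[*]" => Some(Runtime.getRuntime.availableProcessors())
    case LocalN(n)  => Some(n.toInt)
    case _          => None
  }

  def main(args: Array[String]): Unit = {
    Seq("local", "local[2]", "spark://sparkmaster:7077").foreach { m =>
      println(m + " -> " + localThreads(m))
    }
  }
}
```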

(2) Running the Spark WordCount program on a Spark cluster

Package SparkWordCount as a jar file

Change the program to the following:

<code class="hljs avrasm has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">import org<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.apache</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.spark</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.SparkContext</span>._
import org<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.apache</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.spark</span>.{SparkConf, SparkContext}

object SparkWordCount{
  def main(args: Array[String]) {
    //輸入檔案既可以是本地linux系統檔案,也可以是其它來原始檔,例如HDFS
    if (args<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.length</span> == <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>) {
      System<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.err</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.println</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Usage: SparkWordCount <inputfile> <outputfile>"</span>)
      System<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.exit</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>)
    }
    //提交叢集時,本地執行緒不起作用
    val conf = new SparkConf()<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.setAppName</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"SparkWordCount"</span>)<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.setMaster</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"local"</span>)
    val sc = new SparkContext(conf)

    //rdd2為所有包含Spark的行
    val rdd2=sc<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.textFile</span>(args(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>))<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.filter</span>(line => line<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.contains</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Spark"</span>))
    //儲存內容,在例子中是儲存在HDFS上
    rdd2<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.saveAsTextFile</span>(args(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>))
    sc<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.stop</span>()
  }
}</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li></ul>
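The cluster version reads two arguments, so both need to be present before args(1) is used. One way to make that check explicit is sketched below. ArgsSketch, JobArgs and parse are hypothetical names, and the example uses only the Scala standard library.

```scala
object ArgsSketch {
  // Hypothetical container for the two paths the job needs
  final case class JobArgs(input: String, output: String)

  // Returns Right(JobArgs) when exactly two arguments are given,
  // otherwise Left with the usage message
  def parse(args: Array[String]): Either[String, JobArgs] = args match {
    case Array(input, output) => Right(JobArgs(input, output))
    case _                    => Left("Usage: SparkWordCount <inputfile> <outputfile>")
  }

  def main(args: Array[String]): Unit = parse(args) match {
    case Right(JobArgs(in, out)) =>
      println("input=" + in + ", output=" + out)
    case Left(usage) =>
      System.err.println(usage)
      sys.exit(1)
  }
}
```

Matching on the array shape rejects a missing output path up front, instead of failing later with an ArrayIndexOutOfBoundsException when args(1) is read.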

Select the SparkWordCount project, press F4 to open Project Structure, and select Artifacts.
Choose Jar -> From modules with dependencies.
In the dialog that opens, select SparkWordCount as the Main Class.
Then click OK.

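Because the jar will be submitted to a cluster where the Spark jars are already available, remove spark-assembly-1.5.0-hadoop2.4.0.jar and the other dependency jars from the artifact to reduce its size.
Click OK, then choose Build -> Build Artifacts.
The generated jar file is saved in the /root/IdeaProjects/SparkWordCount/out/artifacts/SparkWordCount_jar directory on sparkmaster.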

Submit the job to the cluster

```shell
./spark-submit --master spark://sparkmaster:7077 --class SparkWordCount --executor-memory 1g /root/IdeaProjects/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar hdfs://ns1/README.md hdfs://ns1/SparkWordCountResult
```
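The spark-submit command has a fixed shape: options first, then the application jar, then the program's own arguments. As a sketch of that ordering, the snippet below assembles the same command from its parts. SubmitCommandSketch, SubmitSpec and toCommand are hypothetical names, and the snippet only builds a list of strings; it does not launch anything.

```scala
object SubmitCommandSketch {
  // Hypothetical description of one submission
  final case class SubmitSpec(
    master: String,
    mainClass: String,
    executorMemory: String,
    appJar: String,
    appArgs: Seq[String]
  )

  // Options come before the jar; everything after the jar is passed to main(args)
  def toCommand(spec: SubmitSpec): Seq[String] =
    Seq(
      "./spark-submit",
      "--master", spec.master,
      "--class", spec.mainClass,
      "--executor-memory", spec.executorMemory,
      spec.appJar
    ) ++ spec.appArgs

  def main(args: Array[String]): Unit = {
    val spec = SubmitSpec(
      master = "spark://sparkmaster:7077",
      mainClass = "SparkWordCount",
      executorMemory = "1g",
      appJar = "/root/IdeaProjects/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar",
      appArgs = Seq("hdfs://ns1/README.md", "hdfs://ns1/SparkWordCountResult")
    )
    println(toCommand(spec).mkString(" "))
  }
}
```

In this layout the two HDFS paths at the end become args(0) and args(1) inside SparkWordCount.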

Once the job has been submitted and has finished running, the SparkWordCountResult output is present on HDFS.

The following commands list the output directory and print its contents:

```shell
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# hadoop dfs -ls /SparkWordCountResult
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# hadoop dfs -cat /SparkWordCountResult/part-00000
```

Reposted from: http://blog.csdn.net/lovehuangjiaju/article/details/48577281
