Configuring a Spark Environment on Windows with IntelliJ IDEA and Maven, Running the WordCount Example, and Problems Encountered

Published 2017-12-28

I. Install the Java JDK, Maven, and Scala

These are all straightforward to install: download the latest installer for each from its official website and install them one by one.

Scala download page: http://www.scala-lang.org/download/

II. Install the IDEA Scala Plugin

In Settings --> Plugins, click Browse repositories... at the bottom, search for scala, and click Install.
However, installing the plugin online is not recommended. Instead, use your IDEA build's Updated date to download the matching Scala plugin offline. For example, the IDEA build in this article is dated 2014/12/18, and the corresponding plugin version is 1.2.1. Offline download page for the Scala plugin: https://plugins.jetbrains.com/plugin/1347-scala
Match the plugin to your IntelliJ IDEA build by its Updated date; different IDEA versions require different Scala plugin versions, and a mismatched plugin will not be recognized.
Once downloaded, add the offline Scala plugin to IDEA as follows: click Install plugin from disk..., browse to the plugin's zip file on your local disk, and click OK.


III. Setting Up the Spark Environment in IntelliJ IDEA via Maven

  • (1) Open IDEA and create a new Maven project.

Select File --> New Project --> Maven, check Create from archetype, and choose the scala-archetype-simple archetype (a wrong version may cause problems; solutions are given later in this article).

  • (2) Add the Scala SDK. Go to File --> Project Structure --> Global Libraries and choose the matching Scala version. The versions absolutely must correspond, or the code will fail to compile.

IV. Solutions to Problems Encountered

1. IDEA reports an error at compile time: Error:scalac: error while loading JUnit4, Scala signature

Error:scalac: error while loading JUnit4, Scala signature
JUnit4 has wrong version
expected: 5.0
found: 4.1 in JUnit4.class

This error occurred because I used the Maven-provided template directly without paying attention to the archetype version. Versions must correspond. The fix is to click Add Archetype and register a new scala-archetype-simple with the GAV groupId: net.alchim31.maven, artifactId: scala-archetype-simple, version: 1.4. If problems remain, look for a suitable version in the Maven repository: http://mvnrepository.com

2. Scala error: bad option '-make:transitive'

scala error: bad option '-make:transitive'

This is a Scala version issue: Scala 2.11 no longer supports the -make option. Removing this argument from pom.xml solves it:

<configuration>
    <args>
        <!--arg>-make:transitive</arg-->
        <arg>-dependencyfile</arg>
        <arg>${project.build.directory}/.scala_dependencies</arg>
    </args>
</configuration>

3.ShouldMatchers is not a member of package org.scalatest

If you hit these errors at compile time, the jars are not loaded properly. Go to the Maven repository and find jar versions that match your Scala version; note that the _2.11 / _2.12 suffix on each artifactId is the Scala binary version and must agree with your project's scala.version. For reference:

<properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.11</scala.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <!-- Test -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <!--groupId>org.specs2</groupId -->
      <!--artifactId>specs2_${scala.version}</artifactId -->
      <!--version>1.12.3</version -->
      <!--scope>test</scope -->
      <groupId>org.specs2</groupId>
      <artifactId>specs2-core_2.12</artifactId>
      <version>4.0.2</version>
      <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.specs2/specs2-junit -->
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-junit_2.12</artifactId>
      <version>3.8.9</version>
      <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.specs2/specs2 -->
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2_2.11</artifactId>
      <version>3.3.1</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <!--groupId>org.scalatest</groupId-->
      <!--artifactId>scalatest_${scala.version}</artifactId-->
      <!--version>2.0.M6-SNAP3</version-->
      <!--scope>test</scope-->
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_2.11</artifactId>
      <version>3.0.1-SNAP1</version>
      <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.scalatest/scalatest-funspec -->
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest-funspec_2.11</artifactId>
      <version>3.0.0-SNAP13</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.scalatest/scalatest -->
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_2.12</artifactId>
      <version>3.2.0-SNAP7</version>
      <scope>test</scope>
    </dependency>

  </dependencies>
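In the ScalaTest 3.x artifacts above, the old ShouldMatchers trait no longer exists; its replacement is org.scalatest.Matchers, so older specs need their mixin updated. Below is a minimal sketch of a spec that compiles against scalatest 3.0.x; the class name and assertion are illustrative, not from the original project.

import org.scalatest.{FlatSpec, Matchers}

// Illustrative test: checks the whitespace tokenization that the
// WordCount example later in this article relies on.
class TokenizeSpec extends FlatSpec with Matchers {
  "split on whitespace" should "break a line into words" in {
    "chen chenxun sb".split("\\s+") should have length 3
  }
}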

4. Failed to execute goal org.apache.maven.plugins:maven-archetype-plugin:3.0.1:generate: The desired archetype does not exist (net.alchim31.maven::scala-archetype-simple:1.6)

Failed to execute goal org.apache.maven.plugins:maven-archetype-plugin:3.0.1:generate (default-cli) on project standalone-pom: The desired archetype does not exist (net.alchim31.maven::scala-archetype-simple:1.6 ) -> [Help 1]
  • JAVA_HOME is not set or is set incorrectly, or MAVEN_HOME is set incorrectly

  • Delete maven-archetype-plugin under /org/apache/maven/plugins/ in the local repository

  • maven-archetype-plugin version 2.3 is buggy; create the project with a different version (this approach works)

Update the maven-archetype-plugin version:
This error comes from a wrong maven-archetype-plugin / archetype version. The fix is the same as for problem 1 above: when choosing the archetype during Maven project creation, click Add Archetype and register a new scala-archetype-simple with the GAV groupId: net.alchim31.maven, artifactId: scala-archetype-simple, version: 1.4. If problems remain, find a suitable version in the Maven repository: http://mvnrepository.com

5. Spark saveAsTextFile fails when writing output

ERROR Shell: Failed to locate the winutils binary in the hadoop binary path

Reference for the fix: http://blog.csdn.net/kimyoungvon/article/details/51308651

Download winutils.exe from: https://github.com/srccodes/hadoop-common-2.2.0-bin

My solution: download winutils.exe and hadoop.dll, put hadoop.dll into C:\Windows\System32 and winutils.exe into the C:\hadoop_home\bin folder, then add the following line to your code (alternatively, set the HADOOP_HOME environment variable to C:\hadoop_home; either way it must take effect before Spark first touches the Hadoop APIs):

System.setProperty ("hadoop.home.dir", "C:\\hadoop_home\\")

V. Running the WordCount Example. I am using Scala 2.11 and Spark 2.1.0; below is my pom.xml for reference:

<properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.version>2.11.11</scala.version>
    <spark.version>2.1.0</spark.version>
    <hadoop.version>2.7.0</hadoop.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <!--spark-->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <!--artifactId>spark-core_${scala.version}</artifactId-->
      <!--version>${spark.version}</version-->
      <artifactId>spark-core_2.11</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>2.1.0</version>
    </dependency>
    ..............
    ..............

WordCount code:

package chen

import org.apache.spark._


object WordCount {

  // Windows workaround for problem 5 above: point Hadoop at the directory
  // that contains bin\winutils.exe before Spark touches the Hadoop APIs.
  System.setProperty("hadoop.home.dir", "C:\\hadoop_home\\")

  def main(args: Array[String]) {
    // Defaults for running inside IDEA; overridable from the command line.
    var masterUrl = "local[1]"
    var inputPath = "D:\\spark_data\\data.txt"
    var outputPath = "D:\\spark_data\\output"

    if (args.length == 1) {
      masterUrl = args(0)
    } else if (args.length == 3) {
      masterUrl = args(0)
      inputPath = args(1)
      outputPath = args(2)
    }

    println(s"masterUrl:${masterUrl}, inputPath: ${inputPath}, outputPath: ${outputPath}")

    val sparkConf = new SparkConf().setMaster(masterUrl).setAppName("WordCount")
    val sc = new SparkContext(sparkConf)

    // Split each line on whitespace, map every word to (word, 1),
    // then sum the counts per word.
    val rowRdd = sc.textFile(inputPath)
    val resultRdd = rowRdd.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1)).reduceByKey(_ + _)

    resultRdd.saveAsTextFile(outputPath)
    sc.stop()
  }
}
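Run WordCount from IDEA with no program arguments to use the defaults above, or pass either a single argument (the master URL) or three (master URL, input path, output path), matching the argument handling in main.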

data.txt:

chen chenxun sb
chenxun
chenliiii
jjjjjjjjjj
ddd
jjjjjjjjjj
jjjjjjjjjj jjjjjjjjj
chenxun chen chenxun chen chen chen

In the output directory you will see these files:

part-00000
_SUCCESS
.part-00000.crc
._SUCCESS.crc

Open part-00000 to see Spark's computed result:

(chenxun,4)
(sb,1)
(ddd,1)
(jjjjjjjjjj,3)
(jjjjjjjjj,1)
(chen,5)
(chenliiii,1)
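To check the result programmatically instead of opening part-00000 by hand, here is a small sketch (PrintResult is an illustrative object name; it assumes the same paths as the WordCount example):

import org.apache.spark.{SparkConf, SparkContext}

object PrintResult {
  def main(args: Array[String]): Unit = {
    // Same Windows workaround as in WordCount.
    System.setProperty("hadoop.home.dir", "C:\\hadoop_home\\")
    val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("PrintResult"))
    // saveAsTextFile produced one file per partition; textFile reads the whole directory back.
    sc.textFile("D:\\spark_data\\output").collect().foreach(println)
    sc.stop()
  }
}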
