Building a WordCount JAR in IDEA, Submitting It to Spark, and Troubleshooting

Posted by nanlulululu on 2019-05-16

Based on:

macOS

Spark 2.4.3

(Spark running in standalone mode; reference blog: http://blog.itpub.net/69908925/viewspace-2644303/ )

Scala 2.12.8 (installed locally)

IDEA 2019


1  IDEA - File - Project Structure - Libraries - Scala SDK



Select version 2.11.12


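The version selected here must match the Scala version that Spark itself runs on. The default is the locally installed Scala 2.12.8, and a JAR built with it fails with a main-class error when run on Spark.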


2 Create a new project and add the dependencies to pom.xml


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ny.service</groupId>
    <artifactId>scala517</artifactId>
    <version>1.0</version>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
        <!-- Change the dependencies below to match your own Scala, Spark, and Hadoop versions -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
    </dependencies>
    <build>
        <!-- Main source directory; change it to your own path. If you have test files, also add a testSourceDirectory -->
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <!--<transformers>-->
                            <!--<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">-->
                            <!--<mainClass></mainClass>-->
                            <!--</transformer>-->
                            <!--</transformers>-->
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <useUniqueVersions>false</useUniqueVersions>
                            <classpathPrefix>lib/</classpathPrefix>
                            <!-- Change to your own package.ClassName; right-click the class -> Copy Reference -->
                            <mainClass>com.ny.service.WordCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>



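For scala-library, choose the Scala version bundled with Spark, 2.11.12, which is also the most recent version this Spark build supports.

For the org.apache.spark dependency, choose the 2.11 build as well (spark-core_2.11).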


Otherwise the main class fails with an error like this:

19/05/16 10:52:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:60010 (size: 22.9 KB, free: 366.3 MB)

19/05/16 10:52:03 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:18

Exception in thread "main" java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp

at com.nyc.WordCount$.main(WordCount.scala:24)

at com.nyc.WordCount.main(WordCount.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)



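How to check the Scala version used by Spark

Go to Spark's jars directory and look at the version in the scala-library jar file name: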

/usr/local/opt/spark-2.4.3/jars
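
The check above can also be scripted. The following is a minimal sketch, assuming the jars directory contains a file named like scala-library-2.11.12.jar; the object and method names (SparkScalaVersion, versionFromJarName, binaryVersion) are made up for this illustration and are not part of Spark.

```scala
import java.io.File

object SparkScalaVersion {
  // Matches file names such as scala-library-2.11.12.jar and captures the version.
  private val ScalaLibraryJar = """scala-library-(\d+\.\d+\.\d+)\.jar""".r

  /** Extracts the full Scala version from a jar file name, if it is the scala-library jar. */
  def versionFromJarName(name: String): Option[String] = name match {
    case ScalaLibraryJar(version) => Some(version)
    case _                        => None
  }

  /** Reduces a full version such as 2.11.12 to the binary version 2.11 used in artifact suffixes. */
  def binaryVersion(full: String): String = full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Assumption: the jars directory from this post; pass another path as the first argument.
    val jarsDir = new File(args.headOption.getOrElse("/usr/local/opt/spark-2.4.3/jars"))
    // File.list() returns null when the directory does not exist, so wrap it in Option.
    val names = Option(jarsDir.list()).map(_.toSeq).getOrElse(Seq.empty)
    names.flatMap(n => versionFromJarName(n).toList).headOption match {
      case Some(v) => println(s"Spark bundles Scala $v; use artifacts ending in _${binaryVersion(v)}")
      case None    => println(s"No scala-library jar found in $jarsDir")
    }
  }
}
```

The binary version (2.11 here) is the suffix that has to appear in the Spark artifactId in pom.xml, and the full version is the one to select for the Scala SDK in IDEA.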



3 WordCount test program


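package com.ny.service

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // 1 Create the configuration
    val conf = new SparkConf().setAppName("wc")
    // 2 Create the SparkContext sc
    val sc = new SparkContext(conf)
    // 3 Processing logic
    // Read the input file (path passed as the first argument)
    val lines = sc.textFile(args(0))
    // Split each line into words and flatten the result
    val words = lines.flatMap(_.split(" "))
    // Map each word to a (word, 1) pair
    val k2v = words.map((_, 1))
    // Sum the counts for each word
    val results = k2v.reduceByKey(_ + _)
    // Save the results (output path passed as the second argument)
    results.saveAsTextFile(args(1))
    // 4 Close the connection
    sc.stop()
  }
}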
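
To see what the pipeline computes without a cluster, the same steps can be tried on an ordinary Scala collection. This is only a sketch: LocalWordCount and its sample input are hypothetical, and groupBy followed by a sum stands in for Spark's reduceByKey, which does not exist on local collections.

```scala
object LocalWordCount {
  /** Counts words with the same flatMap -> map -> sum-per-key steps, on a local Seq. */
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split each line on single spaces, as in the Spark job
      .map((_, 1))             // pair every word with 1
      .groupBy(_._1)           // group the pairs by word (local stand-in for reduceByKey)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, not the test file used in this post.
    val sample = Seq("hello scala", "hello spark")
    count(sample).foreach(println)
  }
}
```

Because the split is on a single space, two consecutive spaces in a line produce an empty string that is counted as a word; splitting on "\\s+" instead would avoid that if the input is not cleanly formatted.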

4 Package

    


Move the JAR into the Spark home directory. Spark runs in standalone mode here, so no Hadoop cluster is started.

nancylulululu:spark-2.4.3 nancy$ mv /Users/nancy/IdeaProjects/scala517/target/original-scala517-1.0.jar wc.jar 


5 Run with spark-submit


bin/spark-submit \
--class com.ny.service.WordCount \
--master spark://localhost:7077 \
./wc.jar \
file:///usr/local/opt/spark-2.4.3/test/1test \
file:///usr/local/opt/spark-2.4.3/test/out


If the files are in Hadoop, replace the file:// paths with HDFS paths.
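
The part of the path that changes is the URI scheme. As a small illustration (a sketch only; PathScheme is a made-up helper, and the HDFS host, port, and path below are placeholders rather than values from this setup), the scheme of an argument can be inspected with java.net.URI:

```scala
import java.net.URI

object PathScheme {
  /** Returns the URI scheme of a path argument, or None when the path has no scheme. */
  def schemeOf(path: String): Option[String] = Option(new URI(path).getScheme)

  def main(args: Array[String]): Unit = {
    // The local input path used in this post.
    println(schemeOf("file:///usr/local/opt/spark-2.4.3/test/1test"))
    // Hypothetical HDFS location; the host, port, and path are placeholders.
    println(schemeOf("hdfs://namenode:9000/user/nancy/1test"))
  }
}
```

Note that java.net.URI rejects paths containing spaces with a URISyntaxException, so this helper only suits simple paths like the ones shown.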


Check the output files:

nancylulululu:out nancy$ ls
_SUCCESS    part-00000    part-00001
nancylulululu:out nancy$ cat part-00000
(scala,2)
(hive,1)
(mysql,1)
(hello,5)
(java,2)



From "ITPUB Blog", link: http://blog.itpub.net/69908925/viewspace-2644643/ . If you repost this article, please cite the source; otherwise legal action may be pursued.
