Comparing Java, Python, and Scala (Part 3): WordCount
As everyone knows, WordCount holds the same place in big data that "Hello, World" holds in programming languages. This article does not analyze how WordCount is computed; it simply presents the code, with the goal of comparing Java, Python, and Scala on Spark.
Clearly, the Java version is the most verbose, the Python version is simple and easy to follow, and Scala, as Spark's native language, is the most concise.
Complete Java code:
import java.util.Arrays;
import java.util.Iterator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;
public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("wc");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // read a text file
        JavaRDD<String> text = sc.textFile("/home/vagrant/speech.txt");
        // split each line on spaces and flatten into individual words
        JavaRDD<String> words = text.flatMap(new FlatMapFunction<String, String>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Iterator<String> call(String line) throws Exception {
                return Arrays.asList(line.split(" ")).iterator();
            }
        });
        // word => (word, 1)
        JavaPairRDD<String, Integer> counts = words.mapToPair(
            new PairFunction<String, String, Integer>() {
                @Override
                public Tuple2<String, Integer> call(String s) throws Exception {
                    return new Tuple2<>(s, 1);
                }
            }
        );
        // reduceByKey: sum the counts for each word
        JavaPairRDD<String, Integer> results = counts.reduceByKey(
            new Function2<Integer, Integer, Integer>() {
                @Override
                public Integer call(Integer v1, Integer v2) throws Exception {
                    return v1 + v2;
                }
            }
        );
        // print each (word:count) pair
        results.foreach(new VoidFunction<Tuple2<String, Integer>>() {
            @Override
            public void call(Tuple2<String, Integer> t) throws Exception {
                System.out.println("(" + t._1() + ":" + t._2() + ")");
            }
        });
        sc.stop();
    }
}
Complete PySpark code:
# Imports the PySpark libraries
from pyspark import SparkConf, SparkContext
# Configure the Spark context to give a name to the application
sparkConf = SparkConf().setAppName("MyWordCounts")
sc = SparkContext(conf=sparkConf)
# The text file containing the words to count
textFile = sc.textFile("/home/vagrant/speech.txt")
# The code for counting the words (note that the execution mode is lazy)
# Uses the same paradigm Map and Reduce of Hadoop, but fully in memory
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
# Executes the DAG (Directed Acyclic Graph) for counting and collecting the result
for wc in wordCounts.collect():
    print(wc)
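For readers unfamiliar with the three RDD operations chained above, here is a minimal, driver-only sketch of what that pipeline computes, written in plain Python with no Spark and no lazy evaluation. The sample lines are illustrative, not the contents of speech.txt.

```python
from collections import defaultdict

# Two sample lines standing in for the input file (an assumption for illustration)
lines = ["to be or not to be", "to be is to do"]

# flatMap: split every line into words and flatten into a single list
words = [word for line in lines for word in line.split()]

# map: pair each word with an initial count of 1
pairs = [(word, 1) for word in words]

# reduceByKey: sum the counts for each distinct word
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))
```

The real Spark job does the same three steps, but distributes the pairs across partitions and only materializes results when an action such as `collect()` or `foreach()` runs.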
Complete Scala code:
import org.apache.spark.{SparkConf, SparkContext}
object WordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local").setAppName("MyWordCounts")
    val sc = new SparkContext(sparkConf)
    sc.textFile("/home/vagrant/speech.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).foreach(println)
    sc.stop()
  }
}
That concludes this post. Comments and discussion are welcome!