A Summary of Aggregation Functions in Spark

Posted by 通凡 on 2018-09-13

Original URL: https://blog.csdn.net/wangxiaotongfan/article/details/82693610

Functions in PairRDDFunctions:

def aggregateByKey[U](zeroValue: U)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): RDD[(K, U)]

Aggregate the values of each key, using given combine functions and a neutral “zero value”. This function can return a different result type, U, than the type of the values in this RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U’s, as in scala.TraversableOnce. The former operation is used for merging values within a partition, and the latter is used for merging values between partitions. To avoid memory allocation, both of these functions are allowed to modify and return their first argument instead of creating a new U.

def aggregateByKey[U](zeroValue: U, numPartitions: Int)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): RDD[(K, U)]

def aggregateByKey[U](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): RDD[(K, U)]
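As a minimal sketch of how the zero value, seqOp, and combOp fit together, the example below computes a per-key average by aggregating Int values (V) into a (sum, count) pair (U). The object name `AggregateByKeyDemo`, the sample data, and the `local[2]` master are assumptions made for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AggregateByKeyDemo {
  // Per-key average: V is Int, U is a (sum, count) pair.
  def averages(sc: SparkContext): Map[String, Double] = {
    val scores = sc.parallelize(
      Seq(("a", 1), ("a", 3), ("b", 10), ("a", 5), ("b", 20)), numSlices = 2)

    val sumCount = scores.aggregateByKey((0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1), // seqOp: merge one V into a U within a partition
      (x, y) => (x._1 + y._1, x._2 + y._2)  // combOp: merge two U's from different partitions
    )

    sumCount.mapValues { case (sum, n) => sum.toDouble / n }.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for this sketch; a real job would get it from spark-submit.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("aggregateByKey-demo"))
    try println(averages(sc)) finally sc.stop()
  }
}
```

The other two overloads take the same seqOp and combOp; they only add control over the number of partitions or the Partitioner of the result.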

def reduceByKey(func: (V, V) ⇒ V): RDD[(K, V)]

Merge the values for each key using an associative and commutative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a “combiner” in MapReduce. Output will be hash-partitioned with the existing partitioner/parallelism level.
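A word count is probably the simplest illustration of reduceByKey. The sketch below assumes a small in-memory dataset and a `local[2]` master; `ReduceByKeyDemo` and `wordCounts` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReduceByKeyDemo {
  // Counts occurrences of each word; addition is associative and commutative,
  // so partial sums can be computed on each partition before the shuffle.
  def wordCounts(sc: SparkContext): Map[String, Int] = {
    val words = sc.parallelize(
      Seq("spark", "rdd", "spark", "key", "rdd", "spark"), numSlices = 2)
    words.map(w => (w, 1)).reduceByKey(_ + _).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("reduceByKey-demo"))
    try println(wordCounts(sc)) finally sc.stop()
  }
}
```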

def reduceByKeyLocally(func: (V, V) ⇒ V): Map[K, V]

Merge the values for each key using an associative and commutative reduce function, but return the results immediately to the master as a Map. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a “combiner” in MapReduce.
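Unlike reduceByKey, this method is an action: it hands back a plain Scala Map on the driver rather than an RDD, so it only suits results with few enough distinct keys to fit in driver memory. A minimal sketch, with hypothetical names and sample data:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReduceByKeyLocallyDemo {
  // Returns the maximum value per key directly as a Map on the driver.
  def maxPerKey(sc: SparkContext): scala.collection.Map[String, Int] = {
    val readings = sc.parallelize(
      Seq(("x", 4), ("y", 7), ("x", 9), ("y", 2)), numSlices = 2)
    readings.reduceByKeyLocally((a, b) => math.max(a, b))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("reduceByKeyLocally-demo"))
    try println(maxPerKey(sc)) finally sc.stop()
  }
}
```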

def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]

Group the values for each key in the RDD into a single sequence. Allows controlling the partitioning of the resulting key-value pair RDD by passing a Partitioner. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting RDD is evaluated.

Note

As currently implemented, groupByKey must be able to hold all the key-value pairs for any key in memory. If a key has too many values, it can result in an OutOfMemoryError. This operation may be very expensive. If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using PairRDDFunctions.aggregateByKey or PairRDDFunctions.reduceByKey will provide much better performance.
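To make the note concrete, the sketch below computes the same per-key sum two ways: first by grouping and then summing each group, then with reduceByKey. Both give the same answer on this toy dataset. The difference is that groupByKey shuffles every value, while reduceByKey combines values on each partition first. The names, the data, and the choice of `HashPartitioner(2)` are assumptions for illustration.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object GroupByKeyDemo {
  // Returns (sums via groupByKey, sums via reduceByKey).
  def sums(sc: SparkContext): (Map[String, Int], Map[String, Int]) = {
    val pairs = sc.parallelize(
      Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)), numSlices = 2)

    // Groups all values per key into an Iterable, then sums each group.
    val viaGroup = pairs
      .groupByKey(new HashPartitioner(2))
      .mapValues(_.sum)
      .collect().toMap

    // Sums within each partition first, then merges the partial sums.
    val viaReduce = pairs.reduceByKey(_ + _).collect().toMap

    (viaGroup, viaReduce)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("groupByKey-demo"))
    try println(sums(sc)) finally sc.stop()
  }
}
```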

def combineByKey[C](createCombiner: (V) ⇒ C, mergeValue: (C, V) ⇒ C, mergeCombiners: (C, C) ⇒ C): RDD[(K, C)]

Simplified version of combineByKeyWithClassTag that hash-partitions the resulting RDD using the existing partitioner/parallelism level. This method is here for backward compatibility. It does not provide combiner classtag information to the shuffle.
See also: combineByKeyWithClassTag
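The three functions can be illustrated with a per-key (min, max) computation, where the combiner type C is a pair of Ints. This is only a sketch: the names and data are hypothetical, and the lambda parameter types are written out explicitly so the example does not rely on type inference for C.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CombineByKeyDemo {
  // Tracks (min, max) per key; V is Int, C is (Int, Int).
  def minMax(sc: SparkContext): Map[String, (Int, Int)] = {
    val pairs = sc.parallelize(
      Seq(("a", 3), ("b", 8), ("a", 7), ("a", 1), ("b", 2)), numSlices = 2)

    pairs.combineByKey[(Int, Int)](
      // createCombiner: first value seen for a key in a partition
      (v: Int) => (v, v),
      // mergeValue: fold another value into the partition-local combiner
      (c: (Int, Int), v: Int) => (math.min(c._1, v), math.max(c._2, v)),
      // mergeCombiners: merge combiners built on different partitions
      (x: (Int, Int), y: (Int, Int)) => (math.min(x._1, y._1), math.max(x._2, y._2))
    ).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("combineByKey-demo"))
    try println(minMax(sc)) finally sc.stop()
  }
}
```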

def combineByKeyWithClassTag[C](createCombiner: (V) ⇒ C, mergeValue: (C, V) ⇒ C, mergeCombiners: (C, C) ⇒ C)(implicit ct: ClassTag[C]): RDD[(K, C)]

Simplified version of the general combineByKeyWithClassTag overload (the one that also takes a Partitioner) that hash-partitions the resulting RDD using the existing partitioner/parallelism level.

Annotations: @Experimental()
