Summary of Aggregate Functions in Spark
Functions in PairRDDFunctions:
def aggregateByKey[U](zeroValue: U)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): RDD[(K, U)]
Aggregate the values of each key, using given combine functions and a neutral “zero value”. This function can return a different result type, U, than the type of the values in this RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U’s, as in scala.TraversableOnce. The former operation is used for merging values within a partition, and the latter is used for merging values between partitions. To avoid memory allocation, both of these functions are allowed to modify and return their first argument instead of creating a new U.
def aggregateByKey[U](zeroValue: U, numPartitions: Int)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): RDD[(K, U)]
def aggregateByKey[U](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): RDD[(K, U)]
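As a minimal sketch of how seqOp and combOp divide the work, the following computes a per-key average by aggregating into a (sum, count) pair, so the result type U differs from the value type V. The local-mode SparkContext, the app name, and the sample data are assumptions made for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local-mode context for illustration; app name and master are assumptions.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("aggregateByKey-sketch").setMaster("local[2]"))

// Hypothetical sample data: (key, score) pairs spread over two partitions.
val scores = sc.parallelize(
  Seq(("a", 1), ("a", 3), ("b", 4), ("a", 5), ("b", 6)), numSlices = 2)

// U = (sum, count), while the values V are plain Ints.
val sumCount = scores.aggregateByKey((0, 0))(
  (acc, v) => (acc._1 + v, acc._2 + 1), // seqOp: merge one V into a U within a partition
  (x, y) => (x._1 + y._1, x._2 + y._2)  // combOp: merge two U's from different partitions
)

// Turn each (sum, count) into an average and bring the result to the driver.
val averages: Map[String, Double] =
  sumCount.mapValues { case (sum, n) => sum.toDouble / n }.collect().toMap
```

The zero value (0, 0) may be applied once per key in every partition where that key occurs, which is why it has to be neutral for combOp.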
def reduceByKey(func: (V, V) ⇒ V): RDD[(K, V)]
Merge the values for each key using an associative and commutative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a “combiner” in MapReduce. Output will be hash-partitioned with the existing partitioner/parallelism level.
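A word count is the usual illustration of reduceByKey. The sketch below assumes a local-mode SparkContext and a small hypothetical word list. Addition is associative and commutative, so it satisfies the requirement stated above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local-mode context for illustration; app name and master are assumptions.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("reduceByKey-sketch").setMaster("local[2]"))

// Hypothetical input words.
val words = sc.parallelize(Seq("spark", "rdd", "spark", "key", "rdd", "spark"))

// Pair each word with 1, then sum the 1s per key.
// Partial sums are computed inside each partition before the shuffle.
val counts: Map[String, Int] =
  words.map(w => (w, 1)).reduceByKey(_ + _).collect().toMap
```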
def reduceByKeyLocally(func: (V, V) ⇒ V): Map[K, V]
Merge the values for each key using an associative and commutative reduce function, but return the results immediately to the master as a Map. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a “combiner” in MapReduce.
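The difference from reduceByKey is the return type: the result is a Map on the driver rather than an RDD, so no collect() is needed. A minimal sketch, assuming a local-mode SparkContext and hypothetical sensor readings:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local-mode context for illustration; app name and master are assumptions.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("reduceByKeyLocally-sketch").setMaster("local[2]"))

// Hypothetical (sensor, reading) pairs.
val readings = sc.parallelize(Seq(("x", 3), ("y", 10), ("x", 7), ("y", 2)))

// Keep the maximum reading per key; the Map is materialized on the driver.
val maxPerKey: scala.collection.Map[String, Int] =
  readings.reduceByKeyLocally((a, b) => math.max(a, b))
```

Because the whole result is held in driver memory, this is only reasonable when the number of distinct keys is small.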
def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]
Group the values for each key in the RDD into a single sequence. Allows controlling the partitioning of the resulting key-value pair RDD by passing a Partitioner. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting RDD is evaluated.
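A minimal sketch of groupByKey with an explicit Partitioner, assuming a local-mode SparkContext, hypothetical sample pairs, and an arbitrary choice of four output partitions. Since the order inside each group is not guaranteed, the values are sorted before being inspected.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Local-mode context for illustration; app name and master are assumptions.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("groupByKey-sketch").setMaster("local[2]"))

// Hypothetical sample pairs.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// The partitioner controls how the grouped RDD is partitioned (4 is arbitrary here).
val grouped = pairs.groupByKey(new HashPartitioner(4))

// Element order within a group is not guaranteed, so sort before comparing.
val asMap: Map[String, List[Int]] =
  grouped.mapValues(_.toList.sorted).collect().toMap
```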
Note: As currently implemented, groupByKey must be able to hold all the key-value pairs for any key in memory. If a key has too many values, it can result in an OutOfMemoryError. This operation may be very expensive. If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using PairRDDFunctions.aggregateByKey or PairRDDFunctions.reduceByKey will provide much better performance.
def combineByKey[C](createCombiner: (V) ⇒ C, mergeValue: (C, V) ⇒ C, mergeCombiners: (C, C) ⇒ C): RDD[(K, C)]
Simplified version of combineByKeyWithClassTag that hash-partitions the resulting RDD using the existing partitioner/parallelism level. This method is here for backward compatibility. It does not provide combiner classtag information to the shuffle.
See also: combineByKeyWithClassTag
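To illustrate the three functions combineByKey takes, the sketch below tracks a (min, max) pair per key. Unlike aggregateByKey there is no zero value: the first value seen for a key in a partition is turned into a combiner by createCombiner. The local-mode SparkContext and the sample data are assumptions. The lambda parameter types are written out explicitly because combineByKey is overloaded.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local-mode context for illustration; app name and master are assumptions.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("combineByKey-sketch").setMaster("local[2]"))

// Hypothetical (key, value) pairs spread over two partitions.
val temps = sc.parallelize(
  Seq(("x", 3), ("y", 10), ("x", 7), ("y", 2), ("x", 5)), numSlices = 2)

// C = (min, max)
val ranges: Map[String, (Int, Int)] = temps.combineByKey(
  (v: Int) => (v, v),                                      // createCombiner: first value for a key
  (c: (Int, Int), v: Int) =>
    (math.min(c._1, v), math.max(c._2, v)),                // mergeValue: fold a V into a C
  (c1: (Int, Int), c2: (Int, Int)) =>
    (math.min(c1._1, c2._1), math.max(c1._2, c2._2))       // mergeCombiners: merge two C's
).collect().toMap
```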
def combineByKeyWithClassTag[C](createCombiner: (V) ⇒ C, mergeValue: (C, V) ⇒ C, mergeCombiners: (C, C) ⇒ C)(implicit ct: ClassTag[C]): RDD[(K, C)]
Simplified version of combineByKeyWithClassTag that hash-partitions the resulting RDD using the existing partitioner/parallelism level.
Annotations: @Experimental()