Spark MapPartitionsRDD

`MapPartitionsRDD` is the RDD produced by narrow transformations such as `map`, `flatMap`, `filter`, and `mapPartitions`: it applies a function `f` to each parent partition's iterator.
```scala
private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](
    var prev: RDD[T],
    f: (TaskContext, Int, Iterator[T]) => Iterator[U], // (TaskContext, partition index, iterator)
    preservesPartitioning: Boolean = false,
    isOrderSensitive: Boolean = false)
  extends RDD[U](prev) {

  // Keep the parent's partitioner only when f does not change the keys.
  override val partitioner = if (preservesPartitioning) firstParent[T].partitioner else None

  // One-to-one (narrow) dependency: reuse the parent's partitions as-is.
  override def getPartitions: Array[Partition] = firstParent[T].partitions

  // Lazily apply f to the parent partition's iterator.
  override def compute(split: Partition, context: TaskContext): Iterator[U] =
    f(context, split.index, firstParent[T].iterator(split, context))

  // Drop the reference to the parent so it can be garbage-collected
  // once checkpointing has truncated the lineage.
  override def clearDependencies(): Unit = {
    super.clearDependencies()
    prev = null
  }

  // If f is order-sensitive and the parent's output order is nondeterministic,
  // the output data itself (not just its order) may differ between reruns.
  override protected def getOutputDeterministicLevel = {
    if (isOrderSensitive && prev.outputDeterministicLevel == DeterministicLevel.UNORDERED) {
      DeterministicLevel.INDETERMINATE
    } else {
      super.getOutputDeterministicLevel
    }
  }
}
```
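For context, this is roughly how a transformation constructs a `MapPartitionsRDD` — a simplified sketch paraphrased from `RDD.map` in Spark's `RDD.scala` (exact code varies by version): the per-element function is lifted into a per-partition `Iterator => Iterator` function.

```scala
// Sketch of RDD.map (simplified from Spark's RDD.scala; details vary by version).
def map[U: ClassTag](f: T => U): RDD[U] = withScope {
  val cleanF = sc.clean(f) // clean the closure so it is serializable for tasks
  // TaskContext and partition index are ignored here; the element-level
  // function f is applied lazily to each partition's iterator.
  new MapPartitionsRDD[U, T](this, (_, _, iter) => iter.map(cleanF))
}
```

`mapPartitions`, by contrast, hands the caller the partition iterator directly and passes `preservesPartitioning` through, so key-preserving functions can keep the parent's partitioner.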