SparkContext
SparkContext is the main entry point of a Spark application. Through it the application connects to a Spark cluster and creates RDDs, accumulators, and broadcast variables on that cluster. ==Only one SparkContext may be active per JVM; the active SparkContext must be stopped before a new one is created==.
```scala
/**
 * Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
 * cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
 *
 * Only one SparkContext may be active per JVM. You must `stop()` the active SparkContext before
 * creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.
 *
 * @param config a Spark Config object describing the application configuration. Any settings in
 *   this config overrides the default configs as well as system properties.
 */
class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {
```
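A minimal sketch of that lifecycle, assuming Spark 1.x APIs; the app name, master URL, and data are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkContextExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("context-demo")  // placeholder app name
      .setMaster("local[2]")       // run locally with two threads

    val sc = new SparkContext(conf)

    // The three cluster-side handles mentioned in the scaladoc:
    val rdd    = sc.parallelize(1 to 100)       // an RDD
    val acc    = sc.accumulator(0, "seen")      // an accumulator (Spark 1.x API)
    val lookup = sc.broadcast(Map(1 -> "one"))  // a broadcast variable

    rdd.foreach { x => if (lookup.value.contains(x)) acc += 1 }
    println(s"elements found in broadcast map: ${acc.value}")

    // Only one SparkContext may be active per JVM: stop this one
    // before constructing another.
    sc.stop()
  }
}
```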
StreamingContext
StreamingContext is the main entry point of a Spark Streaming application; it creates DStreams from input data sources. It can be constructed by providing a Spark master URL and an appName, from a SparkConf, ==or from an existing SparkContext==. The associated SparkContext is accessible via `context.sparkContext`. After the DStreams have been created and transformed, the streaming computation is started with `context.start()` and stopped with `context.stop()`; `context.awaitTermination()` blocks the current thread until the context is terminated by `stop()` or by an exception.
```scala
/**
 * Main entry point for Spark Streaming functionality. It provides methods used to create
 * [[org.apache.spark.streaming.dstream.DStream]]s from various input sources. It can be either
 * created by providing a Spark master URL and an appName, or from a org.apache.spark.SparkConf
 * configuration (see core Spark documentation), or from an existing org.apache.spark.SparkContext.
 * The associated SparkContext can be accessed using `context.sparkContext`. After
 * creating and transforming DStreams, the streaming computation can be started and stopped
 * using `context.start()` and `context.stop()`, respectively.
 * `context.awaitTermination()` allows the current thread to wait for the termination
 * of the context by `stop()` or by an exception.
 */
class StreamingContext private[streaming] (
    sc_ : SparkContext,
    cp_ : Checkpoint,
    batchDur_ : Duration
  ) extends Logging {
```
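A minimal sketch of the lifecycle described above, assuming Spark 1.x and a socket text source on a placeholder host/port:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingContextExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-demo").setMaster("local[2]")

    // Created from a SparkConf; it could equally wrap an existing SparkContext:
    //   new StreamingContext(sc, Seconds(5))
    val ssc = new StreamingContext(conf, Seconds(5))

    // The associated SparkContext is reachable from the streaming context.
    val sc = ssc.sparkContext

    // Define the DStream graph before starting the computation.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()             // start receiving and processing data
    ssc.awaitTermination()  // block until stop() is called or an error occurs
  }
}
```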
SQLContext
SQLContext is the main entry point for working with ==structured data== in Spark; it allows the creation of DataFrame objects and the execution of SQL queries.
```scala
/**
 * The entry point for working with structured data (rows and columns) in Spark. Allows the
 * creation of [[DataFrame]] objects as well as the execution of SQL queries.
 *
 * @groupname basic Basic Operations
 * @groupname ddl_ops Persistent Catalog DDL
 * @groupname cachemgmt Cached Table Management
 * @groupname genericdata Generic Data Sources
 * @groupname specificdata Specific Data Sources
 * @groupname config Configuration
 * @groupname dataframes Custom DataFrame Creation
 * @groupname Ungrouped Support functions for language integrated queries
 *
 * @since 1.0.0
 */
class SQLContext private[sql](
    @transient val sparkContext: SparkContext,
    @transient protected[sql] val cacheManager: CacheManager,
    @transient private[sql] val listener: SQLListener,
    val isRootContext: Boolean)
  extends org.apache.spark.Logging with Serializable {
```
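A minimal sketch of typical SQLContext usage, assuming Spark 1.x and using its public one-argument constructor; the `Person` case class and sample rows are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Top-level case class so Spark can derive the DataFrame schema by reflection.
case class Person(name: String, age: Int)

object SQLContextExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sql-demo").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)  // public constructor wraps a SparkContext
    import sqlContext.implicits._        // enables rdd.toDF()

    // Build a DataFrame from an RDD of case classes.
    val people = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25))).toDF()

    // Register it as a temporary table and run a SQL query against it.
    people.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 26").show()

    sc.stop()
  }
}
```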