SparkContext
SparkContext is the main entry point of a Spark application. Through it the application connects to a Spark cluster and creates RDDs, accumulators, and broadcast variables on that cluster. ==Only one SparkContext may be active per JVM; the active SparkContext must be stopped before a new one is created==.
```scala
/**
 * Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
 * cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
 *
 * Only one SparkContext may be active per JVM. You must `stop()` the active SparkContext before
 * creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.
 *
 * @param config a Spark Config object describing the application configuration. Any settings in
 *   this config overrides the default configs as well as system properties.
 */
class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {
```
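A minimal sketch of that lifecycle, assuming Spark 1.x APIs; the app name, master URL, and data are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkContextExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("context-demo")  // placeholder app name
      .setMaster("local[2]")       // run locally with two threads

    val sc = new SparkContext(conf)

    // The three cluster-side handles mentioned in the scaladoc:
    val rdd    = sc.parallelize(1 to 100)       // an RDD
    val acc    = sc.accumulator(0, "seen")      // an accumulator (Spark 1.x API)
    val lookup = sc.broadcast(Map(1 -> "one"))  // a broadcast variable

    rdd.foreach { x => if (lookup.value.contains(x)) acc += 1 }
    println(s"elements found in broadcast map: ${acc.value}")

    // Only one SparkContext may be active per JVM: stop this one
    // before constructing another.
    sc.stop()
  }
}
```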
StreamingContext
StreamingContext is the main entry point of a Spark Streaming application; it creates DStreams from input data sources. It can be constructed by providing a Spark master URL and an appName, from a SparkConf, ==or from an existing SparkContext==. The associated SparkContext is accessible via `context.sparkContext`. After the DStreams have been created and transformed, the streaming computation is started with `context.start()` and stopped with `context.stop()`; `context.awaitTermination()` blocks the current thread until the context is terminated by `stop()` or by an exception.
```scala
/**
 * Main entry point for Spark Streaming functionality. It provides methods used to create
 * [[org.apache.spark.streaming.dstream.DStream]]s from various input sources. It can be either
 * created by providing a Spark master URL and an appName, or from a org.apache.spark.SparkConf
 * configuration (see core Spark documentation), or from an existing org.apache.spark.SparkContext.
 * The associated SparkContext can be accessed using `context.sparkContext`. After
 * creating and transforming DStreams, the streaming computation can be started and stopped
 * using `context.start()` and `context.stop()`, respectively.
 * `context.awaitTermination()` allows the current thread to wait for the termination
 * of the context by `stop()` or by an exception.
 */
class StreamingContext private[streaming] (
    sc_ : SparkContext,
    cp_ : Checkpoint,
    batchDur_ : Duration
  ) extends Logging {
```
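A minimal sketch of the lifecycle described above, assuming Spark 1.x and a socket text source on a placeholder host/port:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingContextExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-demo").setMaster("local[2]")

    // Created from a SparkConf; it could equally wrap an existing SparkContext:
    //   new StreamingContext(sc, Seconds(5))
    val ssc = new StreamingContext(conf, Seconds(5))

    // The associated SparkContext is reachable from the streaming context.
    val sc = ssc.sparkContext

    // Define the DStream graph before starting the computation.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()             // start receiving and processing data
    ssc.awaitTermination()  // block until stop() is called or an error occurs
  }
}
```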
SQLContext
SQLContext is the main entry point for working with ==structured data== in Spark; it allows the creation of DataFrame objects and the execution of SQL queries.
```scala
/**
 * The entry point for working with structured data (rows and columns) in Spark. Allows the
 * creation of [[DataFrame]] objects as well as the execution of SQL queries.
 *
 * @groupname basic Basic Operations
 * @groupname ddl_ops Persistent Catalog DDL
 * @groupname cachemgmt Cached Table Management
 * @groupname genericdata Generic Data Sources
 * @groupname specificdata Specific Data Sources
 * @groupname config Configuration
 * @groupname dataframes Custom DataFrame Creation
 * @groupname Ungrouped Support functions for language integrated queries
 *
 * @since 1.0.0
 */
class SQLContext private[sql](
    @transient val sparkContext: SparkContext,
    @transient protected[sql] val cacheManager: CacheManager,
    @transient private[sql] val listener: SQLListener,
    val isRootContext: Boolean)
  extends org.apache.spark.Logging with Serializable {
```
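A minimal sketch of typical SQLContext usage, assuming Spark 1.x and using its public one-argument constructor; the `Person` case class and sample rows are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Top-level case class so Spark can derive the DataFrame schema by reflection.
case class Person(name: String, age: Int)

object SQLContextExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sql-demo").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)  // public constructor wraps a SparkContext
    import sqlContext.implicits._        // enables rdd.toDF()

    // Build a DataFrame from an RDD of case classes.
    val people = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25))).toDF()

    // Register it as a temporary table and run a SQL query against it.
    people.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 26").show()

    sc.stop()
  }
}
```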