Differences between SparkContext & SparkSession
| Spark Core (SparkContext) | Spark SQL (SparkSession) |
| --- | --- |
| In Spark 1.x there were three separate entry points: SparkContext, SQLContext and HiveContext. `val sc = new SparkContext(sparkConf)`; `val sqlContext = new SQLContext(sc)`; `val hiveContext = new HiveContext(sc)` | In Spark 2.0 the single entry point is SparkSession, which replaces both SQLContext and HiveContext. It also gives developers direct access to the underlying SparkContext: `val sparkContext = sparkSession.sparkContext` |
| Mostly processes unstructured data such as text/CSV files | Can process Text, CSV, Parquet, ORC, Avro and JSON files, and connect to sources such as S3, MySQL, Oracle, HBase and Cassandra. Mostly processes structured and semi-structured data |
| RDD: the high-level representation of data in Spark Core (data only) | DataFrame: the high-level representation of data in Spark SQL (RDD + schema) |
| No such API | Data Source API: the universal input/output module of Spark SQL; it reads data from and writes data to any supported storage system through one interface |
| No built-in query optimizer | Catalyst optimizer: Spark SQL automatically optimizes queries, even large ones |
| No Tungsten | Tungsten: optimized memory management and binary processing |
| Slower | Faster |
| Immutable, cacheable, lazily evaluated | Immutable, cacheable, lazily evaluated |
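The entry-point change and the Data Source API in the table above can be sketched together in Scala. This is a minimal illustration, not a production setup: the app name, the `local[*]` master, and the file paths `people.json` / `people_out.parquet` are all made-up placeholders, and the snippet assumes Spark 2.x or later on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object EntryPointDemo {
  def main(args: Array[String]): Unit = {
    // Spark 2.0+: one builder replaces SQLContext and HiveContext.
    val spark = SparkSession.builder()
      .appName("EntryPointDemo")   // hypothetical app name
      .master("local[*]")          // local mode, for illustration only
      .getOrCreate()

    // The classic SparkContext is still reachable for RDD work.
    val sc = spark.sparkContext
    val rdd = sc.parallelize(Seq(1, 2, 3))
    println(s"sum via RDD API: ${rdd.sum()}")

    // Data Source API: the same read/write interface across formats.
    // "people.json" is a placeholder input file, not from the original post.
    val df = spark.read.json("people.json")
    df.write.mode("overwrite").parquet("people_out.parquet")

    spark.stop()
  }
}
```

Note that `spark.read` / `df.write` work identically whether the target is JSON, Parquet, ORC or a JDBC source; only the format method (or a `.format("...")` call) changes, which is what makes the API "universal".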