Apache Spark (PySpark): SparkContext, SparkConf, SparkSession

[COPIED: Courtesy Ashish Kumar Singh]

This is a small but very useful note I found from Ashish Kumar Singh, so I wanted to share it as a blog post. Here it is.

Prior to Spark 2.0.0

SparkContext was the entry point for all Spark functionality. The Spark driver program uses the SparkContext to connect to the cluster through a resource manager (YARN or Mesos…).

SparkConf is used to create the SparkContext object. It stores configuration parameters such as appName (to identify your Spark driver) and the number of cores and memory size of the executors running on the worker nodes. In order to use the SQL, Hive, and Streaming APIs, separate contexts had to be created on top of the SparkContext, for example in Scala:

    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val hc = new HiveContext(sc)
    val ssc = new StreamingContext(sc, Seconds(1))

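Spark 2.0.0 onwards

SparkSession provides a single point of entry to interact with the underlying Spark functionality and allows programming Spark with the DataFrame and Dataset APIs. A SparkSession wraps a SparkContext (available as spark.sparkContext), so all the functionality of SparkContext is still reachable through it. To use the SQL, Hive, and Streaming APIs there is no need to create separate contexts: SQL queries run through spark.sql(), Hive support is switched on with enableHiveSupport() on the session builder, and Structured Streaming is reached through spark.readStream. Once the SparkSession is instantiated, Spark's runtime configuration properties can be set through spark.conf.

Hope this will help!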
