How to convert RDD to Dataframe?

Updated: Oct 25, 2019




There are three main ways to convert an RDD into a DataFrame. I am using spark-shell to demonstrate these examples. Open spark-shell and import the libraries needed to run the code.


Scala> import org.apache.spark.sql.{Row, SparkSession}

Scala> import org.apache.spark.sql.types.{IntegerType, DoubleType, StringType, StructField, StructType}



Now, create a sample RDD with the parallelize method.


Scala> val rdd = sc.parallelize(
  Seq(
    ("One", Array(1,1,1,1,1,1,1)),
    ("Two", Array(2,2,2,2,2,2,2)),
    ("Three", Array(3,3,3,3,3,3))
  )
)




Method 1


If you don't need column headers, you can pass the RDD directly as the input parameter to the createDataFrame method. Spark then names the columns by tuple position (_1, _2).

Scala> val df1 = spark.createDataFrame(rdd)
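
To see what Method 1 produces, here is a minimal sketch that can be run outside spark-shell. It assumes a local SparkSession created just for illustration (in spark-shell, spark and sc already exist), and the app name and the shortened sample data are placeholders rather than part of the original example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only; spark-shell provides `spark` and `sc` already.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-to-df-sketch") // hypothetical app name
  .getOrCreate()
val sc = spark.sparkContext

// A shortened version of the sample RDD of (String, Array[Int]) tuples.
val rdd = sc.parallelize(Seq(
  ("One", Array(1, 1, 1)),
  ("Two", Array(2, 2, 2))
))

// No column names are supplied, so Spark names the columns after the tuple fields.
val df1 = spark.createDataFrame(rdd)

val defaultColumns = df1.columns.toSeq // the tuple field names: _1 and _2
df1.printSchema()
```

The default names are workable for quick exploration, but they carry no meaning, which is why the next method adds explicit headers.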




Method 2


If you need column headers, you can add them explicitly by calling the toDF method with one name per column.

Scala> val df2 = spark.createDataFrame(rdd).toDF("Label", "Values")
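
The renaming in Method 2 can be checked with a small sketch like the one below. As before, the local SparkSession, the app name, and the shortened sample data are assumptions made for illustration; in spark-shell you would reuse the existing spark and sc.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only; spark-shell provides `spark` and `sc` already.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-to-df-header-sketch") // hypothetical app name
  .getOrCreate()
val sc = spark.sparkContext

val rdd = sc.parallelize(Seq(
  ("One", Array(1, 1, 1)),
  ("Two", Array(2, 2, 2))
))

// toDF takes one name per column, in order, and returns a renamed DataFrame.
val df2 = spark.createDataFrame(rdd).toDF("Label", "Values")

val namedColumns = df2.columns.toSeq // Label and Values replace _1 and _2

// Named columns can then be referenced directly in queries.
val labels = df2.select("Label").collect().map(_.getString(0)).toSet
```

Note that the number of names passed to toDF should match the number of columns in the DataFrame.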




Method 3


If you need