How to convert an RDD to a DataFrame?

Updated: Oct 25, 2019


There are basically three methods to convert an RDD into a DataFrame. I am using spark-shell to demonstrate these examples. Open spark-shell and import the libraries needed to run our code.

Scala> import org.apache.spark.sql.{Row, SparkSession}

Scala> import org.apache.spark.sql.types.{IntegerType, DoubleType, StringType, StructField, StructType}
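The examples below rely on two objects that spark-shell creates for you: spark (a SparkSession) and sc (its SparkContext). If you would rather run them as a standalone program, here is a minimal sketch that builds both yourself. It assumes the spark-sql library is on the classpath and a local master; the object LocalSpark and the helper localSession are hypothetical names, not part of the original tutorial.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object LocalSpark {
  // Hypothetical helper: builds a SparkSession similar to the `spark`
  // object that spark-shell predefines.
  // Assumption: running locally, using all available cores.
  def localSession(appName: String = "rdd-to-dataframe"): SparkSession =
    SparkSession.builder()
      .appName(appName)
      .master("local[*]")
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = localSession()
    // Equivalent of the shell's predefined `sc`.
    val sc: SparkContext = spark.sparkContext
    println(sc.parallelize(Seq(1, 2, 3)).count())
    spark.stop()
  }
}
```

Note that getOrCreate() returns an already running session if one exists, so calling localSession() twice does not start a second Spark application.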

Now, create a sample RDD with the parallelize method.

Scala> val rdd = sc.parallelize(Seq(
  ("One", Array(1,1,1,1,1,1,1)),
  ("Two", Array(2,2,2,2,2,2,2)),
  ("Three", Array(3,3,3,3,3,3))
))

Method 1

If you don't need a header, you can pass the RDD directly to the createDataFrame method. Spark infers the schema from the tuple type and names the columns _1 and _2.

Scala> val df1 = spark.createDataFrame(rdd)
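To see what "no header" looks like in practice, here is a minimal sketch as a standalone program, assuming spark-sql on the classpath and a local master. The names Method1Sketch and buildDf are hypothetical. It rebuilds the sample RDD, converts it exactly as above, and prints the inferred schema and column names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object Method1Sketch {
  // Rebuilds the sample RDD and converts it without naming the columns.
  def buildDf(spark: SparkSession): DataFrame = {
    val rdd = spark.sparkContext.parallelize(Seq(
      ("One", Array(1, 1, 1, 1, 1, 1, 1)),
      ("Two", Array(2, 2, 2, 2, 2, 2, 2)),
      ("Three", Array(3, 3, 3, 3, 3, 3))
    ))
    // The schema is inferred from the tuple type (String, Array[Int]);
    // the columns take the tuple field names _1 and _2.
    spark.createDataFrame(rdd)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("method-1").master("local[*]").getOrCreate()
    val df1 = buildDf(spark)
    df1.printSchema()
    println(df1.columns.mkString(", ")) // prints "_1, _2"
    spark.stop()
  }
}
```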

Method 2

If you need a header, you can name the columns explicitly by calling the toDF method.

Scala> val df2 = spark.createDataFrame(rdd).toDF("Label", "Values")
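Once the columns have names, you can refer to them directly. The following is a minimal standalone sketch, assuming spark-sql on the classpath and a local master; Method2Sketch and buildDf are hypothetical names. It applies toDF("Label", "Values") as above and then filters on the Label column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object Method2Sketch {
  def buildDf(spark: SparkSession): DataFrame = {
    val rdd = spark.sparkContext.parallelize(Seq(
      ("One", Array(1, 1, 1, 1, 1, 1, 1)),
      ("Two", Array(2, 2, 2, 2, 2, 2, 2)),
      ("Three", Array(3, 3, 3, 3, 3, 3))
    ))
    // toDF replaces the inferred names _1 and _2 by position;
    // it requires exactly one new name per existing column.
    spark.createDataFrame(rdd).toDF("Label", "Values")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("method-2").master("local[*]").getOrCreate()
    val df2 = buildDf(spark)
    df2.printSchema()
    // The named columns can now be used in expressions such as a filter.
    df2.filter(df2("Label") === "Two").select("Values").show(false)
    spark.stop()
  }
}
```

Because toDF renames by position, passing fewer or more names than there are columns fails at runtime rather than silently dropping or adding columns.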

Method 3

If you need