
How to convert an RDD to a DataFrame?



There are three common ways to convert an RDD into a DataFrame. I am using spark-shell to demonstrate these examples. Open spark-shell and import the libraries needed to run the code.


scala> import org.apache.spark.sql.{Row, SparkSession}
scala> import org.apache.spark.sql.types.{IntegerType, DoubleType, StringType, StructField, StructType}



Now create a sample RDD with the parallelize method.


scala> val rdd = sc.parallelize(
  Seq(
    ("One", Array(1,1,1,1,1,1,1)),
    ("Two", Array(2,2,2,2,2,2,2)),
    ("Three", Array(3,3,3,3,3,3))
  )
)




Method 1


If you don't need column names (a header), you can pass the RDD directly to the createDataFrame method. Spark then names the columns by tuple position: _1, _2, and so on.

scala> val df1 = spark.createDataFrame(rdd)
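
To make the effect of Method 1 visible, here is a minimal sketch that builds the DataFrame and reads back its column names. It assumes a local session created with SparkSession.builder (inside spark-shell, getOrCreate simply returns the existing `spark`), and the app name and shortened arrays are illustrative choices, not part of the original example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()

// A small RDD of (String, Array[Int]) tuples, shortened for readability.
val rdd = spark.sparkContext.parallelize(Seq(
  ("One", Array(1, 1, 1)),
  ("Two", Array(2, 2, 2))
))

// Method 1: no column names supplied, so Spark falls back to positional names.
val df1 = spark.createDataFrame(rdd)

// Collect the column names into a Seq so they are easy to inspect.
val columnNames = df1.columns.toSeq

// Prints the inferred schema (a string column and an array-of-integers column).
df1.printSchema()
```

Here columnNames holds the positional names _1 and _2, which is why Method 1 only suits cases where readable column names do not matter.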



 

Method 2


If you need column names, you can set them explicitly by calling the toDF method with one name per column.

scala> val df2 = spark.createDataFrame(rdd).toDF("Label", "Values")
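
As a variation on Method 2, the same result can usually be reached by calling toDF on the RDD itself once the session's implicits are imported (spark-shell imports them automatically). The sketch below assumes a local session and uses shortened arrays; the number of names passed to toDF has to match the number of columns.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()

// Brings the implicit conversion that adds toDF to RDDs of tuples into scope.
import spark.implicits._

val rdd = spark.sparkContext.parallelize(Seq(
  ("One", Array(1, 1, 1)),
  ("Two", Array(2, 2, 2))
))

// Method 2 variant: name the columns directly on the RDD.
val df2 = rdd.toDF("Label", "Values")

// Collect the column names to confirm the header was applied.
val df2Columns = df2.columns.toSeq
```

Both forms produce a DataFrame whose columns are named Label and Values; which one to use is mostly a matter of taste.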




Method 3


If you need full control over the schema (column names, data types, and nullability), you need an RDD of type Row. Let's create a new rowsRDD for this scenario.

scala> val rowsRDD = sc.parallelize(
  Seq(
    Row("One",1,1.0),
    Row("Two",2,2.0),
    Row("Three",3,3.0),
    Row("Four",4,4.0),
    Row("Five",5,5.0)
  )
)



Now create the schema with the field names which you need.

scala> val schema = new StructType().
  add(StructField("Label", StringType, true)).
  add(StructField("IntValue", IntegerType, true)).
  add(StructField("FloatValue", DoubleType, true))


Now create the DataFrame from rowsRDD and the schema, then display it.


scala> val df3 = spark.createDataFrame(rowsRDD, schema)
scala> df3.show()
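
Once the schema is attached, columns can be referenced by name and carry the declared types. The following is a minimal sketch under a few assumptions: a local session, a two-row RDD, and a filter on IntValue chosen purely for illustration. The values inside each Row need to match the declared types; a mismatch typically only surfaces as a runtime error when an action runs.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()

// Rows are untyped containers, so the schema below supplies names and types.
val rowsRDD = spark.sparkContext.parallelize(Seq(
  Row("One", 1, 1.0),
  Row("Two", 2, 2.0)
))

// Same three fields as in the example above, written with the StructType(Seq(...)) form.
val schema = StructType(Seq(
  StructField("Label", StringType, nullable = true),
  StructField("IntValue", IntegerType, nullable = true),
  StructField("FloatValue", DoubleType, nullable = true)
))

// Method 3: combine the Row RDD with the explicit schema.
val df3 = spark.createDataFrame(rowsRDD, schema)

// Illustrative query: keep rows with IntValue > 1 and collect their labels.
val labels = df3.filter(df3("IntValue") > 1)
  .select("Label")
  .collect()
  .map(_.getString(0))
  .toSeq
```

With the two rows above, only the row labeled Two passes the filter.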




Thank you, folks! If you have any questions, please mention them in the comments section below.



