
How to convert RDD to DataFrame?

Updated: Oct 25, 2019



There are three main ways to convert an RDD into a DataFrame. I am using the Spark shell to demonstrate these examples. Open spark-shell and import the libraries needed to run the code.


scala> import org.apache.spark.sql.{Row, SparkSession}

scala> import org.apache.spark.sql.types.{IntegerType, DoubleType, StringType, StructField, StructType}



Now create a sample RDD with the parallelize method.


scala> val rdd = sc.parallelize(
  Seq(
    ("One", Array(1,1,1,1,1,1,1)),
    ("Two", Array(2,2,2,2,2,2,2)),
    ("Three", Array(3,3,3,3,3,3,3))
  )
)
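Before converting, it can help to sanity-check the RDD's contents. A minimal sketch, continuing the spark-shell session above (rdd is the pair RDD just created):

```scala
// rdd is the (String, Array[Int]) RDD created above
assert(rdd.count() == 3)          // three records
assert(rdd.first()._1 == "One")   // label of the first record
```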




Method 1


If you don't need column headers, you can create the DataFrame directly by passing the RDD to the createDataFrame method; Spark assigns default column names.

scala> val df1 = spark.createDataFrame(rdd)
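With no names supplied, Spark derives the column names from the tuple positions. A quick check, continuing the session above (a sketch; spark and rdd come from the shell session):

```scala
// createDataFrame on an RDD of tuples yields positional column names
val df1 = spark.createDataFrame(rdd)
df1.printSchema()
// Columns default to _1 (the label) and _2 (the array)
assert(df1.columns.sameElements(Array("_1", "_2")))
```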




Method 2


If you need custom column names, you can add them explicitly by calling the toDF method.

scala> val df2 = spark.createDataFrame(rdd).toDF("Label", "Values")
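Note that toDF expects exactly one name per column; here it replaces the two positional defaults. A sketch, continuing the session above:

```scala
// toDF replaces the positional defaults with the names given
val df2 = spark.createDataFrame(rdd).toDF("Label", "Values")
assert(df2.columns.sameElements(Array("Label", "Values")))
df2.show(false)   // false = don't truncate the array values when printing
```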




Method 3


If you need to define the schema explicitly, you need an RDD of type Row. Let's create a new rowsRDD for this scenario.

scala> val rowsRDD = sc.parallelize(
  Seq(
    Row("One", 1, 1.0),
    Row("Two", 2, 2.0),
    Row("Three", 3, 3.0),
    Row("Four", 4, 4.0),
    Row("Five", 5, 5.0)
  )
)



Now create the schema with the field names you need.

scala> val schema = new StructType().
  add(StructField("Label", StringType, true)).
  add(StructField("IntValue", IntegerType, true)).
  add(StructField("FloatValue", DoubleType, true))
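The same schema can also be built in one shot by passing a Seq of StructFields to StructType; which form you use is a matter of taste. A sketch (schema2 is an illustrative name, not from the original):

```scala
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

// Equivalent construction: pass all fields to StructType at once
val schema2 = StructType(Seq(
  StructField("Label", StringType, nullable = true),
  StructField("IntValue", IntegerType, nullable = true),
  StructField("FloatValue", DoubleType, nullable = true)
))
assert(schema2.fieldNames.sameElements(Array("Label", "IntValue", "FloatValue")))
```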


Now create the DataFrame from rowsRDD and schema, and show it.


scala> val df3 = spark.createDataFrame(rowsRDD, schema)

scala> df3.show()
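Because the schema is explicit, the resulting column types are exactly the ones declared. A quick verification, continuing the session above (df3 is the DataFrame just created):

```scala
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType}

// The column types come straight from the declared schema
df3.printSchema()
assert(df3.schema("Label").dataType == StringType)
assert(df3.schema("IntValue").dataType == IntegerType)
assert(df3.schema("FloatValue").dataType == DoubleType)
assert(df3.count() == 5)   // one row per Row in rowsRDD
```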




Thank you, folks! If you have any questions, please mention them in the comments section below.



Next: Writing data files in Spark



©2019 by Data Nebulae