What is reduceByKey in Spark (example)?

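Spark's reduceByKey operates on a dataset of (K, V) pairs and merges the values for each key using a reduce function of type (V, V) => V. The function should be associative and commutative, because Spark first combines values locally within each partition and then merges the partial results across partitions. For example, to count the total number of occurrences of each word, map every word to a (word, 1) pair and sum the values per key: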

scala> val rdd = sc.parallelize(List("Hello Hello Spark Apache Hello Dataneb Dataneb Dataneb Spark"))

rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[2] at parallelize at <console>:24

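scala> rdd
.flatMap(line => line.split(" "))   // split each line into words
.map(word => (word, 1))             // pair every word with a count of 1
.reduceByKey((x, y) => x + y)       // sum the counts for each word
.collect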

res14: Array[(String, Int)] = Array((Spark,2), (Dataneb,3), (Hello,3), (Apache,1))
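
To see why the reduce function has to be of type (V, V) => V, it can help to imitate the per-key merge on an ordinary Scala collection. The sketch below is only an illustration: `ReduceByKeySketch` and `reduceByKeyLocal` are hypothetical names that are not part of Spark, and the code runs on a single machine with none of Spark's partitioning or shuffling.

```scala
object ReduceByKeySketch {
  // Hypothetical helper (not a Spark API): group the pairs by key,
  // then fold each key's values together with f: (V, V) => V.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs
      .groupBy { case (k, _) => k }
      .map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  def main(args: Array[String]): Unit = {
    val words =
      "Hello Hello Spark Apache Hello Dataneb Dataneb Dataneb Spark".split(" ").toSeq

    // Same word count as the spark-shell session: sum the 1s for each word.
    val counts = reduceByKeyLocal(words.map(w => (w, 1)))(_ + _)
    println(counts("Hello"))  // 3
    println(counts("Spark"))  // 2

    // Any other (V, V) => V function works too, e.g. keeping the maximum per key.
    val scores = Seq(("a", 3), ("b", 7), ("a", 9))
    val best = reduceByKeyLocal(scores)((x, y) => math.max(x, y))
    println(best("a"))        // 9
  }
}
```

Because the function takes two values and returns one value of the same type, it can be applied repeatedly to any number of values for a key. That is the property Spark relies on when it merges partial results from different partitions.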
