
Just enough Scala for Spark

Updated: Oct 25, 2019

In this tutorial you will learn just enough Scala for Spark. It is a quick guide to the Scala basics needed for Spark programming, covering Scala syntax and a few Scala examples.


Well, you can't become a Scala expert in a day, but after reading this post you will be able to write basic Spark programs. I will be using spark-shell to run Scala commands, so no extra installation is needed if you already have spark-shell running on your machine. I encourage you to run these commands side by side on your machine.


Starting with printing "Hello World", for example,



scala> println("Hello World")

Hello World


For single-line comments use a double forward slash; for multi-line comments use /* ... */, the same syntax as Java. Ignore the pipe character in the example below; it appears because I am using spark-shell.


scala> // Hello Data Nebulae - This is single line comment

scala> /* Hello World

| This is multi-line comment

| Data Nebulae

| */



Scala has two types of variables: mutable and immutable. Mutable variables are defined with the var keyword and immutable variables with the val keyword.


You can't re-assign immutable variables. For example,


scala> val myNumber :Int = 7

myNumber: Int = 7


scala> var myWord :String = "Hello"

myWord: String = Hello


Because myNumber is an immutable variable, the re-assignment fails:

scala> myNumber = 10

<console>:25: error: reassignment to val

myNumber = 10


scala> myWord = "Dataneb"

myWord: String = Dataneb


You can specify the data type (Int, Double, Boolean, String) after the variable name, separated by a colon. If you omit it, the Scala compiler infers the type automatically (this is called type inference).

scala> val myNumber :Int = 10

myNumber: Int = 10


scala> val myFlag = true

myFlag: Boolean = true


You can also assign several variables at once from a tuple, similar to Python:

scala> val (x, y) = (1, 5)

x: Int = 1

y: Int = 5


The same works with three values of mixed types:

scala> var (x, y, z) = (1, 2, "Hello")

x: Int = 1

y: Int = 2

z: String = Hello
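Tuple fields can also be read by position, which comes up often when working with key-value pairs. Here is a minimal sketch in plain Scala; the names pair, word and count are made up for illustration.

```scala
// A tuple can mix types; here a (String, Int) pair
val pair = ("spark", 3)

// Positional access starts at _1, not _0
val word = pair._1
val count = pair._2

// Destructuring gives the same values in one step
val (w, c) = pair

println(s"$word appears $count times")
```

Positional access is handy inside short lambdas, while destructuring usually reads better when you need both fields.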


You can pass these variables to the println function:

scala> println (x)

1


String interpolation works as in other languages: prefix the double-quoted string with s and refer to variables with $.

scala> println(s"Value of x is: $x")

Value of x is: 1
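The s interpolator also accepts full expressions when they are wrapped in ${...}. A small sketch, with name and version as hypothetical values:

```scala
val name = "Spark"
val version = 3

// $name inserts a variable; ${...} evaluates any expression
val msg = s"$name has ${name.length} letters, next version is ${version + 1}"

println(msg)  // Spark has 5 letters, next version is 4
```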


Similar to other languages, you can create a range (optionally with a step size) and print each element.

scala> (1 to 5).foreach(println)

1

2

3

4

5

scala> (5 to 1 by -1)

res144: scala.collection.immutable.Range = Range(5, 4, 3, 2, 1)

scala> (5 to 1 by -2)

res145: scala.collection.immutable.Range = Range(5, 3, 1)
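Ranges have a few more variations worth knowing. The sketch below uses only standard Scala collection methods; the variable names are illustrative.

```scala
// `to` includes the end value, `until` stops one before it
val inclusive = (1 to 5).toList           // List(1, 2, 3, 4, 5)
val exclusive = (1 until 5).toList        // List(1, 2, 3, 4)

// `by` sets the step size
val stepped = (0 to 10 by 5).toList       // List(0, 5, 10)

// Ranges are collections, so methods like sum and map work on them
val total = (1 to 5).sum                  // 15
val doubled = (1 to 3).map(_ * 2).toList  // List(2, 4, 6)
```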


Strings are surrounded by double quotes and characters with single quotes, for example,

scala> "Hello Word"

res111: String = Hello Word

scala> 'H'

res112: Char = H

scala> :type ('H')

Char


You can apply methods similar to those in other languages, such as length, substring and replace. For example:


scala> "Hello World".length

res113: Int = 11

scala> "Hello World".size

res1: Int = 11

scala> "Hello World".toUpperCase

res2: String = HELLO WORLD

scala> "Hello World".contains('H')

res5: Boolean = true

scala> 19.toHexString

res4: String = 13

scala> "Hello World".take(3)

res114: String = Hel

scala> "Hello World".drop(3)

res115: String = lo World

scala> "Hello World".substring(3,6)

res116: String = "lo "

scala> "Hello World".replace("H","3")

res123: String = 3ello World

scala> "Hello".map(x=>(x,1))

res7: scala.collection.immutable.IndexedSeq[(Char, Int)] = Vector((H,1), (e,1), (l,1), (l,1), (o,1))
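String methods combine well with collection methods. As a rough sketch, here is a word count written with plain Scala collections only. Spark has its own way of doing this (for example reduceByKey on pair RDDs), so treat this as an illustration of the Scala building blocks rather than Spark code.

```scala
val line = "to be or not to be"

// split returns an Array[String]
val words = line.split(" ")

// Group identical words together, then count the size of each group
val counts = words.groupBy(identity).map { case (word, group) => (word, group.length) }

println(counts("to"))  // 2
println(counts("or"))  // 1
```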



Array, List, Map and Set behave much like the corresponding data structures in other languages.

scala> val a = Array("Hello", "World", "Scala", "Spark")

a: Array[String] = Array(Hello, World, Scala, Spark)

// you can access the elements with index positions

scala> a(0)

res159: String = Hello

scala> (a(0),a(3))

res160: (String, String) = (Hello,Spark)


Similarly, List:


// List of Integers

scala> val l = List(1, 2, 3, 4, 5)

l: List[Int] = List(1, 2, 3, 4, 5)

// List of strings

scala> val strings = List("Hello", "World", "Dataneb", "Spark")

strings: List[String] = List(Hello, World, Dataneb, Spark)

// List of List

scala> val listOfList = List(List(1,2,3), List(2,6,7), List(2,5,3))

listOfList: List[List[Int]] = List(List(1, 2, 3), List(2, 6, 7), List(2, 5, 3))

scala> val emptyList = List()

emptyList: List[Nothing] = List()


Similarly, Map:


scala> val m = Map("one" -> 1, "two" -> 2 )

m: scala.collection.immutable.Map[String,Int] = Map(one -> 1, two -> 2)

scala> m("two")

res163: Int = 2
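Looking up a key that does not exist with m("key") throws a NoSuchElementException, so it helps to know the safer alternatives. A minimal sketch using standard Map methods:

```scala
val m = Map("one" -> 1, "two" -> 2)

// get returns an Option instead of throwing on a missing key
val maybeTwo = m.get("two")                // Some(2)
val maybeThree = m.get("three")            // None

// getOrElse supplies a fallback value
val threeOrZero = m.getOrElse("three", 0)  // 0

// Immutable maps return a new map when you add an entry
val bigger = m + ("three" -> 3)

println(bigger.contains("three"))  // true
println(m.contains("three"))       // false, the original is unchanged
```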


A Set can be applied to a value like a function; it returns a Boolean that tells you whether the value is a member.

scala> val s = Set("Apple", "Orange", "Banana")

s: scala.collection.immutable.Set[String] = Set(Apple, Orange, Banana)

scala> s("Apple")

res164: Boolean = true

scala> s("Grapes")

res165: Boolean = false
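Sets also support the usual set algebra. The following sketch uses standard methods on immutable sets; the second set t is made up for the example.

```scala
val s = Set("Apple", "Orange", "Banana")
val t = Set("Banana", "Grapes")

val hasApple = s.contains("Apple")  // same result as s("Apple")
val both = s.intersect(t)           // elements in both sets
val either = s.union(t)             // elements in at least one set
val onlyInS = s.diff(t)             // elements in s but not in t

// Duplicates are dropped automatically
val unique = Set(1, 1, 2, 2, 3)
println(unique.size)  // 3
```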


Arithmetic operators: + (add), - (subtract), * (multiply), / (divide), % (remainder). For example,

scala> val (x, y) = (5, 8)

x: Int = 5

y: Int = 8


scala> y%x

res95: Int = 3


scala> res95 + 7

res110: Int = 10


scala> "Hello" + " World"

res0: String = Hello World


Relational operators ==, !=, <, >, >=, <= for example,

scala> y > x

res96: Boolean = true


Logical operators &&, ||, ! for example,


scala> !(y>x && x>y)

res98: Boolean = true


Assignment operators =, +=, %= etc. work as in other languages; for example, x += y is the same as x = x + y. The variable must be declared with var.

scala> var (x, y) = (5, 8)

x: Int = 5

y: Int = 8


scala> x+=y

scala> x

res102: Int = 13


Array of integers, with println and index

scala> val a = Array(1, 2, 3)

a: Array[Int] = Array(1, 2, 3)


scala> println(s"Sum is ${a(0) + a(1) + a(2)}")

Sum is 6


Functions are defined with the def keyword (ignore the | character, which is the spark-shell continuation prompt). In the REPL output below, (x: Int, y: Int)(Int, Int) means the function takes two integer arguments and returns a tuple of two integers.

scala> def squareOfNumbers(x: Int, y: Int): (Int,Int) = {(x*x, y*y)

| // for multiline you have to use curly {} brackets

| }

squareOfNumbers: (x: Int, y: Int)(Int, Int)

scala> squareOfNumbers(2,3)

res131: (Int, Int) = (4,9)
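Function parameters can also have default values and be passed by name. The function power below is a hypothetical example, not something from Spark or the standard library.

```scala
// A default value makes the second argument optional
def power(base: Int, exponent: Int = 2): Int = {
  var result = 1
  for (_ <- 1 to exponent) result *= base
  result
}

val squared = power(3)                       // exponent defaults to 2
val big = power(2, 10)                       // both arguments given by position
val named = power(exponent = 3, base = 2)    // named arguments can come in any order

println(s"$squared $big $named")  // 9 1024 8
```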


Lambda (anonymous) functions: if you don't specify the return type, the Scala compiler infers it.

scala> (x:Int) => x+x

res132: Int => Int = <function1>


Int => Int means the function takes an integer and returns an integer.

scala> val func: Int => Int = x => x + x

func: Int => Int = <function1>

scala> func(3)

res133: Int = 6


This function takes two integers and returns one integer. The first _ stands for the first argument, the second _ for the second, and so on.

scala> val underscoreFunc: (Int, Int) => Int = (_ * 3 + _ * 2)

underscoreFunc: (Int, Int) => Int = <function2>

scala> underscoreFunc(7, 5)

res134: Int = 31
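Lambdas are most useful when passed to collection methods such as map, filter and reduce. The sketch below uses a plain Scala List. Spark's RDD API offers methods with the same names, but this example deliberately does not touch Spark.

```scala
val nums = List(1, 2, 3, 4, 5)

// Full lambda syntax and the underscore shorthand give the same result
val doubled = nums.map(x => x * 2)        // List(2, 4, 6, 8, 10)
val doubledShort = nums.map(_ * 2)

val total = nums.reduce((a, b) => a + b)  // 15
val totalShort = nums.reduce(_ + _)

// A function stored in a val can be passed by name
val isEven: Int => Boolean = _ % 2 == 0
val evens = nums.filter(isEven)           // List(2, 4)
```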



if-else statements, for example

scala> val x = 5

x: Int = 5

scala> if (x==5) { println("five") } // curly braces are optional for a single statement, but required for multi-line blocks

five

scala> println(if (x==4) "Hello" else "Bye")

Bye
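In Scala, if-else is an expression that returns a value, so its result can be assigned directly to a val. A short sketch with made-up names:

```scala
val x = 5

// The whole if-else evaluates to a String
val parity = if (x % 2 == 0) "even" else "odd"

// Chained conditions work the same way
val size = if (x < 3) "small" else if (x < 10) "medium" else "large"

println(s"$x is $parity and $size")  // 5 is odd and medium
```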


Loops: while, do-while and for. The counter i must be declared as a var before the loop.


scala> var i = 0

i: Int = 0

scala> while (i<5) {println(i); i+=1}

0

1

2

3

4

scala> do {println(i); i-=1} while (i>0)

5

4

3

2

1


In Scala, <- is a generator: read for (x <- 1 to 5) as "for x in the range 1 to 5", similar to Python.

scala> for (x <- 1 to 5) println(x)

1

2

3

4

5


Pattern matching, for example;


scala> def patternMatch (x: Int) :String = x match {

| case 1 => "one"

| case 2 => "two"

| case _ => "unknown"

| }

patternMatch: (x: Int)String


scala> patternMatch(2)

res40: String = two


scala> patternMatch(4)

res41: String = unknown
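Pattern matching is not limited to literal values. It can also match on types, tuples and extra conditions (guards). The function describe below is a hypothetical example to show the syntax.

```scala
def describe(value: Any): String = value match {
  case 0               => "zero"                              // literal
  case n: Int if n < 0 => "negative int"                      // type plus guard
  case n: Int          => s"int $n"                           // type only
  case s: String       => s"string of length ${s.length}"
  case (a, b)          => s"pair of $a and $b"                // tuple
  case _               => "something else"                    // catch-all
}

println(describe(7))         // int 7
println(describe(("a", 1)))  // pair of a and 1
```

Cases are tried from top to bottom, so the more specific patterns have to come first.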


Classes can be defined like other languages, for example

scala> class Dog(breed: String){

| var br: String = breed

| def bark = "Woof woof!"

| private def eat(food: String) =

| println(s"I am eating $food")

| }

defined class Dog


scala> val myDog = new Dog("pitbull")

myDog: Dog = Dog@62882596

scala> myDog.br

res155: String = pitbull

scala> myDog.bark

res156: String = Woof woof!
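Instead of copying a constructor parameter into a field by hand, you can mark the parameter itself with val or var, which turns it into a public field. A minimal sketch, where the age parameter is added purely for illustration:

```scala
// val makes breed a read-only field, var makes age a mutable field
class Dog(val breed: String, var age: Int) {
  def describe: String = s"$breed, $age years old"
}

val d = new Dog("pitbull", 3)
d.age += 1  // allowed because age is a var

println(d.describe)  // pitbull, 4 years old
```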


Case classes will be useful when performing data operations. For example,

scala> case class Order(orderNum: Int, orderItem: String)

defined class Order

scala> val myOrder = Order(123, "iPhone")

myOrder: Order = Order(123,iPhone)

scala> val anotherOrder = Order(124, "macBook")

anotherOrder: Order = Order(124,macBook)

scala> myOrder.orderItem

res158: String = iPhone
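Case classes come with a few conveniences for free: comparison by value, a copy method, and support for pattern matching. The sketch below redefines Order so that it runs on its own; the list of orders is made-up sample data.

```scala
case class Order(orderNum: Int, orderItem: String)

val orders = List(Order(123, "iPhone"), Order(124, "macBook"), Order(125, "iPhone"))

// Case classes compare by value, not by reference
val sameOrder = Order(123, "iPhone") == orders.head  // true

// copy creates a modified copy and leaves the original untouched
val updated = orders.head.copy(orderItem = "iPad")

// Fields can be extracted with pattern matching, for example inside collect
val iphoneNums = orders.collect { case Order(num, "iPhone") => num }

println(iphoneNums)  // List(123, 125)
```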



Exercise


For Spark, most of the time you will be writing lambda functions. I have hardly seen complex functions written to transform data in Spark. Spark has built-in transformations that take care of complex operations, which you will learn soon.


For practice, try these examples.


Example 1: Area of Circle

scala> def areaCircle(radius:Double ) : Double = 3.14 * radius * radius

areaCircle: (radius: Double)Double

scala> areaCircle(5)

res17: Double = 78.5


Example 2: Sum of Squares of input numbers

scala> def sumOfSquares(x: Int, y:Int) : Int = x*x + y*y

sumOfSquares: (x: Int, y: Int)Int

scala> sumOfSquares(2,3)

res18: Int = 13


Example 3: Reverse the Sign of input number

scala> def reverseTheSign (x: Int) : Int = -x

reverseTheSign: (x: Int)Int

scala> reverseTheSign(-6)

res23: Int = 6

scala> reverseTheSign(6)

res24: Int = -6


Example 4: Factorial of a number (to explain recursion). Note how the function calls itself. The base case uses x<=1 so that factorial(0) also terminates.

scala> def factorial (x: Int) :Int = if (x<=1) 1 else factorial(x-1)*x

factorial: (x: Int)Int

scala> factorial(4)

res26: Int = 24
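An Int result overflows from 13! onward, and deep recursion can exhaust the stack. One possible variation, sketched here as an alternative rather than a replacement for the example above, uses a tail-recursive helper and BigInt. The helper name loop is arbitrary.

```scala
import scala.annotation.tailrec

def factorial(n: Int): BigInt = {
  // @tailrec makes the compiler verify that the recursive call is in tail position
  @tailrec
  def loop(i: Int, acc: BigInt): BigInt =
    if (i <= 1) acc else loop(i - 1, acc * i)

  loop(n, BigInt(1))
}

println(factorial(4))   // 24
println(factorial(20))  // 2432902008176640000
```

The accumulator acc carries the running product, so each call has nothing left to do after the recursive call returns, and the compiler can turn the recursion into a loop.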


Example 5: Defining objects and methods. You can define an object like this (ignore |):

scala> object MyObject{

| val MyVal = 1

| def MyMethod = "Hello"

| }

defined object MyObject

scala> MyObject.MyMethod

res30: String = Hello


Objects can also refer to each other's members, for example:

scala> object Foo {val x = 1}

defined object Foo

scala> object Bar {val x = 2}

defined object Bar

scala> object fooBar {

| val y = Bar.x

| }

defined object fooBar

scala> fooBar.y

res31: Int = 2


Example 6: Sum of Squares using Lambda or anonymous func

scala> val z = (x:Int, y:Int) => x*x + y*y

z: (Int, Int) => Int = <function2>

scala> z(2,3)

res34: Int = 13


Example 7: Filtering the list with anonymous func

scala> List(1,2,3,4,5,6).filter(x => x % 2 == 0)

res39: List[Int] = List(2, 4, 6)


Example 8: For loops with yield

scala> for (x <- 1 to 5) yield x

res42: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 3, 4, 5)

scala> for (x <- 1 to 3; y <- Array("Hello","World")) yield (x, y)

res47: scala.collection.immutable.IndexedSeq[(Int, String)] = Vector((1,Hello), (1,World), (2,Hello), (2,World), (3,Hello), (3,World))
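A for loop with yield can also filter with an if guard, and the result is equivalent to chaining filter and map. A small sketch to compare the two styles:

```scala
// for-comprehension with a guard: keep even numbers, then square them
val evensSquared = for (x <- 1 to 10 if x % 2 == 0) yield x * x

// The same computation written with filter and map
val same = (1 to 10).filter(_ % 2 == 0).map(x => x * x)

println(evensSquared.toList)   // List(4, 16, 36, 64, 100)
println(evensSquared == same)  // true
```

Which style to use is mostly a matter of taste; the chained form looks closer to how transformations are typically written in Spark.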


That's all, guys! If you have any questions, please mention them in the comments section below. Thank you!




©2020 by Data Nebulae