
Just enough Scala for Spark

This tutorial covers just enough Scala for Spark: a quick guide to the Scala basics and syntax needed for Spark programming, with a few examples.


You can't become a Scala expert in a day, but after reading this post you will be able to write Spark programs. I will be using spark-shell to run Scala commands, so no extra installation is needed if you already have the Spark shell running on your machine. I encourage you to run these commands side by side on your machine.


Starting with printing "Hello World", for example,



scala> println("Hello World")

Hello World


For single-line comments use a double forward slash; multi-line comments use the same /* ... */ syntax as Java. For example (ignore the pipe character, it only appears because I am using spark-shell):


scala> // Hello Data Nebulae - This is single line comment

scala> /* Hello World

| This is multi-line comment

| Data Nebulae

| */


 

Scala has two kinds of variables: mutable and immutable. Mutable variables are defined with the var keyword and immutable variables with the val keyword.


You can't re-assign immutable variables. For example,


scala> val myNumber :Int = 7

myNumber: Int = 7


scala> var myWord :String = "Hello"

myWord: String = Hello


Because myNumber is an immutable variable, the re-assignment fails:

scala> myNumber = 10

<console>:25: error: reassignment to val

myNumber = 10


scala> myWord = "Dataneb"

myWord: String = Dataneb


You can specify the data type (Int, Double, Boolean, String) after the variable name; if you don't, the Scala compiler assigns the type automatically (this is called type inference).

scala> val myNumber :Int = 10

myNumber: Int = 10


scala> val myFlag = true

myFlag: Boolean = true


You can also assign several variables at once using tuples, similar to Python,

scala> val (x, y) = (1, 5)

x: Int = 1

y: Int = 5


The same works with more variables and mixed types,

scala> var (x, y, z) = (1, 2, "Hello")

x: Int = 1

y: Int = 2

z: String = Hello


You can pass these variables to the println function

scala> println (x)

1


String interpolation works as in other languages: prefix the double-quoted string with s and reference variables with $,

scala> println(s"Value of x is: $x")

Value of x is: 1
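
The s interpolator also accepts whole expressions inside ${...}, and the f interpolator adds printf-style formatting. A small sketch of both; the names qty and price are made up for illustration and are not part of the walkthrough above:

```scala
// Hypothetical values, used only for illustration
val qty = 3
val price = 2.5

// ${...} evaluates any expression inside an s-interpolated string
val total = s"Total for $qty items: ${qty * price}"

// The f interpolator formats values printf-style (here: two decimal places)
val formatted = f"Unit price: $price%.2f"

println(total)
println(formatted)
```

The plain $name form is enough for simple variables; reach for ${...} only when you need a computation inside the string.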


Similar to other languages, you can create a range, optionally with a step size, and print each element.

scala> (1 to 5).foreach(println)

1

2

3

4

5

scala> (5 to 1 by -1)

res144: scala.collection.immutable.Range = Range(5, 4, 3, 2, 1)

scala> (5 to 1 by -2)

res145: scala.collection.immutable.Range = Range(5, 3, 1)
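
Ranges built with to include the end value, while until leaves it out. The following sketch shows the difference and converts each range to a List so the elements are easy to see (the variable names are illustrative only):

```scala
// `to` is inclusive, `until` excludes the upper bound
val inclusive = (1 to 5).toList
val exclusive = (1 until 5).toList

// `by` sets the step size
val stepped = (0 to 10 by 5).toList

// Ranges support the usual collection methods, such as sum
val rangeSum = (1 to 5).sum

println(inclusive)
println(exclusive)
println(stepped)
println(rangeSum)
```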


Strings are surrounded by double quotes and characters with single quotes, for example,

scala> "Hello World"

res111: String = Hello World

scala> 'H'

res112: Char = H

scala> :type ('H')

Char


Strings support methods similar to those in other languages, such as length, substring and replace, for example


scala> "Hello World".length

res113: Int = 11

scala> "Hello World".size

res1: Int = 11

scala> "Hello World".toUpperCase

res2: String = HELLO WORLD

scala> "Hello World".contains('H')

res5: Boolean = true

scala> 19.toHexString

res4: String = 13

scala> "Hello World".take(3)

res114: String = Hel

scala> "Hello World".drop(3)

res115: String = lo World

scala> "Hello World".substring(3,6)

res116: String = "lo "

scala> "Hello World".replace("H","3")

res123: String = 3ello World

scala> "Hello".map(x=>(x,1))

res7: scala.collection.immutable.IndexedSeq[(Char, Int)] = Vector((H,1), (e,1), (l,1), (l,1), (o,1))


 

Array, List, Map and Set behave much like the corresponding data structures in other languages

scala> val a = Array("Hello", "World", "Scala", "Spark")

a: Array[String] = Array(Hello, World, Scala, Spark)

// you can access the elements with index positions

scala> a(0)

res159: String = Hello

scala> (a(0),a(3))

res160: (String, String) = (Hello,Spark)


Similarly, List:


// List of Integers

scala> val l = List(1, 2, 3, 4, 5)

l: List[Int] = List(1, 2, 3, 4, 5)

// List of strings

scala> val strings = List("Hello", "World", "Dataneb", "Spark")

strings: List[String] = List(Hello, World, Dataneb, Spark)

// List of List

scala> val listOfList = List(List(1,2,3), List(2,6,7), List(2,5,3))

listOfList: List[List[Int]] = List(List(1, 2, 3), List(2, 6, 7), List(2, 5, 3))

scala> val emptyList = List()

emptyList: List[Nothing] = List()


Similarly, Map:


scala> val m = Map("one" -> 1, "two" -> 2 )

m: scala.collection.immutable.Map[String,Int] = Map(one -> 1, two -> 2)

scala> m("two")

res163: Int = 2


Set: calling a set with a value returns a Boolean that tells you whether the value is a member

scala> val s = Set("Apple", "Orange", "Banana")

s: scala.collection.immutable.Set[String] = Set(Apple, Orange, Banana)

scala> s("Apple")

res164: Boolean = true

scala> s("Grapes")

res165: Boolean = false
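
Looking up a missing key with m("three") throws a NoSuchElementException, so it helps to know the safer lookups. This is a minimal sketch using a map like the one above; the variable names are only for illustration:

```scala
// A small map like the one in the walkthrough
val numbers = Map("one" -> 1, "two" -> 2)

// get returns an Option: Some(value) if the key exists, None otherwise
val found = numbers.get("two")
val missing = numbers.get("three")

// getOrElse supplies a fallback value instead of throwing
val fallback = numbers.getOrElse("three", 0)

// "Adding" to an immutable Map returns a new Map; the original is unchanged
val bigger = numbers + ("three" -> 3)

println(found)
println(missing)
println(fallback)
println(bigger.size)
```

Because the default Map, List and Set are immutable, operations such as + never modify the original collection, which fits well with how Spark treats data.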


Arithmetic operators: + (add), - (subtract), * (multiply), / (divide), % (remainder), for example,

scala> val (x, y) = (5, 8)

x: Int = 5

y: Int = 8


scala> y%x

res95: Int = 3


scala> res95 + 7

res110: Int = 10


scala> "Hello" + " World"

res0: String = Hello World


Relational operators ==, !=, <, >, >=, <= for example,

scala> y > x

res96: Boolean = true


Logical operators &&, ||, ! for example,


scala> !(y>x && x>y)

res98: Boolean = true


Assignment operators =, +=, %= etc. work like in other languages; for example, x += y is the same as x = x + y:

scala> var (x, y) = (5, 8)

x: Int = 5

y: Int = 8


scala> x+=y

scala> x

res102: Int = 13


Array of integers, combining println with index access

scala> val a = Array(1, 2, 3)

a: Array[Int] = Array(1, 2, 3)


scala> println(s"Sum is ${a(0) + a(1) + a(2)}")

Sum is 6


Functions are defined with def (ignore the | character). Here the signature (x: Int, y: Int): (Int, Int) means the function takes two integer arguments and returns a tuple of two integers.

scala> def squareOfNumbers(x: Int, y: Int): (Int,Int) = {(x*x, y*y)

| // a multi-line body must be wrapped in curly braces {}

| }

squareOfNumbers: (x: Int, y: Int)(Int, Int)

scala> squareOfNumbers(2,3)

res131: (Int, Int) = (4,9)


Lambda (anonymous) functions: if you don't specify a data type, the Scala compiler infers it where it can.

scala> (x:Int) => x+x

res132: Int => Int = <function1>


Int => Int means the function takes an integer and returns an integer

scala> val func: Int => Int = x => x + x

func: Int => Int = <function1>

scala> func(3)

res133: Int = 6


This one takes two integers and returns one integer; the first _ stands for the first input, the second _ for the second input, and so on.

scala> val underscoreFunc: (Int, Int) => Int = (_ * 3 + _ * 2)

underscoreFunc: (Int, Int) => Int = <function2>

scala> underscoreFunc(7, 5)

res134: Int = 31
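
The underscore shorthand is most often seen when passing small functions to collection methods. A short sketch on a plain Scala List (the list and variable names are illustrative):

```scala
val nums = List(1, 2, 3, 4)

// _ * 2 is shorthand for n => n * 2
val doubled = nums.map(_ * 2)

// _ % 2 == 0 is shorthand for n => n % 2 == 0
val evens = nums.filter(_ % 2 == 0)

// _ + _ is shorthand for (a, b) => a + b
val sum = nums.reduce(_ + _)

println(doubled)
println(evens)
println(sum)
```

Each underscore stands for one argument and can be used only once per argument; if you need to mention an argument twice, write the named form such as n => n * n.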


 

if-else statements, for example

scala> x

res139: Int = 5

scala> if (x==5) { println("five") } // curly braces are optional here; they are required only for a multi-line body

five

scala> println(if (x==4) println("Hello") else "Bye")

Bye
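
In Scala, if-else is an expression that returns a value, which is why it can be passed to println above. The sketch below assigns the result directly to a val and chains else-if branches; temperature, label and grade are hypothetical names chosen for this example:

```scala
// Hypothetical input value
val temperature = 30

// if-else returns a value, so no separate ternary operator is needed
val label = if (temperature > 25) "warm" else "cool"

// else-if chains work the same way inside a function body
def grade(score: Int): String =
  if (score >= 90) "A"
  else if (score >= 75) "B"
  else "C"

println(label)
println(grade(80))
```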


Loops: while, do-while and for


scala> var i = 0

i: Int = 0

scala> while (i<5) {println(i); i+=1}

0

1

2

3

4

scala> do {println(i); i-=1} while (i>0)

5

4

3

2

1


In Scala, <- acts as a generator; read it as "for x in 1 to 5", similar to Python's for x in range(1, 6)

scala> for (x <- 1 to 5) println(x)

1

2

3

4

5


Pattern matching, for example;


scala> def patternMatch (x: Int) :String = x match {

| case 1 => "one"

| case 2 => "two"

| case _ => "unknown"

| }

patternMatch: (x: Int)String


scala> patternMatch(2)

res40: String = two


scala> patternMatch(4)

res41: String = unknown
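
Pattern matching is not limited to literal values. As a hedged sketch, the hypothetical function describe below also matches on types, uses a guard (the if after the pattern) and destructures a tuple:

```scala
// describe is an illustrative name, not part of the walkthrough above
def describe(value: Any): String = value match {
  case n: Int if n < 0 => "negative integer"               // type pattern with a guard
  case n: Int          => s"integer $n"                    // any other Int
  case s: String       => s"string of length ${s.length}"  // type pattern
  case (a, b)          => s"pair of $a and $b"             // tuple destructuring
  case _               => "something else"                 // default case
}

println(describe(-3))
println(describe(7))
println(describe("Spark"))
println(describe((1, "x")))
println(describe(2.5))
```

Cases are tried from top to bottom, so the guarded case must come before the general Int case.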

 

Classes are defined much like in other languages, for example

scala> class Dog(breed: String){

| var br: String = breed

| def bark = "Woof woof!"

| private def eat(food: String) =

| println(s"I am eating $food")

| }

defined class Dog


scala> val myDog = new Dog("pitbull")

myDog: Dog = Dog@62882596

scala> myDog.br

res155: String = pitbull

scala> myDog.bark

res156: String = Woof woof!
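
A constructor parameter can also be turned into a field directly by marking it val or var, which avoids a separate field such as br. A small sketch with a hypothetical Cat class:

```scala
// val makes name a read-only field, var makes age a mutable field
class Cat(val name: String, var age: Int) {
  def describe: String = s"$name is $age years old"
}

val cat = new Cat("Milo", 2)

// age is a var, so it can be updated after construction
cat.age += 1

println(cat.name)
println(cat.describe)
```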


Case classes are useful when performing data operations, for example

scala> case class Order(orderNum: Int, orderItem: String)

defined class Order

scala> val myOrder = Order(123, "iPhone")

myOrder: Order = Order(123,iPhone)

scala> val anotherOrder = Order(124, "macBook")

anotherOrder: Order = Order(124,macBook)

scala> myOrder.orderItem

res158: String = iPhone
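
Case classes also come with value-based equality, a copy method and pattern-matching support. The sketch below redefines the same Order case class so that it is self-contained; the other names are illustrative:

```scala
case class Order(orderNum: Int, orderItem: String)

val first = Order(123, "iPhone")
val same = Order(123, "iPhone")

// Case classes compare by field values, not by reference
val equal = first == same

// copy creates a new instance with selected fields changed
val updated = first.copy(orderItem = "iPad")

// Pattern matching extracts the fields
val summary = updated match {
  case Order(num, item) => s"Order $num contains $item"
}

println(equal)
println(updated)
println(summary)
```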


 

Exercise


For Spark, most of the time you will be writing lambda functions. I have rarely seen complex functions written to transform data in Spark, because Spark has built-in transformations that take care of the complex work, which you will learn about soon.
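
To give a feel for that style without needing a Spark session, here is a word count written with lambdas on a plain Scala List. This is only a sketch on local collections: the sample lines are made up, and in Spark you would call similarly named methods (such as flatMap, map and reduceByKey) on an RDD instead of a List.

```scala
// Made-up input, standing in for lines of a text file
val lines = List("hello spark", "hello scala")

val wordCounts = lines
  .flatMap(line => line.split(" "))   // split each line into words
  .map(word => (word, 1))             // pair each word with a count of 1
  .groupBy(pair => pair._1)           // group the pairs by word
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word

println(wordCounts)
```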


For practice, try these examples.


Example 1: Area of Circle

scala> def areaCircle(radius:Double ) : Double = 3.14 * radius * radius

areaCircle: (radius: Double)Double

scala> areaCircle(5)

res17: Double = 78.5


Example 2: Sum of Squares of input numbers

scala> def sumOfSquares(x: Int, y:Int) : Int = x*x + y*y

sumOfSquares: (x: Int, y: Int)Int

scala> sumOfSquares(2,3)

res18: Int = 13


Example 3: Reverse the Sign of input number

scala> def reverseTheSign (x: Int) : Int = -x

reverseTheSign: (x: Int)Int

scala> reverseTheSign(-6)

res23: Int = 6

scala> reverseTheSign(6)

res24: Int = -6


Example 4: Factorial of a number (to explain recursion); note how the function calls itself. The x<=1 base case also stops the recursion for 0 and negative inputs.

scala> def factorial (x: Int) :Int = if (x<=1) 1 else factorial(x-1)*x

factorial: (x: Int)Int

scala> factorial(4)

res26: Int = 24
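
The recursive version above adds a stack frame for every call and returns an Int, which overflows from 13! onwards. As an alternative sketch (not the version used in this post), a tail-recursive helper with an accumulator and a BigInt result avoids both problems; factorialTailRec and loop are illustrative names:

```scala
import scala.annotation.tailrec

def factorialTailRec(n: Int): BigInt = {
  // @tailrec makes the compiler verify that loop is compiled into a loop
  @tailrec
  def loop(remaining: Int, acc: BigInt): BigInt =
    if (remaining <= 1) acc
    else loop(remaining - 1, acc * remaining)

  loop(n, BigInt(1))
}

println(factorialTailRec(4))
println(factorialTailRec(20))
```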


Example 5: Defining objects and methods (ignore |)

scala> object MyObject{

| val MyVal = 1

| def MyMethod = "Hello"

| }

defined object MyObject

scala> MyObject.MyMethod

res30: String = Hello


Objects can also refer to members of other objects, for example;

scala> object Foo {val x = 1}

defined object Foo

scala> object Bar {val x = 2}

defined object Bar

scala> object fooBar {

| val y = Bar.x

| }

defined object fooBar

scala> fooBar.y

res31: Int = 2


Example 6: Sum of Squares using Lambda or anonymous func

scala> val z = (x:Int, y:Int) => x*x + y*y

z: (Int, Int) => Int = <function2>

scala> z(2,3)

res34: Int = 13


Example 7: Filtering the list with anonymous func

scala> List(1,2,3,4,5,6).filter(x => x % 2 == 0)

res39: List[Int] = List(2, 4, 6)


Example 8: For loops with yield

scala> for (x <- 1 to 5) yield x

res42: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 3, 4, 5)

scala> for (x <- 1 to 3; y <- Array("Hello","World")) yield (x, y)

res47: scala.collection.immutable.IndexedSeq[(Int, String)] = Vector((1,Hello), (1,World), (2,Hello), (2,World), (3,Hello), (3,World))
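
A for with yield can also filter with an if guard, and the result is roughly what you would get by chaining filter and map. A minimal sketch with illustrative names:

```scala
// for-comprehension with a guard: keep even numbers, then square them
val evenSquares = for (x <- 1 to 10 if x % 2 == 0) yield x * x

// Roughly the same computation written with filter and map
val sameThing = (1 to 10).filter(_ % 2 == 0).map(x => x * x)

println(evenSquares)
println(sameThing)
```

Both forms are common in Spark code; the chained filter and map style is closer to what Spark transformations look like.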


That's all, guys! If you have any questions, please mention them in the comments section below. Thank you!




