
Dataneb Team
Jun 9, 2018
Updated: Nov 9, 2019

In this Spark Scala tutorial you will learn how to download and install:

Apache Spark (on Windows)
Java Development Kit (JDK)
Eclipse Scala IDE

By the end of this tutorial you will be able to run Apache Spark with Scala on a Windows machine and in the Eclipse Scala IDE.

JDK Download and Installation

1. First, download the JDK (Java Development Kit) from this link. If Java is already installed on your machine, skip ahead to the Apache Spark download and installation.

I have already installed Java SE 8u171/8u172 (Windows x64) on my machine. Java SE 8u171 means Java Standard Edition 8 Update 171. The update number changes regularly, so just download the latest update available at the time and follow these steps.

2. Accept the license agreement and choose the OS type. In my case it is the Windows 64-bit platform.

3. Double-click the downloaded executable file (jdk*.exe; ~200 MB) to start the installation.

Note down the destination path where the JDK is being installed, then complete the installation (in this case it says Install to: C:\Program Files\Java\jdk1.8.0_171\).

Apache Spark Download & Installation

1. Download a pre-built version of Apache Spark from this link. Again, don't worry about the version; it might be different for you. Choose the latest Spark release from the drop-down menu and set the package type to pre-built for Apache Hadoop.

2. If necessary, download and install WinRAR so that you can extract the .tgz file that you just downloaded.

3. Create a separate directory named spark on the C drive. Now extract the Spark files using WinRAR and copy the contents from the downloads folder to C:\spark.

Note that you should end up with a directory structure like C:\spark\bin, C:\spark\conf, etc.

Configuring the Windows environment for Apache Spark

4. Make sure the "Hide file extension properties" option in File Explorer (View tab) is unchecked, so that full file names are visible. Now go to the C:\spark\conf folder and rename the log4j.properties.template file to log4j.properties.

You should see the filename as log4j.properties and not just log4j.

5. Now open log4j.properties with WordPad and change the line log4j.rootCategory=INFO, console to log4j.rootCategory=ERROR, console.

Save the file and exit. This change makes Spark log only ERROR messages when it runs, instead of every INFO message.
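The change in step 5 touches a single line of the file. As a rough sketch (not part of the original setup), the same substitution can be expressed in Scala; the helper name quietRootLogger is hypothetical, and the sample lines are made up for illustration rather than copied from the real template.

```scala
// Hypothetical helper: rewrites the rootCategory line so that only ERROR
// messages reach the console. Every other line is returned unchanged.
def quietRootLogger(lines: Seq[String]): Seq[String] =
  lines.map { line =>
    if (line.trim.startsWith("log4j.rootCategory=")) "log4j.rootCategory=ERROR, console"
    else line
  }

// Illustrative input, mirroring the edit made by hand in step 5.
val before = Seq(
  "# comment lines are left untouched",
  "log4j.rootCategory=INFO, console"
)
val after = quietRootLogger(before)
after.foreach(println)
```

This can be pasted into any Scala REPL. Editing the file by hand in WordPad, as described above, has exactly the same effect.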

6. Now create the C:\winutils\bin directory.

Download winutils.exe from GitHub and extract all the files. You will find folders for multiple Hadoop versions inside; focus on the Hadoop version you selected as the package type (pre-built for Hadoop 2.x/3.x) in Step 1 of the Spark download.

Copy all the files (.dll, .exe, etc.) from that Hadoop version's folder into the C:\winutils\bin folder. This step tricks Windows into behaving as if we are running Hadoop. C:\winutils (the parent of this bin folder) will act as the Hadoop home.

7. Now right-click the Windows menu and select Control Panel --> System and Security --> System --> “Advanced System Settings”, then click the “Environment Variables” button.

Click the "New" button under User variables and add 3 variables:

SPARK_HOME c:\spark
JAVA_HOME (path you noted while JDK Installation Step 3, for example C:\Program Files\Java\jdk1.8.0_171)
HADOOP_HOME c:\winutils

8. Add the following 2 paths to your PATH user variable. Select the "PATH" user variable and click Edit; if it is not present, create it.

%SPARK_HOME%\bin
%JAVA_HOME%\bin
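If you want a quick sanity check of steps 7 and 8, the sketch below lists which of the three variables are missing from the current process environment. It is only an illustration: the helper name missingVars is hypothetical, and it assumes the variable names were typed in upper case exactly as shown above. It can be pasted into any Scala REPL that was started after the variables were saved.

```scala
// Variable names taken from step 7.
val requiredVars = Seq("SPARK_HOME", "JAVA_HOME", "HADOOP_HOME")

// Hypothetical helper: returns the required variables that are absent or
// blank in the given environment map.
def missingVars(env: Map[String, String]): Seq[String] =
  requiredVars.filter(name => env.get(name).forall(_.trim.isEmpty))

// Check the environment of the current process.
val missing = missingVars(sys.env)
if (missing.isEmpty) println("All three variables are set.")
else println("Not set: " + missing.mkString(", "))
```

The check only confirms that the variables exist; it does not verify that the paths they point to are correct, so it is still worth opening each folder by hand.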

Download and Install Scala IDE

1. Now install the latest Scala IDE from here. I have installed Scala-SDK-4.7 on my machine. Download the zipped file and extract it. That's it.

2. Under the Scala-SDK folder you will find an eclipse folder; extract it to c:\eclipse.

Run eclipse.exe and it will open the IDE (we will use this later).

Now test it out!

Open a Windows command prompt in administrator mode: right-click Command Prompt in the search menu and choose Run as administrator.
Type java -version and hit Enter to check that Java is properly installed. If you see the Java version, Java is installed properly.
Type cd c:\spark and hit Enter. Then type dir and hit Enter to get a directory listing.
Look for any text file, like README.md or CHANGES.txt.
Type spark-shell and hit Enter.
At this point you should have a scala> prompt. If not, double-check the steps above and the environment variables; after making any change, close the command prompt, open a new one, and retry.
Type val rdd = sc.textFile("README.md") and hit Enter. Now type rdd.count() and hit Enter.
You should get a count of the number of lines in the README file! Congratulations, you just ran your first Spark program! We just created an RDD from the README text file and ran the count action on it. Don't worry, we will go through this in detail in the next sections.
Hit control-D to exit the spark shell, and close the console window.
You’ve got everything set up! Hooray!
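For copy-and-paste convenience, here is the same test as a short sketch with plain straight quotes (curly quotes copied from a web page will not compile). It assumes it is run at the scala> prompt of spark-shell, where sc is already defined, and that the shell was started from c:\spark so that README.md resolves. The filter step and the search word "Spark" are an illustrative addition, not part of the steps above.

```scala
// Run inside spark-shell: `sc` (the SparkContext) is created by the shell.
val rdd = sc.textFile("README.md")   // lazily references the file; nothing is read yet

// count() is an action, so this line triggers the actual read.
val total = rdd.count()

// Illustrative extra: count only the lines that contain the word "Spark".
val withSpark = rdd.filter(line => line.contains("Spark")).count()

println(s"$total lines in total, $withSpark of them mention Spark")
```

The exact numbers depend on the Spark release you downloaded, so none are shown here.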

Note for Python lovers: to install PySpark, continue to this blog.

That's all!

If it's not running, don't worry. Please mention it in the comments section below and I will help you with the installation process. Thank you.

Next: Just enough Scala for Spark

5 Comments

JOSEPH Blessingh

Sep 25, 2019

I will tell you exactly what had happened. I just kept PySpark installed with pip long time back, not recently. And then I tried to install Apache Spark on top of it which was creating the issue I think so, I am not sure.

Another issue which I came across was the jar folder inside the Spark installation folder, it didn't have all the files. So I reinstalled it and it worked like a charm!!

Thank you Hina and WhiteSand.

And Amazing Content WhiteSand, Thank you so much!!

Dataneb Team

Sep 24, 2019

Hello Joseph, refer below post for pyspark installation. It's working perfectly fine with Python 3.7.

https://www.dataneb.com/post/installing-spark-on-windows-pyspark

JOSEPH Blessingh

Sep 24, 2019

Hi Hina,

So I have installed Pyspark on python already. But does it conflict with this installation process??

H. Singh

Sep 23, 2019

Hey Joseph, Are you trying to install PySpark (Python Spark)? Because above Spark installation is totally meant for pre-built Scala version 2.11 (windows environment). For Spark (with Scala), make sure java path, spark and winutils (hadoop) home are setup correctly. Manually navigate to those path and double check path details. Run java -version after setting up the path and see if you are getting correct java version.

If you want to run PySpark the installation process is little different. You need to install Python first and mention path details. Refer this - https://pypi.org/project/pyspark/

Thanks

JOSEPH Blessingh

Sep 21, 2019

Error after typing in spark-shell in the cmd:

\Python37-32\Scripts\..' was unexpected at this time.

All the steps were followed correctly. Every Directory is the same as the procedure told by you. I tried Java installation inside Program Files and outside Program Files as well. Both, don't solve the issue.

What else could be the issue? Me and my Friend face the same error

Installing Apache Spark and Scala (Windows)
