Installing Apache Spark and Scala (Windows)
Updated: Nov 9, 2019
In this Spark Scala tutorial you will learn how to download and install:
Apache Spark (on Windows)
Java Development Kit (JDK)
Eclipse Scala IDE
By the end of this tutorial you will be able to run Apache Spark with Scala on a Windows machine, using the Eclipse Scala IDE.
JDK Download and Installation
1. First, download the JDK (Java Development Kit) from this link. If the JDK is already installed on your machine, skip ahead to Apache Spark Download & Installation.
I have installed Java SE 8u171/8u172 (Windows x64) on my machine. Java SE 8u171 means Java Standard Edition 8, Update 171. Update numbers change frequently, so download the latest Java 8 update available at the time and follow these steps (Spark 2.x releases run on Java 8, not on newer major Java versions).

2. Accept the license agreement and choose the OS type. In my case it is the Windows x64 (64-bit) platform.

3. Double-click the downloaded executable (jdk*.exe, about 200 MB) to start the installation.
Note down the destination path where the JDK is being installed, then complete the installation (in this case the installer shows Install to: C:\Program Files\Java\jdk1.8.0_171\).
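The release called 8u171 on the download page is the same one installed into the folder jdk1.8.0_171, and the two names follow a fixed pattern. The short Java sketch below converts the folder name into the short form so you can confirm which update you installed. The class name JdkVersionNames is hypothetical, and the pattern only covers Java 8 style folder names (later JDKs use names such as jdk-11.0.2, which it rejects).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Relates the Java 8 install folder name ("jdk1.8.0_171") to its short name ("8u171"). */
final class JdkVersionNames {

    // "jdk1.8.0_171": group 1 = major version ("8"), group 2 = update number ("171")
    private static final Pattern JAVA8_FOLDER = Pattern.compile("jdk1\\.(\\d+)\\.0_(\\d+)");

    /** Converts a folder name such as "jdk1.8.0_171" to the short form "8u171". */
    static String toShortName(String folderName) {
        Matcher m = JAVA8_FOLDER.matcher(folderName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a Java 8 style JDK folder name: " + folderName);
        }
        return m.group(1) + "u" + m.group(2);
    }

    public static void main(String[] args) {
        String folder = args.length > 0 ? args[0] : "jdk1.8.0_171";
        System.out.println(folder + " is Java SE " + toShortName(folder)); // jdk1.8.0_171 is Java SE 8u171
    }
}
```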

Apache Spark Download & Installation
1. Download a pre-built version of Apache Spark from this link. Again, don't worry about the version; it might be different for you. Choose the latest Spark release from the drop-down menu and set the package type to pre-built for Apache Hadoop. Note which Hadoop version the package is built for, because you will need it in step 6.
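The Hadoop version a package was built for is part of the downloaded file name. Assuming the usual naming spark-&lt;version&gt;-bin-hadoop&lt;version&gt;.tgz (for example spark-2.4.4-bin-hadoop2.7.tgz; your file may differ), this Java sketch extracts both versions. The class name SparkPackageName is hypothetical, and names that do not follow this pattern are rejected rather than guessed at.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Reads the Spark and Hadoop versions out of a pre-built Spark package file name. */
final class SparkPackageName {

    // group 1 = Spark version ("2.4.4"), group 2 = Hadoop version ("2.7" or just "3")
    private static final Pattern NAME =
            Pattern.compile("spark-(\\d+\\.\\d+\\.\\d+)-bin-hadoop(\\d+(?:\\.\\d+)*)\\.tgz");

    static String sparkVersion(String fileName) {
        return match(fileName).group(1);
    }

    static String hadoopVersion(String fileName) {
        return match(fileName).group(2);
    }

    private static Matcher match(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected Spark package name: " + fileName);
        }
        return m;
    }

    public static void main(String[] args) {
        String file = args.length > 0 ? args[0] : "spark-2.4.4-bin-hadoop2.7.tgz";
        System.out.println("Spark " + sparkVersion(file) + ", built for Hadoop " + hadoopVersion(file));
    }
}
```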

2. If necessary, download and install WinRAR so that you can extract the .tgz file that you just downloaded.
3. Create a separate directory named spark on the C drive. Extract the Spark files using WinRAR, then copy the contents of the extracted folder from your Downloads folder to C:\spark.

Please note that you should end up with a directory structure like C:\spark\bin, C:\spark\conf, etc., as shown above.
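A common slip is copying the extracted folder itself, which leaves the files one level too deep (C:\spark\spark-...\bin). The Java sketch below checks for the bin and conf folders named above and, if they are missing, looks one level down for them. The class name SparkLayoutCheck is hypothetical, and checking only bin and conf is a deliberate simplification.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Checks that C:\spark (or another folder) directly contains bin and conf. */
final class SparkLayoutCheck {

    // Folders named in the tutorial; a real Spark download contains more (jars, python, ...).
    static final String[] EXPECTED_DIRS = {"bin", "conf"};

    /** Returns the expected sub-folders that are missing directly under sparkHome. */
    static List<String> missingDirs(File sparkHome) {
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED_DIRS) {
            if (!new File(sparkHome, name).isDirectory()) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Returns a child folder that does contain bin and conf (a copy one level too deep), or null. */
    static File findNestedSparkHome(File sparkHome) {
        File[] children = sparkHome.listFiles(File::isDirectory);
        if (children == null) {
            return null; // sparkHome does not exist or is not a folder
        }
        for (File child : children) {
            if (missingDirs(child).isEmpty()) {
                return child;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        File home = new File(args.length > 0 ? args[0] : "C:\\spark");
        List<String> missing = missingDirs(home);
        if (missing.isEmpty()) {
            System.out.println(home + " has the expected layout");
            return;
        }
        System.out.println("Missing under " + home + ": " + missing);
        File nested = findNestedSparkHome(home);
        if (nested != null) {
            System.out.println("Found them in " + nested + "; move its contents up into " + home);
        }
    }
}
```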
Configuring the Windows environment for Apache Spark
4. Make sure file name extensions are visible in File Explorer: on the View tab, check "File name extensions" (equivalently, uncheck "Hide extensions for known file types" in Folder Options). Now go to the C:\spark\conf folder and rename the file log4j.properties.template to log4j.properties.
You should see the file name as log4j.properties, not just log4j.
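If you prefer not to depend on Explorer's extension settings at all, the rename can be done programmatically. This Java sketch (hypothetical class name EnableLog4jConfig) uses Files.move, which refuses to overwrite an existing log4j.properties, so an already edited file is not clobbered.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Renames conf\log4j.properties.template to conf\log4j.properties. */
final class EnableLog4jConfig {

    /** Returns the path of the renamed file; throws if the template is missing or the target exists. */
    static Path renameTemplate(Path confDir) throws IOException {
        Path template = confDir.resolve("log4j.properties.template");
        Path target = confDir.resolve("log4j.properties");
        // Without REPLACE_EXISTING, Files.move throws FileAlreadyExistsException if target exists.
        return Files.move(template, target);
    }

    public static void main(String[] args) throws IOException {
        Path conf = Paths.get(args.length > 0 ? args[0] : "C:\\spark\\conf");
        System.out.println("Created " + renameTemplate(conf));
    }
}
```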
5. Now open log4j.properties in WordPad and change the line log4j.rootCategory=INFO, console to log4j.rootCategory=ERROR, console.
Save the file and exit. This change makes Spark print only ERROR (and more severe) log messages to the console when it runs, instead of all INFO messages.
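The same edit can be made without WordPad. This Java sketch (hypothetical class name QuietSparkLogging) rewrites only the level in the log4j.rootCategory line and keeps the appender list (", console") untouched; commented-out lines are left alone. It assumes the Log4j 1.x template used by Spark 2.x; Spark 3.3 and later moved to Log4j 2 with a differently named template and keys.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sets the root log level in a Log4j 1.x properties file, e.g. INFO -> ERROR. */
final class QuietSparkLogging {

    static final String KEY = "log4j.rootCategory=";

    /** Returns a copy of lines with the root level replaced and the appender list preserved. */
    static List<String> setRootLevel(List<String> lines, String level) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith(KEY)) {
                // Keep everything from the first comma on (the appenders, e.g. ", console").
                String value = trimmed.substring(KEY.length());
                int comma = value.indexOf(',');
                out.add(KEY + level + (comma >= 0 ? value.substring(comma) : ""));
            } else {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "C:\\spark\\conf\\log4j.properties");
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        Files.write(file, setRootLevel(lines, "ERROR"), StandardCharsets.UTF_8);
        System.out.println("Root log level set to ERROR in " + file);
    }
}
```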
6. Now create the C:\winutils\bin directory.
Download winutils from GitHub and extract all the files. You will find folders for multiple Hadoop versions inside it; you only need the one matching the Hadoop version you selected as the package type (pre-built for Hadoop 2.x/3.x) in step 1 of the Spark download.
Copy all the underlying files (all .dll, .exe, etc.) from that Hadoop version folder into the C:\winutils\bin folder. This step makes Windows behave as if Hadoop were installed, because Spark expects to find Hadoop's Windows binaries such as winutils.exe. The parent folder, C:\winutils, will act as the Hadoop home: winutils.exe is looked up in %HADOOP_HOME%\bin.
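The copy in step 6 can also be scripted. This Java sketch (hypothetical class name CopyWinutils) copies every regular file from the folder you pass in (the extracted folder for your Hadoop version, whose exact path depends on your download) into C:\winutils\bin, skipping sub-folders and overwriting files left from an earlier attempt.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Copies winutils.exe and its companion files into the Hadoop home's bin folder. */
final class CopyWinutils {

    /** Copies each regular file in sourceDir into targetDir; returns how many were copied. */
    static int copyFiles(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        int copied = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(sourceDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) { // skip sub-folders
                    Files.copy(entry, targetDir.resolve(entry.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CopyWinutils <folder for your Hadoop version>");
            return;
        }
        Path target = Paths.get("C:\\winutils\\bin");
        System.out.println("Copied " + copyFiles(Paths.get(args[0]), target) + " files into " + target);
    }
}
```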
7. Now right-click the Windows Start button and select Control Panel --> System and Security --> System --> “Advanced system settings”, then click the “Environment Variables” button.
Under User variables, click the "New" button and add these 3 variables:
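SPARK_HOME = C:\spark
JAVA_HOME = the path you noted in JDK installation step 3, for example C:\Program Files\Java\jdk1.8.0_171 (the JDK folder itself, not its bin subfolder)
HADOOP_HOME = C:\winutils (the parent of the bin folder created in step 6)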
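A quick way to catch typos, and the common mistake of pointing HADOOP_HOME at C:\winutils\bin instead of C:\winutils, is to check that each variable's bin folder contains a file you expect there. This Java sketch (hypothetical class name SparkEnvCheck) uses spark-shell.cmd, java.exe and winutils.exe as markers; that choice belongs to the sketch, not to Spark. Run it from a Command Prompt opened after saving the variables.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Checks SPARK_HOME, JAVA_HOME and HADOOP_HOME against a marker file in each bin folder. */
final class SparkEnvCheck {

    // Variable name -> file expected in that folder's bin subdirectory (chosen by this sketch).
    static final Map<String, String> MARKERS = new LinkedHashMap<>();
    static {
        MARKERS.put("SPARK_HOME", "spark-shell.cmd"); // Spark's Windows launcher script
        MARKERS.put("JAVA_HOME", "java.exe");         // present in the JDK's bin folder
        MARKERS.put("HADOOP_HOME", "winutils.exe");   // copied there in step 6
    }

    /** Returns one message per variable that is unset or does not point at the expected folder. */
    static List<String> problems(Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> marker : MARKERS.entrySet()) {
            String name = marker.getKey();
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                problems.add(name + " is not set");
                continue;
            }
            File expected = new File(new File(value.trim(), "bin"), marker.getValue());
            if (!expected.isFile()) {
                problems.add(name + " = " + value + ", but " + expected + " was not found");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Windows already open before the variables were saved still see the old environment.
        List<String> problems = problems(System.getenv());
        System.out.println(problems.isEmpty()
                ? "SPARK_HOME, JAVA_HOME and HADOOP_HOME look correct"
                : String.join(System.lineSeparator(), problems));
    }
}
```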

8. Add the following 2 paths to your PATH user variable. Select the "PATH" user variable and click Edit; if it is not present, create it with New.
%SPARK_HOME%\bin
%JAVA_HOME%\bin
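To confirm the two entries took effect, this Java sketch (hypothetical class name PathCheck) splits PATH on semicolons and compares entries the way Windows treats them: case-insensitively, ignoring surrounding quotes and trailing backslashes. It assumes Windows has already expanded %SPARK_HOME% and %JAVA_HOME% in the PATH seen by a newly opened Command Prompt.

```java
import java.util.Locale;

/** Checks whether SPARK_HOME\bin and JAVA_HOME\bin appear on PATH. */
final class PathCheck {

    /** Trims, drops surrounding quotes and trailing backslashes, and lower-cases a PATH entry. */
    static String normalize(String entry) {
        String s = entry.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        while (s.endsWith("\\")) {
            s = s.substring(0, s.length() - 1);
        }
        return s.toLowerCase(Locale.ROOT); // Windows paths are case-insensitive
    }

    /** True if the semicolon-separated PATH value contains dir. */
    static boolean pathContains(String pathValue, String dir) {
        String wanted = normalize(dir);
        for (String entry : pathValue.split(";")) {
            if (normalize(entry).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        for (String home : new String[] {"SPARK_HOME", "JAVA_HOME"}) {
            String value = System.getenv(home);
            if (value == null) {
                System.out.println(home + " is not set");
                continue;
            }
            String bin = value + "\\bin";
            boolean found = path != null && pathContains(path, bin);
            System.out.println(bin + (found ? " is on PATH" : " is NOT on PATH"));
        }
    }
}
```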

Download and Install Scala IDE
1. Now install the latest Scala IDE from here. I have installed