Spark
Introduction
Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs. [1] This installation of Spark comes bundled with Apache Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. [2]
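As a quick illustration of these high-level APIs, here is a minimal word-count sketch in Scala. It is not part of this cluster's setup: the application name, the object name WordCount, and the input path input.txt are purely illustrative.

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Build (or reuse) a SparkSession, the entry point of the high-level API
    val spark = SparkSession.builder.appName("WordCount").getOrCreate()
    import spark.implicits._

    // "input.txt" is a hypothetical input file, used here for illustration only
    val words = spark.read.textFile("input.txt")
      .flatMap(_.split("\\s+"))

    // Count the occurrences of each word and print the result
    words.groupByKey(identity).count().show()

    spark.stop()
  }
}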
Installed version
The currently installed version is:
on hpc-env/6.4: Spark/2.4.0-intel-2018a-Hadoop-2.7
Using Spark
If you want to find out more about Spark on the HPC Cluster, you can use the command
module spider Spark
This will show you basic information, e.g. a short description and the currently installed version.
To load the desired version of the module, use the command, e.g.
module load hpc-env/6.4
module load Spark
Always remember: these commands are case sensitive!
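Once the module is loaded, Spark's standard command line tools (spark-shell, spark-submit, pyspark) should be available on your PATH. As a minimal sketch, the following Scala lines can be pasted into spark-shell, which creates the SparkContext sc for you automatically:

val data = sc.parallelize(1 to 1000) // distribute the numbers 1..1000 across the workers
val sum  = data.reduce(_ + _)        // aggregate them in parallel
println(s"Sum: $sum")                // prints Sum: 500500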
Documentation
An informative quick start guide can be found at https://spark.apache.org/docs/2.4.0/quick-start.html, and the documentation page at https://spark.apache.org/docs/2.4.0/. If you need more information about Hadoop, consider visiting Apache's website: https://hadoop.apache.org/