Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs. Spark comes with Apache Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
The currently installed version is:
On hpc-env/6.4: Spark/2.4.0-intel-2018a-Hadoop-2.7
If you want to find out more about Spark on the HPC cluster, you can use the command
module spider Spark
This will show you basic information, e.g. a short description and the currently installed version.
To load the desired version of the module, use the following commands, e.g.
module load hpc-env/6.4
module load Spark
Always remember: these commands are case-sensitive!
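As a quick check after loading the module, you can verify the installation and run one of Spark's bundled examples. This is a sketch, assuming the full module name shown above and an EasyBuild-style module that sets the $EBROOTSPARK environment variable; the exact jar path may differ on your system.

```shell
# Load the environment and the Spark module
module load hpc-env/6.4
module load Spark/2.4.0-intel-2018a-Hadoop-2.7

# Verify that Spark is on the PATH and print its version
spark-submit --version

# Run the bundled SparkPi example on 4 local cores.
# $EBROOTSPARK (set by EasyBuild-based modules, an assumption here)
# points to the Spark installation directory.
spark-submit --master 'local[4]' \
    --class org.apache.spark.examples.SparkPi \
    "$EBROOTSPARK"/examples/jars/spark-examples_*.jar 100
```

Running SparkPi locally like this is a convenient smoke test before submitting larger jobs to the cluster's batch system.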