Rbatchtools
Introduction
"As a successor of the packages BatchJobs and BatchExperiments, batchtools provides a parallel implementation of Map for high performance computing systems managed by schedulers like Slurm, Sun Grid Engine, OpenLava, TORQUE/OpenPBS, Load Sharing Facility (LSF) or Docker Swarm (see the setup section in the vignette)."[1]
One advantage of batchtools is that you can use it from within the familiar R environment, so there is no need to write job scripts yourself. It also makes it simple to parallelize independent tasks.
How to use batchtools
First, load a recent R module on one of the login nodes. batchtools should be available in all R installations on the cluster, but a recent one is recommended:
$ module load hpc-env/8.3
$ module load R
Next, start R (still on the login node). All of the following commands are R commands; the R prompt is omitted for easy copy and paste. The package is loaded with:
library(batchtools)
Next, we create a registry that holds the tasks, which are later submitted to the cluster as individual jobs. You can, of course, adapt the directory stored in td to your needs:
td <- tempfile(pattern="test", tmpdir=paste0(Sys.getenv("WORK"),"/R"))
reg = makeRegistry(file.dir = td, seed = 1)
Now we define a function for the task we want to solve, in our case a Monte Carlo approximation of Pi:
piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}
set.seed(42)
piApprox(1000)
The last two lines above test the function with a fixed seed and n = 1000, i.e. 1000 random points in the unit square (2 × 1000 uniform random numbers). The output should be close to 3.14. Next, we create ten jobs, each of which evaluates the function with n = 1e5:
batchMap(fun = piApprox, n = rep(1e5, 10))
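Since batchtools is described as a parallel implementation of Map, it can help to compare batchMap() with a plain sequential Map() call that runs entirely in the current R session. The sketch below is only an illustration of the idea: batchMap() does not run anything yet, but stores one job per element of n in the registry. The numbers will not match a cluster run, because batchtools seeds each job separately (registry seed plus job id).

```r
# Same Monte Carlo estimator as piApprox above
piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}

set.seed(1)
# Map() calls piApprox once per element of n, one after another;
# batchMap() instead turns each element into a separate cluster job
estimates = unlist(Map(piApprox, n = rep(1e5, 10)))
print(estimates)
print(mean(estimates))
```

For ten tasks of this size the sequential version is still fast; the cluster pays off when each call takes minutes or hours.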
Once the jobs have been created, we can submit them:
submitJobs(resources = list(walltime = 3600, memory = 1024))
Setting the resources is optional; if they are omitted, default values are used. In this example, the walltime is set to 3600 seconds (1 hour) and the memory to 1024 MB. batchtools then creates a job script for each job from a template and submits it to the cluster. We can check the status and wait for the jobs to complete:
getStatus()
waitForJobs()
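waitForJobs() returns TRUE only if all jobs terminated successfully, so its return value can be used to detect failed jobs, and getErrorMessages() lists their error messages. The following sketch shows this in a self-contained local test before anything is sent to the scheduler. It assumes you want a throwaway registry: file.dir = NA creates a temporary one, make.default = FALSE keeps it from replacing the cluster registry created above, and makeClusterFunctionsInteractive() runs the jobs in the current R session. The smaller n = 1e4 is chosen here only to keep the test fast.

```r
library(batchtools)

# Same Monte Carlo estimator as in the walkthrough
piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}

# Temporary registry that does not become the default registry
test.reg = makeRegistry(file.dir = NA, seed = 1, make.default = FALSE)
# Run jobs in this R session instead of submitting them to the scheduler
test.reg$cluster.functions = makeClusterFunctionsInteractive()
saveRegistry(reg = test.reg)

batchMap(fun = piApprox, n = rep(1e4, 10), reg = test.reg)
submitJobs(reg = test.reg)

# TRUE only if every job finished without an error
ok = waitForJobs(reg = test.reg)
if (!ok) print(getErrorMessages(reg = test.reg))

local.estimate = mean(unlist(reduceResultsList(reg = test.reg)))
print(local.estimate)
```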
Finally, once the jobs have finished, we load the results. Here, we compute the mean of the ten estimates:
mean(sapply(1:10, loadResult))
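Why piApprox estimates π, and how precise the averaged result can be expected to be, follows from standard Monte Carlo reasoning. A random point in the unit square falls inside the quarter circle with probability π/4, so four times the hit fraction is an unbiased estimate of π with the standard error shown below:

```latex
\begin{align*}
(X_i, Y_i) &\sim \mathcal{U}\big([0,1]^2\big), \qquad
I_i = \mathbf{1}\{X_i^2 + Y_i^2 \le 1\}, \qquad
\Pr(I_i = 1) = \tfrac{\pi}{4} \\
\hat{\pi}_n &= \frac{4}{n}\sum_{i=1}^{n} I_i, \qquad
\mathbb{E}[\hat{\pi}_n] = \pi, \qquad
\operatorname{SE}(\hat{\pi}_n) = 4\sqrt{\frac{\frac{\pi}{4}\big(1-\frac{\pi}{4}\big)}{n}} = \sqrt{\frac{\pi(4-\pi)}{n}} \\
\operatorname{SE}(\hat{\pi}_{10^3}) &\approx 0.052, \qquad
\operatorname{SE}(\hat{\pi}_{10^5}) \approx 0.0052, \qquad
\operatorname{SE}(\hat{\pi}_{10^6}) \approx 0.0016
\end{align*}
```

Averaging the ten jobs with n = 1e5 is equivalent to a single estimate from 10^6 points, so the final mean should typically lie within a few thousandths of π, while the quick test with piApprox(1000) can easily be off in the second decimal.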