Rbatchtools

Introduction

"As a successor of the packages BatchJobs and BatchExperiments, batchtools provides a parallel implementation of Map for high performance computing systems managed by schedulers like Slurm, Sun Grid Engine, OpenLava, TORQUE/OpenPBS, Load Sharing Facility (LSF) or Docker Swarm (see the setup section in the vignette)."[1]

One advantage of batchtools is that you can use it from within the familiar R environment; there is no need to write job scripts yourself. In addition, it allows simple parallelization of independent tasks.

How to use batchtools

First of all, you need to load a recent R module on one of the login nodes (batchtools may not be available in all R installations on the cluster):

$ module load hpc-env/8.3
$ module load R

Next, you start up R (still on the login node); all of the following commands are R commands (the R prompt is omitted here for easy copy and paste). The package is loaded with:

library(batchtools)

Next, we create a registry that holds the tasks which will later be submitted as individual jobs to the cluster (you can of course adapt the directory td to your needs):

# Directory for the registry below $WORK/R
td = tempfile(pattern = "test", tmpdir = paste0(Sys.getenv("WORK"), "/R"))
# Create the registry; job definitions, logs, and results are stored there
reg = makeRegistry(file.dir = td, seed = 1)
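
If batchtools is not already configured for the cluster's scheduler, it can be pointed at it via a configuration file (e.g. ~/.batchtools.conf.R), which is read when the registry is created. A minimal sketch, assuming the scheduler is Slurm and a job template file named batchtools.slurm.tmpl exists (both the scheduler choice and the file names are assumptions, adapt them to your setup):

# ~/.batchtools.conf.R -- sourced by makeRegistry()
# Assumption: Slurm scheduler and a template file "batchtools.slurm.tmpl"
# that defines how the generated job scripts look
cluster.functions = makeClusterFunctionsSlurm(template = "batchtools.slurm.tmpl")
# Resources applied when submitJobs() is called without a resources argument
default.resources = list(walltime = 3600, memory = 1024)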

Now we define a function for the task we want to solve, in our case the approximation of Pi:

# Monte Carlo approximation of Pi: the fraction of points drawn uniformly
# in the unit square that falls inside the unit circle is Pi/4
piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}
# Quick local test with a fixed seed
set.seed(42)
piApprox(1000)

In the last two lines above, we test the function locally with a fixed seed and n = 1000 (i.e. 2000 random numbers). The output should be close to 3.14. Next, we create ten jobs, each of which will evaluate the function with n = 1e5:

batchMap(fun = piApprox, n = rep(1e5, 10))
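
Before submitting, it can be useful to run one of the jobs interactively in the current R session to catch programming errors early. A quick check with testJob(), here for the first job:

# Run job 1 locally instead of on the cluster; the result should again be close to 3.14
testJob(id = 1)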

Once the jobs have been created, we can submit them:

submitJobs(resources = list(walltime = 3600, memory = 1024))

Setting the resources is optional; if omitted, defaults apply. In this example, the walltime is set to 1 hour (3600 seconds) and the memory to 1024 MB. batchtools now creates the job scripts from a template and submits them to the cluster. We can check the status and wait for the jobs to complete:

getStatus()
waitForJobs()
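
If getStatus() reports failed or expired jobs, batchtools provides helpers to inspect and resubmit them. A small sketch (the resource values are examples):

# Find jobs that raised an error or disappeared from the scheduler (e.g. walltime exceeded)
ids.err = findErrors()
ids.exp = findExpired()
# Inspect the log of the first failed job, if any
if (nrow(ids.err) > 0) writeLines(getLog(id = ids.err$job.id[1]))
# Resubmit the problematic jobs with more generous resources
submitJobs(ids = rbind(ids.err, ids.exp),
           resources = list(walltime = 7200, memory = 2048))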

Finally, when the jobs have finished, you can load the results. In our case, we calculate the mean of the ten approximations:

mean(sapply(1:10, loadResult))
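
Instead of loading each result individually, all results can also be collected in one call. An equivalent way to compute the mean, assuming all ten jobs finished successfully:

# Gather all results into a list and average them
mean(unlist(reduceResultsList()))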