Rbatchtools
Revision as of 15:27, 11 June 2021
Introduction
"As a successor of the packages BatchJobs and BatchExperiments, batchtools provides a parallel implementation of Map for high performance computing systems managed by schedulers like Slurm, Sun Grid Engine, OpenLava, TORQUE/OpenPBS, Load Sharing Facility (LSF) or Docker Swarm (see the setup section in the vignette)."[1]
One advantage of batchtools is that you can use it from within the familiar R environment; there is no need to learn how to write job scripts. In addition, it allows simple parallelization of independent tasks.
How to use batchtools
First of all, you need to load a recent R-module on one of the login nodes (batchtools may not be available in all R-installations on the cluster):
$ module load hpc-env/8.3
$ module load R
Next you start up R (still on the login node); the following commands are all R commands (the R prompt is omitted here for easy copy-and-paste). The package is loaded with:
library(batchtools)
Next we create a registry that contains the tasks which are later submitted as individual jobs to the cluster (you can, of course, change the directory in td to suit your needs):
td <- tempfile(pattern="test", tmpdir=paste0(Sys.getenv("WORK"),"/R"))
reg = makeRegistry(file.dir = td, seed = 1)
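The registry is persisted on disk, so you can quit R and resume the same set of jobs in a later session with loadRegistry(). A minimal sketch, using a temporary directory in place of the $WORK path above:

```r
library(batchtools)

# A registry lives in a directory on disk, so a later R session can resume it.
# Here a temporary directory stands in for the $WORK path used above.
td <- tempfile(pattern = "test")
reg <- makeRegistry(file.dir = td, seed = 1)

# ... later, possibly in a fresh R session:
reg <- loadRegistry(td, writeable = TRUE)  # writeable = TRUE allows modifying/submitting
```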
Now we define a function for the task we want to solve, in our case the approximation of Pi:
piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}
set.seed(42)
piApprox(1000)
In the last two lines above, we test the function with a fixed seed and 2 × 1000 random numbers, i.e. 1000 random points in the unit square. A point lands within distance 1 of the origin with probability π/4 (the area of the quarter circle), so 4 * mean(d <= 1) approximates π and the output should be close to 3.14. Next, we create a list of ten jobs, each of which will evaluate the function with n=1e5:
batchMap(fun = piApprox, n = rep(1e5, 10))
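Before submitting anything, you can run a single job in the current session with testJob() to catch programming errors early. A self-contained sketch with a throwaway registry (file.dir = NA places it in a temporary directory; the job ids and sizes here are just for illustration):

```r
library(batchtools)

# Throwaway registry in a temporary directory (file.dir = NA uses tempdir()).
reg <- makeRegistry(file.dir = NA, seed = 1)

piApprox <- function(n) {
  nums <- matrix(runif(2 * n), ncol = 2)
  4 * mean(sqrt(nums[, 1]^2 + nums[, 2]^2) <= 1)
}
batchMap(fun = piApprox, n = rep(1e4, 3))

# Execute job 1 in the current R session and return its result (close to pi),
# without touching the scheduler at all.
testJob(id = 1)
```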
Once the jobs have been created, we can submit them:
submitJobs(resources = list(walltime = 3600, memory = 1024))
Setting the resources is optional; if they are not specified, defaults are used. In the example, the walltime is set to 1 hour and the memory to 1024 MB. batchtools now creates the job scripts from a template and submits them to the cluster. We can check the status and wait for the jobs to complete:
getStatus()
waitForJobs()
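If getStatus() reports failed jobs, batchtools provides helpers such as findErrors() and getErrorMessages() to diagnose them. A self-contained sketch with a deliberately failing job, run locally via makeClusterFunctionsInteractive() so no scheduler is needed:

```r
library(batchtools)

# Throwaway local registry; jobs execute in the current R session.
reg <- makeRegistry(file.dir = NA, seed = 1)
reg$cluster.functions <- makeClusterFunctionsInteractive()

# Job 2 will fail on purpose.
batchMap(function(x) if (x < 0) stop("negative input") else sqrt(x), x = c(4, -1))
submitJobs()
waitForJobs()

findErrors()          # table with the ids of the failed jobs
getErrorMessages()    # the stored error messages ("negative input")
```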
Finally, once the jobs have finished, you can load the results. In our case, we compute their mean:
mean(sapply(1:10, loadResult))
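For many small results, sapply over loadResult() works fine; batchtools also offers reduceResults() to aggregate results incrementally without holding all of them in memory at once. The sketch below runs the whole π example locally (jobs execute in the current session via makeClusterFunctionsInteractive(), so it works without a scheduler):

```r
library(batchtools)

# Same workflow as above, but entirely local: a throwaway registry whose
# jobs execute in the current R session instead of on the cluster.
reg <- makeRegistry(file.dir = NA, seed = 1)
reg$cluster.functions <- makeClusterFunctionsInteractive()

piApprox <- function(n) {
  nums <- matrix(runif(2 * n), ncol = 2)
  4 * mean(sqrt(nums[, 1]^2 + nums[, 2]^2) <= 1)
}
batchMap(fun = piApprox, n = rep(1e4, 10))
submitJobs()
waitForJobs()

# reduceResults() folds a function over the results one at a time;
# summing the ten estimates and dividing by 10 gives their mean.
reduceResults(function(aggr, res) aggr + res) / 10
```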