R 2016
Revision as of 09:34, 27 March 2017
Introduction
R is a free software environment for statistical computing and graphics.
Using R on the HPC cluster
To use R on the HPC cluster, you first have to load its module with the command
module load R
Since only one version of R is installed, you don't need to specify a version. With the command
module spider R
you will find more information about the module.
Usage of R and MPI
For parallelization, the packages doMPI and Rmpi are installed. To launch a parallel R script inside a SLURM job script, please use the command line
mpirun -np $SLURM_NTASKS R --slave -f SCRIPTNAME SCRIPT_CMDLINE_OPTIONS
so that SLURM can control all processes of your script. Please do not start the script with R CMD BATCH!
The number of MPI processes is requested in the SLURM job script with
#SBATCH --ntasks=NUMBER_OF_TASKS
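A minimal job script built around the command line above might look like the following sketch. The job name, task count, time limit, the script name my_script.R and its arguments are hypothetical placeholders. We assume that SCRIPT_CMDLINE_OPTIONS are meant to be read inside the R script with commandArgs(trailingOnly = TRUE); with R -f, such options have to follow --args, otherwise R treats them as options for itself.

```shell
#!/bin/bash
# job name, task count and time limit are placeholders, adjust them to your job
#SBATCH --job-name=r_mpi
#SBATCH --ntasks=16
#SBATCH --time=01:00:00

module load R

# SLURM sets SLURM_NTASKS to the value given with --ntasks.
# Everything after --args is passed unchanged to the R script and can be
# read there with commandArgs(trailingOnly = TRUE).
mpirun -np "$SLURM_NTASKS" R --slave -f ./my_script.R --args input.csv 100
```

The script is submitted with sbatch; the notes below on doMPI and Rmpi still apply to what runs inside it.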
Note (only!) for doMPI:
- Before you start R with the mpirun command, you have to unset the environment variable R_PROFILE in your job script; otherwise the MPI processes are not spawned. Please add the following line to your job script:
unset R_PROFILE
- Please call mpi.quit() at the end of your script; otherwise the job will not terminate.
- Here is a small demo R script for doMPI (it stores in b the MPI rank of the process that evaluated each iteration):
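#!/usr/bin/env Rscript
#
# file name: test_dompi.R
#
library(doMPI)
library("foreach")
# doMPI start: connect to the processes launched by mpirun
cl <- startMPIcluster()
registerDoMPI(cl)
# each iteration returns the MPI rank of the process that evaluated it
b <- foreach(i=0:1000, .combine="c") %dopar% {
  as.integer(Sys.getenv("PMI_RANK"))
}
closeCluster(cl)
print(b)
# shut down MPI so that the job terminates
mpi.quit()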
and the corresponding SGE job script:
#!/bin/bash
#$ -S /bin/bash
#$ -N test_dompi
#$ -cwd
#$ -l h_rt=24:00:0
#$ -l h_vmem=1800M
#$ -pe impi 36
#$ -R y
#$ -j y
# load modules
module load r/3.2.1
# unset the environment variable which is needed for Rmpi
# but makes problems with doMPI
unset R_PROFILE
# run R in parallel
mpirun -np $NSLOTS R --slave -f ./test_dompi.R
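Since the cluster is scheduled by SLURM, a SLURM job script with the same settings might look like the sketch below. The mapping of the SGE directives is our assumption, not taken from this page: --mem-per-cpu only approximates SGE's per-slot h_vmem limit, -S /bin/bash is covered by the shebang line, -cwd needs no directive because sbatch starts the job in the submission directory by default, and -R y (resource reservation) has no counterpart in this sketch. The module is loaded as module load R, as described above.

```shell
#!/bin/bash
# SGE: -N test_dompi
#SBATCH --job-name=test_dompi
# SGE: -l h_rt=24:00:0
#SBATCH --time=24:00:00
# roughly SGE: -l h_vmem=1800M (per slot)
#SBATCH --mem-per-cpu=1800M
# SGE: -pe impi 36
#SBATCH --ntasks=36
# stdout and stderr go to one file (like SGE -j y); %j is the job ID
#SBATCH --output=test_dompi.%j.out

# load the R module
module load R

# unset the environment variable which is needed for Rmpi
# but causes problems with doMPI
unset R_PROFILE

# run R in parallel; SLURM_NTASKS holds the value of --ntasks
mpirun -np "$SLURM_NTASKS" R --slave -f ./test_dompi.R
```

The file is submitted with sbatch; test_dompi.R is the demo script shown above.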
Note (only!) for Rmpi:
- The MPI processes are started by the mpirun command. The Rmpi command mpi.spawn.Rslaves() is therefore not necessary and must not be used within the script!
Usage of NetCDF and R
A package for NetCDF has been installed together with R. In order to use it, please add the command
module load netCDF
to your job script before starting R. Your R script should include the line
library(ncdf)
to load the NetCDF library. Please refer to the documentation of NetCDF and of the R package for more information.
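Put together, a job script for an R script that reads NetCDF files could look like this sketch. The job name, time limit and the script name read_netcdf.R are hypothetical, and a single task is assumed because reading NetCDF files by itself does not need MPI.

```shell
#!/bin/bash
# job name and time limit are placeholders
#SBATCH --job-name=r_netcdf
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

# load R and the NetCDF library before R is started
module load R
module load netCDF

# read_netcdf.R (hypothetical) is expected to call library(ncdf) itself
R --slave -f ./read_netcdf.R
```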
Installed version
The currently installed version of R is 3.3.1.
Additional installed packages
The R installation contains many additional packages. After loading the module and starting R ("module load R", then simply "R" on the command line), you can generate a list of them with the following commands:
ip <- as.data.frame(installed.packages()[,c(1,3:4)])
rownames(ip) <- NULL
ip <- ip[is.na(ip$Priority),1:2,drop=FALSE]
print(ip, row.names=FALSE)
You will receive a list of the packages and their versions. The beginning of it should look like this:
  Package Version
      abc     2.1
 abc.data     1.0
    abind   1.4-3
  acepack 1.3-3.3
   adabag     4.1
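The same commands can also be run non-interactively, for example in a short batch job, by handing them to Rscript as separate -e expressions that are evaluated one after another in the same R session. This is a sketch; the job name, time limit and the output file r_packages.txt are hypothetical.

```shell
#!/bin/bash
# job name and time limit are placeholders
#SBATCH --job-name=r_packages
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

module load R

# same R commands as above; the package list is written to r_packages.txt
Rscript -e 'ip <- as.data.frame(installed.packages()[, c(1, 3:4)])' \
        -e 'rownames(ip) <- NULL' \
        -e 'ip <- ip[is.na(ip$Priority), 1:2, drop = FALSE]' \
        -e 'print(ip, row.names = FALSE)' > r_packages.txt
```

Afterwards, a single package can be looked up in the file, e.g. with grep -w doMPI r_packages.txt.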
Documentation
You can look up anything about R on their