R 2016


Introduction

R is a free software environment for statistical computing and graphics.

Using R on the HPC cluster

To use R on the HPC cluster, you first have to load its module. You can do that with the command

module load R

Since there is only one version of R installed, you don't need to specify a version. If you use the command

module spider R

you will find more information about the module.
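A job script can check that the module load actually put R on the PATH before doing any work. The following is a minimal sketch; the helper name require_cmd is hypothetical and not part of the module system.

```shell
# Return 0 if the named command is on PATH; otherwise print a hint and return 1.
# 'require_cmd' is a hypothetical helper name used only for this illustration.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' not found; did 'module load' succeed?" >&2
        return 1
    fi
}
```

In a job script it could be used right after loading the module, for example as: require_cmd R || exit 1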

Basic Job Script for R

Suppose you want to create 100 random numbers and calculate their mean and standard deviation. In R the commands for that would be:

x <- runif(100,0.0,1.0)
mean(x)
sd(x)

If you want to do the calculation on the cluster, store the above commands in a file named e.g. 'Rtest.R'. Then create a job script 'Rtest.job' with the content:

#!/bin/bash

#SBATCH --job-name=Rtest
#SBATCH --partition=carl.p
#SBATCH --time=24:00:0
#SBATCH --mem=5000M
   
# load modules
module load R

# run R 
R -f ./Rtest.R

and submit a job with the command:

sbatch Rtest.job

The output of R will appear in a file called 'slurm-<jobid>.out' once the job has been completed.
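To find the output file from a script, the job ID can be taken from the message that sbatch prints on submission ("Submitted batch job <jobid>"). The sketch below assumes that message format; the function name jobid_from_sbatch and the job ID 12345 are made up for illustration.

```shell
# Extract the job ID from sbatch's "Submitted batch job <id>" message (read from stdin).
# 'jobid_from_sbatch' is a hypothetical helper name.
jobid_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Example with a made-up job ID; on the cluster you would pipe in the output
# of "sbatch Rtest.job" instead of the echo.
jobid=$(echo "Submitted batch job 12345" | jobid_from_sbatch)
echo "slurm-${jobid}.out"
```

With the made-up input above this prints slurm-12345.out, which is the name of the file to look for once the job has finished.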

Usage of R and MPI

For parallelization, the package doMPI is installed. To launch a parallel R script inside a SLURM script, please use the command line

  mpirun -np $SLURM_NTASKS R --slave -f SCRIPTNAME SCRIPT_CMDLINE_OPTIONS

to enable SLURM to control all processes of your script. Please do not use the batch starting sequence R CMD BATCH!

The number of MPI tasks is specified in the SLURM submission script by

 #SBATCH --ntasks=NUMBER_OF_TASKS

Note for doMPI:

  • Before you start R with the mpirun command, you have to unset the environment variable R_PROFILE in your SLURM script. Otherwise the MPI processes will not be spawned. Please add the following line to your job script:
  unset R_PROFILE
  • Please use mpi.quit() at the end of your script. Otherwise it will not end.
  • Here is a small example R script for doMPI (for each iteration it stores the MPI rank of the process that ran it in b):
 #!/usr/bin/env Rscript
 #
 # file name: test_dompi.R
 #
 
 library(doMPI)
 library("foreach")
 
 # doMPI start
 cl <- startMPIcluster()
 registerDoMPI(cl)
 
 b<-foreach(i=0:1000, .combine="c") %dopar% {
   as.integer(Sys.getenv("PMI_RANK"))
 }
 closeCluster(cl)
 print(b)
 
 mpi.quit()

and the corresponding SLURM script:

 #!/bin/bash

 #SBATCH --job-name=test_dompi
 #SBATCH --time=24:00:0
 #SBATCH --mem=1800M
 #SBATCH --output=test_dompi.%j.out
 #SBATCH --error=test_dompi.%j.err
 #SBATCH --ntasks=36
   
 # load modules
 module load R
 
 # unset the environment variable which is needed for Rmpi
 # but causes problems with doMPI
 unset R_PROFILE
 
 # run R in parallel
 mpirun -np $SLURM_NTASKS R --slave -f ./test_dompi.R
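Before submitting, it can help to print the command line that the job script would execute. The following is a minimal sketch of such a dry run; the function name dompi_cmdline and the fallback to one process outside of a SLURM job are assumptions made for illustration.

```shell
# Print the mpirun command line for the R script given as $1.
# Uses SLURM_NTASKS when set; falls back to 1 process outside of a SLURM job
# (this fallback is an assumption for local dry runs, not cluster policy).
# 'dompi_cmdline' is a hypothetical helper name.
dompi_cmdline() {
    echo "mpirun -np ${SLURM_NTASKS:-1} R --slave -f $1"
}

# Make sure R_PROFILE is not set, as required for doMPI, then show the command.
unset R_PROFILE
dompi_cmdline ./test_dompi.R
```

Inside a job with --ntasks=36 the printed line matches the mpirun call in the script above; the function only prints the command and does not start R.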

Usage of NetCDF and R

A package for NetCDF has been installed together with R. In order to use it, please add the command

module load netCDF

to your job script before starting R. Your R script should include the line

library(ncdf)

to load the NetCDF library. Please refer to the documentation of NetCDF and R for more information.

Installed version

The currently installed version of R is 3.3.1.

Additional installed packages

The R release contains a lot of additional packages. After loading the module and starting R ("module load R" and simply "R" on the command line), you can generate a list of all of them with the following commands:

ip <- as.data.frame(installed.packages()[,c(1,3:4)])
rownames(ip) <- NULL
ip <- ip[is.na(ip$Priority),1:2,drop=FALSE]
print(ip, row.names=FALSE)

You will receive a list of every package and its related version. It should look like this:

       Package     Version
           abc         2.1
      abc.data         1.0
         abind       1.4-3
       acepack     1.3-3.3
        adabag         4.1

Installing your own packages

If you are missing an R package, you can contact Scientific Computing or, alternatively, install the package in your own HOME directory. To do so, you should first create a directory on the cluster, e.g. with

cd $HOME
mkdir -p R/lib

This would create a directory 'R' with the subdirectory 'lib' in your HOME folder. After that you need to create a file .Renviron that contains the line

R_LIBS="~/R/lib"

and a file .Rprofile with the lines

# set a fixed CRAN mirror for downloading packages
cat(".Rprofile: Setting CRAN repository to ftp.gwdg.de\n")
r <- getOption("repos")
r["CRAN"] <- "https://ftp.gwdg.de/pub/misc/cran"
options(repos = r)

If these files already exist, simply add the lines above to them. You can choose a different location for installing your R libraries if you wish (by setting R_LIBS in .Renviron differently). There are also alternative mirrors; set your preferred one in .Rprofile.
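The directory and the .Renviron entry can also be created from the shell in a way that is safe to repeat. The following is a minimal sketch; the function name setup_r_user_lib is hypothetical, and it only handles .Renviron, not .Rprofile.

```shell
# Create <base>/R/lib and add the R_LIBS line to <base>/.Renviron.
# The line is appended only if it is not already present, so the function
# can be called repeatedly. 'setup_r_user_lib' is a hypothetical helper name.
setup_r_user_lib() {
    base="$1"
    line='R_LIBS="~/R/lib"'
    mkdir -p "$base/R/lib"
    touch "$base/.Renviron"
    grep -qxF "$line" "$base/.Renviron" || echo "$line" >> "$base/.Renviron"
}
```

Calling setup_r_user_lib "$HOME" would apply it to your HOME folder. Existing lines in .Renviron are left untouched.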

Once this is done, start R on the login node and begin installing packages:

R
> install.packages("lme4")
> library(lme4)
> library(car)

In this example, the package 'lme4' will be downloaded from the CRAN mirror and installed in the directory given by R_LIBS, i.e. in your HOME folder. Please note that R will not check whether a package is already installed and will always reinstall it, overwriting the previous installation. You may therefore want to separate package installation from the execution of jobs.

The package 'lme4' is already installed in the global R-installation (version 1.1-12) whereas the installation above will install a newer version (1.1-13 or newer). When you load the package with library(lme4) the installation in your HOME folder will be used. You can verify this with the R-command

> sessionInfo()
  ... lme4_1.1-13 ...

The next call library(car) will load the globally installed package 'car'. So you do not need to install every package in your HOME, only those that are missing or for which you require an updated version.

Documentation

You can look up anything about R on their