R 2016

From HPC users
Revision as of 10:34, 27 March 2017

Introduction

R is a free software environment for statistical computing and graphics.

Using R on the HPC cluster

To use R on the HPC cluster, you have to load its module with the command

module load R

Since only one version of R is installed, you don't need to specify a version. If you use the command

module spider R

you will get more information about the module.

Usage of R and MPI

For parallelization, the packages doMPI and Rmpi are installed. To launch a parallel R script inside a SLURM job script, please use the command line

  mpirun -np $SLURM_NTASKS R --slave -f SCRIPTNAME --args SCRIPT_CMDLINE_OPTIONS

so that SLURM can control all processes of your script. Please do not start the script with R CMD BATCH!
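Options given after --args on the R command line are passed to the script, where they can be read with commandArgs(trailingOnly = TRUE). The following is a minimal sketch. The two options (an iteration count and an output file name) and their defaults are hypothetical examples. They are not something the cluster setup requires.

```r
#!/usr/bin/env Rscript
#
# Sketch: read the options passed after --args, e.g.
#   R --slave -f ./myscript.R --args 1000 result.rds
# The two options (number of iterations, output file) and their
# defaults are hypothetical examples.

parse_options <- function(args = commandArgs(trailingOnly = TRUE)) {
  # Use defaults for options that were not given
  n_iter  <- if (length(args) >= 1) suppressWarnings(as.integer(args[1])) else 100L
  outfile <- if (length(args) >= 2) args[2] else "result.rds"
  if (is.na(n_iter) || n_iter < 1L) {
    stop("the first option must be a positive integer")
  }
  list(n_iter = n_iter, outfile = outfile)
}

opts <- parse_options()
cat("iterations:", opts$n_iter, "output file:", opts$outfile, "\n")
```

Under mpirun, every MPI process runs the whole script. Each process therefore parses the same options, and the cat() line is printed once per process.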

The number of MPI tasks is requested in the SLURM submission script with

 #SBATCH --ntasks=NUMBER_OF_TASKS
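SLURM passes the requested task count to the job as the environment variable SLURM_NTASKS. The following minimal sketch reads it from R, for example to log it. It assumes that the variable reaches the R processes started by mpirun. The function name slurm_ntasks is hypothetical, and the fallback of 1 outside a SLURM job is an arbitrary choice.

```r
# Sketch: read the number of tasks requested with "#SBATCH --ntasks=...".
# Assumes SLURM_NTASKS reaches the R processes; outside of a SLURM job
# it is unset and the (arbitrary) default of 1 is returned.
slurm_ntasks <- function(default = 1L) {
  value <- Sys.getenv("SLURM_NTASKS", unset = "")
  n <- suppressWarnings(as.integer(value))
  if (is.na(n)) default else n
}

cat("SLURM tasks:", slurm_ntasks(), "\n")
```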

Note (only!) for doMPI:

  • Before you start R with the mpirun command, you have to unset the environment variable R_PROFILE in your job script. Otherwise the MPI processes are not spawned. Please add the following line to your job script:
  unset R_PROFILE
  • Please use mpi.quit() at the end of your script. Otherwise the job will not terminate.
  • Here is a small demo R script for doMPI (it stores in b the MPI rank of the worker that executed each iteration):
 #!/usr/bin/env Rscript
 #
 # file name: test_dompi.R
 #
 
 library(doMPI)
 library("foreach")
 
 # doMPI start
 cl <- startMPIcluster()
 registerDoMPI(cl)
 
 b<-foreach(i=0:1000, .combine="c") %dopar% {
   as.integer(Sys.getenv("PMI_RANK"))
 }
 closeCluster(cl)
 print(b)
 
 mpi.quit()

and the corresponding SGE script:

 #!/bin/bash
 
 #$ -S /bin/bash
 #$ -N test_dompi
 #$ -cwd
 #$ -l h_rt=24:00:0
 #$ -l h_vmem=1800M
 #$ -pe impi 36
 #$ -R y
 #$ -j y
 
 # load modules
 module load R
 
 # unset the environment variable which is needed for Rmpi
 # but makes problems with doMPI
 unset R_PROFILE
 
 # run R in parallel
 mpirun -np $NSLOTS R --slave -f ./test_dompi.R

Note (only!) for Rmpi:

  • The MPI processes are started by the mpirun command. The Rmpi command mpi.spawn.Rslaves() is therefore not necessary and should not be used within the script!

Usage of NetCDF and R

A package for NetCDF (ncdf) has been installed together with R. To use it, please add the command

module load netCDF

to your job script before starting R. Your R script should include the line

library(ncdf)

to load the NetCDF package. Please refer to the documentation of NetCDF and R for more information.
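The following minimal sketch writes a small NetCDF file with the ncdf package and reads it back. It uses the functions dim.def.ncdf, var.def.ncdf, create.ncdf, put.var.ncdf, open.ncdf, get.var.ncdf and close.ncdf. The file is created in a temporary location, and the dimension and variable names are hypothetical examples. For existing data, you would normally only need the open/get/close part. The netCDF module has to be loaded first, as described above.

```r
# Sketch: write a small NetCDF file with the ncdf package and read it back.
# File, dimension and variable names are hypothetical examples.
library(ncdf)

path <- tempfile(fileext = ".nc")

# Define a dimension "x" with five points and a variable "values" on it
x_dim <- dim.def.ncdf("x", "index", 1:5)
v <- var.def.ncdf("values", "none", x_dim, missval = -999)

# Create the file, write the data and close it
nc <- create.ncdf(path, v)
put.var.ncdf(nc, v, c(1.5, 2.5, 3.5, 4.5, 5.5))
close.ncdf(nc)

# Re-open the file and read the variable by name
nc <- open.ncdf(path)
vals <- get.var.ncdf(nc, "values")
close.ncdf(nc)
print(vals)
```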

Installed version

The currently installed version of R is 3.3.1.
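If a script depends on a particular R version, it can check the version at start-up. The following is a minimal sketch. The function name check_r_version and the minimum version "3.3.0" are hypothetical examples.

```r
# Print the running R version and stop if it is older than required.
# The minimum version used below is a hypothetical example.
check_r_version <- function(min_version) {
  if (getRversion() < min_version) {
    stop("this script needs R >= ", min_version, ", found ", getRversion())
  }
  invisible(TRUE)
}

cat(R.version.string, "\n")
check_r_version("3.3.0")
```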

Additional installed packages

The R installation contains many additional packages. After loading and starting R ("module load R" followed by "R" on the command line), you can list all additional packages (those that are neither base nor recommended packages) with the following commands:

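# Package, Version and Priority columns of all installed packages
ip <- as.data.frame(installed.packages()[,c(1,3:4)])
rownames(ip) <- NULL
# keep only packages without a priority, i.e. neither "base" nor "recommended"
ip <- ip[is.na(ip$Priority),1:2,drop=FALSE]
print(ip, row.names=FALSE)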

This prints every package together with its version. The beginning of the list should look like this:

       Package     Version
           abc         2.1
      abc.data         1.0
         abind       1.4-3
       acepack     1.3-3.3
        adabag         4.1
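To check for a few specific packages instead of reading the whole list, you can use a small helper like the following sketch. The helper name check_packages is hypothetical, and the package names are simply the ones mentioned on this page. Packages that are not installed are reported with version NA.

```r
# Report the installed version of each requested package (NA if missing).
# The helper name and the package names are examples only.
check_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  versions <- vapply(pkgs, function(p) {
    if (p %in% installed) as.character(packageVersion(p)) else NA_character_
  }, character(1))
  data.frame(Package = pkgs, Version = unname(versions),
             stringsAsFactors = FALSE)
}

print(check_packages(c("doMPI", "Rmpi", "ncdf", "foreach")), row.names = FALSE)
```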

Documentation

You can look up anything about R on their