Difference between revisions of "R"

Revision as of 12:28, 7 July 2015

There are two R releases available on FLOW and HERO. The current release is loaded with

 module load r/3.2.1

This command also loads the corresponding Intel compilers and Intel MPI automatically.


Additional installed packages

The current release includes many additional R packages, for example:

 abind, ade4, ade4TkGUI, akima, ape, BH, bitops, boot, caTools, chron, cluster,
 coda, codetool, colorspace, DAAG, date, deldir, dichromat, digest, doMC, doMPI, ensembleBMA,
 evaluate, fields, foreach, forecast, foreign, Formula, fracdiff, fts, gdata,
 geosphere, ggplot2, gplots, gtable, gtools, hexbin, Hmisc, ifultools, iterators,
 its, KernSmooth, labeling, lattice, latticeExtra, LearnBayes, mapproj, maps,
 maptools, MASS, Matrix, misc3d, mnormt, mondate, multicore, munsell, mvtnorm,
 ncdf, nlme, nnet, nws, pixmap, plyr, pracma, proto, quadprog, quantreg,
 randomForest, raster, rasterVis, RColorBrewer, Rcpp, RcppArmadillo, RCurl,
 reshape2, rgl, R.matlab, R.methodsS3, Rmpi, R.oo, Rssa, RUnit, R.utils,
 sandwich, sapa, scales, signal, solaR, sp, spam, SparseM, spdep, splancs,
 splus2R, stringr, strucchange, survival, svd, testthat, timeDate,
 timeSeries, tis, tkrplot, tripack, truncnorm, tseries, VGAM, waveslim,
 XML, xts, zoo

and many more...
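To check from within R whether particular packages from the list above are installed in the loaded release, a minimal sketch is shown below. It uses base R's `system.file()`, which returns an empty string for packages that are not installed, so nothing is actually loaded (in particular Rmpi is not initialized). The helper name `check_packages` and the made-up package name are hypothetical, for illustration only.

```r
# Hypothetical helper: report which packages are installed in the active
# R library paths, without loading them (system.file() returns "" for
# packages that are not installed)
check_packages <- function(pkgs) {
  vapply(pkgs,
         function(p) nzchar(system.file(package = p)),
         logical(1))
}

# Example: two packages from the list above and one made-up name
print(check_packages(c("foreach", "ggplot2", "notARealPackage")))
```

The result is a logical vector named after the requested packages.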

Usage of R and MPI

For parallelization, the packages doMPI and Rmpi are installed. To launch a parallel R script inside an SGE job script, use the command line

  mpirun -np $NSLOTS R --slave -f SCRIPTNAME SCRIPT_CMDLINE_OPTIONS

so that SGE can control all processes of your script. Please do not start the script with R CMD BATCH.
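Inside the R script, the SCRIPT_CMDLINE_OPTIONS can be read with base R's `commandArgs()`. According to the R documentation, `commandArgs(trailingOnly = TRUE)` only returns the arguments that follow `--args`, so this sketch assumes the options are given as `R --slave -f SCRIPTNAME --args SCRIPT_CMDLINE_OPTIONS`. The `key=value` convention, the helper `parse_kv_args`, and the option name `n` are hypothetical. Since `mpirun` starts the same command on every slot, every MPI process receives the same arguments.

```r
# Hypothetical helper: turn arguments of the form key=value into a named list.
# Arguments without "=" are stored with the value TRUE (a simple flag).
parse_kv_args <- function(args) {
  out <- list()
  for (a in args) {
    if (grepl("=", a, fixed = TRUE)) {
      key <- sub("=.*$", "", a)               # text before the first "="
      out[[key]] <- sub("^[^=]*=", "", a)     # text after the first "="
    } else {
      out[[a]] <- TRUE
    }
  }
  out
}

# Only arguments given after --args on the R command line are returned here
opts <- parse_kv_args(commandArgs(trailingOnly = TRUE))

# Hypothetical option "n" with a default of 1000; [[ ]] avoids partial matching
n_iter <- as.integer(if (is.null(opts[["n"]])) "1000" else opts[["n"]])
```

For example, `--args n=50 verbose` would give `opts[["n"]] == "50"` and `opts[["verbose"]] == TRUE`.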

The corresponding parallel environment in the SGE submission script is specified by

 #$ -pe impi NUMBER_OF_CORES
 #$ -R y

Note (only!) for doMPI:

  • Before starting R with the mpirun command, you have to unset the environment variable R_PROFILE in your SGE script; otherwise the MPI processes are not spawned. Add the following line to your SGE script:
  unset R_PROFILE
  • Call mpi.quit() at the end of your script; otherwise the job will not terminate.
  • Here is a small demo R script for doMPI:
 #!/usr/bin/env Rscript
 #
 # file name: test_dompi.R
 #
 
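 library(doMPI)      # mpi.quit() used below comes from Rmpi (attached with doMPI)
 library("foreach")
 
 # doMPI start: create the MPI cluster and register it as foreach backend
 cl <- startMPIcluster()
 registerDoMPI(cl)
 # each iteration returns the MPI rank (PMI_RANK, set by the MPI launcher)
 # of the process that executed it
 b <- foreach(i=0:1000, .combine="c") %dopar% {
   as.integer(Sys.getenv("PMI_RANK"))
 }
 # shut down the cluster, print the collected ranks, and end MPI
 closeCluster(cl)
 print(b)
 mpi.quit()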

and the corresponding SGE script:

 #!/bin/bash
 
 #$ -S /bin/bash
 #$ -N test_dompi
 #$ -cwd
 #$ -l h_rt=24:00:0
 #$ -l h_vmem=1800M
 #$ -pe impi 36
 #$ -R y
 #$ -j y
 
 # load modules
 module load r/3.2.1
 
 # unset the environment variable R_PROFILE, which is needed for Rmpi
 # but causes problems with doMPI
 unset R_PROFILE
 
 # run R in parallel
 mpirun -np $NSLOTS R --slave -f ./test_dompi.R
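In the demo script, `b` holds one entry per iteration (1001 in total for `i = 0:1000`): the PMI_RANK of the MPI process that executed it. A minimal sketch for checking how the iterations were distributed is to tabulate that vector with base R. The helper name `tasks_per_rank` is hypothetical, and the input below is made-up example data, not output from the cluster.

```r
# Hypothetical helper: count how many foreach iterations each MPI rank executed.
# `ranks` is a vector like `b` from test_dompi.R (one rank per iteration).
# NA values (e.g. if PMI_RANK was not set) are dropped by table().
tasks_per_rank <- function(ranks) {
  counts <- table(ranks)
  setNames(as.integer(counts), names(counts))
}

# Made-up example input (NOT real cluster output): 6 iterations on ranks 1-3
example_ranks <- c(1L, 2L, 3L, 1L, 2L, 1L)
print(tasks_per_rank(example_ranks))
```

In test_dompi.R this could be applied as `print(tasks_per_rank(b))` before `mpi.quit()`.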

Note (only!) for Rmpi:

  • The MPI processes are started by the mpirun command, so the Rmpi command mpi.spawn.Rslaves() is not necessary and should not be used within the script!


External links