Difference between revisions of "R"
External links
- R project: http://www.r-project.org/
- R manuals: http://cran.r-project.org/manuals.html
Latest revision as of 13:29, 29 February 2016
Two R releases are available on FLOW and HERO. The current release is loaded with
module load r/3.2.1
This command also loads the corresponding Intel compilers and Intel MPI automatically.
Additional installed packages
The current release contains many additional R packages, for example
abind, ade4, ade4TkGUI, akima, ape, BH, bitops, boot, caTools, chron, cluster, coda, codetools, colorspace, DAAG, date, deldir, dichromat, digest, doMC, doMPI, ensembleBMA, evaluate, fields, foreach, forecast, foreign, Formula, fracdiff, fts, gdata, geosphere, ggplot2, gplots, gtable, gtools, hexbin, Hmisc, ifultools, iterators, its, KernSmooth, labeling, lattice, latticeExtra, LearnBayes, mapproj, maps, maptools, MASS, Matrix, misc3d, mnormt, mondate, multicore, munsell, mvtnorm, ncdf, nlme, nnet, nws, pixmap, plyr, pracma, proto, quadprog, quantreg, randomForest, raster, rasterVis, RColorBrewer, Rcpp, RcppArmadillo, RCurl, reshape2, rgl, R.matlab, R.methodsS3, Rmpi, R.oo, Rssa, RUnit, R.utils, sandwich, sapa, scales, signal, solaR, sp, spam, SparseM, spdep, splancs, splus2R, stringr, strucchange, survival, svd, testthat, timeDate, timeSeries, tis, tkrplot, tripack, truncnorm, tseries, VGAM, waveslim, XML, xts, zoo
and many more...
Usage of R and MPI
For parallelization the packages doMPI and Rmpi are installed. To launch a parallel R script inside an SGE script, please use the command line
mpirun -np $NSLOTS R --slave -f SCRIPTNAME SCRIPT_CMDLINE_OPTIONS
to enable SGE to control all processes of your script. Please do not use the batch starting sequence R CMD BATCH.
The corresponding parallel environment in the SGE submission script is specified by
#$ -pe impi NUMBER_OF_CORES
#$ -R y
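The launch command above can be wrapped in a small helper so that the exact command line can be inspected before a job is submitted. This is only a minimal sketch and not part of the cluster setup: the function name run_r_mpi and the DRY_RUN switch are hypothetical, and it assumes that SGE sets NSLOTS for the job as described above.

```shell
# Hypothetical helper: run an R script under mpirun with the slots granted by SGE.
# Usage: run_r_mpi SCRIPTNAME [SCRIPT_CMDLINE_OPTIONS...]
run_r_mpi() {
    r_script="$1"
    shift
    if [ -n "${DRY_RUN:-}" ]; then
        # Dry run (assumed switch): only print the command line that would be used
        printf 'mpirun -np %s R --slave -f %s' "$NSLOTS" "$r_script"
        if [ "$#" -gt 0 ]; then
            printf ' %s' "$@"
        fi
        printf '\n'
    else
        # Real launch: NSLOTS holds the number of slots granted by SGE
        mpirun -np "$NSLOTS" R --slave -f "$r_script" "$@"
    fi
}
```

In a job script the helper would be called after the module load line, for example as run_r_mpi ./myscript.R followed by any script options. Setting DRY_RUN=1 prints the command instead of starting R.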
Note (only!) for doMPI:
- Before you start R with the mpirun command you have to unset the environment variable R_PROFILE in your SGE script. Otherwise the MPI processes will not be spawned. Please add the following line to your SGE script:
unset R_PROFILE
- Please call mpi.quit() at the end of your script. Otherwise the script will not terminate.
- Here is a small demo R script for doMPI (it collects in b the MPI rank of the process that ran each iteration):
#!/usr/bin/env Rscript
#
# file name: test_dompi.R
#
library(doMPI)
library("foreach")
# doMPI start
cl <- startMPIcluster()
registerDoMPI(cl)
b <- foreach(i=0:1000, .combine="c") %dopar% {
    as.integer(Sys.getenv("PMI_RANK"))
}
closeCluster(cl)
print(b)
mpi.quit()
and the corresponding SGE script
#!/bin/bash
#$ -S /bin/bash
#$ -N test_dompi
#$ -cwd
#$ -l h_rt=24:00:0
#$ -l h_vmem=1800M
#$ -pe impi 36
#$ -R y
#$ -j y
# load modules
module load r/3.2.1
# unset the environment variable which is needed for Rmpi
# but causes problems with doMPI
unset R_PROFILE
# run R in parallel
mpirun -np $NSLOTS R --slave -f ./test_dompi.R
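Since a forgotten unset R_PROFILE makes a doMPI job fail without an obvious cause, the step can be made visible in the job output. The following is a minimal sketch under that assumption: the function name prepare_dompi_env is hypothetical, and the function has to run after the module load line, because the job script above unsets the variable only after the R module is loaded.

```shell
# Hypothetical helper: make sure R_PROFILE is unset before doMPI is started.
# Call it after "module load r/3.2.1" and before mpirun.
prepare_dompi_env() {
    # ${R_PROFILE+set} expands to "set" only if the variable is defined
    if [ -n "${R_PROFILE+set}" ]; then
        echo "doMPI: unsetting R_PROFILE (was: $R_PROFILE)" >&2
        unset R_PROFILE
    fi
}
```

The message goes to standard error, so with the option -j y it ends up in the same job output file as the output of R. This helper is only meant for doMPI jobs, since the job script above notes that Rmpi needs the variable.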
Note (only!) for Rmpi:
- The MPI processes are spawned by the mpirun command. The Rmpi command mpi.spawn.Rslaves() is not necessary and should not be used within the script!
Usage of NetCDF and R
A package for NetCDF has been installed together with R. To use it, please add the command
module load netcdf/4.3.2/gcc/4.4.7
to your job script before starting R. Your R script should include the line
library(ncdf)
to load the NetCDF library. Please refer to the documentation of NetCDF and R for more information.
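A job script can check that the NetCDF module is loaded before R is started. The following is a minimal sketch: the function name netcdf_module_loaded is hypothetical, and it assumes that the module system on the cluster records loaded modules in the colon-separated variable LOADEDMODULES, as Environment Modules does.

```shell
# Hypothetical check: succeed if any netcdf/... module is listed as loaded.
netcdf_module_loaded() {
    # Wrap the list in colons so every entry is preceded by ":"
    case ":${LOADEDMODULES:-}:" in
        *:netcdf/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In a job script this could be used as netcdf_module_loaded || module load netcdf/4.3.2/gcc/4.4.7 before the line that starts R.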