Difference between revisions of "R on ALICE"

From ALICE Documentation

Revision as of 10:17, 29 June 2020

Running R from batch scripts

R is a programming language and software environment for statistical computing and graphics.

The currently supported versions are 3.6.0 and 3.6.2 (CentOS 7). Version 3.6.2 was built with the CUDA-enabled toolchain (fosscuda); version 3.6.0 was built with the standard GCC-based toolchain (foss).

How to load R in your environment

You can make R available in your environment by loading one of the R modules, i.e.:

 module load R/3.6.0-foss-2019a-Python-3.7.2

or

 module load R/3.6.2-fosscuda-2019b

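The command R --version returns the version of R you have loaded. The sample output below was recorded with an older R 3.3.2 installation; with the modules above it reports 3.6.0 or 3.6.2 instead: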

 R --version
 R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
 Copyright (C) 2016 The R Foundation for Statistical Computing
 Platform: x86_64-pc-linux-gnu (64-bit)

The command which R returns the location where the R executable resides:

 which R

Running an R batch script on the command line

There are several ways to launch an R script on the command line:

  1. Rscript yourfile.R
  2. R CMD BATCH yourfile.R
  3. R --no-save < yourfile.R
  4. ./yourfile2.R

The first approach (the Rscript command) writes its output to stdout. The second approach (R CMD BATCH) writes its output to a file (in this case yourfile.Rout). The third approach feeds yourfile.R to the R executable on standard input; with this approach you must specify one of the following flags: --save, --no-save or --vanilla.
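The difference in where the output ends up can be checked with a small shell sketch. It assumes an R module has already been loaded; the temporary directory and the one-line yourfile.R created here are purely illustrative.

```shell
#!/bin/sh
# Illustrative only: create a throwaway R script in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > yourfile.R <<'EOF'
cat("hello from R\n")
EOF

if command -v Rscript >/dev/null 2>&1; then
    # 1. Rscript: output appears on stdout
    Rscript yourfile.R
    # 2. R CMD BATCH: output is written to yourfile.Rout
    R CMD BATCH yourfile.R
    cat yourfile.Rout
    # 3. stdin redirection: one of --save, --no-save or --vanilla is required
    R --no-save < yourfile.R
else
    echo "Rscript not found; load an R module first" >&2
fi
```

The third form also echoes the commands it reads, so its stdout is more verbose than that of Rscript.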

The R code can also be launched as a Linux script (fourth approach). To do so:

  • Insert an extra line (#!/usr/bin/env Rscript) at the top of the file yourfile.R
  • Save the result as a new file, yourfile2.R
  • Make the new script executable (e.g. chmod u+x yourfile2.R)
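These steps can be sketched in the shell as follows. The temporary directory and the contents of yourfile.R are illustrative assumptions, and the script is only executed if Rscript is found on the PATH.

```shell
#!/bin/sh
# Illustrative only: a throwaway yourfile.R in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > yourfile.R <<'EOF'
cat("hello from R\n")
EOF

# Step 1 and 2: put the shebang line on top and save the result as yourfile2.R
{ echo '#!/usr/bin/env Rscript'; cat yourfile.R; } > yourfile2.R

# Step 3: make the new script executable for its owner
chmod u+x yourfile2.R

# Run it like any other Linux script (requires a loaded R module)
if command -v Rscript >/dev/null 2>&1; then
    ./yourfile2.R
fi
```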

Sometimes we need to pass arguments to the R script. This is especially useful when running parallel independent calculations: different arguments can be used to differentiate between the calculations, e.g. by feeding in different initial parameters. To read the arguments, one can use the commandArgs() function, e.g., if we have a script called myScript.R:

 ## myScript.R
 ## args[1]: number of samples, args[2]: mean of the normal distribution
 args <- commandArgs(trailingOnly = TRUE)
 rnorm(n = as.numeric(args[1]), mean = as.numeric(args[2]))

then we can call it with arguments as e.g.:

 > Rscript myScript.R 5 100
 [1]  98.46435 100.04626  99.44937  98.52910 100.78853
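For several independent calculations, the same call can be repeated with different arguments. The following is a minimal sketch under stated assumptions: run_sweep is a hypothetical helper, the result file names are illustrative, and myScript.R is expected in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: run myScript.R once per mean value.
# Usage: run_sweep <n> <mean1> [<mean2> ...]
run_sweep() {
    n=$1
    shift
    for mean in "$@"; do
        # Each calculation writes to its own (illustratively named) result file
        Rscript myScript.R "$n" "$mean" > "result_mean_${mean}.txt"
    done
}

# Only run when R is loaded and the script is present
if command -v Rscript >/dev/null 2>&1 && [ -f myScript.R ]; then
    run_sweep 5 0 50 100
fi
```

The loop runs the calculations one after another; because each call is independent, they could just as well be distributed over separate jobs.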

Running an R batch script on the cluster (using SLURM)

In the previous section we described how to launch an R script on the command line. To run an R batch job on the compute nodes, we just need to create a Slurm script/wrapper "around" the R command line.

Below you will find the content of the corresponding Slurm batch script:

 #!/bin/bash
 #SBATCH --time=00:10:00 # Walltime
 #SBATCH --nodes=1          # Use 1 Node     (Unless code is multi-node parallelized)
 #SBATCH --ntasks=1         # We only run one R instance = 1 task
 #SBATCH --cpus-per-task=12 # number of threads we want to run on
 #SBATCH --account=owner-guest
 #SBATCH --partition=ember-guest
 #SBATCH -o slurm-%j.out-%N
 #SBATCH --mail-type=ALL
 #SBATCH --mail-user=$   # Your email address
 #SBATCH --job-name=seaIce
 export FILENAME=myjob.R
 export SCR_DIR=/scratch/general/lustre/$USER/$SLURM_JOBID
 export WORK_DIR=$HOME/TestBench/R/SeaIce
 # Load R (version 3.6.0)
 module load R/3.6.0-foss-2019a-Python-3.7.2
 # Create scratch & copy everything over to scratch
 mkdir -p $SCR_DIR
 cd $SCR_DIR
 cp -p $WORK_DIR/* .
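 # Run the R script in batch, redirecting the job output to a file
 # (the output file name is an illustrative choice)
 Rscript $FILENAME > $SLURM_JOBID.out
 # Copy results back to the work directory + clean up
 cd $WORK_DIR
 cp -pR $SCR_DIR/* .
 rm -rf $SCR_DIR
 echo "End of program at $(date)"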

We submit the script to Slurm with the sbatch command, followed by the file name of the batch script.
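Submission could look like the following sketch. The file name run_R.slurm and the helper submit_job are hypothetical; sbatch --parsable and squeue -j are standard Slurm options, but the exact queue output depends on the cluster configuration.

```shell
#!/bin/sh
# Hypothetical helper: submit a batch script and print only the job ID.
# With --parsable, sbatch prints "jobid" (or "jobid;cluster" on multi-cluster setups).
submit_job() {
    sbatch --parsable "$1"
}

# run_R.slurm is a hypothetical file name for the batch script.
if command -v sbatch >/dev/null 2>&1 && [ -f run_R.slurm ]; then
    jobid=$(submit_job run_R.slurm)
    jobid=${jobid%%;*}          # drop a trailing ";cluster" part, if any
    echo "Submitted job $jobid"
    squeue -j "$jobid"          # show the job's state in the queue
fi
```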