STATA 2016

Introduction

STATA is an integrated software package offering statistical tools for data analysis, data management, and graphics. On the local HPC system we offer STATA/MP 13, the multiprocessor variant of STATA, licensed for up to 12 cores. The license allows up to 5 users to work with STATA at the same time. STATA/MP uses symmetric multiprocessing (SMP) to exploit the parallel capabilities of modern computers and HPC systems and thereby speed up computations.

Installed version

The currently installed version of STATA is 13.0.

Using Stata on the HPC cluster

Like every module on the cluster, STATA can be loaded by typing

module load stata

Then you can find the following STATA variants in your user environment:

  • stata: a version of STATA that handles small datasets
  • stata-se: a version of STATA for large datasets
  • stata-mp: a fast version of STATA for multicore/multiprocessor machines

More details on the different versions can be found here

To facilitate bookkeeping, a good first step towards using STATA on the HPC system is to create a directory in which all STATA related computations are carried out. Using the command

 mkdir stata

you can create the folder stata at the top level of your home directory for this purpose (you might even go further and create a subdirectory mp13 that records the precise version of STATA).
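Both directory levels can be created in one step. The following is a minimal sketch that assumes the layout suggested above (stata/mp13 below your home directory); the directory names are only a convention and can be chosen freely.

```shell
# Create the working directory and the version-specific subdirectory.
# -p creates missing parent directories and does not fail if they exist.
mkdir -p "$HOME/stata/mp13"

# Change into the new directory and confirm the location.
cd "$HOME/stata/mp13"
pwd
```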

Using STATA in batch mode

On the local HPC system the convention is to use applications in batch mode rather than in interactive mode, as you would on your local workstation. This requires you to list the commands you would otherwise type in STATA's interactive mode in a file, called a do-file in STATA jargon, and to call STATA with the -b option on that do-file. To illustrate how to use STATA in batch mode on the HPC system, consider the basic linear regression example contained in Chapter 1 of the STATA Web Book Regression with STATA. For this example you might create a further subdirectory linear_regression and put the data sets on which you would like to work, together with all supplementary files and scripts, there. A do-file corresponding to the basic linear regression example, here called linReg.do, reads:

 
use elemapi
regress api00 acs_k3 meals full 
  

For the do-file to run properly, the data file available as http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi needs to be stored in the directory linear_regression. Further, if you have not yet loaded the STATA module, you need to load it via

 module load stata

before you attempt to use the STATA application. In principle you could now call STATA in batch mode by typing

 stata -b linReg.do
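In batch mode STATA does not print results to the terminal; it writes them to a log file named after the do-file (here linReg.log) in the current directory. When a command fails, the log typically contains a return code of the form r(601);. The helper below is a small sketch for checking such a log; the function name check_stata_log is hypothetical and not part of STATA or the cluster environment.

```shell
# check_stata_log: hypothetical helper that scans a STATA batch log
# for return codes such as "r(601);", which STATA prints after an error.
check_stata_log() {
    log="$1"
    if [ ! -f "$log" ]; then
        echo "no log file: $log" >&2
        return 2
    fi
    if grep -Eq '^r\([0-9]+\);' "$log"; then
        echo "STATA reported an error in $log:"
        # Show the line numbers at which return codes appear.
        grep -En '^r\([0-9]+\);' "$log"
        return 1
    fi
    echo "no STATA error code found in $log"
    return 0
}

# Typical use after a batch run:
#   check_stata_log linReg.log
```

The function returns 0 if no return code is found, 1 if the log contains one, and 2 if the log file does not exist, so it can also be used in scripts.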

Although this is fine for small test programs that consume only a few resources (in terms of running time and memory), the convention on the HPC system is to submit your job to the scheduler (here we use Sun Grid Engine (SGE)), which assigns it to a suitable execution host on which the actual computations are carried out. You therefore have to set up a job submission file in which you request the resources your job needs. This is common practice on HPC systems, where multiple users access the available resources at the same time. Examples of such job submission scripts for both single-core and multi-core usage are detailed below.
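As a rough illustration of what a single-core submission file might contain, the following sketch writes a small SGE job script to disk. The file name linReg.sge, the job name, and the requested run time and memory are placeholders chosen for this example, not values prescribed by the cluster; the resource names h_rt and h_vmem are standard SGE attributes, but the limits that apply on a given system may differ.

```shell
# Write a minimal single-core SGE job script (all values are placeholders).
cat > linReg.sge <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N linReg
#$ -l h_rt=00:10:00
#$ -l h_vmem=500M
#$ -j y

module load stata
stata -b linReg.do
EOF

# Submit from the directory that contains linReg.do and the data file:
#   qsub linReg.sge
```

Here -cwd runs the job in the directory from which it was submitted, -N sets the job name, the -l options request run time and memory, and -j y merges the error stream into the output file.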

Documentation

A user guide for STATA version 13 can be found here (PDF viewer required).