STATA 2016


Introduction

STATA is a complete software package offering statistical tools for data analysis, data management and graphics. On the local HPC system we offer the multiprocessor variant STATA/MP 13, licensed for up to 12 cores. The license allows up to 5 users to work with STATA at the same time. STATA/MP uses symmetric multiprocessing (SMP) to exploit the parallel capabilities of modern computers and HPC systems and thereby speed up computations.

Installed version

The currently installed version of STATA is 13.0.

Using Stata on the HPC cluster

Like every module on the cluster, STATA can be loaded by typing

module load stata

Then you can find the following STATA variants in your user environment:

  • stata: a version of STATA that handles small datasets
  • stata-se: a version of STATA for large datasets
  • stata-mp: a fast version of STATA for multicore/multiprocessor machines

More details on the different versions can be found here

To facilitate bookkeeping, a good first step towards using STATA on the HPC system is to create a directory in which all STATA related computations are carried out. Using the command

 mkdir stata

you might thus create the folder stata in the top level of your home directory for this purpose (you might even go further and create a subdirectory mp13 specifying the precise version of STATA).
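
The directory layout described above, including the optional version subdirectory, can be created in one step. This is only a sketch: the variable STATA_ROOT is a hypothetical name introduced here so the location can be changed in one place, and it defaults to the folder stata in your home directory.

```shell
# Root directory for all STATA work (hypothetical variable, defaults to ~/stata)
STATA_ROOT="${STATA_ROOT:-$HOME/stata}"

# -p creates missing parent directories and does not fail
# if the directories already exist
mkdir -p "$STATA_ROOT/mp13"

# Show the directory that was created
ls -d "$STATA_ROOT/mp13"
```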

Submitting a job

Since many people work on the HPC cluster, it is important that everyone gets a fair share of its resources. Therefore, every job should be submitted through the SLURM workload manager.

For this reason, you have to create a job script for your tasks. The following sections show how to submit different types of STATA jobs.

Single-slot variant

You might submit your STATA do-file using a job script similar to the script listed below:

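#!/bin/bash

# Request a single task (one slot)
#SBATCH --ntasks=1
# Memory for the job
#SBATCH --mem=2G
# Wall-clock limit in the format days-hours:minutes, here 2 hours
#SBATCH --time=0-2:00
#SBATCH --job-name=STATA-TEST
# Files for standard output and standard error (%j is replaced by the job ID)
#SBATCH --output=STATA-test.%j.out
#SBATCH --error=stata-test.%j.err

# Load STATA and run the do-file in batch mode
module load stata
stata -b YOUR_INPUT.FILE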

In this file, we specify the resources needed for our job. Since this is just an example, you will probably have to adjust these values for your real jobs.
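
For a first test run, a small do-file is handy. The following sketch writes one and submits it. The file names hello.do and stata_single.job are hypothetical: the latter stands for the job script above saved to disk, with YOUR_INPUT.FILE replaced by hello.do. It is also assumed that the example dataset auto is available in the local STATA installation. The submission step is skipped when sbatch or the job script cannot be found.

```shell
#!/bin/bash

# Write a minimal do-file (hypothetical name) for a first test run
cat > hello.do <<'EOF'
* Load an example dataset and run a simple regression
sysuse auto, clear
summarize price mpg weight
regress price mpg weight
EOF

# Submit only where SLURM is available and the job script exists.
# stata_single.job is a hypothetical name for the job script shown above.
if command -v sbatch >/dev/null 2>&1 && [ -f stata_single.job ]; then
    sbatch stata_single.job
else
    echo "sbatch or stata_single.job not found; nothing submitted"
fi
```

After submission, "squeue -u $USER" lists your pending and running jobs. In batch mode STATA normally writes its results to a log file named after the do-file (here hello.log) in the working directory.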

Multi-slot variant

On the local HPC system the term "slots" is used rather than "cores", hence the title of this subsection. STATA/MP uses symmetric multiprocessing (SMP) to run computations in parallel on several cores. A performance report covering many STATA commands, which highlights the benefit of multiprocessing, can be found here (PDF viewer required). As pointed out above, the HPC system offers STATA/MP 13, licensed for up to 12 cores.

A job script that uses the multiprocessing capabilities of STATA/MP is listed below:

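#!/bin/bash

# Request a single task ...
#SBATCH --ntasks=1
# ... with 6 cores for STATA/MP (at most 12, the license limit)
#SBATCH --cpus-per-task=6
# Memory for the job
#SBATCH --mem=2G
# Wall-clock limit in the format days-hours:minutes, here 2 hours
#SBATCH --time=0-2:00
#SBATCH --job-name=STATA-TEST
# Files for standard output and standard error (%j is replaced by the job ID)
#SBATCH --output=STATA-test.%j.out
#SBATCH --error=stata-test.%j.err

# Load STATA and run the do-file in batch mode with the multiprocessor variant
module load stata
stata-mp -b YOUR_INPUT.FILE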

Note that, compared with the single-slot submission script in the preceding subsection, you need to add the line "#SBATCH --cpus-per-task=NUMBER_OF_CPUS" (the example uses 6 CPUs per task; you can request at most 12). You also need to call stata-mp instead of stata.
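
One way to keep the number of processors used by STATA/MP in line with the SLURM request is STATA's "set processors" command. The following is a minimal sketch, not the site's prescribed method. It assumes that SLURM exports the variable SLURM_CPUS_PER_TASK inside the job, which it does when --cpus-per-task is given, and it uses the hypothetical file names run_mp.do and YOUR_INPUT.do.

```shell
# Number of CPUs granted by SLURM; fall back to 1 outside of a job
NCPUS="${SLURM_CPUS_PER_TASK:-1}"

# Never ask for more than the 12 cores covered by the license
if [ "$NCPUS" -gt 12 ]; then
    NCPUS=12
fi

# Wrapper do-file (hypothetical name) that sets the processor count
# and then runs the actual analysis (YOUR_INPUT.do is a placeholder)
cat > run_mp.do <<EOF
set processors ${NCPUS}
do YOUR_INPUT.do
EOF
```

Placed in the job script after "module load stata", these lines generate the wrapper at run time; the wrapper would then be the file handed to stata-mp in place of YOUR_INPUT.FILE.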

Documentation

A user guide for STATA version 13 can be found here (PDF viewer required).