STATA 2016

Latest revision as of 14:42, 16 March 2017

Introduction

STATA is a complete software package offering statistical tools for data analysis, data management and graphics. On the local HPC system we offer the multiprocessor variant STATA/MP 13, licensed for up to 12 cores. The license allows up to 5 users to work with STATA at the same time. STATA/MP uses symmetric multiprocessing (SMP) to exploit the parallel capabilities of modern computers and HPC systems and thereby speed up computations.

Installed version

The currently installed version of STATA is 13.0.

Using Stata on the HPC cluster

Like every module on the cluster, STATA can be loaded by typing

module load stata

Then you can find the following STATA variants in your user environment:

  • stata: a version of STATA that handles small datasets
  • stata-se: a version of STATA for large datasets
  • stata-mp: a fast version of STATA for multicore/multiprocessor machines

More details on the different versions can be found here.
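As a quick check after loading the module, the following sketch reports which of the three executables listed above are actually on your PATH. The function name check_variants is hypothetical and only used for illustration; the output depends on your environment.

```shell
#!/bin/bash
# check_variants is a hypothetical helper name, not part of STATA or the cluster setup.
# It prints one line per STATA executable, saying where it was found (if at all).
check_variants() {
    local variant
    for variant in stata stata-se stata-mp; do
        if command -v "$variant" >/dev/null 2>&1; then
            echo "$variant: $(command -v "$variant")"
        else
            echo "$variant: not found (is the stata module loaded?)"
        fi
    done
}

check_variants
```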

To facilitate bookkeeping, a good first step towards using STATA on the HPC system is to create a directory in which all STATA-related computations are carried out. Using the command

 mkdir stata

you might thus create the folder stata in the top level of your home directory for this purpose (you might even go further and create a subdirectory mp13 specifying the precise version of STATA).
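Both directories can also be created in a single step. The sketch below assumes the layout suggested above (stata/mp13 under your home directory); the STATA_BASE variable is a hypothetical override introduced here only so the base location can be changed.

```shell
#!/bin/bash
# STATA_BASE is a hypothetical variable; it defaults to your home directory.
base="${STATA_BASE:-$HOME}"

# -p creates missing parent directories and does not fail if they already exist.
mkdir -p "$base/stata/mp13"
```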

Submitting a job

Since there are many people working with the HPC cluster, it's important that everyone has an equal chance to do so. Therefore, every job should be processed by SLURM.

For this reason, you have to create a job script for your tasks. The following paragraphs explain how to submit different types of STATA jobs.

Single-slot variant

You might submit your STATA do-file using a job script similar to the script listed below:

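#!/bin/bash

# One task (a single STATA process)
#SBATCH --ntasks=1
# Memory per node
#SBATCH --mem=2G
# Run-time limit in the format days-hours:minutes (here: 2 hours)
#SBATCH --time=0-2:00
# Name shown in the queue
#SBATCH --job-name STATA-TEST
# Files for standard output and standard error; %j is replaced by the job ID
#SBATCH --output=STATA-test.%j.out
#SBATCH --error=stata-test.%j.err

# Load the STATA module, then run the do-file in batch mode
module load stata
stata -b YOUR_INPUT.FILE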

In this file, we specify the resources needed for our job. Since this is just an example, you will probably have to adjust these values for your real jobs.
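The following sketch shows one way to put the pieces together: it writes the single-slot job script from above to a file and hands it to SLURM with sbatch if that command is available. The file names analysis.do and stata_single.job are hypothetical placeholders, not names used on the cluster.

```shell
#!/bin/bash
# Hypothetical file names; replace them with your own do-file and script name.
dofile="analysis.do"
jobscript="stata_single.job"

# Write the job script. The here-document expands $dofile and leaves %j untouched.
cat > "$jobscript" <<EOF
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=2G
#SBATCH --time=0-2:00
#SBATCH --job-name=STATA-TEST
#SBATCH --output=STATA-test.%j.out
#SBATCH --error=stata-test.%j.err

module load stata
stata -b $dofile
EOF

# Submit only where SLURM is installed; otherwise just report what was written.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$jobscript"      # reports the ID assigned to the job
    squeue -u "$USER"        # lists your pending and running jobs
else
    echo "sbatch not found; wrote $jobscript without submitting"
fi
```

Once the job has finished, its output and error messages appear in the files named by the --output and --error lines.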

Multi-slot variant

On the local HPC system, the term "slots" is used in place of "cores", which is why this subsection is titled the "Multi-slot" variant of using STATA. To exploit the parallel capabilities of modern computers and HPC systems and thereby speed up computations, STATA/MP uses symmetric multiprocessing (SMP). A performance report covering many STATA commands and highlighting the benefit of multiprocessing can be found here (PDF viewer required). As pointed out above, the HPC system offers STATA/MP 13, licensed for up to 12 cores.

A job submission script that uses the multiprocessing capabilities of STATA/MP is listed below:

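#!/bin/bash

#SBATCH --ntasks=1
# Number of CPUs for the single STATA/MP task (at most 12)
#SBATCH --cpus-per-task=6
#SBATCH --mem=2G
#SBATCH --time=0-2:00
#SBATCH --job-name STATA-TEST
#SBATCH --output=STATA-test.%j.out
#SBATCH --error=stata-test.%j.err

# Load the STATA module, then run the do-file with the multiprocessor executable
module load stata
stata-mp -b YOUR_INPUT.FILE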

Compared with the single-slot submission script in the preceding subsection, you need to add the line "#SBATCH --cpus-per-task=NUMBER_OF_CPUS" (the example uses 6 CPUs per task; you can specify up to a maximum of 12). You also need to append "-mp" to the stata command, that is, call stata-mp instead of stata.
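If you set the CPU count from a wrapper script, it can help to keep the request within the 12-core limit mentioned above. The sketch below is only an illustration: clamp_cpus and the file name stata_mp.job are hypothetical, and the sbatch call is left as a comment so the snippet can be tried without SLURM.

```shell
#!/bin/bash
# clamp_cpus is a hypothetical helper: it limits a requested CPU count
# to the range 1..12, matching the 12-core licence stated above.
clamp_cpus() {
    local requested="$1" max=12
    if [ "$requested" -gt "$max" ]; then
        echo "$max"
    elif [ "$requested" -lt 1 ]; then
        echo 1
    else
        echo "$requested"
    fi
}

cpus="$(clamp_cpus 16)"
echo "requesting $cpus CPUs"

# Options given on the sbatch command line take precedence over the
# #SBATCH lines in the script (stata_mp.job is a hypothetical file name):
# sbatch --cpus-per-task="$cpus" stata_mp.job
```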

Documentation

A user guide for STATA version 13 can be found here (PDF viewer required).