Difference between revisions of "STATA"

Revision as of 14:39, 2 September 2013

STATA is a complete software package offering statistical tools for data analysis, data management, and graphics. On the local HPC system we offer STATA/MP 13, the multiprocessor variant of STATA, for up to 12 cores.

Logging in to the HPC System

Advice on how to log in to the HPC system from within or outside the University can be found here.

Loading the STATA module

On the HPC system, the STATA/MP 13 software package is available as a software module. To load the module, type

 module load stata

Using STATA in batch mode

To facilitate bookkeeping, a good first step towards using STATA on the HPC system is to create a directory in which all STATA related computations are carried out. Using the command

 mkdir stata

I created a folder called stata in the top level of my home directory for this purpose (you might even go further and create a subdirectory mp13 specifying the precise version of STATA). To illustrate how to use STATA in batch mode on the HPC system, consider the basic linear regression example contained in Chapter 1 of the STATA Web Book Regression with STATA. For this linear regression example I created the subdirectory linear_regression.
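The directory layout described above can be sketched as follows. The do-file written here is only a stand-in: it assumes the auto data set that ships with STATA and a regression of price on mpg and weight, not the data set used in the web book chapter. The file name linreg.do matches the job submission script used later on this page.

```shell
#!/bin/bash
# Create the working directory for the linear regression example.
# -p also creates the parent directory "stata" if it does not exist yet.
work_dir="$HOME/stata/linear_regression"
mkdir -p "$work_dir"

# Write a minimal do-file (assumption: stand-in example using the
# "auto" data set shipped with STATA, not the web book's data).
cat > "$work_dir/linreg.do" <<'EOF'
* linreg.do: stand-in linear regression example
sysuse auto, clear
regress price mpg weight
EOF

echo "created $work_dir/linreg.do"
```

Replace the body of the do-file with the commands of your own analysis; only the location and the file name matter for the steps that follow.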

On the HPC system you submit your job to the scheduler (here we use Sun Grid Engine (SGE) as scheduler), which assigns it to a suitable execution host on which the actual computations are carried out. You therefore have to set up a job submission file in which you request certain resources for your job (this is common practice on HPC systems where multiple users access the available resources at the same time). Examples of such a job submission script, for both single-core and multi-core usage, are detailed below.

Using STATA: Single-slot variant

You can submit your STATA do-file using a job submission script similar to the script mySubmissionScript.sge listed below (with annotated line numbers):

 
  1 #!/bin/bash
  2 
  3 #$ -S /bin/bash
  4 #$ -cwd
  5 
  6 #$ -l h_rt=0:10:0
  7 #$ -l h_vmem=300M
  8 #$ -l h_fsize=100M
  9 #$ -N stata_linReg_test
 10 
 11 module load stata
 12 /cm/shared/apps/stata/13/stata -b linreg.do
 13 mv linreg.log ${JOB_NAME}_jobId${JOB_ID}_linreg.log

In this script, lines 6 through 8 request the resources the job needs: running time (h_rt), memory (h_vmem), and scratch space (h_fsize). Line 9 sets a name for the job. Line 11 loads the module containing the STATA software package; you need to load this module in every job submission script that is used to submit STATA jobs. Line 12 calls STATA in batch mode and supplies the do-file (here the linear regression example do-file set up previously). By default, STATA writes a log file named after the do-file, so for the do-file linreg.do it creates the log file linreg.log. If you run the same do-file several times, each run overwrites the log of the previous one. Line 13 therefore renames the log file so that it includes the name of the job and the unique job ID assigned by the scheduler. You can submit the script by typing

 qsub mySubmissionScript.sge

As soon as the job is enqueued you can check its status by typing qstat on the command line. Immediately after submission you might obtain the output

 
job-ID  prior   name       user         state submit/start at     queue                  slots ja-task-ID 
---------------------------------------------------------------------------------------------------------
 909537 0.00000 stata_linR alxo9476     qw    09/02/2013 12:45:41                            1

According to this output, the job with ID 909537 has priority 0.00000 and is in state qw, loosely translated as "enqueued and waiting". The output also shows that the job requires one slot. The ja-task-ID column, which holds the ID of an individual task of a job array, is empty because we submitted a single job rather than a job array. Soon afterwards the priority of the job takes a value between 0.5 and 1.0 (usually only slightly above 0.5) and increases slightly until the job starts. Once the job has finished, you can retrieve information about it with the qacct command-line tool, see here.
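If you want to read the state of a particular job from a script, the columns described above can be picked apart with awk. The following is a minimal sketch that works on the sample output shown on this page; the helper name job_state is hypothetical, and the column positions are assumed to match that sample.

```shell
#!/bin/bash
# Sample qstat output, copied from the listing on this page.
qstat_output='job-ID  prior   name       user         state submit/start at     queue                  slots ja-task-ID
---------------------------------------------------------------------------------------------------------
 909537 0.00000 stata_linR alxo9476     qw    09/02/2013 12:45:41                            1'

# job_state (hypothetical helper): read qstat output on stdin and print
# the state column (fifth field) of the job whose ID is given as $1.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

state=$(printf '%s\n' "$qstat_output" | job_state 909537)
echo "$state"   # prints: qw
```

On the HPC system you would pipe the live output instead, for example qstat | job_state 909537. The helper prints nothing if the job is no longer listed.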

After the job has terminated successfully, the STATA log file stata_linReg_test_jobId909537_linreg.log is available in the directory from which the job was submitted. It contains a log of all commands used in the STATA session and a summary of the linear regression carried out. The directory also contains the two files stata_linReg_test.o909537 and stata_linReg_test.e909537, which hold the additional output written to standard output and standard error, respectively.
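A finished job does not necessarily mean that the do-file ran without errors, so it can be worth scanning the log file. The sketch below assumes that STATA reports a failed command in the log as a line of the form r(111); (a return code in parentheses followed by a semicolon). The helper name stata_log_ok is hypothetical, and the default log file name is the one from the example above.

```shell
#!/bin/bash
# stata_log_ok (hypothetical helper): return 0 if the given STATA log
# contains no line that looks like an error return code, e.g. "r(111);".
stata_log_ok() {
    ! grep -Eq '^r\([0-9]+\);' "$1"
}

# Log file to check: first argument, or the example name from this page.
log_file="${1:-stata_linReg_test_jobId909537_linreg.log}"

if [ -f "$log_file" ]; then
    if stata_log_ok "$log_file"; then
        echo "no STATA error code found in $log_file"
    else
        echo "STATA reported an error in $log_file" >&2
    fi
fi
```

This check only looks for return-code lines; it does not replace reading the regression summary in the log itself.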

Using STATA: Multi-slot variant

Checking the status of a job

After you submit a job, the scheduler assigns it a unique job ID. You can then use the qstat tool together with the job ID to check the current status of that job. Details on how to check the status of a job can be found here. Once the job has finished, you can retrieve information about it with the qacct tool, see here.

Mounting your home directory on Hero

Consider a situation where you would like to transfer a large amount of data to the HPC System in order to analyze it via STATA. Similarly, consider a situation where you would like to transfer lots of already processed data from your HPC account to your local workstation. Then it is useful to mount your home directory on the HPC System in order to conveniently cope with such a task. Details about how to mount your HPC home directory can be found here.

