Difference between revisions of "STATA"
Revision as of 14:39, 2 September 2013
STATA comprises a complete software package, offering statistical tools for data analysis, data management and graphics. On the local HPC System we offer a multiprocessor variant of STATA/MP 13 for up to 12 cores.
Logging in to the HPC System
Advice on how to log in to the HPC System from either within or outside the University can be found here.
Loading the STATA module
On the HPC system, the STATA/MP 13 software package is available as a software module. To load the module, type
module load stata
Using STATA in batch mode
To facilitate bookkeeping, a good first step towards using STATA on the HPC system is to create a directory in which all STATA-related computations are carried out. Using the command
mkdir stata
I created a folder called stata in the top level of my home directory for this purpose (you might even go further and create a subdirectory mp13 specifying the precise version of STATA). To illustrate how to use STATA in batch mode on the HPC system, consider the basic linear regression example contained in Chapter 1 of the STATA Web Book Regression with STATA. For this linear regression example I created the subdirectory linear_regression.
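The directory layout described above could also be created in one step. The following is a minimal sketch: the function name make_stata_dirs is hypothetical, and the mp13 level is the optional version subdirectory mentioned in the text.

```shell
# Create the working directory tree for the STATA examples.
# make_stata_dirs is a hypothetical helper; the root defaults to ~/stata.
make_stata_dirs() {
    local root="${1:-$HOME/stata}"
    # -p creates missing parent directories and does not fail if they exist.
    mkdir -p "$root/mp13/linear_regression"
    # Print the resulting path so the caller can cd into it.
    echo "$root/mp13/linear_regression"
}
```

Calling make_stata_dirs without an argument uses ~/stata as the root; a different root can be passed as the first argument.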
On the HPC system you submit your job to the scheduler (here we use Sun Grid Engine (SGE) as scheduler), which assigns it to a suitable execution host on which the actual computations are carried out. You therefore have to set up a job submission file in which you request the resources your job needs (this is common practice on HPC systems, where multiple users access the available resources at the same time). Examples of such a job submission script, for both single-core and multi-core usage, are detailed below.
Using STATA: Single-slot variant
You might submit your STATA do-file using a job submission script similar to the script mySubmissionScript.sge listed below (with annotated line-numbers):
 1 #!/bin/bash
 2
 3 #$ -S /bin/bash
 4 #$ -cwd
 5
 6 #$ -l h_rt=0:10:0
 7 #$ -l h_vmem=300M
 8 #$ -l h_fsize=100M
 9 #$ -N stata_linReg_test
10
11 module load stata
12 /cm/shared/apps/stata/13/stata -b linreg.do
13 mv linreg.log ${JOB_NAME}_jobId${JOB_ID}_linreg.log
Therein, lines 6 through 8 request the resources the job needs: running time (h_rt), memory (h_vmem) and scratch space (h_fsize). Line 9 sets a name for the job. Line 11 loads the module containing the STATA software package; you need to load this module in every job submission script used to submit STATA jobs. In line 12 the STATA program is called in batch mode and the do-file is supplied (here the linear regression example do-file set up previously). By default, STATA creates a log file with a standardized name. Here, for the do-file linreg.do, STATA will create the log file linreg.log. If you run the same do-file several times, each run overwrites the log file of the previous one. It is therefore useful to rename the standard log file so that it includes the name of the job and the unique job ID assigned by the scheduler, as is done in line 13. You can submit the script by simply typing
qsub mySubmissionScript.sge
As soon as the job is enqueued you can check its status by typing qstat on the command line. Immediately after submission you might obtain the output
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 909537 0.00000 stata_linR alxo9476     qw    09/02/2013 12:45:41                                    1
According to this, the job with ID 909537 has priority 0.00000 and resides in state qw, loosely translated as "enqueued and waiting". The output also indicates that the job requires one slot. The ja-task-ID column, which holds the ID of a particular task of a job array, is empty because we submitted a single job rather than a job array. Soon after submission, the priority of the job will take a value between 0.5 and 1.0 (usually only slightly above 0.5) and increase slightly until the job starts. If the job has already finished, you can retrieve information about it using the qacct command-line tool, see here.
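To read the state column in a script rather than by eye, the qstat output can be filtered with awk. This is a minimal sketch that assumes the column layout of the example output (job ID in the first column, state in the fifth); the function name job_state is hypothetical.

```shell
# Print the state (e.g. qw or r) of one job from qstat-style output on stdin.
# Assumes job-ID is column 1 and state is column 5, as in the example output.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Example using the line from the output above (no scheduler needed):
printf '%s\n' ' 909537 0.00000 stata_linR alxo9476 qw 09/02/2013 12:45:41 1' \
    | job_state 909537
```

On the HPC system the same function would be fed by the scheduler directly, for instance qstat | job_state 909537.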
After the job has terminated successfully, the STATA log file stata_linReg_test_jobId909537_linreg.log is available in the directory from which the job was submitted. It contains a log of all the commands used in the STATA session and a summary of the linear regression carried out therein. The directory also contains the two files stata_linReg_test.o909537 and stata_linReg_test.e909537, which hold any additional output written to standard output and standard error, respectively.
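These file names follow directly from the job name and the job ID. The sketch below shows how they are composed, using the values from this example; the function name expected_outputs is hypothetical, and inside a running job SGE itself provides JOB_NAME and JOB_ID.

```shell
# List the files a finished job is expected to leave in the submit directory.
# Arguments: job name, job ID, do-file base name (without the .do suffix).
expected_outputs() {
    name="$1"; id="$2"; dofile="$3"
    echo "${name}_jobId${id}_${dofile}.log"   # STATA log renamed by the mv command
    echo "${name}.o${id}"                     # standard output stream
    echo "${name}.e${id}"                     # standard error stream
}

# Report which of the expected files are present in the current directory.
for f in $(expected_outputs stata_linReg_test 909537 linreg); do
    if [ -e "$f" ]; then echo "found:   $f"; else echo "missing: $f"; fi
done
```

Run from the submit directory, this gives a quick check that the job produced all three files.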
Using STATA: Multi-slot variant
Checking the status of a job
After you have submitted a job, the scheduler assigns it a unique job ID. You can then use the qstat tool together with the job ID to check the current status of the job. Details on how to check the status of a job can be found here. If the job has already finished, you can retrieve information about it using the qacct tool, see here.
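The two tools can be combined into one lookup. The following is a minimal sketch, not a site-provided command: the function name job_info is hypothetical, and it assumes that qstat -j exits with a non-zero status once the job is no longer known to the scheduler.

```shell
# Show information on a job, whether it is still queued/running or finished.
# Assumption: 'qstat -j' fails (non-zero exit status) for a job that has
# already left the queue; in that case fall back to the accounting tool.
job_info() {
    qstat -j "$1" 2>/dev/null || qacct -j "$1"
}
```

For the example job this would be called as job_info 909537.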
Mounting your home directory on Hero
Consider a situation where you would like to transfer a large amount of data to the HPC System in order to analyze it via STATA. Similarly, consider a situation where you would like to transfer lots of already processed data from your HPC account to your local workstation. Then it is useful to mount your home directory on the HPC System in order to conveniently cope with such a task. Details about how to mount your HPC home directory can be found here.