Difference between revisions of "STATA"

From HPC users

Revision as of 14:07, 2 September 2013

STATA is a complete software package offering statistical tools for data analysis, data management and graphics. On the local HPC System we offer the multiprocessor variant STATA/MP 13 for up to 12 cores.

Logging in to the HPC System

Advice on how to log in to the HPC System from either within or outside the University can be found here.

Loading the STATA module

On the HPC System, the STATA/MP 13 software package is available as a software module. To load the module, type

 module load stata

Using STATA in batch mode

!!! Fill in simple non HPC Unix Workstation EXAMPLE!!!

On the HPC System you submit your job to the scheduler (here Sun Grid Engine, SGE), which assigns it to a suitable execution host on which the actual computations are carried out. You therefore have to set up a job submission file in which you request the resources your job needs. This is common practice on HPC systems, where multiple users access the available resources at the same time. Example job submission scripts for both single-core and multi-core usage are detailed below.

Using STATA: Single-slot variant

You can submit your STATA do-file using a job submission script similar to the script mySubmissionScript.sge listed below (with annotated line numbers):

 
  1 #!/bin/bash
  2 
  3 #$ -S /bin/bash
  4 #$ -cwd
  5 
  6 #$ -l h_rt=0:10:0
  7 #$ -l h_vmem=300M
  8 #$ -l h_fsize=100M
  9 #$ -N stata_linReg_test
 10 
 11 module load stata
 12 /cm/shared/apps/stata/13/stata -b linreg.do
 13 mv linreg.log ${JOB_NAME}_jobId${JOB_ID}_linreg.log
  

In lines 6 through 8 the job's resource requirements are requested: running time (h_rt), memory (h_vmem) and scratch space (h_fsize). Line 9 sets a name for the job. Line 11 loads the module containing the STATA software package. You need to load this module in every job submission script used to submit STATA jobs. In line 12 the STATA program is called in batch mode and the do-file is supplied (here the linear regression example do-file set up previously). By default, STATA creates a log file with a standardized name. Here, for the do-file linreg.do, STATA will create the log file linreg.log. If you run the same do-file several times, each run overwrites the log file of the previous one. It can therefore be useful to rename the standard log file so that it includes the name of the job and the unique job-ID assigned by the scheduler, as is done in line 13. You can submit the script by typing

 qsub mySubmissionScript.sge
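
The renaming step in line 13 of the submission script can be tried out in isolation. The following is a minimal sketch, not part of the original script: the helper name unique_log_name is hypothetical, and the fallback values for JOB_NAME and JOB_ID are placeholders so that the snippet also runs outside the scheduler (inside a job, SGE sets both variables).

```shell
#!/bin/bash

# Hypothetical helper: build a unique log file name from the job name,
# the job-ID and the default log file name, following line 13 above.
unique_log_name() {
    local job_name="$1" job_id="$2" log_file="$3"
    printf '%s_jobId%s_%s\n' "$job_name" "$job_id" "$log_file"
}

# Inside a running job SGE provides JOB_NAME and JOB_ID. The defaults
# below are placeholders (assumptions) for testing on the command line.
JOB_NAME="${JOB_NAME:-stata_linReg_test}"
JOB_ID="${JOB_ID:-909537}"

target="$(unique_log_name "$JOB_NAME" "$JOB_ID" linreg.log)"
echo "$target"

# Rename only if STATA actually produced the default log file.
if [ -f linreg.log ]; then
    mv linreg.log "$target"
fi
```

Because the job-ID differs for every submission, repeated runs of the same do-file end up with separate log files.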

As soon as the job is enqueued you can check its status by typing qstat on the command line. Immediately after submission you might obtain the output

 
job-ID  prior   name       user         state submit/start at     queue                  slots ja-task-ID 
---------------------------------------------------------------------------------------------------------
 909537 0.00000 stata_linR alxo9476     qw    09/02/2013 12:45:41                            1        
  

According to this, the job with ID 909537 has priority 0.00000 and is in state qw, loosely translated as "enqueued and waiting". The output also shows that the job requires 1 slot. The ja-task-ID column, which holds the ID of an individual task of a job array, is empty because we submitted a single job rather than a job array. Soon after submission, the priority of the job will take a value between 0.5 and 1.0 (usually only slightly above 0.5) and will increase slightly until the job starts.
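
If you only need the state of one job, the columns of the qstat listing can be picked apart with awk. The sketch below is an illustration under assumptions: the helper name job_state is hypothetical, and it relies on the state being the fifth whitespace-separated column, as in the sample output above. The sample text is embedded so that the snippet runs without a scheduler.

```shell
#!/bin/bash

# Hypothetical helper: read a qstat listing on stdin and print the state
# column (fifth field) of the line whose first field is the given job-ID.
job_state() {
    local job_id="$1"
    awk -v id="$job_id" '$1 == id { print $5 }'
}

# Sample listing copied from the output shown above.
sample_output='job-ID  prior   name       user         state submit/start at     queue                  slots ja-task-ID
---------------------------------------------------------------------------------------------------------
 909537 0.00000 stata_linR alxo9476     qw    09/02/2013 12:45:41                            1'

printf '%s\n' "$sample_output" | job_state 909537
```

On the HPC System the same helper could be fed from the scheduler directly, for example with qstat | job_state 909537.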

Using STATA: Multi-slot variant

Checking the status of a job

After you have submitted a job, the scheduler assigns it a unique job-ID. You can then use the qstat tool together with the job-ID to check the current status of that job. Details on how to check the status of a job can be found here. If the job has already finished, you can retrieve information about it with the qacct tool, see here.
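
The two tools can be combined into a single lookup. The following is a minimal sketch under assumptions, not an official recipe: the function name job_info is hypothetical, and the sketch assumes that qstat -j returns a non-zero exit status once the job is no longer known to the scheduler, which may differ between Grid Engine versions.

```shell
#!/bin/bash

# Hypothetical helper: show details for a job-ID, first asking qstat
# (pending or running jobs), then qacct (finished jobs).
job_info() {
    local job_id="$1"
    if ! command -v qstat >/dev/null 2>&1; then
        echo "qstat not found; run this on a host with SGE available" >&2
        return 1
    fi
    # Assumption: qstat -j fails once the job has left the queue.
    if qstat -j "$job_id" 2>/dev/null; then
        return 0
    fi
    # Fall back to the accounting records of finished jobs.
    qacct -j "$job_id"
}

# Example call on the HPC System (job-ID taken from the output above):
#   job_info 909537
```

Note that accounting records may become available only some time after a job has finished.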

Mounting your home directory on Hero

Suppose you would like to transfer a large amount of data to the HPC System in order to analyze it with STATA, or to transfer a large amount of already processed data from your HPC account to your local workstation. In both cases it is convenient to mount your home directory on the HPC System. Details about how to mount your HPC home directory can be found here.