Gaussian 09

From HPC users
Revision as of 16:34, 22 November 2012 by Albensoeder (talk | contribs)

Single-node (multi-threaded) jobs

You have to use the parallel environment smp to ensure that your job runs on a single host. The following example illustrates this for a Gaussian job using 12 processors (CPU cores):

#$ -l h_vmem=1900MB
#$ -l h_fsize=500G

#$ -pe smp 12
#$ -R y

module load gaussian
g09run myinputfile

The total amount of memory reserved for the job is 12 x 1900 MB = 22.8 GB (remember that for parallel jobs, the value of h_vmem is multiplied by the number of slots), which is close to the maximum memory available on a standard compute node of HERO (23 GB). If you request fewer than 12 slots, the remaining slots may be filled by jobs of other users (provided there is enough memory and other resources available). You may also need to reserve sufficient local disk (scratch) space for your job (in the above example, 500 GB are requested).
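The arithmetic above can be checked with a few lines of shell before submitting a job. This is only a minimal sketch: the function names are made up for illustration, and the node limit of 23000 MB is an assumption that treats the 23 GB quoted above in the same decimal units as the 22.8 GB total.

```shell
#!/bin/sh
# total_mem_mb SLOTS H_VMEM_MB
# Total memory (in MB) that SGE reserves for a parallel job:
# the per-slot h_vmem value multiplied by the number of slots.
total_mem_mb() {
    echo $(( $1 * $2 ))
}

# fits_on_node SLOTS H_VMEM_MB NODE_MB
# Succeeds if the total reservation does not exceed the node memory.
fits_on_node() {
    [ "$(total_mem_mb "$1" "$2")" -le "$3" ]
}

total_mem_mb 12 1900     # prints 22800
if fits_on_node 12 1900 23000; then
    echo "fits on one node"
fi
```

With h_vmem=2000MB instead, the total would be 24000 MB and the check would fail for a 23000 MB node.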

The Gaussian input file myinputfile of the above example would then contain, e.g., the following lines in the link 0 section:

%Mem=21000MB
%NProcShared=12

Important: Memory management is critical for the performance of Gaussian jobs. Which parameter values are optimal is highly dependent on the type of the calculation, the system size, and other factors. Therefore, optimizing your Gaussian job with respect to memory allocation almost always requires (besides experience) some trial and error. The following general remarks may be useful:

  • In the above example, we have told Gaussian to use almost all of the total memory reserved for the job (22.8 GB), leaving only a small margin of 1.8 GB. A margin is necessary because, among other reasons, the G09 executables are rather large and have to be resident in memory (about 1 GB should be sufficient in most cases). This is usually a good choice for DFT calculations.
  • For MP2 calculations, on the other hand, Gaussian requests about twice the amount of memory specified by the %Mem=... directive. If this total (physical + virtual) memory requested by Gaussian is lower than the memory reserved for the job via the SGE h_vmem=... directive, the process stays in main memory. If it exceeds the memory reserved for the job, the operating system starts swapping, which may lead to a dramatic performance decrease. In that case, you may significantly speed up your calculation by giving Gaussian access to only half of the total memory reserved for the job, i.e., in the above example, a good starting point for an MP2 calculation would be:
    %Mem=11000MB
    %NProcShared=12
    

    In any case, as mentioned above, testing and some trial and error are indispensable and well worth the effort!
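The two rules of thumb above (total minus a margin for DFT, about half of the total for MP2) can be written down as a small helper. This is a sketch under stated assumptions, not a recommendation from Gaussian Inc.: the function name, the default margin of 1000 MB, and the rounding down to a multiple of 1000 MB are all choices made here so that the results match the values used in the examples.

```shell
#!/bin/sh
# suggest_mem_mb TOTAL_MB KIND [MARGIN_MB]
# Print a starting value (in MB) for the %Mem directive.
#   TOTAL_MB   total memory reserved for the job (slots x h_vmem)
#   KIND       "dft" (total minus margin) or "mp2" (half of the total)
#   MARGIN_MB  margin for the DFT case, 1000 MB by default (assumption)
suggest_mem_mb() {
    total_mb=$1
    kind=$2
    margin_mb=${3:-1000}
    case "$kind" in
        dft) raw_mb=$(( total_mb - margin_mb )) ;;
        mp2) raw_mb=$(( total_mb / 2 )) ;;
        *)   echo "unknown calculation type: $kind" >&2; return 1 ;;
    esac
    # Round down to a multiple of 1000 MB (a convenience, not a requirement).
    echo $(( raw_mb / 1000 * 1000 ))
}

echo "%Mem=$(suggest_mem_mb 22800 dft)MB"   # prints %Mem=21000MB
echo "%Mem=$(suggest_mem_mb 22800 mp2)MB"   # prints %Mem=11000MB
```

The printed values are only starting points for the trial and error described above.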

You may also want to check the Efficiency Considerations website of Gaussian Inc.

Linda jobs

For Gaussian multi-node (Linda) jobs, use the linda parallel environment (PE). The linda PE behaves quite differently from the other PEs, since a "slot" here means an entire node, i.e. one slot represents 12 CPU cores. Moreover, to ensure that each Linda worker has exclusive access to its node (no jobs of other users running on the same node), you must set the excl attribute to true.

Example: For a Linda job requesting four nodes (Linda workers) and 22 GB of memory per node, the relevant section of the submission script would be:

#$ -l h_vmem=22G

#$ -pe linda 4 -l excl=true
#$ -R y

module load gaussian
g09run myinputfile

The link 0 section of the input file myinputfile would then, e.g., contain the following lines:

%LindaWorkers=
%NProcShared=12
%Mem=20000MB 

As with single-node jobs, you should carefully consider memory allocation. In the above example, we simply tell Gaussian that it can use all the memory reserved for the job on each node (allowing for an overhead of 2 GB), which may not be the optimal choice in all cases (see above).

For Linda jobs, the "%LindaWorkers=" directive is mandatory. The wrapper script parses the input file looking for the LindaWorkers keyword (anything after the "=" is ignored) and, if found, fills in the correct node list. Note that the %NProcLinda directive of older Gaussian versions is deprecated and should no longer be used.
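To illustrate what "filling in the node list" means, the following sketch replaces the %LindaWorkers line with a comma-separated list of hosts. It is not the actual g09run wrapper, whose implementation is not shown here; the function name fill_linda_workers is hypothetical, and the sketch assumes a host file in the format of SGE's $PE_HOSTFILE (one line per host, host name in the first column).

```shell
#!/bin/sh
# fill_linda_workers HOSTFILE INPUTFILE
# Print INPUTFILE with its %LindaWorkers line replaced by a
# comma-separated list of the host names found in column 1 of HOSTFILE.
# Anything after the "=" in the original line is discarded.
fill_linda_workers() {
    hosts=$(awk '{ print $1 }' "$1" | paste -sd, -)
    sed "s/^%[Ll]inda[Ww]orkers=.*/%LindaWorkers=${hosts}/" "$2"
}

# Possible use inside a job script (output file name is an assumption):
#   fill_linda_workers "$PE_HOSTFILE" myinputfile > myinputfile.filled
```

For a host file listing node01 and node02, the first line of the output would read %LindaWorkers=node01,node02, while all other lines of the input file pass through unchanged.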

Important notes:

  • Not all types of Gaussian calculations support Linda. Please check, by consulting the manual or submitting short (!) test jobs, whether your Gaussian calculation runs under Linda.
  • The efficiency of Linda jobs depends on the type of calculation, the system size, and many other factors. The remarks concerning memory management apply to Linda jobs as well. Please invest a little time in testing and, in particular, check the scaling of your Linda job. This may later save you a lot of work and speed up your calculations significantly. To check the scaling, run a (Linda-capable) job first on a single node, then on two, and then on four nodes. On two nodes, your job should (ideally!) run twice as fast, and on four nodes four times as fast. It does not make much sense to run a Linda job on four nodes if you "only" gain a speed-up of a factor of 3 or less, since that would waste the resources of (at least) one compute node!
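The scaling check can be quantified from the wall-clock times of the test runs. The sketch below is a minimal example: the function names are made up for illustration, and the timings (3600 s on one node, 1200 s on four nodes) are hypothetical, not measurements.

```shell
#!/bin/sh
# speedup T1 TN
# Speed-up of an N-node run relative to the single-node run.
speedup() {
    awk -v t1="$1" -v tn="$2" 'BEGIN { printf "%.2f\n", t1 / tn }'
}

# efficiency T1 TN NODES
# Parallel efficiency: speed-up divided by the number of nodes (1.00 is ideal).
efficiency() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN { printf "%.2f\n", t1 / (tn * n) }'
}

speedup 3600 1200        # prints 3.00
efficiency 3600 1200 4   # prints 0.75
```

In this hypothetical case the four-node run reaches a speed-up of only 3, which is the situation described above where one of the four nodes is effectively wasted.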