Gaussian 09

From HPC users
Latest revision as of 10:50, 5 October 2015

Available Modules

Currently, two revisions of Gaussian 09 are available on the cluster:

 $ module av
 ...
 ----- /cm/shared/uniol/modulefiles/CHEMISTRY -----
 gaussian/g09.b01 gaussian/g09.d01
 ...

To use a specific revision of Gaussian, make sure your job script loads the corresponding module. For example, to load Rev. D.01:

 ...
 module load gaussian/g09.d01
 g09run <inputfile>
 ...

and likewise for Rev. B.01. Note that

 module load gaussian

will currently load Rev. D.01. However, this may change in the future if newer revisions are installed.

Note: Rev. D.01 has a bug that can cause Gaussian jobs to fail during geometry optimization. See below for details and possible work-arounds.

Single-node (multi-threaded) jobs

You have to use the parallel environment smp to ensure that your job runs on a single host. The following example illustrates this for a Gaussian job using 12 processors (CPU cores):

 #$ -l h_vmem=1900M
 #$ -l h_fsize=500G

 #$ -pe smp 12
 #$ -R y

 module load gaussian
 g09run myinputfile

The total amount of memory reserved for the job is 12 x 1900 MB = 22.8 GB (remember that for parallel jobs, the value of h_vmem is multiplied by the number of slots), which is close to the maximum memory available on a standard compute node of HERO (23 GB). If you request fewer than 12 slots, the remaining slots may be filled by jobs of other users (provided there is enough memory and other resources available). You may also need to reserve sufficient local disk (scratch space) for your job (in the above example, 500 GB are requested).
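The reservation arithmetic can be checked before submitting. The following is a minimal bash sketch; the function name total_vmem_mb and the 23000 MB node limit (the 23 GB quoted above, assuming decimal units as in the 22.8 GB figure) are illustrative assumptions, not part of the cluster tools.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total memory (MB) that SGE reserves for an smp job,
# i.e. h_vmem per slot (in MB) multiplied by the number of slots.
total_vmem_mb() {
    local slots=$1 h_vmem_mb=$2
    echo $(( slots * h_vmem_mb ))
}

# Assumed memory limit of a standard HERO node (23 GB, decimal MB).
NODE_LIMIT_MB=23000

total=$(total_vmem_mb 12 1900)
echo "Reserved: ${total} MB"   # prints: Reserved: 22800 MB
if (( total > NODE_LIMIT_MB )); then
    echo "Warning: request exceeds the memory of a standard node" >&2
fi
```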

The Gaussian input file myinputfile of the above example would then contain, e.g., the following lines in the link 0 section:

 %Mem=21000MB
 %NProcShared=12

Important: Memory management is critical for the performance of Gaussian jobs. Which parameter values are optimal is highly dependent on the type of the calculation, the system size, and other factors. Therefore, optimizing your Gaussian job with respect to memory allocation almost always requires (besides experience) some trial and error. The following general remarks may be useful:

  • In the above example, we have told Gaussian to use almost all of the total memory reserved for the job (22.8 GB), leaving only a small margin of 1.8 GB. This margin is necessary, among other reasons, because the G09 executables are rather large and have to be resident in memory (a margin of about 1 GB should be sufficient in most cases). This is usually a good choice for DFT calculations.
  • For MP2 calculations, on the other hand, Gaussian requests about twice the amount of memory specified by the %Mem=... directive. If this total (physical + virtual) memory requested by Gaussian is lower than the memory reserved for the job via the SGE h_vmem=... directive, the process stays in main memory. If it exceeds the memory reserved for the job, the operating system starts swapping, which may lead to a dramatic decrease in performance. In that case, you may significantly speed up your calculation by giving Gaussian access to only half of the total memory reserved for the job. In the above example, a good starting point for an MP2 calculation would be:
    %Mem=11000MB
    %NProcShared=12
    

    In any case, as mentioned above, testing and some trial and error are indispensable and well worth the effort!
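The two rules of thumb above can be turned into a starting value for %Mem. This bash sketch rests on stated assumptions: the function name suggest_mem_mb is hypothetical, the 1800 MB default margin simply mirrors the DFT example, and rounding the MP2 value down to whole GB (1000 MB) is a choice that reproduces the 11000MB example. The result is only a starting point for the testing recommended above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest a %Mem value (MB) from the total memory
# reserved for the job (slots x h_vmem, in MB) and the calculation type.
#   dft: use almost everything, minus a margin for the G09 executables
#   mp2: Gaussian may request about twice %Mem, so use about half
suggest_mem_mb() {
    local total_mb=$1 calc=$2 margin_mb=${3:-1800}
    case "$calc" in
        dft) echo $(( total_mb - margin_mb )) ;;
        mp2) echo $(( total_mb / 2 / 1000 * 1000 )) ;;  # round down to whole GB
        *)   echo "unknown calculation type: $calc" >&2; return 1 ;;
    esac
}

total=$(( 12 * 1900 ))   # 22800 MB, as in the example above
# prints the Link 0 lines %Mem=21000MB and %NProcShared=12
printf '%%Mem=%sMB\n%%NProcShared=12\n' "$(suggest_mem_mb "$total" dft)"
```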

You may also want to check the Efficiency Considerations page on the Gaussian Inc. website.

Linda jobs

For Gaussian multi-node (Linda) jobs, use the linda parallel environment (PE). The linda PE behaves quite differently from the other PEs, since a "slot" here means "the entire node", i.e., one slot represents 12 CPU cores. Moreover, to ensure that each Linda worker has exclusive access to its node (no jobs of other users running on the same node), you must set the excl attribute to true.

Example: For a Linda job requesting four nodes (Linda workers) and 22 GB of memory per node, the relevant section of the submission script would be:

 #$ -l h_vmem=22G

 #$ -pe linda 4 -l excl=true
 #$ -R y

 module load gaussian
 g09run myinputfile

The link 0 section of the input file myinputfile would then, e.g., contain the following lines:

 %LindaWorkers=
 %NProcShared=12
 %Mem=20000MB

As for single-node jobs, you should carefully consider memory allocation. In the above example, we simply tell Gaussian that it can use all the memory reserved for the job on each node (allowing for an overhead of 2 GB), which may not be the optimal choice in all cases (see above).

For Linda jobs, the "%LindaWorkers=" directive is mandatory. The wrapper script parses the input file looking for the LindaWorkers keyword (anything after the "=" will be ignored) and, if found, fills in the correct node list. Note that the %NProcl directive of older Gaussian versions is deprecated and should no longer be used.
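To illustrate the idea behind this substitution, here is a minimal bash sketch; it is not the actual g09run code. It assumes SGE's $PE_HOSTFILE (one line per host, host name in the first column) and rewrites the %LindaWorkers= line with a comma-separated host list. The function name fill_linda_workers is hypothetical, and GNU sed is assumed for the in-place, case-insensitive substitution.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the %LindaWorkers substitution done by a wrapper.
# $1: SGE hostfile (per line: hostname slots queue processors)
# $2: Gaussian input file, modified in place
fill_linda_workers() {
    local hostfile=$1 input=$2 nodes
    # Collect the first column (host names) into a comma-separated list.
    nodes=$(awk '{print $1}' "$hostfile" | paste -sd, -)
    # Replace the line starting with "%LindaWorkers=", discarding anything
    # after the "=" (GNU sed, case-insensitive match via the I flag).
    sed -i -E "s/^%LindaWorkers=.*/%LindaWorkers=${nodes}/I" "$input"
}

# Inside a job, SGE sets PE_HOSTFILE; a call would look like:
#   fill_linda_workers "$PE_HOSTFILE" myinputfile
```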

Important notes:

  • Not all types of Gaussian calculations support Linda. Please check, by consulting the manual or submitting short (!) test jobs, whether your Gaussian calculation runs under Linda.
  • The efficiency of Linda jobs depends on the type of calculation, the system size, and many other factors. Of course, the remarks concerning memory management apply to Linda jobs as well. Please invest a little time in testing and, in particular, check the scaling of your Linda job; this may later save you a lot of work and speed up your calculations significantly. This can be done by running a (Linda-capable) job first on a single node, then on 2 and 4 nodes. On two nodes, your job should (ideally!) run twice as fast, and on four nodes four times as fast. It does not make much sense to run a Linda job on four nodes if you "only" gain a speed-up of a factor of 3 or less, since that would waste the resources of (at least) one compute node!
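The scaling test can be evaluated by computing the speed-up S = T1/TN and the parallel efficiency E = S/N from the wall-clock times on one and on N nodes. A minimal bash/awk sketch; the function name scaling and the timings in the usage line are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: speed-up and parallel efficiency of an N-node run.
# $1: wall-clock time on 1 node, $2: wall-clock time on N nodes, $3: N
scaling() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        s = t1 / tn                     # speed-up relative to one node
        printf "speedup=%.2f efficiency=%.2f\n", s, s / n
    }'
}

# Illustrative numbers only: 10 h on one node, 3.5 h on four nodes.
scaling 10 3.5 4   # prints: speedup=2.86 efficiency=0.71
```

On four nodes, a speed-up of 3 corresponds to an efficiency of 0.75; by the criterion above, results at or below that point suggest using fewer nodes.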


Current Issues with Gaussian 09 Rev. D.01

Currently Rev. D.01 has a bug which can cause geometry optimizations to fail. If this happens, the following error message will appear in your log- or out-file:

 Operation on file out of range.
  FileIO: IOper= 1 IFilNo(1)=  -526 Len=      784996 IPos=           0
  Q=   46912509105480
  ....
 Error termination in NtrErr:
 NtrErr Called from FileIO.
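To find out which of your jobs were hit by this bug, you can search their output for the error string shown above. A bash sketch; the function name find_guess_bug is hypothetical, the *.log/*.out naming of output files is an assumption, and GNU grep is assumed for --include.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list Gaussian output files below a directory that
# contain the Rev. D.01 guess-interpolation error message.
find_guess_bug() {
    local dir=${1:-.}
    grep -rlF --include='*.log' --include='*.out' \
        'Operation on file out of range' "$dir"
}

# Usage (directory name is a placeholder):
#   find_guess_bug ~/gaussian_jobs
```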

The explanation from Gaussian's technical support:

This problem appears in cases where one ends up with different orthonormal subsets of basis functions at different geometries. The "Operation on file out of range" message appears right after the program tries to do an interpolation of two SCF solutions when generating the initial orbital guess for the current geometry optimization point. The goal here is to generate an improved guess for the current point but it failed. The interpolation of the previous two SCF solutions to generate the new initial guess was a new feature introduced in G09 rev. D.01. The reason why this failed in this particular case is because the total number of independent basis functions is different between the two sets of orbitals. We will have this bug fixed in a future Gaussian release, so the guess interpolation works even if the number of independent basis functions is different.

Gaussian's technical support suggests several ways to work around this problem:

A) Use “Guess=Always” to turn off this guess interpolation feature. Option "A" would work in many cases, although it may not be a viable alternative in cases where the desired SCF solution is difficult to get from the default guess and one has to prepare a special initial guess. You may try this for your case.
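A sketch of option A: the heredoc below writes an input file with Guess=Always added to the route section. The file names, the method and basis set (B3LYP/6-31+G(d)) and the water geometry are hypothetical placeholders; only Guess=Always is the keyword relevant to the workaround. The Link 0 lines follow the single-node example above.

```bash
#!/usr/bin/env bash
# Write a hypothetical optimization input that generates a fresh SCF guess
# at every geometry step (Guess=Always), bypassing the guess interpolation.
cat > opt_guess_always.com <<'EOF'
%Mem=21000MB
%NProcShared=12
%Chk=opt_guess_always.chk
#P B3LYP/6-31+G(d) Opt Guess=Always

Water test optimization with Guess=Always

0 1
O   0.000000   0.000000   0.117300
H   0.000000   0.757200  -0.469200
H   0.000000  -0.757200  -0.469200

EOF
```

The job script would then call g09run opt_guess_always.com as usual.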

B) Just start a new geometry optimization from that final point reading the geometry from the checkpoint file. Option "B" should work just fine although you may run into the same issue again if, after a few geometry optimization steps, one ends up again in the scenario of having two geometries with two different numbers of basis functions.
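For option B, a restart input can take the last geometry from the checkpoint file of the failed run. With Geom=AllCheck, the title, charge and multiplicity are read from the checkpoint as well, so the input ends after the route section. A sketch; the checkpoint name failed_opt.chk and the method are hypothetical placeholders and must match your original job. Note that the checkpoint file is reused and will be overwritten, so copy it first if you want to keep it.

```bash
#!/usr/bin/env bash
# Write a hypothetical restart input that continues the optimization from
# the geometry (and orbitals, via Guess=Read) stored in the checkpoint file.
cat > opt_restart.com <<'EOF'
%Mem=21000MB
%NProcShared=12
%Chk=failed_opt.chk
#P B3LYP/6-31+G(d) Opt Geom=AllCheck Guess=Read

EOF
```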

C) Lower the threshold for selecting the linearly independent set by one order of magnitude, which may result in keeping all functions. The aforementioned threshold is controlled by "IOp(3/59=N)" which sets the threshold to 1.0D-N (N=6 is the default). Note that because an IOp is being used, one would need to run the optimization and frequency calculations separately and not as a compound job ("Opt Freq"), because IOps are not passed down to compound jobs. You may also want to use “Integral=(Acc2E=11)” or “Integral=(Acc2E=12)” if you lower this threshold as the calculations may not be as numerically robust as with the default thresholds. Option "C" may work well in many cases where there is only one (or very few) eigenvalue of the overlap matrix that is near the threshold for linear dependencies, so it may just work fine to use "IOp(3/59=7)", which will be keeping all the functions. Because of this situation, and because of potential convergence issues derived from including functions that are nearly linearly dependent, I strongly recommend using a better integral accuracy than the default, for example "Integral=(Acc2E=12)", which is two orders of magnitude better than default.
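A sketch of option C as two separate jobs instead of a compound "Opt Freq" job: the optimization writes the checkpoint file, and the frequency job reads geometry and orbitals from it and repeats the IOp and integral accuracy. File names, method, basis set and geometry are hypothetical placeholders; the %Mem and %NProcShared lines from the examples above are omitted for brevity.

```bash
#!/usr/bin/env bash
# Step 1: optimization with the lowered threshold (1.0D-7) and tighter
# two-electron integral accuracy.
cat > opt_iop.com <<'EOF'
%Chk=mol_iop.chk
#P B3LYP/6-31+G(d) Opt IOp(3/59=7) Integral=(Acc2E=12)

Optimization with IOp(3/59=7)

0 1
O   0.000000   0.000000   0.117300
H   0.000000   0.757200  -0.469200
H   0.000000  -0.757200  -0.469200

EOF

# Step 2: separate frequency job reading everything from the checkpoint;
# the IOp is given again because IOps are not passed to compound jobs.
cat > freq_iop.com <<'EOF'
%Chk=mol_iop.chk
#P B3LYP/6-31+G(d) Freq Geom=AllCheck Guess=Read IOp(3/59=7) Integral=(Acc2E=12)

EOF
```

The second job should only be submitted after the first has finished successfully, since it reads the optimized geometry from mol_iop.chk.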

D) Use fewer diffuse functions or a better balanced basis set, so there aren’t linear dependencies with the default threshold and thus no functions are dropped. Option "D" is good since it would avoid issues with linear dependencies altogether, although it has the disadvantage that you would not be able to reproduce other results with the basis set that you are using.