Gaussian 2016
Introduction
Gaussian is a computer program for computational chemistry initially released in 1970 by John Pople and his research group at Carnegie Mellon University as Gaussian 70. It has been continuously updated since then. The name originates from Pople's use of Gaussian orbitals to speed up calculations compared to those using Slater-type orbitals, a choice made to improve performance on the limited computing capacities of then-current computer hardware for Hartree–Fock calculations.
Installed version
The currently installed versions of Gaussian are 09 Rev B.01 and 09 Rev D.01.
Available abilities
According to the official homepage, Gaussian 09 Rev. D.01 has the following capabilities:
- Molecular mechanics
  - AMBER
  - Universal force field (UFF)
  - Dreiding force field
- Semi-empirical quantum chemistry method calculations
  - Austin Model 1 (AM1), PM3, CNDO, INDO, MINDO/3, MNDO
- Self-consistent field (SCF) methods
  - Hartree–Fock method: restricted, unrestricted, and restricted open-shell
- Møller–Plesset perturbation theory (MP2, MP3, MP4, MP5)
- Built-in density functional theory (DFT) methods
  - B3LYP and other hybrid functionals
  - Exchange functionals: PBE, MPW, PW91, Slater, X-alpha, Gill96, TPSS
  - Correlation functionals: PBE, TPSS, VWN, PW91, LYP, PL, P86, B95
- ONIOM (QM/MM method) up to three layers
- Complete active space (CAS) and multi-configurational self-consistent field calculations
- Coupled cluster calculations
- Quadratic configuration interaction (QCI) methods
- Quantum chemistry composite methods (CBS-QB3, CBS-4, CBS-Q, CBS-Q/APNO, G1, G2, G3, W1 high-accuracy methods)
Note: Rev. D.01 has a bug that can cause Gaussian jobs to fail during geometry optimization. See the section "Current issues with Gaussian 09 Rev. D.01" below for details and possible workarounds.
Using Gaussian 09 Rev. D01 on the HPC cluster
If you want to find out more about Gaussian on the HPC cluster, you can use the command
module spider gaussian
which will give you an output looking like this:
----- /cm/shared/uniol/modulefiles/chem -----
...
gaussian/g09.b01
...
To load a specific version of Gaussian use the full name of the module, e.g. to load Rev. D.01:
[abcd1234@hpcl001 ~]$ module load gaussian/g09.d01
[abcd1234@hpcl001 ~]$ module list
Currently loaded modules: ... gaussian/g09.d01 ...
Single-node (multi-threaded) jobs
Example: The following job script run_gaussian.job defines a single-node job named g09test on the partition carl.p, using 8 CPUs (cores), a total of 6 GB of RAM and 100 GB of local storage ($TMPDIR), with a maximum run time of 2 hours:
#!/bin/bash
#SBATCH --job-name=g09test          # job name
#SBATCH --partition=carl.p          # partition
#SBATCH --time=0-2:00:00            # wallclock time d-hh:mm:ss
#SBATCH --ntasks=1                  # only one task for G09
#SBATCH --cpus-per-task=8           # multiple cpus per task
#SBATCH --mem=6gb                   # total memory per node
#SBATCH --gres=tmpdir:100G          # reserve 100G on local /scratch

INPUTFILE=dimer

# load module for Gaussian version to be used (here g09.d01)
module load gaussian/g09.d01

# call g09run which now takes up to two arguments
#   <inputfile>: name of the inputfile (required)
#   [setnproc]:  control whether %NProcShared= is set to match
#                the requested number of CPUs above (optional)
#       0 - do not change the input file (default)
#       1 - modify %NProcShared=N if N is larger than
#           number of CPUs per task requested
#       2 - modify %NProcShared=N if N is not equal
#           to the number of CPUs per task requested
g09run $INPUTFILE 2
The example job script and the used input file can be downloaded here (.zip-file).
To submit this job, you have to use the following command:
[abcd1234@hpcl001 ~]$ sbatch run_gaussian.job
Submitted batch job 54321
The job will start running as soon as the requested resources can be allocated. The output of the job (including a checkpoint file in $HOME/g09/CHK) will be written to the usual directories.
Explanations for the job script:
The expected run time of a job is requested with the --time option e.g. in the format d-hh:mm:ss (you can also omit the :ss part). The default is 2 hours if nothing else is specified in the job script or at job submission.
On the old cluster, you had to request a parallel environment by adding a line ' #$ -pe smp 12' to your job script for a single node job. SLURM does not have parallel environments but you can request CPUs (cores) per task to the same effect. The following settings in the above job script serve as an example:
#SBATCH --partition=carl.p
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
The maximum number of CPUs per task depends on the partition (node type):
- carl.p = max. 24 cpus per task
- mpcl.p = max. 24 cpus per task
- mpcs.p = max. 24 cpus per task
- mpcb.p = max. 16 cpus per task
- mpcp.p = max. 40 cpus per task
So, if you want to run a single-node job on an MPC-BIG node using all available cores, the lines above should be changed to:
#SBATCH --partition=mpcb.p
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
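The per-partition limits can also be checked before a job is submitted. The following is only a minimal sketch: the function names max_cpus_per_task and check_cpus are hypothetical helpers and not part of the cluster's tooling, and the limits are simply copied from the list above, so they may become outdated.

```shell
#!/bin/bash
# Hypothetical helper: print the maximum --cpus-per-task for a partition.
# The values are copied from the list in this document and may change.
max_cpus_per_task() {
    case "$1" in
        carl.p|mpcl.p|mpcs.p) echo 24 ;;
        mpcb.p)               echo 16 ;;
        mpcp.p)               echo 40 ;;
        *) echo "unknown partition: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed only if the requested core count fits.
# check_cpus <partition> <requested cpus per task>
check_cpus() {
    local partition=$1 requested=$2 max
    max=$(max_cpus_per_task "$partition") || return 1
    if [ "$requested" -gt "$max" ]; then
        echo "$partition allows at most $max cpus per task (requested $requested)" >&2
        return 1
    fi
}

# Usage example
check_cpus mpcb.p 16 && echo "mpcb.p with 16 cores is fine"
```

A check like this only guards against requesting more cores than a node has; it says nothing about whether using all of them is efficient.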
Note: The numbers listed above for each partition are the maximum number of CPUs per node. You can request all of them, but it is not always worthwhile to do so. It is a good idea to run some tests to find the optimal number of cores. An example is shown in the diagram under "Optimizing the runtime of your jobs", which plots the real (wall clock) and user (CPU) time as a function of the number of CPUs (cores).
In the example, memory is requested per node with the --mem-option. This is more practical for Gaussian jobs than requesting memory per core which is also possible with the --mem-per-cpu-option.
Every node has additional local storage which can be used while your job is running. The following line in the example job script
#SBATCH --gres=tmpdir:100G
requests 100 GB of local storage for your job. The path to this space is $TMPDIR, as always. Further information can be found here.
Linda Jobs
Gaussian multi-node jobs (Linda-jobs) can be submitted by simply changing the "--ntasks"-value to something > 1, e.g.
#SBATCH --ntasks=2
Everything else is handled by the g09run script, so you should no longer add "%LindaWorkers=" to your input file.
Letting g09run set the number of processes
If you have used Gaussian on the old cluster, you already know what the command "g09run" in your job script does: it runs Gaussian with a given input file. This remains unchanged on the new cluster CARL, but a feature has been added that is simple yet very effective for saving resources that would otherwise be wasted: in addition to the name of the input file, g09run now accepts a second, optional argument, setnproc.
setnproc can take one of three values:
- 0: do not change the input file (default)
- 1: modify %NProcShared=N if N is larger than number of CPUs per task requested
- 2: modify %NProcShared=N if N is not equal to the number of CPUs per task requested
The full syntax of the "g09run"-command looks like this:
g09run <INPUTFILE> [SETNPROC]
where the first argument is mandatory and the second is optional (i.e. if you do not want to use the new feature, you can omit it).
Example: Using
g09run my_input_file 1
in your job script will cause g09run to check whether %NProcShared is too large for the requested number of CPUs and modify it if needed.
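The effect of the three setnproc values can be illustrated with a small stand-alone function. This is only a sketch of the logic described above and not the actual g09run implementation: the function name adjust_nproc is hypothetical, it assumes that the input file contains at most one %NProcShared=N line, and the number of requested CPUs is passed explicitly (inside a job, SLURM provides it as $SLURM_CPUS_PER_TASK when --cpus-per-task is set).

```shell
#!/bin/bash
# Sketch only: mimics the documented setnproc logic, NOT the real g09run.
# adjust_nproc <inputfile> <cpus> [setnproc]
adjust_nproc() {
    local input=$1 cpus=$2 mode=${3:-0} current
    # Read N from the first "%NProcShared=N" line (case-insensitive).
    current=$(grep -i -m1 '^%nprocshared=' "$input" | cut -d= -f2 | tr -d '[:space:]')
    [ -n "$current" ] || return 0                     # no such line: nothing to do
    case "$mode" in
        0) return 0 ;;                                # never change the file
        1) [ "$current" -gt "$cpus" ] || return 0 ;;  # change only if N is too large
        2) [ "$current" -ne "$cpus" ] || return 0 ;;  # change whenever N differs
        *) echo "setnproc must be 0, 1 or 2" >&2; return 1 ;;
    esac
    # Rewrite the %NProcShared line and replace the original file.
    awk -v n="$cpus" '
        tolower($0) ~ /^%nprocshared=/ { print "%NProcShared=" n; next }
        { print }
    ' "$input" > "$input.tmp" && mv "$input.tmp" "$input"
}

# Usage example with a throw-away input file
tmp=$(mktemp)
printf '%%NProcShared=12\n#P HF/STO-3G\n' > "$tmp"
adjust_nproc "$tmp" 8 1      # 12 > 8, so the line is rewritten
head -n1 "$tmp"              # prints %NProcShared=8
rm -f "$tmp"
```

With setnproc=1 a value that is smaller than the requested number of CPUs is left alone, whereas setnproc=2 would also raise it to match the request.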
Optimizing the runtime of your jobs
Optimizing the runtime of your jobs will not only save you time, it will also save resources (cores, memory, time etc.) for everyone else using the cluster. You should therefore determine the best number of cores for your particular job. The following diagram shows the run time, split into real (wall clock) time and user (CPU) time, as a function of the number of cores used. The timings were gathered with the example job mentioned above.
(Every testjob was done with 6 GB of RAM, except the one with 48 cores which was done with 12 GB of RAM)
As the diagram shows, increasing the number of cores reduces the time your job needs to finish (the real time). Beyond a certain number of cores (16 in this example), adding cores no longer decreases the real time significantly; it only increases the time your job spends on the CPUs (the user time). The user time also peaks at about 16 cores, so adding even more cores is not beneficial in any way.
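A series of timings like this can be collected by submitting the same job script several times with different core counts. The sketch below assumes the job script run_gaussian.job from the single-node example (which calls g09run with setnproc=2, so that %NProcShared follows the requested number of CPUs); the function name scan_cores, the job names and the DRY_RUN switch are hypothetical. Options given on the sbatch command line take precedence over the #SBATCH lines in the script.

```shell
#!/bin/bash
# Sketch: submit one job per core count to compare real and user time.
# By default the commands are only printed; set DRY_RUN=0 to submit them.
# scan_cores <jobscript> <cores> [<cores> ...]
scan_cores() {
    local script=$1 n
    shift
    for n in "$@"; do
        if [ "${DRY_RUN:-1}" -eq 1 ]; then
            echo sbatch --cpus-per-task="$n" --job-name="g09test_$n" "$script"
        else
            sbatch --cpus-per-task="$n" --job-name="g09test_$n" "$script"
        fi
    done
}

# Usage example: print the submit commands for six core counts
scan_cores run_gaussian.job 1 2 4 8 16 24
```

Once the jobs have finished, the elapsed and CPU times can be compared, for example with sacct -j <jobid> --format=JobID,AllocCPUS,Elapsed,UserCPU.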
You may also want to check the Efficiency Considerations on the official website of Gaussian.
Current issues with Gaussian 09 Rev. D.01
Currently, Rev. D.01 has a bug which can cause geometry optimizations to fail. If this happens, the following error message will appear in your log or output file:
Operation on file out of range.
FileIO: IOper= 1 IFilNo(1)= -526 Len= 784996 IPos= 0 Q= 46912509105480
....
Error termination in NtrErr:
NtrErr Called from FileIO.
The explanation from Gaussian's technical support:
This problem appears in cases where one ends up with different orthonormal subsets of basis functions at different geometries. The "Operation on file out of range" message appears right after the program tries to do an interpolation of two SCF solutions when generating the initial orbital guess for the current geometry optimization point. The goal here is to generate an improved guess for the current point but it failed. The interpolation of the previous two SCF solutions to generate the new initial guess was a new feature introduced in G09 rev. D.01. The reason why this failed in this particular case is because the total number of independent basis functions is different between the two sets of orbitals. We will have this bug fixed in a future Gaussian release, so the guess interpolation works even if the number of independent basis functions is different.
There are a number of suggestions from the technical support on how to work around this problem:
A) Use “Guess=Always” to turn off this guess interpolation feature. Option "A" would work in many cases, although it may not be a viable alternative in cases where the desired SCF solution is difficult to get from the default guess and one has to prepare a special initial guess. You may try this for your case.
B) Just start a new geometry optimization from that final point reading the geometry from the checkpoint file. Option "B" should work just fine although you may run into the same issue again if, after a few geometry optimization steps, one ends up again in the scenario of having two geometries with two different numbers of basis functions.
C) Lower the threshold for selecting the linearly independent set by one order of magnitude, which may result in keeping all functions. The aforementioned threshold is controlled by "IOp(3/59=N)" which sets the threshold to 1.0D-N (N=6 is the default). Note that because an IOp is being used, one would need to run the optimization and frequency calculations separately and not as a compound job ("Opt Freq"), because IOps are not passed down to compound jobs. You may also want to use “Integral=(Acc2E=11)” or “Integral=(Acc2E=12)” if you lower this threshold as the calculations may not be as numerically robust as with the default thresholds. Option "C" may work well in many cases where there is only one (or very few) eigenvalue of the overlap matrix that is near the threshold for linear dependencies, so it may just work fine to use "IOp(3/59=7)", which will be keeping all the functions. Because of this situation, and because of potential convergence issues derived from including functions that are nearly linearly dependent, I strongly recommend using a better integral accuracy than the default, for example "Integral=(Acc2E=12)", which is two orders of magnitude better than default.
D) Use fewer diffuse functions or a better balanced basis set, so there aren’t linear dependencies with the default threshold and thus no functions are dropped. Option "D" is good since it would avoid issues with linear dependencies altogether, although it has the disadvantage that you would not be able to reproduce other results with the basis set that you are using.
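In an input file, workarounds A and C amount to extra keywords in the route section. The sketch below only appends the keywords quoted above to an existing route line. The function name add_workaround and the route line used in the example (method and basis set) are hypothetical, and the function does not check whether the route already contains a Guess or Integral keyword.

```shell
#!/bin/bash
# Sketch: append the keywords for workaround A or C to a Gaussian route line.
# add_workaround <A|C> "<route line>"
add_workaround() {
    local option=$1 route=$2
    case "$option" in
        A)
            # Turn off the guess interpolation introduced in Rev. D.01.
            printf '%s\n' "$route Guess=Always"
            ;;
        C)
            # IOps are not passed down to compound jobs, so refuse "Opt Freq".
            if printf '%s\n' "$route" | grep -qiw 'opt' &&
               printf '%s\n' "$route" | grep -qiw 'freq'; then
                echo "run Opt and Freq as separate jobs when using an IOp" >&2
                return 1
            fi
            # Threshold 1.0D-7 instead of the default 1.0D-6, tighter integrals.
            printf '%s\n' "$route IOp(3/59=7) Integral=(Acc2E=12)"
            ;;
        *)
            echo "unknown workaround: $option" >&2
            return 1
            ;;
    esac
}

# Usage example (hypothetical method and basis set)
add_workaround A "#P B3LYP/aug-cc-pVDZ Opt"
add_workaround C "#P B3LYP/aug-cc-pVDZ Opt"
```

Workaround B needs no new keyword of this kind, since it restarts the optimization from the geometry stored in the checkpoint file, and workaround D is a change of basis set rather than of options.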
Documentation
For further information, visit the official homepage of Gaussian.