= ORCA =

== Introduction ==

The program ORCA is a modern electronic structure program package that is able to carry out geometry optimizations and to predict a large number of spectroscopic parameters at different levels of theory. Besides Hartree-Fock theory, density functional theory (DFT) and semiempirical methods, high-level ab initio quantum chemical methods based on configuration interaction and coupled cluster theory are included in ORCA to an increasing degree.

For more details please refer to the official ORCA home page, where you can also find thorough documentation on using the program. Note that ORCA is free of charge for non-commercial use and that by using ORCA on the cluster you accept the ORCA license. In particular, any scientific work using ORCA should at least cite

 F. Neese: The ORCA program system (WIREs Comput Mol Sci 2012, 2: 73-78)

as well as other related works as appropriate.

Below, a short introduction to using ORCA on the cluster is given.
== Using ORCA on the Cluster ==

The first thing that you need to do is to load the ORCA module using the command

 module load orca

This will load the latest installed version of ORCA (currently 3.0.3). More specifically, it adds the path of the ORCA executables to your environment.
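If you want to check that the module has been loaded correctly, the following commands can be used on the command line (a minimal sketch; the path printed by <tt>which</tt> depends on the installed version):

<pre>
module load orca     # load the ORCA module
module list          # the orca module should now appear among the loaded modules
which orca           # prints the full path of the ORCA main executable
</pre>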
=== Serial Runs ===

Next, you will need to prepare a job script, which could look as follows in the case of a serial run (no line starting with <tt>%pal</tt>):
<pre>
#!/bin/bash

### the following lines are used by SGE to determine your job requirements
### you may want to modify the requested resources, e.g. h_vmem or h_rt,
### if you need more memory or run time (respectively)
#$ -S /bin/bash
#$ -cwd
#$ -l h_rt=1:00:0
#$ -l h_vmem=1500M
#$ -l h_fsize=100G
#$ -N ORCA
#$ -j n

### may be useful to receive emails when a job begins (b) and has finished (e)
### (to activate, change the email address and remove the extra # in the next 2 lines)
##$ -m be
##$ -M your.name@uni-oldenburg.de

### here the actual job script begins
### you may want to modify this part according to your needs
### the script expects input files in the directory from where you submitted
### the job and any output will appear in that directory as well; however,
### some files may not show up until the job has finished

echo Job started on `date`   # put a time stamp in the log file

# load the orca module
module load orca

# settings (here you need to make your own modifications)
MODEL=TiF3              # MODEL is the basename for all files

# settings (may have to be adjusted, but not always)
ORCAEXE=`which orca`    # does not need to be touched
INPUTEXT="inp xyz gbw"  # files with these extensions will be copied to TMPDIR
OUTPUTEXT="gbw prop"    # files with these extensions will be saved to SGE_O_WORKDIR

# preparing $TMPDIR for the run by copying the input files
for ext in $INPUTEXT
do
   if [ -e $MODEL.$ext ]
   then
      echo "Copying $MODEL.$ext to TMPDIR"
      cp $MODEL.$ext $TMPDIR/${MODEL}_${JOB_ID}.$ext
   fi
done

# change to $TMPDIR for running ORCA
cd $TMPDIR

# run ORCA (the log file is written directly to the submission directory)
$ORCAEXE ${MODEL}_${JOB_ID}.inp > $SGE_O_WORKDIR/${MODEL}_${JOB_ID}.out

# saving output files from $TMPDIR
for ext in $OUTPUTEXT
do
   if [ -e ${MODEL}_${JOB_ID}.$ext ]
   then
      echo "Copying ${MODEL}_${JOB_ID}.$ext to $SGE_O_WORKDIR"
      cp ${MODEL}_${JOB_ID}.$ext $SGE_O_WORKDIR
   fi
done

echo Job finished on `date`  # put a time stamp in the log file

exit
</pre>
You can download this job script <tt>[[media:orca_ser.sge.gz|orca_ser.sge]]</tt> and modify it to your needs (e.g. to change the requested resources). The job script requires additional input files for ORCA, in this case <tt>[[media:TiF3.inp.gz|TiF3.inp]]</tt> and <tt>[[media:TiF3.xyz.gz|TiF3.xyz]]</tt>, and all three files have to be placed in the same directory. Note: all downloads have to be unzipped first.
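For orientation, ORCA input files are plain text. A minimal serial input could look roughly like the following sketch (this is not the actual content of <tt>TiF3.inp</tt>, only an illustration of the format; method, basis set, charge and multiplicity have to be chosen for your own calculation):

<pre>
# sketch of a minimal ORCA input: keyword line, then the geometry source
! BP86 def2-SVP Opt
* xyzfile 0 2 TiF3.xyz
</pre>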
Once the job script and your input files are ready, a job can be submitted as usual with the command:
 qsub orca_ser.sge
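While the job is waiting or running, its status can be checked with the standard SGE command <tt>qstat</tt> (not specific to ORCA), for example:

 qstat -u $USER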
The job script works roughly in the following way:
# the ORCA module is loaded and the name of the model is set (it must be identical to the basename of the <tt>.inp</tt> file)
# all input files (identified by the model name and the given extensions) are copied to <tt>$TMPDIR</tt>; more files can be included by adding their extensions to the variable <tt>INPUTEXT</tt>
# the directory is changed to <tt>$TMPDIR</tt> and the run is started; a log file for the run (extension <tt>.out</tt>) is written to the directory from where the job was submitted
# all other files are created in <tt>$TMPDIR</tt>, which is automatically deleted after the job; if additional files need to be saved, they have to be copied back explicitly (not yet implemented in the job script, but see the sketch below)
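A simple way to save additional files is to extend the copy loop at the end of the job script. The following lines are only a sketch (the <tt>.hess</tt> file from a frequency calculation is used as an example; adjust the extension to whatever your run actually produces):

<pre>
# (sketch) save further output files by adding their extensions to OUTPUTEXT ...
OUTPUTEXT="gbw prop hess"
# ... or by copying an individual file back explicitly before the final echo
if [ -e ${MODEL}_${JOB_ID}.hess ]
then
   cp ${MODEL}_${JOB_ID}.hess $SGE_O_WORKDIR
fi
</pre>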
=== Parallel Run ===

The job script <tt>[[media:orca_par.sge.gz|orca_par.sge]]</tt> for parallel runs is quite similar to the one for serial runs. The example also requires an input file (<tt>[[media:silole_rad_zora_epr_pbe0.inp.gz|silole_rad_zora_epr_pbe0.inp]]</tt>) which contains a line starting with <tt>%pal</tt>, indicating to ORCA that the computations can be done in parallel. The additional lines in the job script are explained here:
1. A parallel environment (PE) and the number of compute cores have to be requested:

<pre>
### setting up parallel environment
#$ -pe smp 8       # use PE smp to make sure all cores are on the same host
##$ -pe openmpi 4  # use PE openmpi if you need more than 12 cores
</pre>
ORCA uses OpenMPI to parallelize parts of the code, so in principle the PE <tt>openmpi</tt> should be used (this PE has not been tested with ORCA yet). However, on HERO this means that the job is likely to be distributed among multiple nodes, resulting in a larger communication overhead due to the Ethernet interconnect. As an alternative, you can use the PE <tt>smp</tt> with up to 12 cores, which ensures that ORCA runs in parallel on a single node only (and thus communicates more efficiently). In addition, you may need to request much more memory (<tt>h_vmem</tt>) per core with the PE <tt>openmpi</tt> (not verified yet).
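For example, if you wanted to use all 12 cores of a node and give each slot more memory, the corresponding lines might look as follows (the values are only placeholders; note that <tt>h_vmem</tt> is requested per slot, so the total memory of the job is this value multiplied by the number of cores):

<pre>
### (sketch) 12 cores on a single host with 2 GB of memory per slot
#$ -pe smp 12
#$ -l h_vmem=2G
</pre>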
2. The number of cores is set in the ORCA input file:

<pre>
# modify the input file to match the number of available slots
SETNPROCS=`echo "%pal nprocs $NSLOTS"`
sed -i "/^%pal/c$SETNPROCS" $MODEL.inp
</pre>
This ensures that the number of processes (<tt>nprocs</tt>) used by ORCA matches the number of cores requested in the previous step. It overwrites any setting in your input file, i.e. only the setting made in the job script is relevant. The input file in your working directory is not changed, however. It is also important that your input file contains a line starting with <tt>%pal</tt> (do not use the other format to set <tt>nprocs</tt> in ORCA).
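If you are unsure what the <tt>sed</tt> command does: the form <tt>/pattern/c</tt> replaces every line matching the pattern with the text given after <tt>c</tt>. A throwaway illustration (the file <tt>demo.txt</tt> is purely hypothetical and not part of the job script):

<pre>
printf 'keep me\nreplace me\nkeep me too\n' > demo.txt
sed -i "/^replace/cREPLACED LINE" demo.txt
cat demo.txt   # prints: keep me / REPLACED LINE / keep me too
</pre>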
3. For the PE <tt>openmpi</tt>, the list of nodes to be used is given as a file, which is (copied and) renamed as expected by ORCA:

<pre>
# copy the file machines to the name ORCA expects
if [ -e machines ]
then
   cp machines $MODEL.nodes
fi
</pre>
Typically, you only need to modify the number of cores you want to use (step 1 above), in addition to the modifications explained above for serial runs (e.g. the model name).
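A parallel job is then submitted in the same way as a serial one, e.g.:

 qsub orca_par.sge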
== Troubleshooting ==

In case of problems, the following hints may help you to identify the cause:
# check the log files from SGE (<job-name>.x<job-id>, where x is e, o, pe, and/or po) as well as the ORCA log file (<model>.out) for error messages
# check the exit status of the job by using
 qacct -j <job-id>
The last command should show a number of lines, including the exit code, which could look like this:
 failed       100 : assumedly after job
 exit_status  137
This indicates that a resource (memory, run time, or file size) was over-used.
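In that case, the corresponding resource request in the job script has to be increased before the job is resubmitted, for example (the values below are only placeholders, not recommendations):

<pre>
### (sketch) increased resource requests in the job script
#$ -l h_rt=24:00:0    # more run time
#$ -l h_vmem=4G       # more memory per slot
#$ -l h_fsize=500G    # larger scratch file size
</pre>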
If you need help identifying the problem, you can contact Scientific Computing; please include the job-id in your request.