RepeatModeler 2016
Latest revision as of 15:45, 25 May 2021 (Schwietzer)
Introduction
RepeatModeler (by Robert Hubley and Arian Smit) is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs (RECON and RepeatScout), which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats. ¹
Installed version(s)
The following versions are installed, each in its own environment:
environment hpc-env/6.4:
RepeatModeler/1.0.11-foss-2017b
environment hpc-env/8.3:
RepeatModeler/2.0.2a-foss-2019b
Using RepeatModeler
If you want to find out more about RepeatModeler on the HPC Cluster, you can use the command
module spider RepeatModeler
This will show you basic information, e.g. a short description and the currently installed versions.
To load a specific version, first load the matching environment and then the module, e.g.
module load hpc-env/8.3
module load RepeatModeler
Always remember: this command is case sensitive!
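After loading, it can be worth confirming that the executables actually ended up on your PATH. The following is a minimal sketch, assuming bash and that the `module` command is available in the calling shell, as it is in a login session on the cluster. The helper names `require_tools` and `load_repeatmodeler` are hypothetical; only the environment and module names are taken from this page.

```shell
#!/usr/bin/env bash
# Sketch: load one of the installed builds and confirm its executables are on PATH.
# The helper names are hypothetical; only the module names come from this page.

# Return 0 if every named command is on PATH; otherwise list the
# missing ones on stderr and return 1.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# load_repeatmodeler [ENVIRONMENT] [MODULE]
# Loads the environment first, then the module (both names are case sensitive),
# and checks that RepeatModeler and BuildDatabase can be found afterwards.
load_repeatmodeler() {
  local env=${1:-hpc-env/8.3}
  local version=${2:-RepeatModeler/2.0.2a-foss-2019b}
  module load "$env" && module load "$version" &&
    require_tools RepeatModeler BuildDatabase
}
```

Because `module load` only changes the environment of the shell it runs in, source this file (e.g. `source load_rm.sh`, a hypothetical file name) rather than executing it, then call `load_repeatmodeler` for the default 2.0.2a build or `load_repeatmodeler hpc-env/6.4 RepeatModeler/1.0.11-foss-2017b` for the older one.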
After loading RepeatModeler, you can call the executable by typing RepeatModeler without any arguments or options to get a summary of the available options and configuration overrides:
$ RepeatModeler
NAME
RepeatModeler - Model repetitive DNA
SYNOPSIS
RepeatModeler [-options] -database <XDF Database>
DESCRIPTION
The options are:
-h(elp)
Detailed help
-database
The name of the sequence database to run an analysis on. This is the
name that was provided to the BuildDatabase script using the "-name"
option.
-pa #
Specify the number of parallel search jobs to run. RMBlast jobs will
use 4 cores each and ABBlast jobs will use a single core each. i.e.
on a machine with 12 cores and running with RMBlast you would use
-pa 3 to fully utilize the machine.
-recoverDir <Previous Output Directory>
If a run fails in the middle of processing, it may be possible
recover some results and continue where the previous run left off.
Simply supply the output directory where the results of the failed
run were saved and the program will attempt to recover and continue
the run.
-srand #
Optionally set the seed of the random number generator to a known
value before the batches are randomly selected ( using Fisher Yates
Shuffling ). This is only useful if you need to reproduce the sample
choice between runs. This should be an integer number.
-LTRStruct
Run the LTR structural discovery pipeline ( LTR_Harvest and
LTR_retreiver ) and combine results with the RepeatScout/RECON
pipeline. [optional]
-genomeSampleSizeMax #
Optionally change the maximum bp of the genome to sample in all
rounds of RECON (default=243000000).
CONFIGURATION OVERRIDES
-genometools_dir <string>
The path to the installation of the GenomeTools package.
-cdhit_dir <string>
The path to the installation of the CD-Hit sequence clustering
package.
-ninja_dir <string>
The path to the installation of the Ninja phylogenetic analysis
package.
-rmblast_dir <string>
The path to the installation of the RMBLAST sequence alignment
program.
[...]
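The options above combine into a typical two-step run: build a sequence database with BuildDatabase (its "-name" becomes the "-database" argument), then run RepeatModeler on it. The sketch below assumes bash, an already loaded module, and the 2.0.x help text shown above, where each RMBlast job uses 4 cores. The helper names, the genome file genome.fa, the database name mydb and the seed 12345 are hypothetical placeholders; setting DRY_RUN=1 only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of a BuildDatabase + RepeatModeler run (assumes the module is loaded).
# All helper names and example values are hypothetical placeholders.

# Per the help text, each RMBlast job uses 4 cores, so -pa = cores / 4
# (12 cores -> -pa 3). Never return less than 1.
rm_pa_for_cores() {
  local cores=$1
  local pa=$(( cores / 4 ))
  if (( pa < 1 )); then
    pa=1
  fi
  echo "$pa"
}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# model_repeats GENOME DB_NAME [CORES] [extra RepeatModeler options...]
# Builds the database under DB_NAME, then models repeats with a fixed
# -srand seed so the batch sampling can be reproduced between runs.
model_repeats() {
  local genome=$1 db_name=$2 cores=${3:-$(nproc)}
  local pa
  pa=$(rm_pa_for_cores "$cores")
  run BuildDatabase -name "$db_name" "$genome" &&
    run RepeatModeler -database "$db_name" -pa "$pa" -srand 12345 "${@:4}"
}

# resume_repeats DB_NAME CORES PREVIOUS_OUTPUT_DIR
# Restarts a failed run by passing its output directory to -recoverDir.
resume_repeats() {
  local db_name=$1 cores=$2 prev_dir=$3
  run RepeatModeler -database "$db_name" -pa "$(rm_pa_for_cores "$cores")" \
    -recoverDir "$prev_dir"
}
```

For example, `model_repeats genome.fa mydb 12 -LTRStruct` would build the database and then run RepeatModeler with -pa 3 plus the LTR structural pipeline; anything after the core count is passed through to RepeatModeler unchanged. After a failed run, `resume_repeats mydb 12 <previous output directory>` hands that directory to -recoverDir. Dividing the core count by 4 follows the help text's own example (12 cores, -pa 3); for the older 1.0.11 build, check its own RepeatModeler and BuildDatabase help output, since the available options may differ.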
Documentation
The full documentation can be found here.