Difference between revisions of "RepeatModeler 2016"

Latest revision as of 16:45, 25 May 2021

Introduction

RepeatModeler (by Robert Hubley and Arian Smit) is a de novo repeat family identification and modeling package. At the heart of RepeatModeler are two de novo repeat finding programs (RECON and RepeatScout) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats. ¹

Installed version(s)

The following versions are installed:

Environment hpc-env/6.4:

RepeatModeler/1.0.11-foss-2017b

Environment hpc-env/8.3:

RepeatModeler/2.0.2a-foss-2019b

Using RepeatModeler

To find out more about RepeatModeler on the HPC cluster, use the command

module spider RepeatModeler

This shows basic information, e.g. a short description and the currently installed versions.

To load the desired version of the module, use, for example:

module load hpc-env/8.3
module load RepeatModeler

Always remember: the module name is case sensitive (RepeatModeler, not repeatmodeler)!
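
If you want to pin an exact version instead of relying on the default, the pairing of environments and modules listed under "Installed version(s)" can be written down as a small helper. This is only a sketch: the function names repeatmodeler_module and load_repeatmodeler are made up for illustration, and the full module names are assumed to be the version strings from the list above.

```shell
#!/bin/sh
# Hypothetical helper: map an environment named on this page to the
# RepeatModeler module listed for it under "Installed version(s)".
repeatmodeler_module() {
    case "$1" in
        hpc-env/6.4) echo "RepeatModeler/1.0.11-foss-2017b" ;;
        hpc-env/8.3) echo "RepeatModeler/2.0.2a-foss-2019b" ;;
        *) echo "unknown environment: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: load the environment and the matching module.
# Where the `module` command is not available (i.e. off the cluster),
# the two commands are only printed.
load_repeatmodeler() {
    env_name="$1"
    mod_name="$(repeatmodeler_module "$env_name")" || return 1
    if command -v module >/dev/null 2>&1; then
        module load "$env_name"
        module load "$mod_name"
    else
        echo "module load $env_name"
        echo "module load $mod_name"
    fi
}
```

Calling load_repeatmodeler hpc-env/8.3 in a job script would then load RepeatModeler/2.0.2a-foss-2019b explicitly, which keeps a job reproducible if the default version changes later.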


After loading RepeatModeler, you can call the executable by typing RepeatModeler without any arguments or options to get a summary of the available options and configuration overrides:

$  RepeatModeler

NAME
   RepeatModeler - Model repetitive DNA

SYNOPSIS
     RepeatModeler [-options] -database <XDF Database>

DESCRIPTION
   The options are:

   -h(elp)
       Detailed help

   -database
       The name of the sequence database to run an analysis on. This is the
       name that was provided to the BuildDatabase script using the "-name"
       option.

   -pa #
       Specify the number of parallel search jobs to run. RMBlast jobs will
       use 4 cores each and ABBlast jobs will use a single core each. i.e.
       on a machine with 12 cores and running with RMBlast you would use
       -pa 3 to fully utilize the machine.

   -recoverDir <Previous Output Directory>
       If a run fails in the middle of processing, it may be possible
       recover some results and continue where the previous run left off.
       Simply supply the output directory where the results of the failed
       run were saved and the program will attempt to recover and continue
       the run.

   -srand #
       Optionally set the seed of the random number generator to a known
       value before the batches are randomly selected ( using Fisher Yates
       Shuffling ). This is only useful if you need to reproduce the sample
       choice between runs. This should be an integer number.

   -LTRStruct
       Run the LTR structural discovery pipeline ( LTR_Harvest and
       LTR_retreiver ) and combine results with the RepeatScout/RECON
       pipeline. [optional]

   -genomeSampleSizeMax #
       Optionally change the maximum bp of the genome to sample in all
       rounds of RECON (default=243000000).

CONFIGURATION OVERRIDES
   -genometools_dir <string>
       The path to the installation of the GenomeTools package.

   -cdhit_dir <string>
       The path to the installation of the CD-Hit sequence clustering
       package.

   -ninja_dir <string>
       The path to the installation of the Ninja phylogenetic analysis
       package.

   -rmblast_dir <string>
       The path to the installation of the RMBLAST sequence alignment
       program.

[...] 
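
The -pa rule quoted in the help text (RMBlast jobs use 4 cores each, so 12 cores correspond to -pa 3) can be turned into a small calculation. The sketch below assumes RMBlast is the search engine and that the help text shown here applies to the version you loaded; the function names, the database name mygenome and the file mygenome.fa are hypothetical placeholders. It only prints the BuildDatabase and RepeatModeler commands rather than running them.

```shell
#!/bin/sh
# Number of parallel search jobs for a given core count, assuming RMBlast
# (4 cores per job, as stated in the help text). Returns at least 1.
pa_for_cores() {
    cores="$1"
    pa=$(( cores / 4 ))
    if [ "$pa" -lt 1 ]; then
        pa=1
    fi
    echo "$pa"
}

# Print (do not run) the two commands of a basic analysis:
# build the sequence database, then model repeats on it.
# Arguments: database name, FASTA file, number of available cores.
print_repeatmodeler_run() {
    db_name="$1"
    fasta="$2"
    cores="$3"
    echo "BuildDatabase -name $db_name $fasta"
    echo "RepeatModeler -database $db_name -pa $(pa_for_cores "$cores")"
}

# Example with placeholder names and 12 cores.
print_repeatmodeler_run mygenome mygenome.fa 12
```

With 12 cores this prints a RepeatModeler call using -pa 3, matching the example in the help text. The name passed to -database must be the same name given to BuildDatabase with -name.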

Documentation

The full documentation can be found at http://www.repeatmasker.org/RepeatModeler/.