Discovardenovo 2016

From HPC users

Introduction

DISCOVAR is a new variant caller and DISCOVAR de novo is a new genome assembler. Both are designed for state-of-the-art data, and their inputs are chosen to optimize quality while keeping costs low. Currently, both take as input Illumina reads of 250 bases or longer, produced on a MiSeq or HiSeq 2500 from a single PCR-free library. These data enable a level of completeness and contiguity that was not previously possible.

DISCOVAR can call variants on a region-by-region basis, potentially tiling an entire large genome. DISCOVAR variant calling is under active development, and its output is transitioning to the VCF format.
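To illustrate tiling, the bash sketch below splits one chromosome into fixed-size windows. Each window uses the chr:start-stop form that Discovar's REGIONS argument expects (comma-separated, zero based, as described in the help text further down). The function name tile_regions, the window size and the chromosome length are illustrative assumptions and not part of DISCOVAR. The chromosome name must match the one used in your BAM files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DISCOVAR): print a comma-separated list of
# chr:start-stop windows covering bases 0..LENGTH of one chromosome.
# Usage: tile_regions CHROM LENGTH WINDOW
tile_regions() {
  local chrom="$1" length="$2" window="$3"
  local start=0 stop out=""
  if [ "$window" -le 0 ]; then
    echo "WINDOW must be a positive integer" >&2
    return 1
  fi
  while [ "$start" -lt "$length" ]; do
    stop=$(( start + window ))
    # Clip the last window to the chromosome length.
    if [ "$stop" -gt "$length" ]; then stop="$length"; fi
    # Adjacent windows share a boundary coordinate. The help text does not say
    # whether stop is inclusive, so neighbours may overlap by one base.
    out+="${out:+,}${chrom}:${start}-${stop}"
    start="$stop"
  done
  printf '%s\n' "$out"
}

# Example:
#   tile_regions chr1 1000000 250000
#   # prints chr1:0-250000,chr1:250000-500000,chr1:500000-750000,chr1:750000-1000000
```

You could pass each window to its own Discovar run (for example REGIONS=chr1:0-250000) or pass several windows at once as a comma-separated list.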

DISCOVAR de novo can generate de novo assemblies for both large and small genomes. It currently does not call variants.[1]

Installed version(s)

The following version is installed and currently available in the environments hpc-env/8.3, hpc-env/6.4, and hpc-uniol-env:

  • discovardenovo/52488

Loading / Using discovardenovo

To load the desired version of the module, use the module load command, e.g.

module load hpc-env/8.3
module load discovardenovo 

Always remember: this command is case sensitive!

The module provides two programs: Discovar (the variant caller) and DiscovarDeNovo (the de novo assembler).

To learn how to use Discovar, run Discovar --help. It prints the following help text:

Performing re-exec to adjust stack size.

Usage: Discovar arg1=value1 arg2=value2 ...

Required arguments:

READS (String) 
  Comma-separated list of one or more bam files, each ending in .bam.
  Alternatively, this may have the form @fn, where fn is a file
  containing a list of bam file names, one per line.
REGIONS (String) 
  Regions to be extracted from bam files: a comma-separated list of one
  or more region specifications chr:start-stop, where chr is a
  chromosome name (consistent with usage in the bam files), and
  start-stop defines a range of bases on chr (zero based). If REGIONS
  = all, bam files will be used in their entirety.
TMP (String) 
  Directory to put temporary files in.
OUT_HEAD (String) 
  Full path prefix for output files.

Optional arguments:

NUM_THREADS (unsigned int) default: 0 
  Number of threads to use (use all available processors if set to 0).
REFERENCE (String) 
  FASTA file containing reference - used for variant calling.
STATUS_LOGGING (Bool) default: False 
  if set to True, generate cryptic logging that reports on the status
  of intermediate calculations
USE_OLD_LRP_METHOD (Bool) default: True 
DRY_RUN (Bool) default: False 
  Set to True for a dry run to check input parameters.
MAX_MEMORY_GB (longlong) default: 0 
  Try not to use more than this amount of memory.

To see additional special arguments, type: Discovar --help special
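The READS argument also accepts the form @fn, where fn is a file that lists one BAM file per line. The bash sketch below builds such a list from a directory. The comments show a dry-run invocation that only checks the input parameters. The helper name write_bam_list and all paths are hypothetical. On a shared compute node, it is probably safer to set NUM_THREADS and MAX_MEMORY_GB to the resources you requested than to rely on the default NUM_THREADS=0, which uses all available processors.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DISCOVAR): write the path of every *.bam
# file in DIR to LIST, one per line, for use as READS=@LIST.
# Usage: write_bam_list DIR LIST
write_bam_list() {
  local dir="$1" list="$2" f n=0
  : > "$list"                      # create or truncate the list file
  for f in "$dir"/*.bam; do
    [ -e "$f" ] || continue        # no match: the glob stays literal, skip it
    printf '%s\n' "$f" >> "$list"
    n=$(( n + 1 ))
  done
  if [ "$n" -eq 0 ]; then
    echo "no .bam files found in $dir" >&2
    return 1
  fi
}

# Example (all paths are placeholders). DRY_RUN=True only checks the input
# parameters; remove it to start the real run.
#   write_bam_list /path/to/bams /path/to/bams.txt
#   Discovar READS=@/path/to/bams.txt REGIONS=chr1:0-1000000 \
#            REFERENCE=/path/to/reference.fasta TMP=/path/to/tmp \
#            OUT_HEAD=/path/to/out/chr1_0_1M NUM_THREADS=8 MAX_MEMORY_GB=64 \
#            DRY_RUN=True
```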

Likewise, DiscovarDeNovo --help prints the help text for the assembler:

Performing re-exec to adjust stack size.

Usage: DiscovarDeNovo arg1=value1 arg2=value2 ...

DISCOVAR de novo (experimental) is a de novo genome assembler that
requires only a single PCR-free paired end Illumina library containing
250 base reads.

Required arguments:

READS (String) 
  Comma-separated list of input files, see manual for details
OUT_DIR (String) 
  name of output directory

Optional arguments:

NUM_THREADS (unsigned int) default: 0 
  Number of threads. By default, the number of processors online.
REFHEAD (String) 
  use reference sequence REFHEAD.fasta to annotate assembly, and also
  REFHEAD.names if it exists
MAX_MEM_GB (double) default: 0 
  if specified, maximum allowed RAM use in GB; in some cases may be
  exceeded by our code
MEMORY_CHECK (Bool) default: False 
  if True, attempt to determine actual available memory and cap memory
  usage accordingly; slow and can cause machine to become very
  sluggish, or can result in process being killed

To see additional special arguments, type: DiscovarDeNovo --help special
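For DiscovarDeNovo, READS is also a comma-separated list of input files. The bash sketch below checks that each file is readable and then joins the files into that form. The helper name join_reads and the file names are hypothetical. Consult the manual for the accepted input formats. The help text warns that MAX_MEM_GB may in some cases be exceeded, so the commented example sets it somewhat below the memory requested for the job.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DISCOVAR de novo): verify that every input
# file is readable and print them as one comma-separated value for READS=.
# Usage: join_reads FILE...
join_reads() {
  local f out=""
  if [ "$#" -eq 0 ]; then
    echo "usage: join_reads FILE..." >&2
    return 1
  fi
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "cannot read input file: $f" >&2
      return 1
    fi
    out+="${out:+,}$f"
  done
  printf '%s\n' "$out"
}

# Example (file names, thread count and memory are placeholders; the job is
# assumed to have been granted more than 200 GB of RAM):
#   reads=$(join_reads /path/to/lane1.bam /path/to/lane2.bam) || exit 1
#   DiscovarDeNovo READS="$reads" OUT_DIR=/path/to/assembly \
#                  NUM_THREADS=16 MAX_MEM_GB=200
```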



Documentation

The full documentation can be found here.