TransDecoder 2016
Introduction
TransDecoder (Find Coding Regions Within Transcripts)
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
TransDecoder identifies likely coding sequences based on the following criteria:
- an open reading frame (ORF) of at least a minimum length is found in a transcript sequence
- a log-likelihood score similar to what is computed by the GeneID software is > 0.
- the above coding score is greatest when the ORF is scored in the 1st reading frame as compared to scores in the other 2 forward reading frames.
- if a candidate ORF is fully encapsulated by the coordinates of another candidate ORF, only the longer one is reported. However, a single transcript can still report multiple ORFs (allowing for operons, chimeras, etc.).
- a PSSM is built/trained/used to refine the start codon prediction.
- optionally, the putative peptide has a match to a Pfam domain above the noise cutoff score.
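The first criterion above (an ORF of at least a minimum length) can be illustrated with a small shell/awk sketch. This is not TransDecoder's implementation: it only looks for complete ORFs (ATG up to the next in-frame stop codon) in the three forward reading frames and applies none of the scoring, start refinement, or Pfam steps. The function name longest_orf and the toy sequence are hypothetical and chosen only for illustration.

```shell
# longest_orf SEQ MIN_AA
# Illustrative only: report the longest complete forward-strand ORF
# (ATG ... stop) that encodes at least MIN_AA amino acids.
longest_orf() {
    printf '%s\n' "$1" | awk -v min_aa="$2" '
    {
        seq = toupper($0); n = length(seq); best_nt = 0
        for (frame = 0; frame < 3; frame++) {
            start = 0                      # 1-based position of the open ATG, 0 = none
            for (i = frame + 1; i + 2 <= n; i += 3) {
                codon = substr(seq, i, 3)
                if (start == 0 && codon == "ATG") {
                    start = i
                } else if (start > 0 && (codon == "TAA" || codon == "TAG" || codon == "TGA")) {
                    orf_nt = i + 3 - start # ORF length in nucleotides, stop codon included
                    # amino acids = codons minus the stop codon
                    if (orf_nt / 3 - 1 >= min_aa && orf_nt > best_nt) {
                        best_nt = orf_nt; best_start = start; best_frame = frame + 1
                    }
                    start = 0
                }
            }
        }
        if (best_nt > 0)
            printf "frame=%d start=%d length_nt=%d\n", best_frame, best_start, best_nt
        else
            print "no ORF"
    }'
}

# Toy example: ATG AAA TTT GGG TAA starts at position 3 (third forward frame).
longest_orf CCATGAAATTTGGGTAACC 3    # prints: frame=3 start=3 length_nt=15
```

Raising the threshold (for example to 10 amino acids) makes the same call print "no ORF", which is the effect a minimum-length filter has on short candidate ORFs.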
Installed version(s)
The following versions are installed and currently available...
... on environment hpc-env/8.3:
- TransDecoder/5.5.0-intel-2019b-Perl-5.30.0
Loading / Using TransDecoder
To load the desired version of the module, use the module load command, e.g.
module load hpc-env/8.3
module load TransDecoder
Always remember: this command is case sensitive!
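To confirm that the module was loaded, one option is to check that the two TransDecoder scripts (TransDecoder.LongOrfs and TransDecoder.Predict) are on the PATH. The helper below is a minimal sketch; the function name check_transdecoder is hypothetical and not part of TransDecoder or the module system.

```shell
# check_transdecoder: report whether the two TransDecoder scripts are on PATH.
# Returns 0 if both are found, 1 otherwise.
check_transdecoder() {
    missing=0
    for tool in TransDecoder.LongOrfs TransDecoder.Predict; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

if ! check_transdecoder; then
    echo "Hint: run 'module load hpc-env/8.3' and 'module load TransDecoder' first."
fi
```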
TransDecoder is mainly used through the two scripts it provides: TransDecoder.LongOrfs and TransDecoder.Predict
To see how to use either script, run it without any options or arguments. For example, this is what you get after typing TransDecoder.Predict:
$ TransDecoder.Predict

########################################################################################
#             ______                 ___                  __
#            /_  __/______ ____  ___ / _ \___ _______  ___/ /__ ____
#             / / / __/ _ `/ _\(_-</ // / -_) __/ _ \/ _  / -_) __/
#            /_/ /_/  \_,_/_//_/___/____/\__/\__/\___/\_,_/\__/_/   .Predict
#
########################################################################################
#
#  Transdecoder.LongOrfs|http://transdecoder.github.io> - Transcriptome Protein Prediction
#
#
#  Required:
#
#   -t <string>                            transcripts.fasta
#
#  Common options:
#
#
#   --retain_long_orfs_mode <string>       'dynamic' or 'strict' (default: dynamic)
#                                          In dynamic mode, sets range according to 1%FDR in random sequence of same GC content.
#
#
#   --retain_long_orfs_length <int>        under 'strict' mode, retain all ORFs found that are equal or longer than these many nucleotides even if no other evidence
#                                          marks it as coding (default: 1000000) so essentially turned off by default.)
#
#   --retain_pfam_hits <string>            domain table output file from running hmmscan to search Pfam (see transdecoder.github.io for info)
#                                          Any ORF with a pfam domain hit will be retained in the final output.
#
#   --retain_blastp_hits <string>          blastp output in '-outfmt 6' format.
#                                          Any ORF with a blast match will be retained in the final output.
#
#   --single_best_only                     Retain only the single best orf per transcript (prioritized by homology then orf length)
#
#   --output_dir | -O <string>             output directory from the TransDecoder.LongOrfs step (default: basename( -t val ) + ".transdecoder_dir")
#
#   -G <string>                            genetic code (default: universal; see PerlDoc; options: Euplotes, Tetrahymena, Candida, Acetabularia, ...)
#
#   --no_refine_starts                     start refinement identifies potential start codons for 5' partial ORFs using a PWM, process on by default.
#
##  Advanced options
#
#   -T <int>                               Top longest ORFs to train Markov Model (hexamer stats) (default: 500)
#                                          Note, 10x this value are first selected for removing redundancies,
#                                          and then this -T value of longest ORFs are selected from the non-redundant set.
#  Genetic Codes
#
#
#   --genetic_code <string>                Universal (default)
#
#        Genetic Codes (derived from: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi)
#
#
Acetabularia
Candida
Ciliate
Dasycladacean
Euplotid
Hexamita
Mesodinium
Mitochondrial-Ascidian
Mitochondrial-Chlorophycean
Mitochondrial-Echinoderm
Mitochondrial-Flatworm
Mitochondrial-Invertebrates
Mitochondrial-Protozoan
Mitochondrial-Pterobranchia
Mitochondrial-Scenedesmus_obliquus
Mitochondrial-Thraustochytrium
Mitochondrial-Trematode
Mitochondrial-Vertebrates
Mitochondrial-Yeast
Pachysolen_tannophilus
Peritrich
SR1_Gracilibacteria
Tetrahymena
Universal
#
#  --version                               show version (5.5.0)
#
#########################################################################################
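The help text above implies a two-step run: TransDecoder.LongOrfs first, then TransDecoder.Predict on the same transcript file, optionally passing homology evidence through --retain_pfam_hits and --retain_blastp_hits. The following is a minimal sketch of that sequence. The file names (transcripts.fasta, pfam.domtblout, blastp.outfmt6), the helper functions run and transdecoder_pipeline, and the DRY_RUN switch are assumptions made for illustration; only the script names and flags are taken from the help output.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the commands;
# set DRY_RUN=0 to actually execute them after loading the module.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# transdecoder_pipeline TRANSCRIPTS [PFAM_DOMTBLOUT] [BLASTP_OUTFMT6]
transdecoder_pipeline() {
    transcripts="$1"
    pfam_hits="${2:-}"      # optional: hmmscan domain table
    blastp_hits="${3:-}"    # optional: blastp output in '-outfmt 6' format

    # Step 1: extract the long open reading frames.
    run TransDecoder.LongOrfs -t "$transcripts"

    # Step 2: predict the likely coding regions, adding evidence files if given.
    set -- -t "$transcripts"
    if [ -n "$pfam_hits" ]; then
        set -- "$@" --retain_pfam_hits "$pfam_hits"
    fi
    if [ -n "$blastp_hits" ]; then
        set -- "$@" --retain_blastp_hits "$blastp_hits"
    fi
    run TransDecoder.Predict "$@"
}

# Hypothetical input file; replace with your own assembly.
transdecoder_pipeline transcripts.fasta
```

Keeping the dry-run default makes it easy to inspect the exact command lines before submitting them as a job.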
Documentation
The full documentation can be found via the TransDecoder project page given in the help output above: http://transdecoder.github.io