Difference between revisions of "GenomeThreader 2016"
Schwietzer (talk | contribs) (Created page with "== Introduction == GenomeThreader is a software tool to compute gene structure predictions. The gene structure predictions are calculated using a similarity-based approach whe...") |
Schwietzer (talk | contribs) |
||
Line 6: | Line 6: | ||
... on environment ''hpc-env/8.3'': | ... on environment ''hpc-env/8.3'': | ||
*'''GenomeThreader/1.7. | *'''GenomeThreader/1.7.1'''-Linux_x86_64-64bit | ||
== Loading GenomeThreader == | == Loading GenomeThreader == |
Revision as of 16:00, 1 March 2022
Introduction
GenomeThreader is a software tool to compute gene structure predictions. The gene structure predictions are calculated using a similarity-based approach where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments. GenomeThreader was motivated by disabling limitations in GeneSeqer, a popular gene prediction program which is widely used for plant genome annotation. [ https://genomethreader.org 1]
Installed version(s)
The following versions are installed and currently available...
... on environment hpc-env/8.3:
- GenomeThreader/1.7.1-Linux_x86_64-64bit
Loading GenomeThreader
To load the desired version of the module, use the module load command, e.g.
module load hpc-env/8.3 module load GenomeThreader
Always remember: this commands are case sensitive!
Using GenomeThreader
To find out on how to use GenomeThreader you can just type in gth --help after loading the module to print out a help text to get you started:
$ gth --help Usage: gth [option ...] -genomic file [...] -cdna file [...] -protein file [...] Compute similarity-based gene structure predictions (spliced alignments) using cDNA/EST and/or protein sequences and assemble the resulting spliced alignments to consensus spliced alignments. -genomic specify input files containing genomic sequences mandatory option -cdna specify input files containing cDNA/EST sequences -protein specify input files containing protein sequences -species specify species to select splice site model which is most appropriate; possible species: "human" "mouse" "rat" "chicken" "drosophila" "nematode" "fission_yeast" "aspergillus" "arabidopsis" "maize" "rice" "medicago" default: undefined -bssm read bssm parameter from file in the path given by the environment variable BSSMDIR default: undefined -scorematrix read amino acid substitution scoring matrix from file in the path given by the environment variable GTHDATADIR default: BLOSUM62 -translationtable set the codon translation table used for codon translation in matching, DP, and output default: 1 -f analyze only forward strand of genomic sequences default: no -r analyze only reverse strand of genomic sequences default: no -cdnaforward align only forward strand of cDNAs default: no -frompos analyze genomic sequence from this position requires -topos or -width; counting from 1 on default: 0 -topos analyze genomic sequence to this position requires -frompos; counting from 1 on default: 0 -width analyze only this width of genomic sequence requires -frompos default: 0 -v be verbose default: no -xmlout show output in XML format default: no -gff3out show output in GFF3 format default: no -md5ids show MD5 fingerprints as sequence IDs default: no -o redirect output to specified file default: undefined -gzip write gzip compressed output file default: no -bzip2 write bzip2 compressed output file default: no -force force writing to output file default: no -gs2out output in old GeneSeqer2 format default: no -minmatchlen specify minimum match length (cDNA matching) default: 20 -seedlength specify the seed length (cDNA matching) default: 18 -exdrop specify the Xdrop value for edit distance extension (cDNA matching) default: 2 -prminmatchlen specify minimum match length (protein matches) default: 24 -prseedlength specify seed length (protein matching) default: 10 -prhdist specify Hamming distance (protein matching) default: 4 -gcmaxgapwidth set the maximum gap width for global chains defines approximately the maximum intron length set to 0 to allow for unlimited length in order to avoid false-positive exons (lonely exons) at the sequence ends, it is very important to set this parameter appropriately! default: 1000000 -gcmincoverage set the minimum coverage of global chains regarding to the reference sequence default: 50 -paralogs compute paralogous genes (different chaining procedure) default: no -introncutout enable the intron cutout technique default: no -fastdp use jump table to increase speed of DP calculation default: no -autointroncutout set the automatic intron cutout matrix size in megabytes and enable the automatic intron cutout technique default: 0 -intermediate stop after calculation of spliced alignments and output results in reusable XML format. Do not process this output yourself, use the ``normal'' XML output instead! default: no -first set the maximum number of spliced alignments per genomic DNA input. Set to 0 for unlimited number. default: 0 -help display help for basic options and exit -help+ display help for all options and exit -version display version information and exit For detailed information, please refer to the manual of GenomeThreader. Report bugs to <gordon@gremme.org>.
Documentation
The full documentation can be found here.