Spaln 2016

From HPC users
Revision as of 17:28, 1 March 2022 by Schwietzer

Introduction

Spaln (space-efficient spliced alignment) is a stand-alone program that maps and aligns a set of cDNA or protein sequences onto a whole genomic sequence in a single job. Spaln also performs spliced or ordinary alignment after rapid similarity search against a protein sequence database, if a genomic segment or an amino acid sequence is given as a query.[1]

Installed version(s)

The following versions are installed and currently available...

... on environment hpc-env/8.3:

  • 'spaln/2.4.03-GCC-8.3.0'
  • 'spaln/2.4.6-GCC-8.3.0'


Loading spaln

To load spaln, first load the hpc-env/8.3 environment and then the module itself with the module load command, e.g.:

module load hpc-env/8.3
module load spaln

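Always remember: these commands are case-sensitive! Without a version suffix, module load spaln loads the default version; to select a specific one, use the full module name from the list above, e.g. module load spaln/2.4.03-GCC-8.3.0.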
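For reproducible runs it can help to pin an exact build instead of relying on whichever version the module system treats as default. The following is a minimal sketch, assuming the usual module command (Lmod or Environment Modules) is available in your shell on the cluster; the variable name SPALN_MODULE is just an illustrative choice, and the module names are copied from the list of installed versions above. It prints a message instead of failing when module is not defined, e.g. on a local machine.

```shell
#!/usr/bin/env bash
# Sketch: load an explicit spaln build rather than the default one.
# Module names come from the "Installed version(s)" list on this page;
# SPALN_MODULE is a hypothetical variable name used only for illustration.
SPALN_MODULE="spaln/2.4.6-GCC-8.3.0"

if type module >/dev/null 2>&1; then
  module load hpc-env/8.3          # environment the spaln builds live in
  module load "$SPALN_MODULE"      # exact version, not the default
  module list                      # show which modules are now active
else
  # 'module' is usually a shell function that only exists on the cluster.
  echo "module command not found; run this on the cluster" >&2
fi
```

Putting these lines at the top of a job script keeps every run on the same spaln version even if the default changes later.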

Using spaln

To find out how to use spaln, type spaln after loading the module. This prints a help text to get you started:

$ spaln
*** SPALN version 2.4.6 <210910> ***

Usage:
spaln -W[Genome.bkn] -KD [W_Options] Genome.mfa (to write block inf.)
spaln -W[Genome.bkp] -KP [W_Options] Genome.mfa (to write block inf.)
spaln -W[AAdb.bka] -KA [W_Options] AAdb.faa     (to write aa db inf.)
spaln -W [Genome.mfa|AAdb.faa]  (alternative to makdbs.)
spaln [R_options] genomic_segment cDNA.fa       (to align)
spaln [R_options] genomic_segment protein.fa    (to align)
spaln [R_options] -dGenome cDNA.fa      (to map & align)
spaln [R_options] -dGenome protein.fa   (to map & align)
spaln [R_options] -aAAdb genomic_segment.fa     (to search aa database & align)
spaln [R_options] -aAAdb protein.fa     (to search aa database)

in the following, # = integer or real number; $ = string; default in ()

W_Options:
        -E      Generate local lookup table for each block
        -XC#    number of bit patterns < 6 (1)
        -XG#    Maximum expected gene size (inferred from genome|db size)
        -Xk#    Word size (inferred from genome|db size)
        -Xb#    Block size (inferred from genome|db size)
        -Xa#    Abundance factor (10)
        -Xr#    Minimum ORF length with -KP (30))
        -g      gzipped output
        -t#     Mutli-thread operation with # threads

R_Options (representatives):
        -E      Use local lookup table for each block
        -H#     Minimum score for report (35)
        -L or -LS or -L#        semi-global or local alignment (-L)
        -M#[,#2]        Number of outputs per query (1) (4 if # is omitted)
                #2 (4) specifies the max number of candidate loci
                This option is effective only for map-and-align modes
        -O#[,#2,..] (GvsA|C)    0:Gff3_gene; 1:alignment; 2:Gff3_match; 3:Bed; 4:exon-inf;
                        5:intron-inf; 6:cDNA; 7:translated; 8:block-only;
                        10:SAM; 12:binary; 15:query+GS (4)
        -O#[,#2,..] (AvsA)      0:statistics; 1:alignment; 2:Sugar; 3:Psl; 4:XYL;
                        5:srat+XYL; 8:Cigar; 9:Vulgar; 10:SAM; (4)
        -Q#     0:DP; 1-3:HSP-Search; 4-7; Block-Search (3)
        -R$     Read block information file *.bkn, *.bkp or *.bka
        -S#     Orientation. 0:annotation; 1:forward; 2:reverse; 3:both (3)
        -T$     Subdirectory where species-specific parameters reside
        -a$     Specify AAdb. Must run `makeidx.pl -ia' breforehand
        -A$     Same as -a but db sequences are stored in memory
        -d$     Specify genome. Must run `makeidx.pl -i[n|p]' breforehand
        -D$     Same as -d but db sequences are stored in memory
        -g      gzipped output in combination with -O12
        -l#     Number of characters per line in alignment (60)
        -o$     File/directory/prefix where results are written (stdout)
        -pa#    Remove 3' poly A >= # (0: don't remove)
        -pw     Report results even if the score is below the threshold
        -pq     Quiet mode
        -r$     Report information about block data file
        -u#     Gap-extension penalty (3)
        -v#     Gap-open penalty (8)
        -w#     Band width for DP matrix scan (100)
        -t[#]   Mutli-thread operation with # threads
        -ya#    Stringency of splice site. 0->3:strong->weak
        -yl3    Ddouble affine gap penalty
        -ym#    Nucleotide match score (2)
        -yn#    Nucleotide mismatch score (-6)
        -yo#    Penalty for a premature termination codon (100)
        -yx#    Penalty for a frame shift error (100)
        -yy#    Weight for splice site signal (8)
        -yz#    Weight for coding potential (2)
        -yB#    Weight for branch point signal (0)
        -yI$    Intron length distribution
        -yL#    Minimum expected length of intron (30)
        -yS[#]  Use species-specific parameter set (0.0/0.5)
        -yX0    Don't use parameter set for cross-species comparison
        -yZ#    Weight for intron potential (0)
        -XG#    Reset maximum expected gene size, suffix k or M is effective

Examples:
        spaln -W -KP -E -t4 dictdisc_g.gf
        spaln -W -KA -Xk5 Swiss.faa
        spaln -O -LS 'chr1.fa 10001 40000' cdna.nfa
        spaln -Q0,1,7 -t10 -TTetrapod -XG2M -ommu/ -dmus_musc_g hspcdna.nfa
        spaln -Q7 -O5 -t10 -Tdictdics -ddictdisc_g [-E] 'dictdisc.faa (101 200)' > ddi.intron
        spaln -Q7 -O0 -t10 -Tdictdics -aSwiss 'chr1.nfa 200001 210000' > Chr1_200-210K.gff
        spaln -Q4 -O0 -t10 -M10 -aSwiss dictdisc.faa > dictdisc.alignment_score
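The usage lines and examples above imply a two-step workflow for map-and-align runs: write the block information for a genome once with -W (using -KP for protein queries or -KD for cDNA queries), then align queries against it with -d, passing the genome name without its file extension (as in the -ddictdisc_g example). The Bash sketch below wraps both steps. The function name spaln_index_and_align, the DRY_RUN switch, and the default output name spaln_out.gff are hypothetical, and it assumes the files written by -W end up where -d looks for them (for example, by running both steps in the directory that holds the genome). The options -Q7 (block search) and -O0 (GFF3 gene output) are taken from the help text; adjust them to your needs.

```shell
#!/usr/bin/env bash
# Sketch of a two-step spaln run: (1) write block information for a genome,
# (2) map and align queries against it. The function name, the DRY_RUN
# switch and the default output name are hypothetical choices.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: spaln_index_and_align GENOME.mfa QUERY.fa [P|D] [THREADS] [OUT]
#   P = protein queries (-KP), D = cDNA queries (-KD)
spaln_index_and_align() {
  local genome=$1 query=$2 kind=${3:-P} threads=${4:-4} out=${5:-spaln_out.gff}
  # -d takes the genome name without its extension (cf. -ddictdisc_g).
  local base=${genome%.*}

  case "$kind" in
    P|D) ;;
    *) echo "kind must be P (protein) or D (cDNA), got '$kind'" >&2; return 1 ;;
  esac

  # Step 1: write block information for the genome.
  run spaln -W -K"$kind" -t"$threads" "$genome" || return
  # Step 2: map and align; -Q7 = block search, -O0 = GFF3 gene records,
  # -o = file where results are written (stdout if omitted).
  run spaln -Q7 -O0 -t"$threads" -d"$base" -o"$out" "$query"
}
```

For example, DRY_RUN=1 spaln_index_and_align genome.mfa proteins.faa P 8 genes.gff only prints the two spaln commands, which is a cheap way to check them before starting a long job; drop DRY_RUN=1 to actually run them, and pass D instead of P for cDNA queries.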


Documentation

The full documentation can be found at the project page.