Difference between revisions of "BLAT"
Schwietzer (talk | contribs) |
Latest revision as of 15:35, 25 May 2021
Introduction
Like BLAST, Blat is an alignment tool, but it is structured differently. On DNA, Blat works by keeping an index of an entire genome in memory. Thus, the target database of BLAT is not a set of GenBank sequences, but instead an index derived from the assembly of the entire genome. By default, the index consists of all non-overlapping 11-mers except for those heavily involved in repeats, and it uses less than a gigabyte of RAM. This smaller size means that Blat is far more easily mirrored than BLAST. Blat of DNA is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. It may miss more divergent or shorter sequence alignments. (The default settings and expected behavior of standalone Blat are slightly different from those on the graphical version of Blat.)
On proteins, Blat uses 4-mers rather than 11-mers, finding protein sequences of 80% and greater similarity to the query of length 20+ amino acids. The protein index requires slightly more than 2 gigabytes of RAM. In practice -- due to sequence divergence rates over evolutionary time -- DNA Blat works well within humans and primates, while protein Blat continues to find good matches within terrestrial vertebrates and even earlier organisms for conserved proteins. Within humans, protein Blat gives a much better picture of gene families (paralogs) than DNA Blat. However, BLAST and psi-BLAST at NCBI can find much more remote matches.
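To make the idea of "non-overlapping k-mers" from the paragraphs above more concrete, here is a minimal shell sketch that cuts a sequence into consecutive, non-overlapping tiles. It is only an illustration, not how Blat is implemented: the function name tile_kmers and the toy sequence are made up for this example, and the real index additionally leaves out tiles that are heavily involved in repeats.

```shell
# Illustration only: split a sequence into non-overlapping k-mers.
# tile_kmers is a hypothetical helper name, not part of BLAT.
tile_kmers() {
    seq=$1
    k=${2:-11}          # tile size: 11 for DNA, 4 for protein
    len=${#seq}
    start=1
    # Emit one tile per step; a trailing remainder shorter than k is dropped.
    while [ $((start + k - 1)) -le "$len" ]; do
        printf '%s\n' "$seq" | cut -c "${start}-$((start + k - 1))"
        start=$((start + k))
    done
}

# Toy 25-base sequence: yields two 11-mers, the last 3 bases are ignored.
tile_kmers "ACGTACGTACGTACGTACGTACGTA" 11
```

Because the tiles do not overlap, a genome of N bases produces only about N/11 index entries, which is consistent with the small memory footprint described above.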
From a practical standpoint, Blat has several advantages over BLAST:
- speed (no queues, response in seconds) at the price of lesser homology depth
- the ability to submit a long list of simultaneous queries in fasta format
- five convenient output sort options
- a direct link into the UCSC browser
- alignment block details in natural genomic order
- an option to launch the alignment later as part of a custom track
Installed version
The currently installed version is 3.5 on all environments (hpc-env/8.3, hpc-env/6.4, hpc-uniol-env).
Using BLAT with the HPC Cluster
If you want to find out more about BLAT on the HPC Cluster, you can use the command
module spider blat
This will show you basic information, e.g. a short description and the currently installed version.
To load the desired version of the module, use a command such as
module load BLAT
Always remember: this command is case sensitive!
Example usage
After loading the module, you could, for example, run BLAT with the following command:
blat <DATABASE> <QUERY> [-ooc=11.ooc] output.psl
Adding the -ooc flag will most likely speed up your search, although you might miss certain sequences. The 11.ooc file lists over-represented 11-mers, which are then skipped when seeding alignments. If you can afford the extra processing time and your particular situation warrants it, you may want to run blat without the -ooc=11.ooc flag.
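If you do not already have an 11.ooc file for your genome, blat can generate one itself. The following sketch shows one possible two-step workflow. The file names genome.2bit and queries.fa are placeholders, and the -repMatch=1024 threshold is an assumption taken from common usage with 11-mer tiles, so please check the blat usage message for values suited to your data. The script only runs blat if the command and both input files are available; otherwise it prints the commands it would run.

```shell
# Placeholder file names (assumptions); replace them with your own data.
DATABASE=genome.2bit
QUERY=queries.fa
OOC=11.ooc

# Step 1: write the over-represented 11-mers of the genome to $OOC.
# -repMatch=1024 is an assumed threshold, not a site-specific recommendation.
make_ooc_cmd="blat $DATABASE /dev/null /dev/null -tileSize=11 -makeOoc=$OOC -repMatch=1024"

# Step 2: run the actual search, skipping the tiles listed in $OOC.
search_cmd="blat $DATABASE $QUERY -ooc=$OOC output.psl"

if command -v blat >/dev/null 2>&1 && [ -f "$DATABASE" ] && [ -f "$QUERY" ]; then
    # Unquoted on purpose so the strings are split into command and arguments;
    # this only works because the file names contain no spaces.
    $make_ooc_cmd && $search_cmd
else
    # Dry run: show what would be executed.
    printf '%s\n' "$make_ooc_cmd" "$search_cmd"
fi
```

The ooc file only needs to be built once per genome and tile size and can then be reused for every later search against the same database.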
Documentation
The full documentation can be found here.