BLAT

From HPC users
Revision as of 15:35, 25 May 2021 by Schwietzer (talk | contribs) (→‎Installed version)

Introduction

Like BLAST, Blat is an alignment tool, but it is structured differently. On DNA, Blat works by keeping an index of an entire genome in memory. Thus, the target database of Blat is not a set of GenBank sequences, but an index derived from the assembly of the entire genome. By default, the index consists of all non-overlapping 11-mers except for those heavily involved in repeats, and it uses less than a gigabyte of RAM. This smaller size means that Blat is far more easily mirrored than BLAST. DNA Blat is designed to quickly find sequences of 95% or greater similarity that are 40 bases or longer. It may miss more divergent or shorter sequence alignments. (The default settings and expected behavior of standalone Blat differ slightly from those of the graphical version of Blat.)
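The idea of an index built from non-overlapping 11-mers can be illustrated with a small shell sketch. This is only a toy model, not how Blat stores its index internally: the target sequence is hypothetical, and the tile count at the end merely hints at how tiles that occur very often could be recognised as repeat-heavy.

```shell
#!/bin/sh
# Toy illustration only: Blat's real index is built in C and is far more compact.
target="ACGTACGTACGTTGACCAGTACACGTACGTACG"   # hypothetical 33-base target
k=11                                          # default DNA tile size

# Cut the target into non-overlapping k-mers and record each tile's start offset.
# Trailing fragments shorter than k are dropped.
tiles=$(printf '%s\n' "$target" | fold -w "$k" |
  awk -v k="$k" 'length($0) == k { printf "%s\t%d\n", $0, (NR - 1) * k }')
printf '%s\n' "$tiles"

# Count how often each tile occurs; very frequent tiles would be
# candidates for exclusion as repeat-heavy.
printf '%s\n' "$tiles" | cut -f1 | sort | uniq -c
```

A query is then matched by looking up its own 11-mers in such a table, which is why only the index, not the full sequence comparison, has to be held in memory.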

On proteins, Blat uses 4-mers rather than 11-mers, finding protein sequences of 80% or greater similarity to the query that are 20 or more amino acids long. The protein index requires slightly more than 2 gigabytes of RAM. In practice, because of sequence divergence over evolutionary time, DNA Blat works well within humans and primates, while protein Blat continues to find good matches within terrestrial vertebrates and even earlier organisms for conserved proteins. Within humans, protein Blat gives a much better picture of gene families (paralogs) than DNA Blat. However, BLAST and PSI-BLAST at NCBI can find much more remote matches.

From a practical standpoint, Blat has several advantages over BLAST:

  • speed (no queues, response in seconds) at the price of lesser homology depth
  • the ability to submit a long list of simultaneous queries in fasta format
  • five convenient output sort options
  • a direct link into the UCSC browser
  • alignment block details in natural genomic order
  • an option to launch the alignment later as part of a custom track

Installed version

Version 3.5 is currently installed in all environments (hpc-env/8.3, hpc-env/6.4, hpc-uniol-env).

Using BLAT with the HPC Cluster

If you want to find out more about BLAT on the HPC Cluster, you can use the command

module spider blat

This will show you basic information, e.g. a short description and the currently installed version.

To load the module, use the command

module load BLAT

Remember that this command is case-sensitive: the module name must be written as BLAT.

Example usage

After loading the module, you can run BLAT with, for example, the following command:

blat <DATABASE> <QUERY> [-ooc=11.ooc] output.psl

The optional -ooc=11.ooc flag points BLAT to a file of over-represented 11-mers, which are then skipped as alignment seeds. Adding it will most likely speed up your search, although you might miss certain sequences. If you can afford the extra processing time and your situation warrants it, you may want to run blat without the -ooc=11.ooc flag.
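As a rough sketch of how the two variants could be compared, the script below prints the commands for creating an 11.ooc file, running a search with it, and running the same search without it. The file names genome.2bit, query.fa, fast.psl and full.psl are hypothetical, and the run helper and the DRY_RUN variable exist only for this illustration. The -makeOoc, -tileSize and -repMatch options come from BLAT's usage message; the values shown are commonly used ones for 11-mers, not settings confirmed for this cluster.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own genome and query files.
db="genome.2bit"
query="query.fa"
ooc="11.ooc"

# Dry-run by default: commands are only printed. Set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}
run() {
  printf '+ %s\n' "$*"
  [ "$DRY_RUN" = 1 ] || "$@"
}

# 1. Build the list of over-represented 11-mers from the target genome.
run blat "$db" /dev/null /dev/null -tileSize=11 -makeOoc="$ooc" -repMatch=1024

# 2. Faster search that skips the over-represented tiles.
run blat "$db" "$query" -ooc="$ooc" fast.psl

# 3. Slower search without the filter, for comparison.
run blat "$db" "$query" full.psl
```

Comparing the number of alignments in fast.psl and full.psl gives an impression of how much is lost for the speed gained on a particular data set.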

Documentation

The full documentation can be found here.