FastANI 2016

FastANI is developed for fast, alignment-free computation of whole-genome Average Nucleotide Identity (ANI). ANI is defined as the mean nucleotide identity of orthologous gene pairs shared between two microbial genomes. FastANI supports pairwise comparison of both complete and draft genome assemblies. Its underlying procedure follows a workflow similar to the one described by Goris et al. (2007). However, it avoids expensive sequence alignments and uses MashMap as its MinHash-based sequence mapping engine to compute the orthologous mappings and alignment identity estimates. According to the developers' experiments with complete and draft genomes, its accuracy is on par with a BLAST-based ANI solver, and it achieves a speedup of two to three orders of magnitude. It is therefore useful for pairwise ANI computation across a large number of genome pairs. More details about its speed, accuracy and potential applications are described in "High Throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries". [1]

Installed version(s)

The following versions are installed and currently available...

... on environment hpc-env/8.3:

  • FastANI/1.33-GCC-8.3.0
  • FastANI/1.33-iccifort-2019b

Loading / Using FastANI

To load the desired version of the module, use the module load command, e.g.

module load hpc-env/8.3
module load FastANI 

Always remember: this command is case sensitive!

To find out how to use FastANI, run fastANI --help (or the short form fastANI -h) to print a help text that gets you started:

 fastANI -h
fastANI is a fast alignment-free implementation for computing whole-genome Average Nucleotide Identity (ANI) between genomes
Example usage:
$ fastANI -q genome1.fa -r genome2.fa -o output.txt
$ fastANI -q genome1.fa --rl genome_list.txt -o output.txt

fastANI [-h] [-r <value>] [--rl <value>] [-q <value>] [--ql <value>] [-k
        <value>] [-t <value>] [--fragLen <value>] [--minFraction <value>]
        [--visualize] [--matrix] [-o <value>] [-v]

-h, --help
     print this help page

-r, --ref <value>
     reference genome (fasta/fastq)[.gz]

--rl, --refList <value>
     a file containing list of reference genome files, one genome per line

-q, --query <value>
     query genome (fasta/fastq)[.gz]

--ql, --queryList <value>
     a file containing list of query genome files, one genome per line

-k, --kmer <value>
     kmer size <= 16 [default : 16]

-t, --threads <value>
     thread count for parallel execution [default : 1]

--fragLen <value>
     fragment length [default : 3,000]

--minFraction <value>
     minimum fraction of genome that must be shared for trusting ANI. If
     reference and query genome size differ, smaller one among the two is
     considered. [default : 0.2]

--visualize
     output mappings for visualization, can be enabled for single genome to
     single genome comparison only [disabled by default]

--matrix
     also output ANI values as lower triangular matrix (format inspired from
     phylip). If enabled, you should expect an output file with .matrix
     extension [disabled by default]

-o, --output <value>
     output file name

-v, --version
     show version
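
The list options above (--ql, --rl) expect a plain text file with one genome path per line. The following is a minimal sketch of an all-versus-all comparison built on those options, to be run after the module has been loaded. The directory name genomes, the file names genome_list.txt and ani_all_vs_all.txt, the thread count and the helper function make_genome_list are hypothetical placeholders, not part of FastANI.

```shell
# Hypothetical locations and settings; adjust them to your own data.
GENOME_DIR="${GENOME_DIR:-genomes}"
LIST_FILE="${LIST_FILE:-genome_list.txt}"
THREADS="${THREADS:-4}"

# Hypothetical helper: write one genome path per line,
# which is the layout that --ql and --rl expect.
make_genome_list() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.fa' -o -name '*.fasta' -o -name '*.fna' -o -name '*.fa.gz' \) \
        | sort > "$2"
}

# Run the comparison only if fastANI is on the PATH and the input exists.
if command -v fastANI >/dev/null 2>&1 && [ -d "$GENOME_DIR" ]; then
    make_genome_list "$GENOME_DIR" "$LIST_FILE"
    # Use the same list as query and reference for an all-versus-all run;
    # --matrix additionally writes a file with the .matrix extension.
    fastANI --ql "$LIST_FILE" --rl "$LIST_FILE" -t "$THREADS" \
        --matrix -o ani_all_vs_all.txt
fi
```

Passing the same list to --ql and --rl compares every genome with every other one. The value given to -t should not exceed the number of cores requested for the job.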


The full documentation can be found here.
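
As a rough sketch of how the result file written with -o might be post-processed: according to the upstream FastANI README, each row is tab-delimited and holds the query genome, the reference genome, the ANI value, the count of bidirectional fragment mappings and the total number of query fragments. The helper filter_by_ani, the file name output.txt and the 95 cutoff (a commonly used species-level threshold) are assumptions made for illustration only.

```shell
# Hypothetical helper: print query, reference and ANI for every pair whose
# ANI (column 3) is at or above a threshold (second argument, default 95).
# Assumed column layout, taken from the upstream README:
#   query, reference, ANI, bidirectional fragment mappings, total query fragments
filter_by_ani() {
    awk -F '\t' -v OFS='\t' -v min="${2:-95}" \
        '$3 + 0 >= min + 0 { print $1, $2, $3 }' "$1"
}

# Example call; output.txt is the hypothetical file name passed to -o.
if [ -f output.txt ]; then
    filter_by_ani output.txt 95
fi
```

Pairs that are missing from the output altogether were not reported by FastANI, so the absence of a row should not be read as an ANI of zero.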