FastANI 2016
Introduction
FastANI was developed for fast, alignment-free computation of whole-genome Average Nucleotide Identity (ANI). ANI is defined as the mean nucleotide identity of orthologous gene pairs shared between two microbial genomes. FastANI supports pairwise comparison of both complete and draft genome assemblies. Its underlying procedure follows a workflow similar to the one described by Goris et al. 2007. However, it avoids expensive sequence alignments and instead uses Mashmap as its MinHash-based sequence mapping engine to compute the orthologous mappings and alignment identity estimates. Based on the authors' experiments with complete and draft genomes, its accuracy is on par with a BLAST-based ANI solver while achieving a speedup of two to three orders of magnitude. It is therefore useful for pairwise ANI computation across a large number of genome pairs. More details about its speed, accuracy and potential applications are described in the paper "High Throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries".
Installed version(s)
The following versions are installed and currently available...
... on environment hpc-env/8.3:
- FastANI/1.33-GCC-8.3.0
Loading / Using FastANI
To load the desired version of the module, use the module load command, e.g.
module load hpc-env/8.3
module load FastANI
Always remember: this command is case sensitive!
To find out how to use FastANI, run fastANI --help (or the short form fastANI -h) to print a help text that gets you started:
fastANI -h
-----------------
fastANI is a fast alignment-free implementation for computing whole-genome
Average Nucleotide Identity (ANI) between genomes
-----------------
Example usage:
$ fastANI -q genome1.fa -r genome2.fa -o output.txt
$ fastANI -q genome1.fa --rl genome_list.txt -o output.txt

SYNOPSIS
--------
fastANI [-h] [-r <value>] [--rl <value>] [-q <value>] [--ql <value>]
        [-k <value>] [-t <value>] [--fragLen <value>] [--minFraction <value>]
        [--visualize] [--matrix] [-o <value>] [-v]

OPTIONS
--------
-h, --help
    print this help page
-r, --ref <value>
    reference genome (fasta/fastq)[.gz]
--rl, --refList <value>
    a file containing list of reference genome files, one genome per line
-q, --query <value>
    query genome (fasta/fastq)[.gz]
--ql, --queryList <value>
    a file containing list of query genome files, one genome per line
-k, --kmer <value>
    kmer size <= 16 [default : 16]
-t, --threads <value>
    thread count for parallel execution [default : 1]
--fragLen <value>
    fragment length [default : 3,000]
--minFraction <value>
    minimum fraction of genome that must be shared for trusting ANI. If
    reference and query genome size differ, smaller one among the two is
    considered. [default : 0.2]
--visualize
    output mappings for visualization, can be enabled for single genome to
    single genome comparison only [disabled by default]
--matrix
    also output ANI values as lower triangular matrix (format inspired from
    phylip). If enabled, you should expect an output file with .matrix
    extension [disabled by default]
-o, --output <value>
    output file name
-v, --version
    show version
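The --rl option in the help text expects a plain text file with one genome path per line. The following is a minimal sketch of a one-to-many comparison, not an official recipe. It assumes the reference genomes are FASTA files ending in .fa inside a directory; the directory name genomes, the file names, the thread count of 4 and the helper functions make_genome_list and filter_ani are hypothetical. The filter step further assumes that the output file is tab-separated with the query path, the reference path and the ANI value in the first three columns, as described in the upstream FastANI README; this is not stated in the help text above, so check it against your own output. The cutoff of 95 is only an example value.

```shell
#!/bin/bash

# make_genome_list DIR LIST
# Write the path of every *.fa file directly inside DIR to LIST,
# one path per line, sorted (the format --rl and --ql expect).
make_genome_list() {
    local dir="$1" list="$2"
    find "$dir" -maxdepth 1 -type f -name '*.fa' | sort > "$list"
}

# filter_ani FILE MIN
# Print query, reference and ANI for every row whose third column
# (assumed to hold the ANI value) is at least MIN.
filter_ani() {
    awk -F'\t' -v min="$2" '$3 + 0 >= min + 0 { print $1 "\t" $2 "\t" $3 }' "$1"
}

# Run the comparison only if fastANI is on the PATH (i.e. the module
# is loaded) and the hypothetical input files are present.
if command -v fastANI >/dev/null 2>&1 && [ -d genomes ] && [ -f genome1.fa ]; then
    make_genome_list genomes genome_list.txt
    fastANI -q genome1.fa --rl genome_list.txt -t 4 -o output.txt
    filter_ani output.txt 95
fi
```

When running inside a batch job, the value passed to -t should match the number of cores requested for the job.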
Documentation
The full documentation can be found here.