Kaiju 2016
Introduction
Kaiju is a software tool for fast and accurate taxonomic classification of high-throughput sequencing reads. It can be used for metagenomic as well as genomic data analysis. Kaiju translates the reads into amino-acid sequences and compares them against a reference database of protein sequences (typically derived from complete genomes) to classify them.
Installed version(s)
The following versions are installed and currently available...
... on environment hpc-env/8.3:
- Kaiju/1.9.2-GCCcore-8.3.0
Loading Kaiju
To load the desired version of the module, use the module load command, e.g.
module load hpc-env/8.3
module load Kaiju
Always remember: these commands are case-sensitive!
Using Kaiju
To find out how to use Kaiju, type kaiju after loading the module. This prints a help text to get you started:
Kaiju 1.9.2
Copyright 2015-2022 Peter Menzel, Anders Krogh
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

Usage:
   kaiju -t nodes.dmp -f kaiju_db.fmi -i reads.fastq [-j reads2.fastq]

Mandatory arguments:
   -t FILENAME   Name of nodes.dmp file
   -f FILENAME   Name of database (.fmi) file
   -i FILENAME   Name of input file containing reads in FASTA or FASTQ format

Optional arguments:
   -j FILENAME   Name of second input file for paired-end reads
   -o FILENAME   Name of output file. If not specified, output will be printed to STDOUT
   -z INT        Number of parallel threads for classification (default: 1)
   -a STRING     Run mode, either "mem" or "greedy" (default: greedy)
   -e INT        Number of mismatches allowed in Greedy mode (default: 3)
   -m INT        Minimum match length (default: 11)
   -s INT        Minimum match score in Greedy mode (default: 65)
   -E FLOAT      Minimum E-value in Greedy mode (default: 0.01)
   -x            Enable SEG low complexity filter (enabled by default)
   -X            Disable SEG low complexity filter
   -p            Input sequences are protein sequences
   -v            Enable verbose output
The kaiju command is used to classify reads with the Kaiju software. Here are some examples:
- Build database:
kaiju-makedb -s viruses -t 2
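This builds a database from the viral protein sequences (-s viruses) using 2 parallel threads (-t 2). Kaiju needs two of the resulting files: the taxonomy file nodes.dmp (passed with -t) and the index file kaiju_db_viruses.fmi in the subdirectory viruses/ (passed with -f). The following examples use this database: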
- Classify single-end Illumina reads:
kaiju -t nodes.dmp -f viruses/kaiju_db_viruses.fmi -i reads.fastq.gz -o kaiju_output.txt
This command classifies the single-end Illumina reads in "reads.fastq.gz" using the database built with kaiju-makedb and writes the results to "kaiju_output.txt".
- Classify paired-end Illumina reads:
kaiju -t nodes.dmp -f viruses/kaiju_db_viruses.fmi -i reads_1.fastq.gz -j reads_2.fastq.gz -o kaiju_output.txt
This command classifies the paired-end Illumina reads in "reads_1.fastq.gz" and "reads_2.fastq.gz" (the second file is given with the "-j" option) and writes the results to "kaiju_output.txt".
- Classify reads with a specified minimum match length:
kaiju -t nodes.dmp -f viruses/kaiju_db_viruses.fmi -i reads.fastq.gz -o kaiju_output.txt -m 15
This command classifies the reads in "reads.fastq.gz" but only accepts matches with a length of at least 15 amino acids (default: 11), specified with the "-m" option. Note that the help text lists no option for filtering reads by their read length.
- Classify reads with a specified number of mismatches:
kaiju -t nodes.dmp -f viruses/kaiju_db_viruses.fmi -i reads.fastq.gz -o kaiju_output.txt -a greedy -e 2
This command classifies the reads in "reads.fastq.gz" in Greedy mode (the default run mode, set explicitly here with "-a greedy") and allows up to 2 mismatches (default: 3), specified with the "-e" option.
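When several paired-end samples have to be classified, the paired-end command can be wrapped in a loop. The following is a minimal sketch under stated assumptions: the function name classify_pairs, the file naming scheme <sample>_1.fastq.gz / <sample>_2.fastq.gz and the use of the SLURM variable SLURM_CPUS_PER_TASK to choose the number of threads (-z) are hypothetical and should be adapted to your data and batch system. Only options listed in the help text are passed to kaiju.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaiju): classify every paired-end sample
# in a directory with one kaiju call per sample.
# Assumes mate files are named <sample>_1.fastq.gz and <sample>_2.fastq.gz.
# Set KAIJU=echo to print the commands instead of running them (dry run).
classify_pairs() {
  local dir="$1" nodes="$2" fmi="$3"
  # Assumption: on a SLURM cluster this variable holds the allocated CPUs.
  local threads="${SLURM_CPUS_PER_TASK:-1}"
  local r1 r2 sample

  # Fail early if the taxonomy file or the .fmi index is missing or empty.
  if [[ ! -s "$nodes" || ! -s "$fmi" ]]; then
    echo "missing or empty database file(s): $nodes $fmi" >&2
    return 1
  fi

  for r1 in "$dir"/*_1.fastq.gz; do
    [[ -e "$r1" ]] || continue              # glob matched no files
    r2="${r1%_1.fastq.gz}_2.fastq.gz"       # derive the mate file name
    sample="$(basename "${r1%_1.fastq.gz}")"
    if [[ ! -e "$r2" ]]; then
      echo "skipping $sample: mate file $r2 not found" >&2
      continue
    fi
    "${KAIJU:-kaiju}" -z "$threads" -t "$nodes" -f "$fmi" \
      -i "$r1" -j "$r2" -o "$dir/${sample}_kaiju.out"
  done
}
```

For example, KAIJU=echo classify_pairs reads nodes.dmp viruses/kaiju_db_viruses.fmi prints one kaiju command per complete read pair in the directory reads; without KAIJU=echo the commands are executed and each sample gets its own <sample>_kaiju.out file.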
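Each line of a Kaiju output file starts with C (classified) or U (unclassified), followed by the read name and the NCBI taxon ID assigned to the read, separated by tabs. A quick way to check how many reads were classified is a short awk summary over the first column. The function name kaiju_summary is hypothetical; for per-taxon tables, the Kaiju package also provides the kaiju2table program.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count classified (C) and unclassified (U) reads
# in a tab-separated Kaiju output file (column 1 is C or U).
kaiju_summary() {
  awk -F '\t' '
    $1 == "C" { c++ }
    $1 == "U" { u++ }
    END {
      total = c + u
      pct = (total > 0) ? 100 * c / total : 0
      printf "classified\t%d\nunclassified\t%d\npercent_classified\t%.2f\n", c, u, pct
    }' "${1:-kaiju_output.txt}"
}
```

Calling kaiju_summary kaiju_output.txt prints three tab-separated lines with the number of classified reads, the number of unclassified reads and the percentage of classified reads.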
Documentation
The full documentation can be found here.