Kaiju 2016

Introduction

Kaiju is a software tool for fast and accurate taxonomic classification of high-throughput sequencing reads, for example from metagenomic samples. It translates the reads into amino acid sequences and compares them against a reference database of protein sequences, such as the proteins from complete genomes.

Installed version(s)

The following versions are installed and currently available...

... on environment hpc-env/8.3:

Kaiju/1.9.2-GCCcore-8.3.0

Loading Kaiju

To load the desired version of the module, use the module load command, e.g.

module load hpc-env/8.3
module load Kaiju

Always remember: these commands are case-sensitive!

Using Kaiju

To find out how to use Kaiju, type kaiju after loading the module. This prints a help text to get you started:

Kaiju 1.9.2
Copyright 2015-2022 Peter Menzel, Anders Krogh
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

Usage:
   kaiju -t nodes.dmp -f kaiju_db.fmi -i reads.fastq [-j reads2.fastq]

Mandatory arguments:
   -t FILENAME   Name of nodes.dmp file
   -f FILENAME   Name of database (.fmi) file
   -i FILENAME   Name of input file containing reads in FASTA or FASTQ format

Optional arguments:
   -j FILENAME   Name of second input file for paired-end reads
   -o FILENAME   Name of output file. If not specified, output will be printed to STDOUT
   -z INT        Number of parallel threads for classification (default: 1)
   -a STRING     Run mode, either "mem"  or "greedy" (default: greedy)
   -e INT        Number of mismatches allowed in Greedy mode (default: 3)
   -m INT        Minimum match length (default: 11)
   -s INT        Minimum match score in Greedy mode (default: 65)
   -E FLOAT      Minimum E-value in Greedy mode (default: 0.01)
   -x            Enable SEG low complexity filter (enabled by default)
   -X            Disable SEG low complexity filter
   -p            Input sequences are protein sequences
   -v            Enable verbose output

The kaiju command classifies reads against a Kaiju database. Here are some examples:

Build database:

kaiju-makedb -s viruses -t 2

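This downloads the viral reference sequences and builds the Kaiju index viruses/kaiju_db_viruses.fmi; the files nodes.dmp and names.dmp are placed in the current directory. The "-t" option of kaiju-makedb sets the number of threads used for building the index. The example commands below assume a database built this way.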

Classify single-end Illumina reads:

kaiju -t nodes.dmp -f viruses/kaiju_db_viruses.fmi -i reads.fastq.gz -o kaiju_output.txt

This command classifies the single-end Illumina reads in file "reads.fastq.gz" against the database built with "kaiju-makedb" and writes the results to "kaiju_output.txt". The "-f" option expects the .fmi index file, not the .faa protein FASTA file.

Classify paired-end Illumina reads:

kaiju -t nodes.dmp -f viruses/kaiju_db_viruses.fmi -i reads_1.fastq.gz -j reads_2.fastq.gz -o kaiju_output.txt

This command classifies the paired-end Illumina reads in files "reads_1.fastq.gz" and "reads_2.fastq.gz" against the database built with "kaiju-makedb" and writes the results to "kaiju_output.txt".
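
Classification can be spread over several threads with the "-z" option listed in the help text. The following is a minimal sketch, not a site-specific recipe: it assumes the job runs under SLURM (so that SLURM_CPUS_PER_TASK may be set), that the index is at viruses/kaiju_db_viruses.fmi, and that the helper name kaiju_threads is made up for illustration.

```shell
# Pick the thread count: use the SLURM allocation if present, otherwise 1
# (kaiju's own default). SLURM_CPUS_PER_TASK is an assumption about the scheduler.
kaiju_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Assumed file locations; adjust to your own database and reads.
DB=viruses/kaiju_db_viruses.fmi
NODES=nodes.dmp

# Only run if kaiju is on the PATH and all input files are readable.
if command -v kaiju >/dev/null 2>&1 && [ -r "$DB" ] && [ -r "$NODES" ] \
   && [ -r reads_1.fastq.gz ] && [ -r reads_2.fastq.gz ]; then
    kaiju -t "$NODES" -f "$DB" \
          -i reads_1.fastq.gz -j reads_2.fastq.gz \
          -o kaiju_output.txt -z "$(kaiju_threads)"
else
    echo "kaiju or input files not found; nothing was run" >&2
fi
```

Tying "-z" to the number of allocated cores avoids starting more threads than the job was given.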

Classify reads with a specified minimum match length:

kaiju -t nodes.dmp -f viruses/kaiju_db_viruses.fmi -i reads.fastq.gz -o kaiju_output.txt -m 20

This command classifies the reads in file "reads.fastq.gz" against the database, but only accepts matches of at least 20 amino acids (the default is 11), specified with the "-m" option.

Classify reads with a specified number of mismatches:

kaiju -t nodes.dmp -f viruses/kaiju_db_viruses.fmi -i reads.fastq.gz -o kaiju_output.txt -e 2

This command classifies the reads in file "reads.fastq.gz" against the database, but allows at most 2 mismatches in Greedy mode (the default is 3), specified with the "-e" option.
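
By default, Kaiju writes one tab-separated line per read: C or U (classified or unclassified), the read name, and the NCBI taxon ID of the assigned taxon. The sketch below gives a quick sanity check of "kaiju_output.txt"; the function name count_kaiju_status is made up for illustration.

```shell
# Count classified (C) and unclassified (U) reads in a Kaiju output file.
# Column 1 of the default output is either "C" or "U".
count_kaiju_status() {
    awk -F '\t' '
        $1 == "C" { c++ }
        $1 == "U" { u++ }
        END {
            total = c + u
            pct = (total > 0) ? 100 * c / total : 0
            printf "classified=%d unclassified=%d percent_classified=%.1f\n", c, u, pct
        }
    ' "$1"
}

# Run it on the output file from the examples, if that file exists.
if [ -r kaiju_output.txt ]; then
    count_kaiju_status kaiju_output.txt
fi
```

A very low share of classified reads often indicates that the chosen database does not match the sample.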

Besides the kaiju command, the installation directory contains further binaries and helper scripts:

$ ls $EBROOTKAIJU/bin
kaiju        kaiju2table          kaiju-convertMAR.py  kaiju-excluded-accessions.txt  kaiju-makedb        kaiju-mkbwt  kaiju-multi  kaiju-taxonlistEuk.tsv
kaiju2krona  kaiju-addTaxonNames  kaiju-convertNR      kaiju-gbk2faa.pl               kaiju-mergeOutputs  kaiju-mkfmi  kaijup       kaijux
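
Of these helper programs, kaiju2table and kaiju-addTaxonNames are a common next step after classification: the first summarises the output into a table of read counts per taxon, the second appends taxon names to each read. This is a hedged sketch that assumes nodes.dmp and names.dmp from kaiju-makedb are in the current directory and that the Kaiju output is "kaiju_output.txt". The output file names and the helper kaiju_inputs_ready are arbitrary choices for illustration.

```shell
NODES=nodes.dmp
NAMES=names.dmp
IN=kaiju_output.txt

# Succeed only if kaiju2table is on the PATH and every given file is readable.
kaiju_inputs_ready() {
    command -v kaiju2table >/dev/null 2>&1 || return 1
    for f in "$@"; do
        [ -r "$f" ] || return 1
    done
}

if kaiju_inputs_ready "$IN" "$NODES" "$NAMES"; then
    # Summary table of read counts per genus.
    kaiju2table -t "$NODES" -n "$NAMES" -r genus -o kaiju_summary.tsv "$IN"
    # Append the taxon name to every classified read.
    kaiju-addTaxonNames -t "$NODES" -n "$NAMES" -i "$IN" -o kaiju_names.txt
else
    echo "kaiju tools or input files not found; nothing was run" >&2
fi
```

The rank passed to "-r" (here genus) can be replaced by another taxonomic rank such as species or family.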

Documentation

The full documentation can be found here.
