Kraken 2016

From HPC users

Introduction

Kraken 2 is the newest version of Kraken, a taxonomic classification system that uses exact k-mer matches to achieve high accuracy and fast classification speeds. The classifier matches each k-mer within a query sequence to the lowest common ancestor (LCA) of all genomes containing that k-mer, and these k-mer assignments inform the classification algorithm (see Kraken 1's webpage for more details).
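To make the idea of k-mer matching more concrete, here is a toy sketch in plain shell. It is not how Kraken 2 is implemented: the lookup table is hard-coded, the taxon labels (taxon_A, taxon_B) are made up, and the real tool uses a compact database with LCA taxids instead. The sketch only shows how a query is split into k-mers and how hits per taxon could be tallied.

```shell
#!/bin/sh
# Toy illustration only; this is NOT Kraken 2's implementation.

# Print every k-mer (substring of length k) of a sequence, one per line.
kmers() {
    seq_str=$1
    k=$2
    len=${#seq_str}
    i=1
    while [ $((i + k - 1)) -le "$len" ]; do
        printf '%s\n' "$seq_str" | cut -c "$i-$((i + k - 1))"
        i=$((i + 1))
    done
}

# Hypothetical lookup table: maps a k-mer to a made-up taxon label.
# In Kraken 2 this role is played by the database (k-mer -> LCA taxon).
lookup_taxon() {
    case $1 in
        ACG|CGT) echo taxon_A ;;
        GTA)     echo taxon_B ;;
        *)       echo no_hit ;;
    esac
}

# Tally how many k-mers of the query map to each taxon.
tally_hits() {
    kmers "$1" "$2" | while read -r kmer; do
        lookup_taxon "$kmer"
    done | sort | uniq -c | awk '{ print $2, $1 }'
}

# The query ACGTAC has the 3-mers ACG, CGT, GTA and TAC.
tally_hits ACGTAC 3
```

For this query the tally is two hits for taxon_A, one for taxon_B and one k-mer without a hit. A classifier would then use such per-taxon counts to decide on a label for the whole sequence.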

Installed version(s)

The following versions are installed and currently available...

... on environment hpc-env/8.3:

  • Kraken2/2.1.2-gompi-2019b
  • Kraken2/2.1.1-gompi-2019b

Loading / Using Kraken2

To load the desired version of the module, use the module load command, e.g.

module load hpc-env/8.3
module load Kraken2

Always remember: this command is case sensitive!
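After loading the module you may want to confirm that the kraken2 executable is really on your PATH. The following is a minimal sketch; the helper name check_tool is made up for this example, and the --version option is the one listed in the help text of kraken2.

```shell
#!/bin/sh
# check_tool NAME: report whether a command is available on PATH.
# (check_tool is a hypothetical helper, not part of Kraken2 or the module system.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found (did you run 'module load Kraken2'?)"
        return 1
    fi
}

# Print the version only if the executable was found.
if check_tool kraken2; then
    kraken2 --version
fi
```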


To find out how to use Kraken2, you can simply type kraken2 without any arguments to print a help text that gets you started:

$ kraken2
Need to specify input filenames!
Usage: kraken2 [options] <filename(s)>

Options:
  --db NAME               Name for Kraken 2 DB
                          (default: none)
  --threads NUM           Number of threads (default: 1)
  --quick                 Quick operation (use first hit or hits)
  --unclassified-out FILENAME
                          Print unclassified sequences to filename
  --classified-out FILENAME
                          Print classified sequences to filename
  --output FILENAME       Print output to filename (default: stdout); "-" will
                          suppress normal output
  --confidence FLOAT      Confidence score threshold (default: 0.0); must be
                          in [0, 1].
  --minimum-base-quality NUM
                          Minimum base quality used in classification (def: 0,
                          only effective with FASTQ input).
  --report FILENAME       Print a report with aggregrate counts/clade to file
  --use-mpa-style         With --report, format report output like Kraken 1's
                          kraken-mpa-report
  --report-zero-counts    With --report, report counts for ALL taxa, even if
                          counts are zero
  --report-minimizer-data With --report, report minimizer and distinct minimizer
                          count information in addition to normal Kraken report
  --memory-mapping        Avoids loading database into RAM
  --paired                The filenames provided have paired-end reads
  --use-names             Print scientific names instead of just taxids
  --gzip-compressed       Input files are compressed with gzip
  --bzip2-compressed      Input files are compressed with bzip2
  --minimum-hit-groups NUM
                          Minimum number of hit groups (overlapping k-mers
                          sharing the same minimizer) needed to make a call
                          (default: 2)
  --help                  Print this message
  --version               Print version information
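Putting some of these options together, a classification run on gzip-compressed paired-end reads could look like the sketch below. The database path, the sample prefix and the _R1/_R2 file naming are placeholders and need to be adapted to your data; the wrapper function classify_paired and the DRY_RUN variable are made up for this example and only serve to print the command before running it.

```shell
#!/bin/sh
# classify_paired DB SAMPLE_PREFIX [THREADS]
# Hypothetical wrapper; all paths and file names are placeholders.
# If DRY_RUN is set to a non-empty value, the command is printed, not executed.
classify_paired() {
    db=$1
    sample=$2
    threads=${3:-1}
    ${DRY_RUN:+echo} kraken2 \
        --db "$db" \
        --threads "$threads" \
        --paired --gzip-compressed --use-names \
        --report "${sample}.report.txt" \
        --output "${sample}.kraken2.txt" \
        "${sample}_R1.fastq.gz" "${sample}_R2.fastq.gz"
}

# Dry run: print the full command line without starting kraken2.
DRY_RUN=1 classify_paired /path/to/kraken2_db sample 4
```

Calling classify_paired without DRY_RUN starts the actual classification. Unless --memory-mapping is used, the database is loaded into RAM, so request enough memory for the job, and keep the value given to --threads in line with the number of cores you requested.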



Kraken2 also comes with a handful of scripts, which can be found in the bin directory:


$ ls $EBROOTKRAKEN2/bin

16S_gg_installation.sh     build_gg_taxonomy.pl     clean_db.sh                  estimate_capacity  lookup_accession_numbers     scan_fasta_file.pl
16S_rdp_installation.sh    build_kraken2_db.sh      cp_into_tempfile.pl          kraken2            lookup_accession_numbers.pl  standard_installation.sh
16S_silva_installation.sh  build_rdp_taxonomy.pl    download_genomic_library.sh  kraken2-build      make_seqid2taxid_map.pl
add_to_library.sh          build_silva_taxonomy.pl  download_taxonomy.sh         kraken2-inspect    mask_low_complexity.sh
build_db                   classify                 dump_table                   kraken2lib.pm      rsync_from_ncbi.pl
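Of these, kraken2-build and kraken2-inspect are the ones you will typically call directly, for example to set up your own database. The steps below are a sketch under several assumptions: the database path is a placeholder, the bacteria library is just one possible choice, and the downloads need internet access as well as a lot of disk space and time. The function name build_custom_db and the DRY_RUN variable are made up for this example.

```shell
#!/bin/sh
# build_custom_db DB [THREADS]
# Hypothetical wrapper around kraken2-build; the database path is a placeholder.
# If DRY_RUN is set to a non-empty value, the steps are printed, not executed.
build_custom_db() {
    db=$1
    threads=${2:-1}
    # 1. Download the NCBI taxonomy into the database directory.
    ${DRY_RUN:+echo} kraken2-build --download-taxonomy --db "$db" &&
    # 2. Download a reference library (here: bacteria).
    ${DRY_RUN:+echo} kraken2-build --download-library bacteria --db "$db" &&
    # 3. Build the database from the downloaded files.
    ${DRY_RUN:+echo} kraken2-build --build --threads "$threads" --db "$db" &&
    # 4. Print a summary of the finished database.
    ${DRY_RUN:+echo} kraken2-inspect --db "$db"
}

# Dry run: only print the four steps.
DRY_RUN=1 build_custom_db /path/to/my_kraken2_db 8
```

The resulting directory can then be passed to kraken2 with the --db option.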

Documentation

The full documentation can be found here and on the project's GitHub page.