BCFtools

From HPC users
Revision as of 11:14, 22 July 2022 by Schwietzer (talk | contribs) (→‎Using BCFtools)

Introduction

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.

Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations. In general, whenever multiple VCFs are read simultaneously, they must be indexed and therefore also compressed.
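As a minimal sketch of the indexing requirement, the commands below write a tiny two-record VCF, compress it with BGZF, index it, and extract a region, which only works on an indexed file. The file name demo.vcf and its contents are made up for illustration, and the bcftools steps are skipped if no bcftools is on the PATH.

```shell
# Create a tiny example VCF (hypothetical data, for illustration only).
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##contig=<ID=chr1,length=1000>' \
  > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n' >> demo.vcf
printf 'chr1\t500\t.\tC\tT\t60\tPASS\t.\n' >> demo.vcf

if command -v bcftools >/dev/null 2>&1; then
    # Write a BGZF-compressed copy (-Oz) and build an index next to it.
    bcftools view -Oz -o demo.vcf.gz demo.vcf
    bcftools index demo.vcf.gz
    # Region queries (-r) need the index; -H omits the header from the output.
    bcftools view -H -r chr1:1-200 demo.vcf.gz
else
    echo "bcftools not found; load the module first" >&2
fi
```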

BCFtools is designed to work on a stream. It regards an input file "-" as the standard input (stdin) and outputs to the standard output (stdout). Several commands can thus be combined with Unix pipes.
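The following sketch illustrates such a pipeline. It assumes a small made-up input file (stream-demo.vcf, hypothetical) and chains two commands, each reading from standard input via "-". The quality threshold of 20 is an arbitrary example value.

```shell
# Tiny example input (hypothetical data, for illustration only).
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1,length=1000>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf 'chr1\t500\t.\tC\tT\t10\tPASS\t.\n'
} > stream-demo.vcf

if command -v bcftools >/dev/null 2>&1; then
    # Read from stdin ("-"), pass uncompressed BCF (-Ou) between the stages,
    # and print chromosome, position and quality of the records that remain.
    cat stream-demo.vcf \
      | bcftools view -i 'QUAL>=20' -Ou - \
      | bcftools query -f '%CHROM\t%POS\t%QUAL\n' -
else
    echo "bcftools not found; load the module first" >&2
fi
```

Passing uncompressed BCF between the stages avoids converting to text and back at every step of the pipe.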

Installed version

These versions are installed and currently available...

... on environment hpc-uniol-env:

BCFtools/1.3.1

... on environment hpc-env/6.4:

BCFtools/1.6-intel-2018a

... on environment hpc-env/8.3:

BCFtools/1.9-foss-2019b
BCFtools/1.15.1-GCC-8.3.0

List of available commands

For a full list of available commands, run bcftools without arguments. For a full list of the options of a command, run bcftools COMMAND (e.g. "bcftools annotate") without further arguments.

  • annotate: edit VCF files, add or remove annotations
  • call: SNP/indel calling (former "view")
  • cnv: Copy Number Variation caller
  • concat: concatenate VCF/BCF files from the same set of samples
  • consensus: create consensus sequence by applying VCF variants
  • convert: convert VCF/BCF to other formats and back
  • csq: haplotype aware consequence caller
  • filter: filter VCF/BCF files using fixed thresholds
  • gtcheck: check sample concordance, detect sample swaps and contamination
  • index: index VCF/BCF
  • isec: intersections of VCF/BCF files
  • merge: merge VCF/BCF files from non-overlapping sample sets
  • mpileup: multi-way pileup producing genotype likelihoods
  • norm: normalize indels
  • plugin: run user-defined plugin
  • polysomy: detect contaminations and whole-chromosome aberrations
  • query: transform VCF/BCF into user-defined formats
  • reheader: modify VCF/BCF header, change sample names
  • roh: identify runs of homo/auto-zygosity
  • stats: produce VCF/BCF stats (former vcfcheck)
  • view: subset, filter and convert VCF and BCF files

Using BCFtools

If you want to find out more about BCFtools on the HPC cluster, you can use the command

module spider bcftools

This will show you basic information, e.g. a short description and the currently installed versions.

To load the desired version of the module, use the command

module load hpc-env/8.3
module load BCFtools/1.15.1-GCC-8.3.0

Remember that module names are case sensitive.

After loading the module, you can use the program with

bcftools <command> <argument>
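A quick way to confirm that the module was loaded correctly is sketched below. It only checks whether bcftools is on the PATH, prints its version, and shows the start of the usage text of one command. The variable name bcftools_status is introduced here purely for illustration.

```shell
if command -v bcftools >/dev/null 2>&1; then
    bcftools_status="available"
    # First line of the version output names the active BCFtools release.
    bcftools --version | head -n 1
    # A command run without arguments prints its usage text.
    bcftools annotate 2>&1 | head -n 20
else
    bcftools_status="missing"
    echo "bcftools is not on the PATH; run the module load commands first" >&2
fi
echo "bcftools is ${bcftools_status}"
```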

Using BCFtools with the HPC cluster

Since there are many people working with the HPC cluster, it's important that everyone has an equal chance to do so. Therefore, every job should be processed by SLURM.

For this reason, you have to create a jobscript for your tasks.

Example:

#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --mem=2G
#SBATCH --time=0-2:00
#SBATCH --job-name=BCFTOOLS-TEST
#SBATCH --output=bcftools-test.%j.out
#SBATCH --error=bcftools-test.%j.err

# Load the environment and the BCFtools module (names are case sensitive)
module load hpc-env/8.3
module load BCFtools/1.15.1-GCC-8.3.0

# Print one line per site: chromosome and position
bcftools query -f '%CHROM\t%POS\n' bcftools-testfile.vcf

This writes the list of sites (chromosome and position) found in bcftools-testfile.vcf to a file named bcftools-test.JOBID.out. Any errors are written to bcftools-test.JOBID.err.
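Submitting a jobscript could look like the sketch below. It writes a minimal jobscript that only prints the BCFtools version, checks it for shell syntax errors, and submits it if sbatch is available. The file name bcftools-version.sh, the job name and the resource values are assumptions chosen for illustration.

```shell
# Write a minimal jobscript (hypothetical file name and resource values).
cat > bcftools-version.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --time=0-0:10
#SBATCH --job-name=BCFTOOLS-VERSION
#SBATCH --output=bcftools-version.%j.out
#SBATCH --error=bcftools-version.%j.err

module load hpc-env/8.3
module load BCFtools/1.15.1-GCC-8.3.0
bcftools --version
EOF

# Check the script for shell syntax errors without running it.
bash -n bcftools-version.sh

if command -v sbatch >/dev/null 2>&1; then
    sbatch bcftools-version.sh    # hands the script to SLURM
    squeue -u "$USER"             # lists your pending and running jobs
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi
```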

Documentation

The full documentation can be found here.