BEDTools

From HPC users
Revision as of 12:58, 28 June 2019 by Schwietzer

Introduction

Collectively, the bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.

bedtools is developed in the Quinlan laboratory at the University of Utah and benefits from fantastic contributions made by scientists worldwide.

Installed version

The currently installed version is 2.26.0 (on hpc-uniol-env).

Using BEDTools

To find out more about BEDTools on the HPC cluster, use the command

module spider bedtools

This shows basic information about the module, e.g. a short description and the currently installed version.

To load the desired version of the module, use the command

module load BEDTools/2.26.0-intel-2016b 

Always remember: the module name is case-sensitive!

After loading the module, you can use the program with

bedtools <subcommand> [options]

Example:

If you want to sort the intervals in your .bed file, you can do so with the following command. The sorted intervals are printed to standard output.

bedtools sort -i input.bed
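As a minimal sketch of what this does, the snippet below builds a tiny unsorted BED file and sorts it. By default, bedtools sort orders intervals by chromosome and then by start position. The file names example.bed and example.sorted.bed and the intervals are made up for illustration. If bedtools is not on the PATH (for example because the module is not loaded), the sketch falls back to plain UNIX sort with the same keys, which is a stand-in and not bedtools itself.

```shell
#!/bin/bash
# Build a tiny, unsorted BED file (hypothetical example data, tab-separated).
printf 'chr2\t500\t600\n' >  example.bed
printf 'chr1\t300\t400\n' >> example.bed
printf 'chr1\t100\t200\n' >> example.bed

if command -v bedtools >/dev/null 2>&1; then
    # Default order: by chromosome, then by start position.
    bedtools sort -i example.bed > example.sorted.bed
else
    # Stand-in when bedtools is unavailable: UNIX sort on the same two keys.
    LC_ALL=C sort -k1,1 -k2,2n example.bed > example.sorted.bed
fi

# Print the sorted intervals.
cat example.sorted.bed
```

Because bedtools sort writes to standard output, the redirection into example.sorted.bed is what keeps the result.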

Using BEDTools with the HPC Cluster

Since many people work on the HPC cluster, it is important that everyone has an equal chance to do so. Therefore, every job should be processed by SLURM.

For this reason, you have to create a job script for your tasks. This is an example file for a simple BEDTools task:

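#!/bin/bash

# Resources: one task, 2 GB of memory, time limit of 2 hours (days-hours:minutes)
#SBATCH --ntasks=1
#SBATCH --mem=2G
#SBATCH --time=0-2:00
#SBATCH --job-name=BEDTOOLS-TEST
# Standard output and standard error; %j is replaced by the job ID
#SBATCH --output=bedtools-test.%j.out
#SBATCH --error=bedtools-test.%j.err

# Load the module, then intersect the two interval files
module load BEDTools/2.26.0-intel-2016b
bedtools intersect -a TESTINPUT_1.bed -b TESTINPUT_2.bed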

This job intersects the two given files (TESTINPUT_1.bed and TESTINPUT_2.bed) with BEDTools. Because bedtools prints its result to standard output, the result ends up in a file named bedtools-test.JOBID.out. Any errors are written to bedtools-test.JOBID.err.
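The job script is handed to SLURM with sbatch. The following is a minimal sketch, assuming the script was saved as bedtools-test.job (a hypothetical file name chosen here). The helper function log_file is also only illustrative: it shows how the log file names follow from the job ID that SLURM assigns.

```shell
#!/bin/bash
# Hypothetical file name: assumes the job script was saved under this name.
JOBSCRIPT="bedtools-test.job"

# Illustrative helper: build the file name that the --output / --error
# patterns of the job script (%j = job ID) expand to.
log_file() {
    # $1 = job ID, $2 = suffix (out or err)
    echo "bedtools-test.$1.$2"
}

if command -v sbatch >/dev/null 2>&1 && [ -f "$JOBSCRIPT" ]; then
    # --parsable prints only the job ID (optionally followed by ";cluster").
    JOBID=$(sbatch --parsable "$JOBSCRIPT")
    JOBID=${JOBID%%;*}
    echo "Submitted job $JOBID"
    echo "Results: $(log_file "$JOBID" out)"
    echo "Errors:  $(log_file "$JOBID" err)"
    # Show the job while it is pending or running.
    squeue -j "$JOBID"
else
    echo "sbatch or $JOBSCRIPT not found; run this on the cluster after saving the job script."
fi
```

Once the job has finished, the intersected intervals can be read from the .out file, and the .err file should be empty if everything went well.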

Documentation

The full documentation can be found here.