Bam to mate hist 2016

From HPC users

bam_to_mate_hist

Introduction

This script is intended as a simple quality-control (QC) method for Hi-C libraries. It works on reads in a BAM file that have been aligned to a genome or assembly.

The most informative Hi-C reads are the ones that are long-distance contacts, or contacts between contigs of an assembly. This tool quantifies such contacts and makes plots of contact distance distributions. The most successful Hi-C libraries have many long-distance and among-contig contacts.
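The sketch below illustrates the kind of tally described above: each read pair is sorted into short-range, long-range, or inter-contig contacts. It is not the bam_to_mate_hist implementation. `ReadPair`, `classify_contacts` and the 10 kb `long_range` cutoff are hypothetical names and values chosen for illustration. With pysam, the four fields would roughly correspond to `reference_name`, `reference_start`, `next_reference_name` and `next_reference_start` of an aligned read. Because both mates of a pair appear in a BAM file, you would typically count only one of them (for example, reads with `is_read1` set). The code assumes Python 3.6 or later.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical stand-in for the fields of one mapped read pair."""
    ref: str        # contig of the first read
    pos: int        # 0-based leftmost position of the first read
    mate_ref: str   # contig of the mate
    mate_pos: int   # 0-based leftmost position of the mate


def classify_contacts(pairs: Iterable[ReadPair], long_range: int = 10_000) -> Counter:
    """Count short-range, long-range and inter-contig contacts.

    `long_range` is an illustrative cutoff in bp (an assumption),
    not a value taken from bam_to_mate_hist.
    """
    counts = Counter()
    for p in pairs:
        if p.ref != p.mate_ref:
            counts["inter_contig"] += 1      # mates on different contigs
        elif abs(p.mate_pos - p.pos) >= long_range:
            counts["intra_long"] += 1        # same contig, far apart
        else:
            counts["intra_short"] += 1       # same contig, close together
    return counts


if __name__ == "__main__":
    # Tiny synthetic example; real input would come from a BAM file.
    demo = [
        ReadPair("ctg1", 100, "ctg1", 400),
        ReadPair("ctg1", 100, "ctg1", 250_000),
        ReadPair("ctg1", 100, "ctg2", 5_000),
    ]
    counts = classify_contacts(demo)
    total = sum(counts.values())
    for key in ("intra_short", "intra_long", "inter_contig"):
        print(f"{key}: {counts[key]} ({counts[key] / total:.0%})")
```

In this sketch, a library with a high share of `intra_long` and `inter_contig` pairs is the kind the text above describes as most useful.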

Hi-C connectivity drops off approximately as a power law with increasing linear sequence distance. Consequently, one expects Hi-C reads to follow a characteristic distribution: a spike of many read pairs at distances close to zero, followed by a smooth decline (in log space) with increasing distance. Odd spikes or discontinuities, or only a few long-distance contacts, may indicate a problem with either the library or the assembly.
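To make the power-law expectation concrete, the following sketch bins intra-contig mate distances on a logarithmic scale. It then estimates the slope of log10(density) against log10(distance) by ordinary least squares. If the contact probability behaves like P(s) ∝ s^(-α), that slope should be close to -α, and a straight line in log-log space corresponds to the "smooth decline" described above. This is an assumed, simplified diagnostic and not what bam_to_mate_hist computes. `log_binned_histogram`, `power_law_slope` and `bins_per_decade` are hypothetical names. The code assumes Python 3.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def log_binned_histogram(distances: Iterable[int], bins_per_decade: int = 4) -> Dict[int, int]:
    """Count distances in logarithmically spaced bins.

    Bin k covers [10**(k / bins_per_decade), 10**((k + 1) / bins_per_decade)).
    Distances below 1 bp are skipped because log10(0) is undefined.
    """
    counts = Counter()
    for d in distances:
        if d >= 1:
            counts[math.floor(math.log10(d) * bins_per_decade)] += 1
    return dict(sorted(counts.items()))


def power_law_slope(hist: Dict[int, int], bins_per_decade: int = 4) -> float:
    """Least-squares slope of log10(density) versus log10(distance).

    Each count is divided by its bin width in bp, so that P(s) ~ s**-alpha
    gives a slope close to -alpha. Requires at least two non-empty bins.
    """
    xs, ys = [], []
    for k, n in hist.items():
        lo = 10 ** (k / bins_per_decade)
        hi = 10 ** ((k + 1) / bins_per_decade)
        xs.append(math.log10(math.sqrt(lo * hi)))  # geometric bin centre
        ys.append(math.log10(n / (hi - lo)))        # reads per bp in the bin
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Purely illustrative distances in bp, not real data.
    distances = [120, 450, 800, 2_000, 3_500, 15_000, 40_000, 900_000]
    hist = log_binned_histogram(distances, bins_per_decade=1)
    print(hist)
    print(f"estimated slope: {power_law_slope(hist, bins_per_decade=1):.2f}")
```

Dividing by bin width matters here. Without it, the counts in each log-spaced bin would scale as s^(1-α) rather than s^(-α), because wider bins at large distances collect more reads. Sharp jumps between neighbouring bins would show up as the "odd spikes or discontinuities" mentioned above.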

Installed version

The script is installed as a module within the environment hpc-env/6.4 and includes all required dependencies. Because bam_to_mate_hist is essentially a single script, it has no official version numbers. We therefore use the date of the last change on GitHub as the version:

bam_to_mate_hist/2018.09-intel-2018a-Python-2.7.14
bam_to_mate_hist/2018.11-intel-2018a-Python-2.7.14

Using bam_to_mate_hist on the HPC cluster

To use the script, first load the corresponding environment and then the module:

module load hpc-env/6.4
module load bam_to_mate_hist

You can then run the script as follows:

bam_to_mate_hist <arguments>

Documentation

You can find more documentation regarding bam_to_mate_hist and how it can be used here.