Juicer 2016

Introduction

Juicer is a platform for analyzing kilobase-resolution Hi-C data. This distribution includes the pipeline for generating Hi-C maps from raw fastq files and command-line tools for feature annotation on the Hi-C maps.

On our cluster CARL, loading Juicer automatically loads the Juicebox module, which provides the Java executables for Juicebox, the visualization software for Hi-C data, and for juicer_tools.

Installed version(s)

The following versions are installed and currently available in the environment hpc-env/8.3:

  • Juicebox/2.13.07-GCC-8.3.0-CUDA-11.4.2
  • Juicer/1.6-GCC-8.3.0-CUDA-11.4.2


Loading Juicer

To load the desired version of the module, use the module load command, e.g.

module load hpc-env/8.3
module load Juicer
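
A quick way to confirm that the load worked is to check that the executables are on your PATH. The following is only a sketch: check_tools is a hypothetical helper name, and it relies on nothing but the POSIX command -v builtin and the two executable names mentioned on this page.

```shell
# check_tools is a hypothetical helper, not part of Juicer or the module system.
# It reports which of the given commands can be found on PATH and
# returns 0 only if all of them were found.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

After loading the module, check_tools juicebox juicer_tools should report both commands as found.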


Using juicer_tools & Juicebox

As stated above, Juicer automatically loads Juicebox as a dependent module. This dependency contains two Java files (juicebox and juicer_tools) as well as two shell executables, which make the Java files easily accessible.

To run the visualization tool juicebox, you only have to load the module and type juicebox, assuming you are logged in with X11 display support so that the GUI is forwarded from the cluster to your screen.
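
Whether a display is available can be checked before starting the GUI. This is a minimal sketch, assuming that your X11 forwarding (for example ssh -X) sets the DISPLAY variable as it normally does; has_display is a hypothetical helper name.

```shell
# has_display is a hypothetical helper: it succeeds only if DISPLAY is set
# and non-empty, which is what X11 forwarding normally provides.
has_display() {
    [ -n "${DISPLAY:-}" ]
}

# Typical use on the cluster (commented out so the sketch has no side effects):
# if has_display; then
#     juicebox
# else
#     echo "No X11 display found; log in again with X11 forwarding." >&2
# fi
```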

juicer_tools is a command-line tool. Called without arguments, it prints the following help text to get you started:

$ juicer_tools
WARN [2021-11-23T18:23:46,540]  [Globals.java:138] [main]  Development mode is enabled
Juicer Tools Version 2.13.07
Usage:
	dump <observed/oe> <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr1>[:x1:x2] <chr2>[:y1:y2] <BP/FRAG> <binsize> [outfile]
	dump <norm/expected> <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr> <BP/FRAG> <binsize> [outfile]
	dump <loops/domains> <hicFile URL> [outfile]
	pre [options] <infile> <outfile> <genomeID>
	addNorm <input_HiC_file> [input_vector_file]
	pearsons [-p] <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr> <BP/FRAG> <binsize> [outfile]
	eigenvector -p <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr> <BP/FRAG> <binsize> [outfile]
	apa <hicFile(s)> <PeaksFile> <SaveFolder>
	arrowhead <hicFile(s)> <output_file>
	hiccups <hicFile> <outputDirectory>
	hiccupsdiff <firstHicFile> <secondHicFile> <firstLoopList> <secondLoopList> <outputDirectory>
	validate <hicFile>
	-h, --help print help
	-v, --verbose verbose mode
	-V, --version print version
Type juicer_tools <commandName> for more detailed usage instructions
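
To illustrate how the first dump usage line translates into an actual call, here is a small sketch. build_dump_cmd is a hypothetical helper that only prints a command line after checking the arguments against the choices shown in the help text. The file names inter.hic and chr1.txt and the bin size 2500000 used in the note afterwards are placeholders for illustration, not values taken from this installation.

```shell
# build_dump_cmd is a hypothetical helper. It validates its arguments against
# the choices listed in the help text and prints the resulting command line.
# Arguments: matrix norm hicFile chr1 chr2 unit binsize [outfile]
build_dump_cmd() {
    if [ "$#" -lt 7 ]; then
        echo "usage: build_dump_cmd <observed|oe> <norm> <hicFile> <chr1> <chr2> <BP|FRAG> <binsize> [outfile]" >&2
        return 2
    fi
    case $1 in
        observed|oe) ;;
        *) echo "unknown matrix type: $1" >&2; return 1 ;;
    esac
    case $2 in
        NONE|VC|VC_SQRT|KR) ;;
        *) echo "unknown normalization: $2" >&2; return 1 ;;
    esac
    case $6 in
        BP|FRAG) ;;
        *) echo "unknown unit: $6" >&2; return 1 ;;
    esac
    # "$*" joins all arguments with single spaces.
    echo "juicer_tools dump $*"
}
```

For example, build_dump_cmd observed KR inter.hic 1 1 BP 2500000 chr1.txt prints the command for dumping the KR-normalized observed matrix of chromosome 1 against itself. Run the printed command only after replacing the placeholders with your own files.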

Using Juicer

First of all, note that Juicer consists largely of highly customized shell scripts that were designed to run on a specific machine, not on our CARL or EDDY cluster. In addition, Juicer is meant to be set up inside a user directory. In practice, this means that the scripts might not work out of the box and may need some adjustment before they can be used on our cluster. Although we have adjusted some default paths to point to the right files and folders, some scripts might still point to non-existent paths. With that in mind, here is how Juicer works on our cluster:


When you load our Juicer module for the first time, it creates a folder called juiceDir in your $HOME directory, containing these subdirectories: fastq, references, scripts and restriction_sites. In scripts you will find a collection of already slightly adjusted scripts for different tasks. Most of the scripts include a help function (usually callable with <script.sh> -h or <script.sh> --help) and should be called from within your $HOME/juiceDir directory.
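
If you are unsure whether the folder was created completely, the layout described above can be verified with a few lines of shell. This is only a sketch; check_juicedir is a hypothetical helper name, and the list of subdirectories is taken from the paragraph above.

```shell
# check_juicedir is a hypothetical helper. It verifies that the four
# subdirectories named above exist below the given directory
# (default: $HOME/juiceDir) and returns 0 only if all are present.
check_juicedir() {
    top=${1:-$HOME/juiceDir}
    rc=0
    for sub in fastq references scripts restriction_sites; do
        if [ ! -d "$top/$sub" ]; then
            echo "missing directory: $top/$sub" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Calling check_juicedir without arguments checks $HOME/juiceDir and prints every subdirectory that is missing.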

Loosely following the instructions in the Juicer git repository, we will show you a very small example of how to use Juicer:

First, (1) load the module, (2) change into the references folder of the newly created juiceDir, and (3) download test reference files to work with:

ml hpc-env/8.3 Juicer                            #(1)
cd ~/juiceDir/references                         #(2)
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta       #(3)
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.amb
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.ann
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.bwt
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.pac
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.sa
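
The six downloads above are the FASTA file plus its BWA index files and differ only in their suffix, so they can also be generated in a loop. The following sketch uses the hypothetical helper name reference_urls; the URLs themselves are the ones listed above.

```shell
# reference_urls is a hypothetical helper that prints the six download URLs
# used above: the FASTA file and its index files (.amb, .ann, .bwt, .pac, .sa).
reference_urls() {
    base=https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references
    for suffix in "" .amb .ann .bwt .pac .sa; do
        echo "$base/Homo_sapiens_assembly19.fasta$suffix"
    done
}

# To download (commented out so the sketch itself does no network access):
# reference_urls | wget -c -i -
```

Here wget -i - reads the URL list from standard input, and -c lets an interrupted download resume instead of starting over, which helps with the larger files.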

We also need some fastq files to work with, so we (4) cd into juiceDir/fastq, (5) download test files, and (6) cd back into juiceDir:

cd ../fastq                                     #(4)
wget http://juicerawsmirror.s3.amazonaws.com/opt/juicer/work/HIC003/fastq/HIC003_S2_L001_R1_001.fastq.gz  #(5)
wget http://juicerawsmirror.s3.amazonaws.com/opt/juicer/work/HIC003/fastq/HIC003_S2_L001_R2_001.fastq.gz
cd ..                                           #(6)
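
The test data consists of one R1 and one R2 file. Before starting a run with your own data, it can be worth checking that every R1 file has a matching R2 file. The sketch below assumes the naming pattern of the test files above (..._R1_... and ..._R2_..., ending in .fastq.gz); check_pairs is a hypothetical helper name.

```shell
# check_pairs is a hypothetical helper. For every *_R1_*.fastq.gz file in the
# given directory (default: fastq) it checks that the matching *_R2_* file
# exists. It returns 0 only if at least one R1 file was found and all R1
# files have a mate.
check_pairs() {
    dir=${1:-fastq}
    rc=0
    found=0
    for r1 in "$dir"/*_R1_*.fastq.gz; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        found=1
        # Replace _R1_ by _R2_ in the file name part of the path only.
        r2=$(printf '%s\n' "$r1" | sed 's/_R1_\([^/]*\)$/_R2_\1/')
        if [ ! -e "$r2" ]; then
            echo "no mate for $r1 (expected $r2)" >&2
            rc=1
        fi
    done
    if [ "$found" -ne 1 ]; then
        echo "no *_R1_*.fastq.gz files in $dir" >&2
        rc=1
    fi
    return "$rc"
}
```

From within juiceDir, check_pairs fastq should finish silently for the two test files.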

Now the last step of this simple how-to: (7) start the script, which aligns the fastq files to our fasta reference genome:

scripts/juicer.sh -g hg19                       #(7)

Because the directory structure (~/juiceDir/references etc.) is laid out in this specific way, the scripts know where to look for the right files. For computing tasks with more complex file structures, you may need to tell the scripts (such as juicer.sh) where to find the input files and where to put the output files. This is mostly done with arguments following the script name, as we did with -g hg19 to tell Juicer to look for the files matching the genomeID.
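
As an illustration of such arguments, the following sketch assembles a call with an explicit working directory and reference file. build_juicer_cmd is a hypothetical helper that only prints the command line. The option -g is the one used on this page; -d (top-level working directory) and -z (reference genome file) are options of the upstream juicer.sh, and it is an assumption that the locally adjusted copy still accepts them, so please confirm with scripts/juicer.sh -h first.

```shell
# build_juicer_cmd is a hypothetical helper that only prints a command line.
# -g is used on this page; -d and -z are upstream juicer.sh options and are
# ASSUMED to be unchanged in the local copy (check scripts/juicer.sh -h).
# Arguments: genomeID topDir referenceFasta
build_juicer_cmd() {
    if [ "$#" -ne 3 ]; then
        echo "usage: build_juicer_cmd <genomeID> <topDir> <referenceFasta>" >&2
        return 2
    fi
    echo "scripts/juicer.sh -g $1 -d $2 -z $3"
}
```

For example, build_juicer_cmd hg19 /path/to/project /path/to/genome.fasta prints the corresponding call; both paths are placeholders that you would replace with your own.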


Documentation

The full documentation is spread across the different aidenlab git projects.

For juicer_tools, you can visit the corresponding git wiki page; for Juicebox, there is an equivalent wiki page.