CellRanger
Introduction
Cell Ranger is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more. Cell Ranger includes five pipelines relevant to the 3' and 5' Single Cell Gene Expression Solutions and related products:
- cellranger mkfastq demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files. It is a wrapper around Illumina's bcl2fastq, with additional features that are specific to 10x libraries and a simplified sample sheet format.
- cellranger count takes FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. The count pipeline can take input from multiple sequencing runs on the same GEM well. cellranger count also processes Feature Barcode data alongside Gene Expression reads.
- cellranger aggr aggregates outputs from multiple runs of cellranger count, normalizing those runs to the same sequencing depth and then recomputing the feature-barcode matrices and analysis on the combined data. The aggr pipeline can be used to combine data from multiple samples into an experiment-wide feature-barcode matrix and analysis.
- cellranger reanalyze takes feature-barcode matrices produced by cellranger count or cellranger aggr and reruns the dimensionality reduction, clustering, and gene expression algorithms using tunable parameter settings.
- cellranger multi is used to analyze Cell Multiplexing data. It takes FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. The cellranger multi pipeline also supports the analysis of Feature Barcode data. A sketch of how these pipelines chain together follows below.
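These pipelines are typically chained together. The following example is only an illustrative sketch; all IDs, paths, and file names are placeholders and must be adapted to your own data:

# 1. Demultiplex BCL files into FASTQs
cellranger mkfastq --id=fastq_run --run=/path/to/bcl_run_folder --csv=samplesheet.csv

# 2. Align and count one sample (a reference directory is required, see below)
cellranger count --id=sample1 --fastqs=fastq_run/outs/fastq_path --sample=sample1 --transcriptome=/path/to/refdata

# 3. Aggregate several count runs (requires an aggregation CSV listing the runs)
cellranger aggr --id=combined --csv=aggr_samples.csv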
Installed version(s)
The following versions are currently available...
... on environment hpc-env/8.3:
- CellRanger/6.1.1
- CellRanger/7.1.0
... on environment hpc-env/6.4:
- CellRanger/6.1.1
- CellRanger/7.1.0
... on environment hpc-uniol-env:
- CellRanger/6.1.1
- CellRanger/7.1.0
Loading / Using CellRanger
To load the desired version of the module, use the module load command, e.g.
module load hpc-env/8.3
module load CellRanger/6.1.1
Always remember: this command is case sensitive!
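To verify that the module has been loaded, you can, for example, print the loaded modules and the CellRanger version:

module list
cellranger --version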
To find out how to use CellRanger, you can simply type cellranger -h to print a help text that gets you started:
$ cellranger -h
cellranger cellranger-6.1.1
Process 10x Genomics Gene Expression, Feature Barcode, and Immune Profiling data

USAGE:
    cellranger <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    count               Count gene expression (targeted or whole-transcriptome) and/or feature
                        barcode reads from a single sample and GEM well
    multi               Analyze multiplexed data or combined gene expression/immune
                        profiling/feature barcode data
    vdj                 Assembles single-cell VDJ receptor sequences from 10x Immune Profiling
                        libraries
    aggr                Aggregate data from multiple Cell Ranger runs
    reanalyze           Re-run secondary analysis (dimensionality reduction, clustering, etc)
    targeted-compare    Analyze targeted enrichment performance by comparing a targeted sample
                        to its cognate parent WTA sample (used as input for targeted gene
                        expression)
    targeted-depth      Estimate targeted read depth values (mean reads per cell) for a
                        specified input parent WTA sample and a target panel CSV file
    mkvdjref            Prepare a reference for use with CellRanger VDJ
    mkfastq             Run Illumina demultiplexer on sample sheets that contain 10x-specific
                        sample index sets
    testrun             Execute the 'count' pipeline on a small test dataset
    mat2csv             Convert a gene count matrix to CSV format
    mkref               Prepare a reference for use with 10x analysis software. Requires a GTF
                        and FASTA
    mkgtf               Filter a GTF file by attribute prior to creating a 10x reference
    upload              Upload analysis logs to 10x Genomics support
    sitecheck           Collect linux system configuration information
    help                Prints this message or the help of the given subcommand(s)
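Each subcommand also provides its own help text, e.g. for the count pipeline:

cellranger count --help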
Additionally, we included some reference files (References - 2020-A, July 7, 2020), which are located in the folder called data at the software path $EBROOTCELLRANGER.
To make file access easier for you, we created the environment variable $CELLRANGER_DATA, which points to that directory:
$ ls $CELLRANGER_DATA
chromium-shared-sample-indexes-plate.csv
chromium-shared-sample-indexes-plate.json
chromium-single-cell-sample-indexes-plate-v1.csv
chromium-single-cell-sample-indexes-plate-v1.json
gemcode-single-cell-sample-indexes-plate.csv
gemcode-single-cell-sample-indexes-plate.json
refdata-gex-GRCh38-2020-A
refdata-gex-GRCh38-and-mm10-2020-A
refdata-gex-mm10-2020-A
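Each refdata-gex-* directory is a complete reference that can be passed to the --transcriptome option of cellranger count. If you want to inspect one first:

ls $CELLRANGER_DATA/refdata-gex-GRCh38-2020-A    # genome FASTA, gene annotation, and STAR index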
A simple test command can be executed with
cellranger testrun --id=tiny
which runs for a few minutes and creates a directory tiny containing the results from the test run.
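Cell Ranger writes its results to the outs subfolder of the run directory, so after the run has finished you can inspect them with, e.g.:

ls tiny/outs    # output files of the test pipeline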
Job Script vs. Cluster Mode
There are in principle two options to run CellRanger on the cluster:
1. Using a Job Script
As explained in the documentation, you can use a standard Slurm job script to run CellRanger on a compute node. For the test run above, the job script could look like this:
#!/usr/bin/env bash
# =============================================================================
# Slurm Options (modify to your needs)
# =============================================================================
#SBATCH -J CellRangerTestrun
#SBATCH --partition carl.p
#SBATCH --time 0-24:00:00        # time format d-hh:mm:ss
#SBATCH --nodes=1 --ntasks=1     # do not change
#SBATCH --cpus-per-task=4        # adjust as needed
#SBATCH --signal=2
#SBATCH --no-requeue
#SBATCH --mem=20G                # adjust as needed
#SBATCH -o CellRanger_%j.out     # log file for STDOUT, %j is job id
#SBATCH -e CellRanger_%j.err     # log file for STDERR

# calculate memory limit in GB (90% of the allocated memory; SLURM_MEM_PER_NODE is in MB)
MEM_GB=$((9*SLURM_MEM_PER_NODE/10240))

# pipeline command (replace with the command you would like to run)
# keep the options --jobmode and --local*
cellranger testrun --id=tiny --jobmode=local --localcores=${SLURM_CPUS_PER_TASK} --localmem=${MEM_GB}
The job script above can be saved e.g. as CellRanger_testrun.sh and then submitted with
sbatch CellRanger_testrun.sh
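After submission, the job can be monitored with the usual Slurm tools, e.g.:

squeue -u $USER                    # check whether the job is pending or running
tail -f CellRanger_<jobid>.out     # follow the pipeline log (replace <jobid> with the actual job id)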
For real applications you can replace the testrun command with the pipeline command you want to run. The --jobmode and --local* options allow CellRanger to use the resources allocated for the job. The values are taken automatically from the SBATCH options for --cpus-per-task and --mem and can be adjusted there as needed. Note that you may need to find an optimal number of --cpus-per-task for different steps of a pipeline by running some benchmark tests.
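For example, to run the count pipeline instead of the test run, the last line of the job script could be replaced by something like the following (sample name, FASTQ path, and reference are placeholders):

cellranger count --id=sample1 \
    --fastqs=/path/to/fastqs --sample=sample1 \
    --transcriptome=$CELLRANGER_DATA/refdata-gex-GRCh38-2020-A \
    --jobmode=local --localcores=${SLURM_CPUS_PER_TASK} --localmem=${MEM_GB}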
2. Using the Cluster Mode
Instead of writing a job script, you can also use CellRanger's built-in cluster mode. Here you execute the following commands on a login node:
screen                       # start a screen terminal (it allows you to disconnect from the cluster while the pipeline is running)
module load CellRanger       # maybe specify the version as well
cellranger testrun --id=tiny --jobmode=slurm
This will start the testrun pipeline, but instead of running it locally, it will create around 200 smaller jobs that are submitted to the cluster. The job submission is done with a template in /cm/shared/uniol/scripts/CellRanger, and CellRanger automatically sets the number of cores and the memory to use. The example runs for almost an hour because of the overhead created by submitting the 200 jobs and the extra time it takes Slurm to manage them. For real applications this approach can still be beneficial, because several jobs can run in parallel, while in the job script above all steps run one after another.
While the pipeline is running, you can check its status in the terminal. You can also use the screen mechanism to detach your session: press CTRL-A, then D, and log out from the cluster. Note that you need to remember which hpcl00x login node you are connected to. When you log back in to the same login node, you can reattach your screen session with screen -r and check the progress.
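In short, a typical detach/reattach cycle looks like this:

# inside the screen session: press CTRL-A, then D, to detach; then log out
# later: log in to the same login node you started screen on, then
screen -r      # reattach the session and check the progress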
If you want, you can also modify the template script: copy it from the location above to e.g. your $HOME, modify it, and use the option --jobmode=$HOME/slurm.template. That way you can, for example, use a different partition.
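Assuming the template file in that directory is named slurm.template (check the actual file name there), the steps could look like this:

cp /cm/shared/uniol/scripts/CellRanger/slurm.template $HOME/slurm.template
# edit $HOME/slurm.template, e.g. change the partition, then run:
cellranger testrun --id=tiny --jobmode=$HOME/slurm.template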
Documentation
More information and a tutorial can be found in the official Cell Ranger documentation provided by 10x Genomics.