Difference between revisions of "CellRanger 2016"
Revision as of 13:58, 21 August 2023
Introduction
Cell Ranger is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more. Cell Ranger includes five pipelines relevant to the 3' and 5' Single Cell Gene Expression Solutions and related products:
- cellranger mkfastq demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files. It is a wrapper around Illumina's bcl2fastq, with additional features that are specific to 10x libraries and a simplified sample sheet format.
- cellranger count takes FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. The count pipeline can take input from multiple sequencing runs on the same GEM well. cellranger count also processes Feature Barcode data alongside Gene Expression reads.
- cellranger aggr aggregates outputs from multiple runs of cellranger count, normalizing those runs to the same sequencing depth and then recomputing the feature-barcode matrices and analysis on the combined data. The aggr pipeline can be used to combine data from multiple samples into an experiment-wide feature-barcode matrix and analysis.
- cellranger reanalyze takes feature-barcode matrices produced by cellranger count or cellranger aggr and reruns the dimensionality reduction, clustering, and gene expression algorithms using tunable parameter settings.
- cellranger multi is used to analyze Cell Multiplexing data. It takes FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. The cellranger multi pipeline also supports the analysis of Feature Barcode data.
Installed version(s)
The following versions are currently available...
... on environment hpc-env/8.3:
- CellRanger/6.1.1
- CellRanger/7.1.0
... on environment hpc-env/6.4:
- CellRanger/6.1.1
- CellRanger/7.1.0
... on environment hpc-uniol-env:
- CellRanger/6.1.1
- CellRanger/7.1.0
Loading / Using CellRanger
To load the desired version of the module, use the module load command, e.g.
module load hpc-env/8.3
module load CellRanger/6.1.1
Always remember: module names are case-sensitive!
To find out how to use CellRanger, type cellranger -h to print a help text that gets you started:
$ cellranger -h
cellranger cellranger-6.1.1
Process 10x Genomics Gene Expression, Feature Barcode, and Immune Profiling data

USAGE:
    cellranger <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    count               Count gene expression (targeted or whole-transcriptome) and/or
                        feature barcode reads from a single sample and GEM well
    multi               Analyze multiplexed data or combined gene expression/immune
                        profiling/feature barcode data
    vdj                 Assembles single-cell VDJ receptor sequences from 10x Immune
                        Profiling libraries
    aggr                Aggregate data from multiple Cell Ranger runs
    reanalyze           Re-run secondary analysis (dimensionality reduction, clustering, etc)
    targeted-compare    Analyze targeted enrichment performance by comparing a targeted
                        sample to its cognate parent WTA sample (used as input for
                        targeted gene expression)
    targeted-depth      Estimate targeted read depth values (mean reads per cell) for a
                        specified input parent WTA sample and a target panel CSV file
    mkvdjref            Prepare a reference for use with CellRanger VDJ
    mkfastq             Run Illumina demultiplexer on sample sheets that contain
                        10x-specific sample index sets
    testrun             Execute the 'count' pipeline on a small test dataset
    mat2csv             Convert a gene count matrix to CSV format
    mkref               Prepare a reference for use with 10x analysis software.
                        Requires a GTF and FASTA
    mkgtf               Filter a GTF file by attribute prior to creating a 10x reference
    upload              Upload analysis logs to 10x Genomics support
    sitecheck           Collect linux system configuration information
    help                Prints this message or the help of the given subcommand(s)
Additionally, we included some reference files (References - 2020-A (July 7, 2020)). They are located in the folder data inside the software installation path $EBROOTCELLRANGER.
To make access to these files easier, we created the environment variable $CELLRANGER_DATA, which points to that directory:
$ ls $CELLRANGER_DATA
chromium-shared-sample-indexes-plate.csv
chromium-shared-sample-indexes-plate.json
chromium-single-cell-sample-indexes-plate-v1.csv
chromium-single-cell-sample-indexes-plate-v1.json
gemcode-single-cell-sample-indexes-plate.csv
gemcode-single-cell-sample-indexes-plate.json
refdata-gex-GRCh38-2020-A
refdata-gex-GRCh38-and-mm10-2020-A
refdata-gex-mm10-2020-A
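The reference directories in this listing are the ones a pipeline such as cellranger count would be pointed at. As a minimal sketch, assuming the directory layout shown above, the following lists only the reference directories. The helper name list_references is hypothetical and not part of CellRanger.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CellRanger): print the names of the
# reference directories (refdata-*) found directly below a data directory.
list_references() {
    local data_dir="$1"
    local d
    for d in "$data_dir"/refdata-*; do
        # Skip the unexpanded pattern when nothing matches
        [ -d "$d" ] && basename "$d"
    done
    return 0
}

# Only list something if the module has set CELLRANGER_DATA
if [ -n "${CELLRANGER_DATA:-}" ]; then
    list_references "$CELLRANGER_DATA"
fi
```

After loading the module, the full path of a reference can then be written as, for example, $CELLRANGER_DATA/refdata-gex-mm10-2020-A.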
A simple test command can be executed with
cellranger testrun --id=tiny
which runs for a few minutes and creates a directory tiny containing the results from the test run.
Job Script vs. Cluster Mode
There are in principle two options to run CellRanger on the cluster:
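1. Using a Job Script
As explained in the documentation (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/job-submission-mode), you can use a standard Slurm job script to run CellRanger on a compute node. For the test run above, the job script could look like this: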
#!/usr/bin/env bash
# =============================================================================
# Slurm Options (modify to your needs)
# =============================================================================
#SBATCH -J CellRangerTestrun
#SBATCH --partition carl.p
#SBATCH --time 0-24:00:00            # time format d-hh:mm:ss
#SBATCH --nodes=1 --ntasks=1         # do not change
#SBATCH --cpus-per-task=4            # adjust as needed
#SBATCH --signal=2
#SBATCH --no-requeue
#SBATCH --mem=20G                    # adjust as needed
#SBATCH -o CellRanger_%j.out         # log file for STDOUT, %j is job id
#SBATCH -e CellRanger_%j.err         # log file for STDERR

# calculate memory limit in GB (90% of the allocation, SLURM_MEM_PER_NODE is in MB)
MEM_GB=$((9*SLURM_MEM_PER_NODE/10240))

# pipeline command (replace with the command you would like to run)
# keep the options --jobmode and --local*
cellranger testrun --id=tiny --jobmode=local --localcores=${SLURM_CPUS_PER_TASK} --localmem=${MEM_GB}
The job script above can be saved e.g. as CellRanger_testrun.sh and then submitted with
sbatch CellRanger_testrun.sh
For real applications you can replace the testrun command with the pipeline command you want to run. The --jobmode and --local* options allow CellRanger to use the resources allocated for the job. Their values are taken automatically from the SBATCH options --cpus-per-task and --mem and can be adjusted there as needed. Note that you may need to find an optimal value of --cpus-per-task for the different steps of a pipeline by running some benchmark tests.
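To make the resource handling concrete, here is a minimal sketch of the memory arithmetic from the job script, followed by a dry run of a cellranger count call. With --mem=20G, Slurm sets SLURM_MEM_PER_NODE to 20480 (MB), so the script passes 18 GB to --localmem and leaves about 10% as headroom. The helper name localmem_gb, the run id, the FASTQ path, and the sample name are placeholders chosen for illustration; only the reference directory name is taken from the listing of $CELLRANGER_DATA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Slurm memory allocation given in MB into
# the value for --localmem in GB, keeping about 10% as headroom.
localmem_gb() {
    local mem_mb="$1"
    echo $(( 9 * mem_mb / 10240 ))
}

# Fall back to the values from the job script when run outside of Slurm
MEM_GB=$(localmem_gb "${SLURM_MEM_PER_NODE:-20480}")
CORES="${SLURM_CPUS_PER_TASK:-4}"

# Dry run: print the command instead of executing it.
# Run id, FASTQ path and sample name are placeholders (assumptions).
echo cellranger count --id=my_run \
    --transcriptome="${CELLRANGER_DATA:-/path/to/data}/refdata-gex-GRCh38-2020-A" \
    --fastqs=/path/to/fastqs --sample=my_sample \
    --jobmode=local --localcores="${CORES}" --localmem="${MEM_GB}"
```

Once the printed command looks right, the leading echo can be dropped and the command placed in the job script instead of the testrun line.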
Documentation
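More information and a tutorial can be found in the 10x Genomics documentation: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_ov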