Ipyrad
Revision as of 15:21, 15 June 2020
Introduction
The software ipyrad is an interactive toolkit for assembly and analysis of restriction-site associated genomic data sets (e.g., RAD, ddRAD, GBS) for population genetic and phylogenetic studies. [1]
At the moment, there is no central installation of ipyrad; however, you can easily install it yourself using Anaconda3 as described below.
Installation
To install ipyrad, you first need to load a module for Anaconda3. In this example, we use Anaconda3/2020.02, which can be found in hpc-env/8.3 (if you want to use a different version or environment, you can search with module av Anaconda3 or module spider Anaconda3):
[carl]$ module load hpc-env/8.3
[carl]$ module load Anaconda/2020.02
The next step is to create a new environment for ipyrad with the command:
[carl]$ conda create --name ipyrad
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
  environment location: /user/abcd1234/.conda/envs/ipyrad
Proceed ([y]/n)?
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
The name for the environment can be freely chosen, and the environment is created once you confirm by pressing y and Enter (or just Enter, since y is the default). You may see a warning about an outdated conda, which you can safely ignore (or, if you wish, you can switch to a newer Anaconda3 module if one is available).
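If the environment is set up from a script rather than interactively, the confirmation prompt can be skipped with conda create --yes. The following is a minimal sketch, not part of the official setup: the helper name ensure_conda_env is hypothetical, and it assumes that the first column of conda env list holds the environment names. It only creates the environment if it does not exist yet, so the script can be rerun safely.

```shell
# Hypothetical helper: create a conda environment only if it is missing.
# --yes answers the "Proceed ([y]/n)?" prompt automatically.
ensure_conda_env() {
    local name="$1"
    # "conda env list" prints one environment per line, name in the first column;
    # awk exits with status 0 only if a line with that exact name was found
    if conda env list | awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
        echo "conda environment '$name' already exists"
    else
        conda create --yes --name "$name"
    fi
}
```

After loading the Anaconda3 module, ensure_conda_env ipyrad would then replace the interactive conda create call.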
The new environment can now be activated. We recommend using the command(*):
[carl]$ source activate ipyrad
(ipyrad) [carl]$
The command-line prompt changes to indicate the active environment. Packages installed with conda install from now on go into this environment and will not interfere with other software installations.
(*) The alternative, conda activate, requires you to run conda init bash first, which modifies your .bashrc and more or less forces you to always use the same version of Anaconda3.
Now you can install ipyrad along with the package mpi4py for parallel computing:
(ipyrad) [carl]$ conda install ipyrad -c bioconda
(ipyrad) [carl]$ conda install mpi4py -c conda-forge
These commands take a moment to complete, but afterwards ipyrad is ready to use. The next time you log in, or in a job script, you only need the commands
[carl]$ module load hpc-env/8.3
[carl]$ module load Anaconda/2020.02
[carl]$ source activate ipyrad
to get started. If you want to leave the environment, you can always type
(ipyrad) [carl]$ conda deactivate
which should return you to the normal command-line prompt.
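To avoid typing the three commands for loading the environment every time, they can be bundled into a shell function, for example in your ~/.bashrc. This is a minimal sketch under stated assumptions: the function name ipyrad_env is hypothetical, and the module versions are the ones used above, so the function has to be updated if you switch to a different Anaconda3 module. Unlike conda init, defining the function does nothing until you call it.

```shell
# Hypothetical convenience function: load the modules and activate the
# ipyrad environment in one step. Each command runs only if the previous
# one succeeded.
ipyrad_env() {
    module load hpc-env/8.3 &&
        module load Anaconda/2020.02 &&
        source activate ipyrad
}
```

After that, typing ipyrad_env at the prompt (or in a job script that sources the function) should give you the (ipyrad) prompt.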
Using ipyrad on CARL
Preparations and First Tests
Following the Introductory tutorial to the CLI (https://ipyrad.readthedocs.io/en/latest/tutorial_intro_cli.html), you can start by downloading some test data and creating a parameter file in a new directory under $WORK:
(ipyrad) [carl]$ mkdir $WORK/ipyrad_test
(ipyrad) [carl]$ cd $WORK/ipyrad_test
(ipyrad) [carl]$ curl -LkO https://eaton-lab.org/data/ipsimdata.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11.8M  100 11.8M    0     0  8514k      0  0:00:01  0:00:01 --:--:-- 8508k
(ipyrad) [carl]$ tar -xzf ipsimdata.tar.gz
(ipyrad) [carl]$ ipyrad -n iptest
New file 'params-iptest.txt' created in /gss/work/lees4820/ipyrad
The resulting file params-iptest.txt has to be opened in a text editor to add the locations of the raw non-demultiplexed fastq file and the barcodes file. With the test data, the first lines should look like this:
(ipyrad) [carl]$ head params-iptest.txt
------- ipyrad params file (v.0.9.53)-------------------------------------------
iptest                               ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/gss/work/abcd1234/ipyrad            ## [1] [project_dir]: Project dir (made in curdir if not present)
./ipsimdata/rad_example_R1_.fastq.gz ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
./ipsimdata/rad_example_barcodes.txt ## [3] [barcodes_path]: Location of barcodes file
                                     ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo                               ## [5] [assembly_method]: Assembly method (denovo, reference)
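Instead of using a text editor, the two paths can also be set from the command line, which is handy in scripts. The following is a minimal sketch with a hypothetical helper set_param (it is not part of ipyrad). Based on the layout shown above, it assumes that each value is the text before the first ## on the line tagged ## [N]; it replaces only that part and keeps the comment.

```shell
# Hypothetical helper (not part of ipyrad): set the value of parameter [INDEX]
# in an ipyrad params file, keeping the "## [INDEX] ..." comment intact.
# Usage: set_param FILE INDEX VALUE
set_param() {
    local file="$1" index="$2" value="$3" tmp
    tmp="$(mktemp)" || return 1
    # tag and value are passed via the environment so that awk does not
    # interpret backslashes in the value
    PARAM_TAG="## [${index}]" PARAM_VALUE="$value" awk '
        index($0, ENVIRON["PARAM_TAG"]) {
            # replace everything before the first "##" with the new value
            print ENVIRON["PARAM_VALUE"] "    " substr($0, index($0, "##"))
            next
        }
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # copy back with cat so that the original file permissions are kept
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For the test data, the two calls would be:
(ipyrad) [carl]$ set_param params-iptest.txt 2 ./ipsimdata/rad_example_R1_.fastq.gz
(ipyrad) [carl]$ set_param params-iptest.txt 3 ./ipsimdata/rad_example_barcodes.txt
The tag includes the closing bracket, so setting [2] does not touch a line tagged [20] or similar.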
A simple test can be performed with this command:
(ipyrad) [carl]$ ipyrad -p params-iptest.txt -s 1 -c 8
 -------------------------------------------------------------
  ipyrad [v.0.9.53]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  Parallel connection | hpcl004: 8 cores
  Step 1: Demultiplexing fastq data to Samples
This runs the first step of the assembly workflow (-s 1) using a total of 8 cores (-c 8) on the login node. We can now remove the newly created data files and directories with
(ipyrad) [carl]$ rm -r iptest_fastqs/ iptest.json
to avoid error messages in the next steps.
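The test above runs step 1 directly on the login node. For larger data sets, the same command can instead be submitted as a SLURM batch job, using the activation commands from the installation section. The following is a minimal sketch, not an official CARL template: the file name ipyrad-step1.job and the values for --mem and --time are assumptions to adapt, and you may need to add a --partition line for CARL. The heredoc simply writes the job script into the current directory. SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given, so -c matches the number of allocated cores.

```shell
# Write a minimal SLURM job script for step 1 of the test assembly.
# Memory and time limits are placeholder assumptions; adjust them for your data.
cat > ipyrad-step1.job <<'EOF'
#!/bin/bash
#SBATCH --job-name=ipyrad-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=02:00:00
# add a "#SBATCH --partition=..." line here if a specific partition is needed

module load hpc-env/8.3
module load Anaconda/2020.02
source activate ipyrad

# run step 1 with as many cores as SLURM allocated to this task
ipyrad -p params-iptest.txt -s 1 -c "${SLURM_CPUS_PER_TASK}"
EOF
```

Submit the script from the directory that contains params-iptest.txt with sbatch ipyrad-step1.job; by default, SLURM starts the job in the directory it was submitted from.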