Difference between revisions of "CD-HIT"

Latest revision as of 13:40, 9 September 2021

Introduction

CD-HIT was originally a protein clustering program. Its main advantage is its speed: it can be hundreds of times faster than other clustering programs such as BLASTCLUST, so it can handle very large databases like NR. The first version was released in 2001, and the second version was published in 2002 with significant improvements. Since 2004 it has been hosted at bioinformatics.org as an open source project, which helped to develop the program even further. It is still under active development, and new features and programs are expected in the future.

Installed Version

The following versions are installed and currently available...

... on environment hpc-env/8.3:

  • CD-HIT/4.8.1-GCC-8.3.0
  • CD-HIT/4.8.1-iccifort-2019b


... on environment hpc-uniol-env

  • CD-HIT/4.6.4

Using CD-HIT with the HPC Cluster

If you want to find out more about CD-HIT on the HPC Cluster, you can use the command

  module spider cd-hit

This will show you basic information, e.g. a short description and the currently installed version.

To load the desired version of the module, use a command such as

  module load CD-HIT/4.6.4-foss-2016b-2015-0603

Always remember: this command is case sensitive!
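
To use one of the 4.8.1 builds listed under "Installed Version", the matching environment presumably has to be active first. The following is only a sketch: it assumes that the environment is switched by loading hpc-env/8.3 as a module, which this page does not state explicitly. The module names are taken from the list above.

```shell
# Assumption: the environment "hpc-env/8.3" is itself loaded as a module.
ENV_MODULE="hpc-env/8.3"
# Module name copied from the "Installed Version" list (case sensitive).
CDHIT_MODULE="CD-HIT/4.8.1-GCC-8.3.0"

if command -v module >/dev/null 2>&1; then
    module load "$ENV_MODULE"      # switch to the environment first
    module load "$CDHIT_MODULE"    # then load CD-HIT from that environment
    module list                    # show what is currently loaded
else
    # Outside the cluster there is no module system; just report the plan.
    echo "module command not available; would load $ENV_MODULE, then $CDHIT_MODULE"
fi
```

If the second load fails, module spider cd-hit shows which environments and module names are actually available.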

After loading the module, you can run CD-HIT using the following command

  cd-hit [Options]
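
As an illustration of what [Options] can look like, the sketch below clusters a protein FASTA file at 90 % sequence identity. The file names and all values are hypothetical examples, not recommendations from this page. The flags used are standard cd-hit options: -i (input FASTA), -o (output name), -c (sequence identity threshold), -n (word length), -M (memory limit in MB) and -T (number of threads).

```shell
# Hypothetical file names and example values; adjust to your own data.
INPUT="proteins.fasta"     # input FASTA file
OUTPUT="proteins_nr90"     # representatives go here, clusters to $OUTPUT.clstr
IDENTITY="0.9"             # sequence identity threshold (-c)
WORD_LENGTH=5              # word length (-n); 5 is the usual choice for -c 0.7 to 1.0
MEMORY_MB=16000            # memory limit in MB (-M)
THREADS=4                  # number of threads (-T)

CMD="cd-hit -i $INPUT -o $OUTPUT -c $IDENTITY -n $WORD_LENGTH -M $MEMORY_MB -T $THREADS"

# Run only if cd-hit is on the PATH (module loaded) and the input exists;
# otherwise print the command as a dry run.
if command -v cd-hit >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    $CMD    # unquoted on purpose: relies on file names without spaces
else
    echo "Dry run: $CMD"
fi
```

Running cd-hit without any arguments prints the complete list of options for the loaded version.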

Documentation

The project page can be found at http://weizhongli-lab.org/cd-hit/. The most recent information, however, is on GitHub at https://github.com/weizhongli/cdhit, since the project has moved there.