CD-HIT
Introduction
CD-HIT was originally developed as a protein clustering program. Its main advantage is speed: it can be hundreds of times faster than other clustering programs such as BLASTCLUST, so it can handle very large databases like NR. The first version was released in 2001, and the second version was published in 2002 with significant improvements. Since 2004 it has been hosted at bioinformatics.org as an open source project, which has helped its further development. It is still under active development, and new features and programs are expected in future releases.
Installed Version
The currently installed version is 4.6.4.
Documentation
The full documentation can be found here.