Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT). It assembles raw reads without error correction and then builds the consensus from intermediate assembly output. Wtdbg2 is able to assemble the human and even the 32Gb Axolotl genome at a speed tens of times faster than CANU and FALCON while producing contigs of comparable base accuracy.
During assembly, wtdbg2 chops reads into 1024bp segments, merges similar segments into a vertex and connects vertices based on the segment adjacency on reads. The resulting graph is called a fuzzy Bruijn graph (FBG). It is akin to a De Bruijn graph but permits mismatches/gaps and keeps read paths when collapsing k-mers. The use of FBG distinguishes wtdbg2 from the majority of long-read assemblers. [1]
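The chopping and adjacency steps can be pictured with a toy sketch. This is only an illustration, not wtdbg2's implementation: the helper names chop_read and list_edges are hypothetical, the segment length is 4 instead of 1024 so the output stays readable, a shorter tail is simply dropped, and the merging of similar segments (the "fuzzy" part) is left out entirely.

```shell
# Toy illustration only (assumed helpers, not wtdbg2 code).

chop_read() {
    # $1 = read sequence, $2 = segment length.
    # Split the read into fixed-width pieces and keep only full-length ones.
    printf '%s\n' "$1" | fold -w "$2" | awk -v n="$2" 'length($0) == n'
}

list_edges() {
    # Reads one segment per line and prints an edge between each pair of
    # neighbouring segments, in read order.
    awk 'NR > 1 { print prev " -> " $0 } { prev = $0 }'
}

# A 14-base toy read yields three full segments and two edges.
chop_read "ACGTTGCAGGATCC" 4 | list_edges
```

In a real assembly, segments from different reads that are similar enough would share one vertex, so edges from many reads accumulate on the same vertex pairs.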
The following versions are installed and currently available...
... on environment hpc-env/8.3:
Loading / Using wtdbg2
To load the desired version of the module, use the module load command, e.g.
module load hpc-env/8.3
module load wtdbg2
Always remember: this command is case sensitive!
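After loading the module, it can be worth checking that the executables are actually on your PATH. The following is a minimal sketch; require_tool is a hypothetical helper written for this page and is not part of wtdbg2 or the module system.

```shell
# require_tool is an assumed helper name, used for illustration only.
require_tool() {
    # command -v succeeds only if the shell can find the given program.
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (is the module loaded?)" >&2
        return 1
    fi
}

# Check the assembler and the consensus tool; keep going if one is missing.
for tool in wtdbg2 wtpoa-cns; do
    require_tool "$tool" || true
done
```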
To find out how to use wtdbg2, run wtdbg2 without any arguments. It prints a help text to get you started:
$ wtdbg2
WTDBG: De novo assembler for long noisy sequences
Author: Jue Ruan <firstname.lastname@example.org>
Version: 2.5 (20190621)
Usage: wtdbg2 [options] -i <reads.fa> -o <prefix> [reads.fa ...]
Options:
 -i <string> Long reads sequences file (REQUIRED; can be multiple),
 -o <string> Prefix of output files (REQUIRED),
 -t <int>    Number of threads, 0 for all cores,
 -f          Force to overwrite output files
 -x <string> Presets, comma delimited,
             preset1/rsII/rs: -p 21 -S 4 -s 0.05 -L 5000
             preset2: -p 0 -k 15 -AS 2 -s 0.05 -L 5000
             preset3: -p 19 -AS 2 -s 0.05 -L 5000
             sequel/sq
             nanopore/ont:
             (genome size < 1G: preset2) -p 0 -k 15 -AS 2 -s 0.05 -L 5000
             (genome size >= 1G: preset3) -p 19 -AS 2 -s 0.05 -L 5000
             preset4/corrected/ccs: -p 21 -k 0 -AS 4 -K 0.05 -s 0.5
 -g <number> Approximate genome size (k/m/g suffix allowed)
 -X <float>  Choose the best <float> depth from input reads(effective with -g) [50.0]
 -L <int>    Choose the longest subread and drop reads shorter than <int> (5000 recommended for PacBio)
             Negative integer indicate tidying read names too, e.g. -5000.
 -k <int>    Kmer fsize, 0 <= k <= 23,
 -p <int>    Kmer psize, 0 <= p <= 23,
             k + p <= 25, seed is <k-mer>+<p-homopolymer-compressed>
 -K <float>  Filter high frequency kmers, maybe repetitive, [1000.05]
             >= 1000 and indexing >= (1 - 0.05) * total_kmers_count
 -S <float>  Subsampling kmers, 1/(<-S>) kmers are indexed, [4.00]
             -S is very useful in saving memeory and speeding up
             please note that subsampling kmers will have less matched length
 -l <float>  Min length of alignment,
 -m <float>  Min matched length by kmer matching,
 -R          Enable realignment mode
 -A          Keep contained reads during alignment
 -s <float>  Min similarity, calculated by kmer matched length / aligned length, [0.05]
 -e <int>    Min read depth of a valid edge,
 -q          Quiet
 -v          Verbose (can be multiple)
 -V          Print version information and then exit
 --help      Show more options
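Based on the options above, a typical run has two steps: assemble the raw reads with wtdbg2, then build the consensus with wtpoa-cns. The following is a sketch under stated assumptions, not a site-specific recipe. The input file reads.fa.gz, the prefix dbg, the genome size 4.6m, the preset rs and the thread count are placeholders. The wtpoa-cns options and the <prefix>.ctg.lay.gz layout file name follow the upstream quick-start example and should be checked against the help text of the installed version. The helpers assemble_cmd, consensus_cmd and run_step are hypothetical names. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Placeholders (assumptions): adjust these to your own data.
READS="reads.fa.gz"     # hypothetical input file with long reads
PREFIX="dbg"            # prefix for all output files
THREADS=16              # number of threads (-t)
GENOME_SIZE="4.6m"      # approximate genome size (-g)
PRESET="rs"             # preset (-x), e.g. rs, sq, ont or ccs

assemble_cmd() {
    # Step 1: assemble raw reads into a contig layout.
    echo "wtdbg2 -x $PRESET -g $GENOME_SIZE -t $THREADS -i $READS -f -o $PREFIX"
}

consensus_cmd() {
    # Step 2: derive the consensus from the layout file written by step 1.
    # The file name <prefix>.ctg.lay.gz is assumed from the upstream example.
    echo "wtpoa-cns -t $THREADS -i ${PREFIX}.ctg.lay.gz -f -o ${PREFIX}.raw.fa"
}

run_step() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $1"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        sh -c "$1"
    fi
}

run_step "$(assemble_cmd)" && run_step "$(consensus_cmd)"
```

Because the commands are passed to sh -c as plain strings, file names containing spaces would need extra quoting; the sketch assumes simple names.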
The software's root directory contains two folders that may be useful: the bin folder, which contains all executables, and a scripts directory:
$ ls $EBROOTWTDBG2/bin
kbm2 pgzf wtdbg2 wtdbg2.pl wtdbg-cns wtpoa-cns
$ ls $EBROOTWTDBG2/scripts
best_kbm_hit.pl dbm_index_fa.pl fa2tab.pl hlcolor num_n50.pl runit.pl split_seqs_3.pl
best_minimap_hit.pl dbm_read_dot.pl first_n_bases.pl longest_pacbio_subreads.pl rename_fa.pl sam2dbgcns.pl wtdbg-dot2gfa.pl
best_sam_hits4longreads.pl dbm_read_fa.pl first_n_seqs.pl mmpoa.pl rename_fq.pl seq_n50.pl
dbm_index_dot.pl fa2fq.pl fq2fa.pl mum_assess.sh rev_seq.pl split_seqs_2.pl
The full documentation can be found here.