Sergey Koren | Canu: A New Assembler for Genomes Large and Small

Canu: A Hierarchical Assembler for Nanopore Data

Introduction to Canu

  • The speaker introduces Canu, a hierarchical assembler designed for nanopore data, covering its background and how it works.
  • Before assembly, Canu overlaps the raw reads and uses those overlaps to correct them, which is crucial for handling noisy data.
  • The speaker notes that Canu can assemble human genomes, unlike other assemblers such as Miniasm or SPAdes.
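
The correct-then-assemble idea rests on a simple observation: where several noisy reads cover the same position, their independent errors rarely agree, so a vote across them recovers the true base. The toy function below (the name `majority_consensus` is hypothetical) assumes the reads have already been aligned into equal-length gapped rows. Canu's real correction step computes overlaps and a more elaborate consensus, so treat this only as an illustration of the principle, not as Canu's method.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str]) -> str:
    """Column-wise majority vote over reads that are already aligned to each other.

    '-' marks an alignment gap; a column whose most common symbol is a gap is dropped.
    Toy illustration only: assumes equal-length gapped alignments are given.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")
    consensus = []
    for col in range(width):
        counts = Counter(read[col] for read in aligned_reads)
        symbol, _ = counts.most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical noisy reads of the same region: one substitution, one insertion, one deletion.
reads = [
    "ACGT-ACGT",
    "ACGTTACGT",
    "ACCT-ACGT",
    "ACGT-AC-T",
]
print(majority_consensus(reads))
```

Each read carries a different error, but no error is shared by a majority, so the vote returns the error-free sequence `ACGTACGT`.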

Performance of Canu

  • Initial tests show that Canu produces a high-quality consensus from nanopore-only assemblies faster than Nanopolish.
  • Canu and Miniasm reach similar consensus quality (99.82% identity), but Canu gets there significantly faster because its initial output is of higher quality.

Comparison with Other Assemblers

  • When Illumina data are used alongside nanopore data, Canu produces the most accurate assembly of the assemblers compared.
  • With nanopore data alone, Miniasm's assemblies contain more errors, because it assembles the reads in a single rough pass without correcting them.

Assembly of Bacterial Genomes

  • The speaker notes that assembling bacterial genomes with MinION sequencing is now routine and highlights how efficiently 1D template data alone yield high-quality assemblies.
  • Combining 1D nanopore data with Illumina data improves quality further, to 99.8% identity (roughly QV27).
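
Accuracy in this talk is quoted both as percent identity and in QV terms. QV is Phred-scaled, QV = -10 log10(error rate), so 99.8% identity (an error rate of 0.002) corresponds to about QV27, and 99% identity to QV20. A minimal conversion sketch (the function names are illustrative):

```python
import math


def identity_to_qv(identity: float) -> float:
    """Phred-scaled quality: QV = -10 * log10(error rate), error rate = 1 - identity."""
    error_rate = 1.0 - identity
    if error_rate <= 0.0:
        raise ValueError("a perfect (100%) identity has no finite QV")
    return -10.0 * math.log10(error_rate)


def qv_to_identity(qv: float) -> float:
    """Inverse mapping: identity = 1 - 10^(-QV/10)."""
    return 1.0 - 10.0 ** (-qv / 10.0)


# Identities mentioned in the talk, plus 99% for reference.
for identity in (0.99, 0.998, 0.9982):
    print(f"{identity:.2%} identity -> QV {identity_to_qv(identity):.1f}")
```

Because the scale is logarithmic, each additional 10 QV is a tenfold drop in error rate, which is why small-looking gains in percent identity matter.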

Larger Genome Assemblies

  • A yeast (S. cerevisiae) assembly shows that newer chemistries could yield chromosome-scale assemblies without any changes to the algorithm.
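
Contiguity for the S. cerevisiae assembly is reported in the video description as NG50 (over 500 kb). NG50 is the length L such that contigs of length L or longer together cover at least half of the estimated genome size; unlike N50, it uses the genome size rather than the assembly's own total length, so assemblies of different sizes stay comparable. A minimal sketch, in which the contig lengths are made up and roughly 12 Mb is used as an approximate yeast genome size:

```python
def ng50(contig_lengths: list[int], genome_size: int) -> int:
    """Largest L such that contigs of length >= L sum to at least half of genome_size.

    Returns 0 when the whole assembly covers less than half of the genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    half = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0


# Hypothetical contig lengths (bp); ~12 Mb is an approximate S. cerevisiae genome size.
contigs = [1_500_000, 1_200_000, 1_000_000, 900_000, 800_000, 700_000, 600_000, 400_000]
print(ng50(contigs, genome_size=12_000_000))  # 700000
```

The six longest contigs sum to 6.1 Mb, the first point that passes half of 12 Mb, so the sixth contig's length (700 kb) is the NG50.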

Hybrid vs. Pure Assemblers

  • The discussion then compares hybrid assemblers such as SPAdes with assemblers that rely solely on nanopore data, using an Arabidopsis sample tested at different coverage levels.
  • SPAdes improves rapidly at low coverage but plateaus at higher coverage, whereas Canu keeps improving steadily as coverage increases.
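
Comparing assemblers at several coverage levels, as in the Arabidopsis test, typically means randomly downsampling one read set until its total length reaches the target coverage times the genome size. The talk does not describe the exact procedure used, so the helper below (`subsample_to_coverage`, a hypothetical name) is only a sketch of the general approach, run on made-up read lengths.

```python
import random


def subsample_to_coverage(read_lengths: list[int], genome_size: int,
                          target_coverage: float, seed: int = 0) -> tuple[list[int], float]:
    """Randomly pick reads until their total length reaches target_coverage x genome_size.

    Returns the chosen read indices (sorted) and the coverage actually achieved.
    If the full read set is too small, every read is returned.
    """
    target_bases = target_coverage * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # fixed seed -> reproducible subsample
    chosen, total = [], 0
    for idx in order:
        if total >= target_bases:
            break
        chosen.append(idx)
        total += read_lengths[idx]
    return sorted(chosen), total / genome_size


# Hypothetical run: 100 reads of 10 kb against a 100 kb genome (10x in total).
lengths = [10_000] * 100
for cov in (2, 5, 10):
    picked, achieved = subsample_to_coverage(lengths, genome_size=100_000, target_coverage=cov)
    print(f"target {cov}x -> {len(picked)} reads, {achieved:.1f}x")
```

Reusing the same seed makes the subsamples nested (every 2x subset is contained in the 5x subset), so differences between coverage levels reflect added data rather than a different random draw.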

Conclusion on Coverage Impact

Video description

Single-molecule sequencing is now routinely used to assemble complete, high-quality microbial genomes, but these assembly methods have not scaled well to large genomes. To address this problem, the MinHash Alignment Process (MHAP) was previously introduced for assembling noisy single-molecule reads. This has enabled reference-grade assemblies of model organisms, revealed novel heterochromatic sequences, and filled low-complexity gaps. Building on this work, a new assembler named Canu was created and optimized for single-molecule sequencing. Canu is a complete refactoring of the Celera Assembler that shrinks the code base by over 70%, lowers coverage requirements, improves reconstruction of diploid genomes, and assembles repetitive sequences more efficiently. Canu has generated single-chromosome assemblies of E. coli, B. anthracis, B. cereus, and Y. pestis with over 99% identity using Oxford Nanopore sequencing. Using older chemistry, the assembly of the eukaryote S. cerevisiae exceeds 500 kb NG50, and we predict chromosome-scale assembly with recent chemistries. Canu is available at https://github.com/marbl/canu. Flongle, GridION, MinION, MinIT, PromethION, and VolTRAX are currently for research use only.
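
The MinHash idea that MHAP builds on can be illustrated in a few lines: summarize each read by the minimum hash value of its k-mers under many hash functions, then estimate how similar two reads' k-mer sets are (their Jaccard similarity) from the fraction of minima that agree. This avoids comparing full reads when searching for candidate overlaps. The sketch below is the classic MinHash estimator, not MHAP's actual implementation; the k-mer size, number of hashes, seeded BLAKE2 hashing, and error-free synthetic reads are all simplifying assumptions.

```python
import hashlib
import random


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer: str, seed: int) -> int:
    """64-bit hash of a k-mer; each seed acts as an independent hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def minhash_signature(seq: str, k: int = 12, num_hashes: int = 128) -> list[int]:
    """For each hash function, keep the minimum hash value over the read's k-mers."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence shorter than k")
    return [min(seeded_hash(km, s) for km in kmer_set) for s in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of hash functions whose minima agree: an unbiased Jaccard estimate."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: str, b: str, k: int = 12) -> float:
    """Exact Jaccard similarity of the two reads' k-mer sets, for comparison."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


# Hypothetical data: two error-free reads sharing a 600 bp overlap, plus an unrelated read.
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
read_a, read_b = genome[:1200], genome[600:1800]
read_c = "".join(rng.choice("ACGT") for _ in range(1200))

sig_a, sig_b, sig_c = (minhash_signature(r) for r in (read_a, read_b, read_c))
print(f"a vs b: estimate {estimated_jaccard(sig_a, sig_b):.2f}, "
      f"exact {exact_jaccard(read_a, read_b):.2f}")
print(f"a vs c: estimate {estimated_jaccard(sig_a, sig_c):.2f}")
```

The overlapping pair scores around 0.33 and the unrelated pair near zero, so a threshold on the estimate separates candidate overlaps from noise. With real nanopore reads, sequencing errors destroy many shared k-mers, which is one reason MinHash-based overlappers favor relatively short k.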