Sergey Koren | Canu: A New Assembler for Genomes Large and Small
Canu: A Hierarchical Assembler for Nanopore Data
Introduction to Canu
- The speaker introduces Canu, a hierarchical assembler designed for nanopore data, and covers its background and functionality.
- Canu overlaps the raw reads and uses those overlaps to correct them before assembly, which is crucial for handling noisy data.
- Canu can run on human genomes, unlike assemblers such as miniasm or SPAdes.
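The correct-before-assembling idea can be illustrated with a toy sketch. It assumes the reads are already aligned to the same length and contain only substitution errors, and it uses a simple per-column majority vote; this is a deliberate simplification for illustration, not the assembler's actual correction algorithm. The function name `column_consensus` and the example reads are hypothetical.

```python
from collections import Counter


def column_consensus(aligned_reads):
    """Return the majority base at each column of pre-aligned reads.

    Assumption: every read has the same length and is already aligned,
    so column i of each read covers the same genomic position.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Pick the most frequent base among the overlapping reads.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Four noisy copies of the same region (made-up data).
reads = [
    "ACGTTGCA",
    "ACGTAGCA",  # substitution at index 4
    "ACCTTGCA",  # substitution at index 2
    "ACGTTGCA",
]
corrected = column_consensus(reads)
print(corrected)
```

Because each error appears in only one read, the vote across overlapping reads removes it, which is why correction benefits from coverage.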
Performance of Canu
- Initial tests show that Canu performs faster than Nanopolish in generating high-quality consensus from nanopore-only assemblies.
- Both Canu and miniasm achieve similar quality (99.82% identity), but Canu does so significantly faster due to higher initial output quality.
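To relate the identity figure above to the Phred-style quality value (QV) often used for assemblies, the standard conversion is QV = -10 * log10(error rate). The sketch below assumes the quoted percentage is per-base identity, so that the error rate is simply 1 minus identity; the helper name `identity_to_qv` is hypothetical.

```python
import math


def identity_to_qv(identity_percent):
    """Convert percent identity to a Phred-scaled quality value."""
    error_rate = 1.0 - identity_percent / 100.0
    if error_rate <= 0:
        # A perfect match has no finite Phred score.
        return float("inf")
    return -10.0 * math.log10(error_rate)


qv = identity_to_qv(99.82)
print(f"99.82% identity is roughly Q{qv:.1f}")
```

Under that assumption, 99.82% identity works out to a little over Q27, or about one error every 550 bases.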
Comparison with Other Assemblers
- When Illumina data is used alongside nanopore data, Canu produces the most accurate assembly of the assemblers compared.
- miniasm struggles with errors when assembling nanopore-only data because of its rough-pass approach.
Assembly of Bacterial Genomes
- The speaker discusses the routine nature of assembling bacterial genomes with MinION sequencing and highlights the efficiency of 1D template data in achieving high-quality assemblies.
- Combining 1D nanopore data with Illumina data improves consensus accuracy (99.8%).
Larger Genome Assemblies
- An example involving yeast genome assembly demonstrates that newer chemistries could lead to more efficient chromosome assemblies without algorithm changes.
Hybrid vs. Pure Assemblers
- The discussion shifts to hybrid assemblers like SPAdes versus those relying solely on nanopore data; an Arabidopsis biosample was used for testing different coverage levels.
- Results indicate that SPAdes shows rapid gains initially but plateaus at higher coverage levels, whereas Canu keeps improving as coverage increases.
Conclusion on Coverage Impact