Dfind old genome assemblies

12/3/2023

Long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore MinION are capable of producing long sequencing reads with average fragment lengths of over 10,000 base pairs and maximum lengths reaching 100,000 base pairs. Compared with short reads, the assemblies obtained from long-read sequencing platforms have much higher contig continuity and genome completeness, as long fragments are able to extend paths into problematic or repetitive regions. Many successful assembly applications of the Pacific Biosciences technology have been reported, ranging from small bacterial genomes to large plant and animal genomes. Recently, genome assemblies using Oxford Nanopore MinION data have attracted much attention due to the portability and low cost of this novel sequencing instrument. In this paper, we re-sequenced a well-characterized genome, the Saccharomyces cerevisiae S288C strain, using three different platforms: MinION, PacBio and MiSeq. We present a comprehensive metric comparison of assemblies generated by various pipelines and discuss how the platform-associated data characteristics affect the assembly quality. With a given read depth of 31X, the assemblies from both Pacific Biosciences and Oxford Nanopore MinION show excellent continuity and completeness for the 16 nuclear chromosomes, but not for the mitochondrial genome, whose reconstruction still represents a significant challenge.

The advent of next generation sequencing (NGS) technologies has marked the start of a new era in genomics research. Compared to the previous Sanger technology [1], NGS has significantly lowered the cost of sequencing using massively parallel sequencing methods [2, 3]. In a typical NGS run, DNA molecules are sheared into small fragments and then clonally amplified before being sequenced. After DNA amplification, multiple fragments of the sequences obtained may cover the same genome region, so that computational algorithms can be used to concatenate and assemble such reads like a jigsaw puzzle and generate a consensus to correct for the occasional sequencing errors. The typical length of the DNA fragments sequenced is between 50 and 400 bases [2], and as a result, the assembly obtained from such short reads is fragmented into contigs much smaller than the actual chromosome sizes. In particular, short reads are not able to resolve complex genome features like repeated regions (repeats) longer than the fragment length or copy number variations, with the typical outcome that (almost-)identical repeats are collapsed into a single element in the assembly.

To overcome the high fragmentation of NGS-based assemblies and to help resolve long repeats, long-read sequencing technologies have been developed and recently adopted by the genomics community. The main characteristic of these new platforms is to work with long DNA molecules and provide reads with lengths up to hundreds of kilobases (kb). Reads of such length can be exploited in various ways. Particularly in the genome assembly field, they can be used for de novo assembly with long-read data only, or for scaffolding of NGS-based assemblies by bridging gaps between contigs or spanning long repeats, thus resolving them. A major drawback of long-read technologies is the higher rate of sequencing errors (5–20%) compared to NGS data (<1%) [2]. Such an error profile could negatively affect the assembly accuracy, but because the errors are mostly randomly distributed, the majority of long-read assemblers adopt the strategy of correcting base errors algorithmically before attempting to assemble the reads. The most established long-read technology is from Pacific Biosciences (PacBio), which uses a sequencing-by-synthesis approach where phospholinked nucleotides are used to synthesize the complement strand of a single-stranded DNA template.
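Two quantities mentioned in the text, read depth and contig continuity, can be made concrete with a small calculation. The sketch below is illustrative only and is not the pipeline used in the study. It assumes a genome size of roughly 12.1 Mb for S. cerevisiae (an approximate figure introduced here, not taken from the text), uses N50 as one common continuity measure (the text does not say which metrics were used), and the function names `read_depth` and `n50` are hypothetical.

```python
from typing import Iterable, List


def read_depth(read_lengths: Iterable[int], genome_size: int) -> float:
    """Average read depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def n50(contig_lengths: Iterable[int]) -> int:
    """N50: length of the contig at which the running total of contig
    lengths, taken from longest to shortest, first reaches half of the
    total assembly length."""
    lengths: List[int] = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached for non-empty positive lengths


if __name__ == "__main__":
    # Assumption: approximate S. cerevisiae genome size in base pairs.
    GENOME_SIZE = 12_100_000

    # Toy data set: 37,510 reads of 10,000 bases each.
    toy_reads = [10_000] * 37_510
    print(f"read depth: {read_depth(toy_reads, GENOME_SIZE):.1f}X")

    # Toy assembly: five contigs with arbitrary lengths.
    print(f"N50: {n50([50, 40, 30, 20, 10])}")
```

Under the assumed genome size, a depth of 31X corresponds to about 375 million sequenced bases. A higher N50 means that more of the assembly sits in long contigs, which is one way to quantify the continuity gain that long reads are described as providing.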
