Authors
Erwin L. van Dijk, Yan Jaszczyszyn, Delphine Naquin, Claude Thermes
Abstract
Long-read/third-generation sequencing technologies are causing a new revolution in genomics, as they provide a way to study genomes, transcriptomes, and metagenomes at an unprecedented resolution. SMRT and nanopore sequencing allow, for the first time, the direct study of different types of DNA base modifications. Moreover, nanopore technology can sequence RNA directly and identify RNA base modifications. Owing to the portability of the MinION and the existence of extremely simple library preparation methods, nanopore technology makes high-throughput sequencing possible, for the first time, in the field and in remote locations. This is of tremendous importance for the surveillance of outbreaks in developing countries.
Forty years ago, the advent of Sanger sequencing was revolutionary, as it allowed complete genome sequences to be deciphered for the first time. A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods have appeared, which can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the various long-read methods. We discuss their applications and their respective strengths and weaknesses, and provide future perspectives.
Glossary
BAC-by-BAC sequencing: a sequencing method in which a physical map of the target genome, or chromosome, is established using a set of overlapping bacterial artificial chromosome (BAC) clones. The individual clones are subsequently fragmented and subjected to shotgun sequencing.
Chromosome conformation capture: a set of molecular biology methods used to analyze the spatial organization of chromatin in a cell.
Circular consensus sequencing (CCS): in PacBio CCS, the DNA polymerase reads a ligated circular DNA template multiple times, generating a consensus sequence with a high level of accuracy.
Contig: from contiguous; a set of overlapping DNA segments that together represent a consensus region of DNA.
Dideoxy (Sanger) sequencing: a method of DNA sequencing based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase. The resulting DNA fragments are heat denatured and separated by size using gel electrophoresis.
Genome phasing: diploid genomes have two copies of each chromosome that differ at various loci along each chromosome. Genome phasing (also called haplotyping or haplotype estimation) determines which chromosome each such heterozygous variant is derived from. Assembly of reads that share the same variation enables reconstruction of the parental homologs (haplotype reconstruction).
Indel: a common class of mutations comprising an insertion or deletion of one or more DNA bases in a genome.
N50: a statistical measure of the average length of a set of sequences; used widely in genomics, especially in reference to read, contig, or scaffold lengths in a draft assembly. For reads, it indicates the length such that reads of this length or greater sum to half of the total number of bases. For contigs or scaffolds, it indicates the size such that contigs or scaffolds of this length or greater sum to at least half of the haploid genome size.
Optical genome mapping: high-throughput (optical) genome mapping technology commercialized by BioNano Genomics, also referred to as next-generation mapping (NGM). Long DNA molecules are nick labeled at specific sites and linearized in nanochannel arrays. The length of the DNA molecules and the positions of the nick labels are determined after automated image capture.
Next-generation sequencing (NGS): methods based on massively parallel sequencing of spatially separated, clonally amplified DNA templates in a flow cell. Typically, reads of up to several hundred base pairs are produced.
Quality value (QV): also referred to as the Phred quality score; indicates the probability that a given base is called incorrectly by the sequencer. QVs are logarithmically related to the base-calling error probability (P): Q = −10 log₁₀ P. For example, QV30 corresponds to a 1 in 1000 probability of an incorrect base call.
Scaffold: a noncontiguous series of genomic sequences linked together, comprising sequences separated by gaps of known length. The sequences that are linked are typically contiguous sequences corresponding to read overlaps.
Sequencing chemistry: the molecular mechanism of a given sequencing method. Several technologies are based on ‘sequencing by synthesis’, in which sequence information is generated by a polymerase that copies a DNA strand. By contrast, nanopore sequencing directly ‘reads’ the original DNA or RNA molecule.
Short tandem repeats (STRs): or microsatellites; comprise a unit of 2–13 nucleotides repeated many times (up to hundreds or thousands) in a row on a DNA strand.
Structural variants (SVs): genomic rearrangements affecting more than 50 bp. SVs are often multiple kilobases or even megabases in size and include deletions, insertions, inversions, mobile-element transpositions, translocations, tandem repeats, and copy number variants (CNVs).
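Two of the definitions above are computational and can be illustrated with a short Python sketch: the length statistic commonly called N50 (the length such that sequences of this length or greater sum to half of the total bases, or to half of the haploid genome size for contigs and scaffolds) and the Phred relation Q = −10 log10 P. This is a minimal illustration under stated assumptions, not code from the review: the function names `n50`, `phred_to_error_prob`, and `error_prob_to_phred`, and the optional `genome_size` parameter, are hypothetical names introduced here.

```python
import math


def n50(lengths, genome_size=None):
    """Return the N50 of a collection of sequence lengths.

    Without genome_size, the threshold is half of the summed lengths
    (the read-level definition). With genome_size (an assumed, optional
    parameter), the threshold is half of that haploid genome size
    (the contig/scaffold definition). Returns None if the sequences
    never reach the threshold.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids dividing the total by two.
        if 2 * running >= total:
            return length
    return None


def phred_to_error_prob(q):
    """Convert a quality value Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P to a quality value Q = -10 log10 P."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))               # 5: 6 + 5 = 11 covers half of 20
    print(n50([40, 30, 20], genome_size=100))
    print(phred_to_error_prob(30))
    print(error_prob_to_phred(0.001))
```

Sorting once and accumulating from the longest sequence keeps the calculation simple and linear after the sort. Passing `genome_size` switches from the read-level threshold to the genome-size threshold described for contigs and scaffolds.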