DNA sequencing

from Wikipedia, the free encyclopedia

DNA sequencing is the determination of the nucleotide sequence in a DNA molecule. DNA sequencing has revolutionized the biological sciences and ushered in the era of genomics. Since 1995, the genomes of more than 50,000 different organisms (as of 2020) have been analyzed by DNA sequencing. Together with other DNA-analytical methods, DNA sequencing is used, among other things, to investigate genetic diseases. In addition, DNA sequencing is a key analytical method, especially in the context of DNA cloning (molecular cloning), and is an indispensable part of any molecular biology or genetic engineering laboratory today.

Problem

Reading the nucleotide sequence of DNA was an unsolved problem for decades, until suitable biochemical and biotechnological methods were developed in the mid-1970s. Today, even sequencing entire genomes has become comparatively quick and easy.

However, the challenges of genome sequencing are not limited to directly reading the nucleotide sequence. Depending on the method, technical limitations mean that each individual sequencing reaction reads only short DNA sections (reads) of at most about 1000 base pairs. Once a section has been read, a new primer is designed from the sequence at the end of that read and used for the next sequencing reaction. This stepwise procedure is called primer walking, or chromosome walking when it is applied to whole chromosomes; it was first applied in 1979.
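The walking loop can be illustrated with a toy simulation. It assumes the template is already known (so that primer annealing can be modelled as an exact string search), that each reaction yields a perfect read of at most `max_read` bases, and that the next primer is simply the last `primer_len` bases read. `primer_walk` and its parameters are hypothetical names for this sketch; real primer design also considers melting temperature and uniqueness.

```python
def primer_walk(template: str, first_primer: str,
                max_read: int = 1000, primer_len: int = 20) -> tuple[str, int]:
    """Toy primer walk along a known template.

    Returns the assembled sequence (starting with the first primer)
    and the number of sequencing reactions that were needed.
    """
    assembled = first_primer
    primer = first_primer
    search_from = 0
    reactions = 0
    while True:
        # Model primer annealing as an exact match at or after the expected site.
        site = template.find(primer, search_from)
        if site < 0:
            raise ValueError("primer does not anneal to the template")
        read_start = site + len(primer)
        read = template[read_start:read_start + max_read]  # read-length limit
        if not read:
            break  # reached the end of the template
        reactions += 1
        assembled += read
        # The next primer is taken from the end of what has been read so far.
        primer = assembled[-primer_len:]
        search_from = read_start + len(read) - len(primer)
    return assembled, reactions
```

For example, walking along a 2,500-base template with 1,000-base reads from a 20-base starting primer takes three reactions (reads of 1,000, 1,000 and 480 bases).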

Larger sequencing projects, such as the Human Genome Project, which sequenced billions of base pairs, therefore require an approach known as shotgun sequencing. Longer DNA sections are first broken into smaller pieces, these pieces are sequenced, and the sequence information of the individual short pieces is then reassembled into a complete overall sequence using bioinformatic methods. To obtain biologically relevant information from the raw sequence data (for example, information about genes and their control elements), sequencing is followed by DNA sequence analysis. Without it, raw sequence information has little scientific value.
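A minimal sketch of the reassembly step, assuming error-free reads and using a simple greedy overlap strategy (repeatedly merge the two fragments with the longest suffix-prefix overlap). This is one textbook approach, not the method used by any particular project; real assemblers must cope with sequencing errors and repeats, which can make greedy merging fail. The function names and the `min_len` threshold are illustrative.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 10) -> list[str]:
    """Greedily merge reads by their longest overlaps; returns the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no sufficient overlap left: several contigs remain
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

For instance, `greedy_assemble(["CAGGTCC", "GATTACA", "TTACAGG"], min_len=3)` first joins `GATTACA` and `TTACAGG` (overlap of 5) and then appends `CAGGTCC` (overlap of 4), giving `["GATTACAGGTCC"]`.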

Sequencing methods

DNA sequencing equipment

Today there are several methods for reading the sequence information of a DNA molecule. For a long time, mainly refinements of Frederick Sanger's method were in use. Modern methods speed up sequencing by running many reactions in parallel. Sequencing methods developed after the Sanger method are often referred to collectively as next-generation sequencing.

Classic methods

Method of Maxam and Gilbert

The method of Allan Maxam and Walter Gilbert, published in 1977, is based on base-specific chemical cleavage of the DNA with suitable reagents, followed by separation of the fragments by denaturing polyacrylamide gel electrophoresis. The DNA is first labeled at its 5′ or 3′ end with radioactive phosphate or a non-radioactive substance (biotin, fluorescein). In four separate reactions, specific bases are then modified under limiting conditions, so that only a fraction of them react, and removed from the sugar-phosphate backbone of the DNA. For example, the base guanine (G) is methylated by the reagent dimethyl sulfate and removed by alkaline treatment with piperidine, which also cleaves the DNA strand at that site. Each reaction therefore yields labeled fragments of different lengths, each ending where one of the targeted bases was cleaved. Denaturing polyacrylamide gel electrophoresis separates the fragments by length with single-base resolution, so the DNA sequence can be read off by comparing the four lanes of the gel. This method enabled its inventors to determine the sequence of an operon in a bacterial genome. It is rarely used today because it requires hazardous reagents and is more difficult to automate than the Sanger dideoxy method, which was developed at the same time.
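The gel-reading logic can be sketched as follows. The sketch assumes the four classic Maxam-Gilbert reactions G, A+G, C+T and C (the text above only describes the G reaction), a 5′-end label, and an ideal gel in which every cleavage position produces exactly one band. A base is called G if its band appears in the G lane, A if it appears only in the A+G lane, C if it appears in the C lane (and therefore also in C+T), and T if it appears only in the C+T lane.

```python
# Bases attacked in each of the four classic reactions (assumed here).
REACTIONS = {"G": {"G"}, "A+G": {"A", "G"}, "C+T": {"C", "T"}, "C": {"C"}}


def cleavage_ladder(seq: str) -> dict[str, list[int]]:
    """Fragment lengths seen in each lane for a 5'-labelled strand `seq`.

    Cleaving at position i (0-based) destroys that base and leaves a
    labelled fragment of the i bases upstream of it.
    """
    return {lane: [i for i, base in enumerate(seq) if base in targets]
            for lane, targets in REACTIONS.items()}


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence 5'->3' from the bottom (shortest band) of the gel upwards."""
    bands = {lane: set(lengths) for lane, lengths in lanes.items()}
    calls = []
    for length in sorted(set().union(*bands.values())):
        if length in bands["G"]:
            calls.append("G")
        elif length in bands["A+G"]:
            calls.append("A")  # in A+G but not G
        elif length in bands["C"]:
            calls.append("C")  # in C (and C+T)
        elif length in bands["C+T"]:
            calls.append("T")  # in C+T but not C
    return "".join(calls)
```

Round-tripping a sequence through `cleavage_ladder` and `read_gel` returns it unchanged, which shows that the four lanes together carry the full sequence information.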

Sanger's dideoxy method

The Sanger dideoxy method is also known as the chain-termination method. It is an enzymatic method, developed by Sanger and Coulson around 1975 and presented as early as 1977 together with the first complete sequence of a genome (bacteriophage φX174). Sanger shared the 1980 Nobel Prize in Chemistry with Walter Gilbert and Paul Berg; Sanger and Gilbert were honored for their work on DNA sequencing.

Principle of DNA sequencing according to the dideoxy method.
dNTP is the general abbreviation for a deoxynucleoside triphosphate and can stand for dATP, dCTP, dGTP or dTTP. ddNTPs are the corresponding dideoxy variants of the dNTPs. Incorporation of a ddNTP terminates the polymerization reaction. The blue dots at the 5′ end of the primer represent a label (e.g. a fluorescent group) that later makes the synthesis products visible in the gel. Alternatively, radioactively labeled nucleoside triphosphates can be used in the polymerization reaction.

Starting from a short section of known sequence (primer), one of the two complementary DNA strands is extended by a DNA polymerase. First, the DNA double helix is denatured by heating, which yields single strands for the subsequent steps. In four otherwise identical reactions (each containing all four nucleotides, one of which is radioactively labeled), a small proportion of one of the four nucleotides is supplied as a dideoxynucleoside triphosphate (ddNTP), i.e. one reaction each with ddATP, ddCTP, ddGTP or ddTTP. These chain-terminating ddNTPs lack a 3′-hydroxyl group: once one is incorporated into the newly synthesized strand, the DNA polymerase can no longer extend it, because the OH group on the 3′ carbon atom that would link to the phosphate group of the next nucleotide is missing. As a result, each reaction produces DNA fragments of different lengths that all end with the same ddNTP (i.e. only with A, C, G or T per reaction). After the sequencing reaction, the labeled termination products of all four reactions are separated by length in adjacent lanes by polyacrylamide gel electrophoresis. After the radioactive gel has been exposed to photographic film (X-ray film), the sequence can be read by comparing the four lanes. The complement of this sequence is the sequence of the single-stranded DNA template. Nowadays, a variant of the polymerase chain reaction (PCR) is used as the sequencing reaction. Unlike PCR, it uses only one primer, so the DNA is amplified only linearly.
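The following idealised simulation shows why sorting the terminated fragments by length yields the sequence. It assumes the template is written 5′→3′, that the primer anneals to its 3′ end, and that every possible termination product is formed in the lane of the matching ddNTP; the helper names are hypothetical. Reading the lanes from the shortest to the longest fragment gives the newly synthesized strand, and its reverse complement is the corresponding part of the template.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def sanger_lanes(template: str, primer: str) -> dict[str, list[int]]:
    """Lengths of the chain-termination products in the ddA/ddC/ddG/ddT lanes.

    `template` is written 5'->3'; the new strand is its reverse complement,
    starting with the primer.
    """
    new_strand = reverse_complement(template)
    if not new_strand.startswith(primer):
        raise ValueError("primer does not anneal to the 3' end of the template")
    lanes = {base: [] for base in "ACGT"}
    for k in range(len(primer), len(new_strand)):
        # A ddNTP incorporated at position k ends the chain there,
        # giving a product of k + 1 nucleotides in that ddNTP's lane.
        lanes[new_strand[k]].append(k + 1)
    return lanes


def read_lanes(lanes: dict[str, list[int]]) -> str:
    """Read the gel from the shortest to the longest product."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)
```

With a six-base primer, `reverse_complement(read_lanes(sanger_lanes(template, primer)))` returns the template without its last six bases, i.e. the part not covered by the primer.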

In the early 1980s, Fritz M. Pohl and his group developed a radioactive method of DNA sequencing in which the DNA molecules are transferred onto a carrier during electrophoretic separation (direct blotting electrophoresis). The "Direct-Blotting-Electrophoresis System GATC 1500" was marketed by the Konstanz company GATC Biotech. The sequencer was used, for example, in the European genome project to sequence chromosome II of the yeast Saccharomyces cerevisiae.

Since the early 1990s, dideoxynucleoside triphosphates labeled with fluorescent dyes have been used. Each of the four ddNTPs is coupled to a different dye. This modification makes it possible to add all four ddNTPs to a single reaction vessel, so that neither splitting into separate reactions nor handling radioisotopes is necessary. The resulting chain-termination products are separated by capillary electrophoresis and excited to fluoresce by a laser. The ddNTPs at the ends of the DNA fragments fluoresce in different colors and can thus be distinguished by a detector. The electropherogram (the sequence of color signals recorded by the detector) directly shows the base sequence of the sequenced DNA strand.
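Base calling from a four-color electropherogram can be sketched as peak picking on the summed signal, calling each peak by its brightest channel. The trace here is synthetic (evenly spaced, noise-free Gaussian peaks, one channel per dye), and the channel order and the `min_fraction` threshold are assumptions; real base-calling software must also handle noise, uneven peak spacing and overlapping peaks.

```python
import math

CHANNELS = "ACGT"  # assumed: one detector channel per dye-labelled ddNTP


def synthetic_trace(seq: str, spacing: int = 10, width: float = 2.0) -> list[list[float]]:
    """Noise-free electropherogram: one Gaussian peak per base in its dye's channel."""
    n_points = spacing * (len(seq) + 1)
    trace = [[0.0] * 4 for _ in range(n_points)]
    for i, base in enumerate(seq):
        centre = spacing * (i + 1)
        channel = CHANNELS.index(base)
        for t in range(n_points):
            trace[t][channel] += math.exp(-((t - centre) ** 2) / (2 * width ** 2))
    return trace


def call_bases(trace: list[list[float]], min_fraction: float = 0.5) -> str:
    """Call one base per local maximum of the summed signal, by its brightest channel."""
    totals = [sum(point) for point in trace]
    threshold = min_fraction * max(totals)
    calls = []
    for t in range(1, len(trace) - 1):
        is_peak = totals[t - 1] < totals[t] >= totals[t + 1]
        if is_peak and totals[t] >= threshold:
            brightest = max(range(4), key=lambda c: trace[t][c])
            calls.append(CHANNELS[brightest])
    return "".join(calls)
```

On such an idealised trace, `call_bases(synthetic_trace("GATTACA"))` recovers `"GATTACA"`, including the two separate peaks of the `TT` pair.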

Modern approaches

With the growing importance of DNA sequencing in research and diagnostics, methods have been developed that allow much higher throughput. It is now possible to sequence a complete human genome in about 8 days. These methods are referred to as second-generation sequencing. Various companies have developed processes with different advantages and disadvantages; there are others besides those listed here. Second-generation DNA sequencing was named Method of the Year 2007 by the journal Nature Methods.

Pyrosequencing

Raw data (center) including DNA sequence (right) shown in OpenChrom

Like Sanger sequencing, pyrosequencing uses a DNA polymerase to synthesize the complementary strand of DNA, although the type of DNA polymerase used may differ. The DNA mixture is ligated to a DNA adapter and coupled to beads via a complementary adapter sequence. The DNA-loaded beads are placed on a plate whose wells are each the size of one bead; beneath each well, a light guide leads to a detector. The DNA polymerase is observed, so to speak, "in action" as it successively adds individual nucleotides to a newly synthesized DNA strand. Each successful incorporation of a nucleotide is converted into a flash of light by an elaborate enzyme system involving luciferase and recorded by the detector. The DNA to be sequenced serves as the template strand and is single-stranded. Starting from a primer, the strand is extended nucleotide by nucleotide by adding one of the four types of deoxynucleoside triphosphates (dNTPs) at a time. A signal is obtained only when the matching (complementary) nucleotide is added; if a non-matching dNTP is added, no flash of light occurs. The remaining dNTPs are then degraded and the next type is added. This continues until another incorporation occurs, which happens after the fourth addition at the latest, since by then all four types of dNTP have been tried.

When the DNA polymerase incorporates a complementary nucleotide, pyrophosphate (PPi) is released. ATP sulfurylase converts the pyrophosphate into adenosine triphosphate (ATP). The ATP drives the luciferase reaction, which converts luciferin into oxyluciferin. This produces a detectable light signal whose strength is proportional to the amount of ATP consumed.
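Because the light signal scales with the amount of pyrophosphate released, a run of identical bases (a homopolymer) incorporated during one dispensation gives a proportionally stronger signal. The idealised sketch below encodes a sequence as one signal per dispensation and decodes it again by rounding. The dispensation order `TACG` is a hypothetical choice, and real signals are noisy, especially for long homopolymers.

```python
def flowgram(seq: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Idealised pyrosequencing signal: (dispensed base, bases incorporated) per flow."""
    if sorted(flow_order) != sorted("ACGT") or set(seq) - set("ACGT"):
        raise ValueError("flow order must contain A, C, G, T once; seq must be ACGT only")
    signals = []
    pos = 0
    while pos < len(seq):
        for base in flow_order:  # cycle through the four dNTPs
            run = 0
            while pos < len(seq) and seq[pos] == base:
                run += 1  # a homopolymer is incorporated in one flow
                pos += 1
            signals.append((base, run))  # light intensity ~ run length
    return signals


def decode(signals: list[tuple[str, float]]) -> str:
    """Turn (possibly noisy) flow intensities back into bases by rounding."""
    return "".join(base * round(intensity) for base, intensity in signals)
```

For `"AACGTTT"` the flow intensities are `0, 2, 1, 1, 3, 0, 0, 0` (T, A, C, G, T, A, C, G). Since every cycle dispenses all four dNTPs, each incorporation is detected within at most four flows, matching the description above.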

Pyrosequencing is used, for example, to determine the frequency of certain gene mutations (single nucleotide polymorphisms, SNPs), for instance in the investigation of hereditary diseases. Pyrosequencing is easy to automate and is suitable for the highly parallel analysis of DNA samples.

Sequencing by Hybridization

In sequencing by hybridization, short DNA segments (oligonucleotides) are fixed in rows and columns on a glass slide (DNA chip or microarray). The fragments of the DNA to be sequenced are labeled with dyes, and the fragment mixture is applied to the oligonucleotide array so that complementary fixed and free DNA segments can hybridize with each other. After unbound fragments have been washed off, the hybridization pattern can be read from the color signals and their intensity. Since the sequences of the fixed oligonucleotides and their overlaps are known, the color pattern ultimately allows conclusions to be drawn about the overall sequence of the unknown DNA.
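One standard way to draw this conclusion, the de Bruijn graph approach, treats each hybridizing k-mer probe as an edge between its (k-1)-base prefix and suffix and reconstructs the sequence as an Eulerian path. The sketch assumes an error-free hybridization pattern in which each k-mer occurs at most once in the target; the function name is hypothetical. If several Eulerian paths exist, the probe pattern does not determine the sequence uniquely, and the function returns one of them.

```python
from collections import defaultdict


def reconstruct_from_spectrum(kmers: list[str]) -> str:
    """Rebuild a sequence from its k-mer spectrum via an Eulerian path (Hierholzer)."""
    if not kmers:
        raise ValueError("empty spectrum")
    graph = defaultdict(list)
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for kmer in kmers:
        left, right = kmer[:-1], kmer[1:]  # (k-1)-mer nodes joined by the k-mer edge
        graph[left].append(right)
        outdeg[left] += 1
        indeg[right] += 1
    nodes = set(outdeg) | set(indeg)
    # A linear sequence starts at the node with one more outgoing than incoming edge.
    start = next((n for n in nodes if outdeg[n] - indeg[n] == 1), next(iter(nodes)))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in this sub-walk
    path.reverse()
    if len(path) != len(kmers) + 1:
        raise ValueError("spectrum does not form a single connected path")
    return path[0] + "".join(node[-1] for node in path[1:])
```

For `GATTACA` with k = 3, the probes `GAT, ATT, TTA, TAC, ACA` admit only one path, so the sequence is recovered exactly.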

Ion semiconductor DNA sequencing system

This method from Ion Torrent uses semiconductor technology to perform direct, non-optical genome sequencing with integrated circuits. The sequencing data are obtained directly on the semiconductor chip by detecting the hydrogen ions (protons) released by template-dependent DNA polymerases. The chip carries ion-sensitive field-effect transistor sensors arranged in a grid of 1.2 million wells in which the polymerase reaction takes place. This grid allows independent sequencing reactions to be detected in parallel and simultaneously. The chip is made with complementary metal-oxide-semiconductor (CMOS) technology, which allows a high density of measuring points at low cost.

Sequencing with bridge synthesis

In sequencing with bridge synthesis, developed by Solexa/Illumina, different adapter sequences are ligated to the two ends of the double-stranded DNA to be sequenced. The DNA is then denatured, diluted, bound as single strands to a carrier plate and amplified in situ by bridge amplification. This produces distinct regions on the carrier plate (clusters), each containing amplified copies of the same sequence. In a sequencing-by-synthesis reaction related to PCR, using four differently colored fluorescent reversible chain terminators, the nucleobase incorporated in each cycle is determined for each cluster in real time.
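For one cluster, the per-cycle decision can be sketched as choosing the brightest of the four color channels. The channel order and the `min_purity` cut-off, which emits `N` when the brightest channel does not clearly dominate the second brightest, are assumptions of this sketch; production base callers also correct for cross-talk between dyes and for molecules in a cluster falling out of step (phasing).

```python
BASES = "ACGT"  # assumed channel order


def call_cluster(cycles: list[tuple[float, float, float, float]],
                 min_purity: float = 0.6) -> str:
    """Call one base per sequencing cycle for a single cluster.

    The purity ratio brightest / (brightest + second brightest) must reach
    `min_purity` (a hypothetical cut-off); otherwise the base is called 'N'.
    """
    calls = []
    for intensities in cycles:
        order = sorted(range(4), key=lambda c: intensities[c], reverse=True)
        best, second = intensities[order[0]], intensities[order[1]]
        purity = best / (best + second) if best + second > 0 else 0.0
        calls.append(BASES[order[0]] if purity >= min_purity else "N")
    return "".join(calls)
```

With intensities `(900, 50, 30, 20)`, `(40, 35, 800, 60)` and `(300, 280, 10, 5)`, the calls are `A`, `G` and `N`: in the last cycle two channels are almost equally bright (purity about 0.52).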

Two-base sequencing

Two-base sequencing (Sequencing by Oligo Ligation Detection, SOLiD) from Applied Biosystems is a variant of sequencing by ligation. A DNA library is diluted and coupled to microbeads, and the DNA is then amplified with a DNA polymerase in an emulsion PCR. As a result, each microbead carries many copies of a single DNA sequence. The DNA on the microbeads is modified at its 3′ end so that the beads can be attached individually to a carrier plate. After primers have bound, four different cleavable probes are added; each is labeled with a different fluorescent color and binds to the DNA template according to its first two nucleotides (CA, CT, GG, GC). A DNA ligase then joins the bound probes. The probes are subsequently cleaved, releasing the labels. Each base of the DNA sequence is determined in at least two different ligation reactions, using up to five primers, each offset by one base.
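In the SOLiD color-space scheme, the 16 possible dinucleotides are encoded with four colors, so a single color does not identify a base on its own. The sequence is decoded by starting from a known first base (in practice supplied by the known adapter sequence) and applying each color in turn. A common way to express the scheme is to encode A, C, G, T as 0 to 3 and take the bitwise XOR of two adjacent bases, as in this sketch (function names hypothetical):

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def to_colors(seq: str) -> list[int]:
    """One color (0-3) per pair of adjacent bases: XOR of their 2-bit codes."""
    return [ENCODE[a] ^ ENCODE[b] for a, b in zip(seq, seq[1:])]


def from_colors(first_base: str, colors: list[int]) -> str:
    """Decode colors sequentially, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(DECODE[ENCODE[bases[-1]] ^ color])
    return "".join(bases)
```

With this encoding, the dinucleotides listed above (GG, CA, CT, GC) correspond to the four different colors 0, 1, 2 and 3. Because decoding is sequential, an uncorrected single color error alters every base after it, whereas a true single-base difference changes two adjacent colors.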

Paired End Sequencing

If the genome has already been completely sequenced, a clearly identifiable signal can also be obtained by generating short pieces of DNA from the beginning and the end of a DNA sequence (paired-end tag sequencing, PETS).
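The benefit of pairing can be illustrated with exact string matching against a known reference: a short tag that occurs several times in the genome becomes unambiguous once its partner tag and a maximum fragment length are taken into account. The sketch assumes both tags are given in the reference orientation (in practice one end is usually read from the opposite strand and must be reverse-complemented first); `place_pair` and its `max_insert` limit are hypothetical.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """All (possibly overlapping) start positions of `pattern` in `text`."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def place_pair(reference: str, tag5: str, tag3: str,
               max_insert: int = 500) -> list[tuple[int, int]]:
    """Return (start, end) spans where tag5 precedes tag3 within max_insert bases."""
    placements = []
    for start in find_all(reference, tag5):
        for t in find_all(reference, tag3):
            end = t + len(tag3)
            if start < t and end - start <= max_insert:
                placements.append((start, end))
    return placements
```

In a reference such as `"GATCGA" + "TTTTT" + "CCGGA" + "AAAAA" + "GATCGA" + "CTCTC"`, the tag `GATCGA` alone matches two positions (0 and 21), but paired with `CCGGA` it yields the single placement `(0, 16)`.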

Third-generation sequencing

Third-generation sequencing measures, for the first time, the response of individual molecules in a single-molecule experiment, so that prior amplification by PCR is no longer needed. This avoids uneven amplification by thermostable DNA polymerases, which preferentially bind some DNA sequences and replicate them more strongly (polymerase bias) and can thereby cause some sequences to be overlooked. The genomes of individual cells can also be examined. The released signal is recorded in real time. Depending on the process, third-generation sequencing records one of two kinds of signal: released protons (as a variant of semiconductor sequencing) or fluorophores (with a fluorescence detector). DNA and RNA sequencing of individual cells was named Method of the Year 2013 by the journal Nature Methods.

Nanopore sequencing

Nanopore sequencing is based on changes in the ion current through nanopores embedded in an artificial membrane. Biological pores (small transmembrane proteins similar to an ion channel, e.g. α-hemolysin (α-HL) or ClpX), synthetic pores (made of silicon nitride or graphene) and semi-synthetic pores are all used as nanopores. The membrane in which the nanopore is embedded has a particularly high electrical resistance. Unlike conventional ion channels, the pore is permanently open, so a constant ion current flows through the membrane once a voltage is applied. DNA molecules passing through the pore reduce this current. The reduction has a characteristic amplitude for each nucleotide, which can be measured and assigned to the corresponding nucleotide. In single-strand sequencing, double-stranded DNA is unwound by a helicase and one strand is fed into the nanopore. In an MspA pore, four nucleotides of DNA are inside the pore at the same time. The rate of passage depends, among other things, on the difference in pH between the two sides of the membrane. The characteristic current changes for the four nucleotides allow the sequence to be read from the resulting data set. The data can be evaluated, for example, with the software Poretools. The advantage of this method is that its accuracy remains consistent even for long DNA strands. A modified version of the method is used for protein sequencing.
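The core idea, mapping each current level back to a nucleotide, can be sketched as step detection followed by a nearest-level classifier. The picoampere levels, the jump threshold and the noise pattern in the example are invented for illustration. Because several nucleotides occupy the pore at once, real signals depend on short k-mers rather than single bases, and practical base callers use considerably more elaborate models. This naive segmentation also cannot separate two identical consecutive bases.

```python
# Hypothetical mean current levels (pA) for a base in the pore; real
# signals depend on several neighbouring nucleotides at once.
LEVELS_PA = {"A": 52.0, "C": 60.0, "G": 68.0, "T": 76.0}


def segment(signal: list[float], jump: float = 4.0) -> list[float]:
    """Split a raw current trace into events at jumps larger than `jump` pA.

    Returns the mean current of each event.
    """
    events, current = [], [signal[0]]
    for previous, sample in zip(signal, signal[1:]):
        if abs(sample - previous) > jump:
            events.append(sum(current) / len(current))
            current = []
        current.append(sample)
    events.append(sum(current) / len(current))
    return events


def call_events(event_means: list[float]) -> str:
    """Assign each event to the nucleotide with the nearest reference level."""
    return "".join(min(LEVELS_PA, key=lambda b: abs(LEVELS_PA[b] - x))
                   for x in event_means)
```

For a trace built from five slightly noisy samples per base of `GATACT`, `segment` finds six events and `call_events` returns `"GATACT"`.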

Nanopore sequencing technology is being promoted, for example, by the British company Oxford Nanopore Technologies. Their “MinION” sequencer was initially only accessible via a so-called “Early Access Program”, but has been available since 2015 through conventional sales channels.
