DNA barcoding

from Wikipedia, the free encyclopedia

DNA barcoding is a taxonomic method for identifying species from the DNA sequence of a marker gene. The sequence of base pairs serves as an identifier for a particular species in the same way that the barcode on food packaging identifies a product, and the name comes from this analogy. Because the DNA sequence changes at a largely uniform rate through point mutations (cf. molecular clock), more closely related individuals (and species) have more similar sequences. As long as a species remains undivided, that is, as long as it shares a common gene pool, differences between populations are repeatedly evened out by gene flow. Once lineages separate during speciation, their sequences diverge at an almost constant rate. If samples from two individuals have markedly different sequences, this indicates that they come from different species.

DNA barcoding does not depend on the protein-coding function of the DNA. Because coding sequences are subject to selection, they can be used only with reservations. Their use is nevertheless possible because the genetic code is largely redundant at the third position of the base triplet (“degenerate code”). Changes at this position are therefore hardly subject to selection and can serve as a neutral marker. A rapidly changing base sequence, for example from a functionless DNA segment, is best suited to detecting differences between closely related species. Slowly changing sections are better suited to larger differences. Accordingly, various DNA segments have been proposed for the method.
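The redundancy of the third codon position can be checked with a short calculation. The following is a minimal sketch that assumes the standard genetic code (mitochondrial genomes use slightly different code tables, so the exact numbers for a gene such as cox1 will differ); the function name is made up for this illustration. For every sense codon it tries all single-base substitutions and records how often the encoded amino acid stays the same.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
# Assumption: mitochondrial code tables differ from this in a few codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction_by_position():
    """Fraction of single-base substitutions that keep the amino acid,
    reported separately for codon positions 1, 2 and 3 (sense codons only)."""
    synonymous = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                if CODON_TABLE[mutant] == aa:
                    synonymous[pos] += 1
    return [s / t for s, t in zip(synonymous, total)]


fractions = synonymous_fraction_by_position()
for position, fraction in enumerate(fractions, start=1):
    print(f"codon position {position}: {fraction:.1%} synonymous")
```

Under these assumptions, most substitutions at the third position are silent, while almost none at the first two positions are, which is why third-position changes behave approximately like a neutral marker.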

The most widely used marker is a section of the mitochondrial DNA (mtDNA). It has the advantage that it contains no introns (except in fungi), undergoes very little recombination, and is inherited in haploid fashion (that is, there are no two different chromosomes with possibly different alleles, as in the nuclear genome); this avoids the cloning that would otherwise be necessary. Of the 13 protein-coding genes of the mtDNA, a 648 base pair (bp) region of the gene for subunit I of cytochrome c oxidase (COI or cox1) is used as the standard because this gene differs more between species than the other mitochondrial genes.


The method relies on the polymerase chain reaction (PCR). The following steps are required:

  • Extraction of the DNA from the organism or sample under study (see main article DNA extraction). Fresh material or museum material can be used, but preservation in formalin, which is common in museums, causes problems.
  • Performing the PCR. For the reaction to start, a short DNA segment, the primer, is required in addition to the enzyme. Each of the two DNA strands needs its own primer (named 3' and 5' after the ring structure at the respective end), although in practice usually only the 5' primer is used. At first sight, choosing a primer seems an impossible task: it depends on the base sequence of the DNA, which is unknown and is precisely what the method is trying to determine. Fortunately, the genome is interspersed with numerous sections that vary very little between organisms. These conserved sequences usually code for a biologically fundamental function, so that mutations there are usually lethal. The standard sequence of the cox1 gene was chosen not least because good primers were available for it. Nevertheless, the choice of primer is a difficult step, and different primers can yield different sequences. New primers can be generated with the enzyme reverse transcriptase, but this is far too time-consuming for routine examinations. The enzyme used for replication is Taq polymerase from the bacterium Thermus aquaticus, which has been available for routine use since 1987, an essential prerequisite for the method.
  • Sequencing of the replicated (or "amplified") DNA (see main article DNA sequencing). This used to be a difficult laboratory task. Today, powerful automated sequencing machines with high throughput are available. Sequencing therefore no longer adds appreciably to either the difficulty or the cost of the method.
  • Analysis of the sequence. If a database already exists for the group under study, the sequence is compared with the sequences stored there. If it is identical or shows only slight variation, the sample probably belongs to that species. Matters are more difficult if no database exists, if one is only being built from the samples, or if the sample matches none of the stored sequences. Unknown samples are then grouped by similarity using clustering algorithms, producing trees that resemble a family tree. Samples from the same species or from very closely related species should have similar sequences and therefore lie close together. If the samples of a "species" form two or more clearly separated groups (or "clusters"), this strongly suggests that several species are actually present that have not been recognized and distinguished so far. Unfortunately, the differences between species vary greatly among systematic groups, and the polymorphism within a species can sometimes be quite large. It is therefore not possible to specify a universal threshold above which diverging sequences represent different species with certainty. As a rough guide, a difference of 3% has proved useful in many cases, but both lower and higher values are often used. What the clusters produced by the analysis, sometimes called operational taxonomic units (OTUs), really represent, and whether they can simply be equated with species, is one of the main points of contention about the method.
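The database comparison in the last step can be sketched in a few lines. This is an illustrative example only, not the procedure of any particular barcoding database: it assumes the sequences are already aligned to equal length, uses the simple proportion of differing sites as the distance, and applies the 3% figure mentioned above as a default threshold. The function names, the species labels and the 40 bp toy sequences are invented for the example (real COI barcodes are 648 bp).

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an unknown base ('N')
    are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def identify(query, references, threshold=0.03):
    """Return (name, distance, within_threshold) for the closest reference."""
    name, distance = min(
        ((ref_name, p_distance(query, ref_seq))
         for ref_name, ref_seq in references.items()),
        key=lambda item: item[1],
    )
    return name, distance, distance <= threshold


# Hypothetical reference "database" with invented toy sequences.
references = {
    "species_A": "ATGGCACTATTAGGAGCTTGAGCAGGAATAGTAGGTACTT",
    "species_B": "ATGGCTCTTCTAGGTGCATGGGCTGGTATAGTTGGAACAT",
}
# Query differing from species_A at a single site.
query = "ATGGCACTATTAGGAGCTTGAGCAGGAATAGTAGGTACTC"

name, distance, accepted = identify(query, references)
print(name, round(distance, 3), accepted)
```

A match that falls outside the threshold is returned with `within_threshold` set to `False` rather than being discarded, since, as noted above, no universal cut-off separates species with certainty.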


A number of initiatives around the world are building databases of DNA barcode sequences as references for specific groups of species. Their main aim is to collect and record sequences from reliably identified individuals of described species and to make these data available to users. The International Barcode of Life project (iBOL) coordinates efforts across numerous species groups and provides technical assistance. Participating initiatives include the Fish Barcode of Life Initiative (FISH-BOL), which is trying to build a database of DNA barcodes for all fish species worldwide, and ABBI, the corresponding initiative for birds. Other iBOL initiatives are attempting the same for butterflies and mammals.

However, the ambitions of some research groups go far beyond these goals. Many hope one day simply to sequence unsorted samples taken from the environment and obtain, more or less directly, a species list for the habitat concerned, without having to employ highly trained, expensive and scarce specialists. Others expect that miniaturization of the components will soon yield portable barcoders that can be used in the field or at the workplace to determine a species name reliably and in real time from the smallest samples.

Case studies

  • A DNA barcoding study of the neotropical butterfly Astraptes fulgerator showed that what had been regarded as a single (polymorphic) species is actually a complex of ten very similar cryptic species that are hardly distinguishable morphologically.
  • In a study of tropical parasitoid braconid wasps, 171 provisional species (approx. 95% undescribed) could be distinguished by morphological methods. DNA barcoding revealed a further 142 species that had not been recognized during morphological sorting, most of them host-specific. The study allows projections of the extreme species richness of this group in the tropics, a group in which only very few taxonomists worldwide specialize.
  • The method has been shown to be suitable for identifying species of marine red algae, which are extremely difficult to distinguish by morphological criteria.
  • In land plants, the cox1 gene is unsuitable for DNA barcoding and yields no usable results. A number of other genes have been tested. So far, a section of the plastid gene matK has proved the most suitable (like mitochondria, plastids have their own genetic material). A pilot study on orchid species demonstrated the suitability of this gene for DNA barcoding of land plants. In practice, markers for tropical orchid species could be an important tool for preventing smuggling. Another research group, however, found that in trees of the economically important mahogany family (Meliaceae) all mitochondrial and plastid markers were equally unreliable. They propose as a marker a region of the nuclear genome that they also examined.
  • Applying the method to primate species proved difficult because of some methodological problems, but it was possible after the standard methodology had been adapted, and it is promising for the future. Here too, the method could help curb smuggling (including of meat and other products) and would be useful in biomedical research.
  • A study has shown that the species and sex of Siberian tigers and Amur leopards can be determined from faecal samples collected in their habitat. The remaining distribution, ecology and way of life of these extremely elusive species can thus be clarified far more easily than through the very rare visual observations.
  • Researchers in the south of France succeeded in using DNA from water samples to determine whether American bullfrogs are present in a body of water. The species, which was introduced to Europe, is feared there because of its effects on the native amphibian fauna. Direct detection is difficult at low population density and possible only at certain times of the year.

Major advantages of the method

Proponents of the method cite the following major advantages of DNA barcoding over more traditional taxonomic methods, some of which they can also substantiate (see the case studies):

  • The method enables non-specialists to identify species from difficult and species-rich groups. This matters because any one specialist can reliably keep track of only a few thousand species, while there are millions of species (see: biodiversity). The number of taxonomists worldwide is small and is currently declining markedly, because the subject is considered old-fashioned and research institutions (e.g. universities) favour other biological disciplines when allocating funds. For example, chairs of classical taxonomy are left vacant, or the research field is changed when a new appointment is made. At the same time, the diversity of life on earth is supposed to be described and recorded, which would take centuries with conventional methods at the current pace.
  • DNA barcoding makes it possible to assign parts and products of organisms to a species. This is essential for dealing with the smuggling of protected species, compliance with catch quotas and similar issues that overwhelm the authorities today. In addition, larvae and other developmental stages can be assigned to their species (which have mostly been described from adults).
  • Analysis of apparently well-known species often reveals morphologically indistinguishable cryptic species, which can differ clearly in lifestyle and specialization. In other groups with few distinguishing features, such as nematodes, identifying species by morphology is almost impossible anyway. Here, DNA barcoding can unravel the relationships much better or at least provide essential clues.

Criticism and limits of the method

The impressive possibilities that DNA barcoding offers for quick and easy species identification should not obscure the shortcomings that have been found in various areas. Uncritical acceptance of the results can lead to serious misjudgments. Some of these shortcomings concern individual aspects of the process and can be remedied by technical adjustments and refinements, but others are fundamental and make the use of DNA barcoding difficult or impossible for some applications.

First, using a mitochondrial marker gene means that relationships are traced exclusively through the maternal line, since the sperm contributes no mitochondria to the new organism. This makes it impossible to investigate some effects of hybridization or introgression. However, this limitation is significant only for incomplete species splits or very closely related species.

Another fundamental difficulty is that there is seldom a sharp break between intraspecific and interspecific variability (that is, between variation within a species and variation between different species). The boundaries between highly polymorphic species and groups of closely related species are blurred. Basically, this is not a problem of the method but simply a property of nature itself, which does not always fit neatly into our more or less artificial sorting criteria. It does cause problems in application, for example when species numbers are to be compared.

It becomes even more problematic if operational taxonomic units delimited by DNA barcoding alone are treated as species, because the species diversity of, for example, a habitat then depends critically on the threshold values used in the analysis. This makes subtle manipulations possible. Since the threshold values can differ greatly between groups of organisms, it is also very risky to treat poorly researched or unknown sequences that lack closely similar reference entries in the database as real biological units. These difficulties should diminish and ultimately disappear as the groups examined become better known and the databases more complete. However, the advocates of the new method had always advertised that it could be used to determine biodiversity directly, that is, by an independent route that does not require in-depth knowledge of the species.
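How strongly an OTU count depends on the chosen threshold can be illustrated with a small sketch. The distance matrix below is invented for six hypothetical samples, and the grouping rule (single linkage: any two samples whose distance is at most the threshold end up in the same cluster) is only one of several clustering rules in use; the function name is likewise made up for this example.

```python
def count_otus(distances, threshold):
    """Number of single-linkage clusters obtained when every pair of
    samples with distance <= threshold is joined into one cluster."""
    n = len(distances)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distances[i][j] <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


# Hypothetical pairwise sequence distances for six samples.
distances = [
    [0.000, 0.005, 0.020, 0.045, 0.050, 0.120],
    [0.005, 0.000, 0.025, 0.040, 0.050, 0.125],
    [0.020, 0.025, 0.000, 0.035, 0.045, 0.115],
    [0.045, 0.040, 0.035, 0.000, 0.010, 0.110],
    [0.050, 0.050, 0.045, 0.010, 0.000, 0.105],
    [0.120, 0.125, 0.115, 0.110, 0.105, 0.000],
]

for threshold in (0.01, 0.02, 0.03, 0.04):
    print(f"threshold {threshold:.2f}: {count_otus(distances, threshold)} OTUs")
```

With these made-up distances the same six samples yield four, three or two OTUs depending on whether the threshold is 1%, 2 to 3%, or 4%, which is the sensitivity the paragraph above warns about.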

Some researchers point out that the marker gene cox1 is subject to directional selection, at least in some groups of organisms. Under selection, changes are no longer necessarily neutral; they can accumulate more slowly or more quickly than expected and thus distort the results. Selection can act directly on the encoded enzyme, or it can result indirectly from linkage with other genes (linkage disequilibrium). In insects and other arthropods, for example, the almost universal infection with symbiotic or harmful bacterial strains, such as those of the genus Wolbachia, can produce strong disequilibria in the mtDNA within a species (which is then wrongly taken to indicate several cryptic species) and can also make individual populations of different species more similar to each other than to other populations of their own species (in which case either the species difference is missed entirely or too many species are distinguished). These effects matter for estimates of biodiversity, because around half of the described species (and probably a considerably higher proportion of the undescribed ones) are insects. A pilot study with a species of fly showed that the effect is not only theoretically plausible but actually falsifies the results.

Another problem for the method is pseudogenes of mitochondrial genes in the cell nucleus. Through copying errors, sections of the mtDNA are sometimes mistakenly integrated into the nuclear genome; this is how most of the originally much more numerous independent organelle genes are assumed to have been transferred to the cell nucleus in the past. Although this functional transfer has long been completed, such genes are still occasionally incorporated into the nucleus, where they remain functionless and generally degenerate more or less rapidly into pseudogenes through selectively neutral mutations. In many species there are numerous such pseudogenes in the nucleus; in humans, for example, there are more than 500 for COI alone. With the usual DNA barcoding primers, the pseudogenes are amplified in the PCR just like the "real" gene. Since they have been mutating independently of the original gene for a longer or shorter time, they differ from it and produce erroneous results. In the worst case, the sequence of the pseudogene is mistaken for the marker gene, so that the species in question is sorted completely incorrectly. Jennifer E. Buhay (2009), for example, shows how unrecognized pseudogenes can ruin an analysis. In many cases pseudogenes can be recognized: since they are not subject to selection, they also accumulate chance mutations that would completely destroy the integrity of a functional gene, such as stop codons in the middle of the gene or shifts in the reading frame. Apart from such clear cases, they cannot be detected without additional information.
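The two tell-tale signs mentioned above, stop codons inside the gene and reading-frame shifts, lend themselves to a simple automated screen. The sketch below is a rough illustration, not a validated pseudogene filter: it assumes the reading frame of the fragment is known, that the query has been aligned to a trusted reference with '-' as the gap character, and that TAA and TAG are the relevant stop codons (this holds for the invertebrate mitochondrial code; other code tables, such as the vertebrate mitochondrial one, have additional stop codons). All function names and sequences are invented for the example.

```python
import re


def in_frame_stops(seq, frame=0, stop_codons=("TAA", "TAG")):
    """0-based start positions of stop codons in the given reading frame.

    Assumption: `stop_codons` matches the genetic code of the organism.
    """
    seq = seq.upper()
    return [
        start
        for start in range(frame, len(seq) - 2, 3)
        if seq[start:start + 3] in stop_codons
    ]


def has_frameshift_indel(aligned_query, aligned_reference):
    """True if the alignment contains an internal gap run whose length is
    not a multiple of three, in either the query or the reference."""
    for aligned in (aligned_query, aligned_reference):
        internal = aligned.strip("-")  # ignore missing data at the ends
        if any(len(run) % 3 for run in re.findall(r"-+", internal)):
            return True
    return False


# Toy fragments (invented): TGA is not treated as a stop codon here.
functional = "ATGGCACTATTAGGAGCTTGA"
suspect = "ATGGCACTATAAGGAGCTTGA"  # TAA at position 9 in frame 0

print(in_frame_stops(functional))  # no in-frame stops expected
print(in_frame_stops(suspect))
print(has_frameshift_indel("ATGGCA-TATTAGGA", "ATGGCACTATTAGGA"))
print(has_frameshift_indel("ATGGCA---TTAGGA", "ATGGCACTATTAGGA"))
```

A sequence that passes both checks is not thereby shown to be the genuine marker gene; as the text notes, pseudogenes without such obvious defects cannot be recognized this way.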

It should also be noted that the method can of course deliver correct results only if the sequence stored in the reference database actually belongs to the species stated. In a 2006 study of fungi, around 20 percent of the species names were found to be incorrect.

Turbo taxonomy

Efforts are now being made to use DNA barcoding not only to identify species that have already been described but also to standardize the description of new species (“turbo taxonomy”). The barcode sequence, together with a strongly abbreviated morphological description, then serves to define the new species, which is to be described comprehensively according to today's standard only if necessary (see first description). There are in fact already species that are distinguished from other species solely by their DNA sequence.

References


  1. Paul DN Hebert, Alina Cywinska, Shelley L. Ball, Jeremy R. de Waard (2003): Biological identifications through DNA barcodes. Proceedings of the Royal Society London Series B 270: 313-321.
  2. Dirk Steinke & Nora Brede (2006): DNA barcoding. Biologie in unserer Zeit 36 (1): 40-46. doi: 10.1002/biuz.200410302. PDF at www.bolinfonet.org (memento of August 20, 2008 in the Internet Archive).
  3. Archived copy of www.fishbol.org (memento of May 23, 2018 in the Internet Archive).
  4. http://www.barcodingbirds.org/
  5. http://www.lepbarcoding.org/
  6. http://www.mammaliabol.org/
  7. Alice Valentini, Francois Pompanon, Pierre Taberlet (2009): DNA barcoding for ecologists. Trends in Ecology & Evolution 24 (2): 110-117.
  8. Paul DN Hebert, Erin H. Penton, John M. Burns, Daniel H. Janzen, Winnie Hallwachs (2004): Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proceedings of the National Academy of Sciences USA 101 (41): 14812-14817.
  9. M. Alex Smith, Josephine J. Rodriguez, James B. Whitfield, Andrew R. Deans, Daniel H. Janzen, Winnie Hallwachs, Paul DN Hebert (2008): Extreme diversity of tropical parasitoid wasps exposed by iterative integration of natural history, DNA barcoding, morphology, and collections. Proceedings of the National Academy of Sciences USA 105 (34): 12359-12364.
  10. Lavinia Robba, Stephen J. Russell, Gary L. Barker, Juliet Brodie (2006): Assessing the use of the mitochondrial cox1-marker for use in DNA barcoding of red algae (Rhodophyta). American Journal of Botany 93 (8): 1101-1108.
  11. Khidir W. Hilu & Hongping Liang (1997): The matK gene: sequence variation and application in plant systematics. American Journal of Botany 84 (6): 830-839.
  12. Renaud Lahaye, Michelle van der Bank, Diego Bogarin, Jorge Warner, Franco Pupulin, Guillaume Gigot, Olivier Maurin, Sylvie Duthoit, Timothy G. Barraclough, Vincent Savolainen (2008): DNA barcoding the floras of biodiversity hotspots. Proceedings of the National Academy of Sciences USA 105 (8): 2923-2928.
  13. AN Muellner, H. Schaefer, R. Lahaye (2011): Evaluation of candidate DNA barcoding loci for economically important timber species of the mahogany family (Meliaceae). Molecular Ecology Resources 11 (3): 450-460. doi: 10.1111/j.1755-0998.2011.02984.x
  14. Joseph G. Lorenz, Whitney E. Jackson, Jeanne C. Beck, Robert Hanner (2005): The problems and promise of DNA barcodes for species diagnosis of primate biomaterials. Philosophical Transactions of the Royal Society Series B 360: 1869-1877.
  15. Taro Sugimoto, Junco Nagata, Vladimir V. Aramilev, Alexander Belozor, Seigo Higashi, Dale R. McCullough (2006): Species and sex identification from faecal samples of sympatric carnivores, Amur leopard and Siberian tiger, in the Russian Far East. Conservation Genetics 7: 799-802.
  16. Gentile Francesco Ficetola, Claude Miaud, François Pompanon, Pierre Taberlet (2008): Species detection using environmental DNA from water samples. Biology Letters 4 (4): 423-425.
  17. Hurst, GD & Jiggins, FM (2005): Problems with mitochondrial DNA as a marker in population, phylogeographic and phylogenetic studies: the effects of inherited symbionts. Proceedings of the Royal Society London Series B 272: 1525-1534.
  18. TL Whitworth, RD Dawson, H. Magalon, E. Baudry (2007): DNA barcoding cannot reliably identify species of the blowfly genus Protocalliphora (Diptera: Calliphoridae). Proceedings of the Royal Society London Series B 274: 1731-1739. doi: 10.1098/rspb.2007.0062
  19. Hojun Song, Jennifer E. Buhay, Michael F. Whiting, Keith A. Crandall (2008): Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proceedings of the National Academy of Sciences USA 105 (36): 13486-13491. doi: 10.1073/pnas.0803076105
  20. D. Bensasson, DX Zhang, DL Hartl, GM Hewitt (2001): Mitochondrial pseudogenes: Evolution's misplaced witnesses. Trends in Ecology and Evolution 16: 314-321.
  21. Erik Richly & Dario Leister (2004): NUMTs in Sequenced Eukaryotic Genomes. Molecular Biology and Evolution 21 (6): 1081-1084. doi: 10.1093/molbev/msh110
  22. Jennifer E. Buhay (2009): “COI-Like” sequences are becoming problematic in molecular systematic and DNA barcoding studies. Journal of Crustacean Biology 29 (1): 96-110. doi: 10.1651/08-3020.1
  23. An example: Antonis Rokas, George Melika, Yoshihisa Abe, Jose-Luis Nieves-Aldrey, James M. Cook, Graham N. Stone (2003): Lifecycle closure, lineage sorting, and hybridization revealed in a phylogenetic analysis of European oak gallwasps (Hymenoptera: Cynipidae: Cynipini) using mitochondrial sequence data. Molecular Phylogenetics and Evolution 26: 36-45.
  24. Nilsson RH, Ryberg M, Kristiansson E, Abarenkov K, Larsson KH, Kõljalg U (2006): Taxonomic Reliability of DNA Sequences in Public Sequence Databases: A Fungal Perspective. PLoS ONE 1 (1): e59. doi: 10.1371/journal.pone.0000059
  25. Alexander Riedel, Katayo Sagata, Yayuk R Suhardjono, Rene Tänzler, Michael Balke (2013): Integrative taxonomy on the fast track - towards more sustainability in biodiversity research. Frontiers in Zoology 10: 15.
