A phylogenetic tree is a tree that represents the evolutionary relationships between different species or other entities that are believed to share a common ancestor. A phylogenetic tree is thus a form of cladogram. In a phylogenetic tree, each node with descendants represents the most recent common ancestor of those descendants. The edge length usually corresponds to the estimated time since the species diverged or to the number of mutations that occurred along that lineage. Each node in a phylogenetic tree is referred to as a "taxonomic unit"; internal nodes are often called "hypothetical taxonomic units" when the species or units in question cannot be observed.
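As a rough illustration of this structure, the following Python sketch models a node with an optional name, the length of the edge to its parent, and a list of children. The class name `Node`, its fields, and the toy branch lengths are assumptions made for this example, not a standard API; the output uses the widely used Newick notation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One taxonomic unit; unnamed internal nodes are hypothetical ancestors."""
    name: Optional[str] = None      # None for unnamed internal nodes
    branch_length: float = 0.0      # length of the edge to the parent
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def leaves(self) -> List[str]:
        """Names of all observed taxa at or below this node."""
        if self.is_leaf():
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]

    def to_newick(self) -> str:
        """Serialize the subtree in Newick notation (without the final ';')."""
        label = self.name or ""
        if self.is_leaf():
            return f"{label}:{self.branch_length}"
        inner = ",".join(child.to_newick() for child in self.children)
        return f"({inner}){label}:{self.branch_length}"


# Toy tree; the branch lengths are invented for illustration only.
human = Node("human", 1.0)
chimp = Node("chimpanzee", 1.0)
mouse = Node("mouse", 3.0)
primates = Node(None, 2.0, [human, chimp])   # hypothetical taxonomic unit
root = Node(None, 0.0, [primates, mouse])

newick = root.to_newick() + ";"
print(newick)
```

Storing the branch length on the child rather than on a separate edge object keeps the sketch short; a real toolkit would likely offer richer node and edge types.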
Data sources and interpretation
Today, phylogenetic trees are mostly built from sequenced genes of the species under examination. A sequence alignment of the same gene (or, where applicable, the same genes) of these species is computed, and the similarities and differences that appear in the alignment are used to build the tree. Species whose sequences are similar are likely to lie closer together in the tree than those with very different sequences. However, since the cost of computing such trees increases at least exponentially with the number of sequences, heuristics are used to generate the trees. The standard methods of molecular phylogenetic tree construction include maximum likelihood, neighbor joining, maximum parsimony and Bayesian analysis.
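To give a feel for why exhaustive search quickly becomes impractical, the following sketch counts the distinct binary tree topologies for n labelled leaves, using the standard closed forms (2n-5)!! for unrooted and (2n-3)!! for rooted trees. The function names are chosen for this example only.

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 or 2; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_leaves: int) -> int:
    """Number of unrooted binary topologies on n labelled leaves: (2n - 5)!!."""
    if n_leaves < 3:
        raise ValueError("an unrooted binary tree needs at least 3 leaves")
    return double_factorial(2 * n_leaves - 5)


def count_rooted_trees(n_leaves: int) -> int:
    """Number of rooted binary topologies on n labelled leaves: (2n - 3)!!."""
    if n_leaves < 2:
        raise ValueError("a rooted binary tree needs at least 2 leaves")
    return double_factorial(2 * n_leaves - 3)


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Ten sequences already allow more than two million unrooted topologies, which is why the methods listed above search the space heuristically rather than scoring every tree.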
The aim of creating phylogenetic trees is to reconstruct and describe evolution in as much detail as possible. However, we now know that genes did not evolve at a uniform rate. For example, some genes found in humans today share common ancestors only with those of the chimpanzee, while others are found in all mammals.
This is why the phylogenetic analysis of different genes of the same species can result in different phylogenetic trees, each of which is nevertheless correct in itself. To determine the points of origin and the branchings in the evolution of the individual species, different gene regions must therefore be examined. Furthermore, results from classical phylogeny and morphological features should be used for interpretation.
To get these problems under control, large numbers of genes have more recently been examined simultaneously; irregularities in the rate of evolution are thus averaged out.
The lineage relationships can be inferred most reliably if the entire genomes of all species under consideration are known and their genes have been identified. Once all mutually orthologous genes have been assigned, the genes in which the species differ remain. Phylogenetic trees based on such orthology considerations are regarded as the most reliable and are available for all sequenced species. In particular, the trees already constructed in this way for bacteria and archaea offer a detailed overview of their lineage relationships.
Important foundational work on the construction of phylogenetic trees was carried out by Walter M. Fitch at the end of the 1960s.
Rooted and unrooted trees
A rooted phylogenetic tree is a directed tree with a single distinguished edge that is believed to be the location of the most recent common ancestor of all units in the tree. (This ancestor itself would correspond to an additional node.)
An unrooted tree, on the other hand, has no distinguished most recent common ancestor and is only intended to represent how close or distant the individual species are. As a rule, unrooted trees arise only as an intermediate processing step. Since actual evolution proceeded in one direction in time, an unrooted tree can only provide an imperfect model of reality.
Various methods can be used to root a computationally determined tree. By far the most important is the use of outgroups. An outgroup is a taxon included in the analysis that is closely related to the group under investigation but is known to lie clearly outside it. The sequence of the outgroup almost always allows the tree to be rooted. The method fails if the sequence of the outgroup deviates too strongly from the sequences analyzed, e.g. because it is too distantly related. In this case, a comparison with its sequence is more or less equivalent to a comparison with a random sequence. Other rooting methods, which are based on assumed rates of change (molecular clock) or on the assumption of irreversible patterns of change, are therefore particularly important for very basal branchings for which no closely related outgroup is available.
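Outgroup rooting can be sketched as follows, assuming the unrooted tree is given as an adjacency dictionary that maps each node to its neighbours and edge lengths. The function name `root_on_outgroup`, the node label `"root"`, and the choice to split the outgroup edge at its midpoint are assumptions made for this example; the outgroup alone determines on which edge the root lies, not where along that edge.

```python
from typing import Dict, Tuple

Adjacency = Dict[str, Dict[str, float]]


def root_on_outgroup(tree: Adjacency, outgroup: str) -> Tuple[str, Adjacency]:
    """Root an unrooted tree on the edge leading to the outgroup leaf.

    Returns (root_name, children), where children[parent][child] is the
    branch length of the directed edge parent -> child.
    """
    if outgroup not in tree or len(tree[outgroup]) != 1:
        raise ValueError("outgroup must be a leaf of the tree")
    root = "root"  # hypothetical label for the newly inserted node
    if root in tree:
        raise ValueError("node name 'root' is already in use")

    (neighbor, length), = tree[outgroup].items()

    # Work on a copy and replace the outgroup edge by two half edges.
    adj = {node: dict(nbrs) for node, nbrs in tree.items()}
    del adj[outgroup][neighbor]
    del adj[neighbor][outgroup]
    half = length / 2  # midpoint placement is an arbitrary assumption
    adj[root] = {outgroup: half, neighbor: half}
    adj[outgroup][root] = half
    adj[neighbor][root] = half

    # Direct every edge away from the root with a depth-first traversal.
    children: Adjacency = {}
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        children[node] = {n: l for n, l in adj[node].items() if n != parent}
        stack.extend((child, node) for child in children[node])
    return root, children


# Toy unrooted tree with invented branch lengths; "x" and "y" are internal nodes.
tree = {
    "human": {"x": 1.0},
    "chimpanzee": {"x": 1.0},
    "x": {"human": 1.0, "chimpanzee": 1.0, "y": 2.0},
    "mouse": {"y": 3.0},
    "chicken": {"y": 6.0},
    "y": {"x": 2.0, "mouse": 3.0, "chicken": 6.0},
}
root, rooted = root_on_outgroup(tree, "chicken")
print(rooted[root])
```

In the result, the outgroup and the rest of the tree hang off the new root as its two children, which matches the idea that the root lies on one distinguished edge.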
More information about rooted and unrooted trees can be found in the article Tree (graph theory) .
Differences between gene and species trees
Different forms of gene and species evolution: usually the first case is assumed, in which speciation goes hand in hand with a split in the gene lineage. Reconstruction of the species tree is made more difficult by three other cases:
- the gene is inherited by only one of the new species (gene loss)
- the gene is duplicated, which leads to ambiguous comparisons after a subsequent speciation (not shown)
- horizontal gene transfer takes place between two different species, which are then erroneously placed close together during tree reconstruction
Common bioinformatic methods for phylogenetic sequence analysis include parsimony, in which the relationships are explained with the smallest number of "explanations", in this case sequence changes. In neighbor joining, all sequences in an alignment are compared with one another; the most similar ones are regarded as related and treated as a single unit in the next round of joining, until a complete tree has been built. The maximum likelihood method, which is based on statistical assumptions about the evolution of sequences, is currently used most frequently.
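The parsimony criterion can be illustrated with a minimal sketch of the bottom-up pass of Fitch's small-parsimony algorithm, which counts the minimum number of changes a given tree needs to explain an alignment. The nested-tuple tree encoding, the function names, and the four toy sequences are assumptions made for this example; searching over candidate trees is not shown.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (binary trees only).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed.
        return common, left_cost + right_cost
    # The children disagree: one change is needed on this level.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum the per-site minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Invented toy alignment, not real sequence data.
alignment = {
    "human": "ACGT",
    "chimpanzee": "ACGT",
    "mouse": "ACTT",
    "chicken": "GCTA",
}
tree_a = (("human", "chimpanzee"), ("mouse", "chicken"))
tree_b = (("human", "mouse"), ("chimpanzee", "chicken"))

print(parsimony_score(tree_a, alignment), parsimony_score(tree_b, alignment))
```

Under the parsimony criterion the tree with the lower score is preferred; here the grouping of human with chimpanzee requires fewer changes than the alternative grouping.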
If many genomes are known and their individual genes have been characterized, as is the case for bacteria in particular, orthologous genes can be identified across the genomes. Anything that is not orthologous is the result of an insertion or a deletion of a gene, depending on which chronological order is assumed. This means that only such events have to be analyzed in order to determine the lineage relationships.
- By definition, phylogenetic trees cannot represent hybridization and lateral gene transfer, which are also important routes by which genes are passed on. Therefore, some researchers take the view that one should build not a tree but a phylogenetic network (which, in the sense of graph theory, differs from a tree in that it allows "cross-connections" between species that are otherwise not directly related).
- Trees that do not contain extinct species must be interpreted with caution (see also the comment above on interpretation).
History of the tree metaphor
The name tree is derived from earlier ideas of life as progress from “lower” to “higher”, more complex forms, whereby the level of development ascribed in each case was indicated by the level of placement in the “evolutionary family tree”. A real tree was often used as a template for the graphic representation of such a family tree.
Richard Dawkins proposed replacing the family tree metaphor with that of a “stem flow”: the cladogram could be interpreted as a branching flow system, which has, among other things, the advantage that this metaphor does not suggest any development toward "higher" forms.
- Walter M. Fitch, Emanuel Margoliash: Construction of phylogenetic trees. In: Science. Volume 155, No. 3760, 1967, pp. 279–284, doi:10.1126/science.155.3760.279
- Volker Knoop, Kai Müller: Genes and Family Trees. A handbook on molecular phylogenetics. Spectrum Academic Publishing House, Heidelberg 2006, ISBN 3-8274-1642-6.
- Arndt von Haeseler, Dorit Liebers: Molecular Evolution. Fischer, Frankfurt am Main 2003, ISBN 3-596-15365-4.
- Bernhard Wiesemüller, Hartmut Rothe, Winfried Henke: Phylogenetic systematics. Springer, Berlin 2003, ISBN 3-540-43643-X.
- Phyletic tree of nearly all of the more than 4,500 recent mammal species. In: Nature. Volume 446, March 29, 2007. PMID 17392779
- John P. Huelsenbeck, Jonathan P. Bollback, Amy M. Levine: Inferring the Root of a Phylogenetic Tree. In: Systematic Biology. Volume 51, No. 1, 2002, pp. 32–43, doi:10.1080/106351502753475862
- G. Fang, N. Bhardwaj, R. Robilotto, MB Gerstein: Getting Started in Gene Orthology and Functional Analysis. In: PLoS Comput Biol. Volume 6, No. 3, 2010, p. e1000703, doi:10.1371/journal.pcbi.1000703