Unweighted Pair Group Method with Arithmetic mean
The Unweighted Pair Group Method with Arithmetic mean, UPGMA for short, is a variant of hierarchical cluster analysis. It is often used in bioinformatics to reconstruct phylogenetic trees. In contrast to other methods such as the neighbor-joining algorithm, UPGMA is based on the assumption of a molecular clock; that is, all taxa evolve at the same constant rate.
Description of the method
The input is a set of objects and a distance matrix that contains the pairwise distances between the objects. The distance measure must have the properties of an ultrametric, that is, $d(x,z) \le \max(d(x,y), d(y,z))$ for all objects $x$, $y$ and $z$. The goal is a binary tree whose leaves represent the objects and whose edges reflect the distances in the distance matrix as well as possible.
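The ultrametric requirement can be checked directly on a distance matrix. The following is a minimal sketch, assuming the matrix is given as a symmetric list of lists; the function name `is_ultrametric`, the tolerance `tol` and both example matrices are hypothetical and only serve as an illustration.

```python
from itertools import permutations

def is_ultrametric(d, tol=1e-9):
    """Return True if d satisfies d(x, z) <= max(d(x, y), d(y, z)) for all triples."""
    n = len(d)
    for x, y, z in permutations(range(n), 3):
        # Strong triangle inequality; tol absorbs floating-point noise.
        if d[x][z] > max(d[x][y], d[y][z]) + tol:
            return False
    return True

# Hypothetical example data (not taken from the article).
clock_like = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]
not_clock_like = [
    [0, 2, 7],
    [2, 0, 6],
    [7, 6, 0],
]

print(is_ultrametric(clock_like))      # True
print(is_ultrametric(not_clock_like))  # False
```

In the second matrix the distance 7 exceeds the larger of the other two distances in its triple (2 and 6), so the condition fails.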
At the beginning, each object is in its own cluster. In each step, the two clusters with the smallest distance are merged and the distance matrix is recalculated. The distance between two clusters is the mean of the pairwise distances between all objects of one cluster and all objects of the other. Let $k$ be the new cluster formed from the two clusters $i$ and $j$: $k = i \cup j$.
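The merging procedure described above can be sketched as follows. This is an illustrative, unoptimized version that recomputes the mean pairwise distance from the original matrix in every step instead of updating the matrix incrementally; the function name `upgma`, the labels and the example matrix are assumptions made for this sketch.

```python
from itertools import combinations

def upgma(d, labels):
    """Merge clusters by smallest mean pairwise distance.

    Returns a list of merges as (cluster_a, cluster_b, distance) tuples.
    Labels are assumed to be unique.
    """
    index = {lab: i for i, lab in enumerate(labels)}
    clusters = [(lab,) for lab in labels]  # each object starts in its own cluster

    def mean_dist(a, b):
        # Mean of all pairwise distances between objects of a and objects of b.
        total = sum(d[index[x]][index[y]] for x in a for y in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest distance (ties: first found).
        a, b = min(combinations(clusters, 2), key=lambda pair: mean_dist(*pair))
        merges.append((a, b, mean_dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical example data (not taken from the article).
labels = ["A", "B", "C", "D"]
d = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
merges = upgma(d, labels)
for a, b, dist in merges:
    print(a, b, dist)
```

For this matrix, A and B are merged first at distance 2, then C joins them at distance 6, and finally D joins at distance 10.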
With WPGMA, the distance from the new cluster $k = i \cup j$ to another cluster $l$ is then calculated as follows:
$$d_{k,l} = \frac{d_{i,l} + d_{j,l}}{2}$$
If the two merged clusters contain different numbers of objects, the individual objects do not contribute equally to the distance of the new cluster under WPGMA. The original distances are weighted differently in the calculation (hence: weighted PGMA).
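A small numeric sketch makes the unequal weighting visible. It assumes a cluster of two objects `a` and `b` is merged with a single-object cluster `c`, and looks at the distance to one further object `x`; the function name `wpgma_update` and all values are hypothetical.

```python
def wpgma_update(d_il, d_jl):
    """WPGMA: simple mean of the two old cluster distances."""
    return (d_il + d_jl) / 2

# Hypothetical distances from the objects a, b, c to a further object x.
dist_to_x = {"a": 2.0, "b": 4.0, "c": 9.0}

d_il = (dist_to_x["a"] + dist_to_x["b"]) / 2  # cluster i = {a, b}
d_jl = dist_to_x["c"]                          # cluster j = {c}
d_kl = wpgma_update(d_il, d_jl)

# The same value written as a weighted sum of the original distances:
# a and b each enter with weight 1/4, c with weight 1/2.
weighted_sum = 0.25 * dist_to_x["a"] + 0.25 * dist_to_x["b"] + 0.5 * dist_to_x["c"]

print(d_kl, weighted_sum)  # 6.0 6.0
```

The object in the smaller cluster therefore counts twice as much as each object in the larger one.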
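If UPGMA is used instead, the distance from the new cluster $k = i \cup j$ to another cluster $l$ is calculated as follows, where $|i|$ and $|j|$ denote the numbers of objects in $i$ and $j$:
$$d_{k,l} = \frac{|i| \cdot d_{i,l} + |j| \cdot d_{j,l}}{|i| + |j|}$$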
As a result, all original pairwise distances enter the distance calculation with equal weight, that is, unweighted.
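The size-proportional update can be checked against the plain mean over all original distances. The sketch below assumes a two-object cluster `{a, b}` merged with a single-object cluster `{c}` and one further object `x`; the function name `upgma_update`, its parameter names and the values are hypothetical.

```python
def upgma_update(d_il, d_jl, size_i, size_j):
    """UPGMA: mean of the old cluster distances, weighted by cluster size."""
    return (size_i * d_il + size_j * d_jl) / (size_i + size_j)

# Hypothetical distances from the objects a, b, c to a further object x.
dist_to_x = {"a": 2.0, "b": 4.0, "c": 9.0}

d_il = (dist_to_x["a"] + dist_to_x["b"]) / 2  # cluster i = {a, b}
d_jl = dist_to_x["c"]                          # cluster j = {c}
d_kl = upgma_update(d_il, d_jl, size_i=2, size_j=1)

# Plain mean over all original object distances, each counted once.
direct_mean = sum(dist_to_x.values()) / len(dist_to_x)

print(d_kl, direct_mean)  # 5.0 5.0
```

Both values agree, so the incremental update reproduces the mean of the pairwise distances without revisiting the individual objects.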
The simple mean used by WPGMA gives a weighted result, while the size-proportional mean used by UPGMA gives an unweighted result.
Literature
- R. R. Sokal and C. D. Michener: A statistical method for evaluating systematic relationships. In: University of Kansas Science Bulletin, 38: 1409-1438, 1958.