Consensus sequence

From Wikipedia, the free encyclopedia

A consensus sequence is the sequence of nucleotides or amino acids that minimizes the total deviation from a given set of corresponding sequences. Its exact form depends on the chosen distance measure, such as the Hamming distance or the Levenshtein distance.
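The definition can be made concrete with an exhaustive search: try every candidate sequence over the alphabet and keep the one with the smallest summed distance to the given sequences. The Python sketch below does this for both the Hamming and the Levenshtein distance. The function names are illustrative, and restricting candidates to the lengths of the input sequences is a simplifying assumption (a Levenshtein-optimal consensus may in general have a different length). The search is exponential in the sequence length, so it is only suitable for toy examples.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x from a
                           cur[j - 1] + 1,            # insert y into a
                           prev[j - 1] + (x != y)))   # substitute, or match for free
        prev = cur
    return prev[-1]


def brute_force_consensus(seqs, distance, alphabet="ACGT", lengths=None):
    """Return (candidate, total) with the smallest summed distance to seqs.

    Hypothetical helper: enumerates every string over `alphabet` whose length
    is in `lengths` (default: the lengths of the input sequences). Among
    equally good candidates, the first one enumerated is kept.
    """
    if lengths is None:
        lengths = {len(s) for s in seqs}
    best, best_total = None, None
    for n in sorted(lengths):
        for letters in product(alphabet, repeat=n):
            cand = "".join(letters)
            total = sum(distance(cand, s) for s in seqs)
            if best_total is None or total < best_total:
                best, best_total = cand, total
    return best, best_total


print(brute_force_consensus(["ACGT", "ACGA", "TCGT"], hamming))
print(brute_force_consensus(["ACGT", "ACG", "CGT"], levenshtein))
```

Both calls return `("ACGT", 2)`. Hamming distance is only defined for sequences of equal length, whereas Levenshtein distance also accepts inputs of different lengths, as in the second call. For equal-length sequences under Hamming distance, the total splits into independent per-position terms, so choosing the most frequent symbol at each position already yields an optimal consensus.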

Constructing a consensus sequence usually assumes that the given sequences share a common evolutionary origin or represent a sequence motif with a specific biological function. Since such sequences often differ at individual positions, it can also be useful to formulate ambiguous consensus sequences.

For nucleic acids, the symbols of the IUPAC nucleotide nomenclature can be used for this: in addition to the unambiguous base symbols A, C, G, T and U, there are, for example, R for any purine base, Y for any pyrimidine base and N for any nucleotide.
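The mapping from a set of bases to its ambiguity symbol can be written as a lookup table. The Python sketch below covers the full set of IUPAC one-letter nucleotide codes for DNA (U is folded into T). The names `ambiguity_code` and `ambiguous_column_symbol` are hypothetical, and so is the rule used for an ambiguous alignment column: every base that reaches a minimum fraction of the column (`min_fraction`, an arbitrary illustrative threshold) is included in the symbol.

```python
from collections import Counter

# IUPAC nucleotide codes: each symbol stands for a set of DNA bases.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R",   # purine
    frozenset("CT"): "Y",   # pyrimidine
    frozenset("GC"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
    frozenset("CGT"): "B",  # not A
    frozenset("AGT"): "D",  # not C
    frozenset("ACT"): "H",  # not G
    frozenset("ACG"): "V",  # not T
    frozenset("ACGT"): "N",  # any nucleotide
}


def ambiguity_code(bases) -> str:
    """Return the IUPAC symbol for a collection of bases (U is treated as T)."""
    normalized = frozenset(b.upper().replace("U", "T") for b in bases)
    try:
        return IUPAC_CODES[normalized]
    except KeyError:
        raise ValueError(f"not a valid set of DNA bases: {sorted(normalized)}") from None


def ambiguous_column_symbol(column, min_fraction=0.25) -> str:
    """Hypothetical rule: combine every base making up >= min_fraction of the column.

    Non-base characters such as alignment gaps ('-') are ignored.
    """
    counts = Counter(b.upper().replace("U", "T") for b in column if b.upper() in "ACGTU")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no bases")
    chosen = {b for b, c in counts.items() if c / total >= min_fraction}
    return ambiguity_code(chosen)
```

For example, `ambiguous_column_symbol("AAGG")` gives `"R"` and `ambiguous_column_symbol("CCTA")` gives `"H"`. Raising `min_fraction` makes the consensus less ambiguous, at the cost of hiding rarer variants.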

Consensus sequences are usually constructed heuristically from a multiple sequence alignment (MSA). In the simplest case, each position of the consensus sequence takes the element that occurs most frequently in the corresponding column of the MSA.
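The simple column-wise majority rule can be sketched in a few lines of Python. The alignment is assumed to be a list of equal-length strings with `-` marking gaps. How ties are broken and whether gap-dominated columns are dropped are not fixed by the rule itself; the choices below (prefer a residue over a gap, then the alphabetically smallest symbol, and skip columns where the gap wins) are illustrative assumptions, and the function name is hypothetical.

```python
from collections import Counter


def majority_consensus(msa, gap="-", drop_gap_columns=True):
    """Column-wise plurality consensus of a multiple sequence alignment."""
    if not msa:
        raise ValueError("empty alignment")
    width = len(msa[0])
    if any(len(s) != width for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*msa):
        counts = Counter(column)
        # Highest count first; on ties prefer a residue over the gap,
        # then the alphabetically smallest symbol.
        symbol = min(counts, key=lambda c: (-counts[c], c == gap, c))
        if symbol == gap and drop_gap_columns:
            continue  # gap-dominated column contributes nothing
        consensus.append(symbol)
    return "".join(consensus)


msa = ["ACG-T",
       "ACGAT",
       "TCG-T",
       "AGG-T"]
print(majority_consensus(msa))                          # ACGT
print(majority_consensus(msa, drop_gap_columns=False))  # ACG-T
```

For a gap-free alignment this rule picks, in every column, a symbol that minimizes that column's Hamming contribution, so the result is a consensus in the sense of the minimal total Hamming distance; with gaps, or under other distance measures, it remains a heuristic.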
