Sankoff algorithm

from Wikipedia, the free encyclopedia
An RNA secondary structure, i.e. H. a fold of an RNA sequence.

The Sankoff algorithm is a dynamic programming algorithm that is used in genetics to simultaneously solve the three sub-problems of alignment , convolution and phylogeny . It folds and aligns two sequences at the same time , so that the free energy of the secondary structures and the costs of the editing operations of the alignment are minimized under an energy model . The algorithm uses the dynamic programming method for this purpose . The runtime of the algorithm is in O and the memory requirement in .

Case distinction

The recurrences of the algorithm basically implement the following case distinction:

1. A match of two bases

2. An insertion of a base

3. A deletion of a base

4. A match of two base pairs .

Let the two input sequences be , with and , then the simplified basic structure of the recurrences is:

Case 4 ensures that, when folded at the same time, both structures form the same number and nesting of hairpins.

complexity

Let the input be two sequences , with .

The term is in . For all subwords of , all subwords of and in each case distinction two limits, which vary in each case , must be considered.

The memory requirement is in , since all intermediate results for all partial word combinations are stored in a table.

variants

Since running time is problematic in practice, there are variants that do not consider all possible or all possible cases in the case differentiation , but only, for example, only those base pairs that have a certain base pair probability. This then reduces the running time to .

literature

  • David Sankoff: Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems . In: SIAM Journal on Applied Mathematics . tape 45 , no. 5 , October 1985, p. 68-82 .