Sankoff algorithm
The Sankoff algorithm is a dynamic programming algorithm that is used in genetics to simultaneously solve the three sub-problems of alignment , convolution and phylogeny . It folds and aligns two sequences at the same time , so that the free energy of the secondary structures and the costs of the editing operations of the alignment are minimized under an energy model . The algorithm uses the dynamic programming method for this purpose . The runtime of the algorithm is in O and the memory requirement in .
Case distinction
The recurrences of the algorithm basically implement the following case distinction:
1. A match of two bases
2. An insertion of a base
3. A deletion of a base
4. A match of two base pairs .
Let the two input sequences be , with and , then the simplified basic structure of the recurrences is:
Case 4 ensures that, when folded at the same time, both structures form the same number and nesting of hairpins.
complexity
Let the input be two sequences , with .
The term is in . For all subwords of , all subwords of and in each case distinction two limits, which vary in each case , must be considered.
The memory requirement is in , since all intermediate results for all partial word combinations are stored in a table.
variants
Since running time is problematic in practice, there are variants that do not consider all possible or all possible cases in the case differentiation , but only, for example, only those base pairs that have a certain base pair probability. This then reduces the running time to .
literature
- David Sankoff: Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems . In: SIAM Journal on Applied Mathematics . tape 45 , no. 5 , October 1985, p. 68-82 .