Viterbi algorithm

The Viterbi algorithm is an algorithm of dynamic programming to determine the most likely sequence of hidden states at a given Hidden Markov Model (HMM) and an observed sequence of symbols. This sequence of states is also known as the Viterbi path .

It was developed by Andrew J. Viterbi in 1967 for the decoding of convolutional codes ; it fell off as a by-product in the analysis of the error probability of convolutional codes. In 1972 GD Forney derived the optimum receiver for distorted and disturbed channels from this. The Viterbi algorithm is used today, for example, in cell phones or wireless LANs to correct errors in radio transmission , as well as in hard drives , since transmission errors also occur when recording on magnetic disks.

The algorithm is widely used in communications engineering and computer science : information theory , bioinformatics , speech recognition and computational linguistics often use the Viterbi algorithm.

Markov model

Given is an HMM with ${\ displaystyle \ lambda = (S; V; A; B; \ pi)}$

${\ displaystyle S}$ - set of hidden states
${\ displaystyle V}$ - Alphabet of observable symbols (emissions)
${\ displaystyle A}$ - state transition matrix
${\ displaystyle B}$ - observation matrix
${\ displaystyle \ pi}$ - initial probability distribution

Task

Let be the observed sequence of symbols. The most likely sequence of states is to be calculated. That is, the sequence of hidden states that maximizes the value of among all sequences of length , that is the probability that the model ran through the states when the output was generated . ${\ displaystyle {\ boldsymbol {o}} = o_ {1} o_ {2} \ ldots o_ {T} \ in V ^ {*}}$ ${\ displaystyle {\ boldsymbol {q}} ^ {*} = q_ {1} ^ {*} q_ {2} ^ {*} \ ldots q_ {T} ^ {*} \ in S ^ {T}}$ ${\ displaystyle {\ boldsymbol {q}}}$ ${\ displaystyle T}$ ${\ displaystyle P ({\ varvec {q}} | {\ varvec {o}}; \ lambda)}$ ${\ displaystyle \ lambda}$ ${\ displaystyle {\ boldsymbol {o}}}$ ${\ displaystyle {\ boldsymbol {q}}}$

According to the calculation rules for conditional probabilities :

{\ displaystyle P ({\ varvec {q}} | {\ varvec {o}}; \ lambda) = {\ frac {P ({\ varvec {o}}; {\ varvec {q}} | \ lambda) } {P ({\ boldsymbol {o}} | \ lambda)}}}

In addition, since it does not depend on, the following relationship arises: ${\ displaystyle P ({\ boldsymbol {o}} | \ lambda)}$ ${\ displaystyle {\ boldsymbol {q}}}$

{\ displaystyle P ({\ varvec {o}}; {\ varvec {q}} ^ {*} | \ lambda) = \ max _ {{\ varvec {q}} \ in S ^ {T}} P ( {\ varvec {o}}; {\ varvec {q}} | \ lambda)}

Two different types of variables - and - are used for the actual calculation : ${\ displaystyle \ vartheta _ {t} (i)}$ ${\ displaystyle \ psi _ {t} (i)}$

In the maximum is joint probability stored at the time in the observation of the prefix by a state sequence of length to be run and in condition to end: ${\ displaystyle \ vartheta _ {t} (i)}$ ${\ displaystyle 1 \ leq t \ leq T}$ ${\ displaystyle o_ {1} o_ {2} \ ldots o_ {t}}$ ${\ displaystyle t}$ ${\ displaystyle s_ {i} \ in S}$

{\ displaystyle \ vartheta _ {t} (i) = \ max _ {{\ boldsymbol {q}} \ in S ^ {t} \ atop q_ {t} = s_ {i}} P (o_ {1} o_ {2} \ ldots o_ {t}; q_ {1} q_ {2} \ ldots q_ {t} | \ lambda)}

The variable, on the other hand, notes for each point in time and each state which previous state was involved in generating the maximum. ${\ displaystyle \ psi _ {t} (i)}$

algorithm

The variables and can be determined recursively: ${\ displaystyle \ vartheta _ {t} (i)}$ ${\ displaystyle \ psi _ {t} (i)}$

initialization: ${\ displaystyle \ vartheta _ {1} (i) = \ pi _ {i} \ cdot b_ {i} (o_ {1}), \ \ psi _ {1} (i) = 0, \ qquad 1 \ leq i \ leq \ left | S \ right |}$

Recursion

For calculate ${\ displaystyle \ 1 <t \ leq T}$

{\ displaystyle {\ begin {aligned} \ vartheta _ {t} (i) & = b_ {i} (o_ {t}) \ \ cdot \ \ max _ {1 \ leq j \ leq \ left | S \ right |} (a_ {ji} \ \ cdot \ \ vartheta _ {t-1} (j)), \ qquad 1 \ leq i \ leq \ left | S \ right | \\\ psi _ {t} (i) & = {\ underset {1 \ leq j \ leq \ left | S \ right |} {\ operatorname {argmax}}} \ (a_ {ji} \ \ cdot \ \ vartheta _ {t-1} (j)) , \ qquad 1 \ leq i \ leq \ left | S \ right | \ end {aligned}}}

Termination: ${\ displaystyle {\ begin {aligned} P ({\ varvec {o}}; {\ varvec {q}} ^ {*} | \ lambda) & = \ max _ {1 \ leq j \ leq \ left | S \ right |} \ vartheta _ {T} (j) \\ q_ {T} ^ {*} & = {\ underset {1 \ leq j \ leq \ left | S \ right |} {\ operatorname {argmax}} } \ \ vartheta _ {T} (j) \ end {aligned}}}$

Path determination: ${\ displaystyle q_ {t} ^ {*} = \ psi _ {t + 1} (q_ {t + 1} ^ {*}), \ qquad 1 \ leq t <T}$

complexity

The table of the required memory, the matrix of the is of the same size. Alternatives are used to optimize each cell of the two matrices , so the running time is in . ${\ displaystyle \ vartheta _ {t} (i)}$ ${\ displaystyle O (\ left | S \ right | \ cdot T)}$ ${\ displaystyle \ psi _ {t} (i)}$ ${\ displaystyle \ left | S \ right |}$ ${\ displaystyle O (\ left | S \ right | ^ {2} T)}$

In order to halve the storage space, the path can alternatively also be determined after the termination by backtracking in the matrix of all - i.e. without the additional variables . However, since the calculation does not cause any additional effort in practice , the required computing time is slightly longer with the backtracking approach. ${\ displaystyle {\ boldsymbol {q}} ^ {*}}$ ${\ displaystyle \ vartheta _ {t} (i)}$ ${\ displaystyle \ psi _ {t} (i)}$ ${\ displaystyle \ psi _ {t} (i)}$

Applications

The Viterbi algorithm is the optimal algorithm for decoding convolutional codes in terms of the block error rate (maximum likelihood sequence estimation). The decoding algorithm that is optimal in terms of the symbol error rate is the BCJR algorithm .

As you can see from the description of the algorithm, it can be used almost anywhere to recognize patterns . This is a broad field, as living beings have to constantly interpret sensory stimuli and classify these signals from what has already been learned. The Viterbi algorithm does exactly that and is therefore an important component of artificial intelligence .

The algorithm plays an important role in bioinformatics, because the Viterbi algorithm can be used, among other things, to infer possible hidden states from the actual sequence of a DNA segment. For example, it can be examined whether a given sequence is likely to be a specific structural motif ( CpG island , promoter , ...) or not. The advantage of this recursive algorithm is the effort that increases linearly with the sequence length in contrast to the exponential effort of the underlying hidden Markov model.

literature

A. Viterbi: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm . In: IEEE Transactions on Information Theory . tape 13 , no. 2 , 1967, ISSN 0018-9448 , pp. 260-269 , doi : 10.1109 / TIT.1967.1054010 ( IEEE Xplore ).
Durbin et al .: Biological sequence analysis . Cambridge, 2006, ISBN 0-521-62971-3 , pp. 56 .
Forney Jr., GD: The Viterbi Algorithm. In: In Proceedings of the IEEE . tape 61 , no. 3 , 1973, p. 268-278 , doi : 10.1109 / PROC.1973.9030 ( IEEE Xplore ).

Web links

Declaration of the Viterbi algorithm for convolutional codes (English)
Andrew J. Viterbi: Viterbi algorithm . In: Scholarpedia . (English, including references)
EG Schukat-Talamazzini: Special pattern analysis systems (PDF, 1.3 MB) Lecture in WS 2012/13 at the University of Jena. Chapter 5 Slide 39 ff.