Markov's algorithm

from Wikipedia, the free encyclopedia

The Markov algorithm developed by the Russian mathematician Andrei Markow is an important approach to formalizing the concept of algorithm. In particular, symbolic data processing tasks, such as the conjugation and declination of natural languages, can be solved very efficiently with its help.

definition

Informal description

The Markov algorithm regards the input data of an algorithm as words or sentences from which a result can be determined through translation. The solution principle is based exclusively on the substitution of character strings. Further operations are not available. Similar to the Turing machine , a symbol chain is used as the basic data structure . Although productive systems usually do a nondeterministic processing of such symbol chains, a deterministic behavior can be achieved through special restrictions:

  • If several rules can be used, the order of application must always be clearly defined.
  • If a rule can be applied at several positions in the output word, a priority must always be defined.

The Markov algorithm fulfills the requirements for such a deterministic word calculus. With the help of computability theory one can prove that Markov algorithms are just as powerful as any other algorithms, Turing machines or µ-recursive functions .

Formal definition

Markov's algorithm and natural algorithm are semi-Thue systems whose rules form an ordered set, which in turn breaks down into the following disjoint subsets:

  • terminating rules
  • non-terminating rules

In a Markov algorithm, the word Q can be derived directly from the word P by a rule R under the following conditions:

  • P was created by a non-terminating rule
  • R is the first rule that applies to P.
  • Q is generated by applying R to the leftmost subword of R in P.

The work of the Markov algorithm stops at the word that was generated by a terminating rule or to which no further rule can be applied. In contrast to the post-calculus , only the appropriate parts of the word are used. The substitution of a word pair (P, Q) forms the basis of the Markov algorithm:

  • A given output word is searched for the first occurrence of the word P
  • If P can be found, it is replaced by the word Q

The following special cases of substitution exist:

  • ε ⇒ Q
    The empty word is replaced by a word Q.
  • P ⇒ ε
    A word P is replaced by the empty word.
  • ε ⇒ ε
    The empty word is replaced by itself.

The words to be processed are formed from an alphabet A. The left and right parts of the rules of a Markov algorithm represent words of the alphabet A. The following metacharacters must not be included in the alphabet:

  •   is used as a substitution operator
  • .   indicates terminating rules

Working method

flow chart

Markov algorithm as a flow diagram

A search takes place on the input word to be processed using the left word of the first rule. If this is contained in the input word, a substitution corresponding to the rule is triggered. The input word is searched from left to right. Thus, if the left word occurs multiple times, the leftmost occurrence is always substituted.

If the search described above is unsuccessful, the next rule is applied. If no substitution can be made taking into account all other rules, the algorithm is ended. The application of a terminating rule also leads to its termination. If a non-terminating rule has been used to substitute, the entire process begins again, taking the changed word into account.

Simple case study

In addition to the explanations of the flow chart, a simple case study to explain the working method; In particular, the order in which the rules are applied and the resulting results are clearly illustrated below.

The input word used in the example is:

   A_I_I_I_

In addition, the following rules are defined:

1  01 I->A
2  02 _->B
3  03 AB->_B
4  04 BBBBBBBB->.I_I_I_I_

The following substitutions result (the number of the rule applied has been put in front):

   1.   A_I_I_I_
   1.   A_A_I_I_
   1.   A_A_A_I_
   1.   A_A_A_A_
   2.   ABA_A_A_
   2.   ABABA_A_
   2.   ABABABA_
   2.   ABABABAB
   3.   _BABABAB
   2.   BBABABAB
   3.   BB_BABAB
   2.   BBBBABAB
   3.   BBBB_BAB
   2.   BBBBBBAB
   3.   BBBBBB_B
   2.   BBBBBBBB
   4.   I_I_I_I_

Here the calculation terminates because of the point ( .) in the definition of rule 4.

Application examples

Incrementation and addition

The representation of numbers in the decimal system is not optimal for solving the problem. However, if a simple unary code is used, the algorithm for incrementation or addition consists of only a single rule.

Presentation:

  • the numbers are coded in the form of   1 = I, 2 = II, 3 = III   etc.
  • the addition   1 + 0 + 2 + 4   is I++II+IIII   coded as, for example  

The result is the following solution:

  • ε ⇒ . I
    incrementation
  • + ⇒ ε
    addition

Recognition of correct parentheses

The key to solving this problem is finding and deleting related pairs of brackets. Crossed brackets disappear and their place is taken up by the adjacent characters. Now the brackets of the following pairs are directly adjacent and can easily be found again. In our example it is assumed that the expression in brackets is delimited on both sides by the '$' character.

The result is the following solution:

  • () ⇒ ε
    delete a pair of brackets
  • $$ ⇒ $ .1 $
    All pairs deleted, result is 1
  • (⇒ 0
    ) ⇒ 0
    delete the remaining brackets
  • 00 ⇒ 0
    Deletion of all superfluous zeros

The form shown for solving the task is very simple and understandable. The Markov algorithm offers a solution principle that is well adapted to the problem.

Web links