Markov's algorithm
The Markov algorithm developed by the Russian mathematician Andrei Markow is an important approach to formalizing the concept of algorithm. In particular, symbolic data processing tasks, such as the conjugation and declination of natural languages, can be solved very efficiently with its help.
definition
Informal description
The Markov algorithm regards the input data of an algorithm as words or sentences from which a result can be determined through translation. The solution principle is based exclusively on the substitution of character strings. Further operations are not available. Similar to the Turing machine , a symbol chain is used as the basic data structure . Although productive systems usually do a nondeterministic processing of such symbol chains, a deterministic behavior can be achieved through special restrictions:
- If several rules can be used, the order of application must always be clearly defined.
- If a rule can be applied at several positions in the output word, a priority must always be defined.
The Markov algorithm fulfills the requirements for such a deterministic word calculus. With the help of computability theory one can prove that Markov algorithms are just as powerful as any other algorithms, Turing machines or µ-recursive functions .
Formal definition
Markov's algorithm and natural algorithm are semi-Thue systems whose rules form an ordered set, which in turn breaks down into the following disjoint subsets:
- terminating rules
- non-terminating rules
In a Markov algorithm, the word Q can be derived directly from the word P by a rule R under the following conditions:
- P was created by a non-terminating rule
- R is the first rule that applies to P.
- Q is generated by applying R to the leftmost subword of R in P.
The work of the Markov algorithm stops at the word that was generated by a terminating rule or to which no further rule can be applied. In contrast to the post-calculus , only the appropriate parts of the word are used. The substitution of a word pair (P, Q) forms the basis of the Markov algorithm:
- A given output word is searched for the first occurrence of the word P
- If P can be found, it is replaced by the word Q
The following special cases of substitution exist:
- ε ⇒ Q
The empty word is replaced by a word Q. - P ⇒ ε
A word P is replaced by the empty word. - ε ⇒ ε
The empty word is replaced by itself.
The words to be processed are formed from an alphabet A. The left and right parts of the rules of a Markov algorithm represent words of the alphabet A. The following metacharacters must not be included in the alphabet:
- ⇒ is used as a substitution operator
- . indicates terminating rules
Working method
flow chart
A search takes place on the input word to be processed using the left word of the first rule. If this is contained in the input word, a substitution corresponding to the rule is triggered. The input word is searched from left to right. Thus, if the left word occurs multiple times, the leftmost occurrence is always substituted.
If the search described above is unsuccessful, the next rule is applied. If no substitution can be made taking into account all other rules, the algorithm is ended. The application of a terminating rule also leads to its termination. If a non-terminating rule has been used to substitute, the entire process begins again, taking the changed word into account.
Simple case study
In addition to the explanations of the flow chart, a simple case study to explain the working method; In particular, the order in which the rules are applied and the resulting results are clearly illustrated below.
The input word used in the example is:
A_I_I_I_
In addition, the following rules are defined:
1 01 I->A
2 02 _->B
3 03 AB->_B
4 04 BBBBBBBB->.I_I_I_I_
The following substitutions result (the number of the rule applied has been put in front):
1. A_I_I_I_
1. A_A_I_I_
1. A_A_A_I_
1. A_A_A_A_
2. ABA_A_A_
2. ABABA_A_
2. ABABABA_
2. ABABABAB
3. _BABABAB
2. BBABABAB
3. BB_BABAB
2. BBBBABAB
3. BBBB_BAB
2. BBBBBBAB
3. BBBBBB_B
2. BBBBBBBB
4. I_I_I_I_
Here the calculation terminates because of the point ( .
) in the definition of rule 4.
Application examples
Incrementation and addition
The representation of numbers in the decimal system is not optimal for solving the problem. However, if a simple unary code is used, the algorithm for incrementation or addition consists of only a single rule.
Presentation:
- the numbers are coded in the form of
1 = I, 2 = II, 3 = III
etc. - the addition
1 + 0 + 2 + 4
isI++II+IIII
coded as, for example
The result is the following solution:
- ε ⇒ . I
incrementation - + ⇒ ε
addition
Recognition of correct parentheses
The key to solving this problem is finding and deleting related pairs of brackets. Crossed brackets disappear and their place is taken up by the adjacent characters. Now the brackets of the following pairs are directly adjacent and can easily be found again. In our example it is assumed that the expression in brackets is delimited on both sides by the '$' character.
The result is the following solution:
- () ⇒ ε
delete a pair of brackets - $$ ⇒ $ .1 $
All pairs deleted, result is 1 - (⇒ 0
) ⇒ 0
delete the remaining brackets - 00 ⇒ 0
Deletion of all superfluous zeros
The form shown for solving the task is very simple and understandable. The Markov algorithm offers a solution principle that is well adapted to the problem.
Web links
- Markov algorithm interpreter (English)
- Machines and environments (PDF; 1.1 MB)
- Markov algorithm interpreter at Rosetta Code