Cocke-Younger-Kasami algorithm

from Wikipedia, the free encyclopedia

The Cocke-Younger-Kasami algorithm ( CYK algorithm ) is an algorithm from the field of theoretical computer science . It can be used to determine whether a word belongs to a certain context-free language . In technical language, this is known as solving the word problem for context-free languages. With the help of backtracking the parse tree or the parse trees of a given word of the language can be constructed. In order to use the algorithm, a grammar in Chomsky normal form must be available for the given language . The algorithm developed independently in the 1960s by Itiroo Sakai , John Cocke , Tadao Kasami , Jacob Schwartz and Daniel Younger uses the principle of dynamic programming .

description

This description follows Hopcroft / Ullman (1996).

As input, the algorithm receives a context-free grammar in Chomsky normal form and the word to be checked . Now, for each subword ( i.e .: starts at the index and has the length ) the set of nonterminals that generate, denoted by .

According to the principle of dynamic programming, those for the smallest partial words of are first calculated, stored and then used for the efficient calculation of the next larger partial words. The smallest partial words are single letters. Since the context-free grammar is given in Chomsky normal form, each letter can only be derived from a nonterminal in exactly one step.

A nonterminal of a grammar in Chomsky normal form cannot be derived to several terminals in one step. Therefore, a partial word that contains more than one character can be generated by just one rule . Since nonterminals cannot generate the empty word (ε) , must generate the left and right parts of . It follows:

In other words: can generate, if it can be derived according to the production rules on and and in turn on and derived, that is

.

The word problem can now easily be decided: can be generated by the grammar if and only if applies. In are all the variables that can generate the partial word that starts with the first letter and has the length , i.e. the whole word.

algorithm

The following algorithm results from the description:

Für i = 1 ... n
  Für jede Produktion 
    Falls r = 
      Setze 
Für j = 2 ... n
  Für i = 1 ... n - j + 1
    Für k = 1 ... j - 1
      Setze 
Falls , stoppe und gib "w wird von G erzeugt" aus
Stoppe und gib "w wird nicht von G erzeugt" aus

example

The question is whether the word can be generated through grammar . The productions of grammar are:

The algorithm can be carried out using a table. The -th column and -th row store the number of non-terminal symbols from which the partial word can be derived, which starts with the index and has the length .

1 2 3 4th 5 6th
1 {A} {B} {C} {A} {B} {D}
2 {S} {} {} {S} {}
3 {} {} {} {U}
4th {} {} {S}
5 {} {}
6th {S}

There , the given word can be derived from under the grammar .

A left derivation of the word would therefore be:

=

So is a word of language .

complexity

The algorithm decides in time whether a word of length is in the language (see Landau symbols for describing the notation). This requires storage space in the order of magnitude .

literature

  • Itiroo Sakai: Syntax in universal translation . In: 1961 International Conference on Machine Translation of Languages ​​and Applied Language Analysis . Her Majesty's Stationery Office, London 1962, p. 593–608 ( mt-archive.info [PDF]).
  • Tadao Kasami: An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages . Air Force Cambridge Research Lab, Bedford June 11, 1965 ( Scientific report AFCRL-65-758 ).
  • Daniel H. Younger: Recognition and parsing of context-free languages ​​in time . In: Information and Control . tape 10 , no. 2 , 1967, p. 189-208 .
  • John Cocke , Jacob T. Schwartz : Programming languages ​​and their compilers. Preliminary notes . Courant Institute of Mathematical Sciences of New York University, New York 1970.
  • John E. Hopcroft , Jeffrey D. Ullman : Introduction to Automata Theory, Formal Languages, and Complexity Theory . 3. Edition. Addison-Wesley, Bonn 1996, p. 148-149 (1st reprint).
  • Dick Grune, Ceriel JH Jacobs: Parsing Techniques. A Practical Guide . 1st edition. Ellis Horwood, New York 1990, ISBN 0-13-651431-6 , pp. 81-104 ( [1] [PDF; 1.9 MB ]).

Web links