Chomsky normal form

from Wikipedia, the free encyclopedia

The Chomsky normal form (abbr .: CNF ) is a normal form for context-free grammars in theoretical computer science . It is named after the linguist Noam Chomsky and is used in the CYK algorithm . A context-free grammar in Chomsky normal form has a simple structure of the production rules and also fulfills the properties of context-sensitive grammars .

Every context-free language has a grammar in Chomsky normal form. Therefore, a Chomsky normal form can be constructed from any context-free grammar that produces the same language. The grammar is then also called a Chomsky normal form of context-free grammar .

Another normal form for context-free grammars is the Greibach normal form . An extension of the Chomsky normal form of context-sensitive grammars , the Kuroda normal form . The Chomsky normal form is easy due to the same shortcut with the conjunctiva Normal Form (Engl. Conjunctive normal form) confused.

definition

A formal grammar is in Chomsky normal form when each production from has one of the following forms:

wherein , and non-terminal symbols of are and a terminal symbol of is. is the start symbol and the empty word . If the production is part of the grammar, then it can not be on the right side of a production.

If you allow any number of non-terminal symbols instead of two on the right-hand side during the first production, then one speaks of a weak Chomsky normal form .

Construction of a Chomsky normal form

If a context-free grammar is available, a grammar in Chomsky normal form can be generated step by step , which generates the same language:

except treat
If the grammar contains the rule , a new start symbol for is introduced. The new grammar then receives the rules and . This ensures that the grammar still allows the empty word and the original start symbol can still be used on the right.
Generate a weak Chomsky normal form
A non-terminal symbol is assigned to each terminal symbol . On the right side of each production, all terminal symbols are replaced by the corresponding non-terminal symbol . Finally, all productions are added to the grammar.
Replace right sides with more than two non-terminals
If there are more than two non-terminals on the right side of a production, two adjacent non-terminals are replaced by a new non-terminal . The production is added to the grammar. This is repeated until there is no more production with more than two non-terminals.
- Remove productions
Cross out the rules , except (if any).
If there was previously exactly one production on the left-hand side, delete that everywhere on the right-hand side of the productions, because it cannot be derived to a terminal.
If there were previously several productions on the left, add a rule for each rule that contains one on the right in which it was deleted, because the case in which it is derived as an empty word must be considered was or maybe not. The rule is then supplemented with the rule , for example .
From therefore is:
Remove chain rules (productions of form A → B)
If you have a chain rule, i. H. a production of the form , removed, a new production is added for each existing production of the form , if this does not result in a chain rule that has already been removed. is any word here; However, the previous changes ensure that there is either exactly one terminal symbol or a word made up of exactly two non-terminal symbols.

example

It applies, the grammar above the alphabet with the rules

to bring into Chomsky normal form.  

1. Add a new start variable

  2. Remove transitions

  A new rule has been created, which in turn must be treated in the same way:

  3. Remove all unity rules. These are and .

  after that

  and finally

  4. Longer concatenations are not allowed, so we introduce an additional variable and replace it with the rule and :

  Now only the rule and . Therefore, another variable is used which, together with the rule , can replace the terminal symbol in the rules mentioned.

  Thus the grammar is converted to Chomsky normal form.

swell

  • Grzegorz Rozenberg, Arto Salomaa: Handbook of Formal Languages . Volume 1: Word, Language, Grammar. Springer-Verlag, Berlin et al. 1997, ISBN 3-540-60420-0 , pp. 124-125