LL(k) grammar
This article assumes previous knowledge of theoretical computer science and compiler construction.
An LL(k) grammar (also called a weak LL(k) grammar, to distinguish it from an LF(k) grammar) is a special context-free grammar that forms the basis of an LL(k) parser.
A context-free grammar is called an LL(k) grammar for a natural number k if each derivation step is uniquely determined by the next k symbols of the input (the lookahead). In other words, the next k input symbols always settle unambiguously which nonterminal symbol is expanded next and with which rule.
In general, the larger k is chosen, the larger the class of languages that can be described. The expressive power of context-free grammars is nevertheless never reached: there are context-free languages that are not generated by an LL(k) grammar for any k.
DPDA stands for deterministic pushdown automaton. These automata recognize exactly the deterministic context-free languages, which properly include the languages generated by LL(k) grammars.
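The idea that the lookahead alone selects the production can be sketched with a small predictive parser. The toy grammar S → a S b | c, the end marker "$" and the function name `accepts` are assumptions chosen for illustration; they do not come from the article.

```python
def accepts(tokens):
    """Return True if tokens spell a word of the toy grammar S -> a S b | c."""
    pos = 0

    def peek():
        # The single lookahead symbol; "$" marks the end of the input.
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {peek()!r}")
        pos += 1

    def parse_s():
        if peek() == "a":        # lookahead 'a' selects S -> a S b
            expect("a")
            parse_s()
            expect("b")
        elif peek() == "c":      # lookahead 'c' selects S -> c
            expect("c")
        else:                    # no production of S fits this lookahead
            raise SyntaxError(f"unexpected {peek()!r}")

    try:
        parse_s()
        expect("$")              # the whole input must be consumed
        return True
    except SyntaxError:
        return False


print(accepts(list("aacbb")))    # True
print(accepts(list("acbb")))     # False
```

Because the two alternatives of S start with different terminals, one symbol of lookahead is enough here and the parser never has to backtrack.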
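Formal definition of LL(k) grammar
A context-free grammar $G = (N, \Sigma, P, S)$ is an LL(k) grammar if and only if for all leftmost derivations of the form
$$S \Rightarrow_l^* wA\gamma \Rightarrow_l w\alpha\gamma \Rightarrow_l^* wx$$
$$S \Rightarrow_l^* wA\gamma \Rightarrow_l w\beta\gamma \Rightarrow_l^* wy$$
with $(A \to \alpha), (A \to \beta) \in P$ and $\mathrm{FIRST}_k(x) = \mathrm{FIRST}_k(y)$, the following applies: $\alpha = \beta$.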
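The function $\mathrm{FIRST}_k$ used in the definition to determine the FIRST sets is defined, for a string $\alpha$ of terminal and nonterminal symbols over the terminal alphabet $\Sigma$, as follows:
$$\mathrm{FIRST}_k(\alpha) = \{\, w \in \Sigma^* \mid (\alpha \Rightarrow^* w \text{ and } |w| < k) \text{ or } (\alpha \Rightarrow^* wx \text{ for some } x \in \Sigma^* \text{ and } |w| = k) \,\}$$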
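To make the FIRST sets concrete, the following Python sketch computes FIRST_k for every nonterminal by fixed-point iteration. The grammar encoding (a dict mapping each nonterminal to a list of right-hand sides given as tuples), the function names and the toy grammar S → A b, A → a A | ε are assumptions made for illustration. Terminal strings are represented as tuples, so the empty tuple stands for ε.

```python
def k_concat(left, right, k):
    """Concatenate every pair of strings and keep only the first k symbols."""
    return {(u + v)[:k] for u in left for v in right}


def first_of_string(symbols, first, k):
    """FIRST_k of a sequence of grammar symbols, given FIRST_k of the nonterminals."""
    result = {()}                               # FIRST_k of the empty string
    for sym in symbols:
        # A symbol without an entry in `first` is treated as a terminal.
        result = k_concat(result, first.get(sym, {(sym,)}), k)
    return result


def first_k_sets(grammar, k):
    """Compute FIRST_k for all nonterminals as a least fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first, k)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


# Hypothetical toy grammar: S -> A b, A -> a A | epsilon
GRAMMAR = {
    "S": [("A", "b")],
    "A": [("a", "A"), ()],
}

print(first_k_sets(GRAMMAR, 1)["S"])   # the two strings ('a',) and ('b',)
print(first_k_sets(GRAMMAR, 2)["S"])   # ('b',), ('a', 'b') and ('a', 'a')
```

The iteration starts from empty sets and only ever adds strings, so it terminates once no production contributes a new prefix of length at most k.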
Application
Current LL parsers usually use a lookahead of only 1, so the following considerations apply to them.
In practice, checking directly whether a given grammar satisfies the definition of an LL(k) grammar requires great effort. Instead, a modified approach is used.
A context-free grammar is an LL(k) grammar if and only if, for all nonterminal symbols $A$, for all productions $A \to \alpha$ and $A \to \beta$ with $\alpha \neq \beta$, and for all leftmost derivations $S \Rightarrow_l^* wA\gamma$, the following holds: $\mathrm{FIRST}_k(\alpha\gamma) \cap \mathrm{FIRST}_k(\beta\gamma) = \emptyset$.
Explanation: The start symbol $S$ of the context-free grammar has been expanded (possibly in several steps) to $wA\gamma$. In a leftmost derivation, the nonterminal symbol $A$ is replaced next. The grammar contains two different rules for doing this, $A \to \alpha$ and $A \to \beta$. Which rule is used to expand $A$ is decided by computing $\mathrm{FIRST}_k(\alpha\gamma)$ and $\mathrm{FIRST}_k(\beta\gamma)$. For the decision to be unambiguous, the two sets must be disjoint.
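In general, however, $\mathrm{FIRST}_k(\alpha\gamma)$ depends on the right context $\gamma$ (namely when $\alpha$ can derive a terminal string shorter than $k$ symbols; for $k = 1$ this means $\alpha \Rightarrow^* \varepsilon$). The aim is to make the decision from the productions alone, that is, from $\mathrm{FIRST}_k(\alpha)$ and from the strings that can follow an occurrence of $A$. For this purpose the function $\mathrm{FOLLOW}_k$ is defined, which computes the set of all strings that can follow a nonterminal:
$$\mathrm{FOLLOW}_k(A) = \{\, w \in \Sigma^* \mid S \Rightarrow^* uAv \text{ and } w \in \mathrm{FIRST}_k(v) \,\}$$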
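The set of following symbols can be computed from the productions with a second fixed-point iteration. The sketch below does this for k = 1; the grammar encoding, the function names and the toy grammar S → A B, A → a A | ε, B → b | ε are illustrative assumptions. The empty tuple in a FOLLOW set means that the nonterminal can appear at the very end of a sentential form.

```python
def k_concat(left, right, k=1):
    """Concatenate every pair of strings and keep only the first k symbols."""
    return {(u + v)[:k] for u in left for v in right}


def first_of_string(symbols, first, k=1):
    """FIRST_k of a sequence of symbols; unknown symbols count as terminals."""
    result = {()}
    for sym in symbols:
        result = k_concat(result, first.get(sym, {(sym,)}), k)
    return result


def first_sets(grammar, k=1):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first, k)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start, k=1):
    """FOLLOW_k of every nonterminal, computed as a least fixed point."""
    first = first_sets(grammar, k)
    follow = {nt: set() for nt in grammar}
    follow[start].add(())                       # nothing follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:      # terminals have no FOLLOW set
                        continue
                    # What follows sym: the rest of the rule, then FOLLOW of nt.
                    rest = first_of_string(rhs[i + 1:], first, k)
                    new = k_concat(rest, follow[nt], k)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


# Hypothetical toy grammar: S -> A B, A -> a A | epsilon, B -> b | epsilon
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("a", "A"), ()],
    "B": [("b",), ()],
}

print(follow_sets(GRAMMAR, "S")["A"])   # the strings ('b',) and ()
```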
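With this, the condition stated at the beginning can be reformulated:
A reduced context-free grammar is an LL(1) grammar if and only if the following applies to all nonterminal symbols $A$ and to all productions $A \to \alpha$ and $A \to \beta$ with $\alpha \neq \beta$:
$$\mathrm{FIRST}_1(\alpha\,\mathrm{FOLLOW}_1(A)) \cap \mathrm{FIRST}_1(\beta\,\mathrm{FOLLOW}_1(A)) = \emptyset$$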
Warning: this theorem cannot be applied to the cases $k > 1$.
The set $\mathrm{FIRST}_1(\alpha\,\mathrm{FOLLOW}_1(A))$ computed for a production $A \to \alpha$ is called the lookahead set of that production.
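Why the FOLLOW-based test is limited to a lookahead of 1 can be illustrated with a standard textbook grammar, which is an assumption of this example and not taken from the article: $S \to aAaa \mid bAba$, $A \to b \mid \varepsilon$. For $k = 2$ the FOLLOW-based sets of the two productions of $A$ overlap, although in each individual right context the two productions can be told apart, so the grammar is still LL(2).

```latex
\begin{align*}
\mathrm{FOLLOW}_2(A) &= \{aa,\ ba\} \\
\mathrm{FIRST}_2(b\,\mathrm{FOLLOW}_2(A)) &= \{ba,\ bb\} \\
\mathrm{FIRST}_2(\varepsilon\,\mathrm{FOLLOW}_2(A)) &= \{aa,\ ba\} \\
\{ba,\ bb\} \cap \{aa,\ ba\} &= \{ba\} \neq \emptyset \\[1ex]
\text{context } \gamma = aa:\quad
  \mathrm{FIRST}_2(b\,aa) \cap \mathrm{FIRST}_2(aa) &= \{ba\} \cap \{aa\} = \emptyset \\
\text{context } \gamma = ba:\quad
  \mathrm{FIRST}_2(b\,ba) \cap \mathrm{FIRST}_2(ba) &= \{bb\} \cap \{ba\} = \emptyset
\end{align*}
```

The overlap arises only because $\mathrm{FOLLOW}_2(A)$ mixes the two right contexts $aa$ and $ba$, which never occur together in one derivation.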
Example
The following grammar is checked to determine whether it is an LL(1) grammar. For this to be the case, the lookahead sets of all productions with the same left-hand side must be pairwise disjoint.
- and the set of productions is:
First, the FIRST and FOLLOW sets of the nonterminal symbols are determined, since they are needed to compute the lookahead sets.
| E | E' | T | T' | F |
The lookahead sets are now compared for all productions with the same left-hand side.
First of all for the two productions and
Next up for the two productions and
Last for the two productions and
Since all intersections considered are empty, the grammar is an LL(1) grammar.
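The same kind of check can be carried out mechanically. The sketch below assumes the classic expression grammar over the nonterminals E, E', T, T' and F (E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id). This grammar, its terminal symbols and all function names are assumptions made for illustration and are not confirmed by the text above. The empty tuple stands for ε in FIRST sets and for the end of the input in FOLLOW and lookahead sets.

```python
from itertools import combinations


def k_concat(left, right):
    """Concatenate every pair of strings and keep only the first symbol (k = 1)."""
    return {(u + v)[:1] for u in left for v in right}


def first_of_string(symbols, first):
    result = {()}
    for sym in symbols:
        result = k_concat(result, first.get(sym, {(sym,)}))
    return result


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(())
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym in grammar:
                        new = k_concat(first_of_string(rhs[i + 1:], first), follow[nt])
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True
    return follow


def lookahead_sets(grammar, start):
    """Map each production (A, alpha) to FIRST_1(alpha FOLLOW_1(A))."""
    first = first_sets(grammar)
    follow = follow_sets(grammar, start, first)
    return {(nt, rhs): k_concat(first_of_string(rhs, first), follow[nt])
            for nt, alternatives in grammar.items() for rhs in alternatives}


def is_ll1(grammar, start):
    """True if productions with the same left-hand side have disjoint lookahead sets."""
    la = lookahead_sets(grammar, start)
    return all(la[(nt, r1)].isdisjoint(la[(nt, r2)])
               for nt, alternatives in grammar.items()
               for r1, r2 in combinations(alternatives, 2))


# Assumed grammar (not taken from the article): classic arithmetic expressions.
EXPR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}

for (nt, rhs), la in lookahead_sets(EXPR, "E").items():
    print(nt, "->", " ".join(rhs) or "epsilon", ":", la)
print(is_ll1(EXPR, "E"))   # True
```

For a grammar such as S → a | a b, both productions have the lookahead set containing only a, so `is_ll1` returns False.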
Literature
- Donald E. Knuth: Top-down syntax analysis. In: Acta Informatica 1, 1971, ISSN 0001-5903, pp. 79–110 (reprint of an extended version in: Donald E. Knuth: Selected Papers on Computer Languages. Center for the Study of Language and Information, Stanford CA 2003, ISBN 1-575-86381-2 (CSLI Lecture Notes 139), chapter 14).
- Andreas Kunert: LR(k) analysis for pragmatists