LL (k) grammar

This article assumes previous knowledge of theoretical computer science and compiler construction .

A LL (k) grammar (in contrast to LF (k) grammar also weak LL (k) grammar ) is a special context-free grammar which the basis of a LL (k) parser forms.

A context-free grammar is called LL (k) grammar for a natural number k , if each derivation step is uniquely determined by the next k symbols of the input ( lookahead ). This means that the question of which non-terminal symbol is to be expanded next with which rule can be clearly determined with the aid of the next k symbols of the input.

In general, the larger k is chosen, the more powerful the language class becomes, whereby the expressiveness of context-free grammars is never reached. This means that there are context-free languages that are not generated by an LL (k) grammar for any k .

${\ displaystyle {\ mathcal {L}} (\ mathrm {LL} (1)) \ subsetneq {\ mathcal {L}} (\ mathrm {LL} (2)) \ subsetneq \ dots \ subsetneq {\ mathcal {L }} (\ mathrm {LL} (k)) \ subsetneq {\ mathcal {L}} (\ mathrm {LR} (1)) = {\ mathcal {L}} (\ mathrm {DPDA})}$

DPDA stands for the deterministic push-down machines . These can precisely recognize the deterministic context-free languages .

Formal definition of LL (k) grammar

A context-free grammar is LL (k) grammar if and only if for all left derivatives of the form ${\ displaystyle G = (N, \ Sigma, P, S)}$

{\ displaystyle S \ Rightarrow _ {l} ^ {*} wA \ gamma \ Rightarrow _ {l} \ left \ {{\ begin {array} {l} w \ alpha \ gamma \ Rightarrow _ {l} ^ {* } wx \\ w \ beta \ gamma \ Rightarrow _ {l} ^ {*} wy \ end {array}} \ right.}

with and applies: ${\ displaystyle \ quad (w, x, y \ in \ Sigma ^ {*}; \ alpha, \ beta, \ gamma \ in (N \ cup \ Sigma) ^ {*}; A \ in N)}$ ${\ displaystyle {\ mathit {first}} _ {k} (x) = {\ mathit {first}} _ {k} (y) ^ {\,}}$ ${\ displaystyle \ alpha = \ beta ^ {\,}}$

The following applies to the function used in the definition to determine the FIRST quantities:

${\ displaystyle a \ in \ Sigma ^ {*}; \| a \| \ leq k}$	${\ displaystyle {\ mathit {first}} _ {k} \ left (a \ right) = \ {a \}}$
${\ displaystyle a \ in \ Sigma ^ {*}; \| a \|> k}$	${\ displaystyle {\ mathit {first}} _ {k} (a) = \ {v \ in \ Sigma ^ {*} \ mid a = vw; \| v \| = k \}}$
${\ displaystyle A \ in (N \ cup \ Sigma) ^ {} \ backslash \ Sigma ^ {}}$	${\ displaystyle {\ mathit {first}} _ {k} (A) = \ {v \ in \ Sigma ^ {} \ mid A \ Rightarrow ^ {} w; w \ in \ Sigma ^ {*}; {\ mathit {first}} _ {k} (w) = \ {v \} \}}$

application

Current LL parsers usually only use a lookahead of 1. Therefore, the following explanations can be used. ${\ displaystyle k = 1}$

In practical application, it is only possible to check with great effort whether the grammar at hand fulfills the definition of an LL (k) grammar. Instead, a modified approach is used.

A context-free grammar is exactly then a LL (k) grammar if for all non-terminal symbols , for all productions and with and where: . ${\ displaystyle A}$ ${\ displaystyle A \ to \ beta}$ ${\ displaystyle A \ to \ gamma}$ ${\ displaystyle \ beta \ neq \ gamma}$ ${\ displaystyle S \ Rightarrow _ {l} ^ {*} wA \ alpha}$ ${\ displaystyle first_ {k} (\ beta \ alpha) \ cap first_ {k} (\ gamma \ alpha) = \ emptyset}$ ${\ displaystyle (w \ in \ Sigma ^ {*}; \ alpha, \ beta, \ gamma \ in (N \ cup \ Sigma) ^ {*}; A \ in N)}$

Explanation: The start symbol of the context-free grammar was expanded (possibly in several steps) . According to the left derivative, the nonterminal symbol will be replaced next. There are two different rules for doing this in context-free grammar; and . The question of which rule is used to expand is determined from the calculation of and . In order to be able to answer the question clearly, both sets must be disjoint. ${\ displaystyle S}$ ${\ displaystyle wA ^ {\,} \ alpha}$ ${\ displaystyle A}$ ${\ displaystyle A \ to \ beta}$ ${\ displaystyle A \ to \ gamma}$ ${\ displaystyle A}$ ${\ displaystyle first_ {k} \ left (\ beta \ alpha \ right)}$ ${\ displaystyle first_ {k} \ left (\ gamma \ alpha \ right)}$

In general, however, depends on the legal context (if ). The aim is to determine only from the productions, i.e. H. from and from the strings that can follow an occurrence of . For this purpose the function is defined which calculates the set of all following symbols. ${\ displaystyle first_ {k} \ left (\ beta \ alpha \ right)}$ ${\ displaystyle \ alpha}$ ${\ displaystyle \ beta \ Rightarrow ^ {*} \ epsilon}$ ${\ displaystyle first_ {k} \ left (\ beta \ alpha \ right)}$ ${\ displaystyle \ beta}$ ${\ displaystyle A}$ ${\ displaystyle follow_ {k} \ left (A \ right)}$ ${\ displaystyle A}$

${\ displaystyle \ forall \ beta \ in (N \ cup \ Sigma) ^ {*}: follow_ {k} (\ beta) = \ {w \ in \ Sigma ^ {*} \ mid \ exists \ alpha, \ gamma \ in (N \ cup \ Sigma) ^ {*} {\ mbox {with}} S \ Rightarrow _ {l} ^ {*} \ alpha \ beta \ gamma {\ mbox {and}} w \ in first_ {k } (\ gamma) \}}$

With this, the condition requested at the beginning can be reformulated:

A reduced context-free grammar is an LL (1) grammar if and only if the following applies to all non-terminal symbols and to all productions and with : ${\ displaystyle A}$ ${\ displaystyle A \ to \ beta}$ ${\ displaystyle A \ to \ gamma}$ ${\ displaystyle \ beta \ neq \ gamma}$ ${\ displaystyle first_ {1} (\ {\ beta \} follow_ {1} (A)) \ cap first_ {1} (\ {\ gamma \} follow_ {1} (A)) = \ emptyset.}$

Warning : this sentence can not be applied to cases . ${\ displaystyle k> 1}$

The amount charged for a production is called the lookahead amount . ${\ displaystyle A \ to \ beta}$ ${\ displaystyle la (A, \ beta) = first_ {1} \ left (\ {\ beta \} follow_ {1} (A) \ right)}$

example

The following grammar is checked to determine whether it is an LL (1) grammar. To do this, the lookahead sets of all productions with the same left rule pages must be disjoint. ${\ displaystyle G}$

{\ displaystyle G = \ left (\ {E, E ', T, T', F \}, \ {\ epsilon, a, (,), +, * \}, P, E \ right)}

and the amount of productions is:

{\ displaystyle E \ to TE '}

{\ displaystyle E '\ to + TE'}

{\ displaystyle E '\ to \ epsilon}

{\ displaystyle T \ to FT '}

{\ displaystyle T '\ to * FT'}

{\ displaystyle T '\ to \ epsilon}

{\ displaystyle F \ to (E)}

{\ displaystyle F \ to a}

First of all, the first and follow sets of the non-terminal symbols are determined, since these are necessary for the calculation of the lookahead sets.

	E.	E '	T	T '	F.
${\ displaystyle first_ {1}}$	${\ displaystyle \ left \ {(, a \ right \}}$	${\ displaystyle \ left \ {+, \ epsilon \ right \}}$	${\ displaystyle \ left \ {(, a \ right \}}$	${\ displaystyle \ left \ {*, \ epsilon \ right \}}$	${\ displaystyle \ left \ {(, a \ right \}}$
${\ displaystyle follow_ {1}}$	${\ displaystyle \ left \ {) \ right \}}$	${\ displaystyle \ left \ {\ epsilon,) \ right \}}$	${\ displaystyle \ left \ {+, \ epsilon,) \ right \}}$	${\ displaystyle \ left \ {+, \ epsilon,) \ right \}}$	${\ displaystyle \ left \ {*, +, \ epsilon,) \ right \}}$

The following is the comparison of the lookahead quantities for all productions with the same left rule pages.

First of all for the two productions and ${\ displaystyle E '\ to + TE'}$ ${\ displaystyle E '\ to \ epsilon}$

{\ displaystyle la (E ', + TE') \ cap la (E ', \ epsilon) = first_ {1} (\ {+ TE' \} follow_ {1} (E ')) \ cap first_ {1} (\ {\ epsilon \} follow_ {1} (E ')) = \ {+ \} \ cap \ {\ epsilon,) \} = \ emptyset}

Next up for the two productions and ${\ displaystyle T '\ to * FT'}$ ${\ displaystyle T '\ to \ epsilon}$

{\ displaystyle la (T ', * FT') \ cap la (T ', \ epsilon) = first_ {1} (\ {* FT' \} follow_ {1} (T ')) \ cap first_ {1} (\ {\ epsilon \} follow_ {1} (T ')) = \ {* \} \ cap \ {+, \ epsilon,) \} = \ emptyset}

Last for the two productions and ${\ displaystyle F \ to (E)}$ ${\ displaystyle F \ to a}$

{\ displaystyle la (F, (E)) \ cap la (F, a) = first_ {1} (\ {(E) \} follow_ {1} (F)) \ cap first_ {1} (\ {a \} follow_ {1} (F)) = \ {(\} \ cap \ {a \} = \ emptyset}

Since all intersections considered are empty, the grammar is an LL (1) grammar. ${\ displaystyle G}$

literature

Donald E. Knuth : Top-down syntax analysis . In: Acta Informatica 1, 1971, ISSN 0001-5903 , pp. 79–110, (reprint of an extended version in: Donald E. Knuth: Selected Papers on Computer Languages . Center for the Study of Language and Information, Stanford CA 2003, ISBN 1-575-86381-2 , ( CSLI lecture notes 139), chapter 14).
LR (k) analysis for pragmatists by Andreas Kunert

${\ displaystyle a \ in \ Sigma ^ {*}; \| a \| \ leq k}$	${\ displaystyle {\ mathit {first}} _ {k} \ left (a \ right) = \ {a \}}$
${\ displaystyle a \ in \ Sigma ^ {*}; \| a \|> k}$	${\ displaystyle {\ mathit {first}} _ {k} (a) = \ {v \ in \ Sigma ^ {*} \ mid a = vw; \| v \| = k \}}$
${\ displaystyle A \ in (N \ cup \ Sigma) ^ {} \ backslash \ Sigma ^ {}}$	${\ displaystyle {\ mathit {first}} _ {k} (A) = \ {v \ in \ Sigma ^ {} \ mid A \ Rightarrow ^ {} w; w \ in \ Sigma ^ {*}; {\ mathit {first}} _ {k} (w) = \ {v \} \}}$

LL (k) grammar

contents

Formal definition of LL (k) grammar

application

example

See also

literature