Lookahead

from Wikipedia, the free encyclopedia

Lookahead is the anticipation of inputs during the automatic processing of texts in compiler construction .

The number of tokens that a parser looks ahead is a measure of the effort that has to be made to clearly differentiate between grammatical constructions of the input. Using this number k , parsers and grammars can be formally classified.

The lookahead also refers to the number of characters that a tokenizer (lexical scanner) looks ahead (the value 1 is sufficient for most programming languages).

The lookahead plays a role in top-down parsing ( LL parser ) as well as in bottom-up parsing . The latter case is examined below. Shift-Reduce parsers, such as LR (0), SLR, LR (1) parsers, can get into two different conflicts:

Shift-reduce
Here the compiler doesn't know whether to shift or reduce.
Reduce-reduce
Here the compiler doesn't know which rule to use to reduce.

The lookahead can help avoid this. If a language can be parsed conflict-free with a lookahead from an LR ( k ) parser using a grammar , then it is an LR (1) grammar.

Shift Reduce Example

There is a well-known shift-reduce problem, the so-called dangling-else problem .

The rule is given in Backus-Naur form :

<statement> ::= IF <expression> THEN <statement>
              | IF <expression> THEN <statement> ELSE <statement>

Here the compiler doesn't know whether to reduce or to ELSEcontinue with if an alternative should follow. This problem usually occurs when the parser is automatically generated with the parser generator Yacc or GNU Bison , but these automatically correct it. It doesn't matter whether there is a lookahead or not.

One possible solution to this problem is as follows:

<statement> ::= <matched>
              | <unmatched>

<matched>   ::= IF <expression> THEN <matched> ELSE <matched>
              | other statements

<unmatched> ::= IF <expression> THEN <statement>
              | IF <expression> THEN <matched> ELSE <unmatched>

But this is extremely difficult to read.

Another simple possibility in GNU Bison (Yacc) would be:

%token KW_ELSE
%token KW_IF

%nonassoc LOWER_THAN_ELSE
%nonassoc KW_ELSE

ifstatement: KW_IF '(' assignment ')' block2 %prec LOWER_THAN_ELSE
           | KW_IF '(' assignment ')' block2 KW_ELSE block2
           ;

%prec LOWER_THAN_ELSEassigns the precedence of LOWER_THAN_ELSEto the rule in which it is written . This stands above the token definition of KW_ELSE, so gives the first rule ifstatementa lower precedence in the Reduce decision than the second.

Web links