LL parser

from Wikipedia, the free encyclopedia

In the compiler one is LL parser is a top-down parser , the input of L executes inks to the right to an L inksableitung the input to calculate.

An LL parser is called an LL (k) parser if it can look ahead k tokens during parsing and, in contrast to the LF parser , uses the basement content. k is referred to as lookahead . This parser type is based on the LL (k) grammars .

Although the LL (k) grammars are relatively limited, LL (k) parsers are often used. The decision as to which rule is used to expand can only be made by analyzing the lookahead. A simple way to implement this parser technique is to use the recursive descent method .

functionality

The starting point is a grammar . The parser works with a set of states , whereby a state is composed as follows:

  • is the current content of a runtime cell that is used to store the current symbols. can contain both terminal and non-terminal symbols.
  • is the part of the input that has not yet been read.
  • is the output, a sequence of natural numbers that contains the numbers of the rules of the left derivative.

The nondeterministic automaton for the LL (k) -analysis is then:

  • (Initial state)
  • (Final state)

It is the start symbol of the underlying grammar and links analysis of the input .

The transitions are composed as follows:

  • (Shift or shift step)
  • (Expansion or derivation step), where the rule must be contained in the rule set and is the number of this rule.

LL (1) parser

This type of parser uses a one-character lookahead. Because of this limitation, a deterministic parser can easily be created.

The above-mentioned non-deterministic steps are determined by the lookahead.

Example implementation in Python

In an example, an LL (1) parser should map the following simple grammar:

   S → F
   S → ( S + F )
   F → n

The following Python implementation of the LL (1) parser for this grammar is applied to the input string ((n + n) + n) :

# Parse table
table = {'@S': {'n': 0, '(': 1},
         '@F': {'n': 2}}

rules = [['@F'],
         ['(', '@S', '+', '@F', ')'],
         ['n']]

def syntactic_analysis(string):
    print('Syntactic analysis of input string:', string)
    stack = ['\n', '@S']
    tokens = list(string) + ['\n']
    position = 0
    while len(stack) > 0:
        stackvalue = stack.pop()
        token = tokens[position]
        if not stackvalue.startswith('@'):
            if stackvalue == token:
                # print('pop', repr(stackvalue))
                position += 1
                if token == '\n':
                    print('input accepted')
                    break
            else:
                print('syntax error at input:', repr(token))
                break
        else:
            rule = table[stackvalue].get(token, -1)
            print('at pos', position, 'found rule', repr(stackvalue +
                    ' -> ' +  ' '.join(rules[rule])))
            for r in reversed(rules[rule]):
                stack.append(r)
        # print('stack:', repr(', '.join(reversed(stack))))

syntactic_analysis('((n+n)+n)')

If the syntax is correct, the output of the scripts directly results in the serialized syntax tree:

Syntactic analysis of input string: ((n+n)+n)
at pos 0 found rule '@S -> ( @S + @F )'
at pos 1 found rule '@S -> ( @S + @F )'
at pos 2 found rule '@S -> @F'
at pos 2 found rule '@F -> n'
at pos 4 found rule '@F -> n'
at pos 7 found rule '@F -> n'
input accepted

See also

Individual evidence

  1. ^ Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman: Compilers, Principles, Techniques, and Tools . ISBN 0-201-10088-6 , p. 191