Extended Backus-Naur form

from Wikipedia, the free encyclopedia

The Extended Backus-Naur-Form , EBNF for short , is an extension of the Backus-Naur-Form (BNF), which was originally introduced by Niklaus Wirth to represent the syntax of the Pascal programming language . It is a formal metasyntax ( metalanguage ) that is used to represent context-free grammars .

The EBNF is standardized by ISO as ISO / IEC 14977: 1996 (E) . The examples in this article are based on the ISO standard. Occasionally, other extended variants of the BNF are also referred to as EBNF.

Basics

A text, such as the source text of a computer program , initially consists of terminal symbols, i.e. of visible characters such as letters, numbers, punctuation marks, spaces, etc.

The EBNF defines production rules in which symbol sequences are assigned to a non-terminal symbol, for example

 ZifferAusserNull   = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
 Ziffer             = "0" | ZifferAusserNull ;

In this production rule, the non-terminal symbol number is defined, which is always on the left. The vertical line represents an (exclusive) alternative , the terminal symbols are enclosed in quotation marks and ended with a semicolon as the end character. A digit is therefore a 0 or a digit other than zero , which in turn can be a natural number between 1 and 9 .

A production rule can also contain a sequence of terminal or non-terminal symbols, where the components are connected by commas, for example:

 Zwoelf                       = "1", "2" ;
 Zweihundertundeins           = "2", "0", "1" ;
 Dreihundertzwoelf            = "3", Zwoelf ;
 ZwoelfTausendzweihunderteins = Zwoelf, Zweihundertundeins ;

Expressions that can be omitted or repeated can be shown with curly brackets {...}:

 NatuerlicheZahl = ZifferAusserNull, { Ziffer } ;

Texts 1 , 2 ,…, 10 ,…, 12345 ,… fit here . It should be noted that everything within the curly brackets can appear as often as you like, but not once .

An option can be represented by square brackets [...]:

 GanzeZahl = "0" | [ "-" ], NatuerlicheZahl ;

A whole number is therefore zero ( 0 ) or a natural number, which can optionally be preceded by a minus sign. So all whole numbers like 0 , -3 , 1234 etc. fit here .

It is also possible to allow a definable number of repetitions.

 LeerzeichenAlsTab = 4 * " " , "Yes" ;

Here, four times the "" character is expected before the character string "Yes" .

Motivation to expand the BNF

The BNF sometimes requires cumbersome constructs to represent optional elements, i.e. elements that can be left out, as well as repetitive elements, since - unlike the EBNF - they do not "[…]" for options or "{…}" for optional repetitions knows, but solves these cases with appropriate alternatives (using '|' cases), recursion or 'empty content'.

In the specification of PL / 1, square brackets "[…]" were already used for options. In the definition of the Pascal language, Niklaus Wirth also introduced curly brackets "{…}" for repetitions in the BNF and called this extended BNF (extended BNF).

All formulations in an EBNF syntax can also be expressed in BNF. The EBNF was created by Wirth for reasons of better legibility and more compact notation.

Number definition in BNF

A number is a sequence of digits with an optional minus sign as a sign . In BNF you have to use several alternatives and a recursion for the digit repetition:

BNF

 <Zahl> ::= <Positive Zahl> | - <Positive Zahl> | 0
 <Positive Zahl> ::= <Ziffer ausser Null><Optionale Ziffernfolge>
 <Optionale Ziffernfolge> ::= <Ziffer> <Optionale Ziffernfolge> |

Read: A number is either a positive number or a minus sign followed by a positive number or the sign zero. A positive number is a non-zero digit followed by an optional sequence of digits. An optional sequence of digits is a digit followed by an optional sequence of digits or is empty.

Number definition in EBNF

In EBNF this can be represented in a single rule without recursion:

EBNF

 Zahl = ([ "-" ], ZifferAusserNull, { Ziffer }) | "0" ;

Read: A number consists of an optional minus sign, followed by a digit other than zero, followed by any number of additional digits (including no additional digit). Or: A number consists of the character zero.

The minus sign can be omitted. The repetition can also never occur (optional repetition). The EBNF only needs a single rule without an alternative, while the BNF needs three rules with four alternatives, including a recursion ( <optional sequence of digits> contains itself in its own definition).

The EBNF marks terminal symbols with quotation marks and uses an end character. Non-terminal symbols are not enclosed in angle brackets. The quotation marks prevent confusion.

Other additions and modifications

The EBNF eliminates some of the BNF's weaknesses:

  • The BNF itself uses the symbols (<,>, |, :: =). If these appear in the defined language, the BNF cannot be used without modification or explanation.
  • A BNF syntax can actually only contain one-line rules.

The EBNF solves these problems:

  • Terminal symbols are always written in quotation marks ("..." or '...'). The angle brackets ("<…>") for non-terminal symbols can then be omitted.
  • An end character, usually a semicolon, or a period for some authors, marks the end of each rule.

In addition, expansion mechanisms, definition of the number of repetitions, removal of alternatives (for example all characters without quotation marks), comments, etc. are provided.

Despite all the extensions, the EBNF is not “more powerful” than the BNF in terms of the languages ​​it can define. In principle, every grammar defined in EBNF can also be represented by rules in the BNF, which, however, often results in a much more extensive description.

Each extended BNF may also be referred to as an EBNF. The W3C uses an EBNF to specify XML .

Applications

Many metalanguages , such as HTML, can be defined in the EBNF.

In principle, all formal languages can be expressed in the EBNF. EBNF is often used in computer science when defining programming languages, regular expressions or parsers (see, for example, Spirit ).

EBNF is not suitable for determining the semantics of a language. For example, it is easily possible not to define essential unambiguous facts at all or several times, so that logical gaps or contradictions can arise. Furthermore, the assignment compatibility of expressions cannot be determined by EBNF.

Example programming language

A very simple programming language that only allows assignments can be defined in EBNF as follows:

 (* ein einfaches Beispiel in EBNF - Wikipedia *)
 Programm = 'PROGRAM', Bezeichner ,
            'BEGIN' ,
            { Zuweisung, ";" } ,
            'END', "." ;
 Zuweisung = Bezeichner, ":=", ( Zahl |
                               Bezeichner |
                               String ) ;
 Bezeichner = Buchstabe, { ( Buchstabe | Ziffer ) } ;
 Zahl = [ '-' ], Ziffer, { Ziffer } ;
 String = '"', { AlleZeichen - '"' }, '"' ;
 Buchstabe = "A" | "B" | "C" | "D" | "E" | "F" | "G"
           | "H" | "I" | "J" | "K" | "L" | "M" | "N"
           | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
           | "V" | "W" | "X" | "Y" | "Z" ;
 Ziffer = "0" | "1" | "2" | "3" | "4" | "5" | "6"
        | "7" | "8" | "9" ;
 AlleZeichen = ? alle sichtbaren Zeichen ? ;

The standard symbols ("=" for definitions, ";" as terminator, etc.) have been used here. If necessary, deviations from this can be made.

A syntactically permissible program would then be

 PROGRAM DEMO1
 BEGIN
   A0:=3;
   B:=45;
   H:=-100023;
   C:=A;
   D123:=B34A;
   ESEL:=GIRAFFE;
   TEXTZEILE:="Hallo, Welt!";
 END.

The language can easily be supplemented with control structures , arithmetic expressions and input and output instructions. Then a useful, small programming language would already be created.

The following characters, which are recommended as normal representation in the ISO / IEC standard, have been used here:

use character
definition =
enumeration ,
Terminator ;
alternative |
option [...]
Optional repetition {...}
grouping (...)
Quotation marks, 1st variant "..."
Quotation marks, 2nd variant '...'
comment (* ... *)
Special sequence ? ...?
exception -

See also

Web links

Individual evidence

  1. Notation ( English ) W3C ®. Retrieved April 1, 2019.
  2. a b Federico Tomassetti: EBNF: How to Describe the Grammar of a Language from August 1, 2017, accessed on August 6, 2019
  3. Niklaus Wirth : What can we do about the unnecessary diversity of notation for syntactic definitions? , Communications of the ACM, Volume 20, Issue 11, Nov. 1977, 822-823, ACM New York