Augmented Backus-Naur form

from Wikipedia, the free encyclopedia

The augmented Backus-Naur form (ABNF) is a variant of the Backus-Naur form (BNF), a metalanguage for describing the syntax of formal notations. It was originally published as RFC 2234 to allow unambiguous syntax specifications in the Internet standards (RFCs) of the IETF, and it is suited to defining the syntax of technical languages and protocols.

Origin

As the RFC standards developed, the need arose to express the required syntax descriptions in a standardized BNF variant. RFC 2234 unified the slightly differing variants used in previously published RFCs. Newer RFCs no longer had to define the metalanguage they used; a reference to RFC 2234 was sufficient.

The document contains a self-definition of the ABNF syntax. In it, ABNF is expressed using ABNF notation.

Successor versions

The revised editions RFC 4234 and RFC 5234 later replaced the first version. RFC 7405 updates RFC 5234 by adding support for case-sensitive strings.

Properties

ABNF notation is based on BNF; the BNF principles that apply here are described under BNF.

ABNF extends BNF with a modified way of naming rules, with repetitions, alternatives, and value ranges, and with a set of predefined core rules. These extensions allow structures to be described more conveniently and expressively. The notation is aimed chiefly at defining character strings. Some of its mechanisms deliberately presuppose a particular character encoding (e.g. ASCII): if character codes or ranges are used, the definitions depend on the encoding originally assumed and usually have to be adapted for other encodings.

Comments

A semicolon ( ; ) starts a comment. The comment text extends to the end of the line (line comment). Multi-line comments require a semicolon on each line.

Terminal symbols

Terminal symbols are the values from which the rule definitions are ultimately built. The terminal symbols include:

  • Literal strings that are not case-sensitive. They are enclosed in double quotation marks " and can also be marked explicitly with the prefix %i . If a quotation mark is needed inside the string, the string must be written as a sequence that contains the character code of the quotation mark. Example: "PROGRAM"
  • Literal strings that are case-sensitive. They are marked with the prefix %s . Example: %s"BIG SMALL"
  • Character codes in
    • decimal representation: the prefix %d indicates the decimal system. Example: %d13 for the character with the code value 13 (in ASCII this is the carriage return character, CR for short)
    • hexadecimal representation: the prefix %x indicates the hexadecimal system. Example: %x0d for the character with the code value 13.
    • binary representation: the prefix %b indicates the binary system. Example: %b00001101 for the character with the code value 13.
  • Sequences (or chains) of character codes: a sequence consists of a character code followed by any number of further code values, each written as a period . and a number without the % prefix. Before RFC 7405 introduced the %s prefix, sequences were the only way to specify character strings with fixed upper and lower case. Example: %d13.10 for the character string carriage return (CR) and line feed (LF) in ASCII.
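
The numeric forms listed above can be illustrated with a short Python sketch. The helper name decode_num_val is hypothetical and not part of any ABNF library; the sketch assumes well-formed input and handles single code values and period-separated sequences only, not ranges.

```python
def decode_num_val(token: str) -> str:
    """Decode an ABNF numeric value such as %d13, %x0d or %d13.10 into a string."""
    bases = {"b": 2, "d": 10, "x": 16}
    if len(token) < 3 or token[0] != "%" or token[1].lower() not in bases:
        raise ValueError(f"not a numeric value: {token!r}")
    base = bases[token[1].lower()]
    # A period appends a further code value written in the same number system.
    return "".join(chr(int(part, base)) for part in token[2:].split("."))


# The three spellings of code value 13 from the list above denote the same character.
cr_variants = [decode_num_val(t) for t in ("%d13", "%x0d", "%b00001101")]
crlf = decode_num_val("%d13.10")
```

Here all entries of cr_variants are the carriage return character, and crlf is the two-character string CR LF.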

Naming rules

Rule names may contain the characters A–Z , a–z , 0–9 and the hyphen - ; the first character must be a letter. Unlike in BNF, no angle brackets < > are needed around names, although they are permitted for compatibility. A rule definition begins with the name followed by an equals sign = . It continues across the following lines for as long as they are indented further than the first line.

  Regel1 = Regelbestandteil1
    Regelbestandteil2
    Regelbestandteil3
  Regel2 =

The incremental alternative (see below) is an exception: it extends an existing definition instead of starting a new one.

Ranges

Ranges represent a set of characters whose code values lie within the given bounds. They are a special kind of alternative. A range is written as the code values of its lower and upper bound joined by a hyphen - ; the prefix for the number system is given only before the first number. The following rule, for example, defines the range of hexadecimal character codes 0x30 to 0x39 (decimal 48 to 57). In ASCII, this corresponds to the digits '0', '1', and so on up to '9':

         Ziffer      =  %x30-39

corresponds to the alternative

         Ziffer      =  "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
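
The equivalence between a range and the spelled-out alternative can be checked with a small Python sketch. The function name expand_range is hypothetical, and the sketch assumes a well-formed range token with one of the prefixes %b, %d or %x.

```python
def expand_range(token: str) -> list[str]:
    """Expand an ABNF value range such as %x30-39 into the characters it covers."""
    bases = {"b": 2, "d": 10, "x": 16}
    base = bases[token[1].lower()]
    # Both bounds use the number system named by the single prefix.
    low, high = (int(part, base) for part in token[2:].split("-"))
    return [chr(code) for code in range(low, high + 1)]


digits = expand_range("%x30-39")
```

Under these assumptions, digits holds the ten characters '0' to '9', the same set as the alternative written out above.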

Repetitions

Repetition counts are placed in front of the expression and may give a minimum and/or a maximum number of occurrences. The explicit form is <minimum>*<maximum> , where a missing minimum is taken as 0 (the expression may be absent) and a missing maximum as infinity (unlimited occurrences). Exactly n repetitions are expressed by the single number n .

  beliebighaeufig    = *meinWert
  genau-dreimal      = 3allergutenDinge
  mindestens-zweimal = 2*abzweidabei
  hoechstens-drei    = *3Versuchefrei
  ein-bis-zwei       = 1*2Vornamen
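
How the repetition prefixes in these rules translate into bounds can be sketched in Python. The function name parse_repeat is hypothetical; it takes only the prefix in front of the expression (for example 1*2 ) and returns the minimum and maximum, with None standing for an unlimited maximum.

```python
import re
from typing import Optional, Tuple


def parse_repeat(prefix: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) for an ABNF repetition prefix; None means unlimited."""
    match = re.fullmatch(r"(\d*)\*(\d*)|(\d+)", prefix)
    if match is None:
        raise ValueError(f"not a repetition prefix: {prefix!r}")
    if match.group(3) is not None:
        # A single number n means exactly n repetitions.
        n = int(match.group(3))
        return n, n
    low = int(match.group(1)) if match.group(1) else 0      # missing minimum: 0
    high = int(match.group(2)) if match.group(2) else None  # missing maximum: unlimited
    return low, high


# The prefixes of the five rules above, in order.
bounds = [parse_repeat(p) for p in ("*", "3", "2*", "*3", "1*2")]
```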

Groups

Groups make precedence explicit in compound expressions. They are formed with parentheses ( and ) .

  Ausdruck1          = (To be) / (not to be)
  Ausdruck2          =  To (be / not) to be

The first example consists of the two alternatives "To be" and "not to be".

In the second example, a sequence is formed from "To", then either "be" or "not", and then "to be", i.e. "To be to be" or "To not to be".

Sequences

In the case of sequences, all lined up expressions are expected exactly as specified. Sequences are simply formed by stringing together expressions (separated by white space).

  Sequenz            = Eins nach dem Anderen

Optional sequences

An optional sequence may occur once or not at all. It is formed with square brackets [ and ] . The following forms are equivalent:

[optionalerAusdruck]
*1optionalerAusdruck
0*1optionalerAusdruck

Alternatives

Of the listed alternatives, exactly one is matched. Alternatives are separated by a slash (solidus) / .

  Auswahl            = Sein / Nichtsein

Incremental alternatives

Existing definitions can be extended incrementally with further alternatives. This makes decentralized definitions possible, but it can reduce clarity if the parts of a definition are far apart. The rule name must be repeated, followed by =/ .

  Status             = Ja
  Status             =/ Nein
  Status             =/ WeissNicht

corresponds to

  Status             = Ja / Nein / WeissNicht
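
The merging of incremental alternatives into one definition can be sketched in Python. The function name collect_rules is hypothetical, and the sketch makes simplifying assumptions: each definition fits on one line, =/ does not occur inside a quoted string, and a rule is defined with = before it is extended with =/ .

```python
def collect_rules(lines):
    """Merge '=' and '=/' definitions into one alternative list per rule name."""
    parts_by_name = {}
    for line in lines:
        if "=/" in line:
            # Incremental alternative: extend the existing definition.
            name, _, body = line.partition("=/")
            parts_by_name[name.strip()].append(body.strip())
        else:
            # Basic definition: start a new rule.
            name, _, body = line.partition("=")
            parts_by_name[name.strip()] = [body.strip()]
    return {name: " / ".join(parts) for name, parts in parts_by_name.items()}


merged = collect_rules([
    "Status = Ja",
    "Status =/ Nein",
    "Status =/ WeissNicht",
])
```

Under these assumptions, merged maps Status to the single definition Ja / Nein / WeissNicht.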

Operator precedence

The following precedence applies to compound expressions, from highest (binding tightest) to lowest:

  • Names, strings, terminals
  • Comments
  • Value ranges
  • Repetitions
  • Groups and optional sequences
  • Sequences
  • Alternatives

The RFC recommends using groups to make precedence explicit in expressions that mix sequences and alternatives.

Predefined rules

Frequently used definitions are predefined as core rules. They include general classes such as digits, letters, and white space.

Predefined rules in ABNF
(rule = definition ; comment, followed by the corresponding POSIX character class in the C locale where one is given)

  ALPHA  = %x41-5A / %x61-7A      ; uppercase and lowercase letters A–Z and a–z in ASCII; POSIX [:alpha:]
  BIT    = "0" / "1"              ; the values of a bit
  CHAR   = %x01-7F                ; any 7-bit US-ASCII character except the NUL character
  CR     = %x0D                   ; carriage return
  CRLF   = CR LF                  ; Internet standard for line breaks
  CTL    = %x00-1F / %x7F         ; control characters; POSIX [:cntrl:]
  DIGIT  = %x30-39                ; the digits 0–9 of the decimal system; POSIX [:digit:]
  DQUOTE = %x22                   ; " (double quotation mark)
  HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
                                  ; the digits of the hexadecimal system; POSIX [:xdigit:]
  HTAB   = %x09                   ; horizontal tab
  LF     = %x0A                   ; line feed
  LWSP   = *(WSP / CRLF WSP)      ; linear white space (past line endings)
  OCTET  = %x00-FF                ; 8 bits of data
  SP     = %x20                   ; space
  VCHAR  = %x21-7E                ; visible (printing) characters; POSIX [:graph:]
  WSP    = SP / HTAB              ; white space

Example

For demonstration purposes, the sample language definition from the EBNF article is adapted to ABNF here. It allows programs consisting of simple assignments.

 ; an example in ABNF, analogous to the example on the EBNF Wikipedia page
 Programm = "PROGRAM" Bezeichner
            "BEGIN"
            *( Zuweisung ";" )
            "END."
 Zuweisung = Bezeichner ":=" ( Zahl /
                               Bezeichner /
                               String )
 Bezeichner = Buchstabe *( Buchstabe / Ziffer )
 Zahl = [ "-" ] 1*Ziffer
 String = %x22 *( %x20-21 / %x23-7E ) %x22 ; any printable character, including the space, except the double quotation mark
 Buchstabe = %x41-5A ; range of the characters "A" to "Z"
 Ziffer = DIGIT      ; all digits
 AlleZeichen = VCHAR ; all visible and printable characters (not used here)

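The following sample program conforms to the definition given above, provided that the white space between the tokens is disregarded (the grammar itself does not define any).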

 PROGRAM WERTESETZEN
 BEGIN
   A:=-1234;
   B:=A;
   BEZEICHNER:="";
   C:=BEZEICHNER;
   R2D2:="Piep";
   RESULTAT:="Erfolg";
 END.
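
As a rough check, the example grammar can be transcribed by hand into a Python regular expression. This is only a sketch under stated assumptions: the names PROGRAMM and is_valid_program are hypothetical, white space between tokens is skipped although the grammar does not define it, and the quoted keywords are matched without regard to case because ABNF literal strings are not case-sensitive.

```python
import re

BEZEICHNER = r"[A-Z][A-Z0-9]*"          # Buchstabe *( Buchstabe / Ziffer )
ZAHL = r"-?[0-9]+"                      # [ "-" ] 1*Ziffer
STRING = r'"[\x20-\x21\x23-\x7E]*"'     # %x22 *( %x20-21 / %x23-7E ) %x22
ZUWEISUNG = rf"{BEZEICHNER}:=(?:{ZAHL}|{BEZEICHNER}|{STRING})"
WS = r"\s*"                             # assumption: white space between tokens is ignored

PROGRAMM = re.compile(
    rf"{WS}(?i:PROGRAM){WS}{BEZEICHNER}{WS}(?i:BEGIN){WS}"
    rf"(?:{ZUWEISUNG}{WS};{WS})*(?i:END\.){WS}"
)


def is_valid_program(text: str) -> bool:
    """Return True if the whole text matches the Programm rule."""
    return PROGRAMM.fullmatch(text) is not None


sample = '''
 PROGRAM WERTESETZEN
 BEGIN
   A:=-1234;
   B:=A;
   BEZEICHNER:="";
   C:=BEZEICHNER;
   R2D2:="Piep";
   RESULTAT:="Erfolg";
 END.
'''
accepted = is_valid_program(sample)
```

With this transcription the sample program is accepted, while a program that uses a lowercase identifier such as a:=1; is rejected, because Buchstabe covers only %x41-5A.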

Comparison with the EBNF

The differences between ABNF and EBNF are tabulated here for orientation.

Property comparison

  Present in ABNF                         Present in EBNF                  Characters used in EBNF
  Ranges                                  (not available)                  (none)
  Character codes                         (not available)                  (none)
  Incremental alternatives                (not available)                  (none)
  Repetitions with minimum and maximum    (not available)                  (none)
  Line comment                            Block comment                    (* ... *)
  (not available)                         Alternative quotation mark       '
  (not available)                         Exceptions                       - (minus sign)
  (not available)                         Optional repetition              { ... }
  (not available)                         Explicit terminator character    ;
  (not available)                         Special sequence                 ? ... ?

Both notations can express the same range of syntax definitions.

References

  1. D. Crocker, P. Overell: Augmented BNF for Syntax Specifications: ABNF. RFC 2234 (obsolete), pp. 1–14. Accessed August 25, 2011.
  2. D. Crocker, P. Overell: Augmented BNF for Syntax Specifications: ABNF. RFC 4234 (obsolete), pp. 1–16. Accessed August 25, 2011.
  3. D. Crocker, P. Overell: Augmented BNF for Syntax Specifications: ABNF. RFC 5234, pp. 1–16. Accessed August 25, 2011.
  4. P. Kyzivat: Case-Sensitive String Support in ABNF. RFC 7405, pp. 1–4. Accessed December 2014.
  5. D. Crocker, P. Overell: Augmented BNF for Syntax Specifications: ABNF. RFC 5234, pp. 13–14. Accessed August 25, 2011.
  6. ISO Committee: ISO/IEC 14977:1996(E). ISO standard for EBNF (1st edition). Accessed August 25, 2011.

External links

  • tools.ietf.org Tools of the IETF
  • tools.ietf.org ABNF tools of the IETF: parser generators and validation (no longer up to date as of March 2016)
  • github.com/fenner/bap 'bap' is an open source ABNF parser generator.
  • .bortzmeyer.org 'eustathius' is an open source toolset for parsing ABNF and generating sample programs.
  • quut.com/abnfgen 'abnfgen' is an open source generator of sample programs according to the given ABNF syntax.
  • akr.org/abnf 'ABNF' is an open source Ruby module that converts ABNF syntax into regular expressions.
  • coasttocoastresearch.com 'apg' is an open source ABNF parser generator (for C / C ++ / Java / JavaScript).
  • vinegen.com/metabbs an ABNF parser generator from VineGen (closed source).