Shunting yard algorithm
The shunting yard algorithm is a method to convert mathematical terms from the infix notation into the reverse Polish notation or into an abstract syntax tree . The algorithm was developed by Edsger W. Dijkstra invented and English shunting yard , marshalling yard ' named because he's way of working to a marshalling yard recalls.
functionality
The shunting yard algorithm requires both an input and an output queue for the conversion, as well as a stack , which is required for the intermediate storage of the operators. The input string is read character by character, with all numbers being written directly to the output variable in the same order. If the pending character is an operation character, it is placed on the operator stack. If an operator is already on the stack, the operator priority and operator associativity are used to decide whether the new operator should be placed directly on the stack or whether the stack should be emptied into the output first.
Opening brackets are also placed on the operator stack, but they are not written to the output stream when they are removed. With closing brackets, the stack is emptied until an opening bracket is encountered; if no opening brackets are found, the input string is incomplete, although the error handling is not defined by the algorithm.
In detail
This is followed by a German pseudo-code that can be easily translated and adapted into most programming languages.
Stack mit LIFO-Prinzip und Ausgabe-Queue anlegen. SOLANGE Tokens verfügbar sind: Token einlesen. WENN Token IST-Zahl: Token ZU Ausgabe. ENDEWENN WENN Token IST-Funktion: Token ZU Stack. ENDEWENN WENN Token IST-Argumenttrennzeichen: BIS Stack-Spitze IST öffnende-Klammer: Stack-Spitze ZU Ausgabe. FEHLER-BEI Stack IST-LEER: GRUND (1) Ein falsch platziertes Argumenttrennzeichen. GRUND (2) Der schließenden Klammer geht keine öffnende voraus. ENDEFEHLER ENDEBIS ENDEWENN WENN Token IST-Operator SOLANGE Stack IST-NICHT-LEER UND Stack-Spitze IST Operator UND Token IST-linksassoziativ UND Präzedenz von Token IST-KLEINER-GLEICH Präzedenz von Stack-Spitze Stack-Spitze ZU Ausgabe. ENDESOLANGE Token ZU Stack. ENDEWENN WENN Token IST öffnende-Klammer: Token ZU Stack. ENDEWENN WENN Token IST schließende-Klammer: BIS Stack-Spitze IST öffnende-Klammer: FEHLER-BEI Stack IST-LEER: GRUND (1) Der schließenden Klammer geht keine öffnende voraus. ENDEFEHLER Stack-Spitze ZU Ausgabe. ENDEBIS Stack-Spitze (öffnende-Klammer) entfernen WENN Stack-Spitze IST-Funktion: Stack-Spitze ZU Ausgabe. ENDEWENN ENDEWENN ENDESOLANGE BIS Stack IST-LEER: FEHLER-BEI Stack-Spitze IST öffnende-Klammer: GRUND (1) Es gibt mehr öffnende als schließende Klammern. ENDEFEHLER Stack-Spitze ZU Ausgabe. ENDEBIS
This algorithm assumes that all tokens are correctly recognized and are valid. In particular, this algorithm does not take over the superposition of the characters “+” and “-”. A conflict between right and left associative operators with the same precedence is not caught.
The “reading” access to the stack (e.g. with “Stack top ACT”) and the “taking” (with “Stack top CLOSED”) are required.
Required functions are:
- Recognize numbers
- Recognize functions
- Recognizing argument separators
- Recognize operators
- Determine operator associativity
- Determining the operator precedence (here higher precedence means a stronger bond), the precedence of a function is maximal.
Examples
The following calculations, given in Infix notation, are to be transformed. It is documented what happens.
Precedents: ( +
, −
) <( ·
, :
) <( ^
) <functions
(3 + 4) (5 - 6)
- Summarize tokens:
(
3
+
4
)
(
5
−
6
)
(Here: every character is a token) -
(
on the stack. -
3
for output -
+
on the stack -
4
for output -
)
:-
+
for output -
(
remove from the stack
-
- Operator expected but
(
found:- implicit
·
on the stack -
(
on the stack
- implicit
-
5
for output -
−
on the stack -
6
for output -
)
:-
−
for output -
(
remove from the stack
-
- End:
·
to issue
Output:3
4
+
5
6
−
·
1 + 2 - 3 4 + 5 ^ 6 ^ 7 8 - 9
- Summarize tokens:
1
+
2
−
3
·
4
+
5
^
6
^
7
·
8
−
9
(Here: every character is a token) -
1
for output -
+
on the stack -
2
for output -
−
:-
+
for output -
−
on the stack
-
-
3
for output -
·
on the stack -
4
for output -
+
:-
·
for output -
−
for output -
+
on the stack
-
-
5
for output -
^
on the stack -
6
for output -
^
on the stack (^
is right-associative) -
7
for output -
·
:-
^
for output -
^
for output -
·
on the stack
-
-
8
for output -
−
:-
·
for output -
+
for output -
−
on stack
-
-
9
for output - End:
−
to issue
Output: 1
2
+
3
4
·
−
5
6
7
^
^
8
·
+
9
−
Explicitly in parentheses ([(1 + 2) - (3 · 4)] + {[5 ^ (6 ^ 7)] · 8}) - 9 this is possibly easier to understand.
cos (1 + sin (ln (5) - exp (8)) ^ 2)
- Summarize tokens:
cos
(
1
+
sin
(
ln
(
5
)
−
exp
(
8
)
)
^
2
)
-
cos
on the stack -
(
on the stack -
1
for output -
+
on the stack -
sin
on the stack -
(
on the stack -
ln
on the stack -
(
on the stack -
5
for output -
)
:(
Remove the top one from the stack-
ln
for output
-
–
on the stack -
exp
on the stack -
(
on the stack -
8
for output -
)
:(
Remove the top one from the stack-
exp
for output
-
)
:-
–
for output (
Remove the top one from the stack-
sin
for output
-
-
^
on the stack -
2
for output -
)
:-
^
for output -
+
for output (
Remove the top one from the stack-
cos
for output
-
- End: No remaining elements in the stack, so you're done!
Output: 1
5
ln
8
exp
−
sin
2
^
+
cos
Web links
- Original description of the algorithm (PDF; 1.1 MB; English)