Ershov number

Ershov numbers are needed in the field of computer science when assigning registers . They are named after the Russian computer scientist Andrei Petrovich Jerschow , who published the idea of the procedure for register allocation . The method goes back to the hydrologists Robert Elmer Horton and Arthur Newell Strahler , who developed the method for calculating the flow order number of a flowing water area. According to Erhov, the method can also be used to determine how many registers are required to evaluate an expression , thus enabling the evaluation of large expressions with as few registers as possible. ${\ displaystyle a}$

calculation

The Ershov number of a binary tree is calculated as follows:

Each sheet has the value (constants or variables). ${\ displaystyle ershov (a) = 1}$
Every node with only one subnode has the same value as the subnode (unary operators).
For every other node applies

{\ displaystyle ershov (a_ {1} \ circ a_ {2}) = {\ begin {cases} max (ershov (a_ {1}), ershov (a_ {2})) & ershov (a_ {1}) \ neq ershov (a_ {2}) \\ ershov (a_ {1}) + 1 & ershov (a_ {1}) = ershov (a_ {2}) \ end {cases}}}

Derivation of the calculation

The method specifies a function of the form: which specifies the number of registers required for evaluation in a given arithmetic expression . For this a case distinction is necessary: ${\ displaystyle ershov: Expression \ rightarrow \ mathbb {N}}$ ${\ displaystyle a}$

Variables and constants

Exactly one register is required to evaluate a variable or constant. Hence:

{\ displaystyle ershov (v) = 1}

Unary link

To evaluate a unary operator on an expression , only the number of registers is required , because the result register of can simply be overwritten. ${\ displaystyle \ alpha}$ ${\ displaystyle a}$ ${\ displaystyle a}$ ${\ displaystyle a}$

${\ displaystyle ershov (\ alpha ~ a) = ershov (a)}$

Binary link

When evaluating a binary combination of two expressions of the form , a distinction must still be made between three cases: ${\ displaystyle a_ {1} \ circ a_ {2}}$

Case I: ${\ displaystyle ershov (a_ {1}) <ershov (a_ {2})}$

Assuming that an evaluation was carried out first, then the result would be in a register and further registers would be necessary in order to evaluate. All in all, registers would be necessary to evaluate. ${\ displaystyle a_ {1}}$ ${\ displaystyle ershov (a_ {2})}$ ${\ displaystyle a_ {2}}$ ${\ displaystyle ershov (a_ {2}) + 1}$ ${\ displaystyle ershov (a_ {1} \ circ a_ {2})}$

If the evaluation is carried out first, only precise registers are necessary, since the result of is in an additional register, but all other registers are available again for the evaluation of and therefore, in addition to this one result register, only further registers are necessary. Since is less than , is also less than or equal to , which is obviously also less than . ${\ displaystyle a_ {2}}$ ${\ displaystyle ershov (a_ {2})}$ ${\ displaystyle ershov (a_ {1} \ circ a_ {2})}$ ${\ displaystyle a_ {2}}$ ${\ displaystyle a_ {1}}$ ${\ displaystyle ershov (a_ {1})}$ ${\ displaystyle ershov (a_ {1})}$ ${\ displaystyle ershov (a_ {2})}$ ${\ displaystyle ershov (a_ {1}) + 1}$ ${\ displaystyle ershov (a_ {2})}$ ${\ displaystyle ershov (a_ {2}) + 1}$

Hence:

{\ displaystyle ershov (a_ {1}) <ershov (a_ {2}) \ Rightarrow ershov (a_ {1} \ circ a_ {2}) = ershov (a_ {2})}

Case II: ${\ displaystyle ershov (a_ {1}) = ershov (a_ {2})}$

It does not matter which expression is evaluated first, since both expressions require the same number of registers. As an example, the expression for which registers are necessary can therefore first be evaluated . After the evaluation, the result is in a results register and registers are now available for the evaluation of . Since the evaluation of the expression also or needs register are, in addition to the register for the result of still further register, for a total tab for the evaluation of needs. ${\ displaystyle a_ {1}}$ ${\ displaystyle ershov (a_ {1})}$ ${\ displaystyle ershov (a_ {1}) - 1}$ ${\ displaystyle a_ {2}}$ ${\ displaystyle ershov (a_ {1})}$ ${\ displaystyle ershov (a_ {2})}$ ${\ displaystyle a_ {1}}$ ${\ displaystyle ershov (a_ {2})}$ ${\ displaystyle ershov (a_ {2}) + 1}$ ${\ displaystyle ershov (a_ {1} \ circ a_ {2})}$

Hence:

{\ displaystyle ershov (a_ {1}) = ershov (a_ {2}) \ Rightarrow ershov (a_ {1} \ circ a_ {2}) = ershov (a_ {1}) + 1}

Case III: ${\ displaystyle ershov (a_ {1})> ershov (a_ {2})}$

The reasoning is analogous to case I:

{\ displaystyle ershov (a_ {1})> ershov (a_ {2}) \ Rightarrow ershov (a_ {1} \ circ a_ {2}) = ershov (a_ {1})}

Generalization to function calls

Ershov numbers can also be extended to function calls. The calculations for unary and binary operations can even be seen as special cases of function calls.

If the arguments of a function call are written to the stack, it is not necessary to also save the result in a register. From the same reasoning as in case I for binary expressions it follows: ${\ displaystyle f (a_ {1}, ..., a_ {n})}$

{\ displaystyle ershov (f (a_ {1}, ..., a_ {n})) = max (ershov (a_ {1}), ..., ershov (a_ {n}))}

But there are also calling conventions in which arguments are passed via certain registers, which can improve the speed of functions. Here, as for binary operators, which are a special case of function calls with two arguments in registers, the arguments with the highest Ershov number are calculated first so that as many registers as possible are available for them. Since all arguments have to be written in registers, the Ershov number of the function call is at least as large as the number of arguments in registers:

{\ displaystyle Minimum = max (ershov (a_ {1}), ..., ershov (a_ {n}), \ # ArgumentsInRegistern)}

If the Ershov number is equal to this minimum for more than one argument of the function, then, similarly to binary operators, the Ershov number of the function call increases by one for each additional such argument:

{\ displaystyle increment = \ # \ {a_ {k} | ershov (a_ {k}) = minimum \}}

{\ displaystyle ershov (f (a_ {1}, ..., a_ {n})) = {\ begin {cases} Increment> 1 & Minimum + Increment-1 \\ Increment <2 & Minimum \\\ end {cases}}}

Usually only the first, for example 5, arguments are passed in registers and the rest are placed on the stack. Then it makes sense to first calculate the last arguments and put them on the stack so that all registers can be used for the calculation of the first arguments. While the arguments in registers can be calculated in any order, the arguments on the stack must be in the logical order.

Sethi-Ullman algorithm

The Sethi – Ullman algorithm, which is named after Ravi Sethi and Jeffrey Ullman , converts an abstract syntax tree into the commands of a register machine such as machine code . The algorithm is basically the same as for the calculation of the Ershov numbers, but takes into account some properties of instructions of real register machines. The instruction set of x86 processors, for example, allows constants and values in memory to be used directly as arguments. However, not all combinations are allowed. Allowed are:

  add dword [ a ], 0x1337      ; Konstanten können direkt zu Variablen im RAM addiert werden.
  add eax,         0x2342      ; Konstanten können direkt zu Registern addiert werden.
  mov eax,         dword [ a ]
  add dword [ b ], eax         ; Register können direkt zu Variablen addiert werden.
  add eax,         dword [ c ] ; Variablen können direkt zu Registern addiert werden.

Calculating the Ershov numbers

The calculation of the Ershov numbers is different depending on the instruction set. In the simplest case, the instruction set only allows operations with registers, whereby variables and constants must first be loaded into registers. In this case all hands are assigned the Ershov number . If the instruction set also allows operations with variables and constants, a distinction must be made whether the instruction set only allows commands in the two-operand form or also in the three-operand form. ${\ displaystyle ershov (a) = 1}$

In the two-operand form, the result is stored in the register of the first operand. An example is the addition on x86 processors:

  mov eax, a
  add eax, b ; eax + b → eax. eax ist zugleich ein Operand als auch das Ergebnisregister.

In this case, the value can be assigned to a right leaf in the abstract syntax tree, and the value to a left leaf . In the example, the right operand is, it does not have to be loaded into a register, and the left operand. This must first be loaded into a register so that it can be calculated and stored in. ${\ displaystyle ershov (r) = 0}$ ${\ displaystyle ershov (l) = 1}$ baeax ${\ displaystyle a + b}$ eax

In the three-operand form, the first operand is the result register:

  add r1, a, b ; a + b → r1.

In this case, the value is assigned to each leaf in the abstract syntax tree . ${\ displaystyle ershov (a) = 0}$

However, this representation of the calculation is idealized: actual processors allow, for example

only constants as direct operands, but no variables or vice versa.
Constants as operands only if they are in a limited range of values, for example in . Otherwise, constants must first be loaded into a register, which increases the number of necessary registers and thus the Ershov number. ${\ displaystyle [-128,127]}$
the three-operand form for some instructions, but only the two-operand form for others.
the three-operand form, but the second operand cannot be a constant and / or a variable. In this case, the three-operand form behaves like the two-operand form.

Generation of the code

Then the tree is traversed in secondary order , whereby only the subtree with the highest Ershov number is evaluated. If the subtree with the lowest Ershov number were evaluated instead, the result must be stored in a register, which is no longer available for the calculation of the other subtree. With a limited number of registers, the situation inevitably arises that not enough registers are available for the calculation of a tree. The number of registers required for calculating a tree is only greater than the number of necessary registers for the subtrees if both subtrees have the same Ershov number or if both subtrees need the same number of registers. Only then is the limit of the available registers exceeded and the intermediate result of a subtree must be stored in a memory, for example on the stack or in a red zone , by

only the right subtree is evaluated,
a command to save the intermediate result inserted,
then the left subtree is evaluated,
a command for loading the intermediate result is inserted and
Finally, the command for linking the two subtrees is inserted.

The order is actually irrelevant, but it can be advantageous to first evaluate the right and then the left subtree if the operation allows a value in memory as a right operand. Then the last two steps can be summarized.

credentials

^ Sethi Ravi, Jeffrey D. Ullman, The Generation of Optimal Code for Arithmetic Expressions . In: Journal of the Association for Computing Machinery . tape 17 , no. 4 , 1970, pp. 715-728 , doi : 10.1145 / 321607.321620 .

[1] Sethi Ravi, Jeffrey D. Ullman, The Generation of Optimal Code for Arithmetic Expressions . In: Journal of the Association for Computing Machinery . tape 17 , no. 4 , 1970, pp. 715-728 , doi : 10.1145 / 321607.321620 .