Register Stack Engine

The Register Stack Engine (RSE) is a mechanism for efficient handling of the stack in IA-64, the Intel architecture for 64-bit processors.

The parameters of a function are passed in registers, some of which work like a stack. The logical stack consists of these registers and a part of main memory, so it can be of any size (as large as main memory) while the current stack frames remain quickly available for calculations in the ALU (as fast as registers). The RSE is a hardware mechanism that is controlled by a few machine instructions and, depending on the operating mode and the processor implementation, works independently to varying degrees: the registers are synchronized with memory using spare bandwidth, regardless of the instruction stream being executed, more or less in the background.

Register structure

The IA-64 architecture defines 128 freely usable registers (as opposed to eight on IA-32), of which registers 0 through 31 behave like normal static registers. Registers 32 to 127 support special mechanisms that reduce or even avoid the tedious saving and restoring of register contents on the stack (in main memory) when calling subroutines and functions.

Function call

When a function is called, the function arguments, return address and return values are generally passed not on the stack but in registers. Registers 32 to 127 themselves serve as a stack; the register stack pointer points to the register that is next available for passing function parameters. If no more registers are available, registers from an earlier stack frame are written to main memory (usually only to the processor cache).

Each time a function is called, the registers are rotated so that the function parameters can always be found starting at register 32. They are followed by the local data of the function and then by the registers that are intended to be passed on to subordinate functions. These are called input, local and output registers, and they can conveniently be addressed using the abbreviations in, loc and out. The length of these three areas is determined at compile time.
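The layout of a frame can be sketched in a few lines of Python. This is only an illustration of the numbering described above, not a model of the real hardware: the function names frame_layout and callee_layout are hypothetical, and the sketch assumes that the caller's output registers become the callee's input registers.

```python
FIRST_STACKED = 32   # first stacked register, as described in the text
NUM_STACKED = 96     # registers 32 to 127


def frame_layout(n_in, n_loc, n_out):
    """Return the register numbers of the in, loc and out areas of a frame."""
    if n_in + n_loc + n_out > NUM_STACKED:
        raise ValueError("a frame cannot exceed the 96 stacked registers")
    loc_start = FIRST_STACKED + n_in
    out_start = loc_start + n_loc
    return {
        "in": range(FIRST_STACKED, loc_start),
        "loc": range(loc_start, out_start),
        "out": range(out_start, out_start + n_out),
    }


def callee_layout(caller, n_loc, n_out):
    """Assumption: the caller's out registers are the callee's in registers,
    renumbered so that they start at register 32 again."""
    return frame_layout(len(caller["out"]), n_loc, n_out)


caller = frame_layout(n_in=2, n_loc=3, n_out=2)
callee = callee_layout(caller, n_loc=1, n_out=0)
print(list(caller["out"]))  # numbers as seen by the caller
print(list(callee["in"]))   # the same registers as seen by the callee
```

The same two registers carry different numbers in the caller and in the callee, which is the effect of the rotation.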

Technically, the registers are not copied during rotation but only renamed (register mapping). For this purpose, a pointer is adjusted so that it points to the register that currently serves as register 32.
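A minimal sketch of such a renaming scheme, assuming a single base pointer and a ring of 96 physical registers. The class RegisterRenamer and its methods are hypothetical names chosen for illustration; the actual hardware bookkeeping is more involved.

```python
NUM_STATIC = 32    # registers 0 to 31 are never renamed
NUM_STACKED = 96   # registers 32 to 127 form a ring


class RegisterRenamer:
    def __init__(self):
        # Offset (0..95) of the physical register that currently acts as r32.
        self.base = 0

    def physical(self, logical):
        """Map a register number used by the program to a physical register."""
        if not 0 <= logical < NUM_STATIC + NUM_STACKED:
            raise ValueError("no such register")
        if logical < NUM_STATIC:
            return logical  # static registers map to themselves
        offset = (self.base + logical - NUM_STATIC) % NUM_STACKED
        return NUM_STATIC + offset

    def call(self, skipped):
        """Advance the base past the caller's in and loc registers."""
        self.base = (self.base + skipped) % NUM_STACKED

    def ret(self, skipped):
        """Undo the adjustment made by the matching call."""
        self.base = (self.base - skipped) % NUM_STACKED


renamer = RegisterRenamer()
renamer.call(5)              # caller had 5 in and loc registers
print(renamer.physical(32))  # callee's r32 is a different physical register
```

No register contents move: only the base changes, and the modulo operation makes the stacked registers behave like a ring buffer.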

Partitioning of the registers

The partitioning described below serves to explain how the mechanism works; it cannot be influenced by the program.

The 96 dynamic registers are divided into 4 partitions:

The first two partitions (clean and dirty) contain registers that belong to the stack frames of higher-level (calling) functions.

clean: the register is synchronized with the main memory
dirty: the register is not yet synchronized with the main memory
current: the register belongs to the current stack frame
invalid: the register does not contain any important data and can be used for new function calls, i.e. subordinate functions

Since the current input register 0 always has the number 32, the current partition always begins at register 32 from the program's point of view.

For example, if the old stack frame occupies registers 32-36, the pointer to the beginning of the current partition must be shifted 4 registers to the right. Since the old stack frame cannot be held in register positions 28-32, it then wraps around to positions 123-127, as in a ring buffer.

In fact, the current register 32 can be any physical register, so the partitions may actually be stored at any rotation within the register file.

If the invalid partition is too small for the next function call, registers from clean are added to invalid. If invalid and clean together are too small, registers from dirty must be synchronized before the function can be called. With automatic synchronization, either the dirty partition is reduced in size in favor of the clean partition, i.e. registers that contain data from previous stack frames are synchronized with main memory, because clean registers can be reused immediately; or the invalid partition is reduced in size in favor of the clean partition, i.e. data from previous stack frames is loaded from main memory so that it is available again when returning to a higher-level function. In short, the clean partition should be as large as possible.
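The order in which registers are claimed can be sketched as follows. This is a simplified model that only tracks partition sizes; the class Partitions and its method call are hypothetical, and the overlap between the caller's output registers and the callee's input registers is ignored.

```python
from dataclasses import dataclass


@dataclass
class Partitions:
    clean: int
    dirty: int
    current: int
    invalid: int

    def call(self, frame_size):
        """Claim registers for a new frame and return how many dirty
        registers had to be written to memory first."""
        if frame_size > self.invalid + self.clean + self.dirty:
            raise ValueError("frame does not fit next to the caller's frame")
        # Take invalid registers first, then clean ones, then dirty ones.
        from_invalid = min(frame_size, self.invalid)
        from_clean = min(frame_size - from_invalid, self.clean)
        from_dirty = frame_size - from_invalid - from_clean
        self.invalid -= from_invalid
        self.clean -= from_clean
        self.dirty -= from_dirty
        # The caller's frame is not yet synchronized, so it becomes dirty.
        self.dirty += self.current
        self.current = frame_size
        return from_dirty


regs = Partitions(clean=10, dirty=20, current=6, invalid=60)
print(regs.call(50))  # fits into invalid, nothing has to be stored
print(regs.call(25))  # invalid and clean are used up, some stores are forced
```

Only the registers taken from dirty cost memory traffic at call time, which is why a large clean partition is desirable.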

On a task switch, not all registers have to be saved, but only the registers of the dirty partition and the static registers 0 to 31.

Synchronization

Synchronization with main memory is only necessary when no more registers are available. This would be the case, for example, beyond a call depth of 96 functions that do not accept any parameters (only the return address has to be saved for each function call). Optionally, synchronization with main memory can also take place in the background. If the load/store unit is idle, it can speculatively either increase the number of available registers by writing registers from previous calls to main memory, or read registers of previous stack frames back from main memory so that they are quickly available when functions return. If the synchronization is speculative, this is called eager mode, otherwise lazy mode. Speculative synchronization can be switched on and off if it is implemented.
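The difference between the two modes can be illustrated with a small counting model. It is a rough sketch under stated assumptions: the function forced_stores and the parameter background_stores_per_call are hypothetical, each nested call claims a fixed number of registers, and only stores on the call path are modeled (not the reloads on return).

```python
STACKED = 96  # registers 32 to 127


def forced_stores(frame_sizes, background_stores_per_call=0):
    """Count registers that must be stored while a call is waiting.

    frame_sizes: registers claimed by each nested call, in call order.
    background_stores_per_call: assumed number of registers the idle
    load/store unit writes back before each call (0 models lazy mode).
    """
    unsynced = 0    # registers whose contents are not yet in memory
    free = STACKED  # registers that can be claimed without a store
    forced = 0
    for size in frame_sizes:
        # Eager mode: spare bandwidth is used to store registers early.
        early = min(background_stores_per_call, unsynced)
        unsynced -= early
        free += early
        if size > free:
            # Not enough room: the call has to wait for these stores.
            need = size - free
            forced += need
            unsynced -= need
            free += need
        free -= size
        unsynced += size
    return forced


print(forced_stores([1] * 96))        # call depth 96, one register each
print(forced_stores([1] * 97))        # one call deeper
print(forced_stores([20] * 10))       # lazy mode with larger frames
print(forced_stores([20] * 10, 20))   # eager mode with ample idle bandwidth
```

In this model, lazy mode pays for every store at the moment a call runs out of registers, while eager mode hides some or all of them in otherwise idle cycles.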

Comparison with other technologies

The Register Stack Engine is a generalization of the register windows found in SPARC processors. There the size of the register window is always the same, whereas with the Register Stack Engine it can be set as desired.

Literature

Intel Itanium Architecture Software Developer's Manual, Volume 2, Chapter 6