Polytope model
The polytope model (or, more generally, polyhedron model ) is a mathematical model that can be used by compilers to optimize sets of loops. The loops in the source program are described by polytopes , to which a correctness-preserving transformation is then applied. In the last step, the resulting polytopes are translated back into (target) code.
Establishing the transformation matrix
A transformation consists of two parts, the schedule and the allocation. The schedule specifies when a computation should take place, while the allocation specifies where the computation takes place (i.e. on which processor it is executed).
Calculation of a valid schedule
A schedule is valid when it receives all data dependencies .
If an iteration with the loop indices , the results of calculation required needs for the Schedule shall apply: . This means that all required values must have been calculated at an earlier point in time.
Calculation of a valid allocation
In contrast to the schedule, there are no restrictions on the allocation.
In principle, there is always the option of having the calculation performed by just one processor. However, you lose all parallelism. It is therefore advisable to distribute the calculation over as many processors as possible in order to maximize parallelism. However, one must take into account that this means that more data must be sent between the processors. These additional costs for communication can easily exceed the profit from the parallel calculation.
Example (automatic parallelization)
From the source program to the polytope
Let us consider the following program, which consists of a perfectly nested set of loops. The body of the loop contains a statement S .
for i:= 0 to n do for j:= 0 to i+2 do S: A(i, j):= A(i-1, j) + A(i, j-1) end end
To represent the loop as a polytope, it is sufficient to write the upper and lower bounds as inequalities:
or as a linear system of equations in matrix representation (one row corresponds to an inequality, columns: i, j, n, 1)
Dependency analysis
The array cell is A(i,j)
overwritten in each loop pass. For the calculation of A(i,j)
you need the values of A(i-1,j)
and A(i,j-1)
. This creates two data dependencies: Each iteration (i,j)
depends on both the iteration (i-1,j)
and (i,j-1)
on. Both dependencies must be taken into account in the next step when calculating the schedule.
All dependencies can be calculated algorithmically using a dependency analysis method.
Establishing the transformation matrix
A correct schedule that contains both dependencies is e.g. B. .
Interpretation:
- The first step is to
A(0,0)
calculate - In the second step,
A(1,0)
and areA(0,1)
calculated - In the third step
A(2,0)
,A(1,1)
andA(0,2)
- etc.
In order to enable maximum parallelism in every calculation step, we choose the allocation
This results in the following transformation matrix:
(Explanation: first line = schedule (i + j), second line = allocation (i), first column: i, second column: j)
Transformed polytope
The loop indices are also transformed by: and
Generation of the target program
The last step is to generate code that precisely enumerates the points from the target polyhedron and adheres to the correct order (more precisely the lexicographical order). This is calculated algorithmically by so-called scanning algorithms (e.g. Fourier-Motzkin elimination or the Quillerè method ).
The following (synchronous) target program is obtained:
for t:= 0 to 2n+2 do parfor p:= max(0, ceil(t/2)-1) to min(t, n) do A(p, t-p):= A(p-1, t-p) + A(p, t-p-1) end end
The outer loop specifies global time steps, while the second loop represents the parallel calculation. Since this program lacks the code for communication between the processors, it cannot run directly on systems with distributed memory. However, it can be implemented directly with shared memory, e.g. B. as an OpenMP program.
application
The most important possible uses include:
- Cache optimization
- Loop tiling
- Loop parallelization