ILU dismantling

In numerical mathematics, the incorrect decomposition of a matrix into the product of a lower triangular matrix L and an upper triangular matrix U is called ILU decomposition (from incomplete LU decomposition ) or incomplete LU decomposition ${\ displaystyle A \ in \ mathbb {R} ^ {n \ times n}}$

{\ displaystyle LU \ approx A}

,

in which only the entries of a given occupation structure are calculated from the decomposition matrices L and U.

When calculating a normal LU decomposition of a sparse matrix, the occupation structure cannot usually be exploited. Much more storage space is therefore required than for the original matrix and the number of necessary arithmetic operations is no less than that for a fully occupied matrix. By specifying a maximum cast structure, this problem is circumvented while accepting an incorrect breakdown.

The ILU decomposition is successfully used as a preconditioner to accelerate the iterative solution of large sparse linear systems of equations using the Krylow subspace method . No properties of the actual problem (mostly the numerical solution of a partial differential equation ) are used. It is therefore not limited to certain problem classes and has found its way into many areas of numerical simulation , for example in numerical fluid mechanics the technology is widespread. ${\ displaystyle Ax = b}$

The procedure was first mentioned in 1960 by Richard S. Varga and Nikolai Ivanovich Buleev (NI Buleev). A more detailed analysis was published in 1977 by JA Meijerink and van der Vorst . These examined preconditioning techniques for the CG method and suggested an incomplete Cholesky decomposition for symmetric matrices . At the same time they mentioned an extension to general matrices.

application

For use as a preconditioner, the system of equations is formally multiplied by ${\ displaystyle Ax = b}$ ${\ displaystyle (LU) ^ {- 1}}$

{\ displaystyle (LU) ^ {- 1} Ax = (LU) ^ {- 1} b.}

Applying it to Krylov subspace methods, they converge better, because the matrix is closer to the unit matrix as A is. For this method, matrix-vector multiplications are required in each step . Since A is sparsely populated, it is possible to calculate with little computational effort. For the solution of one can use the equivalent system of equations ${\ displaystyle (LU) ^ {- 1} A}$ ${\ displaystyle y = Ax}$ ${\ displaystyle c = (LU) ^ {- 1} y}$

{\ displaystyle LUc = y}

solve efficiently by inserting forwards and backwards . The thin occupancy of the matrices L and U can be used here.

Compared to splitting- based preconditioners such as Gauß-Seidel , the calculation of an ILU breakdown is relatively time-consuming, whereby the effort is directly related to the size of the permitted occupation structure. This is offset by the acceleration of the Krylow subspace process, which is usually better, the larger the allowed occupation structure. If several systems are solved one after the other with the same matrix but different right-hand sides, it is therefore advisable to invest more in the calculation of the ILU. When solving time-dependent partial differential equations numerically, in which sequences of thousands of similar linear systems of equations have to be solved one after the other, an ILU decomposition that has been calculated once is usually frozen and recalculated periodically.

ILU decompositions or variants are part of every larger program library for solving sparse linear equation systems, for example from PetSc or MATLAB .

Basic form

In the basic form, that of A is given as the occupation structure P. The decomposition into the matrices L and U is then defined by the following conditions:

${\ displaystyle l_ {ij} = u_ {ij} = 0, {\ mbox {if}} (i, j) \ notin P}$
${\ displaystyle (LU) _ {ij} = a_ {ij}, {\ mbox {if}} (i, j) \ in P.}$

In addition, normalization applies, i.e. H. Determination of one of the two incompletely occupied triangular matrices over the main diagonal . Either the diagonal elements of the lower triangular matrix are normalized to 1:

{\ displaystyle l_ {ii} = 1, \ quad i = 1, \ dotsc, n}

or the diagonal elements of the upper triangular matrix:

{\ displaystyle u_ {ii} = 1, \ quad i = 1, \ dotsc, n}

Depending on the standardization, the decomposition algorithms differ, which, depending on the implementation, also has an impact on the calculation efficiency.

Since it is not required for, it is possible (this again motivates the name incomplete LU decomposition). ${\ displaystyle (i, j) \ notin P}$ ${\ displaystyle (LU) _ {ij} = 0}$ ${\ displaystyle LU \ neq A}$

The matrix is given . The incomplete decomposition matrices L and U are then saved together in a new matrix , whereby the previously known ones on the diagonal of L and U are not saved. The matrix M is initialized in such a way that it is set identical to A for entries from P , otherwise to zero. ${\ displaystyle n \ times n}$ ${\ displaystyle A = (a_ {ij})}$ ${\ displaystyle M = (m_ {ij})}$

If normalization is selected, the decomposition is then calculated using the following algorithm: ${\ displaystyle l_ {ii} = 1, \, i = 1, \ dotsc, n}$

For  $k=1,\dotsc ,n-1$ , do
   For  $i=k+1,\dotsc ,n$  and if  $(i,k)\in P$ , do
       $m_{ik}:=m_{ik}/m_{kk}$ 
      For  $j=k+1,\dotsc ,n$  and if  $(i,j)\in P$ , do
          $m_{ij}:=m_{ij}-m_{ik}m_{kj}$

The order of the loops in the above algorithm can be changed to improve efficiency depending on the data structure . If the matrix is saved line by line, for example, the memory accesses in the last loop do not take place on adjacent memory blocks. In such cases, it makes sense to swap the loops.

existence

Existence statements of the decomposition exist for M-matrices and H-matrices . For general matrices there are counterexamples in which the algorithm terminates prematurely because a zero appears on the diagonal, which leads to a division by zero. Nevertheless, in practice the calculation of the decomposition is not aborted.

variants

There are many variants of the original ILU dismantling. These attempt either to improve the approximation properties or, if the approximation quality is similar, to reduce the calculation effort.

ILU (p)

The ILU (p) decompositions, which were first considered by Watts in 1981 when simulating an oil reservoir, are widespread. This denotes the level of fill . The basic version of the ILU has level 0. Level 1 is defined by considering the occupation structure of the product of the matrices L and U from the ILU (0). Level 2 results from the decomposition matrices of ILU (1) etc. To determine the staffing structure of an ILU (p) decomposition, it is not necessary to calculate the decomposition of the lower levels in advance. To do this, the non- zero entries of the matrix are initially assigned level 0, while the zero entries are assigned infinitely. Then you go through the elements of the matrix as it would happen in the regular algorithm and every time the element is modified in the innermost loop, the level is updated using ${\ displaystyle p}$ ${\ displaystyle A}$ ${\ displaystyle a_ {ij}}$

{\ displaystyle lev_ {ij} = \ min (lev_ {ij}, lev_ {ik} + lev_ {kj} +1).}

It is thus possible to provide the memory for the ILU decomposition before the start of the algorithm. When using an ILU (p) it should be noted that, on the one hand, the calculation of the decomposition is more complex than with the basic version and, on the other hand, the application is more expensive because the preconditioner has more non-zero entries. This means that at high levels from around 3 onwards, the reduction in the number of iterations in the Krylow subspace method no longer necessarily leads to a shortening of the CPU times. In addition, especially in the case of indefinite matrices, the iteration numbers may even deteriorate compared to the basic version.

ILUT

The ILU (p) have the disadvantage that the non-zero entries are not selected on the basis of approximation properties and thus computing time for almost zero entries can be wasted. This is taken into account in the ILU decomposition with threshold , called ILUT, proposed by Yousef Saad in 1994 . In addition to the use of a staffing structure, additional conditions are permitted here, according to which entries are not taken into account if they are below a certain tolerance. For certain diagonally dominant M-matrices the existence of the decomposition can then be proven again. The implementation of an efficient ILUT is more difficult than with the other variants, but higher levels of fill are often possible than with a pure ILU (p).

Fixed point method

In 2015, a method was proposed that interprets the ILU decomposition as a fixed point equation. This alternative approach is characterized by its high degree of parallelizability and its simplicity.

Other variants

The ILU can be expanded to block matrices without any major problems ; instead of dividing by the diagonal element, it has to be multiplied by the inverse of the corresponding diagonal block . ${\ displaystyle d_ {ii}}$

Comparison of ICCG with CG using the 2D Poisson equation

A special case, however, is the incomplete Cholesky decomposition (IC). This applies the concept of the ILU decomposition to symmetrical and positively definite matrices, analogous to the Cholesky decomposition . This process, which was introduced in 1977 as the first ILU variant, is often used as a preconditioner for the CG process . The combination of CG and IC is also referred to as ICCG.

Parallelization

The classic ILU decomposition is strictly sequential and therefore difficult to parallelize . However, variants have been developed that use the central ideas to make parallelization possible. This includes in particular multilevel techniques such as ILUM. Here are independent amounts to a set of unknowns to eliminate blocks used. The resulting fill-in is limited by a threshold. A new independent set is then sought in the remaining unknowns and the step is repeated until the remaining block has become small enough for a direct solution.

The iterative ITU by means of fixed point iteration can intrinsically be parallelized to a high degree.

Influence of the numbering

The numbering of the unknowns in A has an influence on the efficiency of the preconditioner that should not be underestimated. This is due to the fact that the fill-in in the exact decomposition depends exactly on it and thus the numbering has an influence on how well the faulty ILU decomposition A approximates. In addition, the numbering influences the size of the entries on the diagonal and thus the stability .

Here, too, there are no solid statements as to which numbering makes sense for which type of problems. In particular, when using the basic version ILU (0), no convincing heuristics are known. For the stronger preconditioners ILUT or ILU (p) with p> 0 , the reverse-Cuthill-McKee numbering has proven to be beneficial in many cases .

literature

Andreas Meister: Numerics of linear systems of equations , 2nd edition, Vieweg, Wiesbaden 2005, ISBN 3-528-13135-7
Gerard Meurant: Computer Solution of Large Linear Systems , Elsevier, 1999, ISBN 0-444-50169-X
Yousef Saad: Iterative Methods for Sparse Linear Systems , 2nd edition, SIAM Society for Industrial & Applied Mathematics 2003, ISBN 0-898-71534-2

Individual evidence

↑ Varga RS: Factorization and normalized iterative methods In: Boundary problems in differential equations (Ed .: RELanger) University of Wisconsin Press, Madison 1960
↑ Buleev A numerical method for solving two- and three-dimensional diffusion equations Math.Sb. 51 (1960)
↑ Meijerink, van der Vorst: An iterative solution method for linear systems of which the coefficients matrix is a symmetric M-Matrix , Mathematics of Computation 31 (1977), pp. 148-162
↑ Edmond Chow, Aftab Patel: Fine-grained parallel incomplete LU factorization , SIAM Journal on Scientific Computing 37: 2 (2015), C169-C193

[1] Varga RS: Factorization and normalized iterative methods In: Boundary problems in differential equations (Ed .: RELanger) University of Wisconsin Press, Madison 1960

[2] Buleev A numerical method for solving two- and three-dimensional diffusion equations Math.Sb. 51 (1960)

[3] Meijerink, van der Vorst: An iterative solution method for linear systems of which the coefficients matrix is a symmetric M-Matrix , Mathematics of Computation 31 (1977), pp. 148-162

[4] Edmond Chow, Aftab Patel: Fine-grained parallel incomplete LU factorization , SIAM Journal on Scientific Computing 37: 2 (2015), C169-C193