CG method

[Figure: Comparison of the simple gradient method with optimal step length (green) and the CG method (red) for minimizing the quadratic form of a given linear system. CG converges after 2 steps (the system matrix has size 2 × 2).]

The CG method (from English conjugate gradients, also conjugate gradient method) is an efficient numerical method for solving large systems of linear equations of the form $Ax = b$ with a symmetric, positive definite system matrix $A$.

In exact arithmetic, the method delivers the exact solution after at most $n$ steps, where $n$ is the size of the square matrix $A$. It is, however, particularly interesting as an iterative method, since the error decreases monotonically. The CG method belongs to the class of Krylov subspace methods.

It was first proposed in 1952 by Eduard Stiefel and Magnus Hestenes. A method that is equivalent for certain systems of equations, the Lanczos method, was proposed by Cornelius Lanczos in the early 1950s.

Idea of the CG method

The idea of the CG method is that, for symmetric and positive definite $A$, the minimization of the quadratic form

$$E(x) = \tfrac{1}{2} \langle x, Ax \rangle - \langle b, x \rangle$$

is equivalent to solving $Ax = b$. Here $\langle \cdot , \cdot \rangle$ denotes the standard scalar product.
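A one-line check of this equivalence (our own, using the symmetry of $A$):

$$\nabla E(x) = \tfrac{1}{2}\,(A + A^T)\,x - b = A x - b,$$

so, since $A$ is positive definite, $E$ is strictly convex and its unique minimizer is exactly the solution of $A x = b$.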

The gradient of $E$ at the point $x$ is precisely $Ax - b$ and can therefore be computed quickly for large, sparse matrices. The idea of the CG method is now to minimize the function over a growing subspace: instead of searching in the direction of the residual, as in the gradient method, one searches in another direction $d_k$. The directions $d_k$ are all $A$-conjugate, that is,

$$\langle d_i, A d_j \rangle = 0 \qquad \text{for } i \neq j .$$

The iterates $x_k$ of the CG method are then chosen such that they are the minimum of $E$ in the affine space $V_k$ that is spanned by the vectors $d_0, \dots, d_{k-1}$ and shifted by $x_0$:

$$x_k = \operatorname*{arg\,min}_{x \in V_k} E(x), \qquad V_k = x_0 + \operatorname{span}\{d_0, d_1, \dots, d_{k-1}\}$$
It can be shown that the following also applies:

$$V_k = x_0 + \operatorname{span}\{r_0, A r_0, A^2 r_0, \dots, A^{k-1} r_0\}$$

The last representation shows that the search directions span the Krylov space of $A$ and $r_0$. The CG method can therefore alternatively be defined directly as a Krylov subspace method.

Since the vectors $d_k$ are all $A$-conjugate, the dimension of $V_k$ is exactly $k$ as long as the vectors $d_0, \dots, d_{k-1}$ are nonzero. One can show that $d_k = 0$ holds precisely when $x_k$ is already the solution. If $A$ is an $n \times n$ matrix, the method thus terminates after at most $n$ steps if the computation is exact. Numerical errors can be eliminated by further iterations. To monitor them, one considers the residual $r_k = b - A x_k$, which is the negative gradient of $E$. If the norm of this residual falls below a certain threshold value, the method is terminated.

The method thus successively builds an $A$-orthogonal basis of $\mathbb{R}^n$ and minimizes $E$ as well as possible in each direction.

The problem with the iterative method is finding the optimal step size. Assessing the quality of a point requires a full matrix-vector multiplication, which at the same time delivers a new gradient. If the step size along a given gradient direction is determined too imprecisely, the method degenerates into a simple hill-climbing algorithm.

CG method without preconditioning

First choose an arbitrary $x_0 \in \mathbb{R}^n$ and compute:

$$d_0 = r_0 = b - A x_0$$

For $k = 0, 1, 2, \dots$ one executes:

  • Find, starting from $x_k$ in the direction $d_k$, the location $x_{k+1}$ of the minimum of the function $E$ and update the gradient or the residual:

$$z = A d_k, \qquad \alpha_k = \frac{r_k^T r_k}{d_k^T z}, \qquad x_{k+1} = x_k + \alpha_k d_k, \qquad r_{k+1} = r_k - \alpha_k z$$

  • Correct the search direction $d_{k+1}$ using $r_{k+1}$ and $d_k$:

$$\beta_k = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k}, \qquad d_{k+1} = r_{k+1} + \beta_k d_k$$

until the residual is small in the norm, i.e. $\|r_{k+1}\| < \varepsilon$.
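A minimal NumPy sketch of this iteration (illustrative; the function name `conjugate_gradient` and the parameters `tol` and `max_iter` are our own choices, not from the source):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A via plain CG."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x          # residual r_0 = b - A x_0
    d = r.copy()           # first search direction d_0 = r_0
    rs_old = r @ r
    if max_iter is None:
        max_iter = n       # exact arithmetic terminates after at most n steps
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        z = A @ d                      # the single matrix-vector product per step
        alpha = rs_old / (d @ z)       # exact line search along d
        x += alpha * d
        r -= alpha * z                 # updated residual r_{k+1}
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d  # conjugate the new search direction
        rs_old = rs_new
    return x

# Usage on a small SPD system:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(x, A @ x - b)  # residual should be ~0 after at most 2 steps
```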

Variants

There are different variants of the method: in addition to the first one by Roger Fletcher and Colin Reeves, there are, for example, those by Hestenes and Stiefel, by Davidon, Fletcher and Powell, and by Polak and Ribière. For quadratic forms (as defined above) these are identical, since the additional terms vanish due to the orthogonality of the residuals. However, if the CG method is used to minimize a function that is only approximated by a quadratic form, these variants often show better convergence behavior than the original formulation by Fletcher and Reeves.

  • $\beta_k = \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}$   (Fletcher–Reeves)
  • $\beta_k = \dfrac{r_{k+1}^T (r_{k+1} - r_k)}{r_k^T r_k}$   (Polak–Ribière)
  • $\beta_k = \dfrac{r_{k+1}^T (r_{k+1} - r_k)}{d_k^T (r_k - r_{k+1})}$   (Hestenes–Stiefel)
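For illustration, the three formulas as NumPy functions (the function names are our own; `r_new` and `r_old` are the successive residuals $r_{k+1}$ and $r_k$, and `d` is the current search direction $d_k$):

```python
import numpy as np

def beta_fletcher_reeves(r_new, r_old):
    return (r_new @ r_new) / (r_old @ r_old)

def beta_polak_ribiere(r_new, r_old):
    return (r_new @ (r_new - r_old)) / (r_old @ r_old)

def beta_hestenes_stiefel(r_new, r_old, d):
    return (r_new @ (r_new - r_old)) / (d @ (r_old - r_new))

# With exact line search, d @ r_new == 0 and d @ r_old == r_old @ r_old, so
# Hestenes-Stiefel reduces to Polak-Ribiere; for an exactly quadratic E the
# residuals are also orthogonal (r_new @ r_old == 0) and all three coincide,
# as stated above.
```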

CG method with symmetric preconditioning (PCG method)

The convergence of the CG method is only assured for symmetric positive definite matrices, and a preconditioner must take this into account. With symmetric preconditioning, the system $Ax = b$ is transformed by means of a preconditioner matrix $M^{-1} = K K^T$ into $K^T A K \, \tilde{x} = K^T b$ with $x = K \tilde{x}$, and the CG method is then applied to the transformed system.

The matrix $K^T A K$ is symmetric because $A$ is symmetric. It is also positive definite, since by Sylvester's law of inertia, $A$ and $K^T A K$ have the same numbers of positive and negative eigenvalues.

The resulting method is the so-called PCG method (from English preconditioned conjugate gradient):

First choose an arbitrary $x_0 \in \mathbb{R}^n$ and compute:

$$r_0 = b - A x_0, \qquad h_0 = M^{-1} r_0, \qquad d_0 = h_0$$

For $k = 0, 1, 2, \dots$ one sets:

  • Save the matrix-vector product $z = A d_k$ in order to compute it only once
  • Find, starting from $x_k$ in the direction $d_k$, the minimum $x_{k+1}$ and update the gradient and the preconditioned gradient:

$$\alpha_k = \frac{r_k^T h_k}{d_k^T z}, \qquad x_{k+1} = x_k + \alpha_k d_k, \qquad r_{k+1} = r_k - \alpha_k z \;\;(\text{residual}), \qquad h_{k+1} = M^{-1} r_{k+1}$$

  • Correct the search direction:

$$\beta_k = \frac{r_{k+1}^T h_{k+1}}{r_k^T h_k}, \qquad d_{k+1} = h_{k+1} + \beta_k d_k$$

until the residual is small in the norm, i.e. $\|r_{k+1}\| < \varepsilon$.
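A minimal NumPy sketch of the PCG iteration, hedged in the same way (the preconditioner is passed as a function `M_inv` applying $M^{-1}$; the names are our own):

```python
import numpy as np

def pcg(A, b, M_inv, x0=None, tol=1e-10, max_iter=None):
    """Preconditioned CG: M_inv(r) applies the preconditioner M^{-1} to r."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x
    h = M_inv(r)              # preconditioned residual h_0 = M^{-1} r_0
    d = h.copy()
    rh_old = r @ h
    if max_iter is None:
        max_iter = n
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        z = A @ d             # stored so the product is computed only once
        alpha = rh_old / (d @ z)
        x += alpha * d
        r -= alpha * z        # residual r_{k+1}
        h = M_inv(r)          # preconditioned residual h_{k+1}
        rh_new = r @ h
        d = h + (rh_new / rh_old) * d
        rh_old = rh_new
    return x
```

Passing `M_inv = lambda r: r` (i.e. $M = I$) recovers the unpreconditioned CG method above.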

[Figure: Comparison of ICCG with CG using the 2D Poisson equation.]

A common preconditioner used together with CG is the incomplete Cholesky decomposition. This combination is also known as ICCG and was introduced in the 1970s by Meijerink and van der Vorst.

Two further preconditioners suitable for the PCG method are the Jacobi preconditioner $M = D$, where $D$ is the main diagonal of $A$, and the SSOR preconditioner

$$M(\omega) = \left( \frac{D}{\omega} + L \right) \frac{\omega}{2 - \omega} \, D^{-1} \left( \frac{D}{\omega} + L \right)^T$$

with $\omega \in (0, 2)$, where $D$ is the main diagonal and $L$ the strict lower triangular matrix of $A$.
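For illustration, the two preconditioners as functions applying $M^{-1}$, compatible with the `pcg` sketch above (assumptions: dense NumPy matrices, `omega` in $(0, 2)$; the function names are our own):

```python
import numpy as np

def jacobi_inv(A):
    """Return a function applying M^{-1} for the Jacobi preconditioner M = D."""
    diag = np.diag(A).copy()
    return lambda r: r / diag

def ssor_inv(A, omega=1.0):
    """Return a function applying M(omega)^{-1} for the SSOR preconditioner."""
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)             # strict lower triangular part of A
    W = D / omega + L                # the factor (D/omega + L)
    scale = (2.0 - omega) / omega    # inverse of the omega/(2 - omega) factor
    def apply(r):
        # M^{-1} r = (D/omega + L)^{-T} * ((2-omega)/omega) * D * (D/omega + L)^{-1} r
        y = np.linalg.solve(W, r)
        y = scale * (D @ y)
        return np.linalg.solve(W.T, y)
    return apply
```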

Convergence rate of the CG method

It can be shown that the convergence speed of the CG method is described by

$$\| x_k - x \|_A \le 2 \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^{k} \| x_0 - x \|_A .$$

Here $\kappa(A)$ is the condition number of the matrix $A$ with respect to the spectral norm, i.e. the matrix norm induced by the Euclidean norm, and $\| y \|_A = \sqrt{\langle y, A y \rangle}$ is the energy norm of $A$. The expression in parentheses is not negative, because the condition number of a matrix (with respect to a matrix norm induced by a vector norm) is always greater than or equal to 1. Since $A$ is symmetric and positive definite, the following applies:

$$\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} .$$
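As an illustrative numerical example (our own, not from the source): for a condition number $\kappa(A) = 100$, the contraction factor in the bound is

$$\frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} = \frac{10 - 1}{10 + 1} = \frac{9}{11} \approx 0.82,$$

so the energy-norm error bound shrinks by at least a factor of about $0.82$ per step, i.e. roughly twelve iterations per decimal digit of accuracy.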

From the minimization property it can also be deduced that

$$\| x_k - x \|_A \le \min_{\substack{p \in \Pi_k \\ p(0) = 1}} \; \max_{\lambda \in \sigma(A)} | p(\lambda) | \; \| x_0 - x \|_A ,$$

where $p$ runs over the polynomials of degree $k$ with $p(0) = 1$ and $x$ denotes the solution. Here $\sigma(A)$ is the spectrum, that is, the set of eigenvalues of the matrix $A$. It follows that the CG method solves a system whose matrix has only $m$ distinct eigenvalues in $m$ steps, and that the CG method converges very quickly for systems in which the eigenvalues are concentrated in a few small clusters. This in turn provides a clue for useful preconditioners: a preconditioner is good if it ensures that the eigenvalues are clustered.
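A short justification of the claim about $m$ distinct eigenvalues (our own illustration): for eigenvalues $\lambda_1, \dots, \lambda_m$, choose

$$p(t) = \prod_{i=1}^{m} \frac{\lambda_i - t}{\lambda_i},$$

a polynomial of degree $m$ with $p(0) = 1$ that vanishes on all of $\sigma(A)$; the minimization bound then gives $\| x_m - x \|_A = 0$.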

Extension to non-symmetric matrices

If the system matrix $A$ is non-symmetric but regular, the CG method can be applied to the normal equations

$$A^T A x = A^T b ,$$

since $A^T A$ is symmetric and positive definite for regular $A$. This approach is called CGNR, since it minimizes the norm of the residual of $A x = b$. Alternatively, there is the CGNE method, which solves

$$A A^T y = b$$

with $x = A^T y$. Here the error is minimized.

Both methods have the disadvantage that, on the one hand, $A^T$ must be available, which is not always the case, and on the other hand, the condition number of $A$ is squared in this approach, which can lead to slower convergence.
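A minimal sketch of CGNR, reusing the `conjugate_gradient` function from the sketch further above (illustrative only):

```python
import numpy as np

def cgnr(A, b, **kwargs):
    """CGNR: run CG (sketch above) on the normal equations A^T A x = A^T b."""
    # Note: forming A.T @ A explicitly squares the condition number, as noted
    # above; real implementations apply A and A.T separately in the iteration.
    return conjugate_gradient(A.T @ A, A.T @ b, **kwargs)
```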

References

  1. Hestenes, Stiefel: Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, Vol. 49, 1952, pp. 409–436, doi:10.6028/jres.049.044