Numerical linear algebra

Modeling with finite elements, shown here for the stress analysis of a reciprocating piston (diesel engine), leads to linear systems of equations with a large number of equations and unknowns.

Numerical linear algebra is a central branch of numerical mathematics. It deals with the development and analysis of computational methods (algorithms) for problems of linear algebra, in particular the solution of linear systems of equations and of eigenvalue problems. Such problems play a major role in all natural and engineering sciences, but also in econometrics and statistics.

The algorithms of numerical linear algebra can be roughly divided into two groups: direct methods, which in theory deliver the exact solution of a problem after a finite number of computational steps, and iterative methods, in which the exact solution is approximated step by step. Since the direct methods also only yield approximations to the exact solution, due to the rounding errors that occur when computing with finite precision, this distinction matters mainly for the development and analysis of the methods themselves; it does not play a major role in practical use. Historically, the first systematic methods from both groups - direct Gaussian elimination and the iterative Gauss-Seidel method - go back to Carl Friedrich Gauss. Examples of important 20th-century methods that prompted numerous improvements and further developments are the matrix decomposition of André-Louis Cholesky, the QR method for eigenvalue problems by John G. F. Francis and Vera Nikolaevna Kublanovskaya, and the CG method of Eduard Stiefel and Magnus Hestenes, the first representative of the important class of Krylov subspace methods.

Introduction to the problems

A central starting point of elementary linear algebra - also from a historical point of view - are linear systems of equations. We consider equations of the form

$$a_{i1} x_1 + a_{i2} x_2 + \dots + a_{in} x_n = b_i , \qquad i = 1, \dots, m ,$$

for the unknowns $x_1, \dots, x_n$. The coefficients $a_{ij}$ and $b_i$ are given numbers; the sought values for $x_1, \dots, x_n$ are to be determined so that all equations are satisfied. The coefficients can be combined into a matrix $A$; the numbers $b_i$ and the unknowns $x_j$ form the column vectors $b$ and $x$. This results in the matrix-vector representation

$$A x = b$$

of a linear system of equations: we are looking for a vector $x$ that yields the given vector $b$ when multiplied by the given matrix $A$. As a subfield of numerics, numerical linear algebra considers only so-called well-posed problems, in particular only problems that have a solution and for which the solution is uniquely determined. In particular, it is always assumed in the following that the matrix $A$ is regular, i.e. has an inverse $A^{-1}$. Then for every right-hand side $b$ the linear system of equations has a uniquely determined solution $x$, which can formally be written as $x = A^{-1} b$.

However, many important applications lead to linear systems with more equations than unknowns. In the matrix-vector representation $A x = b$, the matrix $A$ then has more rows than columns. Such overdetermined systems generally have no solution. Instead, one chooses the vector $x$ so that the difference $A x - b$, the residual, becomes "as small as possible" in a sense that still has to be specified. In the by far most important case, the so-called linear least-squares problem, the method of least squares is used: here $x$ is chosen so that the sum of squares $r_1^2 + \dots + r_m^2$ is minimal, where $r_1, \dots, r_m$ denote the components of the difference vector $A x - b$. With the help of the Euclidean norm this can also be written as: choose $x$ so that $\|A x - b\|_2$ becomes minimal.
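
For illustration, a minimal sketch with NumPy (the example data are chosen here and are not part of the article) shows both basic tasks: the direct solution of a square system and the least-squares solution of an overdetermined system.

```python
import numpy as np

# Square, regular system A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)          # direct solution, here x = [2, 3]

# Overdetermined system: more equations than unknowns
A_over = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b_over = np.array([1.0, 2.0, 2.0])
# the least-squares solution minimizes the Euclidean norm of the residual A x - b
x_ls, residual, rank, sv = np.linalg.lstsq(A_over, b_over, rcond=None)
print(x, x_ls)
```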

In addition to linear systems of equations, eigenvalue problems are another central topic of linear algebra. Here a matrix $A$ with $n$ rows and $n$ columns is given; sought are numbers $\lambda$ and vectors $v \neq 0$ such that the equation

$$A v = \lambda v$$

is satisfied. One then calls $v$ an eigenvector of $A$ for the eigenvalue $\lambda$. The problem of finding all eigenvalues and eigenvectors of a matrix is equivalent to diagonalizing it, that is: find a regular matrix $T$ and a diagonal matrix $D$ with $T^{-1} A T = D$. The diagonal entries of $D$ are then the eigenvalues of $A$, and the columns of $T$ are the associated eigenvectors.
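
A correspondingly small sketch for the eigenvalue problem, again with NumPy and an example matrix chosen here; the eigenvector matrix also plays the role of the transformation matrix $T$ of the diagonalization.

```python
import numpy as np

# Eigenvalue problem A v = lambda v for a small symmetric example matrix
A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigenvalues, T = np.linalg.eig(A)     # columns of T are eigenvectors

# Diagonalization check: T^{-1} A T is (numerically) a diagonal matrix
D = np.linalg.inv(T) @ A @ T
print(eigenvalues)                    # 3 and 1 for this example (order may vary)
print(np.round(D, 10))                # approximately diag(eigenvalues)
```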

These problems occur in all natural and engineering sciences. But they also play a major role in economics and statistics, and thus in all fields that use statistical methods. Linear systems of equations describe, for example, models in statics, electrical networks or economic interdependencies. Tasks as seemingly different as the stability analysis of dynamical systems, resonance phenomena in vibrations, the determination of a PageRank or principal component analysis in statistics all lead to eigenvalue problems. Linear systems of equations also arise through linearization and discretization within other numerical methods. For example, numerous models in science and technology can be described by partial differential equations. Their numerical solution with difference or finite element methods leads to systems with a very large number of unknowns.

For the sake of simplicity, it is assumed in this overview that all given matrices and vectors are real, that is, that all of their entries are real numbers. Most of the time, the methods mentioned can be generalized directly to complex numbers; instead of the orthogonal matrices, for example, their complex counterpart, the unitary matrices, then appears. Sometimes it is also advantageous to reduce a given complex problem to a real one, for example by considering real and imaginary parts. Additional considerations arise, however, for eigenvalue problems with non-symmetric real matrices, because these can also have non-real eigenvalues and eigenvectors.

History

The beginnings: Gauss and Jacobi

Lithograph of Gauss in the Astronomische Nachrichten, 1828, by Bendixen

Solutions to specific problems that from today's perspective can be viewed as systems of linear equations have been handed down since antiquity. The Nine Chapters on the Mathematical Art, in which the state of Chinese mathematics of the 1st century AD is summarized, already contained tabular calculation rules that correspond to elimination methods in matrix form. The systematic investigation of linear systems of equations began towards the end of the 17th century with their formulation using general coefficients. After preliminary work by Gottfried Wilhelm Leibniz and Colin Maclaurin, Gabriel Cramer published an explicit solution formula for an arbitrary number of unknowns with the help of determinants in 1750. With Cramer's rule the problem was theoretically completely solved, also with regard to the existence and uniqueness of solutions. For its practical calculation, however, the formula proved to be completely unsuitable because the computational effort grows astronomically quickly with the number of unknowns (see also Cramer's rule).

The first use and description of systematic calculation methods for linear equations goes back to Carl Friedrich Gauss (1777–1855). At the beginning of the 19th century, determining the orbital data of astronomical objects and surveying the country by triangulation were the most important application tasks of mathematical practice. In 1801 Gauss caused a sensation when he succeeded in determining the orbit of the newly discovered minor planet Ceres from a few observations so precisely that Ceres could be found again at the end of the year. For the associated overdetermined system of equations he used the method of least squares, which he had discovered. The elimination method he used to calculate the solution was described systematically by Gauss from 1809 onwards in the course of determining the orbit of the asteroid Pallas, though still applied directly to sums of squares.

Carl Jacobi

The first iterative method for solving systems of linear equations - the Gauss-Seidel method - was also developed by Gauss. In a letter to Christian Ludwig Gerling in 1823, he reported on a new, simple procedure with which the solution could be approximated better and better step by step. Gauss, who was by then occupied with the triangulation of the Kingdom of Hanover, wrote in it that he still computed one iteration step almost every evening; that was a pleasant change from the monotonous recording of the measurement data. The method was said to be so insensitive to errors that it could even be carried out "half asleep". In 1845 Carl Gustav Jacob Jacobi published another, similar iterative method, the Jacobi method. When Philipp Ludwig von Seidel, a student of Jacobi, had to solve a system with 72 unknowns in 1874, he developed a modified, improved version of this method. As it turned out in retrospect, this method is equivalent to Gauss's iterative method, of which Seidel was probably not aware. Jacobi also published an iterative method for transforming matrices in 1846, which is suitable for solving the eigenvalue problem for symmetric matrices and is today likewise known as the Jacobi method. However, he himself used it only as a preparatory step to make the diagonal entries of the matrix more dominant.

20th century

In 1923, a procedure developed by André-Louis Cholesky was published that solves certain systems of linear equations by decomposing the coefficient matrix into a product of two simpler matrices, the Cholesky decomposition. The Gaussian elimination method was later also recognized as a special case of such a matrix factorization method. Algorithms from this group of methods are still the standard methods for solving moderately large systems today.

From the end of the 1920s, new ideas for the iterative solution of eigenvalue problems emerged, beginning in 1929 with the introduction of the power method by Richard von Mises. As with the further development of inverse iteration by Helmut Wielandt in 1944, these simple vector iteration methods can only ever compute eigenvectors for a single eigenvalue. A complete solution of the eigenvalue problem for arbitrary matrices remained difficult. The breakthrough came in 1961–1962 with the development of the QR method by the British computer scientist John G. F. Francis and independently by the Russian mathematician Vera Nikolaevna Kublanovskaya. While Kublanovskaya showed a deep understanding of the convergence properties of the method from the beginning, Francis mainly worked on implementation details that made the method fast and stable. The QR method is still the standard method for computing all eigenvalues and eigenvectors of matrices that are not too large.

Eduard Stiefel, 1955

The solution of linear systems of equations with very large matrices, as they occur in the discretization of partial differential equations, remained difficult. These matrices have relatively few non-zero entries, and it is crucial that a numerical method exploit this property. A new approach to this, which turned out to be the starting point for numerous further developments, was the CG method developed in 1952 by Eduard Stiefel and Magnus Hestenes. In the special case that the matrix is symmetric and also positive definite, the linear system of equations is replaced by an equivalent optimization problem. Another view of the CG method, discovered at the same time by Cornelius Lanczos, turned out to be even more fruitful: the approximations computed by the CG method lie in an ascending chain of subspaces, the Krylov spaces.

Despite the discovery of these connections, it took a relatively long time to develop specific generalizations of the CG method. The BiCG method published in 1974 by Roger Fletcher can theoretically be used for arbitrary matrices, but has proven to be unstable in practice in many cases. The MINRES method, published in 1975, is a Krylov subspace method for which the matrix must be symmetric, but, in contrast to the CG method, not necessarily positive definite. In the period that followed, numerous further developments were investigated, in particular attempts to stabilize the BiCG method. An example of a widely used Krylov subspace method for arbitrary linear systems of equations is a generalization of the MINRES method, the GMRES method presented in 1986 by Yousef Saad and Martin H. Schultz. Other methods use syntheses of ideas from the BiCG group and GMRES, such as the QMR method (Roland W. Freund and Noel M. Nachtigal, 1991) and the TFQMR method (Freund, 1993). From the beginning, Krylov subspace methods were also used to compute eigenvalues, starting with a method by Lanczos in 1950 and the Arnoldi method by Walter Edwin Arnoldi in 1951.

Basic principles

“The field of numerical linear algebra is more beautiful, and more fundamental, than its rather dull name may suggest. More beautiful, because it is full of powerful ideas that are quite unlike those normally emphasized in a linear algebra course in a mathematics department. […] More fundamental, because, thanks to a trick of history, ‘numerical’ linear algebra is really applied linear algebra.”

– Lloyd N. Trefethen, David Bau: Numerical Linear Algebra

Exploitation of structures

Sparsity pattern of a sparse matrix as it arises in the finite element method; the small black squares represent the non-zero matrix entries.

Models and questions in science and technology can lead to problems of linear algebra with millions of equations. The entries of a matrix with one million rows and columns require 8 terabytes of storage space in the double-precision format. This shows that merely providing the data of such a problem, let alone solving it, is a challenge if its special structure is not taken into account. Fortunately, many important applications - such as the discretization of partial differential equations with the finite element method - lead to a large number of equations, but with relatively few unknowns appearing in each individual equation. For the associated matrix this means that each row contains only a few non-zero entries; the matrix is, as one says, sparse. There are numerous techniques for storing such matrices efficiently and exploiting their structure. Methods in which the matrix appears only in matrix-vector products are particularly well suited for sparse problems, since the multiplications and additions with zero entries need not be carried out explicitly. Algorithms that transform the matrix itself, on the other hand, are usually difficult to adapt, since the sparsity is then generally lost.
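
A small sketch with SciPy's sparse module (the tridiagonal example matrix and the use of SciPy are assumptions for illustration, not prescribed by the article) shows compressed storage and a matrix-vector product that touches only the non-zero entries.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 5x5 tridiagonal matrix: stored densely it would hold 25 entries,
# in CSR (compressed sparse row) format only the 13 non-zeros are kept.
n = 5
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A_dense = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
A_sparse = csr_matrix(A_dense)

x = np.ones(n)
y = A_sparse @ x      # sparse matrix-vector product, only non-zero entries are used
print(A_sparse.nnz)   # 13 stored entries
print(y)              # [1., 0., 0., 0., 1.]
```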

In general, the sparsity pattern, i.e. the number and position of the non-zero matrix entries, has a very large influence on the theoretical and numerical properties of a problem. This becomes particularly clear in the extreme case of diagonal matrices, i.e. matrices that have non-zero entries only on the main diagonal. A linear system of equations with a diagonal matrix can be solved simply by dividing the entries of the right-hand side by the diagonal elements, i.e. by means of $n$ divisions. Linear least-squares problems and eigenvalue problems are also trivial for diagonal matrices: the eigenvalues of a diagonal matrix are its diagonal elements, and the associated eigenvectors are the standard basis vectors.

Another important special case are the triangular matrices, in which all entries above or below the main diagonal are zero. Systems of equations with such matrices can be solved simply row by row, from top to bottom or from bottom to top, by forward or backward substitution. The eigenvalues of triangular matrices are again trivially the entries on the main diagonal; associated eigenvectors can likewise be determined by forward or backward substitution. Another common special case of sparse matrices are the band matrices: here only the main diagonal and a few adjacent secondary diagonals have non-zero entries. The upper Hessenberg matrices are a weakening of the upper triangular matrices, in which the first subdiagonal below the main diagonal may also be occupied. Eigenvalue problems can be transformed with relatively little effort into equivalent problems for Hessenberg or tridiagonal matrices.
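
As a minimal sketch (own example, assuming NumPy), backward substitution for an upper triangular system; forward substitution for lower triangular matrices works analogously from top to bottom.

```python
import numpy as np

def back_substitution(R, b):
    """Solve R x = b for an upper triangular, regular matrix R."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):              # from the last row upwards
        s = np.dot(R[i, i + 1:], x[i + 1:])     # contribution of already computed unknowns
        x[i] = (b[i] - s) / R[i, i]
    return x

R = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 2.0],
              [0.0, 0.0, 4.0]])
b = np.array([9.0, 13.0, 8.0])
print(back_substitution(R, b))   # [2., 3., 2.]
```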

Not only the sparsity pattern, however, but also other matrix properties play an important role in the development and analysis of numerical methods. Many applications lead to problems with symmetric matrices. In particular, eigenvalue problems are much easier to handle if the given matrix is symmetric, and for linear systems of equations, too, the solution effort is roughly halved in this case. Further examples of matrix types for which specialized algorithms exist are the Vandermonde matrices, the Toeplitz matrices and the circulant matrices.

Error analysis: vector and matrix norms

Various vector norms are used in mathematics to measure the "size" of a vector $x \in \mathbb{R}^n$. The best known and most common is the Euclidean norm

$$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2} ,$$

i.e. the square root of the sum of the squares of all vector components. In the familiar geometric picture of vectors as arrows in two- or three-dimensional space, this corresponds precisely to the length of the arrow. Depending on the question at hand, however, other vector norms such as the maximum norm or the 1-norm may be more suitable.

If a vector $\tilde{x}$ is to be understood as an approximation of $x$, the accuracy of this approximation can be quantified with the aid of a vector norm. The norm of the difference vector,

$$\|\tilde{x} - x\| ,$$

is called the (norm-wise) absolute error. If the absolute error is considered relative to the norm of the "exact" vector $x$, one obtains the (norm-wise) relative error

$$\frac{\|\tilde{x} - x\|}{\|x\|} .$$

Since the relative error is not influenced by a scaling of $x$ and $\tilde{x}$, it is the standard measure for the difference between the two vectors and is often simply referred to as the "error".

The "size" of matrices is also measured with norms, the matrix norms. For the choice of a matrix norm it is essential that it "fits" the vector norm used; in particular, the inequality $\|A x\| \le \|A\| \, \|x\|$ should be satisfied for all $x$. If, for a given vector norm, one defines $\|A\|$ as the smallest number for which this holds for all $x$, i.e.

$$\|A\| = \max_{x \neq 0} \frac{\|A x\|}{\|x\|} ,$$

one obtains the so-called natural matrix norm induced by the vector norm. For each vector norm there is such an induced matrix norm: for the Euclidean norm it is the spectral norm, for the maximum norm the row-sum norm, and for the 1-norm the column-sum norm. Analogously to vectors, the relative error

$$\frac{\|\tilde{A} - A\|}{\|A\|}$$

of a matrix $\tilde{A}$ approximating a matrix $A$ can be quantified with a matrix norm.

Condition and stability

Two-dimensional illustration: multiplication by a matrix A distorts the unit circle (blue) into an ellipse (green). The condition number of A is the ratio of the major semi-axis λ1 to the minor semi-axis λ2; it thus measures the strength of the distortion.

In practical problems, the given quantities are usually subject to errors, the data errors. For example, in a linear system of equations the given right-hand side $b$ may come from a measurement and therefore contain a measurement error. But even for quantities known in theory to arbitrary precision, rounding errors cannot be avoided when they are represented in the computer as floating-point numbers. It must therefore be assumed that instead of the exact system $A x = b$ one actually has a system with a perturbed right-hand side $b + \Delta b$ and, correspondingly, a "wrong" solution $x + \Delta x$. The fundamental question is how strongly perturbations of the given quantities affect the quantities sought. If the relative error of the solution is not significantly larger than the relative error of the input data, one speaks of a well-conditioned problem, otherwise of an ill-conditioned problem. For the example of linear systems of equations, the estimate

$$\frac{\|\Delta x\|}{\|x\|} \le \|A\| \, \|A^{-1}\| \, \frac{\|\Delta b\|}{\|b\|}$$

can be proved. The problem is thus well conditioned if the product of the norm of the coefficient matrix and the norm of its inverse is small. This important quantity is called the condition number of the matrix $A$ and is denoted by $\kappa(A) = \|A\| \, \|A^{-1}\|$. In real problems it is usually not only the right-hand side, as assumed here, that is subject to errors, but also the matrix $A$. Then a similar but more complicated estimate applies, in which $\kappa(A)$ is again the key quantity for the condition of the problem in the case of small data errors. The definition of the condition number can be generalized to non-square matrices and then also plays an essential role in the analysis of linear least-squares problems. How well such a problem is conditioned depends not only on the condition number of the coefficient matrix $A$, as in the case of linear systems of equations, but also on the right-hand side $b$, more precisely on the angle between the vectors $A x$ and $b$. According to the Bauer-Fike theorem, the condition of the eigenvalue problem can also be described with condition numbers. Here, however, it is not $\kappa(A)$ with which perturbations of the eigenvalues can be estimated, but the condition number $\kappa(T)$ of the matrix $T$ that diagonalizes $A$ via $T^{-1} A T = D$.
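
A short numerical illustration (NumPy, with matrices chosen here for the purpose) of the condition number and its effect: for the nearly singular matrix, a tiny relative perturbation of the right-hand side causes a much larger relative change of the solution, roughly in proportion to the condition number.

```python
import numpy as np

# Condition number kappa(A) = ||A|| * ||A^{-1}|| in the spectral norm
A_good = np.array([[2.0, 0.0], [0.0, 1.0]])
A_bad = np.array([[1.0, 1.0], [1.0, 1.0001]])   # nearly singular example

print(np.linalg.cond(A_good))   # 2.0 : well conditioned
print(np.linalg.cond(A_bad))    # about 4e4 : badly conditioned

# Effect of a small perturbation of the right-hand side on the solution
b = np.array([2.0, 2.0001])
b_perturbed = b + np.array([0.0, 1e-5])
x = np.linalg.solve(A_bad, b)
x_perturbed = np.linalg.solve(A_bad, b_perturbed)
rel_error_b = np.linalg.norm(b_perturbed - b) / np.linalg.norm(b)
rel_error_x = np.linalg.norm(x_perturbed - x) / np.linalg.norm(x)
print(rel_error_b, rel_error_x)  # the error in x is amplified by roughly kappa(A_bad)
```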

While the condition is a property of the problem to be solved, stability is a property of the method used. A numerical algorithm generally does not provide the exact solution of the problem, even for input data assumed to be exact. For example, an iterative method that step by step approximates a true solution ever more closely has to stop after a finite number of steps with the approximation reached up to that point. But even with direct methods, which in theory produce the exact solution in a finite number of computational steps, rounding errors occur in every arithmetic operation when they are carried out on a computer. In numerical mathematics two different notions of stability are used, forward stability and backward stability. In general terms, let $x$ be an input quantity of a problem and $y = f(x)$ its exact solution, understood as the value of a function $f$ applied to $x$. Even if the input quantity is regarded as exactly given, the computation with an algorithm will deliver another, "wrong" result $\tilde{y}$, interpreted as the value of another, "wrong" function $\tilde{f}$ applied to $x$. An algorithm is called forward stable if $\tilde{y}$ does not differ from $y$ significantly more than would be expected anyway due to the errors in the input data and the condition of the problem. A formal definition of this notion gives an obvious and relatively intuitive measure of stability, but with complicated algorithms it is often difficult to investigate their forward stability. Therefore, following an idea of James H. Wilkinson, one generally first carries out a so-called backward analysis: for this, a value $\tilde{x}$ with $f(\tilde{x}) = \tilde{y}$ is determined, that is, the "wrong" value computed by the method is interpreted as the "correct" value of the function, but computed from a different input quantity. An algorithm is called backward stable if $\tilde{x}$ does not differ from $x$ significantly more than would be expected anyway due to the errors in the input data. It can be proved that a backward stable algorithm is also forward stable.

Orthogonality and orthogonal matrices

As linear algebra shows, there is a close connection between matrices and bases of vector spaces. If $n$ linearly independent vectors of $\mathbb{R}^n$ are given, they form a basis of the space, and every other vector can be represented uniquely as a linear combination of the basis vectors. A change of basis corresponds to the multiplication of the given vectors and matrices by a transformation matrix. An important special case are the orthonormal bases. Here the basis vectors are pairwise orthogonal to one another ("are perpendicular to each other") and are also all normalized to Euclidean length 1, like the standard basis in three-dimensional space. If the basis vectors are combined column by column into a matrix

$$Q = \begin{pmatrix} q_1 & q_2 & \dots & q_n \end{pmatrix} ,$$

then in the case of an orthonormal basis a so-called orthogonal matrix is obtained.

Orthonormal bases and orthogonal matrices have many remarkable properties on which the most important methods of modern numerical linear algebra are based. The fact that the columns of an orthogonal matrix $Q$ form an orthonormal basis can be expressed in matrix notation by the equation $Q^T Q = I$, where $Q^T$ denotes the transposed matrix and $I$ the identity matrix. This shows in particular that an orthogonal matrix is regular and that its inverse equals its transpose: $Q^{-1} = Q^T$. The solution of a linear system of equations $Q x = b$ can therefore be determined very easily: $x = Q^T b$. Another fundamental property is that multiplication of a vector by an orthogonal matrix leaves its Euclidean norm unchanged,

$$\|Q x\|_2 = \|x\|_2 .$$

It follows that the spectral norm satisfies $\|Q\|_2 = 1$ and the condition number satisfies

$$\kappa_2(Q) = \|Q\|_2 \, \|Q^{-1}\|_2 = 1 ,$$

since $Q^{-1} = Q^T$ is also an orthogonal matrix. Multiplications with orthogonal matrices therefore do not increase the relative error.

Orthogonal matrices also play an important role in the theory and numerical treatment of eigenvalue problems. According to the simplest version of the spectral theorem, symmetric matrices can be orthogonally diagonalized. This means: for a matrix $A$ with $A = A^T$ there exist an orthogonal matrix $Q$ and a diagonal matrix $D$ with

$$Q^T A Q = D .$$

On the diagonal of $D$ are the eigenvalues of $A$, and the columns of $Q$ form an orthonormal basis of eigenvectors. In particular, according to the Bauer-Fike theorem mentioned above, the eigenvalue problem for symmetric matrices is always well conditioned. With the so-called Schur normal form there is a generalization of this orthogonal transformation for non-symmetric matrices.

Multiplication by a Householder matrix reflects a given vector, with a suitable choice of the mirror hyperplane, onto the x-axis.

There are two special, easy-to-handle types of orthogonal matrices that are used in countless concrete methods of numerical linear algebra: the Householder matrices and the Givens rotations. Householder matrices have the form

$$H = I - 2 \, \frac{v v^T}{v^T v}$$

with a vector $v \neq 0$. Geometrically, they describe reflections of $n$-dimensional space across the $(n-1)$-dimensional hyperplane through the origin that is orthogonal to $v$. Their essential property is the following: for a given vector $x$, a vector $v$ can easily be determined such that the associated Householder matrix transforms $x$ into a multiple of the first unit vector $e_1$: $H x = \alpha e_1$. This transforms all entries of $x$ except the first to zero. If suitable Householder transformations are applied column by column to a matrix in this way, all entries below the main diagonal can be transformed to zero.
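
A minimal sketch (own illustration with NumPy; the sign choice is one common convention) of how a vector $v$ can be chosen so that the associated Householder reflection maps a given vector $x$ to a multiple of the first unit vector.

```python
import numpy as np

def householder_vector(x):
    """Return v such that (I - 2 v v^T / v^T v) x is a multiple of the first unit vector."""
    v = x.astype(float).copy()
    sign = 1.0 if x[0] >= 0 else -1.0    # sign choice avoids cancellation
    v[0] += sign * np.linalg.norm(x)
    return v

x = np.array([3.0, 4.0, 0.0])
v = householder_vector(x)
H = np.eye(3) - 2.0 * np.outer(v, v) / np.dot(v, v)
print(H @ x)          # approximately [-5, 0, 0]: all entries except the first are zero
print(H @ H.T)        # approximately the identity matrix: H is orthogonal
```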

Givens rotations are special rotations of $\mathbb{R}^n$ that rotate in one two-dimensional coordinate plane and leave the other dimensions fixed. Transforming a vector $x$ with a Givens rotation therefore changes only two entries of $x$. By a suitable choice of the rotation angle, one of the two entries can be set to zero. While Householder transformations, applied to matrices, transform entire partial columns, Givens rotations can be used to change individual matrix entries in a targeted manner.

Householder transformations and Givens rotations can therefore be used to transform a given matrix into an upper triangular matrix, or in other words, to compute a QR decomposition $A = Q R$ into an orthogonal matrix $Q$ and an upper triangular matrix $R$. The QR decomposition is an important and versatile tool that is used in numerous methods from all areas of numerical linear algebra.

Similarity transformations

In linear algebra, the characteristic polynomial, a polynomial of degree $n$, is used to investigate the eigenvalue problem of a matrix $A$ with $n$ rows and columns. The eigenvalues of $A$ are exactly the zeros of the characteristic polynomial. By the fundamental theorem of algebra it follows directly that $A$ has exactly $n$ eigenvalues if they are counted with their multiplicities. However, even for real matrices these eigenvalues can be complex numbers. But if $A$ is a real symmetric matrix, then its eigenvalues are all real.

Although the characteristic polynomial is of great theoretical importance for the eigenvalue problem, it is not suitable for numerical computation. This is mainly because the problem of computing the zeros of a polynomial from its given coefficients is in general very badly conditioned: small perturbations such as rounding errors in the coefficients of a polynomial can lead to a large shift of its zeros. A possibly well-conditioned problem - the computation of the eigenvalues - would thus be replaced by a mathematically equivalent but badly conditioned problem - the computation of the zeros of the characteristic polynomial. Many numerical methods for computing eigenvalues and eigenvectors are therefore based on another basic idea, similarity transformations: two square matrices $A$ and $B$ are called similar if there is a regular matrix $P$ with

$$B = P^{-1} A P .$$

It can be shown that matrices that are similar to one another have the same eigenvalues, so the sought eigenvalues do not change under a similarity transformation of the matrix $A$ to the matrix $B$. The associated eigenvectors can also easily be converted into one another: if $v$ is an eigenvector of $B$, then $P v$ is an eigenvector of $A$ for the same eigenvalue. This leads to the basic ideas that are used in numerous algorithms: the matrix $A$ is converted by similarity transformations into a matrix for which the eigenvalue problem can be solved more efficiently, or a sequence of similarity transformations is constructed in which the matrix approaches a diagonal or triangular matrix ever more closely. For the reasons mentioned above, orthogonal matrices are mostly used as transformation matrices.

Methods and classes of methods

Gaussian elimination method

The classical Gaussian elimination method for solving systems of linear equations is a starting point and a benchmark for further developed methods. However, it is also still widely used in practice as a simple and reliable method - especially in its modification as the LR decomposition (see below) - for well-conditioned systems that are not too large. The method systematically eliminates variables from the given equations by subtracting suitable multiples of one equation from another equation until a triangular system results that can be solved in sequence from bottom to top.

Numerical considerations come into play when analyzing the stability of the method. If an element $a_{ik}$ in the same column is to be eliminated with the $k$-th diagonal element $a_{kk}$ of the matrix, then the quotient

$$l_{ik} = \frac{a_{ik}}{a_{kk}}$$

is used to subtract $l_{ik}$ times the $k$-th row from the $i$-th row. For this, at least $a_{kk} \neq 0$ must hold, which can always be achieved for a regular matrix by suitable row interchanges. But more than that: if $a_{kk}$ is very small in absolute value compared to $a_{ik}$, then $l_{ik}$ would have a very large absolute value. In the following steps there would then be the risk that digits are cancelled by subtracting large numbers, and the method would be unstable. It is therefore important to ensure by row interchanges, known as pivoting, that the absolute values of the quotients $l_{ik}$ remain as small as possible.
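
A compact sketch (NumPy; the almost-singular pivot example is chosen here for illustration) of Gaussian elimination with partial pivoting followed by backward substitution.

```python
import numpy as np

def gauss_solve(A, b):
    """Gaussian elimination with partial pivoting, followed by backward substitution."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        # pivoting: bring the entry with the largest absolute value in column k to the diagonal
        p = k + np.argmax(np.abs(A[k:, k]))
        A[[k, p]] = A[[p, k]]
        b[[k, p]] = b[[p, k]]
        for i in range(k + 1, n):
            lik = A[i, k] / A[k, k]          # with pivoting |lik| <= 1
            A[i, k:] -= lik * A[k, k:]
            b[i] -= lik * b[k]
    # backward substitution on the resulting upper triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(A[i, i + 1:], x[i + 1:])) / A[i, i]
    return x

A = np.array([[1e-12, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0])
print(gauss_solve(A, b))   # approximately [1., 1.]; without pivoting this example is unstable
```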

Factorization methods

The most important direct methods for solving systems of linear equations can be represented as factorization methods. Their basic idea is to decompose the coefficient matrix $A$ of the system into a product of two or more matrices, in general for example $A = B C$. The linear system of equations then reads $B C x = b$ and is solved in two steps: first the solution $z$ of the system $B z = b$ is computed, and then the solution $x$ of the system $C x = z$. Then $A x = B C x = B z = b$ holds, so $x$ is the solution of the original problem. At first glance, only the task of solving one system of linear equations seems to be replaced by the task of solving two. The idea behind it, however, is to choose the two factors $B$ and $C$ in such a way that the two subsystems are much easier to solve than the original system. An obvious advantage of this class of methods arises when several linear systems with the same coefficient matrix but different right-hand sides are to be solved: the factorization of $A$, generally the most expensive step of the method, then has to be computed only once.

LR decomposition

The Gaussian elimination method can be viewed as a factorization method. If the quotients $l_{ik}$ for $i > k$ are entered into a matrix, then, without row interchanges, the result is a factorization $A = L R$ with a lower triangular matrix $L$ and an upper triangular matrix $R$ (in the English-language literature usually called the LU decomposition). In addition, $L$ is unipotent, i.e. all entries on the main diagonal of $L$ are equal to 1. As was seen above, rows of $A$ generally have to be swapped in Gaussian elimination. This can be represented formally with the help of a permutation matrix $P$ by factoring the row-permuted matrix $P A$ instead of $A$:

$$P A = L R .$$

Following the basic principle of factorization methods, to solve $A x = b$ the triangular matrices $L$ and $R$ and, if necessary, the associated permutation $P$ are determined as described. In the next step, the system $L z = P b$ with the row-permuted right-hand side is solved by forward substitution and then $R x = z$ by backward substitution.

With suitable pivoting, the LR decomposition and thus the Gaussian elimination method are "almost always stable", which means that in most practical applications no large amplification of errors occurs. However, pathological examples can be given in which the procedural errors grow exponentially with the number of unknowns.
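
In software one typically does not implement this by hand; the following sketch (using SciPy's lu_factor and lu_solve, with an example matrix chosen here) shows the practical advantage of reusing one factorization for several right-hand sides.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

# Compute the factorization P A = L R once ...
lu, piv = lu_factor(A)

# ... and reuse it for several right-hand sides
b1 = np.array([4.0, 10.0, 24.0])
b2 = np.array([1.0, 0.0, 0.0])
x1 = lu_solve((lu, piv), b1)
x2 = lu_solve((lu, piv), b2)
print(x1)                       # solution of A x = b1, here [1., 1., 1.]
print(np.allclose(A @ x2, b2))  # True
```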

Cholesky decomposition

The Cholesky decomposition is, like the LR decomposition, a factorization of the matrix $A$ into two triangular matrices for the case, which occurs in many applications, that $A$ is symmetric and positive definite, i.e. $A = A^T$ holds and $A$ has only positive eigenvalues. Under these conditions there is a lower triangular matrix $L$ with

$$A = L L^T .$$

A general ansatz for the matrix entries of $L$ leads to an explicit method with which they can be computed column by column or row by row one after the other, the Cholesky method. Exploiting the symmetry of $A$ in this way reduces the computational effort compared to the LR decomposition to about half.
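
A minimal sketch of the column-wise computation (own illustration with NumPy; real libraries offer optimized routines such as numpy.linalg.cholesky, used here only as a cross-check).

```python
import numpy as np

def cholesky(A):
    """Compute the lower triangular L with A = L L^T for a symmetric positive definite A."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):                      # column by column
        s = A[j, j] - np.dot(L[j, :j], L[j, :j])
        L[j, j] = np.sqrt(s)                # s > 0 if A is positive definite
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
    return L

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
L = cholesky(A)
print(np.allclose(L @ L.T, A))                # True
print(np.allclose(L, np.linalg.cholesky(A)))  # matches the built-in routine
```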

Symmetric and positive definite coefficient matrices arise classically in the formulation of the so-called normal equations for the solution of linear least-squares problems. One can show that the problem of minimizing $\|A x - b\|_2$ is equivalent to solving the system of linear equations

$$A^T A \, x = A^T b .$$

The coefficient matrix $A^T A$ of these normal equations is symmetric and, if the columns of $A$ are linearly independent, also positive definite. So it can be solved with the Cholesky method. However, this approach is only recommended for well-conditioned problems with few unknowns. In general, the system of normal equations is in fact significantly worse conditioned than the originally given least-squares problem. It is then better not to take the detour via the normal equations, but to use a QR decomposition of $A$ directly.

QR decomposition

The linear system of equations $A x = b$ can, after a QR decomposition

$$A = Q R ,$$

be solved directly according to the general principle of factorization methods: with $z = Q^T b$ only $R x = z$ has to be determined by backward substitution. Due to the good condition of orthogonal matrices, the possible instabilities of the LR decomposition do not occur. However, the computational effort is generally about twice as large, so that the methods may have to be weighed against each other.

The QR decomposition is also the common method for solving linear least-squares problems that are not too large and are well conditioned. For the problem

"Minimize $\|A x - b\|_2$"

we have, with $A = Q R$ and $z = Q^T b$,

$$\|A x - b\|_2 = \|Q R x - b\|_2 = \|R x - Q^T b\|_2 = \|R x - z\|_2 .$$

Here it was used that $Q$ is orthogonal, i.e. preserves the Euclidean norm, and that $Q^{-1} = Q^T$ holds. The last expression can be minimized simply by backward substitution using the first $n$ rows of $R$ and of $z$.
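
For illustration, a small least-squares fit via the reduced QR decomposition (NumPy; the data points are chosen here), cross-checked against the library's lstsq routine.

```python
import numpy as np

# Linear least-squares problem: minimize || A x - b ||_2 for a tall matrix A
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Reduced QR decomposition A = Q R with Q having orthonormal columns
Q, R = np.linalg.qr(A)           # 'reduced' mode is the default
# || A x - b ||_2 is minimal where R x = Q^T b; R is small and upper triangular
x = np.linalg.solve(R, Q.T @ b)
print(x)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # agrees with lstsq
```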

Fixed-point iteration with splitting methods

A completely different idea for solving $A x = b$ is to choose a starting vector $x^{(0)}$ and to compute from it, step by step, new approximations $x^{(1)}, x^{(2)}, \dots$ to the desired solution. If the sequence converges to the solution $x$, this iteration is terminated after a suitable number of steps with a sufficiently accurate approximation for $x$. The simplest and most important methods of this type use an iteration of the form

$$x^{(k+1)} = M x^{(k)} + c$$

with a suitable matrix $M$ and vector $c$. It can be shown that such methods converge if and only if all eigenvalues of $M$ have absolute value less than 1. In this case the iterates converge to a solution of the equation $x = M x + c$, i.e. to a fixed point of the iteration mapping.

A systematic approach in the search for suitable algorithms of this type is provided by the idea of splitting methods. Here the matrix $A$ is decomposed into a sum

$$A = B + (A - B)$$

with an easily invertible matrix $B$ and the remainder $A - B$. Inserting this into $A x = b$ and rearranging yields the fixed-point equation

$$x = B^{-1} \bigl( b - (A - B) x \bigr) .$$

With $M = I - B^{-1} A$ and $c = B^{-1} b$ one obtains an iterative method of the above form which, in the case of convergence, yields the solution of $A x = b$. The speed of convergence is greater the smaller the absolute value of the largest eigenvalue of the iteration matrix $M$, i.e. the spectral radius of $M$. This can also be estimated using any matrix norm of $M$.

The classical cases of splitting methods are the Jacobi method, which chooses for $B$ the diagonal matrix containing the main diagonal of $A$, and the Gauss-Seidel method, which chooses for $B$ the lower triangular part of $A$. The idea of relaxation can be used to accelerate the convergence of fixed-point methods. Writing the iteration in the form

$$x^{(k+1)} = x^{(k)} + d^{(k)}$$

with the correction $d^{(k)}$ in the $k$-th step, one passes, with a suitably chosen relaxation parameter $\omega$, to

$$x^{(k+1)} = x^{(k)} + \omega \, d^{(k)} .$$

For example, the SOR method is obtained in this way from the Gauss-Seidel method.
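
A sketch of the two classical splitting iterations (own NumPy illustration with a diagonally dominant example matrix, for which both methods converge).

```python
import numpy as np

def jacobi(A, b, x0, steps):
    """Fixed-point iteration x_{k+1} = D^{-1} (b - (A - D) x_k) with D = diag(A)."""
    D = np.diag(A)
    R = A - np.diag(D)                  # remainder A - D
    x = x0.astype(float).copy()
    for _ in range(steps):
        x = (b - R @ x) / D
    return x

def gauss_seidel(A, b, x0, steps):
    """Like Jacobi, but already-updated components are used within each sweep."""
    n = len(b)
    x = x0.astype(float).copy()
    for _ in range(steps):
        for i in range(n):
            s = np.dot(A[i, :i], x[:i]) + np.dot(A[i, i + 1:], x[i + 1:])
            x[i] = (b[i] - s) / A[i, i]
    return x

# Diagonally dominant example, for which both iterations converge
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([5.0, 6.0, 5.0])
x0 = np.zeros(3)
print(jacobi(A, b, x0, 50))        # approximately [1., 1., 1.]
print(gauss_seidel(A, b, x0, 25))  # converges in fewer sweeps
```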

Jacobi method for the calculation of eigenvalues

A simple but reliable iterative method for solving the eigenvalue problem for symmetric matrices is the Jacobi method. By successive similarity transformations with Givens rotations, it generates a sequence of symmetric matrices that are all similar to the given symmetric matrix $A$ and converge to a diagonal matrix $D$. If the method is stopped after a suitable number of steps, the diagonal entries therefore provide approximations to the sought eigenvalues of $A$.

In each step the Givens rotation, in this context also called a Jacobi rotation, is chosen in such a way that the entry at a chosen matrix position $(p, q)$ and the entry symmetric to it are transformed to zero. It should be noted, however, that with this transformation the whole $p$-th and $q$-th rows as well as the whole $p$-th and $q$-th columns of the matrix are changed. Therefore the zeros generated in one step are generally destroyed again in subsequent steps. Nevertheless, with a suitable choice of positions for the Jacobi rotations, all off-diagonal elements converge to zero. For this purpose, the classical Jacobi method selects in each iteration step the position of the off-diagonal element with the greatest absolute value. In a hand calculation this position can usually be spotted quickly, but when implemented as a computer program the effort of searching for it is considerable compared to the other arithmetic operations. Therefore the cyclic Jacobi method is mostly used today: the positions are cycled through in a previously fixed order, for example simply column by column. It can be shown that both the classical and the cyclic Jacobi method always converge. Compared to more modern algorithms, however, the convergence is relatively slow. The Jacobi method is not suitable for sparse matrices, since in the course of the iteration the matrix is filled with more and more non-zero entries.
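
A minimal sketch of the cyclic Jacobi method (own NumPy illustration; for clarity the full rotation matrix is formed explicitly, which an efficient implementation would avoid).

```python
import numpy as np

def jacobi_eigenvalues(A, sweeps=10):
    """Cyclic Jacobi method for a symmetric matrix: annihilate off-diagonal entries
    one position (p, q) at a time with Jacobi (Givens) rotations."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p, q]) < 1e-15:
                    continue
                # rotation angle that makes the transformed entry at (p, q) zero
                theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)
                J[p, p] = c
                J[q, q] = c
                J[p, q] = s
                J[q, p] = -s
                A = J.T @ A @ J          # similarity transformation, eigenvalues preserved
    return np.sort(np.diag(A))

A = np.array([[4.0, 1.0, 2.0], [1.0, 3.0, 0.5], [2.0, 0.5, 2.0]])
print(jacobi_eigenvalues(A))
print(np.linalg.eigvalsh(A))             # reference values agree
```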

Vector iteration

A simple basic idea for the computation of eigenvectors of a matrix $A$ is the power method. A starting vector $x^{(0)}$ is iteratively multiplied by $A$ again and again,

$$x^{(k+1)} = A x^{(k)} ,$$

or, expressed with the $k$-th matrix power, $x^{(k)} = A^k x^{(0)}$ is computed. Behind this is the geometric idea that in each step the vector is stretched most strongly in the direction of the eigenvector with the largest eigenvalue in absolute value. In this simple form, however, the vector iteration is unsuitable for practice, since in general the entries quickly become very small or very large. Therefore the vector is additionally normalized with a vector norm after the multiplication with $A$. Under certain assumptions on the position of the eigenvalues one can then prove that this method actually converges, up to a scalar prefactor, to an eigenvector for the eigenvalue of largest absolute value.

If one applies this idea formally to the inverse matrix $A^{-1}$, one obtains an eigenvector for the eigenvalue of $A$ with the smallest absolute value. Of course, the inverse itself is not computed for this; instead, in each step the linear system of equations

$$A x^{(k+1)} = x^{(k)}$$

is solved. A further generalization of the idea is obtained with the help of a so-called shift parameter $\mu$. An eigenvector of $A$ for the eigenvalue closest to $\mu$ is namely also an eigenvector, for the eigenvalue of smallest absolute value, of the "shifted" matrix $A - \mu I$. With the associated iteration

$$(A - \mu I) \, x^{(k+1)} = x^{(k)}$$

and normalization of $x^{(k+1)}$ in each step, one obtains the method of inverse vector iteration.

Vector iteration methods thus first compute a certain eigenvector of $A$; the associated eigenvalue can then be obtained with the help of the Rayleigh quotient $\frac{x^T A x}{x^T x}$. They are obviously well suited when - as is often the case in certain applications - only the largest, only the smallest or, more generally, only a single eigenvalue together with its eigenvector is sought.
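
Both variants in a short sketch (NumPy, example matrix and shift chosen here): the power method approximates the eigenvalue of largest absolute value, the shifted inverse iteration the eigenvalue closest to the shift; in both cases the Rayleigh quotient supplies the eigenvalue approximation.

```python
import numpy as np

def power_method(A, steps=100):
    """Normalized vector iteration; converges to an eigenvector of the dominant eigenvalue."""
    x = np.random.default_rng(0).standard_normal(A.shape[0])
    for _ in range(steps):
        x = A @ x
        x = x / np.linalg.norm(x)          # normalization prevents over- or underflow
    rayleigh = x @ A @ x / (x @ x)         # Rayleigh quotient as eigenvalue approximation
    return rayleigh, x

def inverse_iteration(A, shift, steps=20):
    """Inverse vector iteration with shift; converges to the eigenvalue closest to the shift."""
    n = A.shape[0]
    x = np.ones(n)
    B = A - shift * np.eye(n)
    for _ in range(steps):
        x = np.linalg.solve(B, x)          # solve (A - shift I) y = x instead of inverting
        x = x / np.linalg.norm(x)
    return x @ A @ x / (x @ x), x

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
print(power_method(A)[0])              # approximately the largest eigenvalue
print(inverse_iteration(A, 1.9)[0])    # approximately the eigenvalue closest to 1.9
print(np.linalg.eigvalsh(A))           # reference values
```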

QR method

The QR method is currently the most important algorithm for computing all eigenvalues and eigenvectors of fully populated matrices that are not too large. It is an iterative method that uses a QR decomposition in each step to generate, through repeated similarity transformations, a matrix sequence that quickly converges to an upper triangular matrix. Starting from the original matrix $A_0 = A$, the basic idea in the $k$-th step is to QR-decompose the matrix,

$$A_k = Q_k R_k ,$$

and then to multiply the two factors together again in reverse order,

$$A_{k+1} = R_k Q_k ,$$

to obtain the new approximation matrix. Because $R_k = Q_k^T A_k$, it follows that $A_{k+1} = Q_k^T A_k Q_k$; this transformation is thus indeed a similarity transformation with an orthogonal matrix. As a more detailed analysis shows, there is a close connection to the power method: the QR iteration can be understood as a power method applied simultaneously to all vectors of an orthonormal basis; the QR decomposition in each step ensures that these vectors remain numerically stably orthonormal in the course of the iteration (see also subspace iteration). This representation also provides a proof that the method converges to an upper triangular matrix under weak assumptions.
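
A sketch of this basic, unshifted form (NumPy, example matrix chosen here); the practical refinements - Hessenberg reduction and shift strategies - are described in the following paragraph.

```python
import numpy as np

def qr_iteration(A, steps=100):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k with A_k = Q_k R_k."""
    Ak = A.astype(float).copy()
    for _ in range(steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q                 # equals Q^T A_k Q, so the eigenvalues are preserved
    return Ak

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
Ak = qr_iteration(A)
print(np.round(Ak, 6))             # (almost) upper triangular, here even diagonal
print(np.round(np.diag(Ak), 6))    # approximations of the eigenvalues
print(np.round(np.linalg.eigvalsh(A), 6))  # reference values
```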

In this simple form, the QR method is not yet suitable for practical use, for two reasons. On the one hand, the computational effort for the QR decomposition that must be computed in each step is very large. On the other hand, convergence generally takes place only slowly, so many steps must be carried out to achieve the desired accuracy. The first point can be addressed by bringing the matrix $A$ into Hessenberg form by similarity transformations in a preparatory step. This can be achieved with transformations by suitable Householder matrices. Since a Hessenberg matrix has non-zero entries below the main diagonal only on the first subdiagonal, it can be quickly QR-decomposed with the corresponding Givens rotations. As is easily shown, a step of the QR method preserves symmetry and Hessenberg form. Since a symmetric Hessenberg matrix is a tridiagonal matrix, the procedure simplifies considerably once more in the symmetric case. As with inverse vector iteration, the speed of convergence can be increased significantly if, in each step, the matrix $A_k - \mu_k I$ with a cleverly chosen shift parameter $\mu_k$ is used instead of the matrix $A_k$. For the choice of $\mu_k$, which should be an approximation to an eigenvalue of $A$, there are various so-called shift strategies.

With a variant of the QR method, the so-called singular value decomposition of a matrix can also be computed. This generalization of diagonalization to arbitrary - even non-square - matrices is used directly in some applications, such as image compression. The singular value decomposition can also be used to solve large, poorly conditioned linear least-squares problems.

Krylov subspace methods

The Krylov subspace methods, with their numerous variants and specializations, are the most important group of methods for solving both linear systems of equations and eigenvalue problems when the given matrix is very large and sparse. The historically first algorithm from this group is the conjugate gradient method, CG method for short (from English "conjugate gradients"), for solving linear systems of equations with symmetric positive definite coefficient matrices.

CG method

The fruitful connection between the CG method and Krylov subspaces was recognized only later; its basic idea is a different one: instead of the system of equations, it solves an optimization problem equivalent to it. If $A$ is symmetric and positive definite, then the solution of $A x = b$ is the uniquely determined minimizer of the function

$$f(x) = \tfrac{1}{2} \, x^T A x - b^T x .$$

In the two-dimensional case, the contour lines of the function to be minimized are ellipses. If one always chooses the direction of steepest descent, this leads to a zigzag course (green). The CG method uses conjugate directions and, in this illustration, ends up exactly at the minimum after two steps (red).

This means that in principle all numerical methods for solving optimization problems are also available for the linear system of equations, in particular the so-called descent methods. These are iterative methods that, starting from the current approximation $x^{(k)}$, solve in the $k$-th step the one-dimensional optimization problem along a suitable search direction $d^{(k)}$:

"Choose $\alpha$ such that $f(x^{(k)} + \alpha \, d^{(k)})$ becomes minimal."

The point found in this way becomes the new approximation $x^{(k+1)}$ for the next step. An initially obvious choice for the search direction is the direction of steepest descent, which leads to the gradient method for determining the minimizer. However, it turns out that the approximations computed in this way generally approach the true solution only very slowly and in a "zigzag course". Search directions that take into account the special shape of the function to be minimized are much more suitable. The level sets of $f$ are $n$-dimensional ellipsoids (in the illustrative two-dimensional case, ellipses), so it is useful to choose the search directions conjugate to one another (in the illustration this corresponds to conjugate diameters); here two directions $d$ and $\tilde{d}$ are called conjugate if $d^T A \tilde{d} = 0$ holds. The CG method therefore chooses the direction of steepest descent as the first search direction, but the following ones in such a way that all search directions are conjugate to one another. It can be shown that the true solution is reached after $n$ descent steps. Usually, however, a sufficiently accurate approximate solution is reached after significantly fewer steps, and the process can be terminated early.
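
A compact sketch of the CG method (own NumPy illustration of the standard recurrences; the tolerance and the example data are choices made here). Note that the matrix enters only through matrix-vector products.

```python
import numpy as np

def cg(A, b, x0=None, tol=1e-10, max_steps=None):
    """Conjugate gradient method for a symmetric positive definite matrix A.
    A is used only in matrix-vector products, the key advantage for sparse problems."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                      # residual, the negative gradient of f at x
    d = r.copy()                       # first search direction: steepest descent
    max_steps = n if max_steps is None else max_steps
    for _ in range(max_steps):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)     # exact one-dimensional minimization along d
        x = x + alpha * d
        r_new = r - alpha * Ad
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d           # new direction is A-conjugate to the previous ones
        r = r_new
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = cg(A, b)
print(np.allclose(A @ x, b))           # True: in exact arithmetic after at most n steps
```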

From the CG method to Krylov subspace methods

In the computation steps of the CG method, the matrix $A$ enters only in the form of matrix-vector products; it itself is not decomposed or transformed - a great advantage if it is sparse. Assuming for simplicity (but without loss of generality) that the zero vector is chosen as the starting vector $x^{(0)}$, a more detailed analysis shows that every approximation $x^{(k)}$ is a linear combination of the vectors $b, Ab, A^2 b, \dots, A^{k-1} b$, i.e. is built up from repeated multiplications of the right-hand side by $A$. In other words, each $x^{(k)}$ lies in a Krylov subspace

$$K_k(A, b) = \operatorname{span}\{\, b, \, A b, \, A^2 b, \, \dots, \, A^{k-1} b \,\} .$$

This property is the characteristic feature of Krylov subspace methods: they iteratively generate approximations $x^{(k)} \in K_k(A, b)$. In addition, $x^{(k)}$ is chosen so that the residual $b - A x^{(k)}$ becomes as small as possible in a sense that still has to be specified. With the CG method the condition is not necessarily obvious, but it fits the special structure of the problem well: in each step, the error is minimal in the vector norm weighted by $A$, $\|y\|_A = \sqrt{y^T A y}$. The disadvantage is that this only works if $A$ is actually symmetric and positive definite; otherwise $\|\cdot\|_A$ is not a norm at all. In general, the additional conditions that Krylov subspace methods place on the choice of $x^{(k)}$ are formulated as so-called projection conditions. One demands that the residual be orthogonal to all vectors of a $k$-dimensional subspace $U_k$, in symbols

$$b - A x^{(k)} \perp U_k .$$

The $U_k$ are usually themselves Krylov subspaces, in the simplest case, as in the CG method, for example $U_k = K_k(A, b)$. For the concrete computation of the approximations, orthonormal bases of the Krylov subspaces involved are built up successively. The well-known Gram-Schmidt method for orthonormalization is unfortunately numerically unstable in its standard form. However, it can be stabilized with a small modification.
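
Such a successive build-up of an orthonormal Krylov basis with a stabilized (modified) Gram-Schmidt orthogonalization is sketched below; this is essentially the Arnoldi process mentioned further down (own NumPy illustration, example data chosen here).

```python
import numpy as np

def arnoldi(A, b, k):
    """Build an orthonormal basis V of the Krylov space K_k(A, b) with modified Gram-Schmidt.
    Returns V (n x (k+1)) and the small Hessenberg matrix H ((k+1) x k)."""
    n = len(b)
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ V[:, j]                      # only matrix-vector products with A
        for i in range(j + 1):               # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:              # breakdown: the Krylov space is A-invariant
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 0.0, 0.0])
V, H = arnoldi(A, b, 2)
print(np.round(V.T @ V, 10))                 # columns of V are orthonormal
print(np.allclose(A @ V[:, :2], V @ H))      # Arnoldi relation holds
```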

Further Krylov subspace methods

A comparison of the norm of the error and of the residual in the CG method (blue) and the MINRES method (green): the CG method, which is specialized for positive definite matrices, converges faster than the more general MINRES method.

The basic ideas described above result in numerous variations, adaptations and improvements within this class of methods, of which only a few are to be mentioned as examples. A direct generalization of the CG method is the BiCG method. It removes the restriction to symmetric matrices in that, in addition to the Krylov subspaces formed with $A$, it also uses those belonging to the transposed matrix $A^T$. One optimization that avoids the additional multiplications with the transposed matrix is the CGS method. Both types of method are unstable in many practical cases, but form the basis for various stabilization attempts, for example in the group of BiCGSTAB methods. Important and generally stable methods are GMRES and its specialization for symmetric matrices, MINRES. They start directly with the residual and determine $x^{(k)}$ in the Krylov subspace so that the norm of the residual is minimal. Other improvements of this basic principle include the QMR and TFQMR methods.

Krylov subspace methods can be used not only for very large sparse linear systems of equations, but also to solve eigenvalue problems of this kind - another reason for their great importance in modern numerical linear algebra. Of course, for eigenvalue problems one cannot start the iteration with the zero vector (it is by definition not an eigenvector). Here the approximations $x^{(k)}$ and associated $\lambda^{(k)}$ are determined so that

$$A x^{(k)} - \lambda^{(k)} x^{(k)} \perp K_k$$

holds. This approach leads to a $k$-dimensional eigenvalue problem, which can easily be solved for small $k$ and which approximates some eigenvalues of $A$. The associated basic algorithm is the Arnoldi method. As always with eigenvalue problems, there are significant simplifications for symmetric matrices; these lead to the Lanczos method.

Literature


About history

  • Jean-Luc Chabert et al. (eds.): A History of Algorithms. Springer, Berlin/Heidelberg 1999, ISBN 978-3-540-63369-3.
  • Yousef Saad, Henk A. van der Vorst: Iterative solution of linear systems in the 20th century. In: Journal of Computational and Applied Mathematics. Vol. 123, 2000, pp. 1–33.
  • Gene H. Golub, Henk A. van der Vorst: Eigenvalue computation in the 20th century. In: Journal of Computational and Applied Mathematics. Vol. 123, 2000, pp. 35–65.


References

  1. Franka Miriam Brückler: History of Mathematics compact - The most important things from arithmetic, geometry, algebra, number theory and logic . Springer, 2017, ISBN 978-3-540-76687-2 , pp. 103 f .
  2. Chabert: A History of Algorithms. 1999, p. 283 f.
  3. Chabert: A History of Algorithms. 1999, p. 291 f.
  4. Chabert: A History of Algorithms. 1999, p. 296 f.
  5. Chabert: A History of Algorithms. 1999, pp. 300-302.
  6. ^ Golub, Vorst: Eigenvalue computation in the 20th century. 2000, p. 42.
  7. Chabert: A History of Algorithms. 1999, p. 310 f.
  8. ^ Golub, Vorst: Eigenvalue computation in the 20th century. 2000, p. 43.
  9. ^ Golub, Vorst: Eigenvalue computation in the 20th century. 2000, p. 47.
  10. ^ Saad, Vorst: Iterative solution of linear systems in the 20th century. 2000, p. 12 f.
  11. ^ Saad, Vorst: Iterative solution of linear systems in the 20th century. 2000, p. 14 f.
  12. ^ Saad, Vorst: Iterative solution of linear systems in the 20th century. 2000, pp. 15-17.
  13. ^ Golub, Vorst: Eigenvalue computation in the 20th century. 2000, p. 45.
  14. Trefethen, Bau: Numerical Linear Algebra. 1997, p. ix.
  15. Demmel: Applied Numerical Linear Algebra. 1997, pp. 83-90.
  16. Golub, Van Loan: Matrix Computations. 1996, p. 391 ff.
  17. Golub, Van Loan: Matrix Computations. 1996, p. 183, p. 193, p. 201.
  18. Wolfgang Dahmen, Arnold Reusken: Numerics for engineers and natural scientists. 2nd edition. Springer, Berlin/Heidelberg 2008, ISBN 978-3-540-76492-2, pp. 18 f.
  19. Wolfgang Dahmen, Arnold Reusken: Numerics for engineers and natural scientists. 2nd edition. Springer, Berlin/Heidelberg 2008, ISBN 978-3-540-76492-2, Section 2.5.1 Operator norms, condition numbers of linear mappings, pp. 26–34.
  20. ^ Hans Rudolf Schwarz, Norbert Köckler: Numerical Mathematics . 8th edition. Vieweg + Teubner, Wiesbaden 2011, ISBN 978-3-8348-1551-4 , pp. 53 f .
  21. Trefethen, Bau: Numerical Linear Algebra. 1997, p. 131.
  22. Martin Hanke-Bourgeois: Fundamentals of numerical mathematics and scientific computing . 3. Edition. Vieweg + Teubner, Wiesbaden 2009, ISBN 978-3-8348-0708-3 , pp. 214 .
  23. ^ Peter Deuflhard, Andreas Hohmann: Numerical Mathematics 1 . 4th edition. Walter de Gruyter, Berlin 2008, ISBN 978-3-11-020354-7 , p. 44 .
  24. Wolfgang Dahmen, Arnold Reusken: Numerics for engineers and natural scientists . 2nd Edition. Springer, Berlin / Heidelberg 2008, ISBN 978-3-540-76492-2 , p. 44 .
  25. ^ Peter Deuflhard, Andreas Hohmann: Numerical Mathematics 1 . 4th edition. Walter de Gruyter, Berlin 2008, ISBN 978-3-11-020354-7 , p. 49 f .
  26. Trefethen, Bau: Numerical Linear Algebra. 1997, p. 11.
  27. Wolfgang Dahmen, Arnold Reusken: Numerics for engineers and natural scientists . 2nd Edition. Springer, Berlin / Heidelberg 2008, ISBN 978-3-540-76492-2 , p. 94 .
  28. Demmel: Applied Numerical Linear Algebra. 1997, p. 150.
  29. Demmel: Applied Numerical Linear Algebra. 1997, pp. 146-148.
  30. ^ Higham: Accuracy and Stability of Numerical Algorithms. 2002, p. 354 ff.
  31. ^ Peter Deuflhard, Andreas Hohmann: Numerical Mathematics 1 . 4th edition. Walter de Gruyter, Berlin 2008, ISBN 978-3-11-020354-7 , p. 135 .
  32. ^ Börm, Mehl: Numerical Methods for Eigenvalue Problems. 2012, pp. 15-19.
  33. Martin Hanke-Bourgeois: Fundamentals of numerical mathematics and scientific computing . 3. Edition. Vieweg + Teubner, Wiesbaden 2009, ISBN 978-3-8348-0708-3 , pp. 46-57 .
  34. Demmel: Applied Numerical Linear Algebra. 1997, p. 49.
  35. Wolfgang Dahmen, Arnold Reusken: Numerics for engineers and natural scientists . 2nd Edition. Springer, Berlin / Heidelberg 2008, ISBN 978-3-540-76492-2 , p. 88 .
  36. Wolfgang Dahmen, Arnold Reusken: Numerics for engineers and natural scientists . 2nd Edition. Springer, Berlin / Heidelberg 2008, ISBN 978-3-540-76492-2 , p. 127 f .
  37. Demmel: Applied Numerical Linear Algebra. 1997, p. 123 f.
  38. Meister: Numerics of linear systems of equations. 2015, p. 64.
  39. ^ Peter Deuflhard, Andreas Hohmann: Numerical Mathematics 1 . 4th edition. Walter de Gruyter, Berlin 2008, ISBN 978-3-11-020354-7 , p. 76 f .
  40. Meister: Numerics of linear systems of equations. 2015, pp. 72–75.
  41. Meister: Numerics of linear systems of equations. 2015, p. 85.
  42. Meister: Numerics of linear systems of equations. 2015, p. 96.
  43. ^ Börm, Mehl: Numerical Methods for Eigenvalue Problems. 2012, p. 39.
  44. ^ Börm, Mehl: Numerical Methods for Eigenvalue Problems. 2012, p. 46.
  45. Demmel: Applied Numerical Linear Algebra. 1997, pp. 232-235.
  46. ^ Peter Deuflhard, Andreas Hohmann: Numerical Mathematics 1 . 4th edition. Walter de Gruyter, Berlin 2008, ISBN 978-3-11-020354-7 , p. 136 f .
  47. ^ Börm, Mehl: Numerical Methods for Eigenvalue Problems. 2012, p. 100.
  48. Trefethen, Bau: Numerical Linear Algebra. 1997, pp. 213-217.
  49. Trefethen, Bau: Numerical Linear Algebra. 1997, pp. 219-224.
  50. Demmel: Applied Numerical Linear Algebra. 1997, pp. 242-246.
  51. Demmel: Applied Numerical Linear Algebra. 1997, pp. 109-117.
  52. Meister: Numerics of linear systems of equations. 2015, p. 152.
  53. Wolfgang Dahmen, Arnold Reusken: Numerics for engineers and natural scientists . 2nd Edition. Springer, Berlin / Heidelberg 2008, ISBN 978-3-540-76492-2 , p. 566-575 .
  54. Demmel: Applied Numerical Linear Algebra. 1997, p. 306 f.
  55. Trefethen, Bau: Numerical Linear Algebra. 1997, pp. 293-301.
  56. Trefethen, Bau: Numerical Linear Algebra. 1997, pp. 250-255.
  57. Meister: Numerics of linear systems of equations. 2015, p. 146, pp. 165–224.
  58. ^ Börm, Mehl: Numerical Methods for Eigenvalue Problems. 2012, pp. 145–151.
  59. ^ Börm, Mehl: Numerical Methods for Eigenvalue Problems. 2012, pp. 159-165.