Gaussian elimination
{{Short description|Algorithm for solving systems of linear equations}}
[[File:Gaussian elimination animated.gif|thumb|Animation of Gaussian elimination. Red row eliminates the following rows, green rows change their order.]]
In mathematics, '''Gaussian elimination''', also known as '''row reduction''', is an [[algorithm]] for solving [[System of linear equations|systems of linear equations]]. It consists of a sequence of row-wise operations performed on the corresponding [[matrix (mathematics)|matrix]] of coefficients. This method can also be used to compute the [[Rank (linear algebra)|rank]] of a matrix, the [[determinant]] of a [[square matrix]], and the inverse of an [[invertible matrix]]. The method is named after [[Carl Friedrich Gauss]] (1777–1855). To perform row reduction on a matrix, one uses a sequence of [[elementary row operations]] to modify the matrix until the lower left-hand corner of the matrix is filled with zeros, as much as possible. There are three types of elementary row operations:
* Swapping two rows,
* Multiplying a row by a nonzero number,
* Adding a multiple of one row to another row.
Using these operations, a matrix can always be transformed into an [[Triangular matrix|upper triangular matrix]], and in fact one that is in [[row echelon form]]. Once all of the leading coefficients (the leftmost nonzero entry in each row) are 1, and every column containing a leading coefficient has zeros elsewhere, the matrix is said to be in [[reduced row echelon form]]. This final form is unique; in other words, it is independent of the sequence of row operations used. For example, in the following sequence of row operations (where two elementary operations on different rows are done at the first and third steps), the third and fourth matrices are the ones in row echelon form, and the final matrix is the unique reduced row echelon form.
<math display="block">\begin{bmatrix}
1 & 3 & 1 & 9 \\
1 & 1 & -1 & 1 \\
3 & 11 & 5 & 35
\end{bmatrix}\to
\begin{bmatrix}
1 & 3 & 1 & 9 \\
0 & -2 & -2 & -8 \\
0 & 2 & 2 & 8
\end{bmatrix}\to
\begin{bmatrix}
1 & 3 & 1 & 9 \\
0 & -2 & -2 & -8 \\
0 & 0 & 0 & 0
\end{bmatrix}\to
\begin{bmatrix}
1 & 0 & -2 & -3 \\
0 & 1 & 1 & 4 \\
0 & 0 & 0 & 0
\end{bmatrix} </math>


Using row operations to convert a matrix into reduced row echelon form is sometimes called '''{{vanchor|Gauss–Jordan elimination}}'''. In this case, the term ''Gaussian elimination'' refers to the process until it has reached its upper triangular, or (unreduced) row echelon form. For computational reasons, when solving systems of linear equations, it is sometimes preferable to stop row operations before the matrix is completely reduced.


== Definitions and example of algorithm ==
The process of row reduction makes use of [[elementary row operations]], and can be divided into two parts. The first part (sometimes called forward elimination) reduces a given system to row echelon form, from which one can tell whether there are no solutions, a unique solution, or infinitely many solutions. The second part (sometimes called [[Triangular matrix#Forward and back substitution|back substitution]]) continues to use row operations until the solution is found; in other words, it puts the matrix into reduced row echelon form.


Another point of view, which turns out to be very useful to analyze the algorithm, is that row reduction produces a [[matrix decomposition]] of the original matrix. The elementary row operations may be viewed as the multiplication on the left of the original matrix by [[elementary matrix|elementary matrices]]. Alternatively, a sequence of elementary operations that reduces a single row may be viewed as multiplication by a [[Frobenius matrix]]. Then the first part of the algorithm computes an [[LU decomposition]], while the second part writes the original matrix as the product of a uniquely determined invertible matrix and a uniquely determined reduced row echelon matrix.
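
To make this viewpoint concrete, the following Python sketch records the multipliers used during forward elimination as the entries of {{mvar|L}}; it is illustrative only, uses plain lists rather than a numerical library, and assumes that no row interchanges are needed.

<syntaxhighlight lang="python">
def lu_decompose(A):
    """Record forward elimination as an LU decomposition: the multiplier used to
    clear each entry below the diagonal becomes the corresponding entry of L.
    Sketch only: assumes no row swaps (pivoting) are needed."""
    n = len(A)
    U = [list(map(float, row)) for row in A]
    L = [[float(i == j) for j in range(n)] for i in range(n)]  # start from the identity
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]          # multiplier for row i, column k
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]     # type-3 row operation on U
    return L, U

A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
L, U = lu_decompose(A)
# multiplying L and U back together reproduces A (up to rounding)
print([[sum(L[i][k] * U[k][j] for k in range(3)) for j in range(3)] for i in range(3)])
</syntaxhighlight>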


=== Row operations ===
There are three types of elementary row operations which may be performed on the rows of a matrix:
# Swap the positions of two rows.
# Multiply a row by a non-zero [[scalar (mathematics)|scalar]].
# Add to one row a scalar multiple of another.


If the matrix is associated to a system of linear equations, then these operations do not change the solution set. Therefore, if one's goal is to solve a system of linear equations, then using these row operations could make the problem easier.
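
For illustration, the three operations can be written as short Python functions acting on a matrix stored as a list of rows (the function names are illustrative only):

<syntaxhighlight lang="python">
def swap_rows(M, i, j):
    """Type 1: swap two rows."""
    M[i], M[j] = M[j], M[i]

def scale_row(M, i, c):
    """Type 2: multiply row i by a nonzero scalar c."""
    M[i] = [c * x for x in M[i]]

def add_multiple(M, i, j, c):
    """Type 3: add c times row j to row i."""
    M[i] = [a + c * b for a, b in zip(M[i], M[j])]

# augmented matrix of the worked example further below
M = [[2.0, 1.0, -1.0, 8.0],
     [-3.0, -1.0, 2.0, -11.0],
     [-2.0, 1.0, 2.0, -3.0]]
add_multiple(M, 1, 0, 1.5)   # L2 + (3/2) L1 -> L2
add_multiple(M, 2, 0, 1.0)   # L3 + L1 -> L3
print(M)   # the entries below the first pivot are now zero
</syntaxhighlight>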


=== Echelon form ===
{{Main|Row echelon form}}


For each row in a matrix, if the row does not consist of only zeros, then the leftmost nonzero entry is called the ''[[leading coefficient]]'' (or ''pivot'') of that row. So if two leading coefficients are in the same column, then a row operation of [[#Row operations|type 3]] could be used to make one of those coefficients zero. Then by using the row swapping operation, one can always order the rows so that for every non-zero row, the leading coefficient is to the right of the leading coefficient of the row above. If this is the case, then the matrix is said to be in row echelon form. So the lower left part of the matrix contains only zeros, and all of the zero rows are below the non-zero rows. The word "echelon" is used here because one can roughly think of the rows being ranked by their size, with the largest being at the top and the smallest being at the bottom.


For example, the following matrix is in row echelon form, and its leading coefficients are shown in red:
<table border="0">
<math display="block">\begin{bmatrix}
<tr><td align="right">2<var>x</var><td>+<td align="right"><var>y</var><td>&minus;<td align="right"><var>z</var><td>=<td align="right">8,
0 & \color{red}{\mathbf{2}} & 1 & -1 \\
<tr><td align="right">&minus;3<var>x</var><td>&minus;<td align="right"><var>y</var><td>+<td align="right">2<var>z</var><td>=<td align="right">&minus;11,
0 & 0 & \color{red}{\mathbf{3}} & 1 \\
<tr><td align="right">&minus;2<var>x</var><td>+<td align="right"><var>y</var><td>+<td align="right">2<var>z</var><td>=<td align="right">&minus;3.
0 & 0 & 0 & 0
</table>
\end{bmatrix}.</math>


It is in echelon form because the zero row is at the bottom, and the leading coefficient of the second row (in the third column) is to the right of the leading coefficient of the first row (in the second column).


A matrix is said to be in reduced row echelon form if furthermore all of the leading coefficients are equal to 1 (which can be achieved by using the elementary row operation of type 2), and in every column containing a leading coefficient, all of the other entries in that column are zero (which can be achieved by using elementary row operations of type 3).
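
These conditions can be checked mechanically; the following Python sketch (illustrative only, using exact comparison against zero) tests whether a matrix is in row echelon form:

<syntaxhighlight lang="python">
def leading_index(row):
    """Column index of the leading coefficient, or None for a zero row."""
    return next((j for j, x in enumerate(row) if x != 0), None)

def is_row_echelon(M):
    """True if every leading coefficient is strictly to the right of the one in
    the row above, and all zero rows are at the bottom."""
    last = -1
    seen_zero_row = False
    for row in M:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
        elif seen_zero_row or lead <= last:
            return False
        else:
            last = lead
    return True

print(is_row_echelon([[0, 2, 1, -1], [0, 0, 3, 1], [0, 0, 0, 0]]))   # True
</syntaxhighlight>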


=== Example of the algorithm ===
<table border="0">
Suppose the goal is to find and describe the set of solutions to the following system of linear equations:
<tr><td align="right">2<var>x</var><td>+<td align="right"><var>y</var><td>&minus;<td align="right"><var>z</var><td>=<td align="right">8,
<math display="block">
<tr><td align="right"><td><td align="right">1/2 <var>y</var><td>+<td align="right">1/2 <var>z</var><td>=<td align="right">1,
\begin{alignat}{4}
<tr><td align="right"><td><td align="right">2<var>y</var><td>+<td align="right"><var>z</var><td>=<td align="right">5.
2x &{}+{}& y &{}-{}& z &{}={}& 8 & \qquad (L_1) \\
</table>
-3x &{}-{}& y &{}+{}& 2z &{}={}& -11 & \qquad (L_2) \\
-2x &{}+{}& y &{}+{}& 2z &{}={}& -3 & \qquad (L_3)
\end{alignat}
</math>


The table below is the row reduction process applied simultaneously to the system of equations and its associated [[augmented matrix]]. In practice, one does not usually deal with the systems in terms of equations, but instead makes use of the augmented matrix, which is more suitable for computer manipulations. The row reduction procedure may be summarized as follows: eliminate {{mvar|x}} from all equations below {{math|''L''<sub>1</sub>}}, and then eliminate {{mvar|y}} from all equations below {{math|''L''<sub>2</sub>}}. This will put the system into [[triangular form]]. Then, using back-substitution, each unknown can be solved for.


: {| style="background-color:white;" class="wikitable"
<table border="0">
|-
<tr><td align="right">2<var>x</var><td><td align="right"><td>&minus;<td align="right">2<var>z</var><td>=<td align="right">6,
! System of equations !! Row operations !! Augmented matrix
<tr><td align="right"><td><td align="right">1/2 <var>y</var><td>+<td align="right">1/2 <var>z</var><td>=<td align="right">1,
|- align="center"
<tr><td align="right"><td><td align="right"><td>&minus;<td align="right"><var>z</var><td>=<td align="right">1.
| <math>
</table>
\begin{alignat}{4}
2x &{}+{}& y &{}-{}& z &{}={}& 8 & \\
-3x &{}-{}& y &{}+{}& 2z &{}={}& -11 & \\
-2x &{}+{}& y &{}+{}& 2z &{}={}& -3 &
\end{alignat}
</math>
|
| <math>
\left[\begin{array}{rrr|r}
2 & 1 & -1 & 8 \\
-3 & -1 & 2 & -11 \\
-2 & 1 & 2 & -3
\end{array}\right]
</math>
|- align="center"
| <math>
\begin{alignat}{4}
2x &{}+{}& y &{}-{}& z &{}={}& 8 & \\
& & \tfrac12 y &{}+{}& \tfrac12 z &{}={}& 1 & \\
& & 2y &{}+{}& z &{}={}& 5 &
\end{alignat}
</math>
| <math>
\begin{align}
L_2 + \tfrac32 L_1 &\to L_2 \\
L_3 + L_1 &\to L_3
\end{align}
</math>
| <math>
\left[\begin{array}{rrr|r}
2 & 1 & -1 & 8 \\
0 & \frac12 & \frac12 & 1 \\
0 & 2 & 1 & 5
\end{array}\right]
</math>
|- align="center"
| <math>
\begin{alignat}{4}
2x &{}+{}& y &{}-{}& z &{}={}& 8 & \\
& & \tfrac12 y &{}+{}& \tfrac12 z &{}={}& 1 & \\
& & & & -z &{}={}& 1 &
\end{alignat}
</math>
| <math>
L_3 - 4 L_2 \to L_3</math>
| <math>
\left[\begin{array}{rrr|r}
2 & 1 & -1 & 8 \\
0 & \frac12 & \frac12 & 1 \\
0 & 0 & -1 & 1
\end{array}\right]
</math>
|-
|colspan=3 align="center"| The matrix is now in echelon form (also called triangular form)
|- align="center"
| <math>
\begin{alignat}{4}
2x &{}+{}& y & & &{}={} 7 & \\
& & \tfrac12 y & & &{}={} \tfrac32 & \\
& & &{}-{}& z &{}={} 1 &
\end{alignat}
</math>
| <math>
\begin{align}
L_1 - L_3 &\to L_1\\
L_2 + \tfrac12 L_3 &\to L_2
\end{align}
</math>
| <math>
\left[\begin{array}{rrr|r}
2 & 1 & 0 & 7 \\
0 & \frac12 & 0 & \frac32 \\
0 & 0 & -1 & 1
\end{array}\right]
</math>
|- align="center"
| <math>
\begin{alignat}{4}
2x &{}+{}& y &\quad& &{}={}& 7 & \\
& & y &\quad& &{}={}& 3 & \\
& & &\quad& z &{}={}& -1 &
\end{alignat}
</math>
| <math>
\begin{align}
2 L_2 &\to L_2 \\
-L_3 &\to L_3
\end{align}
</math>
| <math>
\left[\begin{array}{rrr|r}
2 & 1 & 0 & 7 \\
0 & 1 & 0 & 3 \\
0 & 0 & 1 & -1
\end{array}\right]
</math>
|- align="center"
| <math>
\begin{alignat}{4}
x &\quad& &\quad& &{}={}& 2 & \\
&\quad& y &\quad& &{}={}& 3 & \\
&\quad& &\quad& z &{}={}& -1 &
\end{alignat}
</math>
| <math>
\begin{align}
L_1 - L_2 &\to L_1 \\
\tfrac12 L_1 &\to L_1
\end{align}
</math>
| <math>
\left[\begin{array}{rrr|r}
1 & 0 & 0 & 2 \\
0 & 1 & 0 & 3 \\
0 & 0 & 1 & -1
\end{array}\right]
</math>
|}


The second column describes which row operations have just been performed. So for the first step, the {{mvar|x}} is eliminated from {{math|''L''<sub>2</sub>}} by adding {{math|{{sfrac|3|2}}''L''<sub>1</sub>}} to {{math|''L''<sub>2</sub>}}. Next, {{mvar|x}} is eliminated from {{math|''L''<sub>3</sub>}} by adding {{math|''L''<sub>1</sub>}} to {{math|''L''<sub>3</sub>}}. These row operations are labelled in the table as
<math display="block">\begin{align}
L_2 + \tfrac32 L_1 &\to L_2, \\
L_3 + L_1 &\to L_3.
\end{align}</math>


Once {{mvar|y}} is also eliminated from the third row, the result is a system of linear equations in triangular form, and so the first part of the algorithm is complete. From a computational point of view, it is faster to solve the variables in reverse order, a process known as back-substitution. One sees the solution is {{math|1=''z'' = −1}}, {{math|1=''y'' = 3}}, and {{math|1=''x'' = 2}}. So there is a unique solution to the original system of equations.
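
For illustration, a minimal Python sketch of forward elimination followed by back-substitution reproduces this solution; it assumes that no row swaps are needed and uses floating-point arithmetic.

<syntaxhighlight lang="python">
def solve_by_elimination(A, b):
    """Forward elimination to triangular form, then back substitution.
    Sketch only: no pivoting, so a nonzero pivot is assumed at every step."""
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n - 1):                 # eliminate x below L1, then y below L2, ...
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):         # back substitution, in reverse order
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

print(solve_by_elimination([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3]))
# [2.0, 3.0, -1.0]
</syntaxhighlight>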
<table border="0">
<tr><td align="right">2<var>x</var><td><td align="right"><td><td align="right"><td>=<td align="right">4,
<tr><td align="right"><td><td align="right">1/2 <var>y</var><td><td align="right"><td>=<td align="right">3/2,
<tr><td align="right"><td><td align="right"><td>&minus;<td align="right"><var>z</var><td>=<td align="right">1.
</table>


Instead of stopping once the matrix is in echelon form, one could continue until the matrix is in ''reduced'' row echelon form, as it is done in the table. The process of row reducing until the matrix is reduced is sometimes referred to as Gauss–Jordan elimination, to distinguish it from stopping after reaching echelon form.


== History ==
The method of Gaussian elimination appears – albeit without proof – in the Chinese mathematical text [[Rod calculus#System of linear equations|Chapter Eight: ''Rectangular Arrays'']] of ''[[The Nine Chapters on the Mathematical Art]]''. Its use is illustrated in eighteen problems, with two to five equations. The first reference to the book by this title is dated to 179&nbsp;AD, but parts of it were written as early as approximately 150&nbsp;BC.<ref>{{Cite web |title=DOCUMENTA MATHEMATICA, Vol. Extra Volume: Optimization Stories (2012), 9-14 |url=https://www.emis.de/journals/DMJDMV/vol-ismp/10_yuan-ya-xiang.html |access-date=2022-12-02 |website=www.emis.de}}</ref><ref>{{harvnb|Calinger|1999|pp=234–236}}</ref><ref name="princeton">{{cite book | author1=Timothy Gowers | author2=June Barrow-Green | author3=Imre Leader | title=The Princeton Companion to Mathematics |url=https://archive.org/details/princetoncompanio00gowe|date=8 September 2008|publisher=Princeton University Press|isbn=978-0-691-11880-2|page=607}}<!-- |access-date=28 September 2012 --></ref> It was commented on by [[Liu Hui]] in the 3rd century.


The method in Europe stems from the notes of [[Isaac Newton]].<ref>{{harvnb|Grcar|2011a|pp=169–172}}</ref><ref>{{harvnb|Grcar|2011b|pp=783–785}}</ref> In 1670, he wrote that all the algebra books known to him lacked a lesson for solving simultaneous equations, which Newton then supplied. Cambridge University eventually published the notes as ''Arithmetica Universalis'' in 1707 long after Newton had left academic life. The notes were widely imitated, which made (what is now called) Gaussian elimination a standard lesson in algebra textbooks by the end of the 18th century. [[Carl Friedrich Gauss]] in 1810 devised a notation for symmetric elimination that was adopted in the 19th century by professional [[Human computer|hand computers]] to solve the normal equations of least-squares problems.<ref>{{harvnb|Lauritzen|p=3}}</ref> The algorithm that is taught in high school was named for Gauss only in the 1950s as a result of confusion over the history of the subject.<ref>{{harvnb|Grcar|2011b|p=789}}</ref>


Some authors use the term ''Gaussian elimination'' to refer only to the procedure until the matrix is in echelon form, and use the term Gauss–Jordan elimination to refer to the procedure which ends in reduced echelon form. The name is used because it is a variation of Gaussian elimination as described by [[Wilhelm Jordan (geodesist)|Wilhelm Jordan]] in 1888. However, the method also appears in an article by Clasen published in the same year. Jordan and Clasen probably discovered Gauss–Jordan elimination independently.<ref>{{Citation | last1=Althoen | first1=Steven C. | last2=McLaughlin | first2=Renate | title=Gauss–Jordan reduction: a brief history | doi=10.2307/2322413 | year=1987 | journal=[[American Mathematical Monthly|The American Mathematical Monthly]] | issn=0002-9890 | volume=94 | issue=2 | pages=130–142 | jstor=2322413 | publisher=Mathematical Association of America}}</ref>
== Applications ==
Historically, the first application of the row reduction method is for solving systems of linear equations. Below are some other important applications of the algorithm.

=== Computing determinants ===
To explain how Gaussian elimination allows the computation of the determinant of a square matrix, we have to recall how the elementary row operations change the determinant:
* Swapping two rows multiplies the determinant by −1
* Multiplying a row by a nonzero scalar multiplies the determinant by the same scalar
* Adding to one row a scalar multiple of another does not change the determinant.

If Gaussian elimination applied to a square matrix {{mvar|A}} produces a row echelon matrix {{mvar|B}}, let {{mvar|d}} be the product of the scalars by which the determinant has been multiplied, using the above rules. Then the determinant of {{mvar|A}} is the quotient by {{mvar|d}} of the product of the elements of the diagonal of {{mvar|B}}:
<math display="block">\det(A) = \frac{\prod\operatorname{diag}(B)}{d}.</math>

Computationally, for an {{math|''n'' × ''n''}} matrix, this method needs only {{math|[[O notation|O(''n''<sup>3</sup>)]]}} arithmetic operations, while using [[Leibniz formula for determinants]] requires {{math|O(''n''!)}} operations (number of summands in the formula), and
recursive [[Laplace expansion]] requires {{math|O(2<sup>''n''</sup>)}} operations (number of sub-determinants to compute, if none is computed twice). Even on the fastest computers, these two methods are impractical for {{math|''n''}} above 20.

=== Finding the inverse of a matrix ===
{{See also|Invertible matrix}}
A variant of Gaussian elimination called Gauss–Jordan elimination can be used for finding the inverse of a matrix, if it exists. If {{math|''A''}} is an {{math|''n'' × ''n''}} square matrix, then one can use row reduction to compute its [[invertible matrix|inverse matrix]], if it exists. First, the {{math|''n'' × ''n''}} [[identity matrix]] is augmented to the right of {{math|''A''}}, forming an {{math|''n'' × 2''n''}} [[block matrix]] {{math|[''A'' {{!}} ''I'']}}. Now through application of elementary row operations, find the reduced echelon form of this {{math|''n'' × 2''n''}} matrix. The matrix {{math|''A''}} is invertible if and only if the left block can be reduced to the identity matrix {{math|''I''}}; in this case the right block of the final matrix is {{math|''A''<sup>−1</sup>}}. If the algorithm is unable to reduce the left block to {{math|''I''}}, then {{math|''A''}} is not invertible.

For example, consider the following matrix:
<math display="block">A =
\begin{bmatrix}
2 & -1 & 0 \\
-1 & 2 & -1 \\
0 & -1 & 2
\end{bmatrix}.
</math>


To find the inverse of this matrix, one takes the following matrix augmented by the identity and row-reduces it as a 3&nbsp;×&nbsp;6 matrix:
<math display="block">[ A | I ] =
\left[\begin{array}{ccc|ccc}
2 & -1 & 0 & 1 & 0 & 0 \\
-1 & 2 & -1 & 0 & 1 & 0 \\
0 & -1 & 2 & 0 & 0 & 1
\end{array}\right].
</math>


By performing row operations, one can check that the reduced row echelon form of this augmented matrix is
<math display="block">[ I | B ] =
\left[\begin{array}{rrr|rrr}
1 & 0 & 0 & \frac34 & \frac12 & \frac14 \\
0 & 1 & 0 & \frac12 & 1 & \frac12 \\
0 & 0 & 1 & \frac14 & \frac12 & \frac34
\end{array}\right].
</math>


One can think of each row operation as the left product by an [[elementary matrix]]. Denoting by {{math|''B''}} the product of these elementary matrices, we showed, on the left, that {{math|1=''BA'' = ''I''}}, and therefore, {{math|1=''B'' = ''A''<sup>−1</sup>}}. On the right, we kept a record of {{math|1=''BI'' = ''B''}}, which we know is the inverse desired. This procedure for finding the inverse works for square matrices of any size.
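
A short Python sketch of this procedure, using exact rational arithmetic so that the fractions above are reproduced literally, might look as follows (the function name is illustrative):

<syntaxhighlight lang="python">
from fractions import Fraction

def inverse_via_gauss_jordan(A):
    """Row-reduce the augmented block [A | I] to reduced row echelon form and
    return the right block. Exact rational arithmetic; raises if A is singular."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)]
         + [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        pivot_row = next((i for i in range(k, n) if M[i][k] != 0), None)
        if pivot_row is None:
            raise ValueError("matrix is not invertible")
        M[k], M[pivot_row] = M[pivot_row], M[k]
        pivot = M[k][k]
        M[k] = [x / pivot for x in M[k]]                       # make the pivot equal to 1
        for i in range(n):
            if i != k and M[i][k] != 0:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[k])]  # clear the rest of the column
    return [row[n:] for row in M]

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
for row in inverse_via_gauss_jordan(A):
    print([str(x) for x in row])
# ['3/4', '1/2', '1/4']
# ['1/2', '1', '1/2']
# ['1/4', '1/2', '3/4']
</syntaxhighlight>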


=== Computing ranks and bases ===
The Gaussian elimination algorithm can be applied to any {{math|''m'' × ''n''}} matrix {{mvar|A}}. In this way, for example, some 6&nbsp;×&nbsp;9 matrices can be transformed to a matrix that has a row echelon form like
<math display="block"> T=
\begin{bmatrix}
a & * & * & * & * & * & * & * & * \\
0 & 0 & b & * & * & * & * & * & * \\
0 & 0 & 0 & c & * & * & * & * & * \\
0 & 0 & 0 & 0 & 0 & 0 & d & * & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & e \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},
</math>
where the stars are arbitrary entries, and {{math|''a'', ''b'', ''c'', ''d'', ''e''}} are nonzero entries. This echelon matrix {{mvar|T}} contains a wealth of information about {{mvar|A}}: the [[rank of a matrix|rank]] of {{mvar|A}} is 5, since there are 5 nonzero rows in {{mvar|T}}; the [[vector space]] spanned by the columns of {{mvar|A}} has a basis consisting of its columns 1, 3, 4, 7 and 9 (the columns with {{math|''a'', ''b'', ''c'', ''d'', ''e''}} in {{mvar|T}}), and the stars show how the other columns of {{mvar|A}} can be written as linear combinations of the basis columns.


All of this applies also to the reduced row echelon form, which is a particular row echelon format.
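
For illustration, a small Python sketch can reduce a matrix to row echelon form and read off the rank and the pivot columns; it uses floating-point arithmetic with a fixed tolerance, and the function name is illustrative.

<syntaxhighlight lang="python">
def row_echelon_rank(A, tol=1e-12):
    """Reduce an m-by-n matrix to row echelon form and return its rank together
    with the pivot columns (whose columns in A form a basis of the column space).
    Sketch only: floating-point arithmetic with a fixed tolerance."""
    M = [list(map(float, row)) for row in A]
    m, n = len(M), len(M[0])
    pivot_cols = []
    h = 0                                            # current pivot row
    for k in range(n):                               # walk through the columns
        p = max(range(h, m), key=lambda i: abs(M[i][k]))
        if abs(M[p][k]) <= tol:
            continue                                 # no pivot here, move one column right
        M[h], M[p] = M[p], M[h]
        for i in range(h + 1, m):
            f = M[i][k] / M[h][k]
            for j in range(k, n):
                M[i][j] -= f * M[h][j]
        pivot_cols.append(k)
        h += 1
        if h == m:
            break
    return len(pivot_cols), pivot_cols

print(row_echelon_rank([[1, 3, 1, 9], [1, 1, -1, 1], [3, 11, 5, 35]]))
# (2, [0, 1])
</syntaxhighlight>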


== Computational efficiency ==
The number of [[Arithmetic#Arithmetic operations|arithmetic operations]] required to perform row reduction is one way of measuring the algorithm's computational efficiency. For example, to solve a system of {{math|''n''}} equations for {{math|''n''}} unknowns by performing row operations on the matrix until it is in echelon form, and then solving for each unknown in reverse order, requires {{math|''n''(''n'' + 1)/2}} divisions, {{math|(2''n''<sup>3</sup> + 3''n''<sup>2</sup> − 5''n'')/6}} multiplications, and {{math|(2''n''<sup>3</sup> + 3''n''<sup>2</sup> − 5''n'')/6}} subtractions,<ref>{{harvnb|Farebrother|1988|p=12}}</ref> for a total of approximately {{math|2''n''<sup>3</sup>/3}} operations. Thus it has an ''arithmetic complexity'' ([[time complexity]], where each arithmetic operation takes a unit of time, independently of the size of the inputs) of {{math|O(''n''<sup>3</sup>)}}.
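
These counts can be checked by instrumenting a straightforward implementation. The following Python sketch merely counts the operations performed by elimination and back substitution on an {{math|''n'' × ''n''}} system (assuming no pivoting) and compares them with the formulas above:

<syntaxhighlight lang="python">
def elimination_operation_counts(n):
    """Count the divisions, multiplications and subtractions used to solve an
    n-by-n system by forward elimination on the augmented matrix followed by
    back substitution (no pivoting; one unit per arithmetic operation)."""
    div = mul = sub = 0
    for k in range(n - 1):                # forward elimination
        for i in range(k + 1, n):
            div += 1                      # the multiplier A[i,k] / A[k,k]
            mul += (n - 1 - k) + 1        # update row i and its right-hand side
            sub += (n - 1 - k) + 1
    for i in range(n):                    # back substitution
        mul += n - 1 - i
        sub += n - 1 - i
        div += 1
    return div, mul, sub

n = 10
print(elimination_operation_counts(n))                        # (55, 375, 375)
print(n * (n + 1) // 2, (2 * n**3 + 3 * n**2 - 5 * n) // 6)   # 55 375
</syntaxhighlight>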


This complexity is a good measure of the time needed for the whole computation when the time for each arithmetic operation is approximately constant. This is the case when the coefficients are represented by [[Floating-point arithmetic|floating-point numbers]] or when they belong to a [[finite field]]. If the coefficients are [[integer]]s or [[rational number]]s exactly represented, the intermediate entries can grow exponentially large, so the [[bit complexity]] is exponential.<ref>{{Cite conference
| first1 = Xin Gui
| last1 = Fang
| first2 = George
| last2 = Havas
| title = On the worst-case complexity of integer Gaussian elimination
| book-title = Proceedings of the 1997 international symposium on Symbolic and algebraic computation
| conference = ISSAC '97
| pages = 28–31
| publisher = ACM
| year = 1997
| location = Kihei, Maui, Hawaii, United States
| url = https://scholar.archive.org/work/2htta67odrfilhxegzjqfratyy
| doi = 10.1145/258726.258740
| isbn = 0-89791-875-4
}}</ref>
However, [[Bareiss algorithm|Bareiss' algorithm]] is a variant of Gaussian elimination that avoids this exponential growth of the intermediate entries; with the same arithmetic complexity of {{math|O(''n''<sup>3</sup>)}}, it has a bit complexity of {{math|O(''n''<sup>5</sup>)}}, and therefore has [[strongly-polynomial time]] complexity.


Gaussian elimination and its variants can be used on computers for systems with thousands of equations and unknowns. However, the cost becomes prohibitive for systems with millions of equations. These large systems are generally solved using [[iterative method]]s. Specific methods exist for systems whose coefficients follow a regular pattern (see [[system of linear equations]]).


===Bareiss algorithm===
{{main|Bareiss algorithm}}
The first [[strongly-polynomial time]] algorithm for Gaussian elimination was published by [[Jack Edmonds]] in 1967.<ref name=":0">{{Cite Geometric Algorithms and Combinatorial Optimization}}</ref>{{Rp|page=37}} Independently, and almost simultaneously, Erwin Bareiss discovered another algorithm, based on the following remark, which applies to a division-free variant of Gaussian elimination.


In standard Gaussian elimination, one subtracts from each row <math>R_i</math> below the pivot row <math>R_k</math> a multiple of <math>R_k</math> by <math>r_{i,k}/r_{k,k},</math> where <math>r_{i,k}</math> and <math>r_{k,k}</math> are the entries in the pivot column of <math>R_i</math> and <math>R_k,</math> respectively.


The Bareiss variant consists, instead, of replacing <math>R_i</math> with <math display=inline>\frac{r_{k,k}R_i-r_{i,k}R_k}{r_{k-1,k-1}}.</math> This produces a row echelon form that has the same zero entries as with the standard Gaussian elimination.


Bareiss' main remark is that each matrix entry generated by this variant is the determinant of a submatrix of the original matrix.
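
A minimal Python sketch of this division-free variant for integer matrices is given below; it assumes that no zero pivot is encountered (a complete version would also swap rows), and the function name is illustrative. For the coefficient matrix of the worked example above it returns −1, the determinant.

<syntaxhighlight lang="python">
def bareiss_determinant(M):
    """Division-free (Bareiss) elimination on a square integer matrix.
    Sketch only: assumes every pivot encountered is nonzero (no row swaps).
    All divisions below are exact, so the intermediate entries stay integers."""
    A = [row[:] for row in M]
    n = len(A)
    prev = 1                              # pivot of the previous step (1 before the first)
    for k in range(n - 1):
        if A[k][k] == 0:
            raise ZeroDivisionError("zero pivot; a full version would swap rows")
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] = (A[k][k] * A[i][j] - A[i][k] * A[k][j]) // prev
            A[i][k] = 0
        prev = A[k][k]
    return A[n - 1][n - 1]                # for a square matrix, this is det(M)

print(bareiss_determinant([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]))   # -1
</syntaxhighlight>
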
In particular, if one starts with integer entries, the divisions occurring in the algorithm are exact divisions resulting in integers. So, all intermediate entries and final entries are integers. Moreover, [[Hadamard inequality]] provides an upper bound on the absolute values of the intermediate and final entries, and thus a bit complexity of <math>\tilde O(n^5),</math> using [[soft O notation]].

Moreover, as an upper bound on the size of the final entries is known, a complexity of <math>\tilde O(n^4)</math> can be obtained with [[modular arithmetic|modular computation]] followed either by [[Chinese remainder theorem|Chinese remaindering]] or [[Hensel lifting]].

As a corollary, the following problems can be solved in strongly polynomial time with the same bit complexity:<ref name=":0" />{{Rp|page=40}}

* Testing whether ''m'' given rational vectors are [[linearly independent]]
* Computing the [[determinant]] of a rational matrix
* Computing a solution of a rational equation system ''Ax'' = ''b''
* Computing the [[inverse matrix]] of a nonsingular rational matrix
* Computing the [[Rank (linear algebra)|rank]] of a rational matrix

=== Numeric instability ===
One possible problem is [[numerical stability|numerical instability]], caused by the possibility of dividing by very small numbers. If, for example, the leading coefficient of one of the rows is very close to zero, then to row-reduce the matrix, one would need to divide by that number. This means that any error which existed for the number that was close to zero would be amplified. Gaussian elimination is numerically stable for [[diagonally dominant]] or [[Positive-definite matrix|positive-definite]] matrices. For general matrices, Gaussian elimination is usually considered to be stable, when using [[Pivot element#Partial and complete pivoting|partial pivoting]], even though there are examples of stable matrices for which it is unstable.<ref>{{harvnb|Golub|Van Loan|1996|at=§3.4.6}}</ref>
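
The effect can be illustrated by a small Python experiment: eliminating without row interchanges on a system whose leading entry is tiny loses one component of the solution, while partial pivoting recovers it. The matrix and function below are illustrative only.

<syntaxhighlight lang="python">
def solve(A, b, pivot):
    """Elimination and back substitution, with or without partial pivoting.
    Small sketch used only to show the effect of a very small pivot."""
    A = [row[:] for row in A]
    b = b[:]
    n = len(A)
    for k in range(n - 1):
        if pivot:
            p = max(range(k, n), key=lambda i: abs(A[i][k]))
            A[k], A[p] = A[p], A[k]
            b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

A = [[1e-20, 1.0], [1.0, 1.0]]     # the exact solution is very close to x = y = 1
b = [1.0, 2.0]
print(solve(A, b, pivot=False))    # [0.0, 1.0]  -- dividing by the tiny pivot destroys x
print(solve(A, b, pivot=True))     # [1.0, 1.0]  -- swapping rows first keeps the answer accurate
</syntaxhighlight>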

== Generalizations ==
Gaussian elimination can be performed over any [[field (mathematics)|field]], not just the real numbers.
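
For illustration, the following Python sketch performs Gauss–Jordan reduction over the finite field {{math|GF(''p'')}} for a prime {{mvar|p}}, using modular inverses in place of division; it is a sketch only, and <code>pow(a, -1, p)</code> requires Python 3.8 or later.

<syntaxhighlight lang="python">
def rref_mod_p(A, p):
    """Reduced row echelon form of a matrix with entries in the finite field GF(p),
    p prime. Sketch only; pow(a, -1, p) gives the inverse of a modulo p."""
    M = [[x % p for x in row] for row in A]
    m, n = len(M), len(M[0])
    h = 0
    for k in range(n):
        pivot_row = next((i for i in range(h, m) if M[i][k] != 0), None)
        if pivot_row is None:
            continue
        M[h], M[pivot_row] = M[pivot_row], M[h]
        inv = pow(M[h][k], -1, p)                        # scale the pivot row so the pivot is 1
        M[h] = [(x * inv) % p for x in M[h]]
        for i in range(m):
            if i != h and M[i][k]:
                f = M[i][k]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[h])]
        h += 1
        if h == m:
            break
    return M

# x + y = 1 and x + 2y = 0 over GF(5) reduce to x = 2, y = 4
print(rref_mod_p([[1, 1, 1], [1, 2, 0]], 5))   # [[1, 0, 2], [0, 1, 4]]
</syntaxhighlight>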

[[Buchberger's algorithm]] is a generalization of Gaussian elimination to [[systems of polynomial equations]]. This generalization depends heavily on the notion of a [[monomial order]]. The choice of an ordering on the variables is already implicit in Gaussian elimination, manifesting as the choice to work from left to right when selecting pivot positions.

Computing the rank of a tensor of order greater than 2 is [[NP-hard]].<ref>{{cite arXiv |last1=Hillar |first1=Christopher |author2-link=Lek-Heng Lim |last2=Lim |first2=Lek-Heng |date=2009-11-07 |eprint=0911.1393 |title=Most tensor problems are NP-hard |class=cs.CC}}</ref> Therefore, if {{math|[[P=NP|P ≠ NP]]}}, there cannot be a polynomial time analog of Gaussian elimination for higher-order [[tensors]] (matrices are [[Array data structure|array]] representations of order-2 tensors).

== Pseudocode ==
<!--{{Unreferenced section|date=March 2018}}-->
As explained above, Gaussian elimination transforms a given {{math|''m'' × ''n''}} matrix {{mvar|A}} into a matrix in [[row-echelon form]].

In the following [[pseudocode]], <code>A[i, j]</code> denotes the entry of the matrix {{mvar|A}} in row {{mvar|i}} and column {{mvar|j}} with the indices starting from&nbsp;1. The transformation is performed ''in place'', meaning that the original matrix is lost, being eventually replaced by its row-echelon form.

 h := 1 /* ''Initialization of the pivot row'' */
 k := 1 /* ''Initialization of the pivot column'' */
 '''while''' h ≤ m '''and''' k ≤ n
     /* ''Find the k-th pivot:'' */
     i_max := [[argmax]] (i = h ... m, abs(A[i, k]))
     '''if''' A[i_max, k] = 0
         /* ''No pivot in this column, pass to next column'' */
         k := k + 1
     '''else'''
         '''swap rows'''(h, i_max)
         /* ''Do for all rows below pivot:'' */
         '''for''' i = h + 1 ... m:
             f := A[i, k] / A[h, k]
             /* ''Fill with zeros the lower part of pivot column:'' */
             A[i, k] := 0
             /* ''Do for all remaining elements in current row:'' */
             '''for''' j = k + 1 ... n:
                 A[i, j] := A[i, j] - A[h, j] * f
         /* ''Increase pivot row and column'' */
         h := h + 1
         k := k + 1

This algorithm differs slightly from the one discussed earlier, by choosing a pivot with largest [[absolute value]]. Such ''partial pivoting'' may be required if, at the pivot place, the entry of the matrix is zero. In any case, choosing the largest possible absolute value of the pivot improves the [[numerical stability]] of the algorithm, when floating point is used for representing numbers.<ref>{{Cite journal |last=Kurgalin |first=Sergei |last2=Borzunov |first2=Sergei |date=2021 |title=Algebra and Geometry with Python |url=https://doi.org/10.1007/978-3-030-61541-3 |journal=SpringerLink |language=en |doi=10.1007/978-3-030-61541-3}}</ref>

Upon completion of this procedure the matrix will be in row echelon form and the corresponding system may be solved by back substitution.
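
A direct Python transcription of the pseudocode (with 0-based indices and an illustrative function name) is sketched below; it overwrites its argument with a row echelon form.

<syntaxhighlight lang="python">
def row_echelon_in_place(A):
    """Direct transcription of the pseudocode above, with 0-based indices.
    Overwrites A (a list of lists of floats) with its row echelon form."""
    m, n = len(A), len(A[0])
    h = k = 0                                     # pivot row and pivot column
    while h < m and k < n:
        # find the k-th pivot: the remaining row with the largest |A[i][k]|
        i_max = max(range(h, m), key=lambda i: abs(A[i][k]))
        if A[i_max][k] == 0:
            k += 1                                # no pivot in this column
        else:
            A[h], A[i_max] = A[i_max], A[h]       # swap rows h and i_max
            for i in range(h + 1, m):             # all rows below the pivot
                f = A[i][k] / A[h][k]
                A[i][k] = 0.0                     # lower part of the pivot column becomes zero
                for j in range(k + 1, n):         # remaining entries of the row
                    A[i][j] -= A[h][j] * f
            h += 1                                # move to the next pivot row and column
            k += 1
    return A

M = [[2.0, 1.0, -1.0, 8.0], [-3.0, -1.0, 2.0, -11.0], [-2.0, 1.0, 2.0, -3.0]]
print(row_echelon_in_place(M))   # an echelon form of the augmented example matrix
</syntaxhighlight>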


== See also ==
*[[Fangcheng (mathematics)]]
*[[Gram–Schmidt process]] – another process for bringing a matrix into some canonical form.


== References ==
{{Reflist}}


=== Works cited ===
{{refbegin}}
* {{Citation | last1=Atkinson | first1=Kendall A. | title=An Introduction to Numerical Analysis | publisher=[[John Wiley & Sons]] | location=New York | edition=2nd | isbn=978-0471624899| year=1989}}.
* {{Citation | last1=Bolch | first1=Gunter | last2=Greiner | first2=Stefan | last3=de Meer | first3=Hermann | last4=Trivedi | first4=Kishor S. | title=Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications | publisher=[[Wiley-Interscience]] | edition=2nd | isbn=978-0-471-79156-0 | year=2006}}.
* {{Citation | last1=Calinger | first1=Ronald | title=A Contextual History of Mathematics | publisher=[[Prentice Hall]] | isbn=978-0-02-318285-3 | year=1999}}.
* {{Citation | last1=Farebrother | first1=R.W. | title=Linear Least Squares Computations | publisher=Marcel Dekker | series=STATISTICS: Textbooks and Monographs | isbn=978-0-8247-7661-9 | year=1988}}.
* {{Citation | last1=Lauritzen | first1=Niels | title=Undergraduate Convexity: From Fourier and Motzkin to Kuhn and Tucker}}.
* {{Citation | last1=Golub | first1=Gene H. | author1-link=Gene H. Golub | last2=Van Loan | first2=Charles F. | author2-link=Charles F. Van Loan | title=Matrix Computations | publisher=Johns Hopkins | edition=3rd | isbn=978-0-8018-5414-9 | year=1996}}.
* {{Citation | last=Grcar | first=Joseph F. | title=How ordinary elimination became Gaussian elimination | journal=Historia Mathematica | year=2011a | volume=38 | issue=2 | pages=163–218 | doi=10.1016/j.hm.2010.06.003 | arxiv=0907.2397 | s2cid=14259511}}
* {{Citation | last=Grcar | first=Joseph F. | title=Mathematicians of Gaussian elimination | journal=Notices of the American Mathematical Society | year=2011b | volume=58 | issue=6 | pages=782–792 | url=https://www.ams.org/notices/201106/rtx110600782p.pdf}}
* {{Citation | last1=Higham | first1=Nicholas | author1-link=Nicholas Higham | title=Accuracy and Stability of Numerical Algorithms | publisher=[[Society for Industrial and Applied Mathematics|SIAM]] | edition=2nd | isbn=978-0-89871-521-7 | year=2002}}.
* {{Citation | last1=Katz | first1=Victor J. | title=A History of Mathematics, Brief Version | publisher=[[Addison-Wesley]] | isbn=978-0-321-16193-2 | year=2004}}.
* {{Cite web | last1=Kaw | first1=Autar | last2=Kalu | first2=Egwu | year=2010 | title=Numerical Methods with Applications: Chapter 04.06 Gaussian Elimination | edition=1st | publisher=University of South Florida | url=http://mathforcollege.com/nm/mws/gen/04sle/mws_gen_sle_txt_gaussian.pdf |archive-url=https://web.archive.org/web/20120907224427/http://mathforcollege.com/nm/mws/gen/04sle/mws_gen_sle_txt_gaussian.pdf |archive-date=2012-09-07 |url-status=live }}
* {{Citation | last1=Lipson | first1=Marc | last2=Lipschutz | first2=Seymour | title=Schaum's outline of theory and problems of linear algebra | publisher=[[McGraw-Hill]] | location=New York | isbn=978-0-07-136200-9 | year=2001 | pages=69–80}}
*{{Citation|last1=Press|first1=WH | last2=Teukolsky|first2=SA | last3=Vetterling|first3=WT | last4=Flannery|first4=BP | year=2007 | title=Numerical Recipes: The Art of Scientific Computing | edition=3rd | publisher=Cambridge University Press | publication-place=New York| isbn=978-0-521-88068-8 | chapter=Section 2.2 | chapter-url=http://apps.nrbook.com/empanel/index.html?pg=46| access-date=2011-08-08| archive-date=2012-03-19| archive-url=https://web.archive.org/web/20120319193237/http://apps.nrbook.com/empanel/index.html?pg=46|url-status=dead}}
{{refend}}


== External links ==
{{Wikibooks
|1= Linear Algebra
|2= Gauss' Method
|3= Gaussian elimination}}
* [http://pnp.mathematik.uni-stuttgart.de/igt/eiserm/lehre/Gael/ Interactive didactic tool]


{{linear algebra}}
{{Authority control}}


[[Category:Numerical linear algebra]]
[[Category:Articles with example pseudocode]]
[[Category:Exchange algorithms]]
[[Category:Linear algebra]]
[[Category:Numerical analysis]]
