Projection matrix (computer vision)

Model of a pinhole camera. The mapping of the 3D object can be described mathematically with the projection matrix.

When a camera records an object, the object is reproduced in the camera image. This mapping (also called projection) is described mathematically by the projection matrix $\mathbf{P}$. It is a matrix used in computer vision that describes the perspective mapping of a three-dimensional object point to its two-dimensional image position.

Introduction and application

The projection matrix describes the perspective mapping of a three-dimensional object point to its image position by a camera. The following relationship holds between the object point $\mathbf{X} = [X \; Y \; Z \; W]$ and the image point $\mathbf{x} = [x \; y \; w]$:

$\begin{bmatrix} x \\ y \\ w \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} \quad \cong \quad \mathbf{x} = \mathbf{PX}$

The mapping of the object point onto the image plane is described here using the homogeneous coordinates of projective geometry. Compared with Cartesian or affine coordinates, homogeneous coordinates have one additional coordinate and are unique only up to a scaling factor. The two-dimensional Cartesian or affine coordinates $x, \, y$ correspond to the homogeneous coordinates $u, \, v, \, w = wx, \, wy, \, w$. The homogeneous coordinates $u, \, v, \, w$ and $u/w, \, v/w, \, 1 = x, \, y, \, 1$ represent the same point. The same applies to three-dimensional space. The projection matrix thus maps the projective space $\mathbb{P}^{3}$ to $\mathbb{P}^{2}$. The elements of the projection matrix depend on the orientation parameters of the camera: its internal structure ("interior orientation") and its position and viewing direction in space ("exterior orientation").
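The relationship $\mathbf{x} = \mathbf{PX}$ and the scale invariance of homogeneous coordinates can be illustrated with a minimal NumPy sketch. The function name `project` and the numeric entries of the example matrix are assumptions chosen for illustration; they do not describe a real camera.

```python
import numpy as np

def project(P, X_world):
    """Map a Cartesian 3D point to Cartesian image coordinates via x = P X."""
    X_h = np.append(np.asarray(X_world, dtype=float), 1.0)  # homogeneous, W = 1
    x_h = np.asarray(P, dtype=float) @ X_h                   # [x, y, w]
    return x_h[:2] / x_h[2]                                  # divide by w

# Hypothetical projection matrix (illustrative values only).
P = np.array([[800.0, 0.0, 320.0, 0.0],
              [0.0, 800.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

uv = project(P, [0.5, -0.25, 2.0])
# Multiplying P by any non-zero factor yields the same image point.
uv_scaled = project(3.0 * P, [0.5, -0.25, 2.0])
print(uv, uv_scaled)
```

Dividing by the third component is what turns the homogeneous result back into ordinary image coordinates.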

The interior orientation of the camera, described by the calibration matrix $\mathbf{K}$, consists of the following elements:

1. The camera constant $c$ (principal distance), the distance between the image plane and the (image-side) projection center of the camera.
2. The number of pixels per millimeter in the direction of the x-axis ($k_{x}$) and the y-axis ($k_{y}$).
3. The position of the principal point $h_{0} = (x_{0}, y_{0})$, the point where the optical axis intersects the image plane.
4. The shear angle $\Theta$ between the image axes.

These elements are summarized in the calibration matrix $\mathbf{K}$:

$\mathbf{K} = \begin{bmatrix} c k_{x} & -c k_{x} \cot(\Theta) & x_{0} \\ 0 & c k_{y} / \sin(\Theta) & y_{0} \\ 0 & 0 & 1 \end{bmatrix}$
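As a small sketch, the calibration matrix can be assembled directly from the four groups of parameters listed above. The function name `calibration_matrix` and the example values (an 8 mm camera constant and 100 pixels per millimeter) are assumptions for illustration only.

```python
import numpy as np

def calibration_matrix(c, kx, ky, x0, y0, theta=np.pi / 2):
    """Build K from camera constant c, pixel densities kx, ky,
    principal point (x0, y0) and shear angle theta (radians)."""
    cot = np.cos(theta) / np.sin(theta)
    return np.array([[c * kx, -c * kx * cot, x0],
                     [0.0, c * ky / np.sin(theta), y0],
                     [0.0, 0.0, 1.0]])

# Hypothetical camera with perpendicular image axes (theta = 90 degrees).
K = calibration_matrix(c=8.0, kx=100.0, ky=100.0, x0=320.0, y0=240.0)

# The same camera with slightly sheared image axes.
K_sheared = calibration_matrix(8.0, 100.0, 100.0, 320.0, 240.0,
                               theta=np.deg2rad(80.0))
print(np.round(K, 6))
```

For perpendicular image axes the shear term vanishes (up to floating-point rounding) and the matrix reduces to the familiar form with two scale factors and the principal point.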

In the following, the position of the camera in the world coordinate system is denoted by $\mathbf{C}$ and its viewing direction by $\mathbf{R}$. The latter is a 3 × 3 rotation matrix. The projection matrix $\mathbf{P}$ is then:

$\mathbf{P} = \mathbf{KR} [\mathbf{I} \mid -\mathbf{C}]$

($\mathbf{I}$ is the 3 × 3 identity matrix.) Since $[\mathbf{I} \mid -\mathbf{C}]$ is a 3 × 4 matrix, $\mathbf{P}$ is also 3 × 4. $\mathbf{P}$ is thus uniquely determined.
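The composition $\mathbf{P} = \mathbf{KR}[\mathbf{I} \mid -\mathbf{C}]$ can be sketched in a few lines of NumPy. The function name `projection_matrix` and all numeric values are assumptions for illustration.

```python
import numpy as np

def projection_matrix(K, R, C):
    """Compose P = K R [I | -C] from calibration matrix K,
    rotation matrix R and camera position C (3-vector)."""
    C = np.asarray(C, dtype=float).reshape(3, 1)
    return K @ R @ np.hstack([np.eye(3), -C])

# Hypothetical camera: no rotation, positioned at (1, 2, 3).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
C = np.array([1.0, 2.0, 3.0])
P = projection_matrix(K, R, C)

# A point 5 units in front of the camera on the optical axis
# should map to the principal point (320, 240).
x_h = P @ np.append(C + np.array([0.0, 0.0, 5.0]), 1.0)
print(x_h[:2] / x_h[2])
```

The camera position itself is mapped to the zero vector, which is the property used later to recover $\mathbf{C}$ from a given $\mathbf{P}$.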

The advantage of the projection matrix over other representations such as the collinearity equation is its compactness: the whole mapping is contained in a single matrix. The individual orientation parameters need not be stated explicitly, and no ambiguity about the order of the transformation steps arises. It is used wherever images are formed by a camera, for example in photogrammetry (determination of 3D coordinates and calibration), in computer vision and in projective geometry. Usually, the recorded image points are used to calculate the coordinates of the observed object points.

Geometric interpretation of the projection matrix

The elements of $\mathbf{P}$ can be interpreted geometrically. The rows $p^{i}$ of the matrix $\mathbf{P}$ are 4-vectors and can be viewed as planes in the projective space $\mathbb{P}^{3}$. These three planes intersect in the projection center $\mathbf{C}$. The columns $p_{i}$ are 3-vectors. The first three columns $p_{1}, p_{2}, p_{3}$ are the images of the directions of the world coordinate axes and correspond to the vanishing points of the X, Y and Z axes. The last column $p_{4}$ is the image of the origin of the world coordinate system.

Since the projection matrix is known only up to a scaling factor λ because of the homogeneous representation, it should be normalized. To do this, the magnitude and the sign of the normalization factor must be determined. For the magnitude, the left 3 × 3 submatrix $\mathbf{M}$ of $\mathbf{P} = [\mathbf{M} \mid \mathbf{t}]$ is considered. If $\mathbf{m}^{3}$ is the third row of $\mathbf{M}$, the entire projection matrix must be divided by the norm of this vector. The correct sign follows from the condition $\det(\mathbf{M}) > 0$: if the determinant is less than 0, the sign of all elements of $\mathbf{P}$ must be inverted.
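The two normalization steps (magnitude from the third row of $\mathbf{M}$, sign from $\det(\mathbf{M})$) might be sketched as follows. The function name `normalize_projection` and the example matrix are assumptions for illustration.

```python
import numpy as np

def normalize_projection(P):
    """Fix the free scale of P: unit norm for the third row of M,
    and a positive determinant of M."""
    P = np.asarray(P, dtype=float)
    P = P / np.linalg.norm(P[2, :3])   # magnitude: |m^3| = 1
    if np.linalg.det(P[:, :3]) < 0:    # sign: det(M) > 0
        P = -P
    return P

# Hypothetical matrix whose third row of M already has unit norm.
P0 = np.array([[800.0, 0.0, 320.0, -1760.0],
               [0.0, 800.0, 240.0, -2320.0],
               [0.0, 0.0, 1.0, -3.0]])

# An arbitrarily rescaled and sign-flipped copy describes the same camera.
P_normalized = normalize_projection(-4.2 * P0)
print(np.round(P_normalized, 6))
```

After normalization, the rescaled copy coincides again with the original matrix.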

Decomposition of the projection matrix

The individual orientation parameters of the camera can be calculated from $\mathbf{P}$. For the projection center $\mathbf{C}$ (in homogeneous coordinates), the relationship $\mathbf{PC} = 0$ holds. This property can be treated as a homogeneous linear system of equations and solved by means of singular value decomposition. Note that for this the rectangular matrix $\mathbf{P}$ must be padded with a row of zeros.
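A minimal sketch of recovering the projection center as the null vector of $\mathbf{P}$ follows. It assumes NumPy, whose `numpy.linalg.svd` returns the full 4 × 4 right factor for a 3 × 4 input by default, so the padding with a row of zeros is not needed in this particular implementation. The function name `camera_center` and the example values are illustrative assumptions.

```python
import numpy as np

def camera_center(P):
    """Return the Cartesian projection center C with P C = 0."""
    _, _, Vt = np.linalg.svd(np.asarray(P, dtype=float))  # Vt is 4 x 4
    C_h = Vt[-1]               # right singular vector of the null space
    return C_h[:3] / C_h[3]    # dehomogenize (assumes a finite camera)

# Hypothetical camera located at (1, 2, 3) with no rotation.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
C_true = np.array([1.0, 2.0, 3.0])
P = K @ np.hstack([np.eye(3), -C_true.reshape(3, 1)])

C_est = camera_center(P)
print(C_est)
```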

The rotation matrix $\mathbf{R}$ and the calibration matrix $\mathbf{K}$ are obtained by a QR decomposition of the left 3 × 3 submatrix $\mathbf{M}$ of $\mathbf{P}$ (strictly speaking an RQ decomposition, since the upper triangular factor comes first):

$\mathbf{M} = \mathbf{KR} = \begin{bmatrix} k_{11} & k_{12} & k_{13} \\ 0 & k_{22} & k_{23} \\ 0 & 0 & k_{33} \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$

Here $\mathbf{K}$ is the calibration matrix and $\mathbf{R}$ contains the elements of the rotation matrix. All parameters of the interior and exterior orientation are thus determined.
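The factorization of $\mathbf{M}$ into an upper triangular and an orthogonal factor can be sketched with NumPy's ordinary QR routine by reversing rows before and after the call. This is one possible construction, not necessarily the one used in any particular library. The function name `decompose_M` and the test values are assumptions, and the sign convention (positive diagonal of $\mathbf{K}$) is a choice made here to obtain a unique result.

```python
import numpy as np

def decompose_M(M):
    """Split M = K R with K upper triangular (positive diagonal,
    K[2, 2] = 1) and R orthogonal, using numpy.linalg.qr."""
    J = np.flipud(np.eye(3))           # exchange matrix, J @ J = I
    Q, U = np.linalg.qr((J @ M).T)     # (J M)^T = Q U, U upper triangular
    K = J @ U.T @ J                    # reversing rows and columns of the
                                       # lower triangular U^T gives upper
    R = J @ Q.T                        # orthogonal factor
    D = np.diag(np.sign(np.diag(K)))   # D @ D = I, so K R is unchanged
    K, R = K @ D, D @ R
    return K / K[2, 2], R              # the scale of K is arbitrary

# Hypothetical calibration and rotation used to build a test matrix.
K0 = np.array([[800.0, 2.0, 320.0],
               [0.0, 820.0, 240.0],
               [0.0, 0.0, 1.0]])
a, b = np.deg2rad(30.0), np.deg2rad(20.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a), np.cos(a), 0.0],
               [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0],
               [0.0, np.cos(b), -np.sin(b)],
               [0.0, np.sin(b), np.cos(b)]])
R0 = Rz @ Rx

K_est, R_est = decompose_M(K0 @ R0)
print(np.round(K_est, 6))
```

If $\mathbf{P}$ has been normalized so that $\det(\mathbf{M}) > 0$, the orthogonal factor obtained this way has determinant +1 and is therefore a proper rotation.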

Calculation of the projection matrix from point correspondences

As shown in the introduction, the projection matrix $\mathbf{P}$ can be calculated directly from the camera's orientation parameters. Since the projection matrix is usually calculated before the camera parameters are determined, this case rarely occurs. The following explains how it can be calculated using only known object points and their images.

Given a set of point correspondences $X_{i} \leftrightarrow x_{i}$, $\mathbf{P}$ can be calculated from these point pairs. The aim is to determine a matrix $\mathbf{P}$ such that $\mathbf{x}_{i} = \mathbf{P} \mathbf{X}_{i}$. For this purpose, the formula is converted by means of the cross product to $\mathbf{x}_{i} \times \mathbf{PX}_{i} = \mathbf{0}$. With $\mathbf{x}_{i} = [x_{i} \quad y_{i} \quad w_{i}]$, rearranging the equation yields the following relationship:

$\begin{bmatrix} \mathbf{0}^{T} & -w_{i} \mathbf{X}_{i}^{T} & y_{i} \mathbf{X}_{i}^{T} \\ w_{i} \mathbf{X}_{i}^{T} & \mathbf{0}^{T} & -x_{i} \mathbf{X}_{i}^{T} \\ -y_{i} \mathbf{X}_{i}^{T} & x_{i} \mathbf{X}_{i}^{T} & \mathbf{0}^{T} \end{bmatrix} \begin{pmatrix} \mathbf{P}^{1T} \\ \mathbf{P}^{2T} \\ \mathbf{P}^{3T} \end{pmatrix} = \mathbf{0}$

where $\mathbf{P}^{i}$ is the i-th row of $\mathbf{P}$.

Since these three equations are linearly dependent, only the first two are used. Each point correspondence thus yields two equations. From n point correspondences, a 2n × 12 matrix $\mathbf{A}$ is obtained. The projection matrix is calculated from $\mathbf{Ap} = 0$, where $\mathbf{p}$ is the vector containing the elements of $\mathbf{P}$.
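The construction of $\mathbf{A}$ from the first two rows of the cross-product system can be sketched as follows. The helper names `dlt_rows` and `build_A` and the synthetic data are assumptions for illustration. The vector $\mathbf{p}$ is taken to be the rows of $\mathbf{P}$ stacked one after another.

```python
import numpy as np

def dlt_rows(X_h, x_h):
    """Two equations contributed by one correspondence X_h <-> x_h
    (homogeneous 4-vector and homogeneous 3-vector)."""
    X_h = np.asarray(X_h, dtype=float)
    x, y, w = x_h
    zero = np.zeros(4)
    return np.array([np.concatenate([zero, -w * X_h, y * X_h]),
                     np.concatenate([w * X_h, zero, -x * X_h])])

def build_A(Xs_h, xs_h):
    """Stack the equations of n correspondences into a 2n x 12 matrix."""
    return np.vstack([dlt_rows(X, x) for X, x in zip(Xs_h, xs_h)])

# Synthetic, noise-free correspondences from a hypothetical camera.
P_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, -20.0],
                   [0.0, 0.0, 1.0, 5.0]])
Xs_h = np.array([[0.0, 0.0, 0.0, 1.0],
                 [1.0, 0.0, 2.0, 1.0],
                 [0.0, 1.0, 1.0, 1.0],
                 [1.0, 1.0, 3.0, 1.0]])
xs_h = (P_true @ Xs_h.T).T

A = build_A(Xs_h, xs_h)
p_true = P_true.reshape(-1)   # rows of P stacked into a 12-vector
print(A.shape, np.abs(A @ p_true).max())
```

For exact data, the true parameter vector lies in the null space of $\mathbf{A}$, so the printed residual is zero up to rounding.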

Minimal solution

Since the matrix $\mathbf{P}$ has twelve elements and, ignoring the overall scale, eleven degrees of freedom, eleven equations are sufficient to solve the system of equations. Since every point correspondence yields two equations, five point correspondences plus the x or y coordinate of a sixth are sufficient. $\mathbf{A}$ is then an 11 × 12 matrix of rank 11 whose right null space contains the solution for $\mathbf{P}$.
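The minimal case can be tried out on synthetic, noise-free data: five full correspondences plus the x-equation of a sixth give an 11 × 12 matrix whose null vector is read off from the SVD. The helper names, the random seed and the camera values below are assumptions for illustration. The sketch presumes points in general position and is not meant as a robust estimator.

```python
import numpy as np

def dlt_rows(X_h, x_h):
    """First two rows of the cross-product system for one correspondence."""
    X_h = np.asarray(X_h, dtype=float)
    x, y, w = x_h
    zero = np.zeros(4)
    return np.array([np.concatenate([zero, -w * X_h, y * X_h]),
                     np.concatenate([w * X_h, zero, -x * X_h])])

def minimal_solution(Xs_h, xs_h):
    """P (up to scale) from five correspondences and one equation of a sixth."""
    rows = [dlt_rows(X, x) for X, x in zip(Xs_h[:5], xs_h[:5])]
    rows.append(dlt_rows(Xs_h[5], xs_h[5])[1:2])  # only the x-equation
    A = np.vstack(rows)                            # 11 x 12
    _, _, Vt = np.linalg.svd(A)                    # Vt is 12 x 12
    return Vt[-1].reshape(3, 4), A                 # null vector as 3 x 4

# Synthetic data: six random points in front of a hypothetical camera.
rng = np.random.default_rng(0)
P_true = np.hstack([np.eye(3), np.array([[0.1], [-0.2], [4.0]])])
Xs_h = np.hstack([rng.uniform(-1.0, 1.0, size=(6, 3)), np.ones((6, 1))])
xs_h = (P_true @ Xs_h.T).T

P_est, A = minimal_solution(Xs_h, xs_h)
print(A.shape)
```

The estimate agrees with the true matrix only up to a scale factor and sign, which is why a normalization step is needed before comparing the two.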

Overdetermined solution

Since the point correspondences usually contain errors, there is no exact solution for $\mathbf{Ap} = 0$. A solution must therefore be determined by minimizing an algebraic or geometric error measure.

Algebraic measure of error

With an algebraic error measure, the approach is to minimize $\| \mathbf{Ap} \|$ subject to a constraint. Possible constraints are:

1. $\| \mathbf{p} \| = 1$
2. $\| \dot{\mathbf{p}}^{3} \| = 1$, where $\dot{\mathbf{p}}^{3}$ contains the first three elements of the last row of $\mathbf{P}$.

In both cases, the residual $\| \mathbf{Ap} \|$ is called the algebraic error. This method was presented by Ivan Sutherland in 1963 as part of his dissertation on Sketchpad.
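For the first constraint, $\| \mathbf{p} \| = 1$, the minimizer of $\| \mathbf{Ap} \|$ is the right singular vector of $\mathbf{A}$ belonging to the smallest singular value. The sketch below illustrates this on a random stand-in matrix. The function name `minimize_algebraic_error` is an assumption, and $\mathbf{A}$ is assumed to have at least twelve rows.

```python
import numpy as np

def minimize_algebraic_error(A):
    """Minimize ||A p|| subject to ||p|| = 1 using the SVD of A.
    Returns the minimizer p and the attained residual norm."""
    _, s, Vt = np.linalg.svd(A)
    return Vt[-1], s[-1]      # last right singular vector, smallest sigma

# Random stand-in for a noisy 2n x 12 system (n = 8 correspondences).
rng = np.random.default_rng(1)
A = rng.normal(size=(16, 12))

p, residual = minimize_algebraic_error(A)
P = p.reshape(3, 4)           # rows of P were stacked into p
print(residual, np.linalg.norm(A @ p))
```

The two printed numbers agree: the smallest singular value is exactly the algebraic error attained by the solution.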

Geometric measure of error

Control point field with marks

If very precisely measured world coordinates $\mathbf{X}_{i}$ are available, as when using a surveyed control point field, the geometric error $d$ can be defined in the image:

$d = \sum_{i} d(\mathbf{x}_{i}, \hat{\mathbf{x}}_{i})^{2}$

Here $\mathbf{x}_{i}$ are the measured image points and $\hat{\mathbf{x}}_{i}$ is the point $\mathbf{PX}_{i}$. If the errors are normally distributed, then the solution of

$\min_{p} \sum_{i} d(\mathbf{x}_{i}, \hat{\mathbf{x}}_{i})^{2}$

is the maximum likelihood estimate of $\mathbf{P}$. Iterative techniques such as the Levenberg-Marquardt algorithm are used for the solution.
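The geometric error is the sum of squared image distances between measured and projected points. A minimal sketch of this cost function follows. The function names and the example data are assumptions, and the iterative minimization itself is not shown.

```python
import numpy as np

def project_points(P, Xs):
    """Project Cartesian 3D points (n x 3) to Cartesian image points (n x 2)."""
    Xs = np.asarray(Xs, dtype=float)
    Xs_h = np.hstack([Xs, np.ones((len(Xs), 1))])
    xs_h = (np.asarray(P, dtype=float) @ Xs_h.T).T
    return xs_h[:, :2] / xs_h[:, 2:3]

def geometric_error(P, Xs, xs_measured):
    """Sum of squared distances between measured and projected points."""
    residuals = project_points(P, Xs) - np.asarray(xs_measured, dtype=float)
    return float(np.sum(residuals ** 2))

# Hypothetical camera and object points.
P = np.array([[800.0, 0.0, 320.0, 0.0],
              [0.0, 800.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
Xs = np.array([[0.0, 0.0, 2.0],
               [0.5, -0.25, 2.0],
               [-1.0, 1.0, 4.0],
               [0.2, 0.3, 5.0]])

xs_exact = project_points(P, Xs)
xs_shifted = xs_exact + np.array([3.0, 4.0])   # each point off by 5 pixels
print(geometric_error(P, Xs, xs_exact), geometric_error(P, Xs, xs_shifted))
```

In practice, the residual vector rather than its sum of squares would typically be handed to an iterative least-squares routine. SciPy's `scipy.optimize.least_squares` with `method="lm"` is one option, mentioned here only as a pointer.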

Procedure in practice

The prerequisite for calculating $\mathbf{P}$ is that at least six point correspondences are available. The aim is then to determine the maximum likelihood estimate of $\mathbf{P}$. Since the maximum likelihood method requires good starting values for the minimization, an initial solution is first determined using the algebraic error measure. In addition, the input data are normalized. All image points are shifted so that their centroid lies at the origin of the coordinate system, and are then scaled so that their average distance from the origin is $\sqrt{2}$. The object points are likewise shifted to the origin and scaled so that their average distance from the origin is $\sqrt{3}$. This approach leads to numerically more stable results. The respective transformations $\mathbf{T}$ of the image points and $\mathbf{U}$ of the object points must be reversed after $\mathbf{P}$ has been calculated.
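The normalization and its reversal might be sketched as follows. If the normalized points are $\tilde{\mathbf{x}} = \mathbf{T}\mathbf{x}$ and $\tilde{\mathbf{X}} = \mathbf{U}\mathbf{X}$ and the matrix estimated from them is $\tilde{\mathbf{P}}$, then $\mathbf{P} = \mathbf{T}^{-1} \tilde{\mathbf{P}} \mathbf{U}$. The function names and the sample points are assumptions for illustration.

```python
import numpy as np

def normalizing_transform(points, target_distance):
    """Similarity transform (homogeneous matrix) that moves the centroid
    of the points to the origin and scales their mean distance from the
    origin to target_distance."""
    pts = np.asarray(points, dtype=float)
    dim = pts.shape[1]
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = target_distance / mean_dist
    T = np.eye(dim + 1)
    T[:dim, :dim] *= s
    T[:dim, dim] = -s * centroid
    return T

def apply_transform(T, points):
    """Apply a homogeneous transform to Cartesian points."""
    pts = np.asarray(points, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = (T @ pts_h.T).T
    return out[:, :-1] / out[:, -1:]

def denormalize(P_tilde, T, U):
    """Undo the normalization: P = T^-1 P_tilde U."""
    return np.linalg.inv(T) @ P_tilde @ U

# Hypothetical image points (pixels) and object points.
image_pts = np.array([[100.0, 200.0], [400.0, 50.0], [620.0, 470.0],
                      [30.0, 300.0], [250.0, 250.0], [500.0, 120.0]])
object_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 2.0], [0.0, 1.0, 1.0],
                       [1.0, 1.0, 3.0], [2.0, 0.5, 1.5], [0.5, 2.0, 2.5]])

T = normalizing_transform(image_pts, np.sqrt(2.0))
U = normalizing_transform(object_pts, np.sqrt(3.0))
image_norm = apply_transform(T, image_pts)
object_norm = apply_transform(U, object_pts)
print(image_norm.mean(axis=0), np.linalg.norm(image_norm, axis=1).mean())
```

An estimate computed from `image_norm` and `object_norm` would then be passed through `denormalize` to obtain the projection matrix for the original coordinates.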

Distortion correction

Before the projection matrix can be determined, any distortion in the image must be corrected to the extent that the accuracy requirements demand. The distortion parameters must have been determined beforehand by a camera calibration, so that a suitable distortion correction can be applied. The image can then be regarded as distortion-free, i.e. the image points lie on straight imaging rays, in accordance with the pinhole camera model.

The determination of the projection matrix is itself often part of a camera calibration. A multi-stage approach is then necessary. In a first step, as many parameters as possible are determined by linear least-squares adjustment. An iterative optimization follows that takes into account all model parameters, including the necessary distortion parameters.

References

1. Ivan Edward Sutherland: Sketchpad: A man-machine graphical communications system. Technical Report 296, MIT Lincoln Laboratories, 1963 (annotated version, 2003).
2. Chester C. Slama (Ed.): Manual of Photogrammetry. 4th edition. American Society of Photogrammetry, Falls Church VA 1980, ISBN 0-937294-01-2.
3. Berthold K. P. Horn: Tsai's camera calibration method revisited. 2000, accessed on July 25, 2020.