Data matrix

from Wikipedia, the free encyclopedia

In the statistics which is the data matrix , also experimental design matrix , the design matrix (from English design research : German experimental design ) model matrix , observation matrix or regressor called a matrix , the data on a plurality of features of several persons or objects ( statistical units ) contains. It is the basis of the classical model of linear multiple regression .

The term test plan or design matrix (denoted by ) comes from the sub-area of statistical test planning, which deals with the statistically optimal design of experiments (see optimal test planning ). When the values ​​of the are planned (determined by the researcher), the matrix essentially contains the design and is therefore sometimes referred to as the design matrix.

definition

Assuming that there are units of investigation on which variables were observed, then the value observed on the -th unit of investigation is the -th variable . The data matrix is defined as the matrix

.

The th row of the data matrix is the - with the most educated - observed th object variable values row vector , . The -th object can be represented geometrically as a point by interpreting the elements as the coordinates of a point in a -dimensional feature space that is spanned by feature axes arranged at right angles. If all line vectors of are represented as points in this way , a distribution of points in the feature space that represents the objects (units of investigation) results.

Likewise, you can see the data matrix as a summary of column vectors , interpret. Each column vector is assigned to a variable and contains the values ​​of this variable observed on the units of investigation. With these values, the variables can be represented as points in a right-angled coordinate system in which the axes represent the units of investigation. The relationships between the variables can be illustrated in the object space spanned by the axes .

Alternative representations

The data matrix can be expressed as a partitioned matrix with respect to its columns as

.

The columns of the data matrix including the one- vector are all -dimensional vectors and therefore points in the data space. Since it is usually assumed to be of rank , the vectors are linearly independent . The set of all possible linear combinations of the columns of form a subset of the data space.

Individual evidence

  1. a b design matrix. Glossary of statistical terms. In: International Statistical Institute . June 1, 2011, accessed May 19, 2020 .
  2. ^ Rencher, Alvin C., and G. Bruce Schaalje: Linear models in statistics. , John Wiley & Sons, 2008., p. 139
  3. Werner Timischl : Applied Statistics. An introduction for biologists and medical professionals. 3. Edition. 2013, p. 420.
  4. Werner Timischl: Applied Statistics. An introduction for biologists and medical professionals. 3. Edition. 2013, p. 420.
  5. ^ Rencher, Alvin C., and G. Bruce Schaalje: Linear models in statistics. , John Wiley & Sons, 2008., p. 153.