# Data matrix

In statistics, a data matrix, also called a design matrix, experimental design matrix, model matrix, observation matrix, or regressor matrix, is a matrix that contains the data on several features (variables) of several persons or objects (statistical units). It is the basis of the classical model of multiple linear regression.

The term experimental design matrix or design matrix (denoted by $\mathbf{X}$) comes from the field of statistical design of experiments, which deals with the statistically optimal design of experiments (see optimal design). When the values $x_{ij}$ are planned (set by the researcher), the matrix $\mathbf{X}$ essentially contains the experimental design and is therefore sometimes referred to as the design matrix.

## Definition

Assume that there are $n$ units of investigation on which $p = k + 1$ variables were observed, and let $x_{ij}$ denote the value of the $j$-th variable observed on the $i$-th unit of investigation. The data matrix is then defined as the $n \times p$ matrix

$$\mathbf{X} = (x_{ij})_{n \times p} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}.$$
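As a minimal sketch of this definition, the following NumPy snippet builds an $n \times p$ data matrix by prepending a column of ones to $k$ observed variables. The function name `build_data_matrix` and the numbers in `raw` are hypothetical and chosen only for illustration.

```python
import numpy as np


def build_data_matrix(raw):
    """Prepend a column of ones to an (n, k) array of observed values.

    Hypothetical helper for illustration; returns an (n, k + 1) array.
    """
    raw = np.asarray(raw, dtype=float)
    n = raw.shape[0]
    return np.column_stack([np.ones(n), raw])


# Made-up observations: n = 4 units of investigation, k = 2 variables.
raw = np.array([
    [1.0, 5.0],
    [2.0, 3.0],
    [3.0, 8.0],
    [4.0, 1.0],
])

X = build_data_matrix(raw)
n, p = X.shape  # p = k + 1 because of the leading column of ones
print(n, p)
```

The leading column of ones corresponds to the entries $x_{i0} = 1$ in the first column of the matrix above.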

The $i$-th row of the data matrix $\mathbf{X}$ is the row vector formed from the variable values observed on the $i$-th object, $\mathbf{x}_{i\mathbf{.}}^{\top} = (x_{i0}, x_{i1}, \dotsc, x_{ik})$, $i = 1, \dotsc, n$, where $x_{i0} = 1$ is the entry from the leading column of ones. The $i$-th object can be represented geometrically as a point by interpreting these $p$ elements as the coordinates of a point in a $p$-dimensional feature space that is spanned by $p$ feature axes arranged at right angles. If all row vectors of $\mathbf{X}$ are represented as points in this way, the result is a distribution of points in the feature space that represents the objects (units of investigation).

Likewise, the data matrix can be interpreted as a collection of column vectors $\mathbf{x}_{\mathbf{.}j} = (x_{1j}, x_{2j}, \dotsc, x_{nj})^{\top}$, $j = 0, \dotsc, k$. Each column vector is assigned to a variable $X_j$ and contains the values of this variable observed on the $n$ units of investigation. With these values, the variables can be represented as points in a right-angled coordinate system in which the axes represent the units of investigation. The relationships between the variables can be illustrated in the object space spanned by the $n$ axes.
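The two views described above, rows as objects and columns as variables, can be sketched with NumPy slicing. The matrix values are made up for illustration. Note that NumPy indices start at 0, so the unit $i = 2$ is row index 1, while the column index matches $j$ directly because $j$ also starts at 0.

```python
import numpy as np

# Hypothetical data matrix with n = 4 units and p = 3 columns
# (leading column of ones plus k = 2 observed variables).
X = np.array([
    [1.0, 1.0, 5.0],
    [1.0, 2.0, 3.0],
    [1.0, 3.0, 8.0],
    [1.0, 4.0, 1.0],
])
n, p = X.shape

# Row view: the unit i = 2 as a point in the p-dimensional feature space.
row_2 = X[1, :]

# Column view: the variable j = 1 as a point in the n-dimensional object space.
col_1 = X[:, 1]

print(row_2.shape, col_1.shape)
```

Each row has $p$ coordinates and each column has $n$ coordinates, which is why the feature space is $p$-dimensional and the object space is $n$-dimensional.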

## Alternative representations

The data matrix $\mathbf{X}$ can be expressed as a matrix partitioned into its $p = k + 1$ columns as

$$\mathbf{X} = (\mathbf{1}, \mathbf{x}_{(1)}, \mathbf{x}_{(2)}, \dotsc, \mathbf{x}_{(k)}).$$

The columns of the data matrix $\mathbf{X}$, including the vector of ones $\mathbf{1}$, are all $n$-dimensional vectors and therefore points in the data space. Since $\mathbf{X}$ is usually assumed to be of rank $k + 1$, the column vectors are linearly independent. The set of all possible linear combinations of the columns of $\mathbf{X}$ forms a subspace of the data space, namely the column space of $\mathbf{X}$.
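The rank assumption and the idea of linear combinations of columns can be checked numerically. The following is a sketch under assumptions: the matrix values and the coefficient vector `b` are arbitrary, and `np.linalg.matrix_rank` determines the rank numerically up to a tolerance.

```python
import numpy as np

# Hypothetical data matrix: column of ones plus k = 2 observed variables.
X = np.array([
    [1.0, 1.0, 5.0],
    [1.0, 2.0, 3.0],
    [1.0, 3.0, 8.0],
    [1.0, 4.0, 1.0],
])
n, p = X.shape

# Full column rank (rank k + 1 = p) means the columns are linearly independent.
rank = np.linalg.matrix_rank(X)
has_full_column_rank = rank == p

# An arbitrary linear combination of the columns: v = b0*1 + b1*x_(1) + b2*x_(2).
b = np.array([0.5, 2.0, -1.0])
v = X @ b

# v lies in the span of the columns, so appending it does not raise the rank.
rank_with_v = np.linalg.matrix_rank(np.column_stack([X, v]))

print(has_full_column_rank, rank_with_v == rank)
```

Any vector of the form `X @ b` is an $n$-dimensional point in the data space that lies in the span of the columns of $\mathbf{X}$.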
