# Linear regression

Linear regression (LR) is a special case of regression analysis, i.e. a statistical method that attempts to explain an observed dependent variable by one or more independent variables. In linear regression, a linear model (LM) is assumed: only those relationships are considered in which the dependent variable is a linear combination of the regression coefficients (but not necessarily of the independent variables). The term regression, or regression towards the mean, was coined mainly by the statistician Francis Galton.

## Simple linear regression

The simple linear regression model assumes only two metric variables: an influencing variable $X$ and a target variable $Y$. Simple linear regression fits a straight line through a point cloud with the help of two parameters in such a way that the linear relationship between $X$ and $Y$ is described as well as possible. The equation of simple linear regression is given by

$Y_{i} = \beta_{0} + \beta_{1} x_{i} + \varepsilon_{i}, \quad i = 1, \dotsc, n$.
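As a minimal numerical sketch (hypothetical data; NumPy assumed), the two parameters can be estimated with the standard least-squares formulas:

```python
import numpy as np

# Hypothetical sample data: n observations of an influencing variable x
# and a target variable y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Least-squares estimates of the two parameters:
# beta_1 = Cov(x, y) / Var(x),  beta_0 = mean(y) - beta_1 * mean(x)
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()

print(beta_0, beta_1)  # intercept and slope of the fitted line
```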

## Multiple linear regression

The multiple linear regression model (MLR) generalizes simple linear regression by assuming $K$ regressors that are supposed to explain the dependent variable. In addition to the variation across the observations, variation across the regressors is assumed, which yields a linear system of equations that can be summarized in matrix notation as follows:

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \sim (\mathbf{0}, \sigma^{2}\mathbf{I}_{T})$.
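A sketch of the corresponding OLS estimate on simulated data (hypothetical values; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: n = 100 observations, K = 2 regressors plus intercept.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# OLS estimate of beta; lstsq solves the least-squares problem directly,
# which is numerically more stable than inverting X'X explicitly.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to beta_true
```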

### Generalized linear regression

The generalized linear regression model extends the multiple linear regression model by also allowing heteroscedasticity and autocorrelation. The variance-covariance matrix of the error terms is then no longer $\sigma^{2}\mathbf{I}_{T}$ but a non-constant matrix $\boldsymbol{\Phi} = \sigma^{2}\boldsymbol{\Psi}$. In matrix notation, the model is:

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \sim (\mathbf{0}, \sigma^{2}\boldsymbol{\Psi})$.
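When $\boldsymbol{\Psi}$ is known, the generalized least squares estimator is $\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\boldsymbol{\Psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\top}\boldsymbol{\Psi}^{-1}\mathbf{y}$. A sketch for the heteroscedastic case with diagonal $\boldsymbol{\Psi}$, where GLS reduces to weighted least squares (hypothetical data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.5])

# Heteroscedastic errors: Var(eps_i) = sigma^2 * psi_i with known psi_i.
psi = np.linspace(0.5, 4.0, n)            # diagonal of Psi (assumed known)
eps = rng.normal(size=n) * np.sqrt(psi)
y = X @ beta_true + eps

# GLS estimator: beta_hat = (X' Psi^{-1} X)^{-1} X' Psi^{-1} y.
# With diagonal Psi this is weighted least squares with weights 1/psi_i.
W = 1.0 / psi
XtWX = X.T @ (W[:, None] * X)
XtWy = X.T @ (W * y)
beta_gls = np.linalg.solve(XtWX, XtWy)
print(beta_gls)
```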

### Classical normal regression

If, in addition to the assumptions of the classical multiple linear model, the error terms are assumed to be normally distributed, one speaks of a classical linear model of normal regression. The normality assumption is required for statistical inference, that is, for calculating confidence intervals and performing significance tests. The model is:

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^{2}\mathbf{I}_{T})$.
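Under this normality assumption, coefficient confidence intervals follow the $t$ distribution with $n - K$ degrees of freedom. A sketch (hypothetical data; NumPy and SciPy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, K = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta_true = np.array([1.0, 0.8])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - K)              # unbiased variance estimate
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)    # Cov(beta_hat) under normality
se = np.sqrt(np.diag(cov_beta))

# 95% confidence interval for the slope, using the t distribution (n - K df).
t_crit = stats.t.ppf(0.975, df=n - K)
ci = (beta_hat[1] - t_crit * se[1], beta_hat[1] + t_crit * se[1])
print(ci)
```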

### Panel data regression

The general linear panel data model allows the intercept and the slope parameters to vary both across individuals $i$ (in the cross-sectional dimension) and over time $t$ (i.e. they are not time-invariant). The general linear panel data model is:

$y_{it} = \alpha_{it} + \mathbf{x}_{it}^{\top}\boldsymbol{\beta}_{it} + \varepsilon_{it}, \quad i = 1, \dotsc, N; \quad t = 1, \dotsc, T$

with the variance-covariance matrix:

$\operatorname{Cov}(\boldsymbol{\varepsilon}) = \operatorname{E}(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^{\top}) = \mathbf{\Sigma} \otimes \mathbf{I}_{T} = \mathbf{\Phi}$

Here $y_{it}$ is a scalar dependent variable, $\mathbf{x}_{it}$ is a $(K \times 1)$ vector of independent variables, and $\varepsilon_{it}$ is a scalar error term. Since this model is too general to be estimated (there are more parameters than observations), limiting assumptions must be made regarding the variation of $\alpha_{it}$ and $\beta_{it}$ with $i$ and $t$ and regarding the behavior of the error term. These additional restrictions and the models based on them are topics of linear panel data models and panel data analysis.
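As an illustration, one common set of restrictions ($\alpha_{it} = \alpha_i$, i.e. individual-specific intercepts, and $\beta_{it} = \beta$ constant) yields the fixed-effects model, which can be estimated with the within (demeaning) transformation. A sketch with simulated data (hypothetical values; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 30, 10  # individuals, time periods

# Restriction: alpha_it = alpha_i (individual fixed effects), beta_it = beta.
alpha = rng.normal(size=N)                     # individual effects
x = rng.normal(size=(N, T))
beta_true = 2.0
y = alpha[:, None] + beta_true * x + 0.1 * rng.normal(size=(N, T))

# Within (fixed-effects) estimator: demeaning per individual removes alpha_i,
# after which beta can be estimated by OLS on the demeaned data.
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
beta_fe = np.sum(x_dm * y_dm) / np.sum(x_dm ** 2)
print(beta_fe)
```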

## Generalized linear models

Linear models can be extended so that no fixed data matrix is examined, but the data matrix is itself random. This model is called a generalized linear model (GLM). In this case, the examination methods do not change substantially, but become significantly more complicated and therefore more computationally expensive.

## General linear models

The general linear model considers the situation in which the dependent variable $Y$ is not a scalar but a vector. In this case, conditional linearity $\operatorname{E}(\mathbf{Y} \mid \mathbf{X}) = \mathbf{X}\mathbf{B}$ is assumed as in the classical linear model, but with a matrix $\mathbf{B}$ replacing the vector $\boldsymbol{\beta}$ of the classical linear model. Multivariate counterparts to the ordinary least squares method and the generalized least squares method have been developed. General linear models are also called "multivariate linear models"; however, these are not to be confused with multiple linear models. The general linear model is given by

$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U}$.
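The multivariate OLS estimate $\hat{\mathbf{B}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y}$ amounts to column-wise ordinary least squares; a least-squares routine handles the matrix right-hand side directly. A sketch (hypothetical data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K, m = 100, 3, 2  # observations, regressors, dimension of the response

X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
B_true = rng.normal(size=(K, m))       # coefficient *matrix* B
U = 0.1 * rng.normal(size=(n, m))      # error matrix
Y = X @ B_true + U

# Multivariate OLS: lstsq solves each response column simultaneously,
# returning the (K x m) coefficient matrix B_hat.
B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(B_hat.shape)  # (3, 2)
```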

## Orthogonal regression

Orthogonal regression (more precisely: orthogonal linear regression) computes a best-fit line for a finite set of metrically scaled data pairs $(x_{i}, y_{i})$ by the least squares method, minimizing the sum of squared perpendicular (orthogonal) distances of the points to the line.
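One standard way to compute the orthogonal best-fit line is via the singular value decomposition of the centered data: the leading right-singular vector gives the line's direction. A sketch (hypothetical data; NumPy assumed):

```python
import numpy as np

# Hypothetical data pairs (x_i, y_i) around the line y = 2x + 1.
rng = np.random.default_rng(5)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.2, size=50)

# Center the data; the first right-singular vector of the centered point
# cloud minimizes the sum of squared perpendicular distances to the line.
pts = np.column_stack([x - x.mean(), y - y.mean()])
_, _, Vt = np.linalg.svd(pts, full_matrices=False)
direction = Vt[0]                      # unit vector along the best-fit line
slope = direction[1] / direction[0]
intercept = y.mean() - slope * x.mean()
print(slope, intercept)
```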

## Regression regularization

In order to ensure a desired behavior of the regression and thus avoid overfitting to the training data set, the regression objective can be augmented with penalty terms that act as side constraints.

The most well-known regularizations include:

• The $L_{1}$ regularization (also called LASSO regularization): through $\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\arg\min}\,(\|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^{2} + \lambda\|\boldsymbol{\beta}\|_{1})$, individual elements of the vector $\hat{\boldsymbol{\beta}}$ are preferentially driven to zero, while the remaining elements may take large values (in absolute terms). This favors sparse solutions, which enables more efficient algorithms.
• The $L_{2}$ regularization (also called ridge regularization): through $\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\arg\min}\,(\|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^{2} + \lambda\|\boldsymbol{\beta}\|^{2})$, the entire vector is shrunk uniformly, but the resulting solutions are denser.
• The elastic net: here both the $L_{1}$ and the $L_{2}$ regularization are applied via the expression $\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\arg\min}\,(\|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^{2} + \lambda_{2}\|\boldsymbol{\beta}\|^{2} + \lambda_{1}\|\boldsymbol{\beta}\|_{1})$.
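Ridge regression has a closed-form solution, while the $L_{1}$ penalty does not; one standard approach for the latter is coordinate descent with soft-thresholding. A sketch of both (hypothetical data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 50, 5
X = rng.normal(size=(n, K))
beta_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Ridge (L2): closed form beta_hat = (X'X + lambda * I)^{-1} X'y.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)

# LASSO (L1): coordinate descent with soft-thresholding for the
# objective (1/2)||y - X beta||^2 + lam1 * ||beta||_1.
def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

lam1 = 5.0
beta_lasso = np.zeros(K)
for _ in range(200):                       # coordinate-descent sweeps
    for j in range(K):
        # Partial residual with coordinate j's contribution added back.
        r = y - X @ beta_lasso + X[:, j] * beta_lasso[j]
        beta_lasso[j] = soft_threshold(X[:, j] @ r, lam1) / (X[:, j] @ X[:, j])

print(beta_ridge, beta_lasso)
```

Note how the $L_{1}$ solution tends to set the truly-zero coefficients exactly to zero, whereas ridge only shrinks them.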

## Applications of regression analysis

Special applications of regression analysis also concern the analysis of discrete and restricted dependent variables. A distinction can be made here according to the type of dependent variable and the type of restriction of its value range. For more information, see Frone (1997) and Long (1997).

Models for different types of dependent variables are provided, for example, by generalized linear models; further models address different types of restricted value ranges.

### Application in econometrics

Within the framework of regression analysis, such models are particularly suitable for quantitative economic analyses, for example in econometrics.



## References

1. Hui Zou, Trevor Hastie: Regularization and Variable Selection via the Elastic Net. (PDF).
2. M. R. Frone: Regression models for discrete and limited dependent variables. Research Methods Forum No. 2, 1997 (Memento of January 7, 2007 in the Internet Archive).
3. J. S. Long: Regression models for categorical and limited dependent variables. Sage, Thousand Oaks, CA 1997.