Dummy variable

from Wikipedia, the free encyclopedia

In statistical data analysis, a dummy variable (also called design variable, indicator variable, Boolean variable, or proxy variable) is a variable that takes only the values 1 and 0 (a yes-no variable) and serves as an indicator of whether a particular level of a multi-level variable is present. The variable on which the dummy variable is based can have any level of measurement.

Applications and examples

In statistical analyses, it can be helpful to know whether or not a unit of observation has a particular level of a categorical variable. For this purpose, a dummy variable with the values 1 and 0 is created:

  • 1 = the level is present
  • 0 = the level is not present

The conversion of a categorical variable into an artificial numerical variable is called coding (see below).

Example:

In an election poll, a categorical variable indicates which party the respondent would vote for. To determine the proportion of CDU voters, a dummy variable is used with the values 1 = CDU voter and 0 = not a CDU voter. The mean of this dummy variable is then the proportion of CDU voters in the sample.
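The poll example can be sketched in a few lines of Python. This is only an illustration: the list `responses` is made-up data and the names `cdu_dummy` and `cdu_share` are assumptions, not part of the original text.

```python
# Hypothetical poll responses; the data are invented for illustration.
responses = ["CDU", "SPD", "CDU", "Greens", "CDU", "SPD", "Left", "CDU"]

# Dummy variable: 1 = CDU voter, 0 = not a CDU voter.
cdu_dummy = [1 if party == "CDU" else 0 for party in responses]

# The mean of a 0/1 variable is the share of observations coded 1.
cdu_share = sum(cdu_dummy) / len(cdu_dummy)

print(cdu_dummy)
print(cdu_share)
```

Because the dummy contains only zeros and ones, summing it counts the CDU voters, and dividing by the sample size gives their proportion.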

With interval-scaled variables, dummies are often used to dichotomize the variable, that is, to indicate whether a value lies below or above a certain threshold.

Example:

The dummy variable takes the value 1 if the respondent is younger than 50 years and the value 0 otherwise.
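A minimal sketch of such a threshold dummy, assuming a hypothetical list of ages (the names `AGE_LIMIT`, `ages` and `younger_than_50` are illustrative):

```python
AGE_LIMIT = 50  # threshold taken from the example in the text

# Hypothetical respondent ages, invented for illustration.
ages = [23, 50, 67, 49, 35, 81]

# 1 = younger than 50 years, 0 = otherwise (a respondent aged exactly 50 gets 0).
younger_than_50 = [1 if age < AGE_LIMIT else 0 for age in ages]

print(younger_than_50)
```

Note that the boundary case has to be decided explicitly: with a strict comparison, a respondent aged exactly 50 is coded 0.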

Dummy coding

Dummy coding is one way of generating indicator variables (alongside effect coding and contrast coding). These indicator variables are used to represent a nominally scaled variable with several levels. Beyond the two-level example shown above, dummy variables can also represent a categorical variable with more than two levels. In general, dummy coding for a categorical variable $x$ with $k$ categories, where $x \in \{1, \ldots, k\}$, is defined as follows: first, a reference category must be chosen for reasons of identifiability, e.g. category $k$. The variable $x$ can then be coded with the $k - 1$ dummy variables $x_1, \ldots, x_{k-1}$. Formally:

$$x_i = \begin{cases} 1 & \text{if } x = i, \\ 0 & \text{otherwise,} \end{cases} \qquad i = 1, \ldots, k - 1.$$

For the reference category, this gives $x_1 = \cdots = x_{k-1} = 0$.

Example

If the above example is expanded to include other parties, the following coding results (x1 corresponds to the first dummy variable, x2 to the second, etc.):

Political party   x1   x2   x3
CDU               1    0    0
SPD               0    1    0
The Left          0    0    1
The Greens        0    0    0

The dummy variable x1 codes whether or not a person prefers the CDU, x2 whether or not a person prefers the SPD, and x3 whether or not a person prefers The Left. If all three dummy variables are 0, it automatically follows that the person prefers the Greens (the reference category). A consequence of the dummy coding in this example is that a preference for no party, for several parties, or for a party that is not listed cannot be represented.
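The coding scheme of this example can be sketched as a small Python function. The function name `dummy_code` and its parameters are hypothetical, chosen only for illustration; the sketch assumes that every observation belongs to exactly one listed category and rejects anything else, mirroring the limitation described above.

```python
def dummy_code(values, categories, reference):
    """Code each value with k-1 dummies; the reference category gets all zeros."""
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    # One dummy variable per non-reference category, in the given order.
    coded_levels = [c for c in categories if c != reference]
    rows = []
    for value in values:
        if value not in categories:
            # Unlisted categories cannot be represented by this coding.
            raise ValueError(f"cannot code unlisted category: {value!r}")
        rows.append([1 if value == level else 0 for level in coded_levels])
    return coded_levels, rows


parties = ["CDU", "SPD", "The Left", "The Greens"]
levels, rows = dummy_code(parties, parties, reference="The Greens")

for party, row in zip(parties, rows):
    print(party, row)
```

With four categories the function returns three dummy columns, and the row for the reference category consists only of zeros, as in the table.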

Application

In logistic regression analysis, the quantity of interest is often the probability that a particular outcome occurs; the outcome variable has to be dummy-coded beforehand. Dummy-coded variables can also be used as explanatory variables in a multiple linear regression. When the dummies of a single categorical variable are the only predictors, the intercept corresponds to the mean of the reference group, which is coded with zero on every dummy, and each regression coefficient corresponds to the deviation of the respective group mean from the mean of the reference group. Dummy coding is therefore well suited to comparing several experimental conditions with a control condition.
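The relationship between regression coefficients and group means can be checked with a small self-contained sketch. The data, the group labels and the helper `solve` are assumptions made for illustration; the least-squares fit is obtained by solving the normal equations with exact fractions, not by any particular statistics package.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination.

    Assumes the system has a unique solution (raises StopIteration otherwise).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


# Hypothetical outcomes for a control group and two experimental conditions.
groups = ["control", "control", "A", "A", "A", "B", "B"]
y = [Fraction(v) for v in (10, 12, 15, 14, 16, 20, 22)]

# Design matrix: intercept, dummy for A, dummy for B (control is the reference).
X = [[Fraction(1), Fraction(int(g == "A")), Fraction(int(g == "B"))] for g in groups]

# Normal equations (X'X) beta = X'y of ordinary least squares.
k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
intercept, coef_a, coef_b = solve(xtx, xty)

print(intercept, coef_a, coef_b)
```

In this made-up data set the group means are 11 (control), 15 (A) and 21 (B); the fitted intercept equals the control mean, and the two coefficients equal the differences 15 - 11 = 4 and 21 - 11 = 10.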

One difficulty is that the choice of coding is arbitrary and a suitable reference group is not always obvious (which country should serve as the reference when comparing five different countries?). The reference group should nevertheless be chosen so that the resulting comparisons are meaningful to interpret. The dummy variables are also correlated with one another, since the reference group has the same value, 0, on each of them. As a result, the variance components they code are not independent of one another.
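That dummy variables derived from the same categorical variable are correlated can be illustrated with a short sketch. The sample below is hypothetical (three respondents per party), and `pearson` is a hand-written helper, not a library function.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical balanced sample: three respondents per party.
sample = ["CDU", "SPD", "The Left", "The Greens"] * 3

# Two of the three dummy variables (The Greens is the reference category).
x1 = [1 if party == "CDU" else 0 for party in sample]
x2 = [1 if party == "SPD" else 0 for party in sample]

r = pearson(x1, x2)
print(r)
```

In this balanced four-category sample the correlation between the two dummies is negative (about -1/3): a respondent coded 1 on one dummy is necessarily coded 0 on the other.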

Literature

  • Reinboth, C. (2006). Multivariate Analysis Methods in Market Research. Morrisville: LuLu-Verlagsgruppe.
  • Brosius, F. (2002). SPSS 11. Bonn: mitp-Verlag.
  • Bortz, J., Schuster, C. (2010). Statistics for human and social scientists (7th edition). Heidelberg: Springer Medicine Verlag.
  • Wentura, D., Pospeschill, M. (2015). Multivariate Data Analysis - A Compact Introduction. Heidelberg: Springer.

References

  1. Bernd Rönz, Hans G. Strohe (1994), Lexicon Statistics, Gabler Verlag, p. 90.
  2. Ludwig Fahrmeir, Thomas Kneib, Stefan Lang, Brian Marx: Regression: models, methods and applications. Springer Science & Business Media, 2013, ISBN 978-3-642-34332-2, p. 32.