# Regression discontinuity analysis

Regression discontinuity analysis, also called regression discontinuity design (RDD), is a method of inferential statistics and econometrics that is used to identify the causal effect of a change in one variable on other variables. The basic idea is to exploit a discontinuity (a jump) in an observed control variable that leads to an almost random assignment to the treatment or control group. Like the instrumental variable approach and the difference-in-differences approach, regression discontinuity analysis is one of the methods that use so-called "natural experiments" or "quasi-experiments".

## Idea

In many situations in which causal effects are to be examined and quantified, the explanatory variable is correlated with the error term. This leads to endogeneity and thus to inconsistency of the least squares method: even in large samples, the least squares estimator remains biased. Regression discontinuity analysis can be used to overcome this problem.

The basic idea of regression discontinuity analysis is to find a discontinuity in an observable control variable that affects whether or not an individual receives treatment. This is best illustrated with an example. In a study published in 1999, the economists Joshua Angrist and Victor Lavy examined the effect of class size on student performance. They used "Maimonides' rule", which is still used in Israel today to regulate class size in public schools. According to this rule, a class can have a maximum of 40 students; if enrollment exceeds that, a second class must be formed. This creates a strong discontinuity in the relationship between the number of students in a grade at a school and the class size: if the grade has 40 students, there is one class of 40 students; if it has 41 students, there are two classes with 20 and 21 students. Whether a grade has 40 or 41 students is not completely under the control of the individuals involved, but is at least partly due to chance. It can therefore be viewed as exogenous variation that allows a consistent estimate of the effect of class size on student performance.
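The jump created by the rule can be made concrete in a few lines of Python. This is a minimal sketch, assuming the cap works exactly as stated (at most 40 students per class, with enrollment split evenly over the smallest admissible number of classes); the function name `predicted_class_size` and the parameter `cap` are illustrative and not taken from the study.

```python
def predicted_class_size(enrollment, cap=40):
    """Average class size when a grade is split evenly into the fewest
    classes that respect the cap (hypothetical helper for illustration)."""
    # Integer arithmetic: 1..40 students -> 1 class, 41..80 -> 2 classes, ...
    n_classes = (enrollment - 1) // cap + 1
    return enrollment / n_classes


# Class size just below and just above multiples of the cap.
sizes = {e: predicted_class_size(e) for e in (39, 40, 41, 80, 81)}
for enrollment, size in sizes.items():
    print(enrollment, size)
```

Under these assumptions the predicted class size falls from 40 to 20.5 when enrollment rises from 40 to 41, which is the kind of abrupt change the design exploits.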

In RD analysis, a distinction must be made between the classical "sharp RD analysis" (sharp regression discontinuity design) and the "fuzzy RD analysis" (fuzzy regression discontinuity design). In sharp RD analysis, the treatment is a deterministic function of the underlying control variable, i.e. the control variable determines the treatment perfectly (as in the example above). In fuzzy RD analysis, the control variable does not perfectly determine the treatment, but it does affect its probability or expected value.

## Math background

### Sharp RD analysis

The underlying "true model" is

$Y = \beta_0 + \beta_D D + \beta_X X + U$

where $D$ is an indicator variable that indicates whether a person has been "treated" or not. In the example above, $D$ would be "is in a small class" and $X$ would be "number of students at the school". Let $X = c$ be the point at which the discontinuity lies, i.e. the enrollment threshold implied by the cap of $40$ students in the example above, with $D = 1$ for $X \geq c$ and $D = 0$ otherwise. Then

$\operatorname{E}(Y \mid X = c) = \beta_0 + \beta_D + \beta_X c + \operatorname{E}(U \mid X = c)$

Assuming that $\operatorname{E}(U \mid X)$ is continuous, the following also applies for the left-hand limit:

$\lim_{x \to c^-} \operatorname{E}(Y \mid X = x) = \beta_0 + \beta_X c + \operatorname{E}(U \mid X = c)$

(where $\lim_{x \to c^-}$ denotes the limit from the left of the discontinuity). Then

$\operatorname{E}(Y \mid X = c) - \lim_{x \to c^-} \operatorname{E}(Y \mid X = x) = \beta_D$,

so the effect of the treatment can be expressed as the difference between the two expected values.

These expected values can be estimated, for example, by rescaling the data so that $c$ is the zero point and then performing two least squares estimates, one to the left and one to the right of it. The difference in the expected values can then be calculated as the difference between the two constants (intercepts) of the least squares estimates. Alternatively, a single least squares estimate with corresponding interaction terms is also possible.
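The two-regression procedure can be sketched as follows. This is a minimal illustration on synthetic data, not an analysis of the class-size study: the coefficient values, the cutoff of 40, the noise level and the helper names `ols_intercept` and `sharp_rd_estimate` are all assumptions made for the example, and the control variable is simulated as continuous for simplicity.

```python
import random


def ols_intercept(xs, ys):
    """Intercept of a simple least squares line fitted to the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x


def sharp_rd_estimate(x, y, cutoff):
    """Difference between the intercepts of two separate linear fits,
    one on each side of the cutoff, after centering x at the cutoff."""
    centered = [xi - cutoff for xi in x]
    left = [(xi, yi) for xi, yi in zip(centered, y) if xi < 0]
    right = [(xi, yi) for xi, yi in zip(centered, y) if xi >= 0]
    intercept_left = ols_intercept(*zip(*left))
    intercept_right = ols_intercept(*zip(*right))
    return intercept_right - intercept_left


# Synthetic data (assumed values): beta_0 = 1, beta_D = 2, beta_X = 0.05, c = 40.
random.seed(0)
true_effect = 2.0
x = [random.uniform(0.0, 80.0) for _ in range(4000)]
y = [1.0 + true_effect * (xi >= 40.0) + 0.05 * xi + random.gauss(0.0, 0.5)
     for xi in x]

estimate = sharp_rd_estimate(x, y, cutoff=40.0)
print(f"estimated treatment effect: {estimate:.2f}")
```

Because the simulated relationship is linear on both sides, the difference of the two intercepts should land close to the assumed effect of 2; with real data one would typically also restrict the sample to a window around the cutoff.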

If the effect of the treatment differs between individuals, it can be shown that sharp RD analysis identifies the average treatment effect, $\operatorname{E}(\beta_{Di})$.

### Fuzzy RD analysis

The underlying true model is again

$Y = \beta_0 + \beta_D D + \beta_X X + U$

However, now

$D = \gamma_Z Z + \gamma_X X + V$

where $Z$ (typically an indicator for $X \geq c$) has no direct effect on $Y$. It then follows that

$\operatorname{E}(Y \mid X = c) - \lim_{x \to c^-} \operatorname{E}(Y \mid X = x) = \beta_D \left( \operatorname{E}(D \mid X = c) - \lim_{x \to c^-} \operatorname{E}(D \mid X = x) \right)$

and consequently

$\beta_D = \frac{\operatorname{E}(Y \mid X = c) - \lim_{x \to c^-} \operatorname{E}(Y \mid X = x)}{\operatorname{E}(D \mid X = c) - \lim_{x \to c^-} \operatorname{E}(D \mid X = x)}$.

The fuzzy RD analysis can be estimated like an instrumental variable estimation, with $Z$ as an instrument for $D$. First, $D$ is regressed on $Z$. The fitted values $\hat{D}$ obtained in this way are then used in a second regression as explanatory variables for $Y$ (see also the mathematical background on instrumental variables).
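The ratio formula for $\beta_D$ can be illustrated with a short simulation. The sketch below estimates the jump in $Y$ and the jump in $D$ at the cutoff with separate linear fits on each side and divides one by the other; it does not run the two-stage regression explicitly. All numbers (treatment probabilities of 0.2 and 0.8, an effect of 2, a cutoff of 40) and the helper names are assumptions chosen for the example.

```python
import random


def ols_intercept(xs, ys):
    """Intercept of a simple least squares line fitted to the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_y - (sxy / sxx) * mean_x


def jump_at_cutoff(x, values, cutoff):
    """Right-hand minus left-hand intercept of linear fits centered at the cutoff."""
    centered = [xi - cutoff for xi in x]
    left = [(xi, vi) for xi, vi in zip(centered, values) if xi < 0]
    right = [(xi, vi) for xi, vi in zip(centered, values) if xi >= 0]
    return ols_intercept(*zip(*right)) - ols_intercept(*zip(*left))


def fuzzy_rd_estimate(x, d, y, cutoff):
    """Jump in the outcome divided by the jump in the treatment rate."""
    return jump_at_cutoff(x, y, cutoff) / jump_at_cutoff(x, d, cutoff)


# Synthetic data (assumed values): crossing the cutoff raises the
# probability of treatment from 0.2 to 0.8; the true effect of D on Y is 2.
random.seed(1)
true_effect = 2.0
x = [random.uniform(0.0, 80.0) for _ in range(8000)]
d = [1.0 if random.random() < (0.8 if xi >= 40.0 else 0.2) else 0.0 for xi in x]
y = [1.0 + true_effect * di + 0.05 * xi + random.gauss(0.0, 0.5)
     for xi, di in zip(x, d)]

treatment_jump = jump_at_cutoff(x, d, 40.0)
estimate = fuzzy_rd_estimate(x, d, y, cutoff=40.0)
print(f"jump in treatment rate: {treatment_jump:.2f}")
print(f"estimated treatment effect: {estimate:.2f}")
```

In this setup the outcome jumps by only about 1.2 at the cutoff, because only about 60 percent of units change treatment status there; dividing by the jump in the treatment rate rescales this to the effect of the treatment itself.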

The advantages of regression discontinuity analysis are numerous. If the observed individuals have no influence on the assignment variable ($X$ in the example above), the assignment to the treatment and control groups is random and allows a procedure analogous to an actual randomized experiment without one having been carried out. In fact, it is sufficient that the individuals do not have perfect control over the assignment variable: even if the "test subjects" can determine $X$ to a certain extent, the final distribution around the discontinuity point is random. This is a particular advantage of RDD over other quasi-experimental research approaches, where the quasi-random assignment often has to be assumed and defended with verbal arguments.

RDD is also an important part of a broader quasi-experimental research agenda, known in applied economics as the "credibility revolution". Proponents of this agenda emphasize that the increased use of experimental and quasi-experimental research approaches has led to more credible research results.

A potential problem with RDD estimators is the risk of misspecifying the underlying functional form. If the underlying "true model" is not linear, for example, an estimate as described above would generally be biased. Possible remedies are adding higher-order polynomial terms (for example $X^2, X^3, X^4, \dots$) or resorting to non-parametric estimation.
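One way to write such a more flexible specification, given here as a sketch and not as the form used in any of the cited studies, adds powers of the centered control variable and lets their coefficients differ on the two sides of the cutoff. The polynomial order $p$ and the coefficient names $\beta_k$ and $\delta_k$ are notation introduced here for illustration:

```latex
Y = \beta_0 + \beta_D D
    + \sum_{k=1}^{p} \beta_k (X - c)^k
    + \sum_{k=1}^{p} \delta_k D \, (X - c)^k + U
```

Because $X$ is centered at $c$, the coefficient $\beta_D$ still measures the jump at the discontinuity. With $p = 1$ this is a single least squares estimate with one interaction term.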

As part of the "quasi-experimental" research method, RDD is also subject to the criticism directed at that method. Christopher Sims sees RDD and related research as "useful but [...] not a panacea," while Angus Deaton fears that researchers' attention may shift to the feasibility of a study over its relevance. As with all quasi-experimental approaches, the external validity of RDD estimators is also limited. Strictly speaking, the effects are only reliably measured at the examined discontinuity. These findings are difficult to generalize to other contexts (e.g. other countries, populations, or policies).

## History

Regression discontinuity analysis was first used in 1960 by the psychologists Donald L. Thistlethwaite and Donald T. Campbell. In economics and econometrics, however, it was not widely used until much later, in the late 1990s and early 2000s. The first important studies included the above-mentioned article by Angrist and Lavy and an article by Wilbert van der Klaauw from 2002. Since then, RD analysis has become a widely used tool in empirical economics.

## Literature

• Angrist, Joshua D. / Pischke, Jörn-Steffen: Mostly Harmless Econometrics: An Empiricist's Companion, Princeton University Press, 2008
• Angrist, Joshua D. / Lavy, Victor: Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement, Quarterly Journal of Economics 114.2, May 1999, pp. 533–575
• Lee, David S. / Lemieux, Thomas: Regression Discontinuity Designs in Economics, Journal of Economic Literature 48, June 2010, pp. 281–355
• Thistlethwaite, Donald L. / Campbell, Donald T.: Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment, Journal of Educational Psychology 51, 1960, pp. 309–317
• van der Klaauw, Wilbert: Estimating the Effect of Financial Aid Offers on College Enrollment: A Regression-Discontinuity Approach, International Economic Review 43.4, November 2002, pp. 1249–1287

## Remarks

1. Angrist & Pischke, Mostly Harmless Econometrics, 2008, p. 137
2. Angrist & Pischke, Mostly Harmless Econometrics, 2008, p. 142
3. Lee & Lemieux, Regression Discontinuity Designs in Economics, 2010, p. 283, p. 295
4. See e.g. Joshua D. Angrist and Jörn-Steffen Pischke: The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con out of Econometrics, Journal of Economic Perspectives 24.2, Spring 2010, pp. 3–30. Similarly, Imbens, Guido W.: Better LATE than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009), NBER Working Paper 14896, April 2009
5. Lee & Lemieux, Regression Discontinuity Designs in Economics, 2010, p. 316
6. Sims, Christopher: But Economics Is Not an Experimental Science, Journal of Economic Perspectives 24.2, 2010, p. 59
7. Deaton, Angus: Instruments of Development: Randomization in the Tropics, and the Search for the Elusive Keys to Economic Development, NBER Working Paper 14690, January 2009, pp. 9f.
8. Lee & Lemieux, Regression Discontinuity Designs in Economics, 2010, pp. 281f.