Chow test

The Chow test is a statistical test for the equality of the coefficients of two linear regressions. It is named after its inventor, the economist Gregory Chow.

In econometrics, the Chow test is used to test time series for structural breaks. Another area of application is program evaluation, where two different subgroups (programs), such as two types of schools, are compared with one another. In contrast to time series analysis, the two subgroups cannot be assigned to successive intervals; instead, the classification is based on a qualitative attribute, such as the type of school.

[Figure, structural break: if there is a structural break at $x = 1.7$, regressions on the sub-intervals $[0, 1.7]$ and $[1.7, 4]$ provide better modeling than the regression over the entire interval (dashed).]

[Figure, program evaluation: comparison of two programs (red, green) in the same data set; separate regressions on the data belonging to each program provide better modeling than the regression over the entire data set (black).]

Procedure

Given is a data set $(Y_i, X_i)$ with $X_i = (x_{i1}, \ldots, x_{ik})$ for $i = 1, \ldots, N$, whose relationship is described by a linear function with a normally distributed error $\epsilon$ with expected value 0 ($E(\epsilon) = 0$) (multiple regression analysis), i.e. one has

$$Y_i = c_0 + c_1 x_{i1} + c_2 x_{i2} + \ldots + c_k x_{ik} + \epsilon_i$$

for $i = 1, \ldots, N$.

It is assumed, however, that the data set can be divided into two groups of sizes $N_a$ and $N_b$ that are better described by two different linear functions:

$$Y_i = a_0 + a_1 x_{i1} + a_2 x_{i2} + \ldots + a_k x_{ik} + \epsilon_i \quad \text{for } i = 1, \ldots, N_a$$

$$Y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \ldots + b_k x_{ik} + \epsilon_i \quad \text{for } i = N_a + 1, \ldots, N_a + N_b$$

Here $N = N_a + N_b$, and the null hypothesis $H_0\colon (a_0, a_1, \ldots, a_k) = (b_0, b_1, \ldots, b_k)$ is tested against the alternative $H_1\colon (a_0, a_1, \ldots, a_k) \neq (b_0, b_1, \ldots, b_k)$. If $S$ denotes the sum of squared residuals of the regression over the entire data set, and $S_a$ and $S_b$ denote those of the regressions over the two subgroups, then under $H_0$ the test statistic $T$ defined below follows an F-distribution with $k + 1$ and $N_a + N_b - 2(k + 1)$ degrees of freedom.

$$T := \frac{(S - (S_a + S_b)) / (k + 1)}{(S_a + S_b) / (N_a + N_b - 2(k + 1))}$$
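The definition of $T$ translates directly into a few lines of code. The following is a minimal sketch, not a reference implementation; the function name `chow_statistic` and its argument names are illustrative choices and do not come from the literature.

```python
def chow_statistic(s_pooled, s_a, s_b, n_a, n_b, k):
    """Chow test statistic T computed from residual sums of squares.

    s_pooled: residual sum of squares S of the regression on all data
    s_a, s_b: residual sums of squares S_a, S_b of the two sub-regressions
    n_a, n_b: group sizes N_a, N_b
    k:        number of regressors, not counting the intercept
    """
    df_num = k + 1                     # number of coefficients equated under H0
    df_den = n_a + n_b - 2 * (k + 1)   # residual degrees of freedom of the split model
    if df_den <= 0:
        raise ValueError("need N_a + N_b > 2 (k + 1) observations")
    s_split = s_a + s_b
    return ((s_pooled - s_split) / df_num) / (s_split / df_den)
```

The returned value is then compared with a quantile of the F-distribution with `df_num` and `df_den` degrees of freedom.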

Example

The following data set is given, whose relationship is to be modeled by the linear function $Y = c_0 + c_1 X$:

$X_i$	0.5	1.0	1.5	2.0	2.5	3.0	3.5	4.0	4.5	5.0	5.5	6.0
$Y_i$	−0.043	0.435	0.149	0.252	0.571	0.555	0.678	3.119	2.715	3.671	3.928	3.962

A plot of the data suggests a structural break at $x = 4$. The data set is therefore divided into the two intervals $[0.5, 3.5]$ and $[4.0, 6.0]$, and separate regressions are carried out on these in addition to the regression over the entire data set. One then tests whether the two partial regressions yield the same linear function, i.e. $H_0\colon (a_0, a_1) = (b_0, b_1)$ against $H_1\colon (a_0, a_1) \neq (b_0, b_1)$.

Regression on the entire data set:

$\overline{x} = \frac{1}{12} \sum_{i=1}^{12} X_i = 3.2500$	$\overline{y} = \frac{1}{12} \sum_{i=1}^{12} Y_i = 1.6660$
$S_{xx} = \sum_{i=1}^{12} (X_i - \overline{x})^2 = 35.7500$	$S_{yy} = \sum_{i=1}^{12} (Y_i - \overline{y})^2 = 29.7661$
$S_{xy} = \sum_{i=1}^{12} (X_i - \overline{x})(Y_i - \overline{y}) = 30.0570$	$S = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 4.4955$

Regression on $[0.5, 3.5]$:

$\overline{x} = \frac{1}{7} \sum_{i=1}^{7} X_i = 2.0000$	$\overline{y} = \frac{1}{7} \sum_{i=1}^{7} Y_i = 0.3710$
$S_{xx} = \sum_{i=1}^{7} (X_i - \overline{x})^2 = 7.0000$	$S_{yy} = \sum_{i=1}^{7} (Y_i - \overline{y})^2 = 0.4070$
$S_{xy} = \sum_{i=1}^{7} (X_i - \overline{x})(Y_i - \overline{y}) = 1.4125$	$S_a = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 0.1220$

[Figure: data plot with regression lines]

Regression on $[4.0, 6.0]$:

$\overline{x} = \frac{1}{5} \sum_{i=1}^{5} X_i = 5.0000$	$\overline{y} = \frac{1}{5} \sum_{i=1}^{5} Y_i = 3.4790$
$S_{xx} = \sum_{i=1}^{5} (X_i - \overline{x})^2 = 2.5000$	$S_{yy} = \sum_{i=1}^{5} (Y_i - \overline{y})^2 = 1.1851$
$S_{xy} = \sum_{i=1}^{5} (X_i - \overline{x})(Y_i - \overline{y}) = 1.4495$	$S_b = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 0.3446$
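The three residual sums of squares can be recomputed from the data table with a short script. This is a sketch using only the standard library; the helper name `residual_ss` and the variable names `X`, `Y` and `split` are illustrative, and the split index 7 encodes the assumption that the first seven points form the interval $[0.5, 3.5]$.

```python
def residual_ss(xs, ys):
    """Residual sum of squares of the simple regression y = c0 + c1 * x,
    computed as S_yy - S_xy^2 / S_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_yy - s_xy ** 2 / s_xx


# Data from the table in this example.
X = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
Y = [-0.043, 0.435, 0.149, 0.252, 0.571, 0.555, 0.678,
     3.119, 2.715, 3.671, 3.928, 3.962]
split = 7  # the first 7 observations lie in [0.5, 3.5]

S = residual_ss(X, Y)                    # entire data set
S_a = residual_ss(X[:split], Y[:split])  # interval [0.5, 3.5]
S_b = residual_ss(X[split:], Y[split:])  # interval [4.0, 6.0]
print(S, S_a, S_b)
```

Up to rounding, the printed values should agree with $S$, $S_a$ and $S_b$ in the tables above.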

Calculation of the test statistic (with $k = 1$, $N_a = 7$ and $N_b = 5$):

$$T := \frac{(S - (S_a + S_b)) / (k + 1)}{(S_a + S_b) / (N_a + N_b - 2(k + 1))} = 34.5345$$

Since $T \geq F_{2;8;0.95} = 4.459$ (significance level $\alpha = 0.05$), the null hypothesis $H_0$ can be rejected. This means that the two regression lines on the sub-intervals are not identical. There is therefore a structural break, and the partial regressions provide better modeling than the regression over the entire data set.
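The critical value can be checked without statistical tables in this particular case, because an F-distribution with 2 numerator degrees of freedom has the closed-form tail probability $P(F > x) = (1 + 2x/d)^{-d/2}$, where $d$ is the number of denominator degrees of freedom. The sketch below relies on that special case only; for other numerator degrees of freedom a statistics library would be needed. The function names are illustrative.

```python
def f_sf_df2(x, df_den):
    """P(F > x) for an F-distribution with 2 numerator degrees of freedom.

    Valid ONLY for numerator df = 2 (closed-form special case).
    """
    return (1.0 + 2.0 * x / df_den) ** (-df_den / 2.0)


def f_critical_df2(alpha, df_den):
    """Upper critical value, obtained by inverting f_sf_df2 for x."""
    return (df_den / 2.0) * (alpha ** (-2.0 / df_den) - 1.0)


T = 34.5345   # test statistic from the example
df_den = 8    # N_a + N_b - 2 (k + 1) = 7 + 5 - 4
alpha = 0.05

critical = f_critical_df2(alpha, df_den)  # quantile F_{2;8;0.95}
p_value = f_sf_df2(T, df_den)
reject_h0 = T >= critical
print(critical, p_value, reject_h0)
```

Reporting the p-value alongside the decision shows how far the statistic lies beyond the critical value, which the bare comparison does not convey.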

Literature

Howard E. Doran: Applied Regression Analysis in Econometrics. CRC Press, 1989, ISBN 0-8247-8049-3, p. 146.
Christopher Dougherty: Introduction to Econometrics. Oxford University Press, 2007, ISBN 0-19-928096-7, p. 194.
Gregory C. Chow: Tests of Equality Between Sets of Coefficients in Two Linear Regressions. In: Econometrica. 28 (3), 1960, pp. 591-605 (JSTOR 1910133).
