Chi-square test

In mathematical statistics, a chi-square test (χ² test) denotes a group of hypothesis tests whose test statistic is chi-square distributed.

A distinction is mainly made between the following tests:

  • Distribution test (also called goodness-of-fit test): checks whether given data follow a particular probability distribution.
  • Independence test: checks whether two characteristics are stochastically independent.
  • Homogeneity test: checks whether two or more samples come from the same distribution, i.e. from a homogeneous population.

The chi-square test and its test statistic were first described by Karl Pearson in 1900.

Distribution test

We consider a statistical characteristic X whose probability distribution in the population is unknown. With regard to the probabilities of X, the provisionally general null hypothesis

H₀: The characteristic X has the probability distribution F₀(x)

is set up.

Method

There are n independent observations of the characteristic X, which fall into m different categories. If a characteristic has a large number of possible values, it is expedient to group them into classes and to treat the classes as categories. The number of observations in the j-th category is the observed frequency n_j.

One now considers how many observations would on average lie in each category if X actually had the hypothetical distribution F₀(x). To do this, one first calculates the probability p_0j that a value of X falls into category j. The absolute frequency to be expected under H₀ is then

n_0j = n · p_0j

If the frequencies observed in the present sample deviate "too much" from the expected frequencies, the null hypothesis is rejected. The test statistic

χ² = Σ_j (n_j − n_0j)² / n_0j

measures the size of the deviation.

For sufficiently large n, the test statistic is approximately chi-square distributed with m − 1 degrees of freedom. If the null hypothesis is true, the differences between the observed and the theoretically expected frequencies should be small, so H₀ is rejected when the value of the test statistic is high. The rejection region for H₀ lies on the right.

At a significance level α, H₀ is rejected if the value of the test statistic obtained from the sample is larger than χ²(1 − α; m − 1), the (1 − α)-quantile of the chi-square distribution with m − 1 degrees of freedom.

There are tables of these quantiles (critical values) depending on the number of degrees of freedom and the desired significance level (see below).

If the significance level belonging to a certain value of the test statistic is to be determined, an intermediate value must usually be calculated from the table. Logarithmic interpolation is used for this.
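The whole procedure can be sketched in a few lines of Python with numpy and scipy; the observed counts and hypothesized probabilities below are illustrative placeholders, not data from this article.

    import numpy as np
    from scipy import stats

    observed = np.array([18, 30, 52])       # n_j: observed class frequencies
    p0 = np.array([0.2, 0.3, 0.5])          # p_0j: hypothesized probabilities
    expected = observed.sum() * p0          # n_0j = n * p_0j

    chi2_stat = ((observed - expected) ** 2 / expected).sum()
    df = len(observed) - 1                  # m - 1 degrees of freedom
    critical = stats.chi2.ppf(0.95, df)     # (1 - alpha)-quantile for alpha = 0.05

    print(f"chi2 = {chi2_stat:.3f}, critical value = {critical:.3f}")
    print("reject H0" if chi2_stat > critical else "cannot reject H0")

Equivalently, scipy.stats.chisquare(observed, expected) returns the same statistic together with its p-value.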

Particularities

Estimation of distribution parameters

In general, the parameters of the distribution are specified in the distribution hypothesis. If they cannot be specified, they must be estimated from the sample. One degree of freedom of the chi-square distribution is then lost for each estimated parameter, so the test statistic has m − 1 − k degrees of freedom, where k is the number of estimated parameters. For the normal distribution, k = 2 if both the expected value and the variance are estimated.
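As a small sketch of the consequence: with m = 9 classes and k = 2 estimated parameters (the situation of the sales example below), the critical value is taken from a chi-square distribution with 6 degrees of freedom.

    from scipy import stats

    m, k, alpha = 9, 2, 0.05
    df = m - 1 - k                          # one degree of freedom lost per estimate
    print(stats.chi2.ppf(1 - alpha, df))    # chi2(0.95; 6) = 12.59

scipy.stats.chisquare exposes the same adjustment through its ddof argument: stats.chisquare(observed, expected, ddof=k) computes the p-value with m − 1 − k degrees of freedom.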

Minimum size of the expected frequencies

So that the test variable can be viewed as approximately chi-square distributed, each expected frequency must have a certain minimum size. Various textbooks put this at 1 or 5. If the expected frequency is too small, several classes can possibly be combined in order to achieve the minimum size.

Example of the distribution test

The sales of approx. 200 listed companies are available. The following histogram shows their distribution.

[Histogram of the sales data: SalesHisto0.PNG]

Let X be the turnover of a company [million €].

We now want to test the hypothesis that X is normally distributed.

Since the data take very many different values, they were grouped into classes. This gave the following table:

class   interval             observed frequency
j       over      up to      n_j
1       …         0          0
2       0         5000       148
3       5000      10000      17
4       10000     15000      5
5       15000     20000      8
6       20000     25000      4
7       25000     30000      3
8       30000     35000      3
9       35000     …          9
total                        197

Since no parameters are specified, they are estimated from the sample: the sample mean x̄ and the sample standard deviation s.

It is tested:

H₀: X is normally distributed with expected value x̄ and standard deviation s.

To determine the frequencies expected under H₀, the probabilities p_0j that X falls into the given classes are first calculated. For the first class,

p_01 = P(X ≤ 0) = Φ((0 − x̄)/s)

where Φ is the distribution function of a random variable with a standard normal distribution. The probabilities of the remaining classes are calculated analogously as

p_0j = Φ((b_j − x̄)/s) − Φ((a_j − x̄)/s)

with a_j and b_j the lower and upper limits of class j. This gives the expected frequencies

n_0j = n · p_0j

For example, around 25 companies would have to have a turnover between 0 and 5000 if the characteristic turnover actually were normally distributed.

The expected frequencies, along with the observed frequencies, are listed in the following table.

class   interval             observed frequency   probability   expected frequency
j       over      up to      n_j                  p_0j          n_0j
1       …         0          0                    0.3228        63.59
2       0         5000       148                  0.1270        25.02
3       5000      10000      17                   0.1324        26.08
4       10000     15000      5                    0.1236        24.35
5       15000     20000      8                    0.1034        20.36
6       20000     25000      4                    0.0774        15.25
7       25000     30000      3                    0.0519        10.23
8       30000     35000      3                    0.0312        6.14
9       35000     …          9                    0.0303        5.98
total                        197                  1.0000        197.00

The test statistic is now determined as

χ² = Σ_j (n_j − n_0j)² / n_0j ≈ 710.7

At a significance level α = 0.05, the critical value of the test statistic is χ²(0.95; 6) = 12.59. Since 710.7 > 12.59, the null hypothesis is rejected. It can be assumed that the sales characteristic is not normally distributed in the population.
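The computation can be reproduced directly from the observed and expected frequencies in the table above; a short Python sketch:

    import numpy as np
    from scipy import stats

    observed = np.array([0, 148, 17, 5, 8, 4, 3, 3, 9])
    expected = np.array([63.59, 25.02, 26.08, 24.35, 20.36,
                         15.25, 10.23, 6.14, 5.98])

    chi2_stat = ((observed - expected) ** 2 / expected).sum()   # about 710.7
    df = 9 - 1 - 2                          # 9 classes, 2 estimated parameters
    critical = stats.chi2.ppf(0.95, df)     # 12.59
    print(f"chi2 = {chi2_stat:.1f}, critical value = {critical:.2f}")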

Supplement

The above data were then logarithmized. The chi-square test of the logarithmized data for normal distribution could not reject the null hypothesis of normality at a significance level of 0.05. Provided that the logarithmized sales data actually come from a normal distribution, the original sales data are log-normally distributed.

The following histogram shows the distribution of the logarithmized data.

[Histogram of the logarithmized sales data: LgUmsatzHisto0.PNG]

Chi-square distribution test in jurisdiction

In Germany, the chi-square distribution test was judicially confirmed, as part of the application of Benford's law, as a means for a tax authority to contest the correctness of cash management. Specifically, the frequency distribution of digits in cash-book entries was examined with the chi-square test, which yielded a "strong indication of manipulation of the receipt records". However, its applicability is subject to restrictions, and other statistical methods may have to be used (see Benford's law).

Independence test

The independence test is a significance test for stochastic independence in a contingency table.

We consider two statistical characteristics X and Y, which may have any scale level. One is interested in whether the characteristics are stochastically independent. The null hypothesis

H₀: The characteristics X and Y are stochastically independent

is set up.

Method

The observations of X fall into m categories, those of Y into r categories. If a characteristic has a large number of possible values, it is expedient to group them into classes and to treat class membership as the category. There are a total of n paired observations of X and Y, which are divided into m · r categories.

Conceptually, the test is to be understood as follows:

Consider two discrete random variables X and Y whose joint probabilities can be represented in a probability table.

One now counts how often the j-th value of X occurs together with the k-th value of Y. The observed joint absolute frequencies n_jk can be entered in a two-dimensional frequency table with m rows and r columns.

                            feature Y
feature X    1       2       …      k       …      r       sum Σ
1            n_11    n_12    …      n_1k    …      n_1r    n_1.
2            n_21    n_22    …      n_2k    …      n_2r    n_2.
…            …       …       …      …       …      …       …
j            n_j1    n_j2    …      n_jk    …      n_jr    n_j.
…            …       …       …      …       …      …       …
m            n_m1    n_m2    …      n_mk    …      n_mr    n_m.
sum Σ        n_.1    n_.2    …      n_.k    …      n_.r    n

The row and column sums give the absolute marginal frequencies n_j. and n_.k as

n_j. = Σ_k n_jk    and    n_.k = Σ_j n_jk

Correspondingly, h_jk = n_jk / n are the joint relative frequencies, and h_j. = n_j. / n and h_.k = n_.k / n the relative marginal frequencies.

From probability theory: if two events A and B are stochastically independent, the probability of their joint occurrence equals the product of the individual probabilities:

P(A ∩ B) = P(A) · P(B)

Analogously, one argues that if X and Y were stochastically independent, it would also have to hold that

n_jk / n ≈ (n_j. / n) · (n_.k / n)

or, multiplied by n,

n_jk ≈ n_j. · n_.k / n

If these differences are small for all j and k, one can assume that X and Y really are stochastically independent.

If one sets, for the expected frequency under independence,

ñ_jk = n_j. · n_.k / n

the test statistic for the independence test results from the above consideration as

χ² = Σ_j Σ_k (n_jk − ñ_jk)² / ñ_jk

If the expected frequencies are sufficiently large, the test statistic is approximately chi-square distributed with (m − 1)(r − 1) degrees of freedom.

If the value of the test statistic is small, the hypothesis of independence is presumed to be true; H₀ is rejected for high values, so the critical region for H₀ lies on the right.

At a significance level α, H₀ is rejected if χ² > χ²(1 − α; (m − 1)(r − 1)), the (1 − α)-quantile of the chi-square distribution with (m − 1)(r − 1) degrees of freedom.
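A minimal Python sketch of the independence test; the 2 × 3 contingency table holds illustrative placeholder counts:

    import numpy as np
    from scipy import stats

    table = np.array([[20, 30, 25],         # n_jk: observed joint frequencies
                      [30, 20, 25]])

    chi2_stat, p_value, df, expected = stats.chi2_contingency(table)
    # df = (m - 1)(r - 1); 'expected' holds the frequencies n_j. * n_.k / n
    print(f"chi2 = {chi2_stat:.3f}, df = {df}, p = {p_value:.4f}")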

Particularities

So that the test variable can be viewed as approximately chi-square distributed, every expected frequency must have a certain minimum size. Various textbooks put this at 1 or 5. If the expected frequency is too small, several classes can possibly be combined in order to achieve the minimum size.

Alternatively, the sampling distribution of the test statistic can be determined by bootstrapping, based on the given marginal distributions and the assumption that the characteristics are independent.
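A sketch of that idea as a Monte Carlo simulation: tables are drawn from the product of the observed marginal relative frequencies, and the simulated statistics form the reference distribution (the table is again a placeholder).

    import numpy as np

    rng = np.random.default_rng(0)
    table = np.array([[20, 30, 25],
                      [30, 20, 25]])
    n = table.sum()
    p_row = table.sum(axis=1) / n
    p_col = table.sum(axis=0) / n
    p_indep = np.outer(p_row, p_col)        # cell probabilities under independence

    def chi2_of(t):
        expected = np.outer(t.sum(axis=1), t.sum(axis=0)) / t.sum()
        return ((t - expected) ** 2 / expected).sum()

    sims = np.array([
        chi2_of(rng.multinomial(n, p_indep.ravel()).reshape(table.shape))
        for _ in range(10_000)
    ])
    p_sim = (sims >= chi2_of(table)).mean()  # simulated p-value
    print(f"simulated p-value: {p_sim:.4f}")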

Example of the independence test

As part of quality management, the customers of a bank were asked, among other things, about their satisfaction with the handling of their business and about their overall satisfaction. The degree of satisfaction was measured on the school grading scale.

The data yield the following cross table of the bank customers' overall satisfaction versus their satisfaction with the handling of their business. One can see that some of the expected frequencies were too small.

[Cross table with expected frequencies (original categories): Chigross.png]

Reducing the categories to three by combining grades 3–6 into a new grade 3 yielded methodologically correct results.

[Cross table with expected frequencies (reduced categories): Chiklein.png]

The following table contains the expected frequencies ñ_jk, which are calculated as ñ_jk = n_j. · n_.k / n:

               feature Y
feature X      1        2        3        Σ
1              44.35    44.84    12.81    102
2              156.09   157.82   45.09    359
3              69.57    70.34    20.10    160
Σ              270      273      78       621

The test statistic is then determined as

χ² = Σ_j Σ_k (n_jk − ñ_jk)² / ñ_jk

At α = 0.05, the critical value of the test statistic is χ²(0.95; 4) = 9.49. Since the value computed from the sample is larger, the hypothesis of independence is rejected; it is therefore assumed that satisfaction with the handling of business and overall satisfaction are associated.

Homogeneity test

With the chi-square homogeneity test, the associated sample distributions can be used to check whether m (independent) random samples of a discrete characteristic, with sample sizes n_1, …, n_m, come from identically distributed (i.e. homogeneous) populations. It is thus helpful in deciding whether several samples come from the same population or distribution, or whether a characteristic is distributed in the same way in different populations (e.g. men and women). Like the other chi-square tests, the test can be used at any scale level.

The hypotheses are:

H₀: The m independent samples come from identically distributed populations.
H₁: At least two of the samples are distributed differently.

If the distribution function of the i-th sample is denoted by F_i(x), the hypotheses can also be formulated as follows:

H₀: F₁(x) = F₂(x) = … = F_m(x)
H₁: F_i(x) ≠ F_j(x) for at least one pair i ≠ j

Method

Let the random variable under examination (the characteristic), e.g. the answer to the "Sunday question", be categorical; that is, there are k feature categories (the characteristic has k possible values), e.g. SPD, CDU, B90/Greens, FDP, Die Linke, and others (so k = 6). The m samples can be, e.g., the survey results of various opinion research institutes. It could then be of interest to check whether the survey results differ significantly.

The observed frequencies n_ij per sample (survey) i and feature category (here: party) j are entered in a corresponding m × k cross table (here 3 × 3):

             Feature category
sample       category 1   category 2   category 3   total
1            n_11         n_12         n_13         n_1.
2            n_21         n_22         n_23         n_2.
3            n_31         n_32         n_33         n_3.
total        n_.1         n_.2         n_.3         n

The deviations between the observed (empirical) frequency or probability distributions of the samples across the categories of the characteristic are now examined. The observed cell frequencies are compared with the frequencies that would be expected if the null hypothesis were valid.

The cell frequencies expected under the null hypothesis of a homogeneous population are determined from the marginal distributions:

ñ_ij = n_i. · n_.j / n

Here ñ_ij denotes the expected number of observations (absolute frequency) from sample i in category j.

On the basis of the values calculated in this way, the following approximately chi-square distributed test statistic is computed:

χ² = Σ_i Σ_j (n_ij − ñ_ij)² / ñ_ij

To arrive at a test decision, the value obtained for the test statistic is compared with the associated critical value, i.e. with the quantile of the chi-square distribution determined by the number of degrees of freedom (m − 1)(k − 1) and the significance level α (alternatively, the p-value can be determined). If the deviations between at least two sample distributions are significant, the null hypothesis of homogeneity is rejected, i.e. H₀ is rejected if

χ² > χ²(1 − α; (m − 1)(k − 1))

The rejection region for H₀ lies to the right of the critical value.
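Computationally, the homogeneity test is the independence test applied to the samples-by-categories table, so the same routine can be used; the survey counts below are invented for illustration.

    import numpy as np
    from scipy import stats

    surveys = np.array([[330, 310, 360],    # institute 1
                        [300, 320, 380],    # institute 2
                        [340, 290, 370]])   # institute 3

    chi2_stat, p_value, df, expected = stats.chi2_contingency(surveys)
    print(f"chi2 = {chi2_stat:.2f}, df = {df}, p = {p_value:.4f}")
    # H0 (homogeneous populations) is rejected if p < alpha.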

Conditions of use

So that the test statistic can be viewed as approximately chi-square distributed, the following approximation conditions must hold:

  • "large" sample size (n ≥ 30)
  • ñ_ij ≥ 1 for all i, j
  • at least 80% of the ñ_ij ≥ 5
  • Rinne (2003) and Voss (2000) additionally call for cell frequencies n_ij ≥ 5

If some of the expected frequencies are too small, several classes or feature categories must be combined in order to comply with the approximation conditions.

If the random variable being examined has a large number of (possible) values, e.g. because the variable is metric and continuous, it is expedient to group it into classes (= categories) so that the now classified random variable can be examined with the chi-square test. Note, however, that the way the observations are classified can influence the test result.
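A short sketch of such a classification; the data and the bin edges are assumed for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.lognormal(mean=8.0, sigma=1.5, size=197)   # assumed raw data

    edges = [0, 5000, 10000, 15000, 20000, np.inf]        # assumed class limits
    observed, _ = np.histogram(data, bins=edges)
    print(observed)   # class frequencies n_j; other edges can change the result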

Comparison to independence and distribution test

The homogeneity test can also be interpreted as an independence test if the sample membership is viewed as the value of a second characteristic. It can likewise be viewed as a form of distribution test that compares not one empirical and one theoretical distribution, but several empirical distributions. However, the independence test and the distribution test are one-sample problems, while the homogeneity test is a multiple-sample problem: in the independence test, a single sample is taken with regard to two characteristics; in the distribution test, a single sample is taken with regard to one characteristic; in the homogeneity test, several samples are taken with regard to one characteristic.

Four-field test

The chi-square four-field test is a statistical test. It is used to check whether two dichotomous features are stochastically independent of one another, or whether the distribution of a dichotomous feature is identical in two groups.

Method

The four-field test is based on a 2 × 2 contingency table that visualizes the (bivariate) frequency distribution of two characteristics:

               Feature X
Feature Y      Expression 1   Expression 2   Row total
Expression 1   a              b              a + b
Expression 2   c              d              c + d
Column total   a + c          b + d          n = a + b + c + d

According to a rule of thumb, the expected frequency in each of the four cells must be at least 5. The expected frequency is calculated as (row total × column total) / total number. If an expected frequency is less than 5, statisticians recommend Fisher's exact test.
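A sketch of this rule of thumb in Python, with placeholder counts; scipy's fisher_exact serves as the fallback:

    import numpy as np
    from scipy import stats

    table = np.array([[25, 25],
                      [30, 20]])

    # expected frequency per cell: row total * column total / total number
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()

    if (expected < 5).any():
        _, p_value = stats.fisher_exact(table)
        print(f"Fisher's exact test: p = {p_value:.4f}")
    else:
        chi2_stat, p_value, df, _ = stats.chi2_contingency(table, correction=False)
        print(f"four-field test: chi2 = {chi2_stat:.3f}, p = {p_value:.4f}")

Here correction=False reproduces the uncorrected statistic given in the next section; by default scipy applies Yates's continuity correction to 2 × 2 tables.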

Test statistic

In order to test the null hypothesis that both features are stochastically independent, the following test statistic is first calculated for a two-sided test:

χ² = n · (ad − bc)² / ((a + b)(c + d)(a + c)(b + d))

The test statistic is approximately chi-square distributed with one degree of freedom. It should only be used if each of the two samples contains at least six observations (carriers of the characteristic).
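A direct implementation of this formula as a sketch (the function name four_field_chi2 is ours):

    def four_field_chi2(a, b, c, d):
        """chi2 = n (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)) for a 2 x 2 table."""
        n = a + b + c + d
        return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))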

Test decision

If the test value obtained from the sample is smaller than the critical value belonging to the selected significance level (i.e. the corresponding quantile of the chi-square distribution), the test could not demonstrate a significant difference. If, on the other hand, the test value is greater than or equal to the critical value, there is a significant difference between the samples.

The probability that the calculated (or an even larger) test value arose merely by chance through sampling (the p-value) can be calculated approximately with a rule-of-thumb formula. The approximation of this formula to the actual p-value is good if the test statistic lies between 2.0 and 8.0.
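Independently of the rule of thumb, the p-value for one degree of freedom can be computed from the chi-square survival function; a sketch:

    from scipy import stats

    chi2_stat = 3.84
    print(stats.chi2.sf(chi2_stat, df=1))   # P(chi2 >= 3.84), about 0.05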

Examples and Applications

For the question of whether a medical measure is effective, the four-field test is very helpful because it focuses on the main decision criterion.

Example 1

50 (randomly selected) women and men are asked whether they smoke or not.

The result is:

  • Women: 25 smokers, 25 non-smokers
  • Men: 30 smokers, 20 non-smokers

If a four-field test is carried out on the basis of this survey, the formula shown above yields a test value of approximately 1. Since this value is smaller than the critical value 3.841, the null hypothesis that smoking behavior is independent of gender cannot be rejected. The proportions of smokers and non-smokers do not differ significantly between the sexes.
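With the four_field_chi2 sketch from above, this test value can be verified:

    # women: 25 smokers, 25 non-smokers; men: 30 smokers, 20 non-smokers
    print(four_field_chi2(25, 25, 30, 20))  # about 1.01 < 3.841: H0 not rejected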

Example 2

500 (randomly selected) women and men are asked whether they smoke or not.

The following data is received:

  • Women: 250 non-smokers, 250 smokers
  • Men: 300 non-smokers, 200 smokers

The four-field test yields a test value of approximately 10.1. Since this is greater than the critical value 3.841, the null hypothesis that the characteristics "smoking behavior" and "gender" are stochastically independent of one another can be rejected at a significance level of 0.05.

Table of the quantiles of the chi-square distribution

The table shows the most important quantiles of the chi-square distribution. The degrees of freedom are entered in the left column and the levels 1 − α in the top row. Reading example: the quantile of the chi-square distribution with 2 degrees of freedom for a level of 1 − α = 0.99 (significance level α = 1%) is 9.21.

         1 − α
df       0.900    0.950    0.975    0.990    0.995    0.999
1        2.71     3.84     5.02     6.63     7.88     10.83
2        4.61     5.99     7.38     9.21     10.60    13.82
3        6.25     7.81     9.35     11.34    12.84    16.27
4        7.78     9.49     11.14    13.28    14.86    18.47
5        9.24     11.07    12.83    15.09    16.75    20.52
6        10.64    12.59    14.45    16.81    18.55    22.46
7        12.02    14.07    16.01    18.48    20.28    24.32
8        13.36    15.51    17.53    20.09    21.95    26.12
9        14.68    16.92    19.02    21.67    23.59    27.88
10       15.99    18.31    20.48    23.21    25.19    29.59
11       17.28    19.68    21.92    24.72    26.76    31.26
12       18.55    21.03    23.34    26.22    28.30    32.91
13       19.81    22.36    24.74    27.69    29.82    34.53
14       21.06    23.68    26.12    29.14    31.32    36.12
15       22.31    25.00    27.49    30.58    32.80    37.70
16       23.54    26.30    28.85    32.00    34.27    39.25
17       24.77    27.59    30.19    33.41    35.72    40.79
18       25.99    28.87    31.53    34.81    37.16    42.31
19       27.20    30.14    32.85    36.19    38.58    43.82
20       28.41    31.41    34.17    37.57    40.00    45.31
21       29.62    32.67    35.48    38.93    41.40    46.80
22       30.81    33.92    36.78    40.29    42.80    48.27
23       32.01    35.17    38.08    41.64    44.18    49.73
24       33.20    36.42    39.36    42.98    45.56    51.18
25       34.38    37.65    40.65    44.31    46.93    52.62
26       35.56    38.89    41.92    45.64    48.29    54.05
27       36.74    40.11    43.19    46.96    49.64    55.48
28       37.92    41.34    44.46    48.28    50.99    56.89
29       39.09    42.56    45.72    49.59    52.34    58.30
30       40.26    43.77    46.98    50.89    53.67    59.70
40       51.81    55.76    59.34    63.69    66.77    73.40
50       63.17    67.50    71.42    76.15    79.49    86.66
60       74.40    79.08    83.30    88.38    91.95    99.61
70       85.53    90.53    95.02    100.43   104.21   112.32
80       96.58    101.88   106.63   112.33   116.32   124.84
90       107.57   113.15   118.14   124.12   128.30   137.21
100      118.50   124.34   129.56   135.81   140.17   149.45
200      226.02   233.99   241.06   249.45   255.26   267.54
300      331.79   341.40   349.87   359.91   366.84   381.43
400      436.65   447.63   457.31   468.72   476.61   493.13
500      540.93   553.13   563.85   576.49   585.21   603.45
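The tabulated quantiles can be reproduced numerically, e.g. the reading example above; a sketch:

    from scipy import stats

    print(round(stats.chi2.ppf(0.99, df=2), 2))   # 9.21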

Alternatives to the chi-square test

The chi-square test is still widely used, although better alternatives are available today. Its test statistic is particularly problematic with small expected values per cell (rule of thumb: below 5), while the chi-square test remains reliable with large samples.

The original advantage of the chi-square test was that its test statistic can be calculated by hand, especially for smaller tables, because the most difficult computational step is squaring; for the more precise G-test, the most difficult step is taking logarithms. The G-test statistic is also approximately chi-square distributed and remains robust when the contingency table contains rare events.

In computational linguistics, the G-test has become established, since the frequency analysis of rarely occurring words and text fragments is a typical problem there.

Since today's computers offer enough computing power, both tests can be replaced by Fisher's exact test .

See also

Web links

Four-field test:

Wikibooks: Four-field test with R - learning and teaching materials

References

  1. Karl Pearson: On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. In: The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. Vol. 50, No. 5, 1900, pp. 157–175, doi:10.1080/14786440009463897.
  2. Decision of the FG Münster of November 10, 2003 (Ref.: 6 V 4562/03 E, U) (PDF)
  3. Horst Rinne: Taschenbuch der Statistik. 3rd edition. Verlag Harri Deutsch, 2003, pp. 562–563.
  4. Bernd Rönz, Hans G. Strohe: Lexikon Statistik. Gabler Verlag, 1994, p. 69.
  5. See Political Parties in Germany
  6. a b Horst Rinne: Taschenbuch der Statistik. 3rd edition. Verlag Harri Deutsch, 2003, p. 562.
  7. a b c Werner Voss: Taschenbuch der Statistik. 1st edition. Fachbuchverlag Leipzig, 2000, p. 447.
  8. Jürgen Bortz, Nicola Döring: Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler. 4th edition. Springer, 2006, p. 103.
  9. Hans-Hermann Dubben, Hans-Peter Beck-Bornholdt: Der Hund, der Eier legt. 4th edition. Rowohlt, 2009, p. 293.