G-test

from Wikipedia, the free encyclopedia

In statistics, the G-test is used to check whether the frequencies observed in a contingency table are consistent with chance variation under a null hypothesis or deviate from it significantly. The G-test has replaced the older chi-square test in many fields, especially in computational linguistics.

As with the chi-square test, the possible values of the variable under study are divided into categories, and one counts how often an observation falls into each category.
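The counting step can be sketched in a few lines of Python. The observations and category labels below are made up purely for illustration; only the standard-library `collections.Counter` is used.

```python
from collections import Counter

# Hypothetical raw data: one category label per observed item.
observations = ["red", "blue", "red", "green", "red", "blue"]

# Count how often each category occurs.
observed_counts = Counter(observations)

for category, count in sorted(observed_counts.items()):
    print(f"{category}: {count}")
```

The resulting counts are the observed frequencies that enter the test statistic.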

The test statistic G is calculated as follows:

$$G = 2 \sum_{i} O_i \ln\left(\frac{O_i}{E_i}\right)$$

Here $O_i$ is the observed frequency with which the trait falls into the $i$-th category, $E_i$ is the expected frequency of the same cell under the null hypothesis, and $\ln$ is the natural logarithm. The sum runs over all categories. Under the null hypothesis, the test statistic is approximately chi-square distributed, with the same number of degrees of freedom as in the corresponding chi-square test.
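As a minimal sketch, the statistic can be computed as twice the sum, over all categories, of the observed count times the natural logarithm of the ratio of observed to expected count. The function name `g_statistic` and the example counts are assumptions chosen for illustration. Categories with an observed count of zero are treated as contributing zero, following the usual convention that x ln x tends to 0 as x tends to 0.

```python
import math


def g_statistic(observed, expected):
    """Return G = 2 * sum(O_i * ln(O_i / E_i)) over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected frequencies must be positive")
        if o > 0:  # a zero observed count contributes nothing to the sum
            total += o * math.log(o / e)
    return 2.0 * total


# Hypothetical counts for three categories (both lists sum to 100).
observed = [50, 30, 20]
expected = [40.0, 40.0, 20.0]

g = g_statistic(observed, expected)
print(f"G = {g:.4f}")
```

To reach a decision, the value of G would then be compared with a chi-square quantile for the appropriate degrees of freedom, for example from a printed table or a statistics library.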

Comparison with the chi-square test

Both tests address the same statistical problem. The most complex calculation step in the chi-square test is squaring, whereas the G-test requires computing a logarithm. The chi-square test owes its popularity to this simple calculation, which can easily be carried out by hand for small contingency tables. In addition, the chi-square test has long been covered in introductory statistics textbooks.
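To make the difference in calculation concrete, the following sketch computes both statistics for the same made-up counts. The function names `pearson_chi_square` and `g_statistic` are illustrative assumptions, not part of any particular library. Pearson's statistic sums the squared deviations divided by the expected counts, while G sums observed counts times a logarithm.

```python
import math


def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def g_statistic(observed, expected):
    """G statistic: 2 * sum of O_i * ln(O_i / E_i), skipping zero counts."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


# Hypothetical counts for three categories.
observed = [50, 30, 20]
expected = [40.0, 40.0, 20.0]

chi2_value = pearson_chi_square(observed, expected)
g_value = g_statistic(observed, expected)

print(f"chi-square = {chi2_value:.4f}")
print(f"G          = {g_value:.4f}")
```

For counts like these, where observed and expected frequencies are fairly close, the two statistics come out close to each other.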

A common rule of thumb for the chi-square test is that the expected frequency in each cell should be at least 5. The G-test is more robust with small samples.
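The rule of thumb can be checked before running a test. The sketch below assumes a two-way contingency table and the usual independence null hypothesis, under which the expected count of a cell is its row total times its column total divided by the grand total. The table values and the helper names `expected_counts` and `cells_below_threshold` are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cells_below_threshold(table, threshold=5.0):
    """Return (row, column, expected) for each cell whose expected count is too small."""
    expected = expected_counts(table)
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]


# Hypothetical 2x2 contingency table of observed counts.
table = [[12, 2], [9, 7]]

low_cells = cells_below_threshold(table)
for i, j, e in low_cells:
    print(f"cell ({i}, {j}) has expected count {e:.1f}, below 5")
```

If the returned list is non-empty, the rule of thumb for the chi-square test is not met for this table.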
