G test
In statistics, the G test is used to check whether the frequencies in a contingency table could have arisen by chance. The G test has replaced the older chi-square test in many areas, especially in computational linguistics.
As with the chi-square test, the possible values of the characteristic are divided into categories, and one counts how often the characteristic falls into each of these categories.
The formula for calculating the test statistic G is as follows:
$$G = 2 \sum_{i} O_i \ln\left(\frac{O_i}{E_i}\right)$$
Here $O_i$ is the observed frequency with which the characteristic falls into the $i$-th category, $E_i$ is the expected frequency of the same cell under the null hypothesis, and $\ln$ is the natural logarithm. The sum runs over all categories. The test statistic is approximately chi-square distributed, with the same number of degrees of freedom as the corresponding chi-square test.
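The calculation can be illustrated with a minimal Python sketch. It assumes the standard form of the statistic, G = 2 Σ O_i ln(O_i / E_i); the function name `g_statistic` and the die-throw counts are hypothetical and chosen only for illustration. Cells with an observed count of zero are treated as contributing nothing to the sum, which is the usual convention.

```python
import math


def g_statistic(observed, expected):
    """Return G = 2 * sum(O_i * ln(O_i / E_i)) over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected frequencies must be positive")
        if o > 0:
            # A cell with O_i = 0 contributes 0 (limit of x * ln x as x -> 0).
            total += o * math.log(o / e)
    return 2.0 * total


# Hypothetical data: 100 throws of a die, null hypothesis "the die is fair".
observed = [10, 20, 15, 25, 12, 18]
expected = [sum(observed) / len(observed)] * len(observed)
g = g_statistic(observed, expected)
print(f"G = {g:.3f}")
```

The resulting value would then be compared against a chi-square distribution with the appropriate number of degrees of freedom, here 5 for six categories with no estimated parameters.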
Comparison with the chi-square test
Both tests address the same statistical problem, but the most complex calculation step in the chi-square test is a squaring, while the G test requires a logarithm. The chi-square test owes its popularity to this simple calculation, which can easily be carried out by hand for small contingency tables. In addition, the chi-square test has long been covered in introductory statistics textbooks.
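To make the difference in the calculation step concrete, the following sketch computes both statistics for the same counts. It assumes the usual Pearson form Σ (O_i − E_i)² / E_i for the chi-square statistic and G = 2 Σ O_i ln(O_i / E_i) for the G test; the function names and the counts are hypothetical.

```python
import math


def pearson_chi_square(observed, expected):
    """Pearson statistic: the hardest step is a squaring."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def g_statistic(observed, expected):
    """G statistic: the hardest step is a logarithm."""
    # Cells with an observed count of 0 are skipped (they contribute 0).
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts for two categories under a 50/50 null hypothesis.
observed = [30, 10]
expected = [20, 20]

chi2 = pearson_chi_square(observed, expected)
g = g_statistic(observed, expected)
print(f"chi-square = {chi2:.3f}, G = {g:.3f}")
```

Both values are referred to the same chi-square distribution, so for moderately large counts the two tests usually lead to the same decision.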
The rule of thumb for chi-square tests is that the expected frequency in each cell should be at least 5. The G test is considered more robust with small samples.