Shapiro-Wilk test

The Shapiro-Wilk test is a statistical significance test of the hypothesis that the population underlying a sample is normally distributed. The test was developed by Samuel Shapiro and Martin Wilk and first introduced in 1965.

The null hypothesis $H_0$ assumes that the population is normally distributed; the alternative hypothesis $H_1$ assumes that it is not. If the value of the test statistic $W$ is greater than the critical value $W_{\text{critical}}$, the null hypothesis is not rejected and it is assumed that the distribution is normal.

If the $p$-value of the test is determined instead, the null hypothesis is generally not rejected if $p$ is greater than the specified significance level $\alpha$.

The test was published in 1965 by the American Samuel Shapiro and the Canadian Martin Wilk. It grew out of their idea of condensing the graphical information of a normal probability plot into a single summary statistic.

The test can be used to check univariate samples with 3 to 5000 observations. A further development of the test, Royston's H test, makes it possible to check multivariate samples for multivariate normality.

Compared with other well-known normality tests such as the Kolmogorov-Smirnov test or the chi-square test, the Shapiro-Wilk test is characterized by comparatively high power (test strength) in many test situations, especially for small samples ($n < 50$).

The Shapiro-Wilk test, or modifications of it such as the Ryan-Joiner test, are implemented in common commercial and non-commercial statistical software packages.

Properties

Preliminary test for other procedures

Some inferential methods (such as analysis of variance, the t-test, or linear regression) assume, at least for small sample sizes ($n < 30$), that the prediction errors (residuals) come from a normally distributed population. The Shapiro-Wilk test can therefore also serve as a preliminary test before such procedures are applied.

Not a general goodness-of-fit test

While some normality tests, such as the Kolmogorov-Smirnov test or the chi-square test, are general goodness-of-fit tests that can test a sample against various hypothetical distributions (including the normal distribution), the Shapiro-Wilk test is designed solely for the normal distribution. Whereas general goodness-of-fit tests usually require at least 50 to 100 observations to give meaningful results, the Shapiro-Wilk test often needs fewer.

Omnibus test

The Shapiro-Wilk test is an omnibus test, i.e. it can only determine whether or not there is a significant deviation from the normal distribution. It cannot describe the form of the deviation; for example, it says nothing about whether the distribution is skewed to the left or right, whether it is heavy-tailed, or both.

Sample size up to 5000 observations

Originally the test could only be applied to samples of size $3 \leq n < 50$. In 1972, an extension by Shapiro and Francia made it possible to use the test for samples of size $n < 100$. Further adjustments subsequently widened the range of application: Royston introduced another improvement in 1982 that made samples of size $n < 2000$ possible, and in 1997 Rahman and Govindarajulu extended the test to samples of size $n \leq 5000$.

High power

In general, the power of all normality tests is lower for small sample sizes than for larger ones, because the standard error is relatively large for small samples. Only as the sample size increases does the standard error decrease and the power increase. Compared with other tests, the Shapiro-Wilk test has relatively high power even for small samples ($n < 50$). For example, with a sample of 20 observations from a chi-square distribution, the Shapiro-Wilk test has a power of 54%, compared with 29% for D'Agostino's 1970 test.
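The dependence of power on sample size can be explored by simulation. The sketch below is a hedged illustration, not the procedure behind the 54% and 29% figures: it swaps in the Shapiro-Francia-type statistic W' (the squared correlation between the sorted sample and Blom's approximate normal scores) for the exact Shapiro-Wilk W, estimates the critical value by Monte Carlo, and assumes a chi-square distribution with 1 degree of freedom as the alternative, because the source does not state the degrees of freedom. The helper names (`blom_scores`, `w_prime`, `estimate_power`, `chi_square_1`) are hypothetical.

```python
import random
from statistics import NormalDist


def blom_scores(n):
    """Approximate expected standard normal order statistics (Blom's formula)."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def w_prime(sample, m):
    """Shapiro-Francia-type W': squared correlation of the sorted sample with m."""
    x = sorted(sample)
    xbar = sum(x) / len(x)
    mbar = sum(m) / len(m)
    sxm = sum((xi - xbar) * (mi - mbar) for xi, mi in zip(x, m))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    smm = sum((mi - mbar) ** 2 for mi in m)
    return sxm * sxm / (sxx * smm)


def estimate_power(n, draw, alpha=0.05, n_null=2000, n_alt=1000, seed=1):
    """Share of samples from `draw` rejected at level alpha (simulated critical value)."""
    rng = random.Random(seed)
    m = blom_scores(n)
    # Null distribution of W' from standard normal samples of size n.
    null = sorted(
        w_prime([rng.gauss(0.0, 1.0) for _ in range(n)], m) for _ in range(n_null)
    )
    crit = null[int(alpha * n_null)]  # lower alpha-quantile; small W' speaks against H0
    rejected = sum(
        w_prime([draw(rng) for _ in range(n)], m) < crit for _ in range(n_alt)
    )
    return rejected / n_alt


def chi_square_1(rng):
    """Assumed alternative: chi-square with 1 d.f. (square of a N(0, 1) draw)."""
    z = rng.gauss(0.0, 1.0)
    return z * z


power_10 = estimate_power(10, chi_square_1)
power_20 = estimate_power(20, chi_square_1)
print(f"estimated power: n=10 -> {power_10:.2f}, n=20 -> {power_20:.2f}")
```

Raising `n_null` and `n_alt` reduces simulation noise; the exact estimates depend on the seed and on the assumed alternative.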

Functionality

The test statistic $W$ is a quotient that expresses the ratio of two variance estimators:

$$W = \frac{b^{2}}{(n-1)\, s^{2}}$$

The numerator contains a first estimator for what the variance of the sample would have to be if it came from a normally distributed population; the denominator contains a second estimator for the actual variance of the sample. If the population is indeed normally distributed, both estimators should arrive at roughly the same result independently of one another. The smaller the difference between the two estimates, the more plausible it is that the population is actually normally distributed.

The Shapiro-Wilk test is thus based on an analysis of variance of the sample, as the title of the original publication, An Analysis of Variance Test for Normality (Complete Samples), makes clear.

The variance estimator in the denominator is the usual corrected sample variance $s^2$:

$$s^{2} = \frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \overline{x}\right)^{2}$$

The variance expected in the numerator for a sample from a normally distributed population (i.e. assuming $H_0$ is true) is estimated by the least-squares method as the slope of the regression line in the Q-Q plot, which plots the ordered observations of the sample against the corresponding expected order statistics of a standard normal distribution.

Compared with the ordinary linear model $y_i = \alpha + \beta x_i + \varepsilon_i$, the ordered observations take the role of the response, the expected normal order statistics take the role of the regressor, $\mu$ corresponds to the intercept $\alpha$ and $\sigma$ to the slope $\beta$:

$$x_i = \mu + \sigma m_i + \varepsilon_i$$

where

$\sigma$ is the slope of the regression line; its estimate enters the numerator of the test statistic as $b$
$\mu$ is the intercept with the vertical axis and corresponds to the mean
$m_i$ are the expected order statistics of a standard normal distribution
$x_i$ are the order statistics of the sample
$\varepsilon_i$ is the error term, which represents unobservable influences
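To make the regression view concrete, here is a minimal sketch that fits the line by ordinary least squares on the Q-Q plot of the ten observations from the practical example. Two simplifications are assumed: the expected order statistics $m_i$ are replaced by Blom's approximation, and the covariance of the order statistics is ignored, so the slope only approximates the quantity that enters the numerator of $W$. The function name `qq_regression` is hypothetical.

```python
from statistics import NormalDist

# The ten observations from the practical example.
sample = [200, 545, 290, 165, 190, 355, 185, 205, 175, 255]


def qq_regression(data):
    """OLS fit of x_(i) = mu + sigma * m_i + e_i on the normal Q-Q plot."""
    x = sorted(data)  # order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    nd = NormalDist()
    # Blom approximation of the expected standard normal order statistics m_i.
    m = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    xbar = sum(x) / n
    mbar = sum(m) / n  # essentially 0 because the m_i are symmetric about 0
    sxm = sum((mi - mbar) * (xi - xbar) for mi, xi in zip(m, x))
    smm = sum((mi - mbar) ** 2 for mi in m)
    slope = sxm / smm  # estimate of sigma
    intercept = xbar - slope * mbar  # estimate of mu
    return intercept, slope


mu_hat, sigma_hat = qq_regression(sample)
print(f"intercept (mu) = {mu_hat:.2f}, slope (sigma) = {sigma_hat:.2f}")
```

For this right-skewed sample the slope (about 108) is noticeably smaller than the sample standard deviation $s = 117.59$; that gap between the two variance estimates is what $W$ measures.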

This approach distinguishes the test from methods such as the Jarque-Bera test, which checks how closely the sample matches specific shape properties of the normal distribution as characterized by its moments, namely skewness and kurtosis.

Requirements

The observations $x_1, x_2, \dots, x_n$ of the sample must be independent of one another.
The sample size must be at least $n = 3$ and at most $n = 5000$.
Values should not occur more than once in the sample (ties). If they do, it is very unlikely that the data come from a continuous distribution at all. On the other hand, values in practice are often rounded; strictly speaking this argues against a normal distribution, yet such data can often be treated as if they were normally distributed. Many other tests are less sensitive to ties.
The random variable must be measured on a metric (interval or ratio) scale.

Calculation of the test statistic

The test examines the hypothesis that a sample was drawn from a normally distributed population by comparing the test statistic $W$ with a critical value that delimits the rejection region (taken from the distribution of the test statistic under $H_0$).

Establishing the hypotheses and determining the level of significance

The null hypothesis $H_0$ is established, which states that the population distribution is normal, together with the alternative hypothesis $H_1$, which states that it is not. At the same time, a significance level $\alpha$ is chosen, usually $\alpha = 5\,\%$.

$$H_0: F = F_0 \quad \text{and} \quad H_1: F \neq F_0$$

where $F$ denotes the distribution function of the population and $F_0$ a normal distribution function with unspecified mean and variance.

Creation of the order statistics

All observations of the sample are sorted in increasing order and each value is assigned a rank, so that $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$.

This yields the order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ of the sample with the observed values $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, where $X_{(i)}$ denotes the $i$-th order statistic.

Calculation of the estimators b² and s²

$$W = \frac{b^{2}}{(n-1)\, s^{2}}$$

Here $b$ is a sum over $k$ pairs of order statistics, in which each difference $\left(x_{(n+1-i)} - x_{(i)}\right)$ is multiplied by a corresponding coefficient $a_{(i)}$ (also called a weight). If the number of observations $n$ in the sample is even, $k = n/2$; if it is odd, $k = (n-1)/2$, and the middle observation does not enter $b$. Thus:

$$b = \sum_{i=1}^{k} a_{(i)} \left(x_{(n+1-i)} - x_{(i)}\right) = a_{(1)}\left(x_{(n)} - x_{(1)}\right) + a_{(2)}\left(x_{(n-1)} - x_{(2)}\right) + \cdots$$

where the coefficients $a_{(i)}$ are given by the components of the vector

$$a = \left(m^{\top} V^{-1} V^{-1} m\right)^{-1/2}\, m^{\top} V^{-1}$$

with $m_{(i)}$ denoting the expected order statistics of a standard normal distribution,

$$m = \left(m_{(1)}, \dots, m_{(n)}\right)^{\top},$$

where $m_{(i)}$ is approximately equal to (Blom's approximation)

$$m_{(i)} \approx \Phi^{-1}\left(\frac{i - \frac{3}{8}}{n + \frac{1}{4}}\right).$$

Here $\Phi^{-1}$ is the quantile function, i.e. the inverse of the distribution function of the standard normal distribution,

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^{2}/2}\, dt,$$

evaluated at the plotting position $\left(i - \frac{3}{8}\right) / \left(n + \frac{1}{4}\right)$. Despite the similar name, this is unrelated to the inverse Gaussian ("inverse normal") distribution with shape parameter $\lambda > 0$, mean $\mu > 0$ and density

$$f(x; \mu, \lambda) = \left(\frac{\lambda}{2\pi x^{3}}\right)^{1/2} \exp\left(\frac{-\lambda (x-\mu)^{2}}{2\mu^{2} x}\right).$$
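Blom's approximation for $m_{(i)}$ is straightforward to evaluate with the standard normal quantile function. A minimal sketch using Python's `statistics.NormalDist.inv_cdf` as $\Phi^{-1}$; the function name `blom_scores` is hypothetical. For $n = 10$ the largest score comes out at roughly 1.547, close to the exact expected maximum of ten standard normal draws (about 1.539).

```python
from statistics import NormalDist


def blom_scores(n):
    """Approximate m_(i) = E[Z_(i)] for a standard normal sample of size n."""
    phi_inv = NormalDist().inv_cdf  # quantile function Phi^{-1}
    return [phi_inv((i - 3 / 8) / (n + 1 / 4)) for i in range(1, n + 1)]


m = blom_scores(10)
print([round(v, 4) for v in m])
```

The scores are increasing and symmetric about zero, because the plotting positions for $i$ and $n + 1 - i$ add up to 1.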

Finally, $V$ in the formula for $a$ is the covariance matrix of the order statistics $Z_{(1)}, \dots, Z_{(n)}$ of a standard normal sample, whose expected values are the $m_{(i)}$:

$$V = \begin{pmatrix} \operatorname{Cov}(Z_{(1)}, Z_{(1)}) & \cdots & \operatorname{Cov}(Z_{(1)}, Z_{(n)}) \\ \vdots & \ddots & \vdots \\ \operatorname{Cov}(Z_{(n)}, Z_{(1)}) & \cdots & \operatorname{Cov}(Z_{(n)}, Z_{(n)}) \end{pmatrix}$$

The coefficients $a_{(1)}, \dots, a_{(n)}$ are tabulated in many statistics books for sample sizes up to $n = 50$.
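When no table is at hand, a common shortcut is the Shapiro-Francia simplification, which replaces $V$ by the identity matrix so that $a = m / \lVert m \rVert$. The sketch below, assuming Blom's approximation for $m$, compares these approximate weights with the tabulated Shapiro-Wilk weights for $n = 10$ used in the practical example. The function name is hypothetical, and the two sets of weights are similar but not identical.

```python
from statistics import NormalDist

# Tabulated Shapiro-Wilk weights a_(1), ..., a_(5) for n = 10 (practical example).
TABLE_N10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]


def shapiro_francia_weights(n):
    """Approximate weight vector a = m / ||m|| (V replaced by the identity matrix)."""
    nd = NormalDist()
    m = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    norm = sum(v * v for v in m) ** 0.5
    return [v / norm for v in m]


a = shapiro_francia_weights(10)
# a_(1) multiplies (x_(n) - x_(1)), so it corresponds to the last vector component.
approx_pairs = [round(a[-1 - i], 4) for i in range(5)]
print("approximate:", approx_pairs)
print("tabulated:  ", TABLE_N10)
```

Ignoring the covariances shrinks the weights on the outermost pairs somewhat; for exact Shapiro-Wilk values, the tables or a statistics package remain the reference.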

The variance $s^2$ and the mean $\overline{x}$ of the sample are calculated as

$$s^{2} = \frac{\sum_{i=1}^{n} \left(x_i - \overline{x}\right)^{2}}{n-1} \quad \text{with} \quad \overline{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Comparison of the test statistic with a critical value

The value of the test statistic $W$ is compared with a critical value $W_{\text{critical}}$ for the given sample size $n$ and the previously chosen significance level $\alpha$. Critical values for $n < 50$ are tabulated in many statistics books; critical values for samples with $n > 50$ can be determined by Monte Carlo simulation.
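Monte Carlo estimation of a critical value amounts to simulating many samples under $H_0$ and taking the lower $\alpha$-quantile of the simulated statistics. A hedged sketch, assuming the Shapiro-Francia-type statistic W' (squared correlation with Blom scores) in place of the exact $W$, so its critical value for $n = 10$ will be close to, but not the same as, the tabulated $W_{\text{critical}} = 0.842$. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def w_prime(sample, m):
    """Squared correlation between the sorted sample and the normal scores m."""
    x = sorted(sample)
    xbar = sum(x) / len(x)
    mbar = sum(m) / len(m)
    sxm = sum((xi - xbar) * (mi - mbar) for xi, mi in zip(x, m))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    smm = sum((mi - mbar) ** 2 for mi in m)
    return sxm * sxm / (sxx * smm)


def critical_value(n, alpha=0.05, n_sim=5000, seed=2024):
    """Lower alpha-quantile of W' under H0, estimated from simulated normal samples."""
    rng = random.Random(seed)
    nd = NormalDist()
    m = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    sims = sorted(
        w_prime([rng.gauss(0.0, 1.0) for _ in range(n)], m) for _ in range(n_sim)
    )
    return sims[int(alpha * n_sim)]


crit_05 = critical_value(10, alpha=0.05)
crit_01 = critical_value(10, alpha=0.01)
print(f"n=10: W'_crit(5%) ~ {crit_05:.3f}, W'_crit(1%) ~ {crit_01:.3f}")
```

Because the same seed is used, both quantiles come from one set of simulated samples, which guarantees that the 1% value is not above the 5% value.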

Assessment of the results

If the value of the test statistic $W$ is greater than the critical value $W_{\text{critical}}$, the null hypothesis is not rejected, i.e. a normal distribution is assumed. The test statistic $W$ can be interpreted as a squared correlation coefficient that takes values between 0 and 1, similar to the coefficient of determination. The closer $W$ is to 1, the less the actual variance deviates from the hypothetical variance under normality. If, however, there are statistically significant deviations, i.e. $W$ is smaller than the critical value $W_{\text{critical}}$, the null hypothesis is rejected in favor of the alternative hypothesis and it is assumed that the distribution is not normal. In this respect the Shapiro-Wilk test differs from many other normality tests, which reject the null hypothesis when their test statistic is greater than the critical value.
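The correlation interpretation can be checked directly. This sketch, which again assumes the Shapiro-Francia variant W' with Blom scores rather than the exact Shapiro-Wilk weights, computes the squared Pearson correlation between the sorted sample and the normal scores. A sample lying exactly on a normal Q-Q line yields 1, while the ten observations from the practical example give about 0.75. The function names are hypothetical.

```python
from statistics import NormalDist


def pearson_r(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def w_prime(data):
    """W' = squared correlation of the order statistics with Blom normal scores."""
    x = sorted(data)
    n = len(x)
    nd = NormalDist()
    m = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return pearson_r(x, m) ** 2


sample = [200, 545, 290, 165, 190, 355, 185, 205, 175, 255]
# A sample placed exactly on a normal Q-Q line (mean 50, standard deviation 10).
ideal = [NormalDist(50, 10).inv_cdf((i - 0.375) / 10.25) for i in range(1, 11)]
print(f"practical example: {w_prime(sample):.3f}, ideal sample: {w_prime(ideal):.3f}")
```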

Evaluation using p-value

In addition to, or instead of, the test statistic $W$, many computer programs report the $p$-value.

The $p$-value is the probability of obtaining a value of $W$ at least as small as the observed one, assuming that the sample actually comes from a normally distributed population (i.e. the null hypothesis is true).

The smaller the $p$-value, the less plausible it is that such a sample would arise from a normally distributed population.
A $p$-value close to 0 means that a value of $W$ this small would almost never occur under normality; a $p$-value close to 1 means that it would be entirely unremarkable.
As a rule, the null hypothesis is rejected if the $p$-value is less than the chosen significance level $\alpha$.

The method for calculating the $p$-value depends on the sample size $n$. The exact probability distribution of $W$ is known for $n = 3$. For samples with $n > 3$, $W$ is transformed so that it is approximately normally distributed.

The parameters $\sigma$, $\lambda$ and $\mu$ of this transformation for the respective sample sizes $n > 3$ were determined by Monte Carlo simulation.
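Where no published transformation is at hand, a $p$-value can also be approximated by brute-force simulation. The sketch below does not implement the transformation described above; it is a substitute under stated assumptions that simulates the Shapiro-Francia-type W' under $H_0$ and reports the share of simulated values at least as small as the observed one (with the usual +1 correction). The function names are hypothetical; for the practical example the estimate falls well below 0.05.

```python
import random
from statistics import NormalDist


def w_prime(sample, m):
    """Squared correlation between the sorted sample and the normal scores m."""
    x = sorted(sample)
    xbar = sum(x) / len(x)
    mbar = sum(m) / len(m)
    sxm = sum((xi - xbar) * (mi - mbar) for xi, mi in zip(x, m))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    smm = sum((mi - mbar) ** 2 for mi in m)
    return sxm * sxm / (sxx * smm)


def monte_carlo_p_value(data, n_sim=5000, seed=7):
    """Return (observed W', simulated p-value) for H0: the data are normal."""
    n = len(data)
    nd = NormalDist()
    m = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    observed = w_prime(data, m)
    rng = random.Random(seed)
    # Small W' values speak against normality, so count simulated W' <= observed.
    as_small = sum(
        w_prime([rng.gauss(0.0, 1.0) for _ in range(n)], m) <= observed
        for _ in range(n_sim)
    )
    return observed, (as_small + 1) / (n_sim + 1)


sample = [200, 545, 290, 165, 190, 355, 185, 205, 175, 255]
w_obs, p = monte_carlo_p_value(sample)
print(f"W' = {w_obs:.3f}, p ~ {p:.4f}")
```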

Practical example

The following $n = 10$ observations of a sample are checked for normality:

200, 545, 290, 165, 190, 355, 185, 205, 175, 255

The ordered sample is:

165, 175, 185, 190, 200, 205, 255, 290, 355, 545

The sample size $n = 10$ is even, so $k = n/2 = 5$ pairs of numbers are formed. The corresponding weights $a_{(i)}$ are taken from a table.

$$b = 0.5739\,(545-165) + 0.3291\,(355-175) + 0.2141\,(290-185) + 0.1224\,(255-190) + 0.0399\,(205-200) = 218.08 + 59.24 + 22.48 + 7.96 + 0.20 = 307.96$$

For this sample, $s = 117.59$. Hence

$$W = \frac{307.96^{2}}{(10-1) \cdot 117.59^{2}} = 0.76.$$

The critical value for $n = 10$ at a significance level of $\alpha = 5\,\%$ is taken from a table: $W_{\text{critical}} = 0.842$.

Since $W \leq W_{\text{critical}}$ ($0.76 < 0.842$), $W$ falls within the rejection region and the null hypothesis is rejected. Consequently, it is assumed that the sample does not come from a normally distributed population. The distribution of the test statistic $W$ is strongly skewed to the left, and the rejection region of the test lies in its lower tail.
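The arithmetic of the example can be reproduced in a few lines. This sketch hard-codes the tabulated weights and critical value quoted above for $n = 10$ and $\alpha = 5\,\%$, so it is valid only for that sample size.

```python
from math import sqrt

sample = [200, 545, 290, 165, 190, 355, 185, 205, 175, 255]
# Tabulated Shapiro-Wilk weights a_(1), ..., a_(5) and critical value for n = 10, alpha = 5 %.
weights = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]
W_CRITICAL = 0.842

x = sorted(sample)  # order statistics
n = len(x)
k = n // 2  # k = n/2 for even n, (n-1)/2 for odd n (the median gets no weight)
b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(weights[:k]))
mean = sum(x) / n
s = sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))  # corrected sample standard deviation
W = b ** 2 / ((n - 1) * s ** 2)
reject = W <= W_CRITICAL  # small W falls in the rejection region

print(f"b = {b:.2f}, s = {s:.2f}, W = {W:.3f}, reject H0: {reject}")
# prints: b = 307.96, s = 117.59, W = 0.762, reject H0: True
```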

Advantages and disadvantages

Advantages

Compared with a rather subjective visual check for normality using a histogram or a Q-Q plot, the Shapiro-Wilk test, as a significance test, provides a more objective criterion.
In many test situations, the test has high power, especially for small samples ($n < 50$).
The mean and variance of the hypothetical normal distribution need not be known beforehand.
Many popular statistics software packages such as SAS, SPSS, Minitab and R have implemented the test.

Disadvantages

By choosing $\alpha$ very small, the test can be made to "confirm" normality for almost any distribution. Normally, the null hypothesis is the opposite of the statement one wants to demonstrate (in analysis of variance, for example, the null hypothesis is the equality of the group means, whereas one usually wants to show a difference), and the choice of $\alpha$ determines with how much certainty the actual statement is established. In the Shapiro-Wilk test, however, the statement one usually hopes to establish, namely normality, is the null hypothesis, which turns this test logic upside down. The smaller $\alpha$ is set to gain apparent certainty, the greater the probability of a type II error, i.e. of wrongly assuming a normal distribution.

It is a fundamental mistake to conclude from the non-rejection of the null hypothesis that it is true.
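The trade-off between $\alpha$ and the type II error can be illustrated by simulation. The sketch below is a hedged illustration under stated assumptions: it uses the Shapiro-Francia-type W' instead of the exact $W$, simulated critical values, samples of size $n = 10$ from an exponential distribution as the (non-normal) truth, and hypothetical function names. Because a smaller $\alpha$ moves the critical value down, the estimated power can only fall, and the type II error can only rise, as $\alpha$ shrinks.

```python
import random
from statistics import NormalDist


def w_prime(sample, m):
    """Squared correlation between the sorted sample and the normal scores m."""
    x = sorted(sample)
    xbar = sum(x) / len(x)
    mbar = sum(m) / len(m)
    sxm = sum((xi - xbar) * (mi - mbar) for xi, mi in zip(x, m))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    smm = sum((mi - mbar) ** 2 for mi in m)
    return sxm * sxm / (sxx * smm)


def rejection_rates(n=10, alphas=(0.05, 0.01, 0.001), n_null=5000, n_alt=2000, seed=3):
    """Estimated power against exponential data for several significance levels."""
    rng = random.Random(seed)
    nd = NormalDist()
    m = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    null = sorted(
        w_prime([rng.gauss(0.0, 1.0) for _ in range(n)], m) for _ in range(n_null)
    )
    # Assumed non-normal truth: exponential distribution with rate 1.
    alt = [w_prime([rng.expovariate(1.0) for _ in range(n)], m) for _ in range(n_alt)]
    rates = {}
    for alpha in alphas:
        crit = null[int(alpha * n_null)]  # simulated lower alpha-quantile of W'
        rates[alpha] = sum(w < crit for w in alt) / n_alt
    return rates


for alpha, power in rejection_rates().items():
    print(f"alpha = {alpha}: power ~ {power:.2f}, type II error ~ {1 - power:.2f}")
```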

The test can only be used for sample sizes $3 \leq n \leq 5000$.
The test is very sensitive to outliers, both one-sided and two-sided. Outliers can strongly distort the shape of the distribution, so that the normality assumption may be wrongly rejected.
The test is relatively susceptible to ties: if there are many identical values, its power is severely affected. If the data were originally rounded, the power can be improved with Sheppard's correction, which produces an adjusted $W$ given by

$$W_{\text{adapted}} = W \cdot \frac{\sum_{i=1}^{n} \left(x_{(i)} - \overline{x}\right)^{2}}{\sum_{i=1}^{n} \left(x_{(i)} - \overline{x}\right)^{2} - \frac{n-1}{12}\, \omega^{2}}$$

where $\omega$ is the rounding interval (rounding difference).

The way the test works is mathematically demanding and therefore not easy to understand.
The test requires special coefficients, the weights, which are tabulated only for small sample sizes ($n < 50$).
When calculating the test statistics and the critical values without a computer program, the computational effort is very high for larger sample sizes.
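Sheppard's correction for rounded data, given in the list above, is a one-line adjustment once $W$ is known. A minimal sketch; the function name is hypothetical, and treating the practical example's values as rounded to multiples of 5 ($\omega = 5$) is an assumption made only for illustration.

```python
def sheppard_adjusted_w(w, data, omega):
    """Apply Sheppard's correction to W for data rounded to multiples of omega."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((xi - mean) ** 2 for xi in data)  # sum of squared deviations
    correction = (n - 1) / 12 * omega ** 2  # (n - 1) times the rounding variance omega^2 / 12
    return w * ss / (ss - correction)


sample = [200, 545, 290, 165, 190, 355, 185, 205, 175, 255]
w_raw = 0.762  # W from the practical example
w_adj = sheppard_adjusted_w(w_raw, sample, omega=5)
print(f"W = {w_raw}, W_adapted = {w_adj:.5f}")
```

With a rounding interval this small relative to the spread of the data, the correction barely changes $W$; it matters when $\omega$ is large compared with the standard deviation.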

Alternative procedures

Other significance tests

In addition to the Shapiro-Wilk test, there are at least 40 other normality tests or modifications of individual tests.

Normality tests compare, in one way or another, characteristic features of the standard normal distribution, which serves as a yardstick, with the distribution of the sample. The tests differ in which features they use as the comparison criterion.

While the Shapiro-Wilk test uses regression and correlation techniques and analyzes the correlation in terms of variance, other methods are based on examining the distribution function (e.g. the Kolmogorov-Smirnov test, the Anderson-Darling test, and the Cramér-von Mises test).

Further tests focus on comparing skewness and kurtosis (e.g. the D'Agostino-Pearson test, the Jarque-Bera test, and the Anscombe-Glynn test).

The power of any normality test varies with the sample size, the actual distribution, and other factors such as outliers and ties. No single test has the highest power in all situations.

Graphic methods

Histograms and normal probability plots such as the Q-Q plot or the P-P plot are often used to check visually for normality and can either support or call into question the result of a significance test.

References

↑ J. P. Royston: An extension of Shapiro and Wilk's W test for normality to large samples. In: Journal of the Royal Statistical Society. Series C (Applied Statistics). 31, No. 2, 1982, pp. 115-124. JSTOR 2347973. doi:10.2307/2347973.
↑ Rahman and Govindarajulu: A modification of the test of Shapiro and Wilk for normality. In: Journal of Applied Statistics. 24, No. 2, 1997, pp. 219-236. doi:10.1080/02664769723828.
↑ Edith Seier: Comparison of Tests for Univariate Normality, Department of Mathematics, East Tennessee State University, 2002. http://interstat.statjournals.net/YEAR/2002/articles/0201001.pdf
↑ Berna Yazici, Senay Yolacan: A comparison of various tests of normality, Journal of Statistical Computation and Simulation, 77, No. 2, 2007, pp. 175-183, doi:10.1080/10629360600678310.

Literature

Sam S. Shapiro, Martin Bradbury Wilk: An analysis of variance test for normality (complete samples), Biometrika, 52 (3/4), 1965, pp. 591-611, doi:10.1093/biomet/52.3-4.591, JSTOR 2333709.
DG Rees: Essential Statistics, Chapman & Hall, 2000
Berna Yazici, Senay Yolacan: A comparison of various tests of normality, Journal of Statistical Computation and Simulation, 77 (2), 2007, pp. 175-183, doi:10.1080/10629360600678310.
Edith Seier: Comparison of Tests for Univariate Normality, Department of Mathematics, East Tennessee State University, 2002
Manfred Precht, Roland Kraft, Martin Bachmaier: Applied Statistics, Oldenbourg, 2005
JR Leslie, MA Stephens and Fotopoulos: Asymptotic Distribution of the Shapiro-Wilk W for Testing Normality, The Annals of Statistics, 14 (4), pp. 1497-1506, 1986, doi:10.1214/aos/1176350172, JSTOR 2241484.

Web links

Wikibooks: Shapiro-Wilk test with R - learning and teaching materials
