Quantile-quantile diagram

from Wikipedia, the free encyclopedia

A quantile-quantile diagram (also quantile-quantile plot; QQ diagram or QQ plot for short) is an exploratory graphical tool in which the quantiles of two statistical variables are plotted against each other in order to compare their distributions.

A PP diagram, or probability-probability plot, is an exploratory graphical tool in which the distribution functions of two statistical variables are plotted against each other in order to compare their distributions.

QQ diagram

Comparison of the distribution of two statistical features

The observed values of the two variables whose distributions are to be compared are each sorted by size. The ordered values are then combined into pairs and plotted in a coordinate system. If the points lie (approximately) on a straight line, one can assume that the two variables are based on the same distribution. The procedure becomes problematic when the two variables have different numbers of observations. In that case, the quantiles have to be obtained by interpolation.
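The pairing step, including the unequal-sample-size case, could be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names `quantile` and `qq_pairs` are hypothetical, and linear interpolation between neighbouring order statistics is only one of several possible interpolation rules.

```python
def quantile(sorted_values, p):
    """Linearly interpolated quantile of already sorted data, for 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def qq_pairs(sample_a, sample_b):
    """Return (quantile of a, quantile of b) pairs for a two-sample QQ diagram."""
    a, b = sorted(sample_a), sorted(sample_b)
    if len(a) == len(b):
        # Equal sizes: the sorted values can be paired directly.
        return list(zip(a, b))
    # Unequal sizes (assumption): evaluate both samples at the same
    # probabilities, using as many points as the smaller sample has.
    n = min(len(a), len(b))
    probabilities = [i / (n - 1) for i in range(n)] if n > 1 else [0.5]
    return [(quantile(a, p), quantile(b, p)) for p in probabilities]


# Three observations against five: the larger sample is interpolated.
pairs = qq_pairs([3, 1, 2], [50, 10, 30, 20, 40])
```

Plotting the resulting pairs as points then gives the QQ diagram; points close to a straight line suggest similar distributions.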

An example uses approximately 110 warships at the outbreak of the Second World War, for which the variables length and width were recorded. The scatter plot shows two distinct groups that clearly stand out as clusters. The data for the quantile-quantile diagram were standardized to make them comparable. The split of the data into two clusters shows up as a gap in the sequence of points. For the lower-left cluster, the type of distribution appears to be the same for both variables. For the second cluster at the top right, the width tends to be larger compared with the first cluster. The "bulge" of the plot shows that the distributions of length and width are unequal.

Scatter plot of the variables length and width
QQ diagram of the variables length and width

Checking the distribution of a feature

QQ diagram with large deviations between the distributions
QQ plot of the width of warships compared to the normal distribution
Trend-adjusted QQ plot of the width of warships compared to the normal distribution

The observed values of a variable are sorted by size. They are compared with the quantiles of the theoretical distribution that belong to the corresponding cumulative proportions. If the values come from the comparison distribution, the empirical and theoretical quantiles approximately agree, i.e., the points lie on a diagonal.

Large systematic deviations from this diagonal indicate that the theoretical and empirical distributions differ from one another. However, the quantile-quantile diagram cannot replace a distribution test.

Formal definition

For each observation $x_i$, an empirical cumulative proportion $p_i$ (the proportion of values falling at or below it) is determined. Using the inverse distribution function (or quantile function) $F^{-1}$ of the theoretical distribution, the quantile

$z_i = F^{-1}(p_i)$

is calculated. Then $x_i$ is plotted against $z_i$.
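As a rough sketch of this definition, the following computes the points of a QQ diagram against the normal distribution. Several choices here are assumptions made for illustration only: the function name `normal_qq_points` is hypothetical, the cumulative proportion uses the Rankit rule (rank - 1/2) / n, and the reference distribution is a normal distribution fitted with the sample mean and sample standard deviation.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(observations):
    """Return (theoretical quantile, observed value) pairs for a normal QQ diagram."""
    xs = sorted(observations)
    n = len(xs)
    # Assumption: compare against a normal distribution fitted to the data.
    reference = NormalDist(mean(xs), stdev(xs))
    points = []
    for rank, x in enumerate(xs, start=1):
        p = (rank - 0.5) / n          # empirical cumulative proportion (Rankit)
        z = reference.inv_cdf(p)      # theoretical quantile for that proportion
        points.append((z, x))
    return points


points = normal_qq_points([4.1, 5.0, 3.7, 6.2, 4.8])
```

Which quantity goes on which axis is a convention; here the theoretical quantile comes first in each pair.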

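The cumulative proportion $p_i$ is calculated from the rank $r_i$ of the observation, where $n$ is the number of observations:

Method             Formula for $p_i$
Blom               $(r_i - 3/8) / (n + 1/4)$
Rankit             $(r_i - 1/2) / n$
Tukey              $(r_i - 1/3) / (n + 1/3)$
Van der Waerden    $r_i / (n + 1)$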
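The four methods are commonly defined as variants of one pattern, (rank - a) / (n + b), that differ only in the constants a and b. The sketch below encodes those commonly cited constants; the function name `cumulative_proportion` and the method keys are hypothetical, chosen for this illustration.

```python
def cumulative_proportion(rank, n, method="blom"):
    """Empirical cumulative proportion for a 1-based rank among n observations."""
    # (a, b) constants of the pattern (rank - a) / (n + b), as commonly cited.
    constants = {
        "blom": (3 / 8, 1 / 4),
        "rankit": (1 / 2, 0),
        "tukey": (1 / 3, 1 / 3),
        "van_der_waerden": (0, 1),
    }
    a, b = constants[method]
    return (rank - a) / (n + b)


# Proportions for the smallest of ten observations under each method.
smallest = {
    name: cumulative_proportion(1, 10, name)
    for name in ("blom", "rankit", "tukey", "van_der_waerden")
}
```

All four rules keep the proportions strictly between 0 and 1, which matters because the quantile function of an unbounded distribution such as the normal is not finite at 0 or 1.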

Trend-adjusted QQ diagram

In the trend-adjusted quantile-quantile diagram, the differences $x_i - z_i$ between the empirical quantiles $x_i$ and the theoretical quantiles $z_i$ are plotted instead of the values $x_i$ themselves. If the empirical and theoretical distributions match, all points lie on the horizontal zero line, so any deviations from it stem only from differences between the two distributions. In the ordinary quantile-quantile plot, the points always run from bottom left to top right, i.e., deviations between the theoretical and empirical distribution are shown relative to the whole range of values of the two distributions. The trend-adjusted QQ diagram therefore offers a better view of the structure of the deviations than the QQ diagram.
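A minimal sketch of the trend adjustment, under stated assumptions: the function name `detrended_qq_points` is hypothetical, the theoretical distribution is passed in as a quantile function, and the Rankit rule (rank - 1/2) / n is used for the cumulative proportion.

```python
from statistics import NormalDist


def detrended_qq_points(observations, quantile_function):
    """Return (theoretical quantile, observed minus theoretical) pairs."""
    xs = sorted(observations)
    n = len(xs)
    points = []
    for rank, x in enumerate(xs, start=1):
        z = quantile_function((rank - 0.5) / n)  # theoretical quantile
        points.append((z, x - z))                # deviation from the diagonal
    return points


# Compare a small sample with the standard normal distribution.
standard_normal = NormalDist()
points = detrended_qq_points([-1.2, -0.4, 0.1, 0.5, 1.6], standard_normal.inv_cdf)
```

If the data followed the theoretical distribution exactly, every second component would be zero; a constant offset would show up as a horizontal band away from zero.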

PP diagram

PP diagram of the width of warships compared to the normal distribution
Trend-adjusted PP diagram of the width of warships compared to the normal distribution

Checking the distribution of a feature

For the observed values $x_i$, the cumulative proportions $p_i$ are calculated according to Blom or one of the other methods. For the distribution to be compared, the observed values are inserted into the theoretical cumulative distribution function $F$, which yields the theoretical cumulative proportion $F(x_i)$. If the values come from the comparison distribution, $p_i$ and $F(x_i)$ approximately agree, i.e., the points lie on a diagonal.

In contrast to the QQ diagram, the tails of the distribution have less visual impact in the PP diagram. However, the probability-probability plot cannot replace a distribution test.
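The PP construction can be sketched in the same spirit. The assumptions are again only illustrative: `normal_pp_points` is a hypothetical name, the empirical proportion uses the commonly cited Blom rule (rank - 3/8) / (n + 1/4), and the comparison distribution is a normal distribution fitted with the sample mean and sample standard deviation.

```python
from statistics import NormalDist, mean, stdev


def normal_pp_points(observations):
    """Return (theoretical proportion, empirical proportion) pairs for a PP diagram."""
    xs = sorted(observations)
    n = len(xs)
    # Assumption: compare against a normal distribution fitted to the data.
    reference = NormalDist(mean(xs), stdev(xs))
    points = []
    for rank, x in enumerate(xs, start=1):
        p_empirical = (rank - 3 / 8) / (n + 1 / 4)  # Blom rule
        p_theoretical = reference.cdf(x)            # value of the theoretical CDF
        points.append((p_theoretical, p_empirical))
    return points


points = normal_pp_points([4.1, 5.0, 3.7, 6.2, 4.8])
```

Both coordinates are probabilities, so the whole diagram lives in the unit square regardless of the scale of the data.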

Trend-adjusted PP diagram

In the trend-adjusted probability-probability plot, the differences $p_i - F(x_i)$ between the empirical and theoretical cumulative proportions are plotted instead of the values $p_i$ themselves. If the empirical and theoretical distributions match, all points lie on the horizontal zero line. As with the trend-adjusted QQ diagram, this graphic provides a better overview of the deviations.
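A corresponding sketch for the trend-adjusted PP diagram follows. The function name `detrended_pp_points` is hypothetical, the theoretical distribution is supplied as a cumulative distribution function, and the Blom rule is assumed for the empirical proportion.

```python
def detrended_pp_points(observations, cdf):
    """Return (theoretical proportion, empirical minus theoretical) pairs."""
    xs = sorted(observations)
    n = len(xs)
    points = []
    for rank, x in enumerate(xs, start=1):
        p_empirical = (rank - 3 / 8) / (n + 1 / 4)  # Blom rule (assumption)
        p_theoretical = cdf(x)
        points.append((p_theoretical, p_empirical - p_theoretical))
    return points


# Compare a small sample with the uniform distribution on [0, 1],
# whose cumulative distribution function is simply F(x) = x there.
points = detrended_pp_points([0.1, 0.35, 0.6, 0.9], lambda x: x)
```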

Application examples

  • Comparison of an empirical frequency distribution with a theoretical or hypothetical distribution:
    • Graphical inspection of regression residuals for normal distribution
    • Visual check of distributional assumptions before performing a parametric test procedure

Literature

  • Hartung, Joachim, Elpelt, Bärbel, Klösener, Karl-Heinz: Statistics. Munich 2002
  • JM Chambers, WS Cleveland, Beat Kleiner, Paul A. Tukey: Graphical Methods for Data Analysis. Wadsworth, 1983.

References

  1. Peter P. Eckstein: Applied Statistics with SPSS, p. 97