Stratified random sample

Drawing a stratified random sample (also: stratified sampling) can be advantageous in statistics if the population can be divided into meaningful groups, the so-called strata. Meaningful here means that the strata are relatively homogeneous internally and differ from one another as clearly as possible with regard to one or more features that also influence the feature of ultimate interest. Typical stratification variables in samples for social science, medical or market research questions are age groups or population classes by income, educational qualification, place of residence, etc.

The purely random selection of sample elements is restricted by fixing the sample size for each stratum and then drawing a simple random sample within each stratum. (The individual samples are evaluated separately and the results are then combined.) This rules out extreme samples that happen, for example, to consist almost entirely of elements from a single stratum, and consequently yields better point estimates, i.e. estimators with smaller variance. With a suitable stratification, the total sample size can be reduced relative to a simple random sample at the same accuracy of the results, which lowers the cost of data collection.

In Monte Carlo simulations, stratified random sampling can be used as a variance reduction technique. The stratification features (paradata) must be known in advance.
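The variance reduction in a Monte Carlo setting can be illustrated with a small sketch. It assumes a toy problem that is not part of the text above: estimating the mean of $f(U) = U^2$ for $U$ uniform on $[0, 1)$, with equal-width strata and the same number of draws per stratum. The function names `plain_mc` and `stratified_mc` are hypothetical.

```python
import random
import statistics


def plain_mc(f, n, rng):
    """Plain Monte Carlo estimate of E[f(U)] for U ~ Uniform(0, 1)."""
    return sum(f(rng.random()) for _ in range(n)) / n


def stratified_mc(f, n, num_strata, rng):
    """Stratified estimate: split [0, 1) into equal-width strata and
    draw the same number of points uniformly inside each stratum."""
    per_stratum = n // num_strata
    total = 0.0
    for k in range(num_strata):
        lo = k / num_strata
        # Uniform draws inside stratum k, i.e. in [lo, lo + 1/num_strata).
        values = [f(lo + rng.random() / num_strata) for _ in range(per_stratum)]
        stratum_mean = sum(values) / per_stratum
        total += stratum_mean / num_strata  # stratum weight p_k = 1/num_strata
    return total


def square(u):
    return u * u  # the exact mean of U^2 is 1/3


rng = random.Random(0)
# Repeat both estimators many times to compare their spread empirically.
plain = [plain_mc(square, 100, rng) for _ in range(500)]
strat = [stratified_mc(square, 100, 10, rng) for _ in range(500)]
var_plain = statistics.pvariance(plain)
var_strat = statistics.pvariance(strat)
print(f"plain MC variance:      {var_plain:.2e}")
print(f"stratified MC variance: {var_strat:.2e}")
```

Both estimators use 100 function evaluations per run; the stratified one should show a clearly smaller spread because no run can concentrate its draws in one part of the interval.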

Population parameters

$N$: population size, $L$: number of strata, $X$: characteristic of interest. In the following, $i = 1, \dots, L$.

$N_i$: size of stratum $i$.
$X_i$: value of the characteristic in stratum $i$.
$p_i = N_i / N$: relative stratum size.
$\operatorname{E}(X_i) = \mu_i$: expected value in stratum $i$.
$\operatorname{Var}(X_i) = \sigma_i^2$: variance in stratum $i$.

The following applies:

$\mu = \operatorname{E}(X) = \sum_{i=1}^{L} p_i \mu_i; \quad \sigma^2 = \operatorname{Var}(X) = \sum_{i=1}^{L} p_i \sigma_i^2 + \sum_{i=1}^{L} p_i (\mu_i - \mu)^2$.

The total variance is the sum of the variance within the strata and the variance between the strata.
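The decomposition can be checked numerically. The following sketch uses a small made-up population (the stratum names and values are purely illustrative) and compares the population variance with the sum of the weighted within-stratum variances and the between-stratum term.

```python
from statistics import fmean, pvariance

# Hypothetical population, already split into L = 3 strata.
strata = {
    "young": [18, 20, 22, 24],
    "middle": [40, 42, 44, 46, 48, 50],
    "old": [70, 75],
}
population = [x for values in strata.values() for x in values]
N = len(population)
mu = fmean(population)

within = 0.0   # sum_i p_i * sigma_i^2
between = 0.0  # sum_i p_i * (mu_i - mu)^2
for values in strata.values():
    p_i = len(values) / N
    mu_i = fmean(values)
    sigma2_i = pvariance(values)  # population variance inside the stratum
    within += p_i * sigma2_i
    between += p_i * (mu_i - mu) ** 2

total = pvariance(population)
print(f"total = {total:.4f}, within + between = {within + between:.4f}")
```

In this example almost all of the variance sits between the strata, which is the situation in which stratification pays off most.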

Estimates for the population parameters

We focus on estimating $\mu$. Let $n_i$, $i = 1, \dots, L$, be the sample sizes per stratum and $n = \sum_{i=1}^{L} n_i$ the total sample size. Furthermore, let $x_{i1}, \dots, x_{in_i}$ be the sample values from stratum $i$. Then $\hat{\mu}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij} = \overline{X}_i$ is an unbiased estimate for $\mu_i$, and $\hat{\mu} = \sum_{i=1}^{L} p_i \overline{X}_i$ is an unbiased estimate for $\mu$. For comparison with the estimator $\hat{\mu}$ of interest here, the sample mean $\overline{X}$ based on purely random selection is used.
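As a minimal sketch of this estimator, the following code draws a simple random sample within each stratum and combines the stratum means with the weights $p_i = N_i / N$. The population, the stratum labels and the sample sizes are invented for illustration, and `stratified_mean` is a hypothetical helper name.

```python
import random
from statistics import fmean


def stratified_mean(samples, weights):
    """Return mu_hat = sum_i p_i * mean(sample from stratum i)."""
    return sum(weights[name] * fmean(values) for name, values in samples.items())


# Hypothetical population of three strata with known sizes N_i.
rng = random.Random(1)
population = {
    "A": [rng.gauss(10, 1) for _ in range(500)],
    "B": [rng.gauss(20, 1) for _ in range(300)],
    "C": [rng.gauss(40, 1) for _ in range(200)],
}
N = sum(len(values) for values in population.values())
weights = {name: len(values) / N for name, values in population.items()}  # p_i
sizes = {"A": 25, "B": 15, "C": 10}  # n_i, chosen by hand for this example

# Simple random sample without replacement inside each stratum.
samples = {name: rng.sample(values, sizes[name]) for name, values in population.items()}

mu_hat = stratified_mean(samples, weights)
true_mu = fmean([x for values in population.values() for x in values])
print(f"estimate = {mu_hat:.3f}, population mean = {true_mu:.3f}")
```

The weights come from the known stratum sizes, not from the sample, which is why the stratum sizes must be available in advance.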

Types of stratification

• Proportional stratification:

A proportionally stratified random sample is used when the sizes of the samples taken from the various strata are proportional to the proportion of the stratum in the population: each stratum is represented in the sample in the same proportion as in the population.

If one chooses $n_i = p_i n$, $i = 1, \dots, L$, the $n_i$ are proportional to the stratum sizes $N_i$. $\operatorname{Var} \hat{\mu}$ is then considerably smaller than $\operatorname{Var} \overline{X}$ if the expected values $\mu_i$ differ greatly between the strata, i.e. when the variance between the strata is large.

In the case of proportional two-way stratification, the sample sizes in the stratification cells will often not be whole numbers; see also controlled rounding.
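A proportional allocation for a single stratification variable can be sketched as follows. The rounding step is an assumption made for this example: it uses a simple largest-remainder rule so that the rounded sizes still add up to $n$, and it is not the controlled rounding procedure mentioned above. The function name and the stratum sizes are hypothetical.

```python
def proportional_allocation(n, stratum_sizes):
    """Allocate n sample units proportionally to the stratum sizes N_i.

    Non-integer results are rounded with a largest-remainder rule
    (an assumption of this sketch) so that the total stays exactly n.
    """
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(value) for name, value in exact.items()}  # round down
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


stratum_sizes = {"A": 500, "B": 300, "C": 200}  # N_i
allocation = proportional_allocation(57, stratum_sizes)
print(allocation)
```

Here the exact shares are 28.5, 17.1 and 11.4, so the one unit left after rounding down goes to stratum A.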

• Disproportionate stratification:

If the sizes of the samples taken from the strata do not depend on the proportion of each stratum in the population, the sample is a disproportionately stratified random sample. In the simplest case, samples of approximately equal size are drawn from all strata. One motive for this approach is that, with proportional stratification and reasonable effort for the overall survey, the sample from a very small stratum would be too small for a meaningful statistical evaluation. For example, the surveys for the PISA studies use disproportionately stratified samples in order to determine the properties of smaller strata, such as the students in the small federal states of Hamburg and Bremen, with sufficient accuracy. (So as not to distort the overall study results, the results obtained from the individual strata are then weighted proportionally again.)

Special forms of disproportionate stratification are the variance-optimal and the cost-optimal stratification.

• Variance-optimal stratification:

If

$n_i = \frac{n}{\sum_{j=1}^{L} p_j \sigma_j} \, p_i \sigma_i; \quad i = 1, \dots, L$,

then, when the $\sigma_i$ differ strongly, the variance of $\hat{\mu}$ is considerably smaller than with proportional stratification, because strata with a large spread are sampled more heavily. Proportional stratification is variance-optimal when all $\sigma_i$ are equal.
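The effect of the variance-optimal allocation can be sketched numerically. The example assumes the standard approximation $\operatorname{Var} \hat{\mu} = \sum_i p_i^2 \sigma_i^2 / n_i$ (sampling with replacement, or strata large enough that the finite population correction can be ignored), which is not stated in the text above. All numbers and function names are illustrative, and the sample sizes are left unrounded.

```python
def neyman_allocation(n, weights, sigmas):
    """Variance-optimal sizes n_i = n * p_i * sigma_i / sum_j p_j * sigma_j."""
    denom = sum(weights[k] * sigmas[k] for k in weights)
    return {k: n * weights[k] * sigmas[k] / denom for k in weights}


def var_stratified_mean(alloc, weights, sigmas):
    """Var(mu_hat) = sum_i p_i^2 * sigma_i^2 / n_i (no finite population correction)."""
    return sum(weights[k] ** 2 * sigmas[k] ** 2 / alloc[k] for k in weights)


weights = {"A": 0.5, "B": 0.3, "C": 0.2}   # p_i
sigmas = {"A": 1.0, "B": 2.0, "C": 10.0}   # sigma_i, assumed known in advance
n = 100

neyman = neyman_allocation(n, weights, sigmas)
proportional = {k: n * weights[k] for k in weights}

var_neyman = var_stratified_mean(neyman, weights, sigmas)
var_proportional = var_stratified_mean(proportional, weights, sigmas)
print(f"proportional: {var_proportional:.4f}, variance-optimal: {var_neyman:.4f}")
```

In this example the small but highly variable stratum C receives most of the sample, and the variance drops to well under half of the value for proportional allocation.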

• Cost-optimal stratification:

Let $c$ be the total budget available and $c_i$ the cost of selecting one element from stratum $i$, $i = 1, \dots, L$. Minimizing the variance of $\hat{\mu}$ subject to the constraint that the total cost $\sum_{i=1}^{L} c_i n_i$ does not exceed $c$ gives

$n_i = \frac{c}{\sum_{j=1}^{L} p_j \sigma_j \sqrt{c_j}} \cdot \frac{p_i \sigma_i}{\sqrt{c_i}}$.

As a rule, the above value is not a natural number and should therefore be rounded.
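A minimal sketch of the cost-optimal allocation follows. It takes the sample sizes proportional to $p_i \sigma_i / \sqrt{c_i}$ and scales them so that the total cost equals the budget. Rounding down is an assumption made here so that the budget is never exceeded. The function name and all numbers are hypothetical.

```python
import math


def cost_optimal_allocation(budget, weights, sigmas, unit_costs):
    """Sizes proportional to p_i * sigma_i / sqrt(c_i), scaled to spend the budget."""
    shares = {k: weights[k] * sigmas[k] / math.sqrt(unit_costs[k]) for k in weights}
    # Cost of the unscaled share vector; scaling by budget / cost hits the budget exactly.
    cost_of_shares = sum(unit_costs[k] * shares[k] for k in weights)
    scale = budget / cost_of_shares
    return {k: scale * shares[k] for k in weights}


weights = {"A": 0.5, "B": 0.3, "C": 0.2}     # p_i
sigmas = {"A": 4.0, "B": 6.0, "C": 10.0}     # sigma_i, assumed known
unit_costs = {"A": 1.0, "B": 4.0, "C": 9.0}  # c_i, cost per sampled element
budget = 1000.0                              # c

alloc = cost_optimal_allocation(budget, weights, sigmas, unit_costs)
rounded = {k: math.floor(v) for k, v in alloc.items()}  # round down to stay within budget
total_cost = sum(unit_costs[k] * alloc[k] for k in alloc)
rounded_cost = sum(unit_costs[k] * rounded[k] for k in rounded)
print(rounded, rounded_cost)
```

Compared with the variance-optimal rule, strata that are expensive to sample receive fewer elements than their spread alone would suggest.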

Stratification problem

Stratification is the division of the population into layers. There are two sub-problems:

1. Determining the number of strata.
2. Defining the stratification itself.

The aim is to solve the two sub-problems in such a way that the estimates become more accurate. However, this usually requires prior information about the population (e.g. from official statistics or previous studies).

One solution to the above problem is the Dalenius stratification model, including corresponding approximate solutions such as the cum $\sqrt{f}$ rule or the equal aggregate $\sigma$ rule.
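The cum $\sqrt{f}$ rule can be sketched as follows, assuming the usual formulation: the stratification variable is grouped into many narrow classes with frequencies $f$, the values $\sqrt{f}$ are cumulated, and the stratum boundaries are placed so that each stratum covers roughly the same share of the cumulated total. The number of classes, the test data and the function name are assumptions of this sketch, and the code expects the values not to be all identical.

```python
import math
import random


def cum_sqrt_f_boundaries(values, num_strata, num_bins=50):
    """Approximate stratum boundaries by the cum sqrt(f) rule."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    freq = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)  # largest value goes into last bin
        freq[idx] += 1

    # Cumulate sqrt(f) over the narrow classes.
    cum = []
    running = 0.0
    for f in freq:
        running += math.sqrt(f)
        cum.append(running)

    step = cum[-1] / num_strata
    boundaries = []
    for k in range(1, num_strata):
        target = k * step
        # Class whose cumulated sqrt(f) lies closest to the k-th target value.
        best = min(range(num_bins), key=lambda b: abs(cum[b] - target))
        boundaries.append(lo + (best + 1) * width)  # upper edge of that class
    return boundaries


rng = random.Random(42)
values = [rng.expovariate(1.0) for _ in range(1000)]  # skewed illustrative data
boundaries = cum_sqrt_f_boundaries(values, num_strata=4)
print([round(b, 2) for b in boundaries])
```

For skewed data such as this, the rule produces narrow strata where observations are dense and wide strata in the sparse tail.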

Comparison with cluster sample

In both stratified samples and cluster samples, the population is subdivided into groups: in the stratified sample these are the strata, in the cluster sample the so-called clusters. The main difference between the two sampling methods lies in the statistical properties of the groups in comparison with one another and with the population.

Cluster sampling rests on the assumption that each cluster is, as far as possible, a scaled-down image of the population, i.e. that it comes as close as possible to the population in terms of expected value and variance or distribution of the characteristic of interest and of other possibly correlated characteristics: the clusters are ideally as heterogeneous internally as the population and very similar to one another in this respect.

In contrast, the strata of a stratified sample are chosen so that they are substantially more homogeneous than the population with regard to the characteristics used to define the strata (i.e. each has a smaller variance than the population for these characteristics) and differ from one another as much as possible in the expected values of these characteristics.

Comparison with quota sample

The quota sample is very similar to the proportionally stratified random sample in two respects: Both procedures aim to firstly divide the population to be examined into groups that are characterized by certain relevant features; and secondly, to take samples from these groups, the relative size of which is determined by the proportion of the group in the population.

The difference between the two methods lies in whether the individuals or elements ultimately included in the sample are selected by a random or an arbitrary process: in a stratified random sample, every element of the population has a definable probability of being drawn, while no such probability can be stated for a quota sample. An arbitrary selection can be based on self-selection, for example: the investigator searches for suitable study participants via an advertisement, contacts suitable members of an online panel who have agreed to take part in opinion polls, or approaches suitable passers-by at random, only some of whom decide to answer. The investigator continues until the quotas for the sample are met. If the characteristics that led the participants to self-select also influence the characteristic of interest, the results of the quota sample will be distorted compared with the results of a stratified random sample (something similar happens with a random sample, however, due to non-response). In a quota sample, the interviewer can also distort the sample, e.g. when passers-by are approached on the basis of sympathy or a list of telephone numbers is worked through in a particular order.

Quota samples are cheaper, faster and less demanding in terms of their requirements than stratified random samples; in many cases they can be a viable substitute for them. Quota sampling is the method of choice in commercial market and opinion research and is also used in academic research.

Literature

• L. Kish: Survey Sampling. Wiley, 1965, especially pages 75–112 (Chapter 3: Stratified sampling)
• H. Stenger: Sample theory. Physica-Verlag, 1971, especially pages 115–150 (Chapter 6: Stratification)
• W. G. Cochran: Sampling Techniques. 3rd edition. Wiley, New York 1977, especially pages 89–149 (Chapter 5: Stratified random sampling and Chapter 5A: Further aspects of stratified sampling)
• J. Hartung: Statistics. 15th edition. Oldenbourg, Munich 2009, especially pages 278–287 (Chapter V, Section 1.5: Stratified random selection)

References

1. Marcus M. Gillhofer: Recruiting participants in online social research. Joseph Eul Verlag, Lohmar 2010, ISBN 978-3-89936-905-2, 5.2.2 The stratified random sample, p. 68 f.
2. Rüdiger Jacob: Lecture "Methods and Techniques of Empirical Social Research - 7th Selection Process". (PDF) University of Trier, accessed on November 6, 2019.
3. For sampling within the PISA extension. (PDF) Max Planck Institute for Human Development, accessed on November 6, 2019.
4. R. G. Cumming: Is probability sampling always better? A comparison of results from a quota and a probability sample survey. In: Community Health Studies. 14, No. 2, 1990, pp. 37-7. doi:10.1111/j.1753-6405.1990.tb00033.x. PMID 2208977.
5. Michael Meyer, Thomas Reutterer: Qualitative market research: concepts, methods, analyzes. Ed.: Renate Buber, Hartmut H. Holzmüller. Gabler, Wiesbaden 2007, ISBN 978-3-8349-0229-0, Sampling Methods in Market Research, p. 239.
6. Duane R. Monette, Thomas J. Sullivan, Cornell R. DeJong: Applied Social Research: A Tool for the Human Services. Brooks/Cole, Belmont 2011, ISBN 978-0-8400-3205-8, Quota Sampling, p. 152 (English).