Selection set

from Wikipedia, the free encyclopedia

The selection set, also called the selection, drawing, or inclusion probability and, rarely, the sample weight, indicates the probability with which one or more elements of a population enter a random sample. Inclusion probabilities can only be calculated for random samples.

The first-order inclusion probability $\pi_i$ is the probability that the i-th element of the population is contained in a sample of size $n$. Similarly, the second-order inclusion probability $\pi_{ij}$ (with $i \neq j$) is the probability that both the i-th and the j-th element enter a sample of size $n$.

With an unrestricted or simple random sample, the inclusion probabilities can be specified directly. With more complex sampling methods, not every element has the same probability of being included in the sample, and design effects occur.

Calculation of the inclusion probabilities

The first-order inclusion probability for an unrestricted or simple random sample can be calculated using the hypergeometric distribution:

$$P(X = k) = \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}$$

The hypergeometric distribution describes the probability that, for N given elements ("population of size N") of which M have the desired property, exactly k hits are obtained when n elements are drawn ("sample of size n"), i.e. the probability of X = k successes in n draws.

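Since there is only one i-th element in the population, M = 1, and it is either drawn (k = 1) or not (k = 0):

$$P(X = 1) = \frac{\binom{1}{1}\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N}, \qquad P(X = 0) = \frac{\binom{1}{0}\binom{N-1}{n}}{\binom{N}{n}} = 1 - \frac{n}{N}$$

Hence the first-order inclusion probability is:

$$\pi_i = P(X = 1) = \frac{n}{N}$$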

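The second-order inclusion probability for an unrestricted random sample can be calculated analogously; here M = 2 and k = 2:

$$\pi_{ij} = P(X = 2) = \frac{\binom{2}{2}\binom{N-2}{n-2}}{\binom{N}{n}} = \frac{n(n-1)}{N(N-1)}$$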
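The hypergeometric calculation for M = 1 and M = 2 can be checked numerically. The following is a minimal sketch using exact fractions; the function names `hypergeom_pmf`, `first_order` and `second_order` are illustrative and do not come from the source, and `second_order` assumes n ≥ 2.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, N, M, n):
    """P(X = k): exactly k of the M marked elements land in a sample of size n."""
    return Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))


def first_order(N, n):
    # One marked element (M = 1) that must be drawn (k = 1).
    return hypergeom_pmf(1, N, 1, n)


def second_order(N, n):
    # Two marked elements (M = 2) that must both be drawn (k = 2); assumes n >= 2.
    return hypergeom_pmf(2, N, 2, n)


N, n = 4, 2
print(first_order(N, n))   # 1/2, which equals n/N
print(second_order(N, n))  # 1/6, which equals n(n-1) / (N(N-1))
```

For a population of four elements and samples of size two, each element is included with probability 1/2 and each pair with probability 1/6 under simple random sampling.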

Example

The population consists of four elements: {w1, w2, w3, w4}. We consider three samples of size n = 2, namely {w1, w3}, {w2, w4} and {w3, w4}. With an unrestricted random sample, there would be a total of $\binom{4}{2} = 6$ possible samples; i.e. if only the three samples above are possible, it is not an unrestricted random sample.

Each of the three samples has probability 1/3, which gives the following inclusion probabilities:

Inclusion probability   w1    w2    w3    w4
1st order               1/3   1/3   2/3   2/3

2nd order               w1    w2    w3    w4
w1                      -     0     1/3   0
w2                            -     0     1/3
w3                                  -     1/3
w4                                        -
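The values in the table can be reproduced by summing, for each element or pair, the probabilities of all admissible samples that contain it. The sketch below does this by direct enumeration; the helper name `inclusion_probability` and the dictionary layout are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

population = ["w1", "w2", "w3", "w4"]

# The sampling design from the example: three admissible samples,
# each drawn with probability 1/3.
design = {
    frozenset({"w1", "w3"}): Fraction(1, 3),
    frozenset({"w2", "w4"}): Fraction(1, 3),
    frozenset({"w3", "w4"}): Fraction(1, 3),
}


def inclusion_probability(elements, design):
    """Sum the probabilities of all samples that contain every given element."""
    wanted = set(elements)
    return sum((p for s, p in design.items() if wanted <= s), Fraction(0))


# First-order probabilities: one entry per element.
first = {w: inclusion_probability([w], design) for w in population}

# Second-order probabilities: one entry per unordered pair.
second = {
    (a, b): inclusion_probability([a, b], design)
    for a, b in combinations(population, 2)
}

for w, p in first.items():
    print(w, p)
for pair, p in second.items():
    print(pair, p)
```

Because every sample has the fixed size n = 2, the first-order probabilities sum to 2, which is a quick consistency check on the table.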

Design effect

The design effect is the ratio of the variance of an estimator under the actual sample design to the variance of the same estimator under a simple random sample of the same size. It describes the statistical distortion that a special selection process (stratification, clustering, multi-stage sampling) introduces compared to pure random selection (simple random sample). Design effects arise because not all elements have the same selection probability, i.e. the same chance of entering the sample. The population parameters can nevertheless be estimated well by using suitable estimators of the mean and the variance.
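As an illustration of this ratio, the sketch below computes the design effect of the sample mean exactly for a restricted design with three equally likely samples of size 2 from four elements, compared with a simple random sample of the same size. The values assigned to the elements (1, 2, 3, 4) and the function name `variance_of_sample_mean` are hypothetical and chosen only for the illustration. Note that the sample mean is not unbiased under the restricted design, so the ratio compares variances only.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical values attached to the four elements (not from the source).
y = {"w1": 1, "w2": 2, "w3": 3, "w4": 4}


def variance_of_sample_mean(design):
    """Exact variance of the sample mean over a design {sample: probability}."""
    means = {s: Fraction(sum(y[w] for w in s), len(s)) for s in design}
    expected = sum(p * means[s] for s, p in design.items())
    return sum(p * (means[s] - expected) ** 2 for s, p in design.items())


# Restricted design: only three samples are admissible, each with probability 1/3.
restricted = {
    frozenset({"w1", "w3"}): Fraction(1, 3),
    frozenset({"w2", "w4"}): Fraction(1, 3),
    frozenset({"w3", "w4"}): Fraction(1, 3),
}

# Simple random sample of size 2: all six pairs are equally likely.
srs = {frozenset(s): Fraction(1, 6) for s in combinations(y, 2)}

deff = variance_of_sample_mean(restricted) / variance_of_sample_mean(srs)
print(deff)  # 14/15
```

With these assumed values the design effect is slightly below 1; other values for the elements would give a different ratio.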
