Representativeness

Representativeness (usually encountered in the form of a representative sample or representative survey) is a property of certain data collections that makes it possible to draw conclusions about a much larger set (the population) from a small sample. Often this refers to random samples or quota samples. Representative samples are mainly used for surveys of the attitudes, behavior and opinions of people for whom no precise statistical data otherwise exist (opinion polls, market research).

The term “representative sample” is not a technical term. In practice, random samples or quota samples are difficult to achieve.

Empirical research

The specific research question matters, for example, when so-called “representative” samples are used to analyze social structures or to forecast voter behavior. When a representative survey is evaluated beyond its original purpose, e.g. when a fine-grained spatial breakdown was not planned for, the accuracy achieved may well be unsatisfactory. In practice, representative samples are also important for estimating distributions (e.g. proportions or means). For testing hypotheses about relationships between variables, representativeness is not of central importance; designs that control variance and eliminate confounding factors matter more here.

For empirical science, it is important to state the following characteristics of the sampling technique and survey method:

Specification of the sampling technique (of the selection process)
- Random sample: then also the response rate
- or quota sample: then also the quota characteristics
Number of realized elements (after deduction of refusals (nonresponse))
The survey method (by phone, in person)
Weighting procedure
There should be a comparison between theory and practice, e.g. by checking the interviewers' work

It is important that the inclusion probability of an element can be specified. A statement about the accuracy of the survey is helpful. Whether sufficient accuracy has been achieved can often be judged by comparing the estimates with values known from other sources. For surveys of people this means, for example, that the estimated age structure, level of education, marital status, etc. match the figures in official publications.
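
The comparison with values known from other sources can be sketched in a few lines of Python. This is only an illustration: the age groups, the reference shares (standing in for official statistics) and the sample itself are hypothetical, and the function names are chosen freely.

```python
from collections import Counter


def category_shares(values):
    """Return the relative frequency of each category in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def share_deviations(sample_shares, reference_shares):
    """Difference (sample minus reference) for every reference category."""
    return {
        category: sample_shares.get(category, 0.0) - reference_share
        for category, reference_share in reference_shares.items()
    }


# Hypothetical reference shares, standing in for official publications.
reference = {"18-39": 0.35, "40-64": 0.40, "65+": 0.25}

# Hypothetical realized sample of 200 respondents.
sample_ages = ["18-39"] * 50 + ["40-64"] * 90 + ["65+"] * 60

deviations = share_deviations(category_shares(sample_ages), reference)
for category, deviation in deviations.items():
    print(f"{category}: {deviation:+.3f}")
```

In this made-up case the youngest group is underrepresented by ten percentage points. Such a deviation only shows a mismatch in the characteristic that was compared; it says nothing about characteristics for which no reference values are available.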

Random sample vs. “representative” sample and the problem of representativeness

Allegedly “representative” surveys are carried out by specialized opinion research institutes on behalf of radio and television companies or newspaper publishers. Strictly speaking, however, there is no such thing as “representativeness”. The common notion that a “representative” subset has the same distribution of all features relevant to the object of investigation cannot in fact be realized, because only the investigation itself determines (and can determine) which features are actually relevant. It is therefore crucial to draw a random sample; statistically controlled conclusions about the population are then possible.

Therefore, in sampling theory, a branch of statistics, the “representative” sample, in contrast to the random sample, plays no role.

In empirical research, the concept of representativeness is not clearly defined.
The concept of the “representative” sample differs significantly from the concept of a random sample.

P. von der Lippe and A. Kladroba summarize the intuitive concept of representativeness as follows:

The common notion of representativeness can best be described as follows: The selection of a subpopulation is to be made in such a way that “the result of the partial survey can be used to infer the proportions of the population as precisely and reliably as possible.” This is the case “if it corresponds to the population in the distribution of all features of interest, i.e. represents a scaled-down but otherwise realistic image of the whole” (Berekoven et al. (1999), p. 50).
... In summary, it can be said that in general usage a subpopulation is representative if it has a structure similar to that of the population in certain features. From this it is concluded that one can then, and for many authors (e.g. Zentes 1996, p. 383) only then, infer the population from the subpopulation.

The following example highlights the difference: Suppose we know that the population contains equal numbers of men and women. If we draw a sample of 100, every representative sample must contain exactly 50 men and 50 women. Probability theory shows that only just under 8% of simple random samples contain exactly 50 men and 50 women. It follows:

If you draw a large number of random samples, most of the random samples are not representative.
If you take a large number of “representative” samples, each sample must contain exactly 50 men and 50 women. This means that these samples are not the result of chance, i.e. they are not random samples.
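
The figure of just under 8% can be checked with the binomial distribution. The sketch below assumes that the population is large enough for the 100 draws to be treated as independent, each with probability 0.5 of selecting a woman; the function name is chosen for illustration.

```python
from math import comb


def prob_exact_split(n, k, p=0.5):
    """Binomial probability of exactly k successes in n independent draws."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_50_50 = prob_exact_split(100, 50)
print(f"P(exactly 50 women in 100 draws) = {p_50_50:.4f}")  # about 0.0796
```

Under these assumptions, roughly 92% of simple random samples of size 100 would therefore not count as “representative” in the strict sense used in the example.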

An even more serious problem is that the selection process for the “representative” sample, in contrast to random selection, uses the properties of the sample elements for selection. If, for example, one wants to analyze the intelligence quotient or voting behavior, a “representative” sample would have to be representative with respect to all population parameters that influence the target variable (e.g. preferred party, intelligence quotient). The distribution of these parameters (e.g. socio-demographic and psychographic characteristics) and their relevance for the target variable are often unknown. So-called “quota samples”, which aim to be representative with respect to certain parameters, should therefore be viewed critically. In practice, representativeness is usually required only for some of the characteristics surveyed (e.g. age, gender, course of study), mostly for variables that can be collected easily and without error. For the characteristics for which representativeness is not required, it is unclear whether the “representative” sample reflects the population.

Despite all of the aforementioned problems, “representative” samples can be analyzed using statistical methods. Descriptive statistical methods can be used without hesitation. The methods of inferential statistics (confidence intervals, tests, etc.) are problematic.

To draw conclusions about the population from a “representative” sample, methods other than probability-based inferential statistics are required, e.g. replicating the research results under different survey designs.

Statistics

The size of the sample to be drawn depends on the desired accuracy of the statistical conclusions; a compromise has to be struck between accuracy and cost. Sample size alone is no guarantee of “representative” results. This is shown by the Literary Digest disaster: in 1936, the US magazine The Literary Digest failed to predict the outcome of the US presidential election between Alfred Landon and Franklin D. Roosevelt correctly, despite an enormous (but incorrectly drawn) sample of 2.5 million subjects. George Gallup, who later founded the Gallup Organization, on the other hand correctly predicted Roosevelt's victory with a sample of only 50,000 subjects.
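
That a very large but badly drawn sample can be worse than a small random one can be illustrated with a small simulation. The sketch below does not reproduce the 1936 data: the population size, the 60% support share and the selection probabilities are all hypothetical assumptions chosen only to make the effect visible.

```python
import random

rng = random.Random(1936)  # fixed seed so the sketch is reproducible

# Hypothetical population: about 60% support candidate A (True).
POPULATION_SIZE = 200_000
population = [rng.random() < 0.60 for _ in range(POPULATION_SIZE)]
true_share = sum(population) / POPULATION_SIZE

# Biased selection: supporters of A are assumed to end up in the sample
# far less often (probability 0.15) than everyone else (probability 0.35).
biased_sample = [
    supports_a
    for supports_a in population
    if rng.random() < (0.15 if supports_a else 0.35)
]

# Simple random sample: every voter has the same inclusion probability.
random_sample = rng.sample(population, 1_000)

biased_estimate = sum(biased_sample) / len(biased_sample)
random_estimate = sum(random_sample) / len(random_sample)

print(f"true share of A:         {true_share:.3f}")
print(f"biased sample (n={len(biased_sample)}): {biased_estimate:.3f}")
print(f"random sample (n={len(random_sample)}):  {random_estimate:.3f}")
```

With these assumed selection probabilities the biased sample contains tens of thousands of voters, yet its estimate settles near 39% and predicts the wrong winner, while the random sample of 1,000 stays within a few percentage points of the true share. Adding more respondents to the biased sample would make its estimate more stable but no less wrong.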

Practice

In their studies, German market research companies often work with a multiply stratified random selection based on the ADM master sample of the working group of German market and social research institutes (ADM).

Complex sampling plans are often used in practice. It is almost never possible to carry out a survey exactly according to the theoretical specifications. For example, in practice there are almost always units for which no data can be collected (nonresponse).

Problems

Internet: Surveys on the Internet pose major problems, since the population can often not be delimited (if it is understood to mean all Internet users) and since passive selection procedures additionally raise the problem of self-selection. In addition, the identity of the participants can usually not be verified beyond doubt, so that, given the low cost to the participant, multiple voting is possible on a large scale. This applies only in part, however, as some Internet votes are technically set up so that only one vote per person is possible. This requires verifying identity by post, which is usually not done for cost reasons.
Telephone: It is somewhat easier to draw samples using the telephone book because, at least in Germany, one can assume that almost every household has a telephone connection. Thus (almost) every element of the population can be reached via this medium. Attempts are made to mitigate the problem of unlisted numbers through computer-aided random generation of telephone numbers (Random Digit Dialing, RDD). The problem that some people can be reached on several numbers is harder to solve, because it is often difficult to determine on how many numbers a person can be reached.
Mobile phones: The problem of people who can only be reached via a mobile phone, which will certainly grow in the future, has been recognized but has not yet been solved in a methodologically convincing manner, because very few of these people are listed in directories. These elements of the population could in principle be reached by randomly generated calls to all conceivable mobile phone numbers; however, this procedure requires a great deal of money and time.
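
The idea behind Random Digit Dialing can be sketched as follows. This is a simplified illustration, not the procedure of any particular institute: the number blocks are hypothetical, and real designs draw on the actual numbering plan and on information about which blocks are in use.

```python
import random


def random_digit_numbers(prefixes, suffix_length, count, rng):
    """Draw `count` distinct numbers, each a known prefix plus random digits."""
    possible = len(set(prefixes)) * 10**suffix_length
    if count > possible:
        raise ValueError("more numbers requested than the blocks contain")
    numbers = set()
    while len(numbers) < count:
        prefix = rng.choice(prefixes)
        suffix = "".join(rng.choice("0123456789") for _ in range(suffix_length))
        numbers.add(prefix + suffix)
    return sorted(numbers)


# Hypothetical number blocks of equal size (assumption for illustration).
prefixes = ["030555", "040555"]
rng = random.Random(42)
numbers = random_digit_numbers(prefixes, suffix_length=4, count=10, rng=rng)
print(numbers)
```

Because the digits are generated rather than looked up, unlisted numbers have the same chance of being drawn as listed ones. The sketch does not address the problem of people who can be reached on several numbers, who would still have a higher chance of being selected.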

Literature

A. Diekmann: Empirical social research. ISBN 3-499-55551-4, p. 368ff (ISBN 978-3-499-55678-4, 08/2007, p. 430).
P. Hartmann: How representative are population surveys? A comparison of the Allbus and the microcensus, in: ZUMA-Nachrichten 26 (1990), pp. 7–30.
J. Koch: Market Research - Terms and Methods, Munich 1997.
G. Rothe, M. Wiedenbeck: Sample Weighting: Is Representativeness Feasible?, in: ZUMA-Nachrichten 21 (1987), pp. 43–58.
R. Schnell: The homogeneity of social categories as a prerequisite for "representativeness" and weighting procedures, in: Zeitschrift für Soziologie 22 (1993), pp. 16–32.
S. Schumann: Representative survey. Practice-oriented introduction to empirical methods and statistical analysis procedures, 6th, updated edition, Munich 2012, ISBN 978-3-486-71415-9.

References

Diekmann, Andreas (2002): Empirische Sozialforschung, p. 368.

Mimi.hu. Kun-Pál Gábor, accessed March 29, 2009.

Peter von der Lippe, Andreas Kladroba: Representativity of samples. In: Marketing. 2002, ZFP 24, pp. 139–145.

Ludwig Berekoven, Werner Eckert, Peter Ellenrieder: Market research: methodological principles and practical application. 8th edition. Wiesbaden 1999.

J. Zentes: Basic Concepts of Marketing. 4th edition. Stuttgart 1996.

J. Bortz: Statistics: For human and social scientists. Springer textbook, 2006.
