Normalization (psychological diagnostics)

from Wikipedia, the free encyclopedia

In psychological diagnostics, normalization is the development of a conversion scale from raw values to normal values for the purpose of establishing the comparability of an individual test result with a representative comparison group.

For example, the results of an intelligence test taken by a specific group of people, such as high school graduates, are compared with the distribution of intelligence in the corresponding group as shown in the norm table, and are classified and interpreted on that basis.

Normalization usually rests on the assumption that psychological characteristics are normally distributed and that the degree to which a result deviates from the average range of the reference group is relevant for interpretation. Results can then be classified, for example, as “above average”, “average” or “below average”; how such a classification is to be evaluated depends on the content of the characteristic (a high value means something different for intelligence than for aggressiveness). As a rule, the average range spans one standard deviation on either side of the mean, but this limit has no psychological justification. Some tests therefore require deviations of two or three standard deviations before a result is interpreted as an extreme expression of the characteristic. A more precise approach is to determine limit values for the specific diagnostic question during validation (for example, the concentration performance score below which fitness to drive must be denied because the risk of causing an accident outweighs the restriction of personal freedom).
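The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular test: the norm sample is invented, the function names are hypothetical, and the one-standard-deviation band is simply the convention mentioned in the text. The IQ scale (mean 100, SD 15) and the T scale (mean 50, SD 10) are common standard scales used here as examples.

```python
from statistics import mean, stdev

def z_score(raw, norm_mean, norm_sd):
    """Express a raw value as a deviation from the norm mean in SD units."""
    return (raw - norm_mean) / norm_sd

def to_scale(z, scale_mean, scale_sd):
    """Map a z-score linearly onto a standard scale (e.g. IQ or T)."""
    return scale_mean + scale_sd * z

def classify(z, band=1.0):
    """Classify relative to an average range of +/- `band` SDs (a convention)."""
    if z > band:
        return "above average"
    if z < -band:
        return "below average"
    return "average"

# Invented raw scores of a (far too small) comparison group, for illustration only.
norm_sample = [12, 15, 17, 18, 20, 20, 21, 23, 25, 29]
m, s = mean(norm_sample), stdev(norm_sample)

raw = 26
z = z_score(raw, m, s)
print(f"z = {z:.2f}, IQ scale = {to_scale(z, 100, 15):.0f}, "
      f"T scale = {to_scale(z, 50, 10):.0f}, class = {classify(z)}")
```

A linear conversion of this kind only makes sense if the raw scores are at least approximately normally distributed in the comparison group; otherwise percentile-based norms are the more cautious choice.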

Carrying out a standardization is an essential quality criterion for a mature test procedure and for its practical usability. For paper-and-pencil tests, the norm table (the conversion of raw values to standard values) must be available in the test manual. For computer-aided procedures or evaluation programs that convert scores automatically, at least information on the sample (and on the subdivision of the norms, e.g. by age, gender, etc.), the survey methodology and the survey period must be published (cf. e.g. DIN 33430). In these cases the norm tables themselves are often not made directly available, for reasons of test and investment protection: collecting representative standardization samples is usually the most expensive single item in test development, and withholding the tables is intended to prevent replication by third parties.

For each psychological test, the test manual must state for which target group and for which diagnostic decision the test is meant to be a valid measuring instrument, and must substantiate this with empirical results. The type, currency and quality of the standardization are decisive for the so-called utility of the test procedure.
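As a rough sketch of what an automatic conversion from raw value to standard value might look like, the following Python fragment looks up a raw score in a norm table subdivided by age group. The table contents, the age bands and all names are invented for illustration; real norm tables come from the test manual.

```python
# Hypothetical norm tables: age group -> {raw score: standard value}.
# All numbers are invented; a real test manual supplies the actual tables.
NORM_TABLES = {
    "12-14": {0: 30, 1: 38, 2: 45, 3: 52, 4: 60, 5: 68},
    "15-17": {0: 27, 1: 34, 2: 41, 3: 48, 4: 56, 5: 64},
}

def age_group(age):
    """Return the norm group for an age, or fail if no norms exist for it."""
    if 12 <= age <= 14:
        return "12-14"
    if 15 <= age <= 17:
        return "15-17"
    raise ValueError(f"no norms available for age {age}")

def standard_value(raw, age):
    """Convert a raw score to a standard value using the matching norm table."""
    table = NORM_TABLES[age_group(age)]
    if raw not in table:
        raise ValueError(f"raw score {raw} is outside the normed range")
    return table[raw]

print(standard_value(3, 13))  # same raw score, younger comparison group
print(standard_value(3, 16))  # same raw score, older comparison group
```

The lookup deliberately raises an error for ages or scores outside the tables instead of extrapolating, which mirrors the requirement that a test only be interpreted for the target group it was normed on.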

Standardization as a quality criterion

The scientific significance and practical utility of a test procedure are measured against so-called quality criteria. The availability of norm tables is one such criterion. Norms are derived from tests administered to a representative sample and then processed statistically. This requires a longer period of trial and maturation until the test procedure meets the requirements placed on it. Many tests on the market lack this quality criterion and are therefore of only very limited informative value and usability:

Tests initially provide only raw values as their immediate result. These can be judged only through comparison. When evaluating a 100-meter run, it can be established that a time of 11.6 seconds was achieved and that this is a better performance than 12.0 seconds. Without a benchmark, however, it cannot be assessed whether this is an outstanding, poor or average result for the relevant comparison group (children, men, women, high-performance athletes, disabled people). Without the possibility of comparison with a norm table obtained from a larger comparison group, the results are suitable only for "home use", for example within a school class or a club section. Any evaluation of the raw scores beyond this requires a yardstick from which one can read off what counts as “average”, “above average” or “below average” in the population concerned.

The standardization and the resulting norm tables are therefore an important prerequisite for interpreting and evaluating an individual test result once it has been scored.
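The 100-meter example above can be illustrated with a small Python sketch that places one raw time within two different comparison groups by computing its percentile rank. The times are invented and the groups are far too small to be representative; the function name and the mid-rank treatment of ties are assumptions made for this illustration.

```python
def percentile_rank(value, norm_values, lower_is_better=False):
    """Percentage of the comparison group outperformed, counting ties as half."""
    if lower_is_better:
        worse = sum(v > value for v in norm_values)
    else:
        worse = sum(v < value for v in norm_values)
    ties = sum(v == value for v in norm_values)
    return 100.0 * (worse + 0.5 * ties) / len(norm_values)

# Invented 100 m times in seconds for two hypothetical comparison groups.
school_class = [12.8, 13.1, 13.5, 13.9, 14.2, 14.6, 15.0, 15.3]
club_sprinters = [10.9, 11.1, 11.2, 11.4, 11.5, 11.7, 11.9, 12.0]

time = 11.6
print(percentile_rank(time, school_class, lower_is_better=True))
print(percentile_rank(time, club_sprinters, lower_is_better=True))
```

The same raw time ranks at the top of one group and in the lower half of the other, which is the reason a raw value cannot be interpreted without naming the comparison group.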

Standardization examples

The Progressive Matrices test by John C. Raven is a language-free test procedure for measuring intelligence. It uses a multiple-choice format, was initially developed in the service of the British Army, and was published in three different forms for different intelligence levels. The raw scores are determined using scoring templates. For each version, norm tables valid specifically for Germany had to be created in addition to the standardization for the country of origin, Great Britain. Since the Raven matrices became publicly known and were also misused as practice material, parallel versions had to be designed and the standardization adjusted several times.

The Vienna coordination course (WKP) by Siegbert A. Warwitz is a mature test procedure for assessing movement coordination. Norm tables were first drawn up from a representative sample for 17- to 21-year-old high school students of both sexes. In additional test series, students specially trained in experimental psychology gradually extended the tables to the age groups from twelve years upward and to the special populations of male and female sports students. The standardization was repeated ten years later with a sample of N = 2778, and the results were confirmed at the 1% significance level. Because it makes performance objectively comparable, the WKP is used today primarily in aptitude tests for sports studies at universities and in tests for police and military careers. The differentiated norm tables allow comparisons across regions as well as across generations. The percentile ranks assigned to the individual test performances allow each performance to be placed on a five-point scale running from "inadequate" through "deficient", "average" and "good" to "excellent".
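How percentile ranks might be mapped onto a five-point scale can be sketched as follows. The cutoffs used here are purely hypothetical and are not taken from the WKP norm tables; the function returns a level from 1 (lowest category) to 5 (highest category).

```python
from bisect import bisect_right

# Hypothetical percentile-rank cutoffs between the five categories.
# These are NOT the published WKP cutoffs, only placeholders for illustration.
CUTOFFS = [10, 30, 70, 90]

def five_point_level(percentile_rank):
    """Return a level from 1 (lowest) to 5 (highest) for a percentile rank."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must lie between 0 and 100")
    # bisect_right counts how many cutoffs the rank has reached or passed.
    return bisect_right(CUTOFFS, percentile_rank) + 1

for pr in (5, 25, 50, 85, 97):
    print(pr, five_point_level(pr))
```

Keeping the cutoffs in a single list separates the classification rule from the norm data, so the same function could be reused with whatever boundaries a test manual actually specifies.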

Culture dependence of standardization

In addition to factors such as age and gender, a psychodiagnostic instrument must also be standardized for different cultures. A test that correctly measures the construct of “social submissiveness” in Central Europe can yield useless scores when used in the Far East, since many social interactions that are common in Asia would be interpreted in Central Europe as excessive politeness or even as gestures of submission. Other dimensions that need to be included in the standardization process are conceivable.

In the development phase of a test procedure, the researcher typically works with a large item pool, i.e. a comprehensive collection of potentially suitable questions (called "items"), some of which are selected for the first prototype. The test is then administered to the representative test group. If the results do not follow the Gaussian normal distribution but instead point, for example, to a ceiling effect, the items must be replaced with more demanding ones from the pool. This procedure, the normalization, may have to be repeated several times.
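One simple way to screen a trial sample for a ceiling effect is sketched below. It is only a heuristic under stated assumptions: the 15% limit for the share of maximum scores is a rule of thumb chosen for this example, the sample data are invented, and the function names are hypothetical. A real item analysis would also look at item difficulties and a formal test of normality.

```python
from statistics import mean, pstdev

def ceiling_share(scores, max_score):
    """Fraction of participants who reached the maximum possible score."""
    return sum(s == max_score for s in scores) / len(scores)

def skewness(scores):
    """Population skewness; negative values indicate a long left tail."""
    m, sd = mean(scores), pstdev(scores)
    if sd == 0:
        return 0.0
    return sum(((s - m) / sd) ** 3 for s in scores) / len(scores)

def shows_ceiling_effect(scores, max_score, share_limit=0.15):
    """Heuristic: many maximum scores combined with a left-skewed distribution."""
    return ceiling_share(scores, max_score) > share_limit and skewness(scores) < 0

# Invented trial data for a prototype with a maximum of 10 points.
too_easy = [10, 10, 10, 10, 9, 9, 8, 7, 5, 3]
balanced = [2, 3, 4, 5, 5, 5, 6, 6, 7, 8, 10]

print(shows_ceiling_effect(too_easy, 10))
print(shows_ceiling_effect(balanced, 10))
```

If the check fires, the prototype would be revised with harder items from the pool and tried again, in line with the repeated procedure described above.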

Influence of time on standardization processes

Psychodiagnostic measuring instruments cannot simply be used for an unlimited period of time. Intelligence tests such as those mentioned above, in particular, must be checked regularly and re-standardized if necessary. One reason for this is the frequently cited Flynn effect.

Influence of social changes on standardization processes

Psychological tests that belong to the objective (performance) tests rather than to the projective tests, in particular, must be "maintained" regularly. For example, a knowledge-test question about the names of politicians from the Second World War would certainly have counted as easy in the 1950s. Asked today, it would be harder to answer because of the distance in time. The test in question would become correspondingly more difficult, and the peak of the score distribution would shift slightly toward lower values. Social change can thus force a “readjustment” of a psychological procedure.
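The consequence of such a shift can be made concrete with a short Python sketch that scores the same raw value against outdated and against current norms. All numbers are invented, the function name is hypothetical, and a standard scale with mean 100 and SD 15 is assumed purely for illustration.

```python
def standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to a standard score under the given norm parameters."""
    return scale_mean + scale_sd * (raw - norm_mean) / norm_sd

# Invented norm parameters for a knowledge test with aging content.
old_norm = {"mean": 30.0, "sd": 6.0}  # sample collected when the items were easy
new_norm = {"mean": 24.0, "sd": 6.0}  # today's sample: the same items are harder

raw = 27
print(standard_score(raw, old_norm["mean"], old_norm["sd"]))
print(standard_score(raw, new_norm["mean"], new_norm["sd"]))
```

Under the old norms the person appears below average, under the current norms above average, although the raw performance is identical. When the population mean rises over time instead of falling, outdated norms err in the opposite direction and overestimate the person.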

Influence of the international dissemination of tests

Mature test procedures that meet the high demands of as many quality criteria as possible are disseminated by the scientific community throughout the professional world. This means additional effort for standardization:

Even small changes to a question or task can significantly distort the results. This becomes particularly problematic for language-based tests that need to be translated into another language, and it can make a complete re-standardization necessary.

Literature

  • R. Horn (Ed.): Standard Progressive Matrices (SPM). (German processing and standardization according to JC Raven.) 2nd edition. Pearson Assessment, Frankfurt 2009.
  • HW Krohne & M. Hock: Psychological diagnostics - intelligence tests. Kohlhammer, Stuttgart 2007.
  • Gustav A. Lienert, Ulrich Raatz: Test setup and test analysis. 6th edition. Beltz, Weinheim 1998, ISBN 3-621-27424-3
  • J. Raven, John C. Raven, JH Court: Raven's Progressive Matrices and Vocabulary Scales. Basic manual. Pearson Assessment, Frankfurt 2003
  • N. Schirach: The creation of norm tables for a sport motor test battery (Vienna coordination course). Scientific state examination thesis, GHS, Karlsruhe 1979
  • Siegbert Warwitz: The Vienna coordination course. In: Siegbert Warwitz: The sports science experiment. Planning-implementation-evaluation-interpretation. Verlag Hofmann, Schorndorf 1976, pp. 48-62
  • Siegbert Warwitz: Norm tables for the Vienna coordination course (WKP). In: Sportunterricht (Lehrhilfen) 4 (1982) pp. 59–64
