In empirical research, validity denotes the agreement between the content of an empirical measurement and a logical measurement concept. In general, it is the degree of accuracy with which a procedure actually measures the feature it is intended to measure. With regard to models and hypotheses, validity denotes the agreement between prognoses or conclusions and the data.
A distinction is made between a representation inference (the behavior in the test is representative of behavior in general) and a correlation inference (the behavior in the test correlates with behavior outside the test situation). Depending on which variable is used as the criterion for behavior outside the test situation, a distinction is made between content validity, predictive validity and construct validity.
Validity as a quality criterion for measuring instruments
Validity is one of the main quality criteria for measuring instruments. It is a measure of whether the data generated by a measurement represent the variable to be measured as intended. Only then can the data be meaningfully interpreted.
Together with objectivity (independence of the results from the measurement conditions) and reliability (dependability, formal accuracy of the measurement), validity is one of the three main quality criteria. They build on one another: without objectivity there is no reliability, and without reliability there is no validity.
- Example: If a test is intended to predict fitness to drive, appropriate tasks (e.g. on concentration, perception, sensorimotor skills, intelligence) are assembled, which yield a test score after administration. This score must be objective and reliable. Validity concerns the question of whether the test actually predicts fitness to drive and, for example, identifies people at risk. Even when objectivity and reliability are given, validity need not be, e.g. if the measured features are not representative of fitness to drive.
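The hierarchy of the three criteria can also be stated quantitatively: in classical test theory, the observed validity coefficient cannot exceed the square root of the product of the reliabilities of test and criterion. A minimal sketch of this attenuation relation (the function names are illustrative, not taken from the text above):

```python
from math import sqrt

def max_validity(rel_test, rel_criterion=1.0):
    """Upper bound on the observed validity coefficient in classical
    test theory: r_xy <= sqrt(r_xx * r_yy)."""
    return sqrt(rel_test * rel_criterion)

def correct_for_attenuation(r_observed, rel_test, rel_criterion):
    """Spearman's correction for attenuation: estimate of the
    correlation between the error-free true scores."""
    return r_observed / sqrt(rel_test * rel_criterion)
```

For example, a test with reliability .81 cannot correlate higher than .90 with even a perfectly reliable criterion, which is why reliability is a precondition of validity.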
There are various aspects of validity and associated measurement and estimation methods.
Validity as a quality criterion for psychological tests
These quality criteria serve as evaluation criteria especially for psychological tests. A test must be designed such that its administration, scoring and interpretation are independent of the test administrator and the test conditions (objectivity), and such that the test result is confirmed by the same or a comparable test (reliability). Validity concerns whether, for example, an intelligence test actually measures aspects of intelligence and whether this measurement permits a prediction of performance in real life (e.g. training success or professional success). As results of measurements, such predictions carry an error and yield only probability statements; at the same time, some of the content is itself criticized, cf. e.g. the critique of the concept of intelligence.
Forms or aspects of validity
In its Technical Recommendations for Psychological Tests and Diagnostic Techniques (1954), the American Psychological Association proposed four types of validity: content validity, construct validity, and prognostic and diagnostic criterion validity, of which "historically and practically [...] criterion-related validity is the most significant aspect". "Like all agreements, agreement by means of a rating is not something closed, but can be subject to constant change. [...] It is up to each test interpreter to recognize or reject this criterion, or to look for a better one."
Content validity is assumed if a method of measuring a particular construct or feature is the best possible operationalization of that construct. This is the case, for example, with interest and knowledge tests: a class test or a driving test directly represents the skills to be measured. One therefore also speaks of logical or trivial validity. Experts decide by means of ratings whether content validity is given.
The term construct refers to theoretical property dimensions (latent variables). Construct validity concerns the admissibility of statements that the operationalization permits about the entire underlying construct. It is usually given when the construct's scope of meaning is mapped completely, precisely and comprehensibly. Convergent and discriminant (also: divergent) validity are considered empirical indicators of construct validity:
- Convergent validity
- Measurement data from test procedures that map the same construct should correlate highly with one another.
- Discriminant validity
- Measurement data from test procedures that map different constructs should correlate only slightly with one another (provided the constructs are actually independent of one another).
Both convergent and discriminant validity must be given for full evidence of construct validity. The empirical procedures for convergent and discriminant validity are special cases of criterion validity.
In multitrait-multimethod analysis, convergent and discriminant validity are compared with one another on the basis of a single sample. In short, the convergent validity is expected to be greater than the discriminant validity.
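This expectation can be illustrated with a toy calculation. A minimal sketch, assuming invented scores for five subjects (the construct names and all numbers are hypothetical, not from the text above):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented scores of five subjects: two methods measuring the same
# construct ("anxiety") and one test of a different construct.
anxiety_questionnaire = [12, 18, 9, 15, 21]
anxiety_interview = [11, 19, 10, 14, 20]
vocabulary_test = [30, 25, 28, 33, 26]

# Same construct, different methods: should correlate highly.
convergent = pearson(anxiety_questionnaire, anxiety_interview)
# Different constructs: should correlate only slightly.
discriminant = pearson(anxiety_questionnaire, vocabulary_test)
```

Here the two anxiety measures correlate far more strongly with each other than with the vocabulary test, which is the pattern construct validity requires.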
Factors for a reduced construct validity can be:
- vague definition of the construct
- mono-operation bias : only one aspect of the construct is examined
- mono-method bias : only one method is used to operationalize the construct
- guessing the hypothesis ( Hawthorne effect )
- social desirability
- Expectations of the investigator ( Rosenthal effect )
- omission of relevant factor levels
- more than one independent variable is effective (see confounding )
- Interaction between measurement and treatment
- limited generalizability to similar variables
Criterion validity denotes the relationship between the results of the measuring instrument and an empirical criterion (Schnell, Hill & Esser, 2005, p. 155). For example, a researcher examines the correlation between his new intelligence test and the subjects' school grades in order to check the validity of the test. One speaks of inner (criterion) validity when another test already recognized as valid serves as the criterion. If an objective measure (for example psychophysiological measures or economic indicators) or an expert rating serves as the criterion, one speaks of external (criterion) validity. A further distinction concerns the point in time at which agreement with the criterion should exist:
- Diagnostic validity / concurrent validity
- The external criterion, which must itself already be valid (e.g. another test), is presented to the same subjects at the same time as the measuring instrument to be validated. The results of the two measuring instruments are correlated ; the size of the correlation is the measure of concurrent validity. The procedures for determining convergent and discriminant test validity are special cases of this category.
- Prognostic validity / predictive validity
- The measurement data are collected at a point in time before the external criterion is collected. In contrast to concurrent validity, a prediction interval lies between the two measurements. In this way, the degree to which the measurement data predict the criterion can be determined. For example, in the context of an assessment center a prognosis of professional success can be made, or later school success can be predicted from performance in an intelligence test . A test has predictive validity if its predictions correlate highly with the outcome that actually occurs later.
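Numerically, the predictive validity coefficient is simply the correlation between the test scores and the later criterion. A minimal sketch with invented data (the scores and ratings are hypothetical):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented data: aptitude-test scores at hiring and supervisor
# performance ratings collected one year later.
test_scores = [55, 62, 48, 70, 58, 65]
performance = [3.1, 3.8, 2.9, 4.5, 3.3, 4.0]

predictive_validity = pearson(test_scores, performance)
```

The squared coefficient then indicates the share of criterion variance the test accounts for.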
Face validity depends on whether a measuring instrument appears plausible to laypeople. Face validity says nothing about actual validity, i.e. about content, criterion and construct validity, but it determines the acceptance of a measurement method. Even instruments with low validity (such as unstructured job interviews) enjoy high face validity and are therefore frequently used in practice.
Validity of statements about causal relationships
Based on the operationalizations of individual constructs, researchers in most empirical studies draw conclusions about cause-effect relationships, first in the statistical analysis and then with regard to their causal hypotheses . The terms statistical , internal and external validity refer to the justification and transferability of these ( inductive ) conclusions. The degree of validity of such conclusions can only be discussed and estimated, never proven; it is therefore more sensible to speak of the degree of validity rather than of the existence (or non-existence) of these forms of validity.
A high degree of statistical validity is ascribed to statements or conclusions drawn in empirical studies (usually about cause-effect relationships) if the reliability and statistical power of the measuring instruments and of the chosen statistical methods are high, the error variance has been limited, the mathematical assumptions of the statistical methods have not been violated, and individual significant results (for example from a correlation matrix ) have not been "fished out" (fishing for significance).
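Why "fishing" undermines statistical validity can be seen from the familywise error rate: with many independent significance tests at level alpha, the chance of at least one false positive grows rapidly. A minimal sketch:

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n_tests
    independent significance tests, each conducted at level alpha."""
    return 1 - (1 - alpha) ** n_tests

# The 45 pairwise correlations among 10 variables yield a familywise
# error rate of roughly 0.9 even when no true effect exists at all.
fwer_45 = familywise_error_rate(45)
```

Corrections such as the Bonferroni procedure counter this by lowering the per-test alpha.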
A high degree of internal validity is ascribed to statements or conclusions drawn in empirical studies if alternative explanations for the existence or the size of the effects found can largely be excluded. Internal validity (or ceteris paribus validity) is given if the change in the dependent variable can be attributed unambiguously to the variation of the independent variable (no alternative explanation). To guarantee this, confounding variables must be controlled or eliminated by methods such as elimination, holding constant and parallelization. So that the effects cannot be traced back to characteristics of the subjects, these must be assigned to the experimental conditions at random.
The internal validity is endangered by:
- History. Every unplanned event between two measurements can have an unintended influence on the test subjects. Example: During the first measurement, before treatment with a new antidepressant, the weather is cold and rainy, during the second measurement, which is supposed to test the effect of the drug, the weather is warm and sunny.
- Maturation. Subjects change between two measurements solely as they get older.
- Reactivity. Subjects may react to the measurements themselves, for example with habituation or sensitization , especially if the measurement process is unpleasant.
- Change in the measuring instrument. During a study, the characteristics of the measuring instruments, including the person measuring, can change. For example, these can measure more precisely through experience or less precisely through growing boredom. The treatment can also cause the dependent variable to reach a value range in which the measuring instrument is less precise. This can lead to floor or ceiling effects . Example: An intelligence test is used to measure the effect of cognitive training on children. The training is so successful that the children all get full marks on the second measurement.
- Regression to the mean . This statistical artifact can overlay treatment effects if, for example, to avoid floor or ceiling effects, subjects with particularly high (or low) initial values on the relevant characteristic are selected from the outset.
- Selection due to insufficient randomization. If the assignment of the test persons to the test conditions is not random, the experimental and control groups can differ even before the treatment, so that the measurement of the treatment effect is falsified. In addition, history, maturation and instrument effects can affect the groups in different ways.
- Attrition. If subjects drop out during the study, this may be caused by the treatment. The smaller groups at the second measurement are then the result of an unwanted selection.
- Direction of the causal inference. A causal relationship between the independent and the dependent variable becomes doubtful if (in another study) an effect of the dependent on the independent variable is also found and this correlation cannot be explained by a third variable.
- Exchange of information. If subjects interact between measurements (for example, “I think I belong to the placebo group”), this can have an impact on the next measurement process. Effects of conformity can overshadow the effects of treatment; or one group reacts to the fact that its test conditions are much more uncomfortable than those of the other group, for example with compensation or demotivation.
- Rosenthal effects . Through gestures, facial expressions and choice of words, the experimenter unconsciously reveals more about the experiment than the subject is supposed to know. A distinction can be made between autosuggestion and suggestion . In the former, the experimenter, despite all conscious efforts at neutrality, tends to collect data that support his prior expectations and hypotheses. In suggestion, these expectations are communicated to the subject , who then acts in accordance with the experimenter's prior expectations and supplies suitable data ( good subject effect ).
In English there is the mnemonic THIS MESS. The acronym stands for eight factors that threaten internal validity: Testing (cf. reactivity), History, Instrument change, Statistical regression toward the mean, Maturation, experimental Mortality (attrition), Selection (by insufficient randomization) and Selection interaction (interaction between selection and another factor, e.g. maturation only in the experimental group).
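The regression-to-the-mean threat from the list above can be demonstrated with a small simulation: when subjects are selected for extreme first-test values, their second-test mean moves toward the population mean even though no treatment occurred. A minimal sketch (all numbers are invented; the seed makes the run reproducible):

```python
import random
import statistics

random.seed(42)

# Hypothetical simulation: 1000 "subjects" with a stable true score
# and independent measurement noise on two occasions; no treatment.
true_scores = [random.gauss(100, 15) for _ in range(1000)]
test1 = [t + random.gauss(0, 10) for t in true_scores]
test2 = [t + random.gauss(0, 10) for t in true_scores]

# Select the 100 subjects with the highest initial values, as a
# naive study design might.
top = sorted(range(1000), key=lambda i: test1[i], reverse=True)[:100]
mean_before = statistics.mean(test1[i] for i in top)
mean_after = statistics.mean(test2[i] for i in top)
# mean_after lies below mean_before although nothing changed:
# pure regression to the mean.
```

The selected group's scores drop on retest without any intervention; a study without a control group would misread this as a (negative) treatment effect.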
External validity - also called general validity, generalizability or ecological validity (see ecological fallacy ) - denotes the correspondence between the actual and the intended object of investigation. The underlying idea is the question of generalizability (induction). According to the classical view, statements or conclusions drawn in empirical studies have a high degree of external validity if (a) the results can be generalized to the population for which the study was designed, and (b) they can be transferred beyond the specific setting of the study to other designs, instruments, places, times and situations, i.e. they are generally valid and generalizable. The most frequent threat to person-related external validity (a) lies in practical problems in recruiting the information carriers, i.e. the people who are interviewed or the subjects required for an experiment . Is their participation forced or voluntary? How did they learn of the opportunity to participate (newspaper advertisement, notice, etc.)? What motivates them to participate (interest in the topic, a need for the money, etc.)? These are filters that can limit the quality of the sample. The most frequent threat to situation-related external validity (b) lies in the artificiality of laboratory experiments.
External validity increases with every successful replication of the findings, because repetition with other subjects (age group, gender, culture, etc.) or variations of the test conditions reduce the restrictions on the validity of the findings. Example: As long as Pavlov had only shown that dogs salivate at the sound of a bell if the bell has rung often enough at the same time as feeding, he had shown only exactly that. Only when many kinds of subjects show many kinds of conditioned responses to many kinds of conditioned stimuli can one speak of the phenomenon of classical conditioning . For the statistical evaluation of replication studies, the method of meta-analysis is available.
From this classical point of view, internal and external validity are in conflict: a high degree of internal validity is best achieved under highly controlled and therefore rather artificial (laboratory) conditions. Particularly realistic research designs, as they seem advisable for the highest possible external validity, on the other hand harbor the risk of uncontrollable or overlooked interference. From a deductivist perspective, however, this is only an apparent contradiction. Both criteria were developed from an inductivist research logic, in which the generalization of empirical findings (e.g. from an experiment) is in the foreground; the question of the replicability of results under different conditions with different samples is a useful one here. A deductivist research logic, however, pursues a different goal: the attempt is made to falsify a (generally valid) theory on the basis of a specific prediction , not, as in inductivist research logic, to verify a theory through a sufficient number of observations. If, according to this logic, the observation contradicts the theory, the theory is considered falsified; it is irrelevant whether the results are "representative" in any way. If the prediction of a theory is confirmed in an experiment, the theory is considered corroborated, but must be subjected to further tests. Objections that question the validity of an experiment's results are objections to the internal validity of the experiment.
The research design has a great influence on the admissibility and validity of causal inferences, which is why the validities of experimental and quasi-experimental research designs are always critically questioned.
Validity in biological nomenclature
In biological nomenclature, "validity" refers to the formal validity of a taxon (a systematic unit of living beings). Validity is given if the first description of the taxon meets the corresponding formal requirements (in botany referred to as " valid publication "). In this case, the name chosen for the taxon is also considered "valid". If the assigned name of a taxon is invalid owing to formal deficiencies, that name is a nomen nudum .
- DT Campbell , DW Fiske: Convergent and discriminant validation by the multitrait-multimethod matrix. In: Psychological Bulletin . 56, 1959, pp. 81-105.
- Andreas Diekmann : Empirical social research. 18th edition, Reinbek near Hamburg 2007.
- RM Liebert, LL Liebert: Science and behavior. An introduction to methods of psychological research. Prentice Hall, Englewood Cliffs, NJ 1995.
- Rainer Schnell , Paul B. Hill , Elke Esser: Methods of empirical social research. 8th, unchanged edition, Oldenbourg Verlag, Munich 2008.
- W. Shadish, T. Cook, D. Campbell: Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin, Boston 2002.
- Lienert and Raatz 1994, as cited in the entry "Validity" in the DORSCH Lexicon of Psychology.
- Fisseni, Hermann-Josef: Textbook of psychological diagnostics, 3rd edition, Göttingen et al. 2004, p. 62 f.
- Fisseni, Hermann-Josef: Textbook of psychological diagnostics, 3rd edition, Göttingen et al. 2004, p. 62 f. and Brockhaus Psychologie, 2nd edition, Mannheim 2009.
- GA Lienert, U. Raatz: Test setup and test analysis. 5th, completely revised and expanded edition, Beltz, Weinheim 1994, p. 220.
- Gustav A. Lienert : Test setup and test analysis. Psychologie Verlags Union, 4th edition, 1989, p. 256.
- Joachim Krauth: Experimental Design . Elsevier / Saunders 2000. ISBN 0-444-50637-3 .
- PM Wortman: Evaluation research - A methodological perspective. In: Annual Review of Psychology. 34, 1983, pp. 223-260. doi:10.1146/annurev.ps.34.020183.001255.
- E. Aronson , TD Wilson, RM Akert: Social Psychology . Pearson Studium, 6th edition 2008, ISBN 978-3-8273-7359-5 , p. 42 f.