Reliability is a measure of the formal precision and dependability of scientific measurements. It is the proportion of the variance in the measured values that is explained by actual differences in the characteristic being measured rather than by measurement error. Highly reliable results must be largely free of random error, i.e. repeating the measurement under the same conditions would yield the same result (reproducibility of results under the same conditions).
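In the notation of classical test theory, this definition can be written as a variance ratio. The symbols $X$ (observed score), $T$ (true score) and $E$ (random error) are introduced here for illustration only, and the decomposition assumes that true score and error are uncorrelated:

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\mathrm{Rel} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

A reliability of 1 would thus mean that the observed variance contains no error variance, while a reliability of 0 would mean that it consists of error variance alone.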
Alongside validity and objectivity, reliability is one of the three most important quality criteria for empirical studies. High reliability is in principle a prerequisite for high validity, although excessively high reliability can come at the expense of validity (reliability-validity dilemma).
Reliability comprises three aspects:
- Stability (equality or similarity of the measurement results when the instrument is applied at different times)
- Consistency (extent to which all items that a test combines into one characteristic actually measure that same characteristic)
- Equivalence (equivalence of measurements)
Various methods can be used to estimate reliability. Depending on the method, different types of reliability are distinguished.
- Parallel test reliability
- The same test subjects are given two tests that are very similar to one another, either immediately one after the other or at different times. Parallel test reliability is determined with this parallel test procedure. It indicates whether a comparable measurement method delivers identical results. Instead of equivalent test procedures, parallel forms of the same test can also be used (for example, two different simple addition tasks should be equally suitable for measuring the ability to perform simple addition).
- Split-half reliability / test halving method
- With split-half reliability, the test is divided into two halves, each of which is treated as a parallel test of the other. If the set of results is sufficiently large, the means and other statistical parameters of the two halves should be the same. Items are usually allocated to the halves using the odd-even method, i.e. items with an odd sequence number go into one half of the test and items with an even sequence number into the other. Mathematically, however, the correlation between the halves only gives the reliability of a test of half the length, so it underestimates the reliability of the full test; the raw result must therefore be corrected with the Spearman-Brown correction. In tests with a speed component (speed tests), the test halving method yields a distorted reliability coefficient (artificially increased or decreased).
- Retest reliability
- Retest reliability (also: test-retest reliability) is the reliability under repeated measurement: the same test is given to the same test subjects at different times, and the results of the first and second measurements are correlated. The test-retest procedure checks whether repeating the measurement yields the same values, provided the property being measured is constant. Retest reliability indicates the degree of agreement. For many tests, repetition is possible only in theory, since memory, learning or practice effects associated with the test can influence the result and feign a spurious reliability. With arithmetic problems, for example, the test person may remember the solution from the first test. The interval between the measurements must therefore be long enough to rule out memory effects, yet short enough that the characteristic remains constant. Retest reliability cannot reveal systematic, test-related errors.
- Internal consistency
- Internal consistency is a measure of how strongly the items of a scale are related to one another. It is, so to speak, a workaround for determining the measurement accuracy of an instrument when no retest or parallel test is available. Reliability is therefore estimated internally: each item is treated as a parallel test, as it were, and correlated with every other item (intercorrelation matrix). The quality of an individual item can be assessed by calculating the internal consistency the scale would have if that item were not included. For dichotomous items, a common coefficient of internal consistency can be calculated with the Kuder-Richardson formula. For items on an interval scale, depending on the measurement model, the coefficient is tau-equivalent reliability (= "Cronbach's alpha") or congeneric reliability; McDonald's omega is an alternative.
- Interrater reliability
- Interrater reliability refers to the agreement between assessors or observers, determined at the same point in time or with respect to the same test objects. Common coefficients are Holsti's coefficient of agreement and Cohen's kappa.
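Three of the estimation methods listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function names, the `scores` layout (one row per person, one column per item) and all data are hypothetical. The sketch covers the odd-even split-half estimate with Spearman-Brown correction, Cronbach's alpha, and Cohen's kappa for two raters.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_reliability(scores):
    """Odd-even split-half reliability, stepped up with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)                # reliability of a half test
    return 2 * r_half / (1 + r_half)           # corrected to full length


def cronbach_alpha(scores):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    Undefined (division by zero) if chance agreement equals 1, i.e. if
    both raters only ever use one and the same category.
    """
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    p_chance = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical data: 5 persons x 4 items, and two raters judging 6 objects.
scores = [
    [4, 5, 4, 5],
    [2, 3, 3, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
ratings_a = ["yes", "yes", "no", "no", "yes", "no"]
ratings_b = ["yes", "no", "no", "no", "yes", "yes"]

print("split-half:", split_half_reliability(scores))
print("alpha:     ", cronbach_alpha(scores))
print("kappa:     ", cohen_kappa(ratings_a, ratings_b))
```

The sketch uses the sample variance (divisor n - 1) throughout; for alpha the choice of divisor cancels out as long as it is applied consistently to item and total-score variances.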
Opportunities for improvement
- The reliability of a test can be improved by lengthening it with comparable items, because measurement accuracy increases with test length.
- Objectivity is a necessary condition for reliability. Correspondingly, an improvement in objectivity can increase the reliability of the measuring instrument.
- When formulating the items, item homogeneity should be aimed for. Items are homogeneous if they imply one another. This means that test subjects who endorse the more extreme item also endorse the more weakly worded item, or reject a negatively keyed item.
- Items with low discriminatory power should be excluded. An item that distinguishes well between people with low and high levels of the characteristic contributes to the measurement accuracy of the test.
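The effect of test length on reliability mentioned above is commonly quantified with the Spearman-Brown formula. The following sketch assumes that the added items are parallel to the existing ones; the function name `spearman_brown` and the starting reliability of 0.60 are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    Assumes the added (or removed) items are parallel to the existing ones.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a test with reliability 0.60 at 1x, 2x and 3x length.
for factor in (1, 2, 3):
    print(factor, round(spearman_brown(0.60, factor), 3))
```

Under these assumptions, doubling the test raises a reliability of 0.60 to 0.75, and each further extension yields a smaller gain.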