Intra-class correlation

The intra-class correlation is a parametric statistical method for quantifying the agreement (interrater reliability) between several assessors (raters) with respect to several observation objects. The associated measure, the intra-class correlation coefficient (ICC; Asendorpf & Wallbott 1979, Shrout & Fleiss 1979, McGraw & Wong 1996, Wirtz & Caspar 2002), requires interval-scaled data and is usually calculated when more than two raters and/or several observation times are to be compared with one another.

To determine the interrater reliability, the variance between different ratings of the same measurement object (= observation object, case, person, feature carrier, etc.) is compared with the variance that arises across all ratings and measurement objects.

A reliable observation can be assumed if the differences between the measurement objects are relatively large (which indicates systematic differences between the observed cases) while, at the same time, the variance between the raters with regard to the same measurement object is small. If the judgment concordance is high (i.e., there is little variance between the raters' values), the ICC is high.

As with other correlation coefficients, the ICC can take values between −1.0 and +1.0. Since reliability measures are by definition restricted to the range from 0 to 1, negative ICCs are reported as a reliability of 0 (Wirtz & Caspar 2002, p. 234). In a scatter plot of two raters' measured values, the ICC reflects how far the points deviate from the bisecting line (the line of perfect agreement).
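
This basic logic can be written out in a few lines of code. The following is a minimal sketch in Python with NumPy, assuming a complete ratings matrix with one row per measurement object and one column per rater; the function name icc_oneway and the example data are purely illustrative, and only the simplest coefficient (one-way, single measure, see below) is computed.

  import numpy as np

  def icc_oneway(ratings):
      # One-way single-measure ICC: rows = measurement objects (cases),
      # columns = raters; rater identity is not modelled separately.
      ratings = np.asarray(ratings, dtype=float)
      n, k = ratings.shape
      grand_mean = ratings.mean()
      case_means = ratings.mean(axis=1)

      # Between-cases variance (mean square): how much the cases differ from each other.
      ms_between = k * np.sum((case_means - grand_mean) ** 2) / (n - 1)
      # Within-cases variance: how much the raters disagree about the same case.
      ms_within = np.sum((ratings - case_means[:, None]) ** 2) / (n * (k - 1))

      icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
      return max(icc, 0.0)  # a negative estimate is reported as reliability 0

  # Three raters judge five cases; the raters agree closely, the cases differ clearly.
  ratings = [[2, 3, 2], [5, 5, 6], [8, 7, 8], [4, 4, 5], [9, 9, 8]]
  print(round(icc_oneway(ratings), 3))  # close to 1, because between-case variance dominates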

Types of ICC

Up to six different types of ICC can be distinguished (Shrout & Fleiss 1979), depending on whether all raters assess all cases or different cases, and on whether or not the raters were randomly selected from a larger population of raters. In addition, it makes a difference whether the individual values of the raters are compared with one another or whether (e.g., to increase stability) the averaged assessments of a group of raters form the data basis.

Types and selection of the ICC

Question 1: Is every case assessed by all raters?
Question 2: (If yes:) Were the raters chosen at random from a larger population of raters?
Question 3: Is the data basis the raw value of an individual rater (single value) or the mean value of k different raters (average)?

  No to Question 1 (SPSS model: one-way random):
    Single value: ICC(1,1) (McGraw & Wong: ICC(1)); SPSS: single measure
    Average: ICC(1,k) (McGraw & Wong: ICC(k)); SPSS: average measure
  Yes to Question 1, raters chosen at random (SPSS model: two-way random):
    Single value: ICC(2,1) (McGraw & Wong: ICC(A,1)); SPSS: single measure
    Average: ICC(2,k) (McGraw & Wong: ICC(A,k)); SPSS: average measure
  Yes to Question 1, raters not chosen at random (SPSS model: two-way mixed):
    Single value: ICC(3,1) (McGraw & Wong: ICC(C,1)); SPSS: single measure
    Average: ICC(3,k) (McGraw & Wong: ICC(C,k)); SPSS: average measure
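
In practice, the selection among these six variants is usually handled by a statistics package. The following is an illustrative sketch assuming the Python package pingouin, whose intraclass_corr function takes data in long format (one row per case-rater pair) and reports all six Shrout & Fleiss variants at once; the example data are invented.

  import pandas as pd
  import pingouin as pg  # assumption: pingouin is installed (pip install pingouin)

  # Long format: one row per (case, rater) combination.
  data = pd.DataFrame({
      "case":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
      "rater": ["A", "B", "C"] * 4,
      "score": [2, 3, 2, 5, 5, 6, 8, 7, 8, 4, 4, 5],
  })

  # The result table lists ICC1, ICC2, ICC3 (single measures) and
  # ICC1k, ICC2k, ICC3k (average measures), i.e. the six types above.
  icc_table = pg.intraclass_corr(data=data, targets="case",
                                 raters="rater", ratings="score")
  print(icc_table[["Type", "Description", "ICC"]])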

Another distinction that SPSS requires for the two-way models is whether the estimate should be adjusted or not. Adjusted and unadjusted refer to whether mean differences between the raters (e.g., a strict versus a mild rater) are removed from the error variance of the model or, as in the unadjusted model, remain part of the error variance (Wirtz & Caspar 2002). SPSS calls the adjusted model consistency and the unadjusted model absolute agreement; the unadjusted model is the stricter test.
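
The difference can be seen in a deliberately simple example in which one rater is roughly two points stricter than the other: the ratings are then almost perfectly consistent, but they do not agree absolutely. The sketch below uses the same assumed package (pingouin) as above; data and variable names are invented for illustration.

  import pandas as pd
  import pingouin as pg  # assumption: pingouin is installed (pip install pingouin)

  # Rater B scores each case about two points lower (stricter) than rater A,
  # but ranks and spaces the cases in almost the same way.
  data = pd.DataFrame({
      "case":  [1, 1, 2, 2, 3, 3, 4, 4],
      "rater": ["A", "B"] * 4,
      "score": [6, 4, 8, 6, 5, 4, 9, 7],
  })

  icc_table = pg.intraclass_corr(data=data, targets="case",
                                 raters="rater", ratings="score")
  # ICC3 (consistency, adjusted) is close to 1, while ICC2 (absolute agreement,
  # unadjusted) is clearly lower, because the mean difference between the two
  # raters is counted as error variance in the unadjusted model.
  print(icc_table.loc[icc_table["Type"].isin(["ICC2", "ICC3"]),
                      ["Type", "Description", "ICC"]])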

Other names for the different types of ICC go back to Bartko (1976), who designates the ICC(1,1) as ICC(1) and the ICC(1,k) as ICC(2) (see Bliese 2000).

Calculation

The basic principle of the calculation (i.e., the mathematical model) of the ICC is that of an analysis of variance; here, too, the point is to decompose variance components and relate them to one another. If

  • k is the number of raters,
  • n is the number of measurement objects (cases),
  • MS_B is the variance (mean square) between the cases (with n − 1 degrees of freedom),
  • MS_W is the variance within the cases (with n(k − 1) degrees of freedom),
  • MS_R is the variance between the raters (with k − 1 degrees of freedom), and
  • MS_E is the residual variance (with (n − 1)(k − 1) degrees of freedom),

then the single-measure coefficients of Shrout & Fleiss (1979) are:

  ICC(1,1) = (MS_B − MS_W) / (MS_B + (k − 1) · MS_W)
  ICC(2,1) = (MS_B − MS_E) / (MS_B + (k − 1) · MS_E + (k/n) · (MS_R − MS_E))
  ICC(3,1) = (MS_B − MS_E) / (MS_B + (k − 1) · MS_E)

The corresponding average-measure coefficients ICC(1,k), ICC(2,k) and ICC(3,k) follow from these by the Spearman-Brown formula for k raters.
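
These formulas can be evaluated directly from a complete n × k ratings matrix. The following sketch in Python with NumPy computes the four mean squares defined above and the three single-measure coefficients; the function name and the example matrix are illustrative and not taken from the cited literature.

  import numpy as np

  def icc_single_measures(ratings):
      # ICC(1,1), ICC(2,1) and ICC(3,1) from a complete n x k ratings matrix
      # (rows = cases, columns = raters), following Shrout & Fleiss (1979).
      x = np.asarray(ratings, dtype=float)
      n, k = x.shape
      grand = x.mean()
      case_means = x.mean(axis=1)
      rater_means = x.mean(axis=0)

      # Mean squares of the ANOVA decomposition (degrees of freedom as listed above).
      ms_b = k * np.sum((case_means - grand) ** 2) / (n - 1)         # between cases
      ms_w = np.sum((x - case_means[:, None]) ** 2) / (n * (k - 1))  # within cases
      ms_r = n * np.sum((rater_means - grand) ** 2) / (k - 1)        # between raters
      resid = x - case_means[:, None] - rater_means[None, :] + grand
      ms_e = np.sum(resid ** 2) / ((n - 1) * (k - 1))                # residual

      icc11 = (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)
      icc21 = (ms_b - ms_e) / (ms_b + (k - 1) * ms_e + k * (ms_r - ms_e) / n)
      icc31 = (ms_b - ms_e) / (ms_b + (k - 1) * ms_e)
      return icc11, icc21, icc31

  # Illustrative 6 x 4 matrix: six cases rated by four raters.
  ratings = [[9, 2, 5, 8],
             [6, 1, 3, 2],
             [8, 4, 6, 8],
             [7, 1, 2, 6],
             [10, 5, 6, 9],
             [6, 2, 4, 7]]
  print(icc_single_measures(ratings))

With data like these, where the raters differ considerably in their mean level, the unadjusted ICC(2,1) comes out clearly lower than the adjusted ICC(3,1), mirroring the consistency versus absolute agreement distinction above.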

Literature

  • Asendorpf, J. & Wallbott, H. G. (1979): Measures of observer agreement: A systematic comparison. In: Zeitschrift für Sozialpsychologie, 10, 243–252.
  • Bartko, J. J. (1976): On various intraclass correlation reliability coefficients. In: Psychological Bulletin, 83, 762–765.
  • Bliese, P. D. (2000): Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis. In: K. J. Klein & S. W. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 349–381). San Francisco, CA: Jossey-Bass.
  • Fleiss, J. L. & Cohen, J. (1973): The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. In: Educational and Psychological Measurement, 33, 613–619.
  • Müller, R. & Büttner, P. (1994): A critical discussion of intraclass correlation coefficients. In: Statistics in Medicine, 13, 2465–2476.
  • McGraw, K. O. & Wong, S. P. (1996): Forming inferences about some intraclass correlation coefficients. In: Psychological Methods, 1, 30–46.
  • Shrout, P. E. & Fleiss, J. L. (1979): Intraclass correlation: Uses in assessing rater reliability. In: Psychological Bulletin, 86, 420–428.
  • Wirtz, M. & Caspar, F. (2002): Assessment agreement and assessment reliability. Göttingen: Hogrefe.
