Point-biserial correlation

The point-biserial correlation coefficient is the correlation coefficient for the relationship between an interval-scaled variable $I$ and a dichotomous (Bernoulli-distributed) variable $D$. It is not an independent measure but a special case of the usual Pearson correlation coefficient, which in this case can be calculated as

$$\rho = \frac{\bar{I}_{D=1} - \bar{I}_{D=0}}{\sqrt{\mathrm{QS}(I)}} \cdot \sqrt{n \cdot p \cdot q},$$

where $\mathrm{QS}$ denotes the sum of squares, $n$ the sample size, $p$ the proportion of units of investigation that have the property recorded in $D$, and $q$ the proportion of units of investigation that do not have it.
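As an illustration, the following minimal Python sketch evaluates the closed-form expression and compares it with an ordinary Pearson correlation computed on the same data. The function names `point_biserial` and `pearson`, as well as the sample data, are assumptions made for this example and are not taken from any statistics package.

```python
import math

def point_biserial(interval, dichotomous):
    """Closed-form point-biserial correlation; `dichotomous` must be coded 0/1."""
    n = len(interval)
    group1 = [x for x, d in zip(interval, dichotomous) if d == 1]
    group0 = [x for x, d in zip(interval, dichotomous) if d == 0]
    p = len(group1) / n                      # proportion of units with D = 1
    q = len(group0) / n                      # proportion of units with D = 0
    mean_all = sum(interval) / n
    qs = sum((x - mean_all) ** 2 for x in interval)   # QS(I), sum of squares
    mean1 = sum(group1) / len(group1)        # mean of I where D = 1
    mean0 = sum(group0) / len(group0)        # mean of I where D = 0
    return (mean1 - mean0) / math.sqrt(qs) * math.sqrt(n * p * q)

def pearson(x, y):
    """Ordinary Pearson correlation coefficient, written out by hand."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up example data: six units, three in each group.
scores = [2, 4, 5, 7, 8, 10]
group = [0, 0, 0, 1, 1, 1]

r_formula = point_biserial(scores, group)
r_pearson = pearson(scores, group)
print(r_formula, r_pearson)   # the two values agree up to rounding error
```

For this data both routes give $7/\sqrt{63}$, which is what one would expect if the point-biserial coefficient is just the Pearson coefficient applied to a 0/1 variable.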

Derivation from the Pearson correlation

For the sake of simplicity, it is assumed that the dichotomous variable $D$ takes the values 0 and 1, so that the mean of $D$ equals $p$. The correlation between $I$ and $D$ is then calculated according to the general formula

$$\rho = \frac{\sum_{i=1}^{n} (I_i - \bar{I})(D_i - \bar{D})}{\sqrt{\mathrm{QS}(I) \cdot \mathrm{QS}(D)}}.$$

Two cases can now be distinguished: $n \cdot p$ units of investigation have $D = 1$ and therefore lie $1 - p = q$ above the mean of $D$, while the other $n \cdot q$ units have $D = 0$ and deviate from the mean of $D$ by $0 - p = -p$. This gives

$$\rho = \frac{n \cdot p \cdot (\bar{I}_{D=1} - \bar{I}) \cdot q + n \cdot q \cdot (\bar{I}_{D=0} - \bar{I}) \cdot (-p)}{\sqrt{\mathrm{QS}(I) \cdot (n \cdot p \cdot q^{2} + n \cdot q \cdot (-p)^{2})}}.$$

In the numerator the two terms containing $\bar{I}$ cancel, and in the denominator $n \cdot p \cdot q^{2} + n \cdot q \cdot p^{2} = n \cdot p \cdot q \cdot (q + p) = n \cdot p \cdot q$, so the expression becomes

$$\rho = \frac{n \cdot p \cdot q \cdot (\bar{I}_{D=1} - \bar{I}_{D=0})}{\sqrt{\mathrm{QS}(I) \cdot (n \cdot p \cdot q)}},$$

which simplifies to the equation given at the beginning, because $n \cdot p \cdot q / \sqrt{n \cdot p \cdot q} = \sqrt{n \cdot p \cdot q}$.
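The individual steps of the derivation can be checked numerically. The sketch below uses exact rational arithmetic on a small made-up data set; the variable names and the data are assumptions chosen for illustration only.

```python
from fractions import Fraction

# Made-up data: interval-scaled values I and a 0/1-coded variable D.
I = [Fraction(v) for v in (2, 4, 5, 7, 8, 10)]
D = [0, 0, 0, 1, 1, 1]

n = len(I)
p = Fraction(sum(D), n)      # proportion with D = 1, equal to the mean of D
q = 1 - p                    # proportion with D = 0
mean_I = sum(I) / n
mean_1 = sum(x for x, d in zip(I, D) if d == 1) / (n * p)   # mean of I where D = 1
mean_0 = sum(x for x, d in zip(I, D) if d == 0) / (n * q)   # mean of I where D = 0

# Numerator and QS(D) of the general Pearson formula.
cross = sum((x - mean_I) * (d - p) for x, d in zip(I, D))
qs_D = sum((d - p) ** 2 for d in D)

# The same quantities after splitting the sums into the cases D = 1 and D = 0.
split_cross = n * p * (mean_1 - mean_I) * q + n * q * (mean_0 - mean_I) * (-p)
split_qs_D = n * p * q ** 2 + n * q * (-p) ** 2

# Each step of the derivation holds exactly for this data set.
assert cross == split_cross == n * p * q * (mean_1 - mean_0)
assert qs_D == split_qs_D == n * p * q
```

Using `Fraction` instead of floating-point numbers lets the equalities be tested exactly rather than up to rounding error.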

Use in common statistics software

SPSS and R effectively use the point-biserial calculation automatically when a correlation is requested with the commands CORRELATE (SPSS) or cor and cor.test (R) and one of the variables takes only two values (e.g. 0 and 1) that are treated as valid for the calculation. Other codes, such as −7 or 99, can be marked as missing values in SPSS and are then ignored.
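The handling of missing-value codes can be sketched in Python as follows. This is only an illustration of the idea and does not reproduce the internals of SPSS or R; the codes −7 and 99 follow the example in the text, while the names `MISSING_CODES`, `pearson` and `pearson_without_missing` and the data are assumptions.

```python
import math

MISSING_CODES = {-7, 99}   # hypothetical codes that mark missing values

def pearson(x, y):
    """Ordinary Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_without_missing(interval, group, missing=MISSING_CODES):
    """Drop units whose group code is marked as missing, then correlate."""
    pairs = [(x, g) for x, g in zip(interval, group) if g not in missing]
    levels = {g for _, g in pairs}
    if len(levels) != 2:
        raise ValueError("expected exactly two valid values in the grouping variable")
    xs = [x for x, _ in pairs]
    gs = [g for _, g in pairs]
    return pearson(xs, gs)

# Made-up data: the last two units carry missing-value codes.
scores = [2, 4, 5, 7, 8, 10, 6, 3]
group = [0, 0, 0, 1, 1, 1, 99, -7]

r = pearson_without_missing(scores, group)
print(r)
```

Once the two units with missing codes are dropped, the grouping variable has only two valid values, so the Pearson coefficient of the remaining six units is the point-biserial correlation.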

Literature

Jürgen Bortz: Statistik für Human- und Sozialwissenschaftler. 6th edition. Springer, Berlin et al. 2005, ISBN 3-540-21271-X.
J. Cohen, P. Cohen, S. G. West, L. S. Aiken: Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. London 2003, ISBN 0-8058-2223-2.