Factor analysis


Factor analysis is a multivariate statistical technique. It is used to infer a few underlying latent variables (“factors”) from empirical observations of many different manifest variables (observables, statistical variables). The discovery of these mutually independent variables or characteristics is the point of this data-reducing (also dimension-reducing) method.

A distinction is made between exploratory and confirmatory factor analysis. The latter is an inferential statistical method and can be understood as a special case of a structural equation model.

Background

History

Factor analysis was developed by the psychologist Charles Spearman for the evaluation of intelligence tests. In 1904 he showed that test results could be explained to a large extent by a one-dimensional personality trait, the general factor (g factor). The generalization to multiple-factor analysis is attributed to J. C. Maxwell Garnett (Steiger 1979); it was popularized in the 1940s by Louis Leon Thurstone.

Maximum likelihood estimation methods were proposed by Lawley and Victor Barnett in the 1930s and 1940s; a stable algorithm was developed in the 1960s by Gerhard Derflinger and Karl Gustav Jöreskog .

To this day, however, an iterative variant of principal component analysis is also used for factor extraction, despite its poor convergence properties. Imprecise treatments, up to the outright equating of factor analysis with principal component analysis, are widespread.

Applications

Factor analysis is a universally applicable tool for inferring, from observable phenomena, the unobservable causes underlying them. For example, constructs such as “intelligence” or “ambition” cannot be measured directly, but they are regarded as the cause of many observable behaviors. To avoid delivering erroneous results, however, factor analysis requires the data used to be at least on an interval scale. Social science data rarely reach this scale level and are mostly nominally or ordinally scaled.

Occasionally, factor analysis is also used for problems in the natural sciences. There are examples of the factor-analytic processing of sound signals (speech recognition) in which acoustic main factors are extracted. This makes it easier to understand overlapping speech (airport announcements, conference recordings) or superimposed music recordings (blind source separation, independent component analysis (ICA)).

According to Markus Wirtz and Christof Nachtigall, factor analysis generally pursues three goals:

  1. Reduction of the number of variables: Factor analysis identifies groups of variables in which all variables capture similar information. If the variables within each homogeneous group are combined, the overall information can be presented more economically.
  2. Determination of reliable measured variables: If the variables are combined into a factor, this factor has more favorable measurement properties than the individual variables.
  3. Analytical objectives: Factor analysis makes it possible to infer superordinate latent variables (e.g. intelligence) from the manifest variables (the indicator variables).

Exploratory factor analysis serves exclusively to explore hidden structures in a sample or to reduce dimensions. It is not suitable for testing existing theories; confirmatory factor analysis is the appropriate procedure for that.

Mathematical framework

Geometric meaning

From a geometric point of view, the items included in the calculation are regarded as vectors that all originate from the same origin. The length of these p vectors is determined by the communality of the respective item, and the angles between the vectors are determined by their correlations. The correlation r_ij of two items i and j and the angle α_ij between their vectors are related by

$$ r_{ij} = \cos \alpha_{ij} . $$

A correlation of 1 therefore corresponds to an angle of 0°, whereas zero correlation corresponds to a right angle. A model made up of p variables thus spans a p-dimensional space. The aim of factor analysis is to simplify this construct geometrically, i.e. to find a q-dimensional subspace with q < p. Irrelevant factors are to be “masked out” by the extraction procedure. The solution of this procedure consists of so-called “point clouds” in a q-dimensional coordinate system. The coordinates of these points represent the so-called factor loadings. The q extracted factors should be rotated as closely as possible into these point clouds by means of a rotation procedure.
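As a concrete numerical illustration of the relationship between correlation and angle stated above (the value 0.5 is chosen arbitrarily): a correlation of 0.5 between two items corresponds to an angle of 60° between their vectors, since

$$ \alpha_{ij} = \arccos(r_{ij}) = \arccos(0.5) = 60^{\circ} . $$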

Linear factor model

Factor analysis is always based on a linear model:

$$ x = \mu + \Gamma \xi + \delta , $$

with

  • x: vector of the variables to be explained,
  • μ: vector of constant values,
  • Γ: matrix of the “factor loadings”,
  • ξ: vector of the factor values,
  • δ: random vector with expected value 0.

It is required that the components of ξ are centered, normalized, and uncorrelated with one another and with δ.

As a rule, it is also required that the components of δ are not correlated with one another. If this requirement is dropped, the model is invariant under orthogonal transformations of Γ, ξ and δ.

The empirical data material consists of n realizations of the variable vector x (e.g. questionnaires with p questions that were answered by n test subjects). To simplify the notation, it can be assumed that the raw data were centered in a first step of the evaluation, so that x̄ = 0 holds.
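As an illustration of the roles of these quantities, the following minimal sketch (Python, with numpy as the only dependency; all numerical values are hypothetical) generates n realizations of the model x = μ + Γξ + δ with p = 4 manifest variables and q = 2 factors. It is meant only to make the notation concrete, not to reproduce any particular estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, q = 1000, 4, 2              # observations, manifest variables, factors
mu = np.zeros(p)                  # constant term (zero after centering)
Gamma = np.array([                # factor loadings (hypothetical values)
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])
xi = rng.standard_normal((n, q))            # centered, normalized, uncorrelated factor values
delta = rng.standard_normal((n, p)) * 0.5   # residuals with expected value 0
X = mu + xi @ Gamma.T + delta               # n realizations of the variable vector x

print(X.shape)   # (1000, 4): one row per realization
```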

As part of a factor analysis, the following are to be estimated:

  • the number q of factors,
  • the factor loadings Γ,
  • the variances of the residuals Var(δ_i),
  • the realizations of the factor vector ξ.

The estimation is typically made in three or more steps:

  • Possible factors are identified (“extracted”);
  • it is decided how many factors should be taken into account;
  • factors may be rotated to simplify their interpretation;
  • finally, the factor values for the individual realizations of x (e.g. personal scores for individual test subjects) are estimated.

Main theorem

From the model assumptions, a short calculation yields the main theorem of factor analysis:

$$ \operatorname{Cov}(x) = \Gamma \Gamma^{T} + \operatorname{Cov}(\delta) . $$

For a single variable x_i (a diagonal element of this matrix equation) the theorem simplifies to

$$ \operatorname{Var}(x_i) = \sum_{j=1}^{q} \gamma_{ij}^{2} + \operatorname{Var}(\delta_i) . $$

Here Var stands for the variance, Cov for the covariance, and the superscript T for matrix transposition.

The term Var(δ_i) is the part of the variance of the observable x_i that is not explained by the factor model. The explained part, i.e. the sum of the squared factor loadings, is called the communality of the variable x_i.
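Continuing the simulation sketch above (again with hypothetical loadings and residual variances), the main theorem can be checked numerically: for large n, the sample covariance matrix of x approaches Γ Γᵀ + Cov(δ), and each diagonal element equals communality plus residual variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
Gamma = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]])
resid_sd = np.array([0.5, 0.6, 0.4, 0.7])        # standard deviations of the residuals delta

xi = rng.standard_normal((n, Gamma.shape[1]))
delta = rng.standard_normal((n, Gamma.shape[0])) * resid_sd
X = xi @ Gamma.T + delta

model_cov = Gamma @ Gamma.T + np.diag(resid_sd**2)   # Cov(x) according to the main theorem
sample_cov = np.cov(X, rowvar=False)
print(np.round(sample_cov - model_cov, 2))           # entries close to 0 for large n

# Diagonal elements: Var(x_i) = communality + residual variance
print(np.diag(model_cov))
print((Gamma**2).sum(axis=1) + resid_sd**2)          # identical by construction
```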

Example

In a garbage sorting plant, a magnet acting in the vertical direction and a fan acting in the horizontal direction are installed to separate the garbage. The geometric coordinates of the pieces of garbage as they fall may be part of the data collected. Correlations of direction are found: pieces without metal and with high wind susceptibility on the one hand, and pieces with metal content and low wind susceptibility on the other.

With factor analysis one can first establish that there are two orthogonal influences affecting the direction of movement.

The application of the method might then be

  • first, to estimate the number of factors (see below): it is certainly not interesting to document the trajectory of each individual piece and to assume a separate factor for each piece; rather, essential common factors are to be extracted from the correlations in the data: very likely, two factors will emerge from the data,
  • to determine the strength and orientation of these influences (still without a theory about the nature of the influences), or
  • from knowledge of the piece properties (metallic, compact vs. non-metallic, susceptible to wind), to describe the content of the factors and to describe the “loadings” on the factors (their correlations with the magnetic force and the fan power) for the continuous properties “metal content” and “wind resistance”.

This example also makes the difference between orthogonal and oblique factor analysis clear: especially in the social sciences, non-orthogonal factors are generally assumed, since the social-scientific analogues of the fan and the magnet in the example do not necessarily have to be arranged at an angle of 90 degrees to one another and act accordingly.

In an exploratory situation, in which one does not yet have hypotheses about the reasons for the occurrence of correlated impact points, one will be satisfied with finding and marking two factors and will try to narrow down what these directional correlations are due to. In a confirmatory situation, one will investigate whether the correlations found can actually be explained with two factors (as perhaps assumed from a theory), or whether a third factor must be assumed (or whether only one factor actually operates).

Exploratory factor analysis

Exploratory factor analysis is carried out in four steps:

  1. Estimation of a correlation matrix or covariance matrix,
  2. Estimate of factor loadings,
  3. Determining the number of factors and
  4. Rotation of factor loadings to improve factor interpretation.

Factor extraction

The first step in factor analysis, the identification of possible factors, is the estimation of the factor loadings and the residual variances. A quality criterion is required for such an estimate. This essential theoretical basis is not clearly identified in large parts of the literature.

The “weight” of a factor is determined by how strongly the measured variables correlate with it, i.e. how highly they “load on this factor”. This is quantified by the sum of the squared loadings (in the orthogonal case this agrees with the eigenvalues of the loading matrix). The factors can be sorted according to the size of their sum of squared loadings (LQS).

If two easily separable groups of factors are found, one with high LQS and another with low LQS, the number of factors in the model will be set equal to the number of high-LQS factors. The separability of these groups can be assessed from a line plot of the LQS values; a recognizable kink can serve as a separation criterion (scree test).

Another criterion is that the LQS of a common factor should be greater than the variance of an individual measured variable (otherwise it would be difficult to regard it as a “common” factor). As a rule this means LQS ≥ 1 (Kaiser criterion).
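As a sketch of how these two criteria could be applied in practice (the correlation matrix below is purely hypothetical): the eigenvalues of the correlation matrix are sorted in descending order, a scree plot would display exactly these values, and the Kaiser criterion retains the factors whose eigenvalue (sum of squared loadings) exceeds 1. Whether the full or a reduced correlation matrix is used depends on the extraction method.

```python
import numpy as np

# hypothetical correlation matrix of 5 measured variables
R = np.array([
    [1.0, 0.6, 0.5, 0.1, 0.2],
    [0.6, 1.0, 0.4, 0.2, 0.1],
    [0.5, 0.4, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.5],
    [0.2, 0.1, 0.1, 0.5, 1.0],
])

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending order, as in a scree plot
print(eigenvalues)

# Kaiser criterion: retain factors with eigenvalue (LQS) > 1
n_factors = int(np.sum(eigenvalues > 1))
print("factors retained:", n_factors)
```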

Principal axis method

In the principal axis method, the communalities are first estimated: either as the coefficient of determination of the regression of the measured variable under consideration on all other measured variables, or as the maximum of the absolute correlations of the measured variable under consideration with all other measured variables. Then an iterative procedure is carried out (a code sketch follows the list below):

  1. The variances of the residuals are estimated as the difference between the variance of the measured variables and the corresponding communality.
  2. The eigenvalues and eigenvectors are calculated for the reduced covariance matrix. In contrast to the covariance matrix, the reduced covariance matrix contains the communalities on the main diagonal.
  3. The reproduced correlation matrix is calculated with the eigenvectors of the largest eigenvalues. The main diagonal of the reproduced correlation matrix gives a new estimate of the communalities.
  4. The first three steps are repeated until the estimates of the loadings, communalities, and variances of the residuals have stabilized.
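A minimal sketch of this iteration, applied to a correlation matrix (so that the variance of each measured variable is 1), might look as follows. The function name, the starting values (squared multiple correlations), and the convergence tolerance are my own choices, and no safeguard against the improper solutions mentioned below is included.

```python
import numpy as np

def principal_axis(R, q, n_iter=100, tol=1e-6):
    """Iterative principal axis factoring of a correlation matrix R with q factors."""
    p = R.shape[0]
    # initial communalities: coefficient of determination of each variable
    # regressed on all others (squared multiple correlation)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_red = R.copy()
        np.fill_diagonal(R_red, h2)              # reduced correlation matrix
        vals, vecs = np.linalg.eigh(R_red)       # eigenvalue decomposition
        idx = np.argsort(vals)[::-1][:q]         # q largest eigenvalues
        loadings = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
        h2_new = np.sum(loadings**2, axis=1)     # diagonal of the reproduced matrix
        if np.max(np.abs(h2_new - h2)) < tol:    # estimates have stabilized
            h2 = h2_new
            break
        h2 = h2_new
    return loadings, h2, 1.0 - h2                # loadings, communalities, residual variances
```

Applied to the hypothetical matrix R from the previous sketch, principal_axis(R, q=2) would return the loadings, the communalities, and the residual variances.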

With the principal axis method, the communalities and variances of the residuals are first estimated and then the eigenvalue decomposition is carried out. In the principal component analysis, the eigenvalue decomposition is carried out first and then the communalities and variances of the residuals are estimated. For the interpretation, this means that in the principal component analysis, the entire variance of a measured variable can be fully explained by the components, while in the principal axis method there is a portion of the variance of a measured variable that cannot be explained by the factors.

A disadvantage of the principal axis method is that in the course of the iteration process the variance of the residuals can become negative or greater than the variance of the measurement variables. The procedure is then terminated without any result.

Maximum likelihood estimation

The parameter estimation rests on a secure foundation if Γ, the residual variances Var(δ_i), and μ (not carried along in the previous sections) are determined in such a way that they maximize the likelihood of the observed realizations of x.

However, with this estimation procedure one has to make assumptions about the probability distribution of the manifest variable x , i.e. usually assume a normal distribution .
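If one does not want to implement the maximum likelihood estimation oneself, library routines can be used. The following sketch uses the FactorAnalysis class from scikit-learn, which fits a Gaussian latent factor model to the data; the data here are simulated as in the earlier sketches, and no claim is made that this routine reproduces the historical algorithms named above.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
Gamma = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]])
X = rng.standard_normal((1000, 2)) @ Gamma.T + rng.standard_normal((1000, 4)) * 0.5

fa = FactorAnalysis(n_components=2)   # assumes normally distributed manifest variables
fa.fit(X)

print(fa.components_.T)       # estimated loading matrix (determined only up to rotation and sign)
print(fa.noise_variance_)     # estimated residual variances Var(delta_i)
```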

Determination of the number of factors

Depending on the method and the options chosen, the extraction yields many factors. Only a few of them explain enough variance to justify retaining them. The selection of the factors serves primarily to obtain meaningful, easily interpretable results and can therefore be objectified only to a limited extent. Criteria such as the scree test and the Kaiser criterion described above can provide clues.

In principle, several criteria should be used. In case of doubt in particular, it is advisable to calculate several factor numbers and check them with regard to loadings and interpretability.

If the theory on which the investigation is based specifies a certain number of factors, this number can also be used in the factor analysis. The analyst can also specify, more or less arbitrarily, what proportion of the total variance is to be explained; the number of factors required for this then follows. However, even with a theory-based or variance-based determination, the number of factors must be checked for plausibility using the criteria mentioned.

Factor rotation

The rotation is intended to make the content of the factors easier to interpret. Various methods are available, including:

  • orthogonal, i.e. the rotated factors remain uncorrelated,
  • and oblique, i.e. the rotated factors are correlated,
    • Oblimin
    • Promax

These methods approach the rotation solution iteratively and usually require between 10 and 40 iterations. The calculation is based on a correlation matrix.
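As an illustration of an orthogonal rotation, the following sketch implements the widely used varimax criterion (not named explicitly in the list above) with the standard SVD-based iteration. The loading matrix L is assumed to come from one of the extraction methods described earlier; parameter names and defaults are my own choices.

```python
import numpy as np

def varimax(L, gamma=1.0, n_iter=40, tol=1e-6):
    """Rotate a p x q loading matrix L orthogonally according to the varimax criterion."""
    p, q = L.shape
    T = np.eye(q)                  # accumulated rotation matrix
    d_old = 0.0
    for _ in range(n_iter):
        LT = L @ T                 # currently rotated loadings
        B = L.T @ (LT**3 - (gamma / p) * LT @ np.diag(np.sum(LT**2, axis=0)))
        u, s, vt = np.linalg.svd(B)
        T = u @ vt                 # new orthogonal rotation
        d_new = np.sum(s)
        if d_new < d_old * (1 + tol):   # criterion no longer improves
            break
        d_old = d_new
    return L @ T, T                # rotated loadings and rotation matrix
```

Since T is orthogonal, the rotated factors remain uncorrelated; oblique methods such as Oblimin or Promax drop this restriction.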

Factor versus principal component analysis

Factor analysis and principal component analysis have a number of things in common:

  • Both methods serve to reduce dimensions.
  • Both methods are linear models between the components / factors and variables.
  • Both methods can be applied to both a covariance matrix and a correlation matrix.
  • Both methods often give similar results (if rotation is not used in the factor analysis).

However, there are also a number of differences:

  • Principal component analysis begins by finding a low-dimensional linear subspace that best describes the data . Since the subspace is linear, it can be described by a linear model. It is therefore a descriptive-exploratory process. The factor analysis is based on a linear model and tries to approximate the observed covariance or correlation matrix. It is therefore a model-based procedure.
  • In principal component analysis there is a clear ranking of the vectors, given by the descending eigenvalues of the covariance or correlation matrix. In factor analysis, the dimension of the factor space is determined first, and all vectors in it are equally important.
  • In principal component analysis, a p-dimensional random vector x is represented as a sum of random vectors chosen in such a way that the first summand explains as large a proportion as possible of the variance of x, the second summand as much as possible of the remaining variance, and so on. If this sum is broken off after q terms, one obtains the representation
$$ x = \sum_{i=1}^{q} (w_i^{T} x)\, w_i + e $$
with the remainder
$$ e = \sum_{i=q+1}^{p} (w_i^{T} x)\, w_i , $$
where w_1, …, w_p are the orthonormal eigenvectors of the covariance matrix of x. At first glance, this looks like the linear model of factor analysis. However, the components of e are correlated with one another, since they depend on the same x. Since this violates the requirements of factor analysis, a correct factor model is not obtained from a principal component analysis (a small numerical illustration follows this list).
  • Principal component analysis models only the variances, but not the covariances, of the x. The total variance, the optimality criterion of principal component analysis, can be written as the sum of the squared distances between the observations and the mean of the observations. The exact arrangement of the observations in high-dimensional space, the linear part of which is described by the covariance or correlation matrix, does not matter here.
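The point about the remainder e can be illustrated numerically: with simulated data (hypothetical loadings as in the earlier sketches), the components of the remainder after a rank-q principal component approximation are clearly correlated with one another, whereas the factor model demands uncorrelated residuals.

```python
import numpy as np

rng = np.random.default_rng(3)
Gamma = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]])
X = rng.standard_normal((5000, 2)) @ Gamma.T + rng.standard_normal((5000, 4)) * 0.5
X -= X.mean(axis=0)                       # centered data

# orthonormal eigenvectors of the sample covariance matrix, sorted by descending eigenvalue
vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
W = vecs[:, np.argsort(vals)[::-1]]

q = 2
W_q = W[:, :q]
X_hat = X @ W_q @ W_q.T                   # rank-q principal component approximation
E = X - X_hat                             # remainder e

# off-diagonal entries are generally far from 0: the components of e are correlated
print(np.round(np.corrcoef(E, rowvar=False), 2))
```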


References

  1. Krzanowski, p. 487.
  2. Markus Wirtz, Christof Nachtigall: Descriptive Statistics. 3rd edition, Juventa Verlag, Weinheim 2004, p. 199 f.
  3. SPSS (2007): SPSS 16.0 Algorithms. SPSS Inc., Chicago, Illinois, p. 280.
  4. W. J. Krzanowski: Principles of Multivariate Analysis: A User's Perspective. 2000, p. 482.