# Correlation

A correlation (Medieval Latin *correlatio*, "mutual relationship") describes a relationship between two or more features, states or functions. The relationship does not have to be causal: some elements of a system do not influence one another, or there is only a stochastic relationship between them, that is, one influenced by chance.

## Description

A correlation as a measure of the connection should clarify two questions:

How strong is the connection?
The magnitude of most correlation measures lies in a range from zero (no connection) to one (strong connection). For the hair and eye color of students, for example, the corrected contingency coefficient is 0.55. Since this lies in the middle of the range between zero and one, the relationship is of medium strength.
If a direction can be determined, what is the direction of the connection?
An example of a positive correlation (the more, the more) is "more feed, fatter cows." An example of a negative correlation or anti-correlation (the more, the less) is "the greater the distance driven by car, the less fuel in the tank."

Often there are saturation limits. For example, pressing the accelerator further makes a car go faster, but not faster than its technical maximum speed. For many correlations in economics, marginal costs rise and marginal utility decreases.

What is the scale level of the variables involved in the correlation?

The scale level determines which correlation coefficient is appropriate. Depending on the pairing of scales, a different correlation measure must be computed, and it must be interpreted differently: for example Cramér's V or Phi for nominal pairs, Spearman's rank correlation coefficient for ordinal pairs, and the product-moment correlation coefficient of Bravais and Pearson for metrically (also: cardinally) scaled features.
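The three measures named above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names are chosen here for the example, ties in the ranking are handled with average ranks (one common convention), and Cramér's V is computed from the uncorrected chi-square statistic.

```python
import math
from collections import Counter

def pearson(x, y):
    """Product-moment correlation for metric data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Rank correlation for ordinal data: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))

def cramers_v(a, b):
    """Cramer's V for two nominal features (each needs at least 2 categories)."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    chi2 = 0.0
    for u in count_a:
        for v in count_b:
            expected = count_a[u] * count_b[v] / n
            chi2 += (joint[(u, v)] - expected) ** 2 / expected
    k = min(len(count_a), len(count_b))
    return math.sqrt(chi2 / (n * (k - 1)))

# Made-up data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4]
y = [1, 8, 27, 64]
print("Pearson: ", pearson(x, y))
print("Spearman:", spearman(x, y))
print("Cramer's V:", cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
```

On the monotonic but non-linear data, the rank-based measure reports a perfect relationship while the product-moment coefficient stays below one, which is one reason the choice of measure affects the interpretation.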

## Correlation and causal relationship


### From the correlation to the causal relationship

However, a correlation does not describe a cause-effect relationship in one direction or the other; that is, a strong connection does not imply that there is also a clear cause-effect relationship.

Examples:

• The fact that a lot of sunburns occur in summers with high ice cream sales does not mean that eating ice cream causes sunburn.
• There could well be a correlation between the decline in storks in Burgenland and a decline in the number of newborns. This correlation would arise neither because storks bring children nor because storks are attracted to children. The connection would be far more indirect.
• People who laugh a lot regularly report in opinion polls that they are happier than others. Since these two phenomena occur together, it is conceivable
  • that happy people laugh more,
  • that people who have a lot to laugh about become happier as a result,
  • that there is no direct connection at all, but that both laughter and happiness depend on the weather on the day the observations were made.

In the first two examples, the measured variables are causally linked through a third variable. In the first case it is solar radiation that drives both ice cream sales and sunburn; in the second case it is urbanization, which destroys nesting sites and leads to fewer children (see work-life balance). Correlations of this kind are, somewhat misleadingly, called spurious correlations (strictly speaking, they are spurious causalities).
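The role of a third variable can be illustrated with a small simulation. The sketch below uses entirely synthetic data under a deliberately simple assumption: a hypothetical "sun" variable drives both ice cream sales and sunburn, each with independent noise. It then compares the raw correlation with the partial correlation that controls for the sun. All names and numbers are invented for the illustration.

```python
import math
import random

def pearson(x, y):
    """Product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear influence of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic model: the sun influences both variables; they never
# influence each other directly.
sun = [rng.gauss(0, 1) for _ in range(5000)]
ice_cream = [s + rng.gauss(0, 1) for s in sun]
sunburn = [s + rng.gauss(0, 1) for s in sun]

r_raw = pearson(ice_cream, sunburn)
r_partial = partial_corr(ice_cream, sunburn, sun)
print(f"raw correlation:         {r_raw:.2f}")
print(f"controlling for the sun: {r_partial:.2f}")
```

In this model the raw correlation is clearly positive, while the partial correlation is close to zero. That only works because the third variable was known and measured; with real data, an unobserved common cause cannot be removed this way.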

Correlations are often reported in the press in a way that suggests direct causality, even though there is a mixture of direct and indirect connections.

| Example headline | Conceivable alternative influencing factors and mechanisms of action | Remarks |
| --- | --- | --- |
| Immigrants are more likely to be criminals | Theft, robbery and similar offenses are often acts of desperation. Such desperation arises, among other things, from financial poverty, low socio-economic status, unemployment, disorientation, and a lack of support from family and friends, as commonly occurs after relocation. Aggression is, among other things, a common symptom of post-traumatic disorders, which are often associated with flight and displacement. Self-fulfilling prophecy: if a person repeatedly experiences that certain personality traits or behaviors (e.g. lower professional success, higher potential for aggression or willingness to use violence, dishonesty, ...) are associated with their own visible characteristics (e.g. a young man with dark skin and a particular accent), this constant confrontation can lead them to increasingly adopt precisely those behaviors or traits. Unfortunately, the attempt to "prove the opposite" to society, e.g. by being particularly successful in school final exams, often leads to exactly the opposite (and thus again stereotypically expected) outcome because of the constant stress and additional cognitive load induced in this way (e.g. a poor test result and thus reduced career opportunities despite actually high intelligence). | A further problem in this specific example is that crime is often too poorly differentiated. For example, refugee crime statistics have repeatedly included fare dodging, which is often due not (only) to financial hardship but (also) to inadequate instruction in the public transport system, such as how to select the fare level, pay, and validate the ticket. The language barrier that often exists is also a factor that should not be ignored. It is therefore questionable whether fare dodging and similar offenses, which often stem from only temporary or initial comprehension problems, should be included in the same comparative crime statistics as serious crimes. |
| CO₂ explains near-death experiences | After cardiac death, the CO₂ level in the blood rises rapidly, because the CO₂-rich (and oxygen-poor) blood is no longer replaced via the pulmonary circulation by oxygen-rich (CO₂-poor) blood and carried through the systemic circulation to the organs (including the brain). Irreversible brain death also occurs relatively quickly if aid measures are not initiated in time. In the phase in between (heart dead, not yet brain dead), many patients have so-called near-death experiences. The fact that both a rise in the blood CO₂ level and near-death experiences are observed over a similar period says nothing about the cause. Alternatively, the decrease in oxygen concentration could be responsible, or near-death experiences may occur in this time window completely independently of any physical changes taking place in parallel. | A possible causality (attribution of cause) is difficult to verify here, since both phenomena (CO₂ accumulation and near-death experience) inevitably occur (only) in this specific time window and are difficult to control experimentally. |
| Taller people earn more money | A confounding variable (additional influencing factor) could be self-confidence: higher self-confidence in physically taller people leads, via intermediate steps, to better-paid jobs on average. | |
| Creative people have more sex | In addition to the presumed specific and possibly more "attractive" personality traits of this occupational group, other possible reasons are: occupations with flexible time management could allow a richer sex life; fundamental satisfaction, as an enabling factor for a fulfilled sexuality, may depend on satisfaction with one's job and conviction of its meaningfulness, which is possibly more common in artistic professions, for which those working in them often had to defy family and social resistance. | Worst case: the title does not match the content or result of the study. The study stated that people who work full-time in the creative field have on average twice as many sexual partners as the "rest". Against this background, group dynamics and internal minority norms within many artistic communities also come into question as a cause. Further points to be viewed critically are the survey method and the restriction of the survey to a relatively small sample of only 425 British people. |
| Happy people are healthier | Physical and mental health, or even just the absence of illness, can contribute significantly to the subjective feeling of happiness. Here, too, a common (moderating) influencing variable may be socio-economic status, or available financial resources and level of education, which influence the general feeling of security, stress factors, and the responsibility borne (professional and private, for oneself and others), as well as eating habits. | |
| Lowering unemployment requires strong economic growth | Perhaps the direction of causality is the opposite: strengthening economic growth requires lower unemployment. | |

In some cases, the assumed and possibly obvious causality (cause-effect structure) may actually be present, but the mere determination of a correlation never allows such a statement with certainty.

### From causal relationship to correlation

However, if there is actually a cause-and-effect relationship, then one expects a correlation between cause and effect. A correlation is assessed as an indication that two statistical variables could be causally related to one another.

This always works particularly well when the two variables are linked by a "the more ..., the more ..." relationship (proportionality) and one of the variables depends solely on the other.

For example, it can be shown that grain thrives better under certain conditions if it is watered more. This knowledge is based on knowledge about the grain, gained for example through experience or scientific considerations. The correlation does not distinguish whether the water acts directly on the growth of the grain or whether it instead worsens the living conditions of a plant pest, which then hampers the growth of the grain less than before. A cause-effect relationship can only describe which side (here the water) produces an effect (the growth of the grain). If several factors influence the growth of the grain (for example the temperature, the nutrient content of the soil, the incident light, etc.), the amount of water is no longer the only explanation for the growth of the grain. Its explanatory power is thus reduced. However, the correlation between the amount of water and the growth of the grain remains unchanged; it is an actual connection that cannot always be proven or fully described.

### False conclusions - cum hoc ergo propter hoc

The fallacy of inferring causality from correlation is also known as cum hoc ergo propter hoc. To actually establish causal relationships and determine their direction, a subject-matter analysis is fundamentally necessary. The question "why does noise in the house have a negative effect on the intelligence of children?" can only be answered by people with the appropriate specialist knowledge, in this case for example psychologists and environmental scientists.

To assess a hypothesis, experiments would be necessary in which one factor is set experimentally (e.g. the noise in the house) and the other factor is measured (e.g. the intelligence of the children). Such experiments would be evaluated using regression analysis or analysis of variance. A regression, however, describes the relationship but cannot explain it. Many such experiments are not feasible because they would

• take too long and / or
• cost too much and / or
• be unethical.

Because they focus on people, many social science and medical questions can ethically be addressed only by correlative studies, and mostly not by experiments. To interpret correlation results causally, further investigations are necessary (for example, long-term observations can help; longitudinal studies are used for this). Correlative studies are sometimes wrongly interpreted as experiments.

## Mathematical representation

In contrast to proportionality, correlation is only a statistical relationship. Often the linear or monotonic relationship between two variables is determined. In the linear case this means that the relationship between $x$ and $y$ can be described by the equation $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$; if $\beta_2 > 0$ there is a positive correlation, and if $\beta_2 < 0$ there is a negative correlation. From this property it follows that no estimate of $y$ is possible without knowing the parameters $\beta_1$ and $\beta_2$. The parameters of the assumed linear relationship can be estimated using linear regression.
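If the parameters are estimated by ordinary least squares (an assumption here; the text only says "linear regression"), the estimators take the familiar closed form below. The hats, the sample means $\bar{x}$ and $\bar{y}$, the sample standard deviations $s_x$ and $s_y$, and the symbol $r_{xy}$ for the Pearson correlation coefficient are notation introduced for this sketch:

$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r_{xy}\,\frac{s_y}{s_x}, \qquad \hat{\beta}_1 = \bar{y} - \hat{\beta}_2\,\bar{x}$$

Because $s_x$ and $s_y$ are positive, the estimated slope $\hat{\beta}_2$ always has the same sign as $r_{xy}$, which is consistent with the sign rule for positive and negative correlation.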

The confusion of correlation and direct causal connection is promoted by the fact that mathematically very similar methods are used when calculating the Pearson correlation coefficient $r_{xy}$ and in linear regression with one independent variable. Regression analyses report the coefficient of determination $R^2$; it is equal to the squared correlation coefficient $r_{xy}^2$ and describes the variance explained by the simple regression model. This encourages the false assumption that the two methods, with their respective possible interpretations, are interchangeable. The correlation describes the strength of the relationship, while the regression models an assumed causal direction of the relationship.
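The identity between the coefficient of determination and the squared correlation coefficient, and the difference between the two methods, can be checked numerically. The sketch below uses made-up data and helper names chosen for this example. It fits a least-squares line in both directions and compares the results with the Pearson coefficient.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = b1 + b2 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b2 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b2 * mx, b2

def r_squared(x, y):
    """Share of the variance of y explained by the fitted line."""
    b1, b2 = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (b1 + b2 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

def pearson(x, y):
    """Product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5]

r = pearson(x, y)
_, slope_y_on_x = fit_line(x, y)
_, slope_x_on_y = fit_line(y, x)

print("R^2 equals r^2:       ", math.isclose(r_squared(x, y), r ** 2))
print("r is symmetric:       ", math.isclose(pearson(y, x), r))
print("slope of y on x:      ", slope_y_on_x)
print("1 / (slope of x on y):", 1 / slope_x_on_y)
```

The correlation coefficient is the same whichever variable is listed first, whereas the two regression lines differ unless the fit is perfect (the product of the two slopes equals $r_{xy}^2$). The regression therefore depends on which variable is treated as dependent, a choice the correlation never makes.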

## Use in capital investments

The term correlation is of considerable importance in capital investments. The following applies: the lower the correlation between the individual investments, the lower the overall risk of the entire portfolio.

Example of positive correlation: if a portfolio consists only of many individual stocks, a decline in the price of stock 1 can, in a certain ratio, also lead to a loss in value of stock 2 and stock 3. If the portfolio consists half of equities and half of bonds, the loss is smaller because there is only a slight correlation between equities and bonds.

However, there are also negative correlations, albeit weaker ones, e.g. between stocks and bonds. If the stock market is weak, investors tend to move into bonds (capital flight into the safe haven), and bond prices rise. However, this does not fully compensate for the loss in the equity sector. It therefore makes sense to diversify into investments other than bonds and stocks as well. Risk reduction through diversification or through investment in negatively correlated assets is called hedging. With ideal diversification, the correlation between the returns is negative (more precisely: −1).
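The effect of correlation on portfolio risk can be made concrete with the standard two-asset variance formula. The following sketch assumes a portfolio of two assets with equal weights and equal volatility of 20%; these figures and the function name are hypothetical and chosen only for illustration.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = ((w1 * sigma1) ** 2 + (w2 * sigma2) ** 2
                + 2 * w1 * w2 * rho * sigma1 * sigma2)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

# Hypothetical inputs: equal weights, both assets with 20% volatility.
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vol = portfolio_volatility(0.5, 0.20, 0.20, rho)
    print(f"correlation {rho:+.1f}: portfolio volatility {vol:.3f}")
```

With a correlation of +1 the portfolio is exactly as risky as each asset on its own. The volatility falls as the correlation falls, and with these symmetric inputs it vanishes at −1, which corresponds to the ideal case described above.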

According to the Markowitz model, reducing the correlation of the overall portfolio in relation to its individual investments improves the risk-return ratio. On a long-term basis, a higher return is achieved with a lower risk.

The correlation primarily makes statements about the direction of price movements, e.g. of stock prices, but not about the extent of the respective change. From a share's positive correlation of 0.8, for example, it cannot be calculated by how much the share price rises if the DAX increases by 3%. Nor does the correlation say whether the DAX affects the share or the share affects the DAX. For the analysis of securities, the Capital Asset Pricing Model was developed, in which the beta factor serves as an important key figure.
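Why a correlation of 0.8 alone is not enough can be seen from the usual definition of the beta factor as the covariance with the market divided by the market variance, which equals the correlation multiplied by the ratio of the two volatilities. The sketch below compares two hypothetical shares with the same correlation to the index but different volatilities. All volatility figures are invented for the example.

```python
def beta(rho, sigma_stock, sigma_market):
    """Beta factor: covariance with the market divided by the market variance."""
    return rho * sigma_stock / sigma_market

market_move = 0.03  # the 3% index rise from the text

# Two hypothetical shares, both with correlation 0.8 to the index.
calm = beta(0.8, 0.15, 0.20)
volatile = beta(0.8, 0.40, 0.20)

for name, b in (("calm share", calm), ("volatile share", volatile)):
    print(f"{name}: beta {b:.2f}, model-implied move {b * market_move:.1%}")
```

Under these assumptions the two shares have betas of about 0.6 and 1.6, so the same 3% index move corresponds to average moves of roughly 1.8% and 4.8%. These are model-implied averages, not predictions for an individual day, and like the correlation they say nothing about which way the influence runs.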