Prevalence error

from Wikipedia, the free encyclopedia

The prevalence error is the error made when the conditional probability of a random variable A given a condition B is determined without regard to the prevalence, or a priori probability, of A. The prevalence describes the distribution of A in the population under consideration and is also known as the base rate. The prevalence error is therefore also called the base rate error, base rate neglect or base rate fallacy. A and B may be events or properties; the error is a general phenomenon in the interpretation of statistical associations.
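
For orientation, Bayes' theorem shows where the prevalence enters such a calculation. The form below is the standard textbook statement rather than a formula taken from the source, with $\neg A$ denoting that A does not hold:

```latex
P(A \mid B)
  = \frac{P(B \mid A)\,P(A)}{P(B)}
  = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}
```

Dropping the factor $P(A)$, or treating $P(A \mid B)$ as if it were $P(B \mid A)$, is the kind of mistake the prevalence error describes.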

Calculation example

Based on Lindsey, Hertwig and Gigerenzer (2003).

The assumption that a DNA test can unequivocally identify a person on the basis of traces is based on two prevalence errors.

Assumptions

In a crime, a trace of DNA is found on the victim. There are only a few other indications:

  1. In principle, 10 million people could be the originator of the trace. The probability that a person selected at random from this population is the originator (event U) is $P(U) = 1/10^{7} = 10^{-7}$.
  2. Given the natural random distribution of DNA profiles, around 10 of the 10 million people in question have a genetic fingerprint identical to the DNA profile of the trace on the victim. This particular DNA profile (event D) therefore has a prevalence of $P(D) = 10/10^{7} = 10^{-6}$.
  3. The DNA test used is assumed to produce no false negatives: if a person has the DNA profile of the trace, the test finds a match. Writing T for a positive test, the conditional probability of T given D is $P(T \mid D) = 1$.
  4. Even for people who do not have this DNA profile, a match is occasionally reported because of a small but unavoidable test inaccuracy: $P(T \mid \neg D) = 10^{-5}$, a false positive rate of 1 in 100,000.

How should the result be assessed if, as part of a DNA dragnet, one person selected at random from the 10 million tests positive?

Consequences

  • If all 10 million people are tested, the test produces a false positive in about 100 cases on average. This follows from the conditional probability and the multiplication rule of probability theory, writing T for a positive test and D for carrying the DNA profile of the trace:
$10^{7} \cdot P(T \cap \neg D) = 10^{7} \cdot P(T \mid \neg D) \cdot P(\neg D) = 10^{7} \cdot 10^{-5} \cdot (1 - 10^{-6}) \approx 100$.
  • Of the 10 million people, a total of about 110 would therefore test positive in a DNA screening: the 10 carriers plus about 100 false positives. The prevalence of a positive test, $P(T) \approx 110/10^{7} = 1.1 \cdot 10^{-5}$, is thus greater than the prevalence of the DNA profile, $P(D) = 10^{-6}$.
  • But a positive result would be correct in only about one eleventh of the cases. Although the probability that a carrier of the profile tests positive is $P(T \mid D) = 1$, the probability that a person who tests positive carries the profile is only $P(D \mid T) \approx 10/110 \approx 0.09$, according to Bayes' theorem and the relative frequencies just calculated. Anyone who assumes that a positive test on a randomly selected person is a good predictor of a match with the DNA profile of the trace commits a first prevalence error, in which $P(D \mid T)$ and $P(T \mid D)$ are confused.
  • The originator of the trace (event U) is one of the 10 carriers of the DNA profile. So $P(D \mid U) = 1$, but at the same time $P(U \mid D) = 1/10$. Although the originator certainly has the DNA profile and would therefore certainly test positive, under the given conditions the probability that any given positive test identifies the originator is quite small: of 110 positive tests only 10 belong to carriers of the DNA profile, and only one of these is the originator:
$P(U \mid T) \approx 1/110 \approx 0.009$.
  • By comparison, the probability that a person who happens to test positive is not the originator of the trace is very high: 100 of the 110 positives are not carriers of the profile, and 9 are carriers but not the originator:
$P(\neg U \mid T) \approx 109/110 \approx 0.991$.
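
The counts in the list above can be reproduced with a short calculation. The following is a minimal sketch in Python, assuming the four figures from the example (10 million people, 10 carriers, no false negatives, a false positive rate of 1 in 100,000); all variable names are illustrative and not taken from the source.

```python
from fractions import Fraction

# Inputs assumed from the example's stated assumptions.
population = 10_000_000                      # possible originators of the trace
carriers = 10                                # people whose profile matches the trace
sensitivity = Fraction(1)                    # P(T | D): no false negatives
false_positive_rate = Fraction(1, 100_000)   # P(T | not D)

# Expected numbers of positive tests if everyone were tested.
true_positives = carriers * sensitivity
false_positives = (population - carriers) * false_positive_rate
positives = true_positives + false_positives

# Probabilities for a randomly selected person who tests positive.
p_profile_given_positive = true_positives / positives
# Exactly one person is the originator, and that person always tests positive.
p_originator_given_positive = Fraction(1) / positives
p_not_originator_given_positive = 1 - p_originator_given_positive

print("expected false positives:", float(false_positives))
print("expected positives:      ", float(positives))
print("P(D | T):    ", float(p_profile_given_positive))
print("P(U | T):    ", float(p_originator_given_positive))
print("P(not U | T):", float(p_not_originator_given_positive))
```

Exact fractions are used so that the rounding to "about 100" and "about 110" stays visible: the expected number of false positives is slightly below 100 because the 10 carriers are excluded from the non-carriers.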

Evaluation

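If it is claimed that a person who tested positive without any prior suspicion must be the originator of the DNA trace, a double prevalence error is committed, because both the prevalence of the DNA profile and the prevalence of positive test results are overlooked. The probability that the originator tests positive, $P(T \mid U)$, is confused with the probability that a person who tests positive is the originator, $P(U \mid T)$. While the originator of the trace is certain to test positive under the stated assumptions, it is unlikely that any given positive test identifies the originator. The swap obscures two high probabilities: that a person who tests positive does not carry the profile at all, $P(\neg D \mid T) \approx 100/110$, and that such a person is not the originator, $P(\neg U \mid T) \approx 109/110$.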
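
The gap between the two confused quantities can be written out with Bayes' theorem. In the sketch below the symbols are assumptions of notation: U stands for "is the originator", D for "carries the DNA profile of the trace" and T for "tests positive", and the numbers are those of the example.

```latex
\begin{align*}
P(T) &= P(T \mid D)\,P(D) + P(T \mid \neg D)\,P(\neg D)
      = 1 \cdot 10^{-6} + 10^{-5}\,(1 - 10^{-6}) \approx 1.1 \cdot 10^{-5} \\
P(U \mid T) &= \frac{P(T \mid U)\,P(U)}{P(T)}
      \approx \frac{1 \cdot 10^{-7}}{1.1 \cdot 10^{-5}} \approx 0.009 \\
P(T \mid U) &= 1
\end{align*}
```

The two conditional probabilities differ by a factor of about 110, which is the ratio $P(T)/P(U)$ of the two prevalences that the error overlooks.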

In the example, the DNA test is unsuitable for incriminating a person who is not otherwise under suspicion. If suspicion already exists on the basis of other circumstances unrelated to the trace, the test can confirm or dispel it. The evidential value of the test increases as the population of possible originators shrinks (in the example it is very large, at 10 million), but only as long as it is certain that the originator of the trace is still included in the population. For originatorship to be inferred from a match, a group of people who can objectively be considered must first be identified. It must also be checked whether this group contains other people whose DNA markers would yield the same match. To increase reliability further, one can try to reduce the error rate of the test.
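
How strongly the result depends on the size of the population and on the error rate can be explored numerically. The following Python sketch rests on simplifying assumptions that are not in the source: exactly one originator who is always in the population, no false negatives, every person equally likely a priori, and at least one carrier of the profile (the originator). The function name and parameters are hypothetical.

```python
def p_originator_given_positive(population, profile_frequency, false_positive_rate):
    """Probability that a randomly chosen positive-testing person is the originator.

    Simplifying assumptions (illustrative only): exactly one originator in the
    population, no false negatives, and a uniform prior over the population.
    """
    # The originator always carries the profile, so there is at least one carrier.
    carriers = max(1.0, population * profile_frequency)
    non_carriers = population - carriers
    expected_positives = carriers + non_carriers * false_positive_rate
    return 1.0 / expected_positives


# Shrinking the population of possible originators, with the example's rates.
for population in (10_000_000, 1_000_000, 100_000, 1_000):
    p = p_originator_given_positive(population, 1e-6, 1e-5)
    print(f"population {population:>10,}: P(U | T) = {p:.4f}")

# Reducing the false positive rate for the full population of 10 million.
for rate in (1e-5, 1e-6, 1e-7):
    p = p_originator_given_positive(10_000_000, 1e-6, rate)
    print(f"false positive rate {rate:.0e}: P(U | T) = {p:.4f}")
```

Under these assumptions, narrowing the population helps far more than improving the test alone: even with a much lower false positive rate, the other carriers of the profile in a population of 10 million keep the probability low.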

Psychological results

Psychological experiments have shown that the estimate of the probability of a statistical variable A is strongly biased and deviates from the prevalence when other properties of the case under assessment are known beforehand, even if these properties have no predictive or explanatory value for the occurrence of A.

According to Daniel Kahneman and Amos Tversky, this finding can be explained by the representativeness heuristic. Richard Nisbett has argued that attribution errors such as the correspondence bias are based on the prevalence error: the prevalence of a behavior in a given situation, which is harder to assess, is ignored in favor of the simpler dispositional attribution.

References

  1. Samuel Lindsey, Ralph Hertwig, Gerd Gigerenzer: Communicating Statistical DNA Evidence. In: Jurimetrics. Volume 43, 2003, pp. 147-163, JSTOR 29762803.
  2. A. Tversky, D. Kahneman: Availability: A heuristic for judging frequency and probability. In: Cognitive Psychology. Volume 5, 1973, pp. 207-232, doi:10.1016/0010-0285(73)90033-9.