Recapture method

from Wikipedia, the free encyclopedia

The recapture method is a method of estimating the size of a population of animals or other individuals. A sample of the population to be measured is caught, marked and released again. Later, a second random sample is caught, and the total population size is inferred from the proportion of marked animals in it. The recapture method is also known as the capture-recapture method or Petersen method. The Danish biostatistician C. G. J. Petersen first proposed it in 1896.

Calculation

The population size N can be estimated as

$$\hat{N} = \frac{M \cdot n}{m}$$

Here M is the number of previously marked individuals, n the number of individuals in the second sample, and m the number of marked individuals found in that sample. The procedure rests on the fact that the proportion of marked individuals in the sample should be about as large as in the entire population:

$$\frac{m}{n} = \frac{M}{N}$$

If no marked individuals are found in the sample, no conclusion about the population size is possible.
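
As a minimal sketch of the calculation, the following Python function evaluates the estimate from the three counts. The function name `petersen_estimate`, the choice to raise an error when m = 0, and the example numbers are assumptions made for illustration.

```python
def petersen_estimate(M: int, n: int, m: int) -> float:
    """Estimate the population size N.

    M: number of individuals marked in the first catch
    n: size of the second sample
    m: number of marked individuals found in the second sample
    """
    if m == 0:
        # Without any recaptured marks the ratio m/n carries no information on N.
        raise ValueError("no marked individuals in the sample; N cannot be estimated")
    if m > min(M, n):
        raise ValueError("m cannot exceed M or n")
    return M * n / m


# Hypothetical numbers: 100 marked, 60 caught in the second sample, 12 of them marked.
print(petersen_estimate(100, 60, 12))  # 500.0
```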

Conditions

For the estimate not to be distorted, the following conditions must be met:

  • No new individuals are added between the application of the markings and the collection of the second sample.
  • The markings are not lost in the meantime (neither through detachment of the mark from individuals nor through migration of individuals).
  • The probability of being caught is the same for all individuals with or without a mark.
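
To illustrate the idealized setting these conditions describe, here is a small simulation sketch in which all three hold: the population is closed, marks are permanent, and every individual is equally likely to be caught. The function name `simulate_recapture`, the population figures and the random seed are hypothetical.

```python
import random


def simulate_recapture(N: int, M: int, n: int, rng: random.Random) -> int:
    """Return m, the number of marked individuals in a second sample of size n."""
    population = range(N)                      # closed population: nobody joins or leaves
    marked = set(rng.sample(population, M))    # first catch: M individuals are marked
    second = rng.sample(population, n)         # second catch: equal catchability for all
    return sum(1 for i in second if i in marked)


rng = random.Random(0)
N_true, M, n = 1000, 200, 150

estimates = []
for _ in range(2000):
    m = simulate_recapture(N_true, M, n, rng)
    if m > 0:                                  # no estimate is possible when m == 0
        estimates.append(M * n / m)

estimates.sort()
print("median estimate:", estimates[len(estimates) // 2])
```

Under these assumptions the estimates scatter around the true size of 1000. Violating a condition, for example by making marked individuals harder to catch, would shift them systematically.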

Mathematical derivation

It is assumed that the random variable X, the number of marked animals caught in the second sample, follows a hypergeometric distribution with the parameters N (size of the total population), M (number of marked individuals) and n (size of the second sample). The probability of having exactly m marked individuals in the sample is:

$$P(X = m) = \frac{\binom{M}{m} \binom{N - M}{n - m}}{\binom{N}{n}}$$

The binomial coefficient $\binom{N}{n}$ is read "N choose n". Since all values except N are known from the samples, maximizing this function over N yields the Petersen estimator, rounded down to an integer (application of the maximum likelihood method).
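
The maximization can be checked numerically. The sketch below evaluates the hypergeometric likelihood for every candidate N up to an arbitrary bound and picks the largest value. The function names, the search bound `N_max` and the example counts are assumptions for illustration. When M·n/m happens to be an integer, the likelihood takes the same value at N = M·n/m and at N = M·n/m − 1, so the example uses counts where this tie does not occur.

```python
from math import comb


def likelihood(N: int, M: int, n: int, m: int) -> float:
    """Hypergeometric probability of finding m marked individuals in a sample of n."""
    return comb(M, m) * comb(N - M, n - m) / comb(N, n)


def ml_estimate(M: int, n: int, m: int, N_max: int) -> int:
    """Brute-force search for the N in [M + n - m, N_max] that maximizes the likelihood."""
    N_min = M + n - m  # smallest population consistent with the observed counts
    return max(range(N_min, N_max + 1), key=lambda N: likelihood(N, M, n, m))


M, n, m = 100, 60, 13
print(ml_estimate(M, n, m, N_max=5000), M * n // m)  # 461 461
```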

Other uses

The method can also be used to estimate the size of a population that is only partially known to each of two or more sources. Each source draws a sample independently of the others. For example, the number of documents on the WWW that match a search query can be estimated using two search engines:

  1. Submit the search query to the first search engine and record the M documents found.
  2. Submit the same query to the second search engine. The number of documents found is n, and the number of those that were already found by the first search engine is m.

However, the estimate is less reliable if the search engines' results generally overlap more than two independent samples would, because the samples can then no longer be treated as independent.
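
The two steps above can be sketched in Python with sets standing in for the result lists. The function name `estimate_total` and the document identifiers are hypothetical. No real search engine API is used.

```python
def estimate_total(results_a: set[str], results_b: set[str]) -> float:
    """Estimate the total number of matching documents from two result sets."""
    M = len(results_a)              # documents found by the first engine
    n = len(results_b)              # documents found by the second engine
    m = len(results_a & results_b)  # documents found by both
    if m == 0:
        raise ValueError("no overlap between the result sets; no estimate possible")
    return M * n / m


# Made-up result sets: 40 and 30 documents, 10 of which appear in both.
engine_a = {f"doc{i}" for i in range(0, 40)}
engine_b = {f"doc{i}" for i in range(30, 60)}
print(estimate_total(engine_a, engine_b))  # 120.0
```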

The method was introduced into bibliometrics in 1981 by Walther Umstätter and Margarete Rehm. The recapture method can be used, for example, to estimate how many articles or books have been published on a particular topic without having to search through all catalogs (see also publication bias).

References

  1. C. G. J. Petersen: The yearly immigration of young plaice into the Limfjord from the German Sea. In: Report of the Danish Biological Station. Vol. 6, 1896, pp. 5-84.

Literature

  • Walther Umstätter, Margarete Rehm: Introduction to literature documentation and information transfer . Saur Verlag, 1981. ISBN 3-598-10390-5