Approximate Bayesian Computation


Approximate Bayesian computation (abbreviated ABC) is a class of computational methods in Bayesian statistics. In model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model and thus quantifies the support that the data lend to particular values of parameters and to choices among models. For simple models, an analytical formula for the likelihood function can often be derived. For more complex models, however, an analytical form may be difficult to find or computationally expensive to evaluate.

ABC methods bypass the evaluation of the likelihood function. In this way, they widen the range of models for which statistical inference is feasible. ABC methods are mathematically well-founded, but they make assumptions and approximations whose effects must be carefully assessed. Moreover, the broader application domain of ABC exacerbates the challenges of parameter estimation and model selection.

ABC has grown immensely in popularity in recent years, particularly for the analysis of complex problems in population genetics, ecology, epidemiology, and systems biology.

History

The first ideas for ABC date back to the 1980s. Donald Rubin, discussing the interpretation of Bayesian statements in 1984, described a hypothetical sampling mechanism that yields a sample from the posterior distribution. The scheme was intended as a conceptual thought experiment to demonstrate what kind of manipulations are performed when inferring the posterior distributions of parameters. The description of the sampling mechanism coincides exactly with that of the ABC rejection scheme, and the article can be regarded as the first to describe approximate Bayesian computation. However, as early as the late 1800s, Francis Galton had constructed a two-stage quincunx that can be viewed as a physical implementation of an ABC rejection algorithm for a single unknown parameter and a single observation. Another prescient point was made by Rubin when he argued that applied statisticians should not settle for analytically tractable models alone in Bayesian inference, but should instead consider computational methods that allow them to approximate the posterior distribution of interest. This way, a wider range of models can be considered. These arguments are particularly relevant in connection with ABC.

In 1984, Peter Diggle and Richard Gratton suggested using a systematic simulation scheme to approximate the likelihood function in situations where its analytical form is intractable. Their method was based on defining a grid in parameter space and using it to approximate the likelihood by running several simulations for each grid point. The approximation was then improved by applying smoothing techniques to the results of the simulations. While the idea of using simulations for hypothesis testing was not new, Diggle and Gratton appear to have introduced the first technique using simulation for statistical inference in circumstances where the likelihood is inaccessible.

The method of Diggle and Gratton was not exactly identical to what is now known as ABC, as it targeted the likelihood rather than the posterior distribution. An ABC algorithm for inferring the posterior distribution was first proposed in an article by Simon Tavaré et al. Their pioneering work considered inference on the genealogy of DNA sequence data, and in particular the problem of determining the posterior distribution of the time to the most recent common ancestor of the sampled individuals. Such inference is analytically intractable for many demographic models, but the authors presented ways of simulating coalescent trees under the putative models. A sample of model parameters was obtained by accepting or rejecting proposals based on a comparison of the number of segregating sites in the synthetic and real data. This work was followed by an applied study on modeling variation in the human Y chromosome by Jonathan K. Pritchard et al. using the ABC method. Finally, the term approximate Bayesian computation (ABC) was introduced by Mark Beaumont et al., who further expanded the ABC methodology and discussed the suitability of the ABC approach specifically for problems in population genetics. Since then, ABC has spread to applications outside population genetics, such as systems biology, epidemiology, and phylogeography.

Method

Motivation

Bayes' theorem relates the conditional probability (or conditional probability density) of a particular parameter value $\theta$ given observed data $D$ to the probability of $D$ given $\theta$ via the rule

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)},$$

where $p(\theta \mid D)$ denotes the posterior probability (also known as the posterior), $p(D \mid \theta)$ the likelihood, $p(\theta)$ the a priori distribution of $\theta$ (also known as the prior), and $p(D)$ the evidence (also known as the marginal likelihood or prior predictive probability).

The prior reflects beliefs about $\theta$ before $D$ is available and is often specified by choosing a particular distribution from a set of well-known and tractable families of distributions, so that both evaluating prior probabilities and generating random values of $\theta$ are relatively simple. For certain kinds of models, it is more pragmatic to specify the prior using a factorization of the joint distribution of all the elements of $\theta$ into a sequence of their conditional distributions. If one is only interested in the relative posterior plausibility of different values of $\theta$, the evidence $p(D)$ can be ignored, since it is a normalization constant that cancels out of any ratio of posterior probabilities; MCMC methods exploit this, for example. However, it then remains necessary to evaluate the likelihood $p(D \mid \theta)$ and the prior $p(\theta)$. For many applications, evaluating the likelihood is computationally expensive or even impossible, which motivates the use of ABC to circumvent this problem.

ABC rejection algorithm

All ABC-based methods approximate the likelihood function through simulations whose results are compared with the observed data. More precisely, the ABC rejection algorithm, the most basic form of ABC, first draws a set of parameter points according to the a priori distribution. Given a sampled parameter point $\hat{\theta}$, a data set $\hat{D}$ is then simulated under the statistical model specified by $\hat{\theta}$. If the generated data $\hat{D}$ differ too much from the observed data $D$, the sampled parameter value is discarded. More precisely, $\hat{\theta}$ is accepted with tolerance $\epsilon \ge 0$ if

$$\rho(\hat{D}, D) \le \epsilon,$$

where the distance measure $\rho(\hat{D}, D)$ quantifies the discrepancy between $\hat{D}$ and $D$ based on a given metric (e.g. the Euclidean distance). A strictly positive tolerance is usually necessary (except in discrete spaces), since the probability that a simulation result exactly matches the data (the event $\hat{D} = D$) is negligible for all but trivial applications of ABC, and insisting on an exact match would in practice lead to the rejection of almost all sampled parameter points. The result of the ABC rejection algorithm is a sample of parameter values approximately distributed according to the desired posterior distribution, obtained, crucially, without explicit evaluation of the likelihood function.
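To make the scheme concrete, the following is a minimal sketch of the ABC rejection algorithm in Python. The model, prior, tolerance, and distance are illustrative assumptions (a normal model with unknown mean, a uniform prior, and a Euclidean distance between sorted samples), not prescribed by the algorithm itself; the names defined here are reused in later sketches.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Illustrative toy model (an assumption for this sketch, not part of the
# article): observed data from a normal distribution with unknown mean.
theta_true = 2.0
D = rng.normal(theta_true, 1.0, size=50)

def simulate(theta, n=50):
    """Simulate a data set D_hat under the model specified by theta."""
    return rng.normal(theta, 1.0, size=n)

def rho(D_hat, D_obs):
    """Distance between simulated and observed data: Euclidean distance
    of the sorted samples (the order of i.i.d. draws carries no information)."""
    return np.linalg.norm(np.sort(D_hat) - np.sort(D_obs))

def abc_rejection(D_obs, n_accept=500, eps=4.0):
    """Basic ABC rejection: draw theta from the prior, simulate a data
    set under it, and keep theta only if rho(D_hat, D_obs) <= eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.uniform(-10.0, 10.0)        # draw from a uniform prior
        if rho(simulate(theta), D_obs) <= eps:  # compare simulation and data
            accepted.append(theta)
    return np.array(accepted)

posterior_sample = abc_rejection(D)
print("approximate posterior mean:", posterior_sample.mean())
```

Shrinking eps improves the approximation to the posterior but lowers the acceptance rate, which is exactly the trade-off that motivates the refinements below.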

Summary statistics

The probability of generating a simulated data set $\hat{D}$ within a small distance of $D$ typically decreases exponentially as the dimensionality of the data increases. This leads to a significant reduction in the computational efficiency of the basic ABC rejection algorithm above. A common approach to alleviating this problem is to replace $D$ with a set of lower-dimensional summary statistics $S(D)$, which are selected to capture the relevant information in $D$. The acceptance criterion in the ABC rejection algorithm is replaced by

$$\rho\bigl(S(\hat{D}), S(D)\bigr) \le \epsilon.$$

If the summary statistics $S$ are sufficient with respect to the model parameters $\theta$, the efficiency gain achieved in this way introduces no additional error. By definition, sufficiency means that all of the information in $D$ about $\theta$ is captured by $S(D)$.

Typically, outside the exponential family of distributions, it is impossible to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative but potentially insufficient summary statistics are often used in applications where inference is performed with ABC methods. There are thus two sources of approximation error when ABC is used to sample from the posterior distribution: the acceptance tolerance $\epsilon$ and the summary statistics $S$.
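Continuing the toy sketch above (same illustrative assumptions), replacing the raw-data distance with a distance between low-dimensional summaries only requires swapping the distance function; here the sample mean and standard deviation serve as summaries:

```python
def S(data):
    """Summary statistics: sample mean and standard deviation. For this
    toy normal model with unknown mean, the sample mean is sufficient,
    so the replacement introduces no additional error."""
    return np.array([data.mean(), data.std()])

def rho_summary(D_hat, D_obs):
    """Distance between summary statistics rather than raw data sets."""
    return np.linalg.norm(S(D_hat) - S(D_obs))
```

Passing rho_summary in place of rho reduces the comparison from 50 dimensions to 2, so a far smaller tolerance becomes affordable for the same computational budget; in realistic models, where the chosen summaries are usually not sufficient, this gain comes at the cost of the second approximation error described above.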

ABC-MCMC and ABC-SMC algorithms

A disadvantage of the ABC rejection algorithm is that parameter values are drawn directly from the a priori distribution. If the data are informative, for example, the posterior distribution will be much more concentrated than the prior, so that most simulated data sets will not agree well with the observed data and the acceptance rate will be low. It therefore makes sense to draw the parameters from a distribution that is (adaptively) closer to the posterior distribution. There are essentially two ways to do this: with a Markov chain Monte Carlo (MCMC) algorithm or with a sequential Monte Carlo (SMC) algorithm. Both have their respective strengths; SMC has become more established in the ABC context, as the scheme can easily be parallelized and permits an adaptive reduction of the acceptance tolerance $\epsilon$. An overview of these algorithms can be found in Beaumont (2010).
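The following sketch illustrates the MCMC route, reusing rng, simulate, and rho from the rejection example above; the proposal scale and all other settings are illustrative assumptions, not part of the general method.

```python
def abc_mcmc(D_obs, n_iter=10_000, eps=4.0, step=0.5):
    """Likelihood-free random-walk Metropolis sketch: a proposal is
    accepted only if data simulated under it lie within tolerance eps
    of the observed data. With a uniform prior and a symmetric Gaussian
    proposal, the usual Metropolis-Hastings ratio equals 1 whenever the
    proposal stays inside the prior's support, so no likelihood (and no
    likelihood ratio) is ever evaluated."""
    theta = rng.uniform(-10.0, 10.0)            # start from a prior draw
    while rho(simulate(theta), D_obs) > eps:    # find a valid initial state
        theta = rng.uniform(-10.0, 10.0)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + rng.normal(0.0, step)    # symmetric random-walk proposal
        if -10.0 <= prop <= 10.0 and rho(simulate(prop), D_obs) <= eps:
            theta = prop                        # accept; otherwise keep theta
        chain[i] = theta
    return chain
```

An ABC-SMC sampler would instead propagate a whole population of parameter particles through a decreasing sequence of tolerances $\epsilon_1 > \epsilon_2 > \dots$, reweighting the particles at each step; because the particles are simulated independently within each step, this is the scheme that parallelizes easily.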

Software

Various software implementations of ABC are available.

References

  1. Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C (2013). "Approximate Bayesian Computation". PLOS Computational Biology. 9 (1): e1002803. doi:10.1371/journal.pcbi.1002803. PMC 3547661 (free full text).
  2. Rubin DB (1984). "Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician". The Annals of Statistics. 12: 1151–1172. doi:10.1214/aos/1176346785.
  3. See figure 5 in Stigler SM (2010). "Darwin, Galton and the Statistical Enlightenment". Journal of the Royal Statistical Society, Series A (Statistics in Society). 173 (3): 469–482. doi:10.1111/j.1467-985X.2010.00643.x.
  4. Diggle PJ, Gratton RJ (1984). "Monte Carlo Methods of Inference for Implicit Statistical Models". Journal of the Royal Statistical Society, Series B. 46: 193–227. JSTOR 2345504.
  5. Bartlett MS (1963). "The spectral analysis of point processes". Journal of the Royal Statistical Society, Series B. 25: 264–296. JSTOR 2984295.
  6. Hoel DG, Mitchell TJ (1971). "The simulation, fitting and testing of a stochastic cellular proliferation model". Biometrics. 27: 191–199. JSTOR 2528937.
  7. Tavaré S, Balding DJ, Griffiths RC, Donnelly P (1997). "Inferring Coalescence Times from DNA Sequence Data". Genetics. 145 (2): 505–518. PMC 1207814 (free full text).
  8. Pritchard JK, Seielstad MT, Perez-Lezaun A, et al. (1999). "Population Growth of Human Y Chromosomes: A Study of Y Chromosome Microsatellites". Molecular Biology and Evolution. 16: 1791–1798. doi:10.1093/oxfordjournals.molbev.a026091.
  9. Beaumont MA, Zhang W, Balding DJ (2002). "Approximate Bayesian Computation in Population Genetics". Genetics. 162: 2025–2035. PMC 1462356 (free full text).
  10. Busetto AG, Buhmann J (2009). "Stable Bayesian Parameter Estimation for Biological Dynamical Systems". IEEE Computer Society Press, pp. 148–157. doi:10.1109/CSE.2009.134.
  11. Beaumont MA (2010). "Approximate Bayesian Computation in Evolution and Ecology". Annual Review of Ecology, Evolution, and Systematics. 41: 379–406. doi:10.1146/annurev-ecolsys-102209-144621.
  12. Bertorelle G, Benazzo A, Mona S (2010). "ABC as a flexible framework to estimate demography over space and time: some cons, many pros". Molecular Ecology. 19: 2609–2625. doi:10.1111/j.1365-294x.2010.04690.x.
  13. Csilléry K, Blum MGB, Gaggiotti OE, François O (2010). "Approximate Bayesian Computation (ABC) in practice". Trends in Ecology & Evolution. 25: 410–418. doi:10.1016/j.tree.2010.04.001.