Title: | Bayesian optimized likelihood-free inference on genetic data |
Author: | Sipola, Aleksi |
Contributor: | University of Helsinki, Faculty of Science |
Publisher: | Helsingin yliopisto |
Date: | 2020 |
Language: | eng |
URI: |
http://urn.fi/URN:NBN:fi:hulib-202012094759
http://hdl.handle.net/10138/322534 |
Thesis level: | master's thesis |
Discipline: | Applied mathematics |
Abstract: | Most standard statistical inference methods rely on evaluating so-called likelihood functions. In some cases, however, the phenomenon of interest is too complex or the relevant data are unsuitable, and as a result the likelihood function cannot be evaluated. Such a situation blocks both frequentist methods based on, e.g., maximum likelihood estimation and Bayesian inference based on estimating posterior probabilities. Often, though, the phenomenon of interest can still be modeled with a generative model that describes the supposed underlying processes and variables of interest. In such scenarios, likelihood-free inference, such as Approximate Bayesian Computation (ABC), can provide a way around the roadblock. Creating a simulator that implements such a generative model provides a way to explore the parameter space and approximate the likelihood function based on the similarity between real-world data and data simulated with various parameter values. ABC provides a well-defined and well-studied framework for carrying out such simulation-based inference with a Bayesian approach. ABC has been found useful in, for example, ecology, finance and astronomy, in situations where the likelihood function is not practically computable but models and simulators for generating simulated data are available. One such problem is the estimation of recombination rates of bacterial populations from genetic data, which is often unsuitable for typical statistical methods because of infeasibly massive modeling and computation requirements. Overcoming these hindrances should provide valuable insight into the evolution of bacteria and possibly aid in tackling significant challenges such as antimicrobial resistance. Still, ABC inference is not without its limitations. Considerable effort is often required in defining the distance functions, summary statistics and similarity threshold to make the comparison mechanism successful.
High computational costs can also be a hindrance in ABC inference: as increasingly complex phenomena, and thus models, are studied, the computations needed for sufficient exploration of the parameter space with the simulation-comparison cycles can become too time- and resource-consuming. Efforts have therefore been made to improve the efficiency of ABC inference. One such improvement is the Bayesian Optimization for Likelihood-Free Inference (BOLFI) algorithm, which provides an efficient method for optimizing the exploration of the parameter space, reducing the number of simulation-comparison cycles needed by up to several orders of magnitude. This thesis aims to describe some of the theoretical and applied aspects of complete likelihood-free inference pipelines using both Rejection ABC and BOLFI methods. The thesis also presents a use case in which the neutral evolution recombination rate in a Streptococcus pneumoniae population is inferred from a well-studied real-world genome data set. This inference task is used to provide context and concrete examples for the theoretical aspects, and demonstrations of numerous applied aspects. The implementations, experiments and acquired results are also discussed in some detail. |
Subject: |
Likelihood-free inference
Bayesian optimization
Simulation
Computational biology
Genomics |