Browsing by Subject "APPROXIMATE BAYESIAN COMPUTATION"

Now showing items 1-9 of 9
  • Sipola, Aleksi; Marttinen, Pekka; Corander, Jukka (2018)
    The advent of genomic data from densely sampled bacterial populations has created a need for flexible simulators by which models and hypotheses can be efficiently investigated in the light of empirical observations. Bacmeta provides fast stochastic simulation of neutral evolution within a large collection of interconnected bacterial populations with completely adjustable connectivity network. Stochastic events of mutations, recombinations, insertions/deletions, migrations and micro-epidemics can be simulated in discrete non-overlapping generations with a Wright-Fisher model that operates on explicit sequence data of any desired genome length. Each model component, including locus, bacterial strain, population and ultimately the whole metapopulation, is efficiently simulated using C++ objects and detailed metadata from each level can be acquired. The software can be executed in a cluster environment using simple textual input files, enabling, e.g. large-scale simulations and likelihood-free inference. Availability and implementation: Bacmeta is implemented with C++ for Linux, Mac and Windows. It is available at https://bitbucket.org/aleksisipola/bacmeta under the BSD 3-clause license. Contact: aleksi.sipola@helsinki.fi or jukka.corander@medisin.uio.no Supplementary information: Supplementary data are available at Bioinformatics online.
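The Wright-Fisher scheme described in this abstract (discrete, non-overlapping generations acting on explicit sequences) can be illustrated with a minimal sketch. This is not Bacmeta's C++ implementation: it models a single population with point mutations only, and the function names, parameters and values below are hypothetical choices made for illustration.

```python
import random

ALPHABET = "ACGT"


def mutate(seq, mu, rng):
    """Return a copy of seq in which each site mutates with probability mu."""
    out = []
    for base in seq:
        if rng.random() < mu:
            # Substitute with one of the three other nucleotides.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def wright_fisher(pop_size, genome_len, generations, mu, seed=0):
    """Neutral Wright-Fisher evolution of explicit sequences.

    Assumptions (not taken from Bacmeta): one population, constant size,
    point mutations only, no recombination, indels or migration.
    """
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(ALPHABET) for _ in range(genome_len))
    population = [ancestor] * pop_size
    for _ in range(generations):
        # Every offspring picks its parent uniformly at random (neutrality);
        # the whole parental generation is replaced at once.
        population = [
            mutate(rng.choice(population), mu, rng) for _ in range(pop_size)
        ]
    return population


pop = wright_fisher(pop_size=50, genome_len=30, generations=20, mu=0.01, seed=1)
clonal = wright_fisher(pop_size=20, genome_len=10, generations=5, mu=0.0)
print(len(set(pop)), "distinct sequences after 20 generations")
```

With `mu=0.0` every individual stays identical to the ancestor, which is a quick sanity check on the resampling step.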
  • Gaia Collaboration; Luri, X.; Muinonen, K.; Fedorets, G.; Granvik, M.; Penttila, A.; Siltala, L. (2021)
    Context. This work is part of the Gaia Data Processing and Analysis Consortium papers published with the Gaia Early Data Release 3 (EDR3). It is one of the demonstration papers aiming to highlight the improvements and quality of the newly published data by applying them to a scientific case. Aims. We use the Gaia EDR3 data to study the structure and kinematics of the Magellanic Clouds. The large distance to the Clouds is a challenge for the Gaia astrometry. The Clouds lie at the very limits of the usability of the Gaia data, which makes the Clouds an excellent case study for evaluating the quality and properties of the Gaia data. Methods. Our work is based on two samples selected to provide a representation as clean as possible of the stars of the Large Magellanic Cloud (LMC) and the Small Magellanic Cloud (SMC). The selection used criteria based on position, parallax, and proper motions to remove foreground contamination from the Milky Way, and allowed the separation of the stars of both Clouds. From these two samples we defined a series of subsamples based on cuts in the colour-magnitude diagram; these subsamples were used to select stars in a common evolutionary phase and can also be used as approximate proxies of a selection by age. Results. We compared the Gaia Data Release 2 and Gaia EDR3 performances in the study of the Magellanic Clouds and show the clear improvements in precision and accuracy in the new release. We also show that the systematics still present in the data make the determination of the 3D geometry of the LMC a difficult endeavour; this is at the very limit of the usefulness of the Gaia EDR3 astrometry, but it may become feasible with the use of additional external data. We derive radial and tangential velocity maps and global profiles for the LMC for the several subsamples we defined. To our knowledge, this is the first time that the two planar components of the ordered and random motions are derived for multiple stellar evolutionary phases in a galactic disc outside the Milky Way, showing the differences between younger and older phases. We also analyse the spatial structure and motions in the central region, the bar, and the disc, providing new insights into features and kinematics. Finally, we show that the Gaia EDR3 data allow us to clearly resolve the Magellanic Bridge, and we trace the density and velocity flow of the stars from the SMC towards the LMC not only globally, but also separately for young and evolved populations. This allows us to confirm an evolved population in the Bridge that is slightly shifted from the younger population. Additionally, we were able to study the outskirts of both Magellanic Clouds, in which we detected some well-known features and indications of new ones.
  • Fountain, Toby; Duvaux, Ludovic; Horsburgh, Gavin; Reinhardt, Klaus; Butlin, Roger K. (2014)
  • Xu, Yingying; Puranen, Santeri; Corander, Jukka; Kabashima, Yoshiyuki (2018)
    We propose an efficient procedure for significance determination in high-dimensional dependence learning based on surrogate data testing, termed inverse finite-size scaling (IFSS). The IFSS method is based on our discovery of a universal scaling property of random matrices which enables inference about signal behavior from much smaller scale surrogate data than the dimensionality of the original data. As a motivating example, we demonstrate the procedure for ultra-high-dimensional Potts models with on the order of 10^10 parameters. IFSS reduces the computational effort of the data-testing procedure by several orders of magnitude, making it very efficient for practical purposes. This approach thus holds considerable potential for generalization to other types of complex models.
  • Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka (2018)
    Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. Transforming the original problem into a problem of classifying the data into simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods becomes thereby available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers.
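The idea of using classification accuracy as a discrepancy can be sketched on a toy problem. The example below is not the authors' method or code: the simulator is a hypothetical unit-variance Gaussian with unknown mean `theta`, and the classifier is a simple nearest-mean rule evaluated on held-out halves of the data. Accuracy near 0.5 means the simulated and observed data are hard to tell apart, so the corresponding `theta` is plausible.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical simulator: n draws from a Gaussian with mean theta, sd 1."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def classification_discrepancy(observed, simulated):
    """Held-out accuracy of a nearest-mean classifier (observed vs simulated).

    The first half of each sample is used to estimate the class means and
    the second half to measure accuracy. Values close to 0.5 indicate that
    the two samples are nearly indistinguishable.
    """
    half_o, half_s = len(observed) // 2, len(simulated) // 2
    mean_o = statistics.fmean(observed[:half_o])
    mean_s = statistics.fmean(simulated[:half_s])
    test_o, test_s = observed[half_o:], simulated[half_s:]
    correct = sum(abs(x - mean_o) <= abs(x - mean_s) for x in test_o)
    correct += sum(abs(x - mean_s) < abs(x - mean_o) for x in test_s)
    return correct / (len(test_o) + len(test_s))


rng = random.Random(0)
observed = simulate(1.0, 400, rng)  # "real" data, generated with theta = 1.0

# Discrepancy for a parameter equal to the truth and for one far from it.
near = classification_discrepancy(observed, simulate(1.0, 400, rng))
far = classification_discrepancy(observed, simulate(4.0, 400, rng))
print(f"accuracy at theta=1.0: {near:.2f}, at theta=4.0: {far:.2f}")
```

In a full inference loop this discrepancy would be minimised over `theta` for point estimation, or thresholded inside an approximate Bayesian computation scheme; any stronger classifier could replace the nearest-mean rule.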
  • Simola, U.; Pelssers, B.; Barge, D.; Conrad, J.; Corander, J. (2019)
    Reconstructing the position of an interaction for any dual-phase time projection chamber (TPC) with the best precision is key to directly detecting Dark Matter. Using the likelihood-free framework, a new algorithm to reconstruct the 2-D (x; y) position and the size of the charge signal (e) of an interaction is presented. The algorithm uses the secondary scintillation light distribution (S2) obtained by simulating events using a waveform generator. To deal with the computational effort required by the likelihood-free approach, we employ the Bayesian Optimization for Likelihood-Free Inference (BOLFI) algorithm. Together with BOLFI, prior distributions for the parameters of interest (x; y; e) and highly informative discrepancy measures to perform the analyses are introduced. We evaluate the quality of the proposed algorithm by a comparison against the currently existing alternative methods using a large-scale simulation study. BOLFI provides a natural probabilistic uncertainty measure for the reconstruction and it improved the accuracy of the reconstruction over the next best algorithm by up to 15% when focusing on events at large radii (R > 30 cm, the outer 37% of the detector). In addition, BOLFI provides the smallest uncertainties among all the tested methods.
  • Heidel-Fischer, Hanna M.; Vogel, Heiko; Heckel, David G.; Wheat, Christopher W. (2010)
  • Ramiadantsoa, Tanjona; Siren, Jukka; Hanski, Ilkka (2017)
    Phylogeny can provide information about the processes that have shaped extant diversity. Here, we complement existing comparative phylogenetic methods by developing a model that couples diversity-dependent diversification rate and range dynamics. Unlike many studies, we used Approximate Bayesian Computation to fit the model to the data. We validated the inference by estimating known parameter values from simulated data, and found that within-region speciation and extinction rates cannot be simultaneously estimated, most likely due to correlations among parameter values. Since the model can estimate a diversification rate, we applied the model to a monophyletic lineage of 74 species of dung beetles (Canthonini: Nanos and Apotolamprus) endemic to Madagascar. The estimated diversification rate is clearly higher in northern than in eastern or western Madagascar. The current species richness is highest in the north, where complex topography and a mixture of biomes likely favour ecological diversification. The approach we have developed here is a step towards examining weaknesses and strengths of phylogenetic comparative methods in an explicit spatial context. Further development and testing of the model is needed before its routine application to empirical data.
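Fitting a model by Approximate Bayesian Computation can be sketched with plain rejection sampling. The example below deliberately swaps in a much simpler model than the one in the abstract: a pure-birth (Yule) process with a single speciation rate, no extinction, no diversity dependence and no range dynamics. The clade age of 1.0 (arbitrary time units), the uniform prior and the tolerance are hypothetical; only the observed richness of 74 species comes from the abstract.

```python
import random


def simulate_yule(rate, duration, rng, cap=100_000):
    """Number of lineages at time `duration` in a pure-birth process.

    Starts from one lineage; each lineage splits at constant `rate`.
    `cap` is a safety bound on the number of lineages.
    """
    n, t = 1, 0.0
    while n < cap:
        # Waiting time to the next split among n lineages.
        t += rng.expovariate(n * rate)
        if t > duration:
            break
        n += 1
    return n


def abc_rejection(observed_count, n_draws, tolerance, rng):
    """Keep prior draws whose simulated richness is close to the observed one."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.01, 6.0)  # hypothetical uniform prior
        simulated = simulate_yule(rate, 1.0, rng)  # assumed clade age: 1.0
        if abs(simulated - observed_count) <= tolerance:
            accepted.append(rate)
    return accepted


rng = random.Random(42)
posterior = abc_rejection(observed_count=74, n_draws=3000, tolerance=5, rng=rng)
posterior_mean = sum(posterior) / len(posterior) if posterior else float("nan")
print(f"{len(posterior)} accepted draws, posterior mean rate {posterior_mean:.2f}")
```

The accepted rates form an approximate posterior sample. A single summary statistic such as species richness is only weakly informative, so the posterior here is wide; a realistic analysis would use richer summaries of the phylogeny and the spatial data.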
  • Laine, Anna-Liisa; Barres, Benoit; Numminen, Elina; Siren, Jukka P. (2019)
    Many pathogens possess the capacity for sex through outcrossing, despite also being able to reproduce asexually and/or via selfing. Given that sex is assumed to come at a cost, these mixed reproductive strategies typical of pathogens have remained puzzling. While the ecological and evolutionary benefits of outcrossing are theoretically well-supported, support for such benefits in pathogen populations is still scarce. Here, we analyze the epidemiology and genetic structure of natural populations of an obligate fungal pathogen, Podosphaera plantaginis. We find that the opportunities for outcrossing vary spatially. Populations supporting high levels of coinfection, a prerequisite of sex, result in hotspots of novel genetic diversity. Pathogen populations supporting coinfection also have a higher probability of surviving winter. Jointly our results show that outcrossing has direct epidemiological consequences as well as a major impact on pathogen population genetic diversity, thereby providing evidence of ecological and evolutionary benefits of outcrossing in pathogens.