Browsing by Subject "Bayesian inference"

Now showing items 1-14 of 14
  • Solonen, Antti; Järvinen, Heikki (2013)
  • Trotsiuk, Volodymyr; Hartig, Florian; Cailleret, Maxime; Babst, Flurin; Forrester, David I.; Baltensweiler, Andri; Buchmann, Nina; Bugmann, Harald; Gessler, Arthur; Gharun, Mana; Minunno, Francesco; Rigling, Andreas; Rohner, Brigitte; Stillhard, Jonas; Thürig, Esther; Waldner, Peter; Ferretti, Marco; Eugster, Werner; Schaub, Marcus (2020)
    The response of forest productivity to climate extremes strongly depends on ambient environmental and site conditions. To better understand these relationships at a regional scale, we used nearly 800 observation years from 271 permanent long-term forest monitoring plots across Switzerland, obtained between 1980 and 2017. We assimilated these data into the 3-PG forest ecosystem model using Bayesian inference, reducing the bias of model predictions from 14% to 5% for forest stem carbon stocks and from 45% to 9% for stem carbon stock changes. We then estimated the productivity of forests dominated by Picea abies and Fagus sylvatica for the period 1960-2018, and tested for productivity shifts in response to climate along an elevational gradient and in extreme years. Simulated net primary productivity (NPP) decreased with elevation (2.86 ± 0.006 Mg C ha⁻¹ year⁻¹ km⁻¹ for P. abies and 0.93 ± 0.010 Mg C ha⁻¹ year⁻¹ km⁻¹ for F. sylvatica). During warm-dry extremes, simulated NPP for both species increased at higher and decreased at lower elevations, with reductions in NPP of more than 25% for up to 21% of the potential species distribution range in Switzerland. Reduced plant water availability had a stronger effect on NPP than temperature during warm-dry extremes. Importantly, cold-dry extremes had negative impacts on regional forest NPP comparable to warm-dry extremes. Overall, our calibrated model suggests that the response of forest productivity to climate extremes is more complex than a simple shift toward higher elevations. Such robust estimates of NPP are key to improving our understanding of forest ecosystem carbon dynamics under climate extremes. (An illustrative sketch of this kind of Bayesian model calibration follows after this listing.)
  • Christopher, Solomon (2020)
    Understanding how transmissible an infectious pathogen is and what its main routes of transmission are is key to managing and controlling its spread. Some infections that begin with zoonotic or common-source transmission may additionally exhibit potential for direct person-to-person transmission. Methods to discern multiple transmission routes from observed outbreak datasets are thus essential. Features such as partial observation of the outbreak can make such inferences more challenging. This thesis presents a stochastic modelling framework to infer person-to-person transmission using data observed from a completed outbreak in a population of households. The model is specified hierarchically for the processes of transmission and observation. The transmission model specifies the process of acquiring infection from either the environment or infectious household members. This model is governed by two parameters, one for each source of transmission. While in continuous time they are characterised by transmission hazards, in discrete time they are characterised by escape probabilities. The observation model specifies the process of observing the outbreak based on symptom times and serological test results. The observation design is extended to address an ongoing outbreak with censored observation as well as case-ascertained sampling, where households are sampled based on index cases. The model and observation settings are motivated by typical data from hepatitis A virus (HAV) outbreaks. Partial observation of the infectious process is due to unobserved infection times, the presence of asymptomatic infections, and not-fully-sensitive serological test results. Individual-level latent variables are introduced in order to account for partial observation of the process. A data-augmented Markov chain Monte Carlo (DA-MCMC) algorithm is developed to estimate the transmission parameters by simultaneously sampling the latent variables. A model comparison using the deviance information criterion (DIC) is formulated to test for the presence of direct transmission, which is the primary aim of this thesis. In calculating the DIC, the required computations utilise the DA-MCMC algorithm developed for the estimation procedures. The inference methods are tested using simulated outbreak data based on a set of scenarios defined by varying the following: presence of direct transmission, sensitivity and specificity for observation of symptoms, values of the transmission parameters, and household size distribution. Simulations are also used to understand patterns in the distribution of household final sizes as the transmission parameters vary. In the results from simulated outbreaks, DIC6 indicates the correct model in almost all simulation scenarios and is robust across the presented scenarios. The posterior estimates of the transmission parameters from DA-MCMC are also fairly consistent with the values used in the simulations. The procedures presented in this thesis are for SEIR epidemic models in which the latent period is shorter than the incubation period and asymptomatic infections are present. These procedures can be directly adapted to infections with a similar or simpler natural history. The modelling framework is flexible and can be further extended to include components for vaccination and pathogen genetic sequence data. (A minimal sketch of the discrete-time escape-probability formulation follows after this listing.)
  • Liu, Jia; Vanhatalo, Jarno (2020)
    In geostatistics, the spatiotemporal design for data collection is central for accurate prediction and parameter inference. An important class of geostatistical models is the log-Gaussian Cox process (LGCP), but there are no formal analyses of spatial or spatiotemporal survey designs for it. In this work, we study traditional balanced and uniform random designs in situations where the analyst has prior information on the intensity function of the LGCP, and show that the traditional balanced and random designs are not efficient in such situations. We also propose a new design sampling method, a rejection sampling design, which extends the traditional balanced and random designs by directing survey sites to locations that are a priori expected to provide the most information. We compare our proposal to the traditional balanced and uniform random designs using the expected average predictive variance (APV) loss and the expected Kullback-Leibler (KL) divergence between the prior and the posterior for the LGCP intensity function, in simulation experiments and in a real-world case study. The APV informs about the expected accuracy of a survey design in point-wise predictions, and the KL divergence measures the expected gain in information about the joint distribution of the intensity field. The case study concerns planning a survey design for analyzing larval areas of two commercially important fish stocks in the Finnish coastal region. Our experiments show that the designs generated by the proposed rejection sampling method clearly outperform the traditional balanced and uniform random survey designs. Moreover, the method is easily applicable to other models. (A toy sketch of the rejection-sampling design idea follows after this listing.)
  • Gutmann, Michael U.; Corander, Jukka (2016)
    Our paper deals with inferring simulator-based statistical models given some observed data. A simulator-based model is a parametrized mechanism which specifies how data are generated; it is thus also referred to as a generative model. We assume that only a finite number of parameters are of interest and allow the generative process to be very general; it may be a noisy nonlinear dynamical system with an unrestricted number of hidden variables. This weak assumption is useful for devising realistic models, but it renders statistical inference very difficult. The main challenge is the intractability of the likelihood function. Several likelihood-free inference methods have been proposed which share the basic idea of identifying the parameters by finding values for which the discrepancy between simulated and observed data is small. A major obstacle to using these methods is their computational cost. The cost is largely due to the need to repeatedly simulate data sets and the lack of knowledge about how the parameters affect the discrepancy. We propose a strategy which combines probabilistic modeling of the discrepancy with optimization to facilitate likelihood-free inference. The strategy is implemented using Bayesian optimization and is shown to accelerate the inference through a reduction in the number of required simulations by several orders of magnitude. (A simplified sketch of surrogate-modelling the discrepancy follows after this listing.)
  • Länsman, Olá-Mihkku (Helsingin yliopisto, 2020)
    Demand forecasts are required for solving multiple optimization challenges in the retail industry, and they can be used to reduce spoilage and excess inventory. Classical forecasting methods provide point forecasts and do not quantify the uncertainty of the process. We evaluate multiple predictive posterior approximation methods with a Bayesian generalized linear model that captures weekly and yearly seasonality, changing trends, and promotional effects. The model uses the negative binomial as the sampling distribution because of its ability to scale the variance as a quadratic function of the mean. The forecasting methods provide highest posterior density intervals at credible levels ranging from 50% to 95%. They are evaluated with a proper scoring function and hit rates. We also measure computation time, an important consideration given the scalability requirements of the retail industry. The forecasting methods are Laplace approximation, the Markov chain Monte Carlo (MCMC) method, Automatic Differentiation Variational Inference, and maximum a posteriori inference. Our results show that MCMC is too slow for practical use, while the remaining approximation methods can be considered for practical use. We found that Laplace approximation and Automatic Differentiation Variational Inference produced results closest to the method with the best analytical guarantees, MCMC, suggesting that they were better approximations of the model. The model faced difficulties with highly promotional, slow-selling, and intermittent data. The best fit was obtained for high-selling SKUs, for which the interval hit rates matched the nominal credible levels. (A short sketch of the mean-dispersion negative binomial parameterization follows after this listing.)
  • Lintusaari, Jarno; Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka (2017)
    Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. (A minimal ABC rejection sampler is sketched after this listing.)
  • Kulha, Niko; Pasanen, Leena; Aakala, Tuomas (2018)
    Time series of repeat aerial photographs currently span decades in many regions. However, the lack of calibration data limits their use in forest change analysis. We propose an approach in which we combine repeat aerial photography, tree-ring reconstructions, and Bayesian inference to study changes in forests. Using stereopairs of aerial photographs from five boreal forest landscapes, we visually interpreted canopy cover in contiguous 0.1-ha cells at three time points during 1959-2011. We used tree-ring measurements to produce calibration data for the interpretation, and to quantify the bias and error associated with the interpretation. Then, we discerned credible canopy cover changes from the interpretation error noise using Bayesian inference. We underestimated canopy cover using the historical low-quality photographs, and overestimated it using the recent high-quality photographs. Further, due to differences in tree species composition and canopy cover in the cells, the interpretation bias varied between the landscapes. In addition, the random interpretation error varied between and within the landscapes. Because of the varying bias and error, the magnitude of credibly detectable canopy cover change in the 0.1-ha cells depended on the studied time interval and landscape, ranging from -10 to -18 percentage points (decrease) and from +10 to +19 percentage points (increase). Hence, changes occurring at stand scales were detectable, but smaller-scale changes could not be separated from the error noise. In addition to abrupt changes, slow continuous canopy cover changes could also be detected with the proposed approach. Given the wide availability of historical aerial photographs, the proposed approach can be applied to forest change analysis in biomes where tree rings form, while accounting for the bias and error in aerial photo interpretation.
  • Martino, L.; Elvira, V.; Luengo, D.; Corander, J. (2017)
    Monte Carlo methods represent the de facto standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler proposal probability densities to draw candidate samples. The performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a layered (i.e., hierarchical) procedure to generate samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal density is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in complexity. Furthermore, we provide a general unified importance sampling (IS) framework, where multiple proposal densities are employed and several IS schemes are introduced by applying the so-called deterministic mixture approach. Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov chain Monte Carlo (MCMC) chains. The resulting algorithms efficiently combine the benefits of both IS and MCMC methods. (A small sketch of the deterministic mixture weighting follows after this listing.)
  • Pensar, Johan; Nyman, Henrik; Niiranen, Juha; Corander, Jukka (2017)
    Markov networks are a popular tool for modeling multivariate distributions over a set of discrete variables. The core of the Markov network representation is an undirected graph which elegantly captures the dependence structure over the variables. Traditionally, Bayesian learning of the graph structure from data has been done under the assumption of chordality, since non-chordal graphs are difficult to evaluate with likelihood-based scores. Recently, there has been a surge of interest in regularized pseudo-likelihood methods, as such approaches can avoid the assumption of chordality. Many of the currently available methods necessitate the use of a tuning parameter to adapt the level of regularization for a particular dataset. Here we introduce the marginal pseudo-likelihood, which has built-in regularization through marginalization over the graph-specific nuisance parameters. We prove consistency of the resulting graph estimator via comparison with the pseudo-Bayesian information criterion. To identify high-scoring graph structures in a high-dimensional setting, we design a two-step algorithm that exploits the decomposable structure of the score. Using synthetic and existing benchmark networks, the marginal pseudo-likelihood method is shown to perform favorably against recent popular structure learning methods. (A sketch of a plain pseudo-likelihood computation, a building block of this idea, follows after this listing.)
  • Aakala, Tuomas; Pasanen, Leena; Helama, Samuli; Vakkari, Ville; Drobyshev, Igor; Seppä, Heikki; Kuuluvainen, Timo; Stivrins, Normunds; Wallenius, Tuomo; Vasander, Harri; Holmström, Lasse (2018)
    Forest fires are a key disturbance in boreal forests, and the characteristics of fire regimes are among the most important factors explaining the variation in forest structure and species composition. The occurrence of fire is connected with climate, but earlier, mostly local-scale studies in the northern European boreal forests have provided little insight into the fire-climate relationship before the modern fire suppression period. Here, we compiled annually resolved fire history, temperature, and precipitation reconstructions from eastern Fennoscandia from the mid-16th century to the end of the 19th century, a period of strong human influence on fires. We used synchrony of fires over the network of 25 fire history reconstructions as a measure of climatic forcing on fires. We examined the relationship between fire occurrence and climate (summer temperature, precipitation, and a drought index summarizing the influence of variability in temperature and precipitation) across temporal scales, using a scale space multiresolution correlation approach and Bayesian inference that accounts for the annually varying uncertainties in climate reconstructions. At the annual scale, fires were synchronized during summers with low precipitation, and most clearly during drought summers. A scale-derivative analysis revealed that fire synchrony and climate varied at similar, roughly decadal scales. Climatic variables and fire synchrony showed varying correlation strength and credibility, depending on the climate variable and the time period. In particular, precipitation emerged as a credible determinant of fire synchrony also at these time scales, despite the large uncertainties in the precipitation reconstruction. The findings explain why fire occurrence can be high during cold periods (such as from the mid-17th to the early 18th century), and stress the notion that future fire frequency will likely depend more on changes in precipitation than on temperature alone. We showed, for the first time, the importance of climate as a decadal-scale driver of forest fires in the European boreal forests, discernible even during a period of strong human influence on fire occurrence. The fire regime responded not only to anomalously dry summers but also to decadal-scale climate changes, demonstrating how climatic variability has shaped the disturbance regimes in the northern European boreal forests over various time scales.
  • Jälkö, Joonas (Helsingfors universitet, 2017)
    This thesis focuses on privacy-preserving statistical inference. We use a probabilistic notion of privacy called differential privacy (DP), which ensures that replacing one individual in the dataset with another does not affect the results drastically. There are different versions of differential privacy. This thesis considers ε-differential privacy, also known as pure differential privacy, as well as a relaxation known as (ε, δ)-differential privacy. We state several important definitions and theorems of DP, and give proofs for most of the theorems. Our goal is to build a general framework for privacy-preserving posterior inference. To achieve this we use an approximate approach to posterior inference called variational Bayesian (VB) methods. We present the basic concepts of variational inference in some detail and show examples of how to apply it. After giving the prerequisites on both DP and VB, we state our main result, the differentially private variational inference (DPVI) method. We use the recently proposed doubly stochastic variational inference (DSVI) combined with the Gaussian mechanism to build a privacy-preserving method for posterior inference. We give the algorithm definition and explain its parameters. The DPVI method is compared against the state-of-the-art method for DP posterior inference, differentially private stochastic gradient Langevin dynamics (DP-SGLD). We compare the performance on two different models, the logistic regression model and the Gaussian mixture model. The DPVI method outperforms DP-SGLD in both tasks. (A sketch of the Gaussian mechanism step follows after this listing.)
  • Vanhatalo, Jarno; Huuhtanen, Juri; Bergström, Martin; Helle, Inari; Mäkinen, Jussi Antti-Eerikki; Kujala, Pentti (2021)
    Ships operating in ice-infested Arctic waters are exposed to a range of ship-ice interaction related hazards. One of the most dangerous of these is the possibility of a ship becoming beset in ice, meaning that the ship is surrounded by ice that prevents it from maneuvering under its own power. Such a besetting event may not only result in severe operational disruption, but also expose a ship to severe ice loading or cause it to drift towards shallow water. This may cause significant structural damage to a ship and potentially jeopardize its safety. To support safe and sustainable Arctic shipping operations, this article presents a probabilistic approach to assessing the probability of a ship becoming beset in ice. To this end, the proposed approach combines different types of data, including Automatic Identification System (AIS) data, satellite ice data, and data on real-life ship besetting events. Based on these data, and using a hierarchical Bayesian model, the proposed approach calculates the probability of a besetting event as a function of the Polar Ship Category of a ship, the sea area, and the distance travelled in the prevailing ice concentration. The utility of the proposed approach, e.g. in supporting spatiotemporal risk assessments of Arctic shipping activities as well as Arctic voyage planning, is demonstrated through a case study in which the approach is applied to ships operating in the Northern Sea Route (NSR) area. The outcomes of the case study indicate that the probability of besetting is strongly dependent on the Polar Ship Category of a ship and that the probability increases significantly with higher ice concentrations. The sea area, on the other hand, does not appear to significantly affect the probability of besetting.
  • Kulha, Niko; Pasanen, Leena; Holmström, Lasse; Grandpré, Louis de; Gauthier, Sylvie; Kuuluvainen, Timo; Aakala, Tuomas (2020)
    Context: Changes in the structure of boreal old-growth forests are typically studied at a specific spatial scale. Consequently, little is known about forest development across different spatial scales. Objectives: We investigated how and at what spatial scales forest structure changed over several decades in three 4 km² boreal old-growth forest landscapes in northeastern Finland and two in Quebec, Canada. Methods: We used canopy cover values visually interpreted to 0.1-ha grid cells from aerial photographs taken at three time points between the years 1959 and 2011, and error distributions quantified for the interpretation. We identified the spatial scales at which canopy cover changed between the time points, and examined the credibility of changes at these scales using the error distributions in Bayesian inference. Results: Canopy cover changed at three to four spatial scales, the number of scales depending on the studied landscape and time interval. At large scales (15.4–321.7 ha), canopy cover increased in Finland during all time intervals. In Quebec, the direction of the large-scale change varied between the studied time intervals, owing to the occurrence of an insect outbreak and a consequent recovery. However, parts of these landscapes also showed canopy cover increase. Superimposed on the large-scale developments, canopy cover changed variably at smaller scales (1.3–2.8 ha and 0.1 ha). Conclusions: Our findings support the idea that the structure of boreal old-growth forests changes at discernible spatial scales. Instead of being driven by gap dynamics, the old-growth forests in the studied regions are currently reacting to large-scale drivers with an increase in canopy cover.
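
Illustrative code sketches for selected entries above. These are minimal, hedged Python sketches of the general techniques named in the abstracts; they are not the authors' implementations, and all model forms, names, and parameter values below are assumptions made for illustration only.

For Trotsiuk et al. (2020), Bayesian data assimilation amounts to calibrating a process model against plot observations. The sketch below calibrates a toy logistic stem-carbon curve (not the 3-PG model) with random-walk Metropolis; the priors, noise level, and proposal scales are arbitrary choices.

```python
# Minimal sketch of Bayesian calibration by random-walk Metropolis.
# The "model" here is a toy logistic stem-carbon curve, NOT the 3-PG model;
# parameter names, priors, and settings are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(1)

def toy_model(theta, t):
    """Toy stand growth curve: carrying capacity K and rate r (assumed form)."""
    K, r = theta
    return K / (1.0 + np.exp(-r * (t - 50.0)))

# Synthetic "observations" (stand age vs. stem carbon, arbitrary units)
t_obs = np.linspace(10, 120, 25)
y_obs = toy_model((150.0, 0.08), t_obs) + rng.normal(0, 5.0, t_obs.size)

def log_posterior(theta, sigma=5.0):
    K, r = theta
    if not (0 < K < 500 and 0 < r < 1):          # flat priors on a box
        return -np.inf
    resid = y_obs - toy_model(theta, t_obs)
    return -0.5 * np.sum((resid / sigma) ** 2)   # Gaussian likelihood

# Random-walk Metropolis
theta = np.array([100.0, 0.05])
logp = log_posterior(theta)
samples = []
for _ in range(20000):
    prop = theta + rng.normal(0, [2.0, 0.005])
    logp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < logp_prop - logp:
        theta, logp = prop, logp_prop
    samples.append(theta.copy())

samples = np.array(samples[5000:])               # drop burn-in
print("posterior mean K, r:", samples.mean(axis=0))
```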
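
For Christopher (2020), the discrete-time transmission model is characterised by escape probabilities. A common chain-binomial style formulation multiplies the probabilities of escaping each source of infection; the parameter names q_env and q_hh and the independence assumption below are illustrative, not necessarily the thesis's exact specification.

```python
# Sketch of a discrete-time "escape probability" formulation (chain-binomial style).
# q_env: daily probability of infection from the environment/common source
# q_hh:  daily probability of infection from one infectious household member
# Names and the independence assumption are illustrative only.
def infection_probability(q_env: float, q_hh: float, n_infectious: int) -> float:
    """P(a susceptible is infected on a day with n_infectious infectious housemates)."""
    escape = (1.0 - q_env) * (1.0 - q_hh) ** n_infectious
    return 1.0 - escape

# Example: modest environmental pressure, two infectious household members
print(infection_probability(q_env=0.01, q_hh=0.05, n_infectious=2))  # ~0.107
```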
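
For Liu and Vanhatalo (2020), the rejection sampling design directs survey sites toward locations that are a priori expected to be informative. The sketch accepts uniformly drawn candidate locations with probability proportional to a prior mean intensity; the intensity surface and the acceptance rule are toy assumptions, not the paper's actual criterion.

```python
# Sketch: draw survey sites by rejection sampling so that candidate locations
# are accepted with probability proportional to a prior expected intensity.
# The intensity surface below is an arbitrary illustration, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def prior_mean_intensity(x, y):
    """A made-up smooth intensity surface on the unit square."""
    return np.exp(-((x - 0.3) ** 2 + (y - 0.7) ** 2) / 0.05)

def rejection_design(n_sites, intensity, bound=1.0):
    sites = []
    while len(sites) < n_sites:
        x, y = rng.uniform(0, 1, size=2)
        if rng.uniform(0, bound) < intensity(x, y):   # accept with prob ~ intensity
            sites.append((x, y))
    return np.array(sites)

design = rejection_design(50, prior_mean_intensity)
print(design.shape)   # (50, 2): survey locations concentrated near (0.3, 0.7)
```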
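
For Gutmann and Corander (2016), the proposed strategy models the discrepancy between simulated and observed data probabilistically and uses Bayesian optimization to decide where to simulate next. The sketch uses a Gaussian-process surrogate from scikit-learn and a lower-confidence-bound rule on a toy Gaussian-mean simulator; it is a caricature of the general idea, not the authors' BOLFI implementation.

```python
# Simplified illustration: model the simulator discrepancy with a GP surrogate
# and choose the next parameter by a lower-confidence-bound rule. Toy simulator,
# summary statistic, and acquisition settings are assumptions for illustration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def simulator(theta, n=100):
    """Toy simulator: Gaussian data with unknown mean theta."""
    return rng.normal(theta, 1.0, size=n)

observed = rng.normal(2.3, 1.0, size=100)

def discrepancy(theta):
    """Distance between simulated and observed summary statistics (means)."""
    return abs(simulator(theta).mean() - observed.mean())

# A few initial evaluations of the discrepancy
thetas = list(rng.uniform(-5, 5, size=5))
discs = [discrepancy(t) for t in thetas]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
grid = np.linspace(-5, 5, 200).reshape(-1, 1)

for _ in range(15):                                   # acquisition loop
    gp.fit(np.array(thetas).reshape(-1, 1), np.array(discs))
    mean, std = gp.predict(grid, return_std=True)
    next_theta = float(grid[np.argmin(mean - 1.0 * std), 0])  # LCB acquisition
    thetas.append(next_theta)
    discs.append(discrepancy(next_theta))

print("parameter with smallest evaluated discrepancy:",
      thetas[int(np.argmin(discs))])
```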
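
For Länsman (2020), the negative binomial sampling distribution is chosen because its variance grows quadratically with the mean. The sketch shows the mean-dispersion parameterization, Var = mu + mu²/r, and the conversion to NumPy's (n, p) convention; the dispersion value is arbitrary.

```python
# Negative binomial with mean mu and dispersion r: Var = mu + mu**2 / r,
# i.e. the variance grows quadratically in the mean, as used for demand counts.
# The dispersion value below is an arbitrary illustration.
import numpy as np

rng = np.random.default_rng(0)

def sample_negative_binomial(mu, r, size):
    p = r / (r + mu)                 # NumPy uses n = r "successes" and prob p
    return rng.negative_binomial(r, p, size=size)

mu, r = 20.0, 4.0
draws = sample_negative_binomial(mu, r, size=100_000)
print(draws.mean())                  # ~ 20
print(draws.var(), mu + mu**2 / r)   # both ~ 120
```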
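
For Lintusaari et al. (2017), the most basic ABC algorithm is rejection sampling: draw parameters from the prior, simulate data, and keep the draws whose simulated summaries fall within a tolerance of the observed summaries. The summary statistic, prior, and tolerance below are illustrative choices for a toy Gaussian-mean problem.

```python
# Basic ABC rejection: keep prior draws whose simulated data lie within a
# tolerance of the observed data. Toy Gaussian-mean example; the summary
# statistic and tolerance are illustrative choices.
import numpy as np

rng = np.random.default_rng(42)

observed = rng.normal(1.5, 1.0, size=200)
obs_summary = observed.mean()

def simulate(theta, n=200):
    return rng.normal(theta, 1.0, size=n)

accepted = []
for _ in range(50_000):
    theta = rng.uniform(-10, 10)                 # draw from the prior
    sim_summary = simulate(theta).mean()
    if abs(sim_summary - obs_summary) < 0.05:    # discrepancy below tolerance
        accepted.append(theta)

accepted = np.array(accepted)
print(len(accepted), accepted.mean())            # approximate posterior sample / mean
```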
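
For Martino et al. (2017), the deterministic mixture approach weights each draw by the target density divided by the mixture of all proposal densities, which stabilizes the importance weights. The Gaussian target and proposals below are chosen purely for illustration.

```python
# Deterministic-mixture multiple importance sampling: a draw x from proposal q_j
# gets weight pi(x) / ((1/J) * sum_k q_k(x)). Target and proposals are simple
# Gaussians chosen for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

target = norm(loc=3.0, scale=1.0)
proposals = [norm(loc=m, scale=2.0) for m in (-2.0, 0.0, 4.0)]
M = 2000                                              # draws per proposal

samples, weights = [], []
for q in proposals:
    x = q.rvs(size=M, random_state=rng)
    mixture = np.mean([qk.pdf(x) for qk in proposals], axis=0)
    samples.append(x)
    weights.append(target.pdf(x) / mixture)           # deterministic mixture weight

samples = np.concatenate(samples)
weights = np.concatenate(weights)
print(np.sum(weights * samples) / np.sum(weights))    # self-normalized estimate of E[X] ~ 3
```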
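
For Pensar et al. (2017), the object being scored is a pseudo-likelihood, the product of each variable's conditional distribution given its graph neighbours; the paper's marginal pseudo-likelihood additionally marginalizes over the parameters. The sketch below computes only the plain log pseudo-likelihood of a pairwise binary (Ising-style) Markov network, with an arbitrary graph, weights, and data.

```python
# Plain log pseudo-likelihood of a pairwise binary Markov network with
# variables in {-1, +1}: sum over observations and variables of
# log p(x_v | x_neighbours). Graph, weights, and data are toy choices; the
# paper's marginal pseudo-likelihood integrates the parameters out instead.
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph on 4 variables as a symmetric weight matrix
# (zero entries mean "no edge") plus node biases.
W = np.array([[0.0, 0.8, 0.0, 0.0],
              [0.8, 0.0, -0.5, 0.0],
              [0.0, -0.5, 0.0, 0.3],
              [0.0, 0.0, 0.3, 0.0]])
b = np.zeros(4)

def log_pseudo_likelihood(X, W, b):
    """X: (n_obs, n_vars) array with entries in {-1, +1}."""
    H = X @ W + b                        # conditional "field" h_v for every row
    # For an Ising-style model, log p(x_v | rest) = x_v * h_v - log(2 cosh(h_v))
    return np.sum(X * H - np.log(2.0 * np.cosh(H)))

X = rng.choice([-1, 1], size=(100, 4))   # fake data, just to run the function
print(log_pseudo_likelihood(X, W, b))
```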
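
For Jälkö (2017), DPVI privatizes gradient-based variational inference with the Gaussian mechanism: the gradient is clipped to a fixed L2 norm and Gaussian noise calibrated to (ε, δ) is added. The sketch shows this single step with the standard noise scale σ = sqrt(2 ln(1.25/δ))/ε (valid for ε < 1); the clipping bound and privacy parameters are illustrative, and this is one noisy release rather than a full training loop with composition accounting.

```python
# Gaussian mechanism sketch: clip a gradient to L2 norm C and add
# N(0, sigma^2 * C^2 * I) noise with sigma = sqrt(2 ln(1.25/delta)) / epsilon
# (standard single-query bound for epsilon < 1). C, epsilon, delta are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_mechanism(grad, clip_norm, epsilon, delta):
    norm = np.linalg.norm(grad)
    grad = grad * min(1.0, clip_norm / max(norm, 1e-12))   # clip sensitivity to C
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return grad + rng.normal(0.0, sigma * clip_norm, size=grad.shape)

grad = np.array([3.0, -4.0])                                # toy gradient, norm 5
print(gaussian_mechanism(grad, clip_norm=1.0, epsilon=0.5, delta=1e-5))
```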