Browsing by Subject "112 Statistics and probability"

Now showing items 21-40 of 157
  • Cheng, Lu; Connor, Thomas R.; Aanensen, David M.; Spratt, Brian G.; Corander, Jukka (2011)
  • Woolley, Skipton; Bax, Nicolas; Currie, Jock; Dunn, Daniel; Hansen, Cecilie; Hill, Nicole; O'Hara, Timothy; Ovaskainen, Otso; Sayre, Roger; Vanhatalo, Jarno; Dunstan, Piers (2020)
    Bioregions are important tools for understanding and managing natural resources. Bioregions should describe locations where relatively homogenous assemblages of species occur, enabling managers to better regulate activities that might affect these assemblages. Many existing bioregionalization approaches, which rely on expert-derived, Delphic comparisons or environmental surrogates, do not explicitly include observed biological data in such analyses. We highlight that, for bioregionalizations to be useful and reliable for systems scientists and managers, the bioregionalizations need to be based on biological data; to include an easily understood assessment of uncertainty, preferably in a spatial format matching the bioregions; and to be scientifically transparent and reproducible. Statistical models provide a scientifically robust, transparent, and interpretable approach for ensuring that bioregions are formed on the basis of observed biological and physical data. Using statistically derived bioregions provides a repeatable framework for the spatial representation of biodiversity at multiple spatial scales. This results in better-informed management decisions and biodiversity conservation outcomes.
  • Yli-Jyrä, Anssi Mikael (The Association for Computational Linguistics, 2017)
    A recently proposed encoding for noncrossing digraphs can be used to implement generic inference over families of these digraphs and to carry out first-order factored dependency parsing. It is now shown that the recent proposal can be substantially streamlined without information loss. The improved encoding is less dependent on hierarchical processing and it gives rise to a high-coverage bounded-depth approximation of the space of noncrossing digraphs. This subset is presented elegantly by a finite-state machine that recognizes an infinite set of encoded graphs. The set includes more than 99.99% of the 0.6 million noncrossing graphs obtained from the UDv2 treebanks through planarisation. Rather than taking the low probability of the residual as a flat rate, it can be modelled with a joint probability distribution that is factorised into two underlying stochastic processes – the sentence length distribution and the related conditional distribution for deep nesting. This model points out that deep nesting in the streamlined code requires extreme sentence lengths. High depth is categorically out in common sentence lengths but emerges slowly at infrequent lengths that prompt further inquiry.
  • Valkeapää, Annukka; Vehkalahti, Kimmo (2012)
    The purpose of forest policy is to enhance the sustainable production of benefits of forests to serve the needs of all citizens. Theory of system justification claims that low-status groups are the most likely to support, defend and justify existing social systems. This study explores how various aspects of forest-related competencies affect satisfaction with the political system and the desire to influence decision making. The effect of competence on system satisfaction and the desire to influence outcomes is evaluated using survey data on Finnish citizens' attitudes on forest policy. The results were in line with system justification theory: Competence decreases system satisfaction and increases the desire to influence outcomes. The dissatisfaction with the system becomes possible only if people have adequate knowledge. Forestry-competent people tend to be satisfied with the system, while people with conservation knowledge tend to be dissatisfied. The challenges to the inclusion of citizens' views in political processes are addressed.
  • Numminen, Elina; Chewapreecha, Claire; Turner, Claudia; Goldblatt, David; Nosten, Francois; Bentley, Stephen D.; Turner, Paul; Corander, Jukka (2015)
    Streptococcus pneumoniae is a significant human pathogen and a leading cause of infant mortality in developing countries. Considerable global variation in the pneumococcal carriage prevalence has been observed and the ecological factors contributing to it are not yet fully understood. We use data from a cohort of infants in Asia to study the effects of climatic conditions on both acquisition and clearance rates of the bacterium, finding significantly higher transmissibility during the cooler and drier months. Conversely, the length of a colonization period is unaffected by the season. Independent carriage data from studies conducted on the African and North American continents suggest similar effects of the climate on the prevalence of this bacterium, which further validates the obtained results. Further studies could be important to replicate the findings and explain the mechanistic role of cooler and dry air in the physiological response to nasopharyngeal acquisition of the pneumococcus.
  • Sihvonen, Leila M; Jalkanen, Kaisa; Huovinen, Elisa; Toivonen, Susanna; Corander, Jukka; Kuusi, Markku; Skurnik, Mikael; Siitonen, Anja; Haukka, Kaisa (2012)
  • Dudel, Christian; Myrskylä, Mikko (2020)
    Objectives: Little is known about the length of working life, even though it is a key indicator for policy-makers. In this paper, we study how the length of working life at age 50 has developed in the United States from a cohort perspective. Methods: We use a large longitudinal sample of U.S. Social Security register data that covers close to 1.7 million individuals of the cohorts born from 1920 to 1965. For all of these cohorts, we study the employment trajectories and working life expectancy (WLE) at age 50 by gender and nativity (native-born/foreign-born). For the cohorts with employment trajectories that are only incompletely observed, we borrow information from older cohorts to predict their WLE. Results: The length of working life has been increasing for the native-born males and females, and the younger cohorts worked longer than the older cohorts. However, WLE might soon peak, and then stall. The gap in WLE between the native-born and the foreign-born has increased over time, although the latter group might be able to catch up in the coming years. Discussion: Our findings show that studying employment from a cohort perspective reveals crucial information about patterns of working life. The future development of the length of working life should be a major concern for policy-makers.
  • Eggeling, Ralf; Roos, Teemu Teppo; Myllymäki, Petri; Grosse, Ivo (2012)
    Parsimonious Markov models, a generalization of variable order Markov models, have been recently introduced for modeling biological sequences. Up to now, they have been learned by Bayesian approaches. However, there is not always sufficient prior knowledge available and a fully uninformative prior is difficult to define. In order to avoid cumbersome cross validation procedures for obtaining the optimal prior choice, we here adapt scoring criteria for Bayesian networks that approximate the Normalized Maximum Likelihood (NML) to parsimonious Markov models. We empirically compare their performance with the Bayesian approach by classifying splice sites, an important problem from computational biology.
  • Seshadri, Shreyas; Remes, Ulpu; Räsänen, Okko (ISCA, 2017)
  • Fischer, Daniel; Mosler, Karl; Mottonen, Jyrki; Nordhausen, Klaus; Pokotylo, Oleksii; Vogel, Daniel (2020)
    The Oja median is one of several extensions of the univariate median to the multivariate case. It has many desirable properties, but is computationally demanding. In this paper, we first review the properties of the Oja median and compare it to other multivariate medians. Then, we discuss four algorithms to compute the Oja median, which are implemented in our R package OjaNP. Besides these algorithms, the package also contains functions to compute Oja signs, Oja signed ranks, Oja ranks, and the related scatter concepts. To illustrate their use, the corresponding multivariate one- and C-sample location tests are implemented.
  • Méric, Guillaume; McNally, Alan; Pessia, Alberto; Mourkas, Evangelos; Pascoe, Ben; Mageiros, Leonardos; Vehkala, Minna Emilia; Corander, Jukka Ilmari; Shepard, Samuel K. (2018)
    Human infection with the gastrointestinal pathogen Campylobacter jejuni is dependent upon the opportunity for zoonotic transmission and the ability of strains to colonize the human host. Certain lineages of this diverse organism are more common in human infection but the factors underlying this overrepresentation are not fully understood. We analyzed 601 isolate genomes from agricultural animals and human clinical cases, including isolates from the multihost (ecological generalist) ST-21 and ST-45 clonal complexes (CCs). Combined nucleotide and amino acid sequence analysis identified 12 human-only amino acid KPAX clusters among polyphyletic lineages within the common disease causing CC21 group isolates, with no such clusters among CC45 isolates. Isolate sequence types within human-only CC21 group KPAX clusters have been sampled from other hosts, including poultry, so rather than representing unsampled reservoir hosts, the increase in relative frequency in human infection potentially reflects a genetic bottleneck at the point of human infection. Consistent with this, sequence enrichment analysis identified nucleotide variation in genes with putative functions related to human colonization and pathogenesis, in human-only clusters. Furthermore, the tight clustering and polyphyly of human-only lineage clusters within a single CC suggest the repeated evolution of human association through acquisition of genetic elements within this complex. Taken together, combined nucleotide and amino acid analysis of large isolate collections may provide clues about human niche tropism and the nature of the forces that promote the emergence of clinically important C. jejuni lineages.
  • Siivola, Eero; Vehtari, Aki; Vanhatalo, Jarno; Gonzalez, Javier; Andersen, Michael (IEEE, 2018)
    IEEE International Workshop on Machine Learning for Signal Processing
    Bayesian optimization (BO) is a global optimization strategy designed to find the minimum of an expensive black-box function, typically defined on a compact subset of ℝ^d, by using a Gaussian process (GP) as a surrogate model for the objective. Although currently available acquisition functions address this goal with different degrees of success, an over-exploration effect of the contour of the search space is typically observed. However, in problems like the configuration of machine learning algorithms, the function domain is conservatively large and with a high probability the global minimum does not sit on the boundary of the domain. We propose a method to incorporate this knowledge into the search process by adding virtual derivative observations in the GP at the boundary of the search space. We use the properties of GPs to impose conditions on the partial derivatives of the objective. The method is applicable with any acquisition function, is easy to use, and consistently reduces the number of evaluations required to optimize the objective irrespective of the acquisition used. We illustrate the benefits of our approach in an extensive experimental comparison.
  • Caro, Pedro; Helin, Tapio; Kujanpää, Antti; Lassas, Matti (2019)
    Scattering from a non-smooth random field on the time domain is studied for plane waves that propagate simultaneously through the potential in variable angles. We first derive sufficient conditions for stochastic moments of the field to be recovered from empirical correlations between amplitude measurements of the leading singularities, detected in the exterior of a region where the potential is almost surely supported. The result is then applied to show that if two sufficiently regular random fields yield the same correlations, they have identical laws as function-valued random variables.
  • Barrera Vargas, Gerardo; Pardo, Juan Carlos (2020)
    In this paper, we study the cut-off phenomenon under the total variation distance of d-dimensional Ornstein-Uhlenbeck processes which are driven by Lévy processes. That is to say, under the total variation distance, there is an abrupt convergence of the aforementioned process to its equilibrium, i.e. limiting distribution. Although the limiting distribution is not explicit, its distributional properties allow us to deduce that a profile function always exists in the reversible cases and it may exist in the non-reversible cases under suitable conditions on the limiting distribution. The cut-off phenomena for the average and superposition processes are also determined.
  • Barrera, Gerardo; Högele, Michael A.; Pardo, Juan C. (2021)
    This article establishes cutoff thermalization (also known as the cutoff phenomenon) for a class of generalized Ornstein-Uhlenbeck systems with small additive Lévy noise and any nonzero initial value.
  • Kotze, D. Johan; O'Hara, Robert B.; Lehvävirta, Susanna (2012)
  • Harju-Luukkainen, Heidi Katarina; Vettenranta, Jouni; Ouakrim-Soivio, Najat; Bernelius, Venla Helminna (2016)
  • Heikkila, Mikko; Lagerspetz, Eemil; Kaski, Samuel; Shimizu, Kana; Tarkoma, Sasu; Honkela, Antti (NEURAL INFORMATION PROCESSING SYSTEMS (NIPS), 2017)
    Advances in Neural Information Processing Systems
    Many applications of machine learning, for example in health care, would benefit from methods that can guarantee privacy of data subjects. Differential privacy (DP) has become established as a standard for protecting learning results. The standard DP algorithms require a single trusted party to have access to the entire data, which is a clear weakness, or add prohibitive amounts of noise. We consider DP Bayesian learning in a distributed setting, where each party only holds a single sample or a few samples of the data. We propose a learning strategy based on a secure multi-party sum function for aggregating summaries from data holders and the Gaussian mechanism for DP. Our method builds on an asymptotically optimal and practically efficient DP Bayesian inference with rapidly diminishing extra cost.
  • Heikkilä, Mikko; Jälkö, Joonas; Dikmen, Onur; Honkela, Antti (NEURAL INFORMATION PROCESSING SYSTEMS (NIPS), 2019)
    Advances in Neural Information Processing Systems