# Browsing by Subject "Magisterprogrammet i matematik och statistik"


Now showing items 21-40 of 40
• (Helsingin yliopisto, 2020)
In insurance and reinsurance, heavy-tail analysis is used to model insurance claim sizes and frequencies in order to quantify the risk to the insurance company and to set appropriate premium rates. One reason for this application is that the excess claims covered by reinsurance companies are very large, making them a natural subject for heavy-tail analysis. In finance, the multivariate returns process often exhibits heavy-tailed marginal distributions with little or no correlation between the components of the random vector (even though the process is highly correlated when one takes the squares or absolute values of the returns). The fact that vectors considered independent by conventional standards may still exhibit dependence in their large realizations leads to the use of techniques from classical extreme value theory, which encompasses heavy-tail analysis, in estimating an extreme quantile of the profit-and-loss density called value-at-risk (VaR). The industry's need to understand the dependence between random vectors at very large values, as exemplified above, makes the concept of multivariate regular variation a topic of great current interest. This thesis discusses multivariate regular variation, showing that, by having multiple equivalent characterizations and by being quite easy to handle, it is an excellent tool for addressing the real-world issues raised above. The thesis is structured as follows. First, some mathematical background is covered: the notion of regular variation of a distribution tail in one dimension is introduced, as well as different concepts of convergence of probability measures, namely vague convergence and $\mathbb{M}^*$-convergence. The preference for the latter over the former is briefly discussed. The thesis then proceeds to the main definition of this work, that of multivariate regular variation, which involves a limit measure and a scaling function.
It is shown that multivariate regular variation can be expressed in polar coordinates by replacing the limit measure with the product of a one-dimensional measure with a tail index and a spectral measure. Looking for a second source of regular variation leads to the concept of hidden regular variation, with which a new hidden limit measure is associated. Estimation of the tail index, the spectral measure and the support of the limit measure is considered next. Several examples of risk vectors are then analyzed, such as risk vectors with independent components and risk vectors with repeated components. The support estimator presented earlier is computed in examples with simulated data to demonstrate its efficiency. However, when the estimator is computed on real-life data (stock values of different companies), it does not appear to fit the sample adequately. The conclusion is that, although the mathematical background of the theory is quite solid, more research is needed on its application to real-life data, namely a reliable way to check whether the data stem from a multivariate regularly varying distribution, as well as a way to identify the support of the limit measure.
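The one-dimensional tail-index estimation that underlies this kind of analysis is commonly carried out with the Hill estimator. A minimal sketch, for illustration only (the function name and the Pareto test case are illustrative, not taken from the thesis):

```python
import numpy as np

def hill_estimator(data, k):
    """Hill estimator of the tail index alpha from the k largest observations,
    assuming a regularly varying right tail P(X > x) ~ x^{-alpha} L(x)."""
    x = np.sort(np.asarray(data, dtype=float))[::-1]  # descending order
    logs = np.log(x[:k]) - np.log(x[k])               # log-spacings above the threshold
    gamma = logs.mean()                                # estimates 1/alpha
    return 1.0 / gamma

# Classical Pareto(alpha=2) sample: the estimate should be near 2
rng = np.random.default_rng(0)
sample = rng.pareto(2.0, size=100_000) + 1.0
alpha_hat = hill_estimator(sample, k=2_000)
```

In practice the choice of k trades bias against variance, which is one reason the real-data diagnostics mentioned above are delicate.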
• (Helsingin yliopisto, 2020)
The topic of this thesis is the application of Gaussian processes (GPs) to time series analysis. In particular, I approach time series analysis from the perspective of a comparatively rare application area: the analysis of historical time series data. A Bayesian approach is central to the work: the parameters themselves are treated as random variables, which affects both the formulation of the modelling problems and the generation of predictions from the models presented. The thesis is built up in stages. First, I introduce GPs at a general level, as a tool for statistical modelling. The key idea of GPs is that finite subsets of a GP follow a multivariate normal distribution, and relationships between observations are modelled with a kernel function that measures the similarity of observations as a function of their associated covariates and the kernel's parameters. Choosing an appropriate kernel and optimizing its parameters against the data make it possible to model even highly complex and poorly understood phenomena with a GP. I present the central results that allow a GP to be fitted to data, used for prediction, and used to decompose the modelled phenomenon into subtrends. After these foundations, I discuss how a GP model is formalized and fitted when the approach is Bayesian. I cover the strengths and weaknesses of different fitting approaches, as well as the possibility of embedding a GP in a larger statistical model. The Bayesian approach allows prior knowledge about the modelled phenomenon to be incorporated into the model in the form of prior distributions on the parameters. It also provides a systematic, probabilistic way of talking about both prior assumptions and post-data beliefs concerning the parameters and future values of the modelled phenomenon. The next chapter covers GP modelling techniques specific to time series.
In particular, I treat three different modelling situations: a GP that changes over time, a GP composed of several subprocesses, and several mutually correlated GPs. After this treatment the theoretical part of the work is complete: concrete analysis of time series with the presented tools is possible. The final chapter applies the techniques of the earlier chapters to modelling historical phenomena. Its primary purpose is to briefly present several potential applications, three in total. The first possibility considered is the completion of historical time series, which often contain only scattered observations, with predictions from GP models. The practical results highlighted the need for strong priors, since historical time series are often so sparse that the models readily discount the observations when making predictions. The second example concerns historical change points; the test case is the explosion in the number of printed publications at the start of the 1640s, during the English Civil War. The fitted model succeeds in inferring the starting time of the civil war. In the final example I model the number of printed publications per capita in early modern England, using other temporally evolving variables (e.g. the degree of urbanization), interpreted as subprocesses, as covariates instead of time. The technical implementation of this example also succeeded, which encourages a statistically and historically more comprehensive analysis. As a whole, the thesis both presents and demonstrates the potential of the GP approach to time series analysis. The final chapter in particular encourages further work in the new application area of modelling historical phenomena.
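The basic GP machinery described above (multivariate normal finite-dimensional distributions, a kernel over covariates, posterior prediction) can be sketched compactly. This is a generic illustration with a squared-exponential kernel, not the models fitted in the thesis; all names and parameter values are illustrative:

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean and covariance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)
    L = np.linalg.cholesky(K)                       # numerically stable solve
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov

# Noisy observations of a sine; the posterior mean should track it
x = np.linspace(0, 6, 30)
y = np.sin(x) + 0.05 * np.random.default_rng(1).normal(size=x.size)
mean, cov = gp_posterior(x, y, x, lengthscale=1.0)
```

In a fully Bayesian treatment the kernel hyperparameters would themselves carry priors rather than being fixed as here.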
• (Helsingin yliopisto, 2021)
This thesis surveys the vast landscape of uncertainty principles of the Fourier transform. Research on these uncertainty principles began in the mid-1920s following a seminal lecture by Wiener, where he first made the remark that condenses the idea of uncertainty principles: "A function and its Fourier transform cannot be simultaneously arbitrarily small". In this thesis we examine some of the most remarkable classical results, in which different interpretations of smallness are applied. More modern results and links to active fields of research are also presented. We make a great effort to give an extensive list of references to build a broad understanding of the subject matter. Chapter 2 gives the reader sufficient basic theory to understand the contents of this thesis. First we discuss Hilbert spaces and the Fourier transform. Since they are central concepts in this thesis, we try to make sure that the reader can get a proper understanding of them from our description. Next, we study Sobolev spaces and especially the regularity properties of Sobolev functions. After briefly looking at tempered distributions, we conclude the chapter by presenting the most famous of all uncertainty principles, Heisenberg's uncertainty principle. In chapter 3 we examine how the rate of decay of a function affects the rate of decay of its Fourier transform. This is the most historically significant form of the uncertainty principle, and therefore many classical results are presented, most importantly those of Hardy and Beurling. In 2012 Hedenmalm gave a beautiful new proof of Beurling's result. We present the proof, after which we briefly discuss the Gaussian function and how it acts as the extremal case of many of the mentioned results. In chapter 4 we study how the support of a function affects the support and regularity of its Fourier transform.
The magnificent result of Benedicks and the results following it form the focal point of this chapter, but we also briefly discuss the Gap problem, a classical problem with recent developments. Chapter 5 links density-based uncertainty principles to Fourier quasicrystals, a very active field of research. We follow the unpublished work of Kulikov-Nazarov-Sodin, where first an uncertainty principle is given, after which a formula for generating Fourier quasicrystals, using a density condition from the uncertainty principle, is proved. We end by comparing this formula to other recent formulas for generating quasicrystals.
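For reference, Heisenberg's uncertainty principle mentioned above can be stated as follows, here with the convention $\hat f(\xi) = \int f(x)\, e^{-2\pi i x \xi}\, dx$ (the constant depends on the chosen normalization):

```latex
\left( \int_{\mathbb{R}} x^2 \, |f(x)|^2 \, dx \right)
\left( \int_{\mathbb{R}} \xi^2 \, |\hat f(\xi)|^2 \, d\xi \right)
\;\ge\; \frac{\|f\|_2^4}{16 \pi^2},
```

with equality exactly for (suitably centered) Gaussians, consistent with the extremal role of the Gaussian discussed in chapter 3.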
• (Helsingin yliopisto, 2021)
The nonlinear Schrödinger equation is a partial differential equation with applications in optics and plasma physics. It models the propagation of waves in the presence of dispersion. In this thesis, we will present the solution theory of the equation on a circle, following Jean Bourgain's work in the 1990s. The same techniques can be applied in higher dimensions and to other similar equations. The NLS equation can be solved in the general framework of evolution equations using a fixed-point method. This method yields well-posedness and growth bounds both in the usual L^2 space and in certain fractional-order Sobolev spaces. The difficult part is achieving good enough bounds on the nonlinear term. These so-called Strichartz estimates involve precise Fourier analysis in the form of dyadic decompositions and multiplier estimates. Before delving into the solution theory, we will present the required analytical tools, chiefly related to the Fourier transform. This chapter also describes the complete solution theory of the linear equation and illustrates differences between unbounded and periodic domains. Additionally, we develop an invariant measure for the equation. Invariant measures are relevant in statistical physics as they lead to useful averaging properties. We prove that the Gibbs measure related to the equation is invariant. This measure is based on a Gaussian measure on the relevant function space, the construction and properties of which we briefly explain.
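For concreteness, the periodic cubic NLS problem attacked by the fixed-point method reads, in one common normalization (sign conventions vary between sources):

```latex
i u_t + u_{xx} \pm |u|^2 u = 0, \qquad u(0, \cdot) = u_0, \qquad x \in \mathbb{T},
```

and the fixed-point formulation is the Duhamel integral equation

```latex
u(t) = e^{i t \partial_x^2} u_0 \pm i \int_0^t e^{i (t - s) \partial_x^2} \big( |u|^2 u \big)(s) \, ds,
```

whose contraction property on suitable spaces is exactly what the Strichartz-type estimates mentioned above provide.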
• (Helsingin yliopisto, 2021)
Topological data analysis studies the shape of a space at multiple scales. Its main tool is persistent homology, which builds on an underlying homology theory, usually simplicial homology. Simplicial homology applies to finite data in real space, and it is therefore the variant mainly used in applications. This thesis introduces the theory behind persistent homology and one of its applications, an image completion algorithm. Persistent homology is motivated by the question of which scale is the most essential when studying the shape of data. A filtration contains all the scales we want to explore, and is thus an essential tool of persistent homology. The thesis focuses on forming a filtration from a Delaunay triangulation and its subcomplexes, the alpha complexes. We find that these provide sufficient tools to track the births and deaths of homology classes, but they are not particularly easy to use in practice. This observation motivates the definition of a regional complement of the dual alpha graph. We find that the birth and death times of its components correspond to those of the essential homology classes. The algorithm exploits this observation to complete images. The results are good and largely as expected. We argue that the algorithm has potential, since it needs no training and no input parameters other than the data. However, future studies are needed to apply it, for example, to three-dimensional data.
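To give a flavor of the alpha-complex filtration mentioned above: in the generic planar case, a Delaunay triangle enters the alpha complex once the scale parameter reaches the radius of its circumscribed circle. A small self-contained sketch (illustrative only, not code from the thesis):

```python
import math

def circumradius(a, b, c):
    """Circumradius of the triangle with vertices a, b, c (2-D points).

    In a 2-D alpha complex, this is the filtration value at which a
    Delaunay triangle appears (in the generic case)."""
    la = math.dist(b, c)   # side lengths
    lb = math.dist(a, c)
    lc = math.dist(a, b)
    s = (la + lb + lc) / 2
    area = math.sqrt(max(s * (s - la) * (s - lb) * (s - lc), 0.0))  # Heron's formula
    return la * lb * lc / (4 * area)

# A right triangle's circumcircle is centered on the hypotenuse,
# so its circumradius is half the hypotenuse
r = circumradius((0, 0), (3, 0), (0, 4))
```

The filtration values of edges and vertices require a little more care (e.g. Gabriel edges), which is part of why the thesis finds alpha complexes awkward to use directly.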
• (Helsingin yliopisto, 2021)
This thesis is motivated by the following questions: What can we say about the set of primes p for which the equation f(x) = 0 (mod p) is solvable when f is (i) a polynomial or (ii) of the form a^x - b? Part I focuses on polynomial equations modulo primes. Chapter 2 focuses on the simultaneous solvability of such equations. Chapter 3 discusses classical topics in algebraic number theory, including Galois groups, finite fields and the Artin symbol, from this point of view. Part II focuses on exponential equations modulo primes. Artin's famous primitive root conjecture and Hooley's conditional solution are discussed in Chapter 4. Tools on Kummer-type extensions are given in Chapter 5, and a multivariable generalization of a method of Lenstra is presented in Chapter 6. These are put to use in Chapter 7, where solutions to several applications, including the Schinzel-Wójcik problem on the equality of orders of integers modulo primes, are given.
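The objects behind these questions are easy to compute for small primes. As a purely illustrative sketch (not from the thesis), the multiplicative order of a modulo p, and the primitive-root property at the heart of Artin's conjecture:

```python
def multiplicative_order(a, p):
    """Order of a modulo a prime p: the least k >= 1 with a^k ≡ 1 (mod p)."""
    assert a % p != 0
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

def is_primitive_root(a, p):
    """a is a primitive root mod p iff its order is p - 1."""
    return multiplicative_order(a, p) == p - 1

# Artin's conjecture concerns the density of primes for which a fixed base
# (here a = 2) is a primitive root
primes = [3, 5, 7, 11, 13, 17, 19, 23]
roots = [p for p in primes if is_primitive_root(2, p)]
```

Note that a^x ≡ b (mod p) is solvable exactly when the order of b divides a suitable multiple of the order of a, which is why orders of integers modulo primes are the recurring theme of Part II.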
• (Helsingin yliopisto, 2021)
Between elections, party support is measured with surveys; these opinion polls are colloquially known as "gallups". This thesis reviews the history of political opinion polling and gives a short overview of the current state of polling in Finland. The thesis uses datasets collected by survey, in which respondents were asked about their voting behaviour in the following elections: the 2012 municipal elections, the 2015 parliamentary elections and the 2017 municipal elections. The thesis describes the framing of the survey questions, the steps taken to clean the data, and the information required for fitting a statistical model. The theory section introduces generalized linear models. As the method, a generalized linear model is fitted to selected and cleaned subsets of the original datasets. These subsets contain the respondents' voting behaviour across eight parliamentary parties, together with each respondent's gender and region of residence according to the NUTS 2 classification. Gender and the five regions serve as explanatory variables in the model, with party support as the response. The data processing was carried out in R. The results tabulate the effects of the explanatory variables on voting for the party under consideration, both as independent predictors and through their interactions. Each of the eight parties is examined separately for all three election datasets. The analysis relies on maximum likelihood estimates and their confidence intervals.
• (Helsingin yliopisto, 2021)
In a quickest detection problem, the objective is to detect abrupt changes in a stochastic sequence as quickly as possible, while limiting the rate of false alarms. The development of algorithms that, after each observation, decide either to stop and declare a change as having happened or to continue the monitoring process has been an active line of research in mathematical statistics. The algorithms seek to optimally balance the inherent trade-off between the average detection delay in declaring a change and the likelihood of declaring a change prematurely. Change-point detection methods have applications in numerous domains, including monitoring the environment or the radio spectrum, target detection, financial markets, and others. Classical quickest detection theory focuses on settings where only a single data stream is observed. In modern applications enabled by the development of sensing technology, one may be tasked with monitoring multiple streams of data for changes simultaneously. Wireless sensor networks and mobile phones are examples of technology where devices can sense their local environment and transmit data sequentially to some common fusion center (FC) or cloud for inference. When performing quickest detection tasks on multiple data streams in parallel, classical tools of quickest detection theory focusing on false alarm probability control may become insufficient. Instead, controlling the false discovery rate (FDR) has recently been proposed as a more useful and scalable error criterion. The FDR is the expected proportion of false discoveries (false alarms) among all discoveries. In this thesis, novel methods and theory related to quickest detection in multiple parallel data streams are presented. The methods aim to minimize detection delay while controlling the FDR. In addition, scenarios are considered where not all of the devices communicating with the FC can remain operational and transmitting at all times.
The FC must choose which subset of data streams it wants to receive observations from at a given time instant. Intelligently choosing which devices to turn on and off may extend the devices’ battery life, which can be important in real-life applications, while affecting the detection performance only slightly. The performance of the proposed methods is demonstrated in numerical simulations to be superior to existing approaches. Additionally, the topic of multiple hypothesis testing in spatial domains is briefly addressed. In a multiple hypothesis testing problem, one tests multiple null hypotheses at once while trying to control a suitable error criterion, such as the FDR. In a spatial multiple hypothesis problem each tested hypothesis corresponds to e.g. a geographical location, and the non-null hypotheses may appear in spatially localized clusters. It is demonstrated that implementing a Bayesian approach that accounts for the spatial dependency between the hypotheses can greatly improve testing accuracy.
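The two ingredients described above, a sequential change detector per stream and FDR control across streams, can each be sketched in a few lines. This is a generic illustration (a CUSUM detector for a known Gaussian mean shift, plus the Benjamini-Hochberg procedure), not the thesis' proposed methods:

```python
import numpy as np

def cusum_stop(xs, mu0=0.0, mu1=1.0, threshold=8.0):
    """CUSUM stopping time for a shift in mean from mu0 to mu1 in
    unit-variance Gaussian data. Returns the first index at which the
    statistic crosses the threshold, or None."""
    s = 0.0
    for n, x in enumerate(xs):
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2)   # log-likelihood-ratio increment
        s = max(0.0, s + llr)                       # reflected at zero
        if s >= threshold:
            return n
    return None

def benjamini_hochberg(pvals, q=0.1):
    """Indices of hypotheses rejected by the BH procedure at FDR level q."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = len(p)
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    if not below.any():
        return []
    k = int(np.max(np.nonzero(below)[0]))           # largest i with p_(i) <= i*q/m
    return sorted(order[: k + 1].tolist())

# One stream with a mean shift at index 50
rng = np.random.default_rng(2)
xs = np.concatenate([rng.normal(0, 1, 50), rng.normal(1, 1, 100)])
tau = cusum_stop(xs)
```

Combining per-stream statistics with an FDR-controlling selection rule in a sequential setting is precisely where the thesis' contributions lie; the sketch above only shows the classical building blocks.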
• (Helsingin yliopisto, 2020)
Poisson regression is a well-known generalized linear model that relates the expected value of a count to a linear combination of explanatory variables. Outliers severely affect the classical maximum likelihood (ML) estimator of Poisson regression. Several robust alternatives to the ML estimator have been developed, such as the conditionally unbiased bounded-influence (CU) estimator, the Mallows quasi-likelihood (MQ) estimator and M-estimators based on transformations (MT). The purpose of the thesis is to study the robustness of these robust Poisson regression estimators under different conditions, and to compare their performance with each other. The robustness of the estimators is investigated in a simulation study involving the ML, CU, MQ and MT estimators. The robust estimators MQ and MT are studied with two different weight functions, C and H, and also without a weight function. The simulation is executed in three parts: the first part handles a situation without any outliers, in the second part the outliers are in the X space, and in the third part the outliers are in the Y space. The results show that all the robust estimators are less affected by the outliers than the classical ML estimator, but the outliers nevertheless severely weaken the results of the CU estimator and the MQ-based estimators. The MT-based estimators, especially the MT and H-MT estimators, have by far the lowest medians of the mean squared errors when the data are contaminated with outliers, and they compare favorably with the other estimators when there are no outliers. The MT and H-MT estimators are therefore an excellent option for fitting the Poisson regression model.
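The classical ML estimator that the robust alternatives are measured against can be computed with Newton's method (equivalently, iteratively reweighted least squares). A minimal sketch with illustrative synthetic data; names and parameter values are not from the thesis:

```python
import numpy as np

def poisson_ml(X, y, iters=25):
    """Maximum-likelihood fit of a Poisson regression with log link via
    Newton's method. Returns beta with E[y] = exp(X @ beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)                    # current fitted means
        grad = X.T @ (y - mu)                    # score vector
        H = X.T @ (mu[:, None] * X)              # Fisher information
        beta = beta + np.linalg.solve(H, grad)   # Newton step
    return beta

# Synthetic data with known coefficients (intercept 0.5, slope 0.3)
rng = np.random.default_rng(3)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
beta_hat = poisson_ml(X, y)
```

Because each observation enters the score unboundedly through y - mu, a single gross outlier in y can drag beta_hat arbitrarily far, which is exactly the non-robustness the CU, MQ and MT estimators address by bounding or transforming these contributions.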
• (Helsingin yliopisto, 2020)
This thesis studies the costs of the Hyvinkää hospital district, both at the level of total costs and at the level of individual patients. Healthcare costs are an essential part of society's functioning and have a significant impact on the finances of municipalities and cities, which is why it is useful to be able to understand and model them. The data, obtained from HUS, cover cost categories, patients and diagnosis groups. The first goal of the study is to find a statistical model for forecasting total costs; the second is to find a distribution that fits the costs of individual patients. The thesis begins by presenting the probability theory and statistical methods used in the study, the most important being the mean squared error, time series models and statistical tests. With these tools, models are constructed for total costs and for individual patients' costs. The analysis of total costs begins by separating out the largest cost categories to make them easier to study. For these largest categories, the most important explanatory variables are selected using linear regression and an information criterion. With the selected variables, a multivariate time series model is formed for total costs and the key variables. Once validated against the remaining data, this model can be used to forecast future costs. The final part of the thesis takes a closer look at heavy-tailed distributions and presents their most important properties. Under a heavy-tailed distribution the probability of large observations is substantially higher than under a light-tailed one; identifying them is therefore important, as heavy-tailed phenomena can generate significant costs. After introducing the terminology, the patients' costs are examined visually, with the aim of determining which distribution best describes them.
The study compares the plots of various theoretical distributions with the empirical distribution computed from the data. The plots suggest that the cost distribution is heavy-tailed. Moreover, the observations are consistent with the assumption that the tail of the distribution resembles, at least asymptotically, a power-law tail. The thesis concludes with an argument based on extreme value theory for why power-law tails are a natural model for the largest costs.
• (Helsingin yliopisto, 2021)
Plane algebraic curves are defined as zeroes of polynomials in two variables over some given field. If a point on a plane algebraic curve has a unique tangent line passing through it, the point is called simple. Otherwise, it is a singular point or a singularity. Singular points exhibit very different algebraic and topological properties, and the objective of this thesis is to study these properties using methods of commutative algebra, complex analysis and topology. In chapter 2, some preliminaries from classical algebraic geometry are given, and plane algebraic curves and their singularities are formally defined. Curves and their points are linked to corresponding coordinate rings and local rings. It is shown that a point is simple if and only if its corresponding local ring is a discrete valuation ring. In chapter 3, the Newton-Puiseux algorithm is introduced. The algorithm outputs fractional power series known as Puiseux expansions, which are shown to produce parametrizations of the local branches of a curve around a singular point. In chapter 4, Puiseux expansions are used to study the topology of complex plane algebraic curves. Around singularities, curves are shown to have an iterated torus knot structure which is, up to homotopy, determined by invariants known as Puiseux pairs.
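A standard example illustrating these notions (not necessarily the one worked out in the thesis) is the cusp:

```latex
C : \; y^2 = x^3 .
```

The Newton-Puiseux algorithm yields the fractional power series $y = x^{3/2}$, i.e. the local parametrization

```latex
(x, y) = (t^2, t^3),
```

and the single Puiseux pair $(3, 2)$ says that, topologically, the intersection of $C$ with a small sphere around the singular origin is the $(2,3)$ torus knot, the trefoil.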
• (Helsingin yliopisto, 2019)
Computed tomography (CT) is an X-ray based imaging modality utilized not only in medicine but also in other scientific fields and industrial applications. The imaging process can be mathematically modelled as a linear equation, and finding its solution is a typical example of an inverse problem. It is ill-posed, especially if the number of projections is sparse. One approach is to combine the data mismatch term with a regularization term and look for the minimizer of such a functional. The regularization is a penalty term that introduces prior information that may be available on the solution. Numerous algorithms exist to solve a problem of this type. For example, the iterative primal-dual fixed point algorithm (PDFP) is well suited for reconstructing CT images when the functional to minimize includes a non-negativity constraint and the prior information is expressed by an l1-norm of the shearlet-transformed target. The motivation of this thesis stems from CT imaging of plants perfused with a liquid contrast agent, aimed at increasing the contrast of the images and studying the flow of liquid in the plant over time. Therefore the task is to reconstruct dynamic CT images. The main idea is to apply 3D shearlets as a prior, treating time as the third dimension. For comparison, both the Haar wavelet transform and the 2D shearlet transform were tested. In addition, a recently proposed technique based on the sparsity levels of the target was used to ease the non-trivial choice of the regularization parameter. The quality of the different set-ups was assessed for said problem with simulated measurements, a real-life scenario where the contrast agent is applied to a gel and, finally, real data where the contrast agent is perfused into a real plant.
The results indicate that the 3D shearlet-based approach produces suitable reconstructions for observing the changes in the contrast agent, even though there are no drastic improvements in the quality of the reconstructions compared to using the Haar transform.
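The general form of the problem described above, a least-squares data term plus an l1 sparsity penalty under a non-negativity constraint, can be minimized with proximal gradient iteration. The following is a generic ISTA sketch on a toy problem, not the PDFP algorithm or the shearlet prior used in the thesis; all names and values are illustrative:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_nonneg(A, b, lam, iters=3000):
    """Proximal gradient (ISTA) for min_x 0.5 ||A x - b||^2 + lam ||x||_1,
    with a non-negativity projection after each step."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
        x = np.maximum(x, 0.0)               # non-negativity constraint
    return x

# Toy sparse non-negative recovery problem
rng = np.random.default_rng(4)
A = rng.normal(size=(60, 100))
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, 2.0, 1.5]
b = A @ x_true
x_hat = ista_nonneg(A, b, lam=0.05)
```

In the CT setting, A would be the projection operator composed with the inverse shearlet transform, which is where the primal-dual structure of PDFP becomes advantageous.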
• (Helsingin yliopisto, 2021)
HMC is a computational method built to sample efficiently from a high-dimensional distribution. Sampling from a distribution is typically a statistical problem, and hence much of the literature on Hamiltonian Monte Carlo is written in the mathematical language of probability theory, which is perhaps not ideally suited to HMC, since HMC is at its core differential-geometric. The purpose of this text is to present the differential-geometric tools needed in HMC and then methodically build the algorithm itself. Since there is an excellent introductory book on smooth manifolds by Lee, and not wanting to simply reproduce Lee's work, some basic knowledge of differential geometry is assumed of the reader. Similarly, as the author is more comfortable with differential geometry, and to limit the length of this text, most theorems connected to measure and probability theory are omitted. The first chapter is introductory and covers the bare minimum of measure theory needed to motivate Hamiltonian Monte Carlo. The bulk of the text is in the second and third chapters. The second chapter presents the concepts of differential geometry needed to understand the abstract construction of Hamiltonian Monte Carlo; those familiar with differential geometry can skip it, although it may be worthwhile to at least flip through it to pick up the notation used in this text. The third chapter is the core of the text: there the algorithm is methodically built using the groundwork laid in the previous chapters. The most important part, and the theoretical heart of the algorithm, is presented in the sections discussing the lift of the target measure. The fourth chapter provides brief practical insight into implementing HMC and also briefly discusses how HMC is currently being improved.
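Computationally, the algorithm built in the third chapter reduces to a leapfrog integrator for Hamilton's equations plus a Metropolis correction. A minimal coordinate-space sketch of this standard scheme (the differential-geometric formulation in the text is more general; the target and tuning values here are illustrative):

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, steps):
    """Leapfrog integration of Hamilton's equations for H(q,p) = U(q) + |p|^2/2."""
    p = p - 0.5 * eps * grad_U(q)          # initial half step in momentum
    for _ in range(steps - 1):
        q = q + eps * p
        p = p - eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)          # final half step in momentum
    return q, p

def hmc(U, grad_U, q0, n_samples, eps=0.2, steps=20, seed=0):
    """Hamiltonian Monte Carlo with a Metropolis accept/reject correction."""
    rng = np.random.default_rng(seed)
    q, samples = np.asarray(q0, dtype=float), []
    for _ in range(n_samples):
        p = rng.normal(size=q.shape)       # resample momentum
        q_new, p_new = leapfrog(q, p, grad_U, eps, steps)
        dH = (U(q_new) + p_new @ p_new / 2) - (U(q) + p @ p / 2)
        if rng.random() < np.exp(-dH):     # accept with prob min(1, e^{-dH})
            q = q_new
        samples.append(q.copy())
    return np.array(samples)

# Standard 2-D Gaussian target: U(q) = |q|^2 / 2
U = lambda q: q @ q / 2
grad_U = lambda q: q
samples = hmc(U, grad_U, [3.0, -3.0], 2_000)
```

The Metropolis step corrects exactly for the energy error of the discrete integrator, which is why the leapfrog's volume preservation and reversibility are the properties the geometric treatment emphasizes.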
• (Helsingin yliopisto, 2021)
In practice, outlying observations are not uncommon in many study domains. Without knowing the underlying factors behind the outliers, it is appealing to eliminate them from the dataset. However, unless there is scientific justification, outlier elimination amounts to alteration of the data. Instead, heavy-tailed distributions should be adopted to model the larger-than-expected variability in an overdispersed dataset. The Poisson distribution is the standard model for variation in count data. However, the empirical variability in observed datasets is often larger than the amount the Poisson allows for. This leads to unreliable inferences when estimating the true effect sizes of covariates in regression modelling. The Negative Binomial distribution is therefore often adopted as an alternative for overdispersed datasets. Nevertheless, it has been proven that neither the Poisson nor the Negative Binomial observation distribution is robust against outliers, in the sense that outliers have a non-negligible influence on the estimation of covariate effect sizes. On the other hand, the scale mixture of quasi-Poisson distributions (called the robust quasi-Poisson model), which is constructed similarly to the Student's t-distribution, is a heavy-tailed alternative to the Poisson and is provably robust against outliers. The thesis presents theoretical evidence on the robustness of the three aforementioned models in a Bayesian framework. Lastly, the thesis considers two simulation experiments with different kinds of outlier sources, process error and covariate measurement error, to compare the robustness of the Poisson, Negative Binomial and robust quasi-Poisson regression models in the Bayesian framework. Model robustness was assessed in terms of the model's ability to infer the covariate effect size correctly under different combinations of error probability and error variability.
In both experiments, the robust quasi-Poisson regression model proved more robust than its counterparts, its breakdown point being higher than theirs.
• (Helsingin yliopisto, 2020)
Several extensions of first-order logic are studied in descriptive complexity theory. These extensions include transitive closure logic and deterministic transitive closure logic, which extend first-order logic with transitive closure operators. It is known that deterministic transitive closure logic captures the complexity class of the languages that are decidable by some deterministic Turing machine using a logarithmic amount of memory space. An analogous result holds for transitive closure logic and nondeterministic Turing machines. This thesis concerns the k-ary fragments of these two logics. In each k-ary fragment, the arities of transitive closure operators appearing in formulas are restricted to a nonzero natural number k. The expressivity of these fragments can be studied in terms of multihead finite automata. The type of automaton that we consider in this thesis is a two-way multihead automaton with nested pebbles. We look at the expressive power of multihead automata and the k-ary fragments of transitive closure logics in the class of finite structures called word models. We show that deterministic two-way k-head automata with nested pebbles have the same expressive power as first-order logic with k-ary deterministic transitive closure. For a corresponding result in the case of nondeterministic automata, we restrict to the positive fragment of k-ary transitive closure logic. The two theorems and their proofs are based on the article 'Automata with nested pebbles capture first-order logic with transitive closure' by Joost Engelfriet and Hendrik Jan Hoogeboom. In the article, the results are proved in the case of trees. Since word models can be viewed as a special type of trees, the theorems considered in this thesis are a special case of a more general result.
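On a finite structure, the transitive closure operator these logics add is itself easy to compute. A small illustration (my own, not from the thesis) using Warshall's algorithm on the successor relation of a word model, whose closure is the linear order:

```python
def transitive_closure(relation, universe):
    """Transitive closure of a binary relation on a finite universe via
    Warshall's algorithm; mirrors the TC operator of transitive closure logic."""
    reach = set(relation)
    for k in universe:                       # allow k as an intermediate point
        for i in universe:
            for j in universe:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach

# Successor relation on a 4-position word model: its closure is the order <
succ = {(0, 1), (1, 2), (2, 3)}
closure = transitive_closure(succ, range(4))
```

This is the binary (k = 1 pairs of variables) case; in the k-ary fragments the closure is taken over 2k-tuples, which is what the k heads of the automata correspond to.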
• (Helsingin yliopisto, 2020)
Spectral theory is a powerful tool when applied to differential equations. The fundamental result being the spectral theorem of Jon Von Neumann, which allows us to define the exponential of an unbounded operator, provided that the operator in question is self-adjoint. The problem we are considering in this thesis, is the self-adjointness of the Schr\"odinger operator $T = -\Delta + V$, a linear second-order partial differential operator that is fundamental to non-relativistic quantum mechanics. Here, $\Delta$ is the Laplacian and $V$ is some function that acts as a multiplication operator. We will study $T$ as a map from the Hilbert space $H = L^2(\mathbb{R}^d)$ to itself. In the case of unbounded operators, we are forced to restrict them to some suitable subspace. This is a common limitation when dealing with differential operators such as $T$ and the choice of the domain will usually play an important role. Our aim is to prove two theorems on the essential self-adjointness of $T$, both originally proven by Tosio Kato. We will start with some necessary notation fixing and other preliminaries in chapter 2. In chapter 3 basic concepts and theorems on operators in Hilbert spaces are presented, most importantly we will introduce some characterisations of self-adjointness. In chapter 4 we construct the test function space $D(\Omega)$ and introduce distributions, which are continuous linear functionals on $D(\Omega).$ These are needed as the domain for the adjoint of a differential operator can often be expressed as a subspace of the space of distributions. In chapter 5 we will show that $T$ is essentially self-adjoint on compactly supported smooth functions when $d=3$ and $V$ is a sum consisting of an $L^2$ term and a bounded term. This result is an application of the Kato-Rellich theorem which pertains to operators of the form $A+B$, where $B$ is bounded by $A$ in a suitable way. Here we will also need some results from Fourier analysis that will be revised briefly. 
In chapter 6 we introduce some mollification methods and prove Kato's distributional inequality, which is important in the proof of the main theorem in the final chapter, as well as in other results of a similar nature. The main result of this thesis, presented in chapter 7, is a theorem originally conjectured by Barry Simon, which states that $T$ is essentially self-adjoint on $C^\infty_c(\mathbb{R}^d)$ when $V$ is a non-negative locally square integrable function and $d$ is an arbitrary positive integer. The proof is based on mollification methods and the distributional inequality proven in the previous chapter. This last result, although fairly unphysical, is somewhat striking in the sense that usually, for $T$ to be (essentially) self-adjoint, the dimension $d$ restricts the integrability properties of $V$ significantly.
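Kato's distributional inequality mentioned above is commonly stated as follows (standard form; sign and conjugation conventions vary between sources):

```latex
For $u \in L^1_{\mathrm{loc}}(\mathbb{R}^d)$ with $\Delta u \in L^1_{\mathrm{loc}}(\mathbb{R}^d)$,
\[
  \Delta |u| \;\ge\; \operatorname{Re}\big( \overline{\operatorname{sgn} u}\, \Delta u \big)
  \qquad \text{in the sense of distributions},
\]
with the convention $\operatorname{sgn} u = u/|u|$ where $u \neq 0$ and $\operatorname{sgn} u = 0$ otherwise.
```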
• (Helsingin yliopisto, 2020)
Estimating the effect of random chance (’luck’) has long been a question of particular interest in various team sports. In this thesis, we aim to determine the role of luck in a single ice hockey game by building a model that predicts the outcome based on the course of events in the game. The obtained prediction accuracy should also, to some extent, reveal the effect of random chance. Using the course of events from over 10,000 games, we train feedforward and convolutional neural networks to predict the outcome and the final goal differential, which has been proposed as a more informative proxy for the outcome. Interestingly, we are not able to obtain distinctly higher accuracy than previous studies, which have focused on predicting the outcome from information available before the game. The results suggest that there might exist an upper bound on prediction accuracy even if we knew ’everything’ that went on in a game. This further implies that random chance could affect the outcome of a game, although assessing this is difficult, as we do not have a good quantitative metric for luck in the case of single-game ice hockey prediction.
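As an illustration of the modelling setup only (not the thesis's actual architecture, features or data), a minimal feedforward network mapping a fixed-length vector of in-game event counts to a predicted goal differential might look like this; the feature names and layer sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

class TinyFeedforward:
    """A two-layer regression network: in-game event features -> goal differential."""

    def __init__(self, n_features, n_hidden=16):
        # Small random weights; a real model would be trained by backpropagation.
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def predict(self, x):
        h = relu(x @ self.W1 + self.b1)      # hidden layer with ReLU activation
        return (h @ self.W2 + self.b2)[0]    # scalar goal-differential prediction

# Hypothetical per-game features: shots, hits, faceoff wins, penalties, takeaways.
features = np.array([32.0, 21.0, 28.0, 4.0, 11.0])
model = TinyFeedforward(n_features=5)
print(model.predict(features))  # an (untrained) real-valued prediction
```

A trained version of such a model would regress on event sequences rather than aggregate counts, which is where the convolutional variants mentioned above come in.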
• (Helsingin yliopisto, 2021)
The goal of the thesis is to prove the Dold-Kan correspondence, a theorem stating that the category of simplicial abelian groups sAb and the category of positively graded chain complexes Ch+ are equivalent. The thesis also goes through the concepts mentioned in the theorem, starting with categories and functors in the first section. The aim of this section is to give enough background in category theory for the equivalence of categories to be understood. The second section uses these category-theoretical concepts to define the simplex category, whose objects are the ordered sets n = { 0 -> 1 -> ... -> n }, where n is a natural number, and whose morphisms are the order-preserving maps between these sets. The idea is to define simplicial objects, which are contravariant functors from the simplex category to some other category. This section also defines the coface and codegeneracy maps, which are special kinds of morphisms in the simplex category. With these, the cosimplicial (and later simplicial) identities are defined. These identities are central in the calculations done later in the thesis; in fact, one can think of them as the basic tools for working with simplicial objects. In the third section, the thesis introduces chain complexes and chain maps, which together form the category of chain complexes. This lays the foundation for the fourth section, where the goal is to form three different chain complexes out of any given simplicial abelian group A: the Moore complex A*, the chain complex generated by degeneracies DA*, and the normalized chain complex NA*. The latter two are both subcomplexes of the Moore complex. In fact, it is later shown that there is a direct sum decomposition An = NAn ⊕ DAn of the abelian groups forming these chain complexes. This connection between the chain complexes is an important one, and it is proved and used in the seventh section.
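For concreteness, the cosimplicial identities satisfied by the coface maps $\delta^i$ and codegeneracy maps $\sigma^j$ are the following (standard form):

```latex
\[
\delta^j \delta^i = \delta^i \delta^{j-1} \quad (i < j),
\qquad
\sigma^j \sigma^i = \sigma^i \sigma^{j+1} \quad (i \le j),
\]
\[
\sigma^j \delta^i =
\begin{cases}
  \delta^i \sigma^{j-1}, & i < j,\\
  \mathrm{id}, & i = j \text{ or } i = j+1,\\
  \delta^{i-1} \sigma^j, & i > j+1.
\end{cases}
\]
```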
At this point in the thesis, all the background needed for understanding the Dold-Kan correspondence has been presented. Thus begins the construction of the functors realising the equivalence which the theorem claims to exist. The functor from sAb to Ch+ maps a simplicial abelian group A to its normalized chain complex NA*, defined earlier. This direction does not require much additional work, since most of it was done in the sections dealing with chain complexes. However, defining the functor in the opposite direction requires some more thought. The idea is to map a chain complex K* to a simplicial abelian group, which is formed using direct sums and quotients. Forming it also requires defining another functor into the category of abelian groups Ab from the subcategory of the simplex category whose objects are those of the simplex category but whose morphisms are only the injections. After these functors have been defined, the rest of the thesis is devoted to showing that they truly form an equivalence between the categories sAb and Ch+.
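For reference, one standard formulation of the normalized chain complex functor $N$ and the degenerate subcomplex $D$ (conventions for the differential vary by a sign):

```latex
\[
(NA)_n = \bigcap_{i=0}^{n-1} \ker\big(d_i : A_n \to A_{n-1}\big),
\qquad
(DA)_n = \sum_{i=0}^{n-1} s_i(A_{n-1}),
\]
with differential $\partial_n = (-1)^n d_n$ on $NA$, and the splitting
\[
A_n \cong (NA)_n \oplus (DA)_n .
\]
```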
• (Helsingin yliopisto, 2021)
The topic of this Master's thesis is algorithms used for deblurring photographs. The goal of the algorithms is to remove blur and noise caused, for example, by motion or poor focus. In the algorithms presented in this work, the problem is treated as a statistical inverse problem, whose parameters are estimated using various numerical methods. The thesis consists of three parts: a general theory section, a presentation of the algorithms used in the work, and the application of the algorithms to data sets together with a comparison of the results. The theory section briefly reviews the general theory of inverse problems, focusing in particular on the discrete linear case relevant to image deblurring and its statistical formulation. The algorithms, in turn, can be thought of as consisting of two parts: (i) specifying the statistical model and (ii) numerically optimizing the parameters of the model. The thesis presents two classical analytical methods, the Richardson-Lucy and ROF algorithms, as well as iRestNet, which exploits deep learning in the solution. Finally, the algorithms are applied to two different data sets: programmatically generated data and the image data of the Helsinki Deblur Challenge competition held in 2021. The aim is to examine how choices made in the implementation of the algorithms affect the final results and to compare the results produced by the presented algorithms with each other.
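A minimal one-dimensional sketch of the Richardson-Lucy iteration (illustrative only; the thesis works with two-dimensional photographs and its own implementation choices):

```python
import numpy as np

def richardson_lucy(observed, psf, iterations=100):
    """Richardson-Lucy deconvolution, reduced to 1-D for illustration.

    observed: blurred, nonnegative signal
    psf: nonnegative point-spread function summing to 1
    """
    estimate = np.full_like(observed, observed.mean() + 1e-6)
    psf_mirror = psf[::-1]
    for _ in range(iterations):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / (blurred + 1e-12)   # data divided by model prediction
        # Multiplicative update; keeps the estimate nonnegative.
        estimate = estimate * np.convolve(ratio, psf_mirror, mode="same")
    return estimate

# Demo: blur a spike with a small kernel, then deconvolve it back.
psf = np.array([0.25, 0.5, 0.25])
sharp = np.zeros(21)
sharp[10] = 1.0
blurred = np.convolve(sharp, psf, mode="same")
restored = richardson_lucy(blurred, psf)
```

The multiplicative update is what distinguishes Richardson-Lucy from plain linear inversion: it preserves nonnegativity, which acts as an implicit regularizer.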
• (Helsingin yliopisto, 2021)
Maxwell’s equations are a set of equations which describe how electromagnetic fields behave in a medium or in a vacuum. This means that they can be studied from the perspective of partial differential equations as different kinds of initial value and boundary value problems. Because the media in physically relevant situations are often not regular, or there can be irregular sources such as point sources, it is not always meaningful to study Maxwell’s equations with the intention of finding a direct solution to the problem. In these cases it is useful to study them from the perspective of weak solutions, which makes the problem more tractable. This thesis studies Maxwell’s equations from the perspective of weak solutions. To help understand the later chapters, the thesis first introduces theory related to Hilbert spaces, weak derivatives and Sobolev spaces. Understanding the curl, divergence and gradient operators and their properties is important for the topic, because the thesis uses several different Sobolev spaces satisfying different kinds of geometric conditions. After going through the background theory, the thesis introduces Maxwell’s equations in section 2.3. Maxwell’s equations are given in both their differential form and their time-harmonic differential form, as both are used in the thesis. Static problems related to Maxwell’s equations are studied in Chapter 3. In static problems the charge and current densities are stationary in time. If the electric and magnetic fields are assumed to have finite energy, it follows that the studied problem has a unique solution. The thesis derives conditions on the form the electric and magnetic fields must have in order to satisfy the conditions of the problem. In particular, it is noted that the electromagnetic field decomposes into two parts, of which only one arises from the electric and magnetic potentials. Maxwell’s equations are also studied with methods from spectral theory in Chapter 4.
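For reference, with the time convention $E(x,t) = \operatorname{Re}\big(e^{-i\omega t} E(x)\big)$ (sign conventions vary with the chosen time dependence), the time-harmonic Maxwell system takes the form:

```latex
\[
\nabla \times E = i\omega \mu H, \qquad
\nabla \times H = -i\omega \varepsilon E + J, \qquad
\nabla \cdot (\varepsilon E) = \rho, \qquad
\nabla \cdot (\mu H) = 0.
\]
```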
First the thesis introduces and defines a few concepts from spectral theory, such as spectra, resolvent sets and eigenvalues. After this, the thesis studies non-static problems related to Maxwell’s equations by utilising their time-harmonic forms. In time-harmonic form, Maxwell’s equations do not depend on time but instead on frequencies, effectively simplifying the problem by eliminating the time dependency. It turns out that the natural frequencies which solve the spectral problem we study belong to the spectrum of the Maxwell operator iA. Because the spectrum is proved to be discrete, the set of eigensolutions is also discrete. This gives the solution to the problem, as the natural frequency solving the problem has a corresponding eigenvector with finite energy. However, this method does not give an efficient way of finding the explicit form of the solution.