Faculty of Science (Matemaattis-luonnontieteellinen tiedekunta)


Recent Submissions

  • Cole, Elizabeth (Helsingin yliopisto, 2011)
    Thermal instability (hereafter TI) is investigated in numerical simulations to determine its effect on the growth and efficiency of dynamo processes. The setup is a three-dimensional periodic cube with a side several times the correlation length of the interstellar turbulence. The simulations model the interstellar medium without shear or rotation, to isolate the effect of TI. Hydrodynamical and nonhelical simulations are run for comparison, to determine the effects the magnetic field has on the gas itself. Turbulence is driven by external helical forcing of varying strength, which is known to create a large-scale dynamo of alpha^2 type. The nonhelical cases are also explored in an attempt to create a small-scale dynamo at high Rm, but no dynamo action could be detected in the range Rm of approximately 30–150. The hydrodynamical simulations reproduce the tendency of the gas to separate into two phases when an unstable cooling function is present. The critical magnetic Reynolds number of the large-scale dynamo was observed to be almost twice as large for the unstable as for the stable cooling function, indicating that the dynamo is harder to excite when TI is present. The efficiency of the dynamo, measured by the ratio of magnetic to kinetic energy, was found to increase in the unstable case at higher forcing. The runs in this thesis are part of a larger project studying dynamo action in interstellar flows.
  • Kärkkäinen, Johannes (2016)
    The author's appointment as a mathematics teacher at the European School prompted this study of mathematics teaching at the new school located in Helsinki. The school is new in Finland, although European Schools have long existed elsewhere in Europe. What does this school offer in mathematics, and how does its mathematics teaching differ from that of other schools in Helsinki? The thesis draws on teaching materials and curricula from the European School and from a municipal school in Helsinki. The curricula are examined one school year at a time, noting where the differences and similarities appear. The thesis aims to shed light on these questions and to offer a glimpse, from the perspective of the mathematics curriculum, of the school's mathematics teaching at the lower secondary level. Since few people in Finland have experience of the school's mathematics teaching, the author largely provides the viewpoint, supported by the accounts of a few teachers. The thesis also presents the European School as a private-school alternative, compared both with other private schools and with municipal schools, focusing primarily on the mathematics teaching the European School has to offer. The thesis highlights the culture of the European School and the goals it maintains in mathematics teaching. It closes by looking ahead to the possibilities the future may bring in the form of a new curriculum: how mathematics might look in the future, and where the two curricula and school cultures might lead, from a mathematics teacher's perspective.
  • Kangasniemi, Ilmari (2016)
    Coarse structures are an abstract construction describing the behavior of a space at a large distance. In this thesis, a variety of existing results on coarse structures are presented, with the main focus being coarse embeddability into Hilbert spaces. The end goal is to present a hierarchy of three coarse invariants, namely coarse embeddability into a Hilbert space, a property of metric spaces known as Property A, and a finite-valued asymptotic dimension. After outlining the necessary prerequisites and notation, the first main part of the thesis is an introduction to the basics of coarse geometry. Coarse structures are defined, and it is shown how a metric induces a coarse structure. Coarse maps, equivalences and embeddings are defined, and some of their basic properties are presented. Alongside this, comparisons are made to both topology and uniform topology, and results related to metrizability of coarse spaces are outlined. Once the basics of coarse structures have been presented, the focus shifts to coarse embeddability into Hilbert spaces, which has become a point of interest due to its applications to several unsolved conjectures. Two concepts are presented related to coarse embeddability into Hilbert spaces, the first one being Property A. It is shown that Property A implies coarse embeddability into a Hilbert space, and that it is a coarse invariant. The second main concept related to coarse embeddability is asymptotic dimension. Asymptotic dimension is a coarse counterpart to the Lebesgue dimension of topological spaces. Various definitions of asymptotic dimension are given and shown equivalent. The coarse invariance of asymptotic dimension is shown, and the dimensions of several example spaces are derived. Finally, it is shown that a finite asymptotic dimension implies coarse embeddability into a Hilbert space, and in the case of spaces with bounded geometry it also implies Property A.
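The abstract's central notion can be stated precisely; the following is the standard (Gromov) definition of a coarse embedding, a well-known formulation rather than anything specific to this thesis:

```latex
% A map f : (X, d_X) \to (Y, d_Y) between metric spaces is a coarse embedding
% if there exist non-decreasing functions \rho_-, \rho_+ : [0,\infty) \to [0,\infty),
% with \rho_-(t) \to \infty as t \to \infty, such that
\[
  \rho_-\bigl(d_X(x, y)\bigr) \,\le\, d_Y\bigl(f(x), f(y)\bigr) \,\le\, \rho_+\bigl(d_X(x, y)\bigr)
  \qquad \text{for all } x, y \in X.
\]
```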
  • Paksula, Matti (2016)
    Mobile apps are intended to be created with a mobile platform's own development tools and programming languages. This native development requires specialized skills and can therefore be prohibitively expensive. HTML5 hybrid app development is a popular alternative to native mobile app development. It allows developers to use standard web technologies, and the end result can be visually indistinguishable from a native app. The model enables faster iteration, allows any web developer to build apps, and supports simultaneous cross-platform development. However, since web technology is not as performant as native code, hybrid apps have often been criticized as noticeably “laggy” by the app developer community and by end users. One of the key components affecting the performance of an HTML5 hybrid app is its native bridge. This component connects the embedded HTML5 application to device features that would not otherwise be available (such as writing to a file on the device's file system). The native bridge is one of the few components a developer can freely change, and selecting the best native bridge for the app's needs is important, as an inefficient one can cause human-noticeable delay. The performance of native bridges has been acknowledged in academia and industry, but very little researched systematically. This thesis introduces a systematic method to evaluate native bridge performance, together with a new open source tool that implements the method for benchmarking different native bridges. The tool hosts reference implementations for 32 native bridges. Example results are evaluated from a test suite that exercised all implemented native bridges with two embeddable web view engines (UIWebView and WKWebView) on four distinct iOS devices (two iPads, an iPhone and an iPod Touch). The results show that the majority of the known native bridge methods can cause human-noticeable visual and auditory latency, and that performance is largely affected by app usage patterns. The slowest measured native bridge was over two times slower (from no delay to significant user interface delay) than the fastest one.
  • Sobih, Ahmed (2016)
    Our planet is pervaded by hundreds of millions of microorganisms that are not visible to the naked eye. These microorganisms, also known as microbes, include bacteria, archaea, fungi, protists and viruses. Metagenomics allows the study of microbial samples collected directly from the environment, without prior culturing. A crucial step in metagenomics analysis is to unveil the structure of the microbial community in a specific environment; this step is called metagenomics taxonomic analysis (or community profiling). In this thesis we explain what metagenomics taxonomic analysis is and why it is important, and we present MetaFlow, a new tool for solving the metagenomics community profiling problem using high-throughput sequencing data. MetaFlow estimates the richness and the abundances at the species taxonomic rank, based on coverage analysis across entire genomes, and it is the first method to apply network flows to this problem. Experiments showed that MetaFlow is more sensitive and precise than popular tools such as MetaPhlAn and mOTU, and its abundance estimation is better by 2-4 times. MetaFlow is available at
  • Tuomainen, Risto Olli Oskari (2016)
    In nearest neighbors search the task is to find the points of a data set that lie close in space to a given query point. To improve on brute force search, which computes distances between the query point and all data points, numerous data structures have been developed. These, however, perform poorly in high dimensional spaces. To tackle nearest neighbors search in high dimensions, it is commonplace to use approximate methods that only return the nearest neighbors with high probability. In practice an approximate solution is often as good as an exact one, among other reasons because the approximations can be of such high quality that they are practically indistinguishable from exact solutions. Approximate nearest neighbors search has found applications in many different fields and can, for example, be used in recommendation systems. One class of approximate nearest neighbors algorithms is space partitioning methods. These algorithms recursively partition the data set into smaller subsets in order to construct a search structure. Queries can then be performed very efficiently by using this structure to prune data points without evaluating their distances to the query point. A recent proposal in this class is multiple random projection trees (MRPT), which uses random projection trees (RP-trees) to prune the set from which nearest neighbors are searched. This thesis proposes a voting algorithm for using multiple RP-trees in nearest neighbors search, along with a further improvement called the mixture method. The performance of these algorithms was evaluated against the previous MRPT algorithm using two moderately high dimensional data sets. The mixture method was found to improve considerably on MRPT in terms of the accuracy attained. The results presented in this thesis suggest that the mixture method may be a strong algorithm for nearest neighbors search, especially in very high dimensional spaces.
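As a rough illustration of the voting idea described above (a simplified sketch, not the thesis's actual MRPT implementation; all function names, parameters and defaults here are assumptions), routing a query through several RP-trees and keeping the points that receive enough votes might look like this:

```python
import math
import random

def build_rp_tree(data, indices, leaf_size, rnd):
    """Recursively split point indices at the median of a random projection."""
    if len(indices) <= leaf_size:
        return indices  # leaf: a bucket of point indices
    direction = [rnd.gauss(0, 1) for _ in range(len(data[0]))]
    proj = {i: sum(d * x for d, x in zip(direction, data[i])) for i in indices}
    split = sorted(proj.values())[len(indices) // 2]
    left = [i for i in indices if proj[i] <= split]
    right = [i for i in indices if proj[i] > split]
    if not left or not right:  # degenerate split: stop early
        return indices
    return (direction, split,
            build_rp_tree(data, left, leaf_size, rnd),
            build_rp_tree(data, right, leaf_size, rnd))

def query_leaf(tree, q):
    """Route a query point down one RP-tree to its leaf bucket."""
    while isinstance(tree, tuple):
        direction, split, left, right = tree
        p = sum(d * x for d, x in zip(direction, q))
        tree = left if p <= split else right
    return tree

def knn_with_voting(data, q, n_trees=15, leaf_size=16, votes_required=3, k=5):
    """Each tree votes for the points sharing a leaf with q; exact distances
    are computed only for points with enough votes."""
    rnd = random.Random(0)
    votes = [0] * len(data)
    for _ in range(n_trees):
        tree = build_rp_tree(data, list(range(len(data))), leaf_size, rnd)
        for i in query_leaf(tree, q):
            votes[i] += 1
    candidates = [i for i, v in enumerate(votes) if v >= votes_required]
    if len(candidates) < k:  # fall back to the most-voted points
        candidates = sorted(range(len(data)), key=lambda i: -votes[i])[:k]
    return sorted(candidates, key=lambda i: math.dist(q, data[i]))[:k]
```

In a real index the trees would be built once and reused for every query; rebuilding them per query, as this sketch does, is only for brevity.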
  • Aintila, Eeva Katri Johanna (2016)
    Expected benefits of agile methodologies for project success have encouraged organizations to extend agile approaches to areas they were not originally intended for, such as large scale information systems projects. Research on agile methods in large scale software development projects has existed for a few years and is considered a research area of its own. This study investigates agile methods in large scale software development and information systems projects, and its goal is to produce more understanding of the suitability of agile methods and the conditions under which they would most likely contribute to project success. The goal is specified with three research questions: I) what are the characteristics specific to large scale software engineering or information systems projects, II) what challenges are caused by these characteristics, and III) how do agile methodologies mitigate these challenges? In this study, recent research papers on the subject are investigated, and the characteristics of large scale projects and the challenges associated with them are recognized. Material was searched starting from the conference publications and distribution sites related to the subject. The collected information is supplemented with an analysis of project characteristics against the SWEBOK knowledge areas. The resulting challenge categories are mapped against the agile practices promoted by the Agile Alliance to conclude the impact of the practices on the challenges. The study is not a systematic literature review. As a result, 6 characteristics specific to large scale software development and IS projects and 10 challenge categories associated with these characteristics are recognized. The analysis reveals that agile practices enhance team level performance and provide direct practices for managing the challenges associated with a high amount of change and the unpredictability of the software process, both characteristic of a large scale IS project, but challenges remain at the cross-team and overall project level. It is concluded that, when seeking a process model with an agile approach that would respond to all the characteristics of a large scale project and thus increase the likelihood of project success, adaptations of current practices and the development of additional practices are needed. To this end, four areas for adaptations and additional practices are suggested when scaling agile methodologies to large scale project contexts: 1) adapting practices related to the distribution, assignment and follow-up of tasks, 2) aligning practices related to the software development process, ways of working and common principles across all teams, 3) developing additional practices to facilitate collaboration between teams, to ensure interaction with the cross-functional project dimensions, and to strengthen dependency management and decision making between all project dimensions, and 4) possibly developing and aligning practices to facilitate teams' external communication. The results of the study are expected to be useful for software development and IS project practitioners considering agile method adoption or adaptation in a large scale project context. ACM Computing Classification System (CCS) 2012: • Social and professional topics~Management of computing and information systems • Software and its engineering~Software creation and management
  • Xiao, Han (2016)
    We study the problem of detecting the top-k events from digital interaction records (e.g., emails, tweets). We first introduce the interaction meta-graph, which connects associated interactions. We then define an event to be a subset of interactions that (i) are topically and temporally close and (ii) correspond to a tree capturing information flow. Finding the best event leads to a variant of the prize-collecting Steiner tree problem, for which three methods are proposed. Finding the top-k events maps to the maximum k-coverage problem. Evaluation on real datasets shows that our methods detect meaningful events.
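The maximum k-coverage step mentioned above is NP-hard, but the classic greedy algorithm achieves a (1 - 1/e) approximation. A minimal sketch of that greedy step (illustrative only; the paper's own methods are not reproduced here):

```python
def greedy_k_coverage(candidate_events, k):
    """Greedy max k-coverage: repeatedly pick the candidate event that
    covers the most interactions not yet covered by earlier picks.
    This gives the classic (1 - 1/e) approximation guarantee."""
    covered = set()
    chosen = []
    candidates = [set(e) for e in candidate_events]
    for _ in range(k):
        best = max(candidates, key=lambda e: len(e - covered), default=None)
        if best is None or not (best - covered):
            break  # nothing new can be covered
        chosen.append(best)
        covered |= best
    return chosen, covered
```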
  • Jin, Haibo (2016)
    Multilingual Latent Dirichlet Allocation (MLDA) is an extension of Latent Dirichlet Allocation (LDA) to a multilingual setting, which aims to discover aligned latent topic structures of a parallel corpus. Although the two popular training algorithms of LDA, collapsed Gibbs sampling and variational inference, can be naturally adapted to MLDA, both become time-inefficient with MLDA due to its special structure. To address this problem, we propose an approximate training framework for MLDA, which works with both collapsed Gibbs sampling and variational inference. Through experiments, we show that the proposed training framework is able to reduce the training time of MLDA considerably, especially when there are many languages. We also summarize the scenarios in which the approximate framework gives model accuracy comparable to that of the standard framework. Finally, we discuss several possible explorations as future work.
  • Nouri, Javad (2016)
    This thesis introduces an approach to unsupervised learning of the morphological structure of human languages. We focus on morphologically rich languages, and the goal is to construct a knowledge-free and language-independent model. The model receives a long list of words in a language and is expected to learn to segment the input words so that the resulting segments correspond to morphemes in the target language. Several improvements inspired by well-motivated linguistic principles of morphology are introduced to the proposed MDL-based learning algorithm. In addition to the learning algorithm, a new evaluation method and corresponding resources are introduced. Evaluation of morphological segmentations is a challenging task due to the inherent ambiguity of natural languages and underlying morphological processes, such as fusion, which encumber the identification of unique “correct” segmentations for words. Our evaluation method addresses the problem of segmentation evaluation with a focus on the consistency of segmentations. Our approach is tested on data from Finnish, Turkish, and Russian. Evaluation shows a gain in performance over the state of the art.
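As an illustration of the two-part MDL principle underlying such learning algorithms (a toy sketch with assumed names and a fixed per-character cost, not the thesis's actual cost function), a segmentation can be scored as lexicon bits plus corpus bits:

```python
import math
from collections import Counter

def mdl_cost(segmentations, bits_per_char=5.0):
    """Two-part MDL cost of a set of word segmentations (simplified):
    lexicon cost spells out each distinct morpheme once; corpus cost is
    the negative log-likelihood of the morpheme sequence under its own
    unigram frequencies. Reusable morphemes lower the total cost."""
    counts = Counter(m for segs in segmentations for m in segs)
    total = sum(counts.values())
    lexicon_cost = sum(len(m) * bits_per_char for m in counts)
    corpus_cost = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_cost + corpus_cost
```

Segmenting the Finnish words talossa and autossa as talo+ssa and auto+ssa scores lower than keeping the words whole, because the shared suffix ssa is paid for only once in the lexicon and compresses well in the corpus.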
  • Sorkhei, Amin (2016)
    With the fast growing number of scientific papers produced every year, browsing through scientific literature can be a difficult task: formulating a precise query is not often possible if one is a novice in a given research field or different terms are often used to describe the same concept. To tackle some of these issues, we build a system based on topic models for browsing the arXiv repository. Through visualizing the relationship between keyphrases, documents and authors, the system allows the user to better explore the document search space compared to traditional systems based solely on query search. In this paper, we describe the design principles and the functionality supported by this system as well as report on a short user study.
  • Singh, Maninder Pal (2016)
    Research in the healthcare domain is primarily focused on diseases, based on the physiological changes of an individual. Physiological changes are often linked to multiple streams originating from different biological systems of a person. Together, the streams from various biological systems form attributes for evaluating symptoms or diseases. The interconnected nature of different biological systems encourages an aggregated approach to understanding symptoms and predicting diseases. These streams, or physiological signals, obtained from healthcare systems contribute a vast amount of vital information to healthcare data. Current technology allows physiological signals to be captured over time, but most of the data acquired from patients is observed only momentarily or remains underutilized. The continuous nature of physiological signals demands context-aware real-time analysis. This thesis addresses these research aspects with a large-scale data processing solution. We have developed a general-purpose distributed pipeline for cumulative analysis of physiological signals in medical telemetry. The pipeline is built on top of a framework that performs computation on a cluster in a distributed environment. The emphasis is on a unified pipeline for processing both streaming and non-streaming physiological time series signals. The pipeline provides fault-tolerance guarantees for the processing of signals and scales to multiple cluster nodes. In addition, the pipeline enables indexing of physiological time series signals and provides visualization of real-time and archived time series signals. It offers interfaces that allow physicians and researchers to use distributed computing for low-latency, high-throughput signal analysis in medical telemetry.
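The unified-pipeline idea, where the same processing stages apply to a live stream and to an archived recording, can be miniaturized with Python generators (a toy sketch under assumed names; the actual system runs on a distributed cluster framework):

```python
def moving_average(samples, window=5):
    """Streaming moving average over a signal, one value per input sample."""
    buf = []
    for x in samples:
        buf.append(x)
        if len(buf) > window:
            buf.pop(0)
        yield sum(buf) / len(buf)

def threshold_alerts(samples, limit):
    """Emit (index, value) whenever a smoothed sample exceeds the limit."""
    for i, x in enumerate(samples):
        if x > limit:
            yield (i, x)

def pipeline(source, window=5, limit=100.0):
    """The same stages accept a finite archive (a list) or a live stream
    (any iterator), which is the unified-pipeline idea in miniature."""
    return list(threshold_alerts(moving_average(source, window), limit))
```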
  • Enberg, Pekka (2016)
    Hypervisors and containers are the two main virtualization techniques that enable cloud computing. Both techniques have performance overheads in CPU, memory, networking, and disk performance compared to bare metal. Unikernels have recently been proposed as an optimization for hypervisor-based virtualization to reduce these overheads. In this thesis, we evaluate network I/O performance overheads for hypervisor-based virtualization using the Kernel-based Virtual Machine (KVM) and the OSv unikernel, and for container-based virtualization using Docker, comparing different configurations and optimizations. We measure raw networking latency, throughput, and CPU utilization using the Netperf benchmarking tool, and measure network-intensive application performance using the Memcached key-value store and the Mutilate benchmarking tool. We show that, compared to bare metal Linux, Docker with bridged networking has the least performance overhead, with OSv using vhost-net coming a close second.
  • Linkola, Simo (2016)
    A measure of how similar (or distant) two computer programs are has a wide range of possible applications, for example malware analysis or the analysis of university students' programming exercises. However, as programs may be arbitrarily structured, capturing the similarity of two non-trivial programs is a complex task. By extracting call graphs (graphs of the caller-callee relationships of a program's functions, where nodes denote functions and directed edges denote function calls) from the programs, the similarity measurement can be turned into a graph problem. Previously, static call graph distance measures have largely been based on graph matching techniques, e.g. graph edit distance or maximum common subgraph, which are known to be costly. We propose a call graph distance measure based on features that preserve some structural information from the call graph without explicitly matching user-defined functions together. We define the basic properties of the features, several ways to compute the feature values, and give a basic algorithm for generating the features. We evaluate the features using two small datasets: a dataset of malware variants and a dataset of university students' programming exercises, focusing especially on the former. For the evaluation we use experiments in information retrieval and clustering. We compare our results for both datasets to a baseline, and additionally, for the malware dataset, to the results obtained with a graph edit distance approximation. Our preliminary results show that even though the feature generation approach is simpler than the graph edit distance approximation, the generated features can perform on a similar level. However, experiments on larger datasets are still required to verify the results.
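A minimal sketch of the features-without-matching idea (a hypothetical feature choice with assumed names; the thesis defines its own feature set): describe each call graph by the multiset of its (out-degree, in-degree) pairs and compare the resulting vectors, so that structurally identical programs with renamed functions look alike:

```python
import math
from collections import Counter

def degree_features(call_graph):
    """Structural feature vector for a call graph, given as a dict mapping
    each function to the list of functions it calls: count how many
    functions have each (out-degree, in-degree) combination, ignoring
    names entirely."""
    indeg = Counter()
    for callees in call_graph.values():
        for callee in callees:
            indeg[callee] += 1
    return Counter((len(callees), indeg[f]) for f, callees in call_graph.items())

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```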
  • Puuska, Samir (2016)
    Critical infrastructure forms an interdependent network in which individual infrastructure sectors depend on the availability of others in order to function. In such an environment, faults easily propagate through the interlinked systems, causing cascading failures. To respond effectively to incidents at the national scale, it is necessary to maintain situational awareness by creating a common operational picture over all infrastructure sectors. Building a system capable of delivering the information needed for robust situational awareness requires a suitable way of modelling critical infrastructure and its interdependencies. This thesis presents a model of critical infrastructure for national scale situational awareness applications, as well as analysis methods for estimating current and future infrastructure status. The model uses directed graphs in conjunction with finite state transducers to represent the dependencies and operational status of critical infrastructure systems. An analysis method utilising graph centrality measures was developed for quantifying both the system-specific and the infrastructure-wide impact of disruptions. Additionally, an entropy-based analysis method was created for estimating the operational status of infrastructure systems in situations where current data is not available. The electric grid and mobile networks of a coastal area of Finland were modelled using the presented methods. A dataset of system failures observed during a storm, together with simulation tools, was used to evaluate the suitability of the framework for situational awareness tasks. The results indicate that the proposed modelling and analysis methods are suitable for real-time situational awareness applications.
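The graph-based impact analysis can be illustrated with a toy sketch (assumed names, and a simple reachability-based score rather than the thesis's specific centrality measures): rank each system by how many other systems depend on it, directly or transitively.

```python
def impact_scores(dependencies):
    """Score each system by the number of other systems that depend on it,
    directly or transitively. `dependencies` maps a system to the list of
    systems it needs in order to function."""
    # Build reverse edges: who depends on each system.
    dependents = {s: set() for s in dependencies}
    for system, needs in dependencies.items():
        for n in needs:
            dependents.setdefault(n, set()).add(system)
    scores = {}
    for system in dependents:
        # Traverse dependents: every system a failure here could reach.
        seen, frontier = set(), [system]
        while frontier:
            cur = frontier.pop()
            for d in dependents.get(cur, ()):
                if d not in seen:
                    seen.add(d)
                    frontier.append(d)
        scores[system] = len(seen)
    return scores
```

On a tiny example where a hospital needs both the grid and the mobile network, and the mobile network itself needs the grid, the grid gets the highest score, matching the intuition that its failure cascades furthest.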