Faculty of Science

 

Recent Submissions

  • Halin, Mikko (Helsingin yliopisto, 2019)
    Graphs are an intuitive way to model connections between data, and they have been used in problem solving since the 18th century. In modern applications graphs are used, for example, in social network services, e-commerce sites, and navigation systems. This thesis presents a graph-based approach for handling data and observing identities from network traffic.
  • Karikoski, Antti (Helsingin yliopisto, 2019)
    Data compression is one way to gain better performance from a database. Compression is typically achieved with a compression algorithm, an encoding, or both. Effective compression directly lowers the physical storage requirements, translating to reduced storage costs. Additionally, in the case of a data transfer bottleneck where the CPU is starved for data, compression can yield improved query performance through increased transfer bandwidth and better CPU utilization. However, obtaining better query performance is not trivial, since many factors affect the viability of compression. Compression has been found especially successful in column-oriented databases, where similar data is stored closely on the physical media. This thesis studies the effect of compression on the columnar storage format Apache Parquet through a micro benchmark based on the TPC-H benchmark. Compression is found to have positive effects on simple queries; however, with complex queries, where data scanning is a relatively small portion of the query, no performance gains were observed. Furthermore, this thesis examines the decoding performance of the encoding layer of a case database, Fastorm. The goal is to determine its efficiency among other encodings and whether it could be improved upon. Fastorm's encoding is compared against various encodings of Apache Parquet in a setting where the data comes from a real-world business. Fastorm's encoding is found to perform well enough, coupled with strong evidence to consider adding delta encoding to its repertoire of encoding techniques.
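The benefit of delta encoding suggested above can be illustrated with a small, self-contained sketch (plain `zlib` on packed integers, not Parquet or Fastorm code; the column data is synthetic):

```python
import struct
import zlib

# Illustrative sketch (not Parquet or Fastorm code): delta-encoding a
# sorted integer column before general-purpose compression, the technique
# the thesis suggests adding to Fastorm's repertoire. The data is synthetic.
values = list(range(1_000_000, 1_010_000))       # sorted, slowly growing column
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]

raw_bytes = struct.pack(f"{len(values)}q", *values)
delta_bytes = struct.pack(f"{len(deltas)}q", *deltas)

plain_size = len(zlib.compress(raw_bytes))
delta_size = len(zlib.compress(delta_bytes))
# the delta stream is almost entirely the value 1, so it compresses far better
```

Because the deltas of a sorted column are small and highly repetitive, the delta-encoded stream compresses to a fraction of the size of the raw stream.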
  • Lehvä, Jyri (Helsingin yliopisto, 2019)
    Consumer-Driven Contract testing is a way to test integrations between services. The main idea is that when an application or a service (consumer) consumes an API provided by another service (provider), a contract is formed between them. The contract contains information about how the consumer calls the provider and what the consumer needs from the responses. The contract can then be used to test both sides of the integration separately. The testing method is said to be useful when testing integration-heavy systems such as systems based on microservice architecture. Therefore, the research question of the thesis is: "with a focus on integrations, is Consumer-Driven Contract testing a viable addition to a testing strategy used to test a system based on microservice architecture, and if so, why?" The research question is first approached by reviewing the most recent literature. The goal is to learn about different testing methods and build a basic understanding of a general testing strategy for microservices; the next step is to figure out how Consumer-Driven Contract testing fits that picture. Consumer-Driven Contract testing is introduced thoroughly to gain a good understanding of its core concepts, advantages, disadvantages, and tooling. After the literature review, the research question is approached through a case study based on a microservice architecture. Its testing strategy is described in detail, and Consumer-Driven Contract tests are implemented for it. The testing methods are compared by systematically introducing defects into the integrations and seeing how the testing methods catch them. Finally, the results and experiences are shared and analyzed, and the research question is answered. The results based on the literature and the experiences from the case study showed that Consumer-Driven Contract testing is a viable way to test integrations. The tests implemented in the case study caught every defect in the integrations, and the case study was able to verify the advantages mentioned in the literature. It was shown that Consumer-Driven Contract tests could completely replace the more traditional integration tests. This results in a more deterministic testing strategy, as the integrations are tested in isolation. It should be emphasized that the teams have to be able to communicate with each other to implement the method and achieve its benefits: the level of communication between the teams has to be mature enough to share the contracts and to coordinate the implementation. Communication is the foundation that enables or disables the testing method. Because of that, improving the ways of communication should be a major focus for teams who want to implement Consumer-Driven Contract tests.
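The core idea of a consumer-driven contract can be sketched in a few framework-free lines. The contract format and helper function below are hypothetical illustrations, not the API of Pact or any other real contract-testing tool:

```python
# Framework-free sketch of a consumer-driven contract check. The contract
# format and helper below are hypothetical illustrations, not the API of
# Pact or any other real contract-testing tool.
contract = {
    "request": {"method": "GET", "path": "/users/42"},
    "response": {
        "status": 200,
        "body": {"id": int, "name": str},   # only the fields the consumer relies on
    },
}

def verify_provider(contract, provider_response):
    """Check a provider's actual response against the consumer's contract."""
    expected = contract["response"]
    if provider_response["status"] != expected["status"]:
        return False
    body = provider_response["body"]
    # the provider may return extra fields; the consumer only pins its own
    return all(isinstance(body.get(k), t) for k, t in expected["body"].items())

ok = verify_provider(contract, {"status": 200,
                                "body": {"id": 42, "name": "Ada", "extra": True}})
```

The key design property, mirrored above, is that the consumer pins only the fields it actually uses, so the provider can evolve its response freely as long as those fields remain intact.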
  • Kontio, Heikki (Helsingin yliopisto, 2019)
    The number of IoT sensors is steadily rising in the world. As they are used more and more in industry and households, the importance of proper lifecycle management (LCM) is increasing. Failure detection and security management are essential to be able to manage the large number of devices. In this thesis a number of platforms are evaluated on the basis of meeting the expectations of LCM. The evaluation was done via a gap analysis. The categories for the analysis were: tools for estimating the remaining useful lifetime (RUL) of sensors, API availability for LCM, failure detection, and security management. Based on the gap analysis, a list of recommendations is given in order to fill the gaps:
    - universal, platform-independent tools for estimating the remaining useful lifetime
    - updating APIs to REST, a widely used, scalable, and extensible architectural style
    - a platform-independent standard for sensors to report their health status
    - industry-standard failure detection methods available to all
  • Panchamukhi, Sandeep (Helsingin yliopisto, )
    Time series analysis has been a popular research topic in the last few decades. In this thesis, we develop time series models to investigate short time series of count data. We first begin with a Poisson autoregressive model and extend it to capture day effects explicitly. We then propose a hierarchical Poisson tensor factorization model as an alternative to the traditional count time series models. Furthermore, we suggest a context-based model as an improvement over the hierarchical Poisson tensor factorization model. We implement the models in Edward, an open-source probabilistic programming framework. This tool enables us to express the models in the form of executable program code and allows us to rapidly prototype models without the need to derive model-specific update rules. We also explore strategies for selecting the best model out of the alternatives. We study the proposed models on a dataset containing media consumption data. Our experimental findings demonstrate that the hierarchical Poisson tensor factorization model significantly outperforms the Poisson autoregressive models in predicting event counts. We also visualize the key results of our exploratory data analysis.
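As a sketch of the simplest of these models, the following simulates a Poisson autoregressive (INGARCH-type) count series in which the intensity depends on the previous count; the parameter values are illustrative, not those estimated in the thesis:

```python
import math
import random

random.seed(0)

def poisson(lam):
    """Draw one Poisson(lam) variate by Knuth's method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def simulate_poisson_ar(n, omega=1.0, alpha=0.5):
    """y_t ~ Poisson(omega + alpha * y_{t-1}); stationary mean omega/(1-alpha)."""
    counts, y = [], 0
    for _ in range(n):
        y = poisson(omega + alpha * y)
        counts.append(y)
    return counts

series = simulate_poisson_ar(20_000)
mean = sum(series) / len(series)   # should be close to 1.0 / (1 - 0.5) = 2.0
```

The recursion makes bursts self-reinforcing: a high count raises the next intensity, which is what lets such models capture the clustered counts typical of event data.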
  • Bigler, Paula (Helsingin yliopisto, 2019)
    The Viiankiaapa mire, located in the municipality of Sodankylä, has drawn public attention after the mining company AA Sakatti Mining Oy published its discovery in 2011. The discovered Ni-Cu-PGE ore deposit, Sakatti, is located mainly under the Natura 2000 protected Viiankiaapa. Viiankiaapa is protected under Natura 2000 due to its several natural habitat types and plant species; one of these, H. vernicosus, is known to thrive in areas of groundwater influence. The Sakatti deposit is in the exploration phase, but it is possible that mining will start in the future. Knowing the hydrogeology of the area is crucial for preventing possible negative changes if mining starts. The objectives of this study were to examine 1) the influence of groundwater at the western margin of Viiankiaapa, 2) the influence of the Sakatti ore deposit on the hydrogeochemistry of the area, and 3) the influence of hydrology and hydrogeochemistry on the endangered H. vernicosus species. The sampling was done in September and October 2016 and March and April 2017, and continued in summer 2017. Samples were collected from the surface water of the mire, groundwater, and spring water, as well as from different depths of peat pore water using mini-piezometers. EC, pH, temperature, stable isotopes, DSi, main ions, trace elements and dissolved organic carbon (DOC) were analyzed. The groundwater influence was visible in the area of Lake Viiankijärvi and the Särkikoskenmaa fluvial sediment deposit. Depth profiles of stable isotopes and main ions indicated groundwater flow in the deep peat layer and mixing with surface water as the groundwater flows upwards through the peat layer. In the Sakatti ore deposit area, the isotopic composition of the surface water samples mainly represented the season's precipitation, with a few exceptions. Possible groundwater discharge was visible in the area between the Sakatti main deposit and the River Kitinen, as well as near Pahanlaaksonmaa. The isotopic chemistry of the spring water samples at the bend of the River Kitinen showed values of mixed groundwater and surface water. It is likely that the mire water infiltrates through the peat layer and fluvial sediments and discharges into the springs and the River Kitinen. The bedrock of the area is known to be weathered, which could explain the surface-water-like isotope values in the springs and in some of the bedrock groundwater observation wells. A positive correlation was found between H. vernicosus ecosystems and the depth of peat: a ribbon-shaped zone of habitats with a 2–4 m thick peat layer crosses the mire. The correlation with groundwater discharge was not clear. Ca and Mg concentrations were lower, but pH and alkalinity were higher, in the areas of the H. vernicosus ecosystems. However, the Ca and Mg concentrations resembled the areal spring water chemistry, which could indicate groundwater influence. Areas without the ecosystems are located mainly near the Sakatti ore deposit. The influence of the deposit on hydrogeochemistry was locally visible as elevated electrical conductivity and elevated main ion and trace element concentrations in the surface water and peat pore water. This most likely explains why the areas without the ecosystems had higher element concentrations.
  • Ahlskog, Niki (Helsingin yliopisto, 2019)
    The purpose of a Progressive Web Application (PWA) is to blur, or even remove, the boundary between an application downloaded from an app store and a normal website. A PWA is like any normal website, but it additionally meets the following criteria: the application scales to any device; the application is served over an encrypted connection; and the application can be installed as a shortcut on a phone's home screen, in which case it opens without the familiar browser navigation tools and can also be opened without a network connection. This thesis reviews the techniques for building a PWA and defines when an application qualifies as a PWA. The speed of a PWA is measured with the caching features of its Service Worker enabled and disabled. The creation and deployment of a PWA are examined in an existing private client project, paying attention to the advantages and pain points that the PWA brings. To evaluate the result, the application's progressiveness and speed are measured with Google Chrome's Lighthouse tool. In addition, a test that computes load speed is run against the application several times using the Puppeteer library, and the usefulness of the PWA's Service Worker cache is examined in terms of performance and load time. To draw conclusions about the use of the Service Worker cache, the change in speed is examined with the progressive features enabled and disabled. The effects of a Service Worker on application speed are also examined through a Google case study. The test results show that using the Service Worker cache is faster in all cases. The Service Worker cache is faster than the browser's own cache. A Service Worker may also be stopped and in a waiting state in the user's browser; even then, activating the Service Worker and using its cache is faster than loading from the browser cache or directly from the network.
  • Rodriguez Villanueva, Cesar Adolfo (Helsingin yliopisto, 2019)
    Spam detection techniques have made our lives easier by unclogging our inboxes and keeping unsafe messages from being opened. With the automation of text messaging solutions and the increase in telecommunication companies and message providers, the volume of text messages has been on the rise. With this growth came malicious traffic which users had little control over. In this thesis, we present an implementation of a spam detection system in a real-world text messaging platform. Using well-established machine learning algorithms, we provide an in-depth analysis of the performance of the models using two different datasets: one publicly available (N=5,574) and the other gathered from actual traffic of the platform (N=1,477). Making use of the empirical results, we outline the models and hyperparameters which can be used in the platform and the scenarios in which they produce optimal performance. The results indicate that our dataset poses a great challenge for accurate classification, most likely due to its small size, class imbalance, and other nuances. Nevertheless, some models were found to have good all-around performance, and these can be trained and used in the platform.
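One of the well-established baselines in this space can be sketched in a few lines: a multinomial Naive Bayes classifier with Laplace smoothing. The tiny training set below is invented for illustration and is far smaller than either dataset used in the thesis:

```python
import math
from collections import Counter

# Minimal multinomial Naive Bayes spam classifier with Laplace smoothing --
# the kind of well-established baseline the thesis evaluates. The tiny
# training set below is invented for illustration.
train = [
    ("win a free prize now", "spam"),
    ("free cash claim now", "spam"),
    ("are we meeting tomorrow", "ham"),
    ("see you at lunch tomorrow", "ham"),
]

word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Return the most probable class under the fitted model."""
    def log_score(label):
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            # Laplace smoothing: unseen words get a pseudo-count of 1
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        return score
    return max(("spam", "ham"), key=log_score)
```

Laplace smoothing matters especially for small, unbalanced datasets like the platform's own: without it, a single unseen word would zero out the probability of a class.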
  • Martikainen, Jussi-Pekka (Helsingin yliopisto, 2019)
    Wood is the fuel for the forest industry. Fellable wood is collected from the forests and requires transportation to the mills, and the distance to the mills is quite often very long. The most common means of long-distance wood transportation in Finland is by road, using timber trucks. The poor condition of the lower road network increases transportation costs not only for the forest industry but for the whole natural resources industry. Timely information about the condition of the lower road network is considered beneficial for wood transportation and for road maintenance planning, reducing transportation-related costs. Acquiring such timely information is a laborious challenge for industry specialists due to the vast size of the road network in Finland. Until the recent development of ubiquitous mobile computing, collecting road measurement data and detecting certain road anomalies from the measurements traditionally required expensive, specialized equipment. Crowdsensing with the capabilities of a modern smartphone is seen as an inexpensive means, with high potential, to acquire timely information about the condition of the lower road network. In this thesis, a literature review is conducted to find out the deteriorative factors behind the condition of the lower road network in Finland. Initial assumptions are drawn about the detectability of such factors from the inertial sensor data of a smartphone. The literature on different computational methods for detecting road anomalies from the obtained accelerometer and gyroscope measurement data is reviewed. As a result, a summary of the usability of the reviewed computational methods for detecting the reviewed deteriorative factors is presented. Finally, suggestions are made for further analysis, for obtaining more training data for machine learning methods, and for predicting road conditions.
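The simplest family of computational methods reviewed in this line of work is thresholding of the vertical accelerometer signal (often called Z-THRESH in the pothole-detection literature). The sketch below is a naive variant with synthetic samples and an invented threshold:

```python
# A naive variant of the simplest reviewed detection approach: thresholding
# the vertical accelerometer signal (the Z-THRESH family). The samples and
# threshold below are synthetic illustrations.
def detect_anomalies(z_accel, threshold=2.5):
    """Indices where |vertical acceleration - 1 g| exceeds the threshold."""
    return [i for i, z in enumerate(z_accel) if abs(z - 1.0) > threshold]

samples = [1.0, 1.1, 0.9, 4.2, 1.0, -2.1, 1.0]   # in units of g; spikes = bumps
anomalies = detect_anomalies(samples)             # indices 3 and 5
```

Thresholding is cheap enough to run on a phone in real time, but as the reviewed literature notes, it is sensitive to speed and device orientation, which is why richer machine learning methods are suggested for follow-up work.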
  • Aula, Kasimir (Helsingin yliopisto, 2019)
    Air pollution is considered to be one of the biggest environmental risks to health, causing symptoms ranging from headaches to lung diseases, cardiovascular diseases, and cancer. To improve awareness of pollutants, air quality needs to be measured more densely. Low-cost air quality sensors offer one solution to increase the number of air quality monitors; however, they suffer from low measurement accuracy compared to professional-grade monitoring stations. This thesis applies machine learning techniques to calibrate the values of a low-cost air quality sensor against a reference monitoring station. The calibrated values are then compared to the reference station's values to compute the error after calibration. In the past, the evaluation phase has been carried out very lightly. A novel method of selecting data is presented in this thesis to ensure diverse conditions in the training and evaluation data, which yields a more realistic impression of the capabilities of a calibration model. To better understand the level of performance, selected calibration models were trained with data corresponding to different levels of air pollution and different meteorological conditions. Regarding pollution level, when using homogeneous training and evaluation data, the error of a calibration model was found to be as much as 85% lower than when using a diverse training and evaluation pollution environment. Also, using diverse meteorological training data instead of more homogeneous data was shown to reduce the error and to stabilize the behavior of the calibration models.
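The simplest possible calibration model is a linear map from raw sensor readings to reference values, fitted by least squares. The numbers below are synthetic, and the thesis compares far richer machine learning models, but the sketch shows what "calibrating against a reference station" means mechanically:

```python
# A minimal sketch of the simplest possible calibration model: fit
# y = a*x + b by least squares, mapping low-cost sensor readings x to
# reference-station values y. The numbers are synthetic; the thesis
# compares far richer machine learning models.
def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

sensor    = [10, 20, 30, 40, 50]     # raw low-cost sensor readings
reference = [14, 23, 33, 41, 52]     # co-located reference station values
a, b = fit_linear(sensor, reference)   # a ≈ 0.94, b ≈ 4.4
calibrated = [a * x + b for x in sensor]
```

The thesis's point about data selection applies even here: if the fitting data covers only one pollution regime, the fitted line extrapolates poorly to conditions outside it.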
  • Luhtakanta, Anna (Helsingin yliopisto, 2019)
    Finding and exploring relevant information from the huge amount of available information is crucial in today's world. The information need can be a specific and precise search, a broad exploratory search, or something in between. An entity-based search engine could therefore provide a solution for combining these two search goals. The focus of this study is to 1) review previous research articles on different approaches to entity-based information retrieval and 2) implement a system which tries to serve both precise information needs and exploratory information search, regardless of whether the search is made using a basic free-form query or a query with multiple entities. It is essential to improve search engines to support different types of information needs in the incessantly expanding information space.
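A toy sketch of an index that serves both query styles mentioned above, a free-form keyword query and a query with multiple entities; the documents and entity annotations are invented for illustration:

```python
# Toy sketch of an index serving both a free-form keyword query and a
# query with multiple entities, in the spirit of the entity-based engine
# described above. The documents and entity annotations are invented.
docs = {
    1: {"text": "helsinki transport network", "entities": {"Helsinki"}},
    2: {"text": "graph algorithms in transport", "entities": set()},
    3: {"text": "helsinki university research",
        "entities": {"Helsinki", "University of Helsinki"}},
}

# inverted index: term -> set of document ids containing it
index = {}
for doc_id, doc in docs.items():
    for term in doc["text"].split():
        index.setdefault(term, set()).add(doc_id)

def search(terms=(), entities=()):
    """AND semantics over free-text terms and required entities."""
    hits = set(docs)
    for t in terms:
        hits &= index.get(t, set())
    for e in entities:
        hits &= {i for i, d in docs.items() if e in d["entities"]}
    return sorted(hits)
```

Treating entities as first-class filters alongside text terms is what lets the same engine answer both a precise lookup and a broad exploratory query over everything annotated with a given entity.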
  • Suomalainen, Lauri (Helsingin yliopisto, 2019)
    Hybrid clouds are one of the most notable trends in the current cloud computing paradigm, and bare-metal cloud computing is also gaining traction. This has created a demand for hybrid cloud management and abstraction tools. In this thesis I identify shortcomings in Cloudify's ability to handle generic bare-metal nodes. Cloudify is an open-source, vendor-agnostic hybrid cloud tool which allows using generic consumer-grade computers as cloud computing resources. It is not, however, capable of automatically managing hosts joining and leaving the cluster network, nor does it retrieve any hardware data from the hosts, making cluster management arduous and manual. I have designed and implemented a system which automates cluster creation and management and retrieves useful hardware data from hosts. I also perform experiments using the system which validate its correctness, usefulness, and expandability.
  • Nietosvaara, Joonas (Helsingin yliopisto, 2019)
    We examine a previously known sublinear-time algorithm for approximating the length of a string's optimal (i.e. shortest) Lempel-Ziv parsing (a.k.a. LZ77 factorization). This length is a measure of compressibility under the LZ77 compression algorithm, so the algorithm also estimates a string's compressibility. The algorithm's approximation approach is based on a connection between optimal Lempel-Ziv parsing length and the number of distinct substrings of different lengths in a string. Some aspects of the algorithm are described more explicitly than in earlier work, including the constraints on its input and how to distinguish between strings with short vs. long optimal parsings in sublinear time; several proofs (and pseudocode listings) are also more detailed than in earlier work. An implementation of the algorithm is provided. We experimentally investigate the algorithm's practical usefulness for estimating the compressibility of large collections of data. The algorithm is run on real-world data under a wide range of approximation parameter settings, and the accuracy of the resulting estimates is evaluated. The estimates turn out to be consistently highly inaccurate, albeit always inside the stated probabilistic error bounds. We conclude that the algorithm is not promising as a practical tool for estimating compressibility. We also examine the empirical connection between optimal parsing length and the number of distinct substrings of different lengths. The latter turns out to be a surprisingly accurate predictor of the former within our test data, which suggests avenues for future work.
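For reference, the quantity being approximated, the number of phrases in the optimal self-referential LZ77 parsing, can be computed exactly with a quadratic brute-force sketch (for illustration only; the thesis concerns a sublinear-time approximation of this value):

```python
def lz_parse_length(s):
    """Number of phrases in the optimal self-referential LZ77 parsing:
    each phrase is the longest prefix of the remaining suffix that also
    occurs starting at an earlier position, or a single new character.
    Quadratic brute force, for illustration only."""
    i, phrases = 0, 0
    while i < len(s):
        length = 0
        # extend the phrase while s[i:i+length+1] occurs starting before i
        # (the occurrence may overlap position i, i.e. self-reference)
        while i + length < len(s) and s[i:i + length + 1] in s[:i + length]:
            length += 1
        i += length if length > 0 else 1
        phrases += 1
    return phrases
```

For example, `"aaaaaaaa"` parses into 2 phrases (`a`, then a self-referential run) while `"abab"` parses into 3 (`a`, `b`, `ab`), which is exactly the sense in which a shorter parsing means a more compressible string.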
  • Raitahila, Iivo (Helsingin yliopisto, 2019)
    The Internet of Things (IoT) consists of physical devices, such as temperature sensors and lights, that are connected to the Internet. The devices are typically battery powered and are constrained by their low processing power, memory and low bitrate wireless communication links. The vast amount of IoT devices can cause heavy congestion in the Internet if congestion is not properly addressed. The Constrained Application Protocol (CoAP) is an HTTP-like protocol for constrained devices built on top of UDP. CoAP includes a simple congestion control algorithm (DefaultCoAP). CoAP Simple Congestion Control/Advanced (CoCoA) is a more sophisticated alternative for DefaultCoAP. CoAP can also be run over TCP with TCP's congestion control mechanisms. The focus of this thesis is to study CoAP's congestion control. Shortcomings of DefaultCoAP and CoCoA are identified using empirical performance evaluations conducted in an emulated IoT environment. In a scenario with hundreds of clients and a large buffer in the bottleneck router, DefaultCoAP does not adapt to the long queuing delay. In a similar scenario where short-lived clients exchange only a small amount of messages, CoCoA clients are unable to sample a round-trip delay time. Both of these situations are severe enough to cause a congestion collapse, where most of the link bandwidth is wasted on unnecessary retransmissions. A new retransmission timeout and congestion control algorithm called Fast-Slow Retransmission Timeout (FASOR) is congestion safe in these two scenarios and is even able to outperform CoAP over TCP. FASOR with accurate round-trip delay samples is able to outperform basic FASOR in the challenging and realistic scenario with short-lived clients and an error-prone link.
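For context on the retransmission behavior discussed above, DefaultCoAP's schedule in RFC 7252 is a simple binary exponential backoff. The sketch below uses the spec's default ACK_TIMEOUT = 2 s and MAX_RETRANSMIT = 4 and ignores the random ACK_RANDOM_FACTOR jitter; FASOR itself is considerably more involved:

```python
# For context: DefaultCoAP's retransmission schedule in RFC 7252 is binary
# exponential backoff. This sketch uses the spec's default ACK_TIMEOUT = 2 s
# and MAX_RETRANSMIT = 4 and ignores the random ACK_RANDOM_FACTOR jitter;
# the FASOR algorithm studied in the thesis is considerably more involved.
def retransmission_timeouts(ack_timeout=2.0, max_retransmit=4):
    """Timeout before each transmission attempt, doubling on every retry."""
    return [ack_timeout * (2 ** i) for i in range(max_retransmit + 1)]

schedule = retransmission_timeouts()   # [2.0, 4.0, 8.0, 16.0, 32.0]
```

The fixed initial timeout is precisely what goes wrong in the bufferbloat scenario above: when queuing delay exceeds 2 s, every original transmission is retransmitted even though it was not lost, wasting bandwidth and feeding the congestion collapse.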
  • Huovinen, Ilmari (Helsingin yliopisto, 2019)
    Continuous development is a set of software development practices that enable making releases reliably and at a rapid pace. It comprises several practices, of which three widely known ones are continuous integration, continuous delivery, and continuous deployment. In continuous integration, changes are continuously integrated into a shared codebase; in continuous delivery, changes are continuously delivered to a production-like environment; and in continuous deployment, changes are continuously released. GameRefinery is a gaming-industry company that develops a SaaS service providing analysis and market data on mobile games. As the service has grown, challenges have been noticed in its development and maintenance. The purpose of this thesis is to find out which development- and release-related problems have been experienced with the GameRefinery SaaS and to design a process using continuous development practices to fix them. The development- and release-related problems of the GameRefinery SaaS were identified by examining previous versions of the service and determining which of their solutions hindered software development and release and which supported them. The differences between the development and release processes of the versions were then compared, and the discovered development- and release-related problems were itemized. Problems were found in version control practices, in the delivery and release process, and in recovery from failures. In addition, it was noticed that the architectural decision to move batch jobs out of the microservices into separate projects caused problems in making releases. The discovered problems were addressed by designing a continuous development process based on adopting the Jenkins automation server and implementing an automation pipeline that uses continuous development practices. The designed process clarified version and branch management practices, fixing problems in assembling release versions from different features and preventing unfinished code from reaching the release branch. The new process also automated the delivery process so that data model modification became part of the automation pipeline, removing a previously manual delivery step. In addition, potential failure scenarios in recovery, and ways to fix them, were presented. The new process did not, however, manage to fix the problems caused by moving the batch jobs, although including the batch jobs in the automation pipeline alleviated them.
  • Hantula, Otto (Helsingin yliopisto, 2019)
    The emergence of language grounded in perception has been studied in computational agent societies with language games. In this thesis, language games are used to investigate methods for grounding language in practicality, meaning that the emergence of the language is based on the needs of the agents. The needs of an agent arise from its goals and environment, which together dictate what the agents should communicate to each other. The methods for practicality grounding are implemented in a simulation where agents fetch items from shelves in a 2D grid warehouse. The agents learn a simple language consisting of words for spatial categories of xy-coordinates and for different types of places in the warehouse environment. The language is learned and used through two novel language games called the Place Game and the Query Game. In these games the agents use the spatial categories and place types to refer to different locations in the warehouse, exchanging important information that can be used to make better decisions. The empirical simulation results show that the agents can utilise their language to fetch items more efficiently. In other words, the emergent language is practical.
  • Akbas, Suleyman (Helsingin yliopisto, 2019)
    This thesis investigates the suitability of agile methodology for distributed software development. This is done by first identifying the challenges of distributed software development, which, according to the reviewed literature, are communication and collaboration, a decreased feeling of teamness, architectural and technical challenges, and decreased visibility of the project status. The thesis then presents the agile methodology with two of its methods, namely Scrum and Extreme Programming (XP). Thirdly, the benefits and challenges of applying the agile methodology in distributed software development are determined. Findings from the literature are tested against a case study conducted in a globally distributed software development team working on an important project at a multinational private software company. The data collection methods were participant observation by the author as part of the team, the author's notes on critical events, and semi-structured interviews with team members in different roles and teams. The empirical results show that agile methodology, more specifically Scrum and XP, helps with many aspects of distributed software development, including increased communication and collaboration, improved visibility of the project status, and an increased sense of trust within the team. It was also discovered that agile methodology helps with onboarding new people onto the team. Furthermore, according to the empirical evidence, limited documentation in agile methodology and virtual pair programming do not affect distributed teams negatively. Finally, the empirical data also shows that applying agile methodology in distributed software development has some challenges, such as the number of meetings. The empirical results resemble the reviewed literature in many respects, such as increased communication and collaboration as a benefit of distributed agile software development. However, there are also some aspects that contradict the reviewed literature. For example, limited documentation appears as a challenge of distributed agile development in the reviewed literature, whereas it did not seem to be a challenge in the empirical case. This study can be extended by observing other empirical cases, notably failed projects, not only in software development but also in other fields.
  • Ahmad, Ayesha (Helsingin yliopisto, 2019)
    Public transport networks are a subset of vehicular networks with some important distinctions: the actors in the network include buses and bus stops, they are predictable, and they provide reliable physical coverage of an area. The public transport network of a city can also be interpreted as an opportunistic network where nodes are bus stops and communication between these nodes occurs when a bus travels between two bus stops. How will a data communication network perform when built upon the opportunistic network formed by the public transport system of a city? In this thesis we explore this question, basing our analysis on the Helsinki region's public bus transport system as a real example. We explore the performance of a public transport network when used for data communication, using both simulation of the network and graph analysis. The key performance factors studied are the data delivery ratio and the data delivery time. Additional issues considered are the kinds of applications such a system is suited for, the important characteristics governing the reliability and efficacy of such a data communication system, and design guidelines for building such an application. The results demonstrate that data transfer applications can be built over a city's public transport network.
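The graph-analysis view described above can be sketched minimally: bus stops are nodes, an edge means some bus line travels between two stops, and the delivery time of a message is bounded below by the number of bus legs on the shortest path. The graph below is invented for illustration:

```python
from collections import deque

# Toy model of the setting above: bus stops are nodes and an edge means a
# bus line travels between two stops, so data can be carried hop by hop.
# The graph is invented for illustration.
routes = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def delivery_hops(src, dst):
    """Minimum number of bus legs needed to carry data from src to dst (BFS)."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        stop, hops = frontier.popleft()
        if stop == dst:
            return hops
        for nxt in routes.get(stop, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return None  # unreachable: data cannot be delivered
```

Real delivery time also depends on bus timetables and transfer waits, which is why the thesis complements this kind of graph analysis with simulation.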
  • Hansson, Kristian (Helsingin yliopisto, 2019)
    The purpose of edge computing is to move data processing closer to the source of the data, since the computing capacity of centralized servers will not suffice for analyzing all data simultaneously in the future. The Internet of Things is one of the use cases of edge computing. Edge computing systems are fairly complex and increasingly require the application of agile DevOps practices, and suitable technologies must be found to implement these practices. The first research question was set as: What kinds of technical solutions have been applied to delivering edge computing applications? This was answered by examining industry solutions, i.e. those of cloud service providers. The technical solutions revealed that either containers or packaged directories are used as the vehicle for delivering edge computing applications, and that lightweight communication protocols or a VPN connection are used for communication between the edge and the server. In the literature review, container clusters were found to be a possible management tool for edge computing. The second research question was derived from the results of the first: Can Docker Swarm be utilized in operating edge computing applications? The question was answered with an empirical case study. A centralized delivery process for edge computing applications was built using the Docker Swarm container cluster software, cloud servers, and Raspberry Pi single-board computers. In addition to delivery, attention was paid to runtime monitoring of the software, rollback to the previous software version, grouping of the cluster's devices, attaching physical peripherals, and the possibility of different processor architectures. The results showed that Docker Swarm can be used as-is for managing edge computing software. Docker Swarm is suitable for delivery, monitoring, rolling back to the previous version, and grouping. It can also be used to create clusters that run the same software on processors with different architectures. However, Docker Swarm proved unsuitable for controlling peripherals attached to an edge device. The abundance of industrial edge computing solutions indicated wide interest in the practical application of containers. Based on this study, container clusters in particular proved to be a promising technology for managing edge computing applications. To gain further evidence, broader empirical follow-up studies using a similar setup are needed.
  • Ilse, Tse (Helsingin yliopisto, 2019)
    Background: Electroencephalography (EEG) depicts electrical activity in the brain and can be used in clinical practice to monitor brain function. In neonatal care, physicians can use continuous bedside EEG monitoring to determine the cerebral recovery of newborns who have suffered birth asphyxia, which creates a need for frequent, accurate interpretation of the signals over the monitoring period. An automated grading system can aid physicians in the Neonatal Intensive Care Unit by automatically distinguishing between different grades of abnormality in the neonatal EEG background activity patterns. Methods: This thesis describes using a support vector machine as a base classifier to classify seven grades of EEG background pattern abnormality in data provided by the BAby Brain Activity (BABA) Center in Helsinki. We are particularly interested in reconciling the manual grading of EEG signals by independent graders, and we analyze the inter-rater variability of EEG graders by building a classifier using selected epochs graded in consensus and comparing it to a classifier using full-duration recordings. Results: The inter-rater agreement score between the two graders was κ=0.45, which indicates moderate agreement between the EEG grades. The most common grade of EEG abnormality was grade 0 (continuous), which made up 63% of the epochs graded in consensus. We first trained two baseline reference models using the full-duration recordings and the labels of the two graders, which achieved 71% and 57% accuracy. We achieved 82% overall accuracy in classifying selected patterns graded in consensus into seven grades using a multi-class classifier, though this model did not outperform the two baseline models when evaluated with the respective graders' labels. In addition, we achieved 67% accuracy in classifying all patterns from the full-duration recordings using a multilabel classifier.
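The inter-rater agreement statistic reported above, Cohen's kappa, is straightforward to compute. The grader labels below are invented to show the computation (the thesis reports κ = 0.45 for the two real EEG graders):

```python
# Cohen's kappa, the inter-rater agreement statistic reported above
# (kappa = 0.45 for the two real EEG graders). The grader labels below
# are invented to show the computation.
def cohens_kappa(a, b):
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                     # observed agreement
    labels = set(a) | set(b)
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)  # chance agreement
    return (po - pe) / (1 - pe)

grader1 = [0, 0, 1, 2, 0, 1, 0, 2]
grader2 = [0, 1, 1, 2, 0, 0, 0, 1]
kappa = cohens_kappa(grader1, grader2)   # 0.4 for these invented labels
```

Unlike raw percent agreement, kappa discounts the agreement expected by chance from the graders' label distributions, which is why it is the standard choice for comparing independent EEG graders.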
