Browsing by Subject "Master's Programme in Computer Science"


Now showing items 21-40 of 75
  • Bui, Minh (Helsingin yliopisto, 2021)
    Background. In API requests to a confidential data system, there are always rules that users must follow to retrieve the data they are permitted to access. These rules ensure the security of the system and limit possible violations. Objective. This thesis is about detecting violations of these rules in such systems. A request in which a violation is found is considered inconsistent and must be fixed before any data is retrieved. The thesis also searches for all diagnoses of inconsistent requests; these diagnoses support reconstructing the requests so that the inconsistencies are removed. Method. We follow the design science research methodology. The current problem of distributing data from a smart building serves as the main motivation. System design and development demonstrate the practicality of the proposed solutions, while a testing system is built to confirm their validity. Results. Inconsistency detection can be treated as a diagnosis problem, for which many algorithms have been developed over the decades. These algorithms build on directed acyclic graph (DAG) based techniques and have been adapted to different purposes. This thesis combines such algorithms with constraint programming techniques to resolve the issues facing the given confidential data system. Conclusions. A combination of constraint programming techniques and DAG-based diagnosis algorithms can be used to detect inconsistencies in API requests. Although the performance of the algorithms still needs improvement, the combination works effectively and resolves the research problem.
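    The diagnosis idea described in this abstract can be illustrated with a minimal hitting-set computation: each conflict is a set of request elements that cannot all be kept, and a diagnosis removes at least one element from every conflict. The brute-force Python sketch below is only an illustration of that idea; the field names are invented, and the thesis itself uses DAG-based diagnosis algorithms and constraint programming rather than this enumeration.

```python
from itertools import chain, combinations

def minimal_diagnoses(conflicts):
    """Enumerate minimal hitting sets of the given conflict sets.

    Each conflict is a set of request elements that cannot all be kept;
    a diagnosis must remove at least one element from every conflict."""
    universe = sorted(set(chain.from_iterable(conflicts)))
    hits = []
    # enumerate candidates by increasing size so minimality is easy to check
    for r in range(1, len(universe) + 1):
        for cand in combinations(universe, r):
            cand_set = set(cand)
            if all(cand_set & c for c in conflicts):       # hits every conflict
                if not any(h <= cand_set for h in hits):   # no smaller hitting subset
                    hits.append(cand_set)
    return hits

# Two conflicts among hypothetical request fields; every diagnosis must touch both.
conflicts = [{"scope", "role"}, {"scope", "time_range"}]
print(minimal_diagnoses(conflicts))  # the two minimal diagnoses
```

    This exponential enumeration is only workable for tiny inputs; the hitting-set DAG algorithms referenced in the abstract exist precisely to avoid it.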
  • Hertweck, Corinna (Helsingin yliopisto, 2020)
    In this work, we seek robust methods for designing affirmative action policies for university admissions. Specifically, we study university admissions under a real centralized system that uses grades and standardized test scores to match applicants to university programs. For the purposes of affirmative action, we consider policies that assign bonus points to applicants from underrepresented groups, with the goal of preventing large gaps in admission rates across groups while ensuring that the admitted students are for the most part those with the highest scores. Since such policies have to be announced before the start of the application period, there is uncertainty about which students will apply to which programs. This poses a difficult challenge for policy-makers. Hence, we introduce a strategy to design policies for the upcoming round of applications that can address either a single demographic group or multiple groups. Our strategy is based on application data from previous years and a predictive model trained on this data. By comparing this predictive strategy to simpler strategies based only on application data from, e.g., the previous year, we show that the predictive strategy is generally more conservative in its policy suggestions. As a result, policies suggested by the predictive strategy lead to more robust effects and fewer cases where the gap in admission rates is inadvertently increased through the suggested policy intervention. Our findings imply that universities can employ predictive methods to increase the reliability of the effects expected from the implementation of an affirmative action policy.
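    The bonus-point mechanism itself can be sketched in a few lines. The names, scores, group labels and the bonus value below are invented for illustration; the thesis is about choosing the bonus robustly with predictive models, not about fixing it by hand as done here.

```python
def admit(applicants, capacity, bonus, group):
    """Rank applicants by score plus a bonus for members of `group`
    and admit the top `capacity`. `applicants` maps name -> (score, group)."""
    adjusted = {
        name: score + (bonus if g == group else 0.0)
        for name, (score, g) in applicants.items()
    }
    ranked = sorted(adjusted, key=lambda n: adjusted[n], reverse=True)
    return ranked[:capacity]

applicants = {
    "a": (92.0, "majority"),
    "b": (90.0, "underrepresented"),
    "c": (91.0, "majority"),
    "d": (89.5, "underrepresented"),
}
# Without a bonus the two highest raw scores are admitted;
# a 2-point bonus for the underrepresented group changes the outcome.
print(admit(applicants, 2, 0.0, "underrepresented"))  # ['a', 'c']
print(admit(applicants, 2, 2.0, "underrepresented"))  # ['a', 'b']
```

    The hard part, which this sketch omits, is that the bonus must be committed to before the applicant pool is known, which is why the thesis turns to prediction.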
  • Ikkala, Tapio (Helsingin yliopisto, 2020)
    This thesis presents a scalable method for identifying anomalous periods of non-activity in short periodic event sequences. The method is tested with real-world point-of-sale (POS) data from a grocery retail setting; however, it can also be applied to other problem domains that produce similar sequential data. The proposed method models the underlying event sequence as a non-homogeneous Poisson process with a piecewise constant rate function. The rate function can be estimated with a change point detection algorithm that minimises a cost function consisting of the negative Poisson log-likelihood and a penalty term linear in the number of change points. The resulting model can be queried for anomalously long periods with no events, i.e., waiting times, by defining a threshold below which the waiting time observations are deemed anomalies. The first experimental part of the thesis focuses on model selection, i.e., finding a penalty value with which the change point detection algorithm detects the true changes in the intensity of event arrivals while not reacting to random fluctuations in the data. In the second experimental part the performance of the anomaly detection methodology is measured against stock-out data, which gives an approximate ground truth for the termination of a POS event sequence. The performance of the anomaly detector is found to be subpar in terms of precision and recall, i.e., the positive predictive value and the true positive rate. The number of false positives remains high even with small threshold values, which needs to be taken into account when considering applying the anomaly detection procedure in practice. Nevertheless, the methodology may have practical value in the retail setting, e.g., in guiding store personnel where to focus their resources in ensuring the availability of products.
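    The final flagging step of the method can be sketched as follows, assuming the (piecewise constant) rate for a segment has already been estimated by the change point detection described above. The event times, rate and threshold below are invented for illustration.

```python
import math

def anomalous_waits(event_times, rate, p_threshold=0.01):
    """Flag waiting times between consecutive events that are improbably
    long under a Poisson model with the given constant rate: for rate
    lambda, P(W > w) = exp(-lambda * w); flag when that tail probability
    falls below p_threshold."""
    flags = []
    for prev, cur in zip(event_times, event_times[1:]):
        w = cur - prev
        tail = math.exp(-rate * w)
        if tail < p_threshold:
            flags.append((prev, cur, w))
    return flags

# Events arriving roughly once per time unit (rate 1.0), with one long gap.
times = [0.0, 1.2, 2.0, 3.5, 13.5, 14.1]
print(anomalous_waits(times, rate=1.0))  # flags the gap from 3.5 to 13.5
```

    In the thesis the rate differs per segment, so in practice each waiting time would be judged against the rate of the segment it falls in.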
  • Sarapalo, Joonas (Helsingin yliopisto, 2020)
    The page hit counter system processes, counts and stores page hit counts gathered from page hit events on a news media company’s websites and mobile applications. The system serves a public application programming interface that can be queried over the internet for page hit count information. In this thesis I describe the process of replacing a legacy page hit counter system with a modern implementation in the Amazon Web Services ecosystem, utilizing serverless technologies. This includes the background, the project requirements, the design and comparison of different options, the implementation details and the results. Finally, I show that the new system, implemented with Amazon Kinesis, AWS Lambda and Amazon DynamoDB, has running costs that are less than half of those of the old one.
  • Duong, Quoc Quan (Helsingin yliopisto, 2021)
    Discourse dynamics is one of the important fields in digital humanities research. Over time, the perspectives and concerns of society on particular topics or events may change. As the popularity of a certain theme changes, different patterns form, increasing or decreasing the prominence of the theme in the news. Tracking these changes is a challenging task: in a large text collection discourse themes are intertwined and uncategorized, which makes them hard to analyse manually. The thesis tackles the novel task of automatically extracting discourse trends from large text corpora. The main motivation for this work lies in the need of digital humanities to track discourse dynamics in diachronic corpora. Machine learning is a potential method for automating this task by learning patterns from the data. However, in many real use cases ground truth is not available, and annotating discourses at the corpus level is extremely difficult and time-consuming. This study proposes a novel procedure for generating synthetic datasets for this task, a quantitative evaluation method and a set of benchmarking models. Large-scale experiments are run using these synthetic datasets. The thesis demonstrates that a neural network model trained on such datasets can obtain meaningful results when applied to a real dataset, without any adjustments to the model.
  • Talonpoika, Ville (Helsingin yliopisto, 2020)
    In recent years, virtual reality devices have entered the mainstream with many gaming-oriented consumer devices. However, the locomotion methods utilized in virtual reality games have yet to gain a standardized form, and different types of games have different requirements for locomotion to optimize player experience. In this thesis, we compare some popular and some uncommon locomotion methods in different game scenarios. We consider their strengths and weaknesses in these scenarios from a game design perspective, and we make suggestions on which kinds of locomotion methods would be optimal for different game types. We conducted an experiment with ten participants, seven locomotion methods and five virtual environments to gauge how the locomotion methods compare against each other in game scenarios requiring timing and precision. Our experiment, while small in scope, produced results we could use to construct useful guidelines for selecting locomotion methods for a virtual reality game. We found that the arm swinger was a favourite for situations where precision and timing were required. Touchpad locomotion was also considered one of the best for its intuitiveness and ease of use. Teleportation is a safe choice for games not requiring a strong feeling of presence.
  • Walder, Daniel (Helsingin yliopisto, 2021)
    Cloud vendors have many data centers around the world and, in each of them, offer the possibility to rent computational capacity at prices that depend on the power and time needed. Most vendors, for instance Amazon Web Services, offer flexible pricing where prices can change hourly. According to those vendors, price changes depend largely on the current workload: the higher the workload, the higher the price. Specifically, this thesis is about the spot services on offer. To get the most out of this flexible pricing, we build a framework named ELMIT, which stands for Elastic Migration Tool. ELMIT’s job is to forecast prices and, when beneficial, migrate workloads to cheaper data centers. In the end, we monitored seven spot instances with ELMIT’s help. For three instances no migration was needed, because no other data center was ever cheaper. For the other four instances, ELMIT performed 38 automatic migrations within around 42 days, saving around $160. Three of the four instances reduced costs by 14.35%, 4.73% and 39.6%; the fourth performed unnecessary migrations and, due to slight inaccuracies in the predictions, ultimately cost more money, in total around 50 cents. Overall, the outcome of ELMIT’s monitoring is promising and gives reason to keep developing and improving ELMIT to increase the savings further.
  • Karis, Peter (Helsingin yliopisto, 2020)
    This thesis presents a user study evaluating the usability and effectiveness of a novel search interface compared to a more traditional solution. InnovationMap is a novel search interface by Khalil Klouche, Tuukka Ruotsalo and Giulio Jacucci (University of Helsinki). It is a tool for aiding the user in exploratory search: a type of search activity where the user is exploring an information space unknown to them and thus cannot form the specific search phrase needed for a traditional lookup search in a conventional search interface. In this user study InnovationMap is compared against TUHAT, the search portal currently in use at the University of Helsinki for finding research works and research personnel in the university databases. The user evaluation is conducted as a qualitative within-subject study using volunteer users from the University of Helsinki. Each participant uses both systems in alternating order over the course of two sessions. During the two sessions each volunteer carries out the information-finding tasks defined in the experiment design, answers a System Usability Scale (SUS) questionnaire and participates in a semi-structured interview. The answers to the assigned tasks are then evaluated and scored by field experts. The combined results from these methods are used to formulate an educated assessment of the usability, effectiveness and future development potential of the InnovationMap search system.
  • Aula, Kasimir (Helsingin yliopisto, 2019)
    Air pollution is considered to be one of the biggest environmental risks to health, causing symptoms ranging from headache to lung diseases, cardiovascular diseases and cancer. To improve awareness of pollutants, air quality needs to be measured more densely. Low-cost air quality sensors offer one solution for increasing the number of air quality monitors; however, they suffer from low measurement accuracy compared to professional-grade monitoring stations. This thesis applies machine learning techniques to calibrate the values of a low-cost air quality sensor against a reference monitoring station. The calibrated values are then compared to the reference station’s values to compute the error after calibration. In the past, the evaluation phase has been carried out very lightly. A novel method of selecting data is presented in this thesis to ensure diverse conditions in the training and evaluation data, yielding a more realistic impression of the capabilities of a calibration model. To better understand the level of performance, selected calibration models were trained with data corresponding to different levels of air pollution and meteorological conditions. Regarding pollution level, when using homogeneous training and evaluation data, the error of a calibration model was found to be as much as 85% lower than when using a diverse training and evaluation pollution environment. Also, using diverse meteorological training data instead of more homogeneous data was shown to reduce the error and stabilize the behavior of the calibration models.
  • Joswig, Niclas (Helsingin yliopisto, 2021)
    Simultaneous Localization and Mapping (SLAM) research is gaining a lot of traction as the available computational power and the demand for autonomous vehicles increase. A SLAM system solves the problem of localizing itself during movement (visual odometry) and, at the same time, creating a 3D map of its surroundings. Both tasks can be solved with expensive and bulky hardware such as LiDARs and IMUs, but the subarea of visual SLAM research aims at replacing those costly sensors with, ultimately, inexpensive monocular cameras. In this work I applied the current state of the art in end-to-end deep-learning-based SLAM to a novel dataset comprising images recorded from cameras mounted on an indoor crane from the Konecranes CXT family. One aspect that is unique about our proposed dataset is the camera angle, which resembles a classical bird’s-eye view towards the ground. This orientation change, together with a novel scene structure, has a large impact on the subtask of mapping the environment, which in this work is done through monocular depth prediction. Furthermore, I assess which properties of the given industrial environments have the biggest impact on the system’s performance, in order to identify possible future research opportunities for improvement. The main performance impairments examined, characteristic of most types of industrial premises, are non-Lambertian surfaces, occlusion, and texture-sparse areas along the ground and walls.
  • Kangas, Vilma (Helsingin yliopisto, 2020)
    Software testing is an important process for ensuring a program's quality. However, testing has not traditionally been a substantial part of computer science education. Some attempts to integrate it into the curriculum have been made, but best practices remain an open question. This thesis discusses multiple attempts at teaching software testing over the years. It also introduces CrowdSorcerer, a system for gathering programming assignments with tests from students, which has been used in introductory programming courses at the University of Helsinki. To study whether students benefit from creating assignments with CrowdSorcerer, we analysed the number of assignments and tests they created and whether these correlate with their performance on a testing-related question in the course exam. We also gathered feedback from the students on their experiences of using CrowdSorcerer. Looking at the results, it seems that more research on how to teach testing would be beneficial, as would improving CrowdSorcerer.
  • Heinonen, Ava (Helsingin yliopisto, 2020)
    The design of instructional material affects learning from it. Abstraction, that is, limiting details and presenting difficult concepts by linking them with familiar objects, can reduce the burden on working memory and make learning easier. The presence of visualizations, and the level to which students can interact with and modify them (also referred to as engagement), can promote information processing. This thesis presents the results of a study using a 2x3 experimental design with abstraction level (high abstraction, low abstraction) and engagement level (no viewing, viewing, presenting) as the factors. The study consisted of two experiments with different topics: hash tables and multidimensional arrays. We analyzed the effect of these factors on instructional efficiency and learning gain, accounting for prior knowledge and prior cognitive load. We observed that the high-abstraction conditions limited cognitive load during studying for all participants, but were particularly beneficial for participants with some prior knowledge of the topic they studied. We also observed that higher engagement levels benefit participants with no prior knowledge of the topic, but not necessarily participants with some prior knowledge. Low cognitive load in the pre-test phase makes studying easier regardless of the instructional material, as does prior knowledge of the topic being studied. Our results indicate that the abstractions in, and engagement with, learning materials need to be designed with the students and their knowledge levels in mind. However, further research is needed to assess which components of the different abstraction levels affect learning outcomes, and why and how cognitive load in the pre-test phase affects cognitive load throughout studying and testing.
  • Jylhä-Ollila, Pekka (Helsingin yliopisto, 2020)
    K-mer counting is the process of building a histogram of all substrings of length k in an input string S. The problem itself is quite simple, but counting k-mers efficiently for a very large input string is a difficult task that has been researched extensively. In recent years the performance of k-mer counting algorithms has improved significantly, and there have been efforts to use graphics processing units (GPUs) for k-mer counting. The goal of this thesis was to design, implement and benchmark a GPU-accelerated k-mer counting algorithm, SNCGPU. The results show that SNCGPU compares reasonably well to the Gerbil k-mer counting algorithm on a mid-range desktop computer, but does not utilize the resources of a high-end computing platform as efficiently. The implementation of SNCGPU is available as open-source software.
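    For reference, the problem statement itself fits in a few lines of Python. This naive sequential baseline is the computation that GPU counters such as the one benchmarked in the thesis accelerate; it is not the thesis's algorithm.

```python
from collections import Counter

def count_kmers(s, k):
    """Histogram of all length-k substrings of s.

    A simple CPU baseline: slide a window of width k over s and count
    each substring with a hash-based counter."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

counts = count_kmers("ACGTACGA", 3)
print(counts["ACG"])  # 2
```

    Real k-mer counters avoid materialising each substring and distribute the counting across threads or GPU blocks, but the output they produce is exactly this histogram.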
  • Seppänen, Jukka-Pekka (Helsingin yliopisto, 2021)
    The clinical accomplishment records of the dentistry degree programme at the University of Helsinki are currently tracked with assorted Excel spreadsheets and paper forms. These records are part of a student's progress towards working life, and after completing the required records students are granted the right to practise as dentists. The problems with the current system are the difficulty of tracking a student's degree progress and, from the student's perspective, the realisation of their legal protection: the public visibility of the Excel spreadsheets among students enables misuse in which one student alters another student's records. This thesis investigates architectural solutions with which the record tracking could be digitalised in the future. As its outcome, the thesis recommends a database and an application architecture model for the system. Since the number of users is very limited and use of the system is occasional, the system does not need to be particularly scalable. For the student's legal protection it is essential that every record a student completes is stored in the database and that the state of the database remains stable throughout the system's life cycle. It is therefore advisable to choose a relational database such as PostgreSQL, which in addition to the relational model supports the flexible structures familiar from document databases. As the architecture model it is advisable to use either a monolithic model, in which the system is implemented on top of a single interface, or alternatively microservices, in which the system is divided into three separate microservices.
  • Kallio, Jarmo (Helsingin yliopisto, 2021)
    Despite the benefits and importance of ERP systems, they suffer from many usability problems. They have user interfaces that are complex and suffer from "daunting usability problems". Their implementation success rate is also relatively low, and usability significantly influences implementation success. As a company offering an ERP system to ferry operators was planning to renew the user interface of this system, we investigated the usability of the current system so that the findings could guide the future implementation of the new user interface. We studied new and long-time users in sessions where the users described their experiences, performed tasks with the system and filled in a usability questionnaire (the System Usability Scale). Many novice and long-time users reported problems. The questionnaire scores show that all but two participants perceived the usability of the system as below average and, on the adjective rating scale, "not acceptable"; two users rated the usability as "excellent". We reasoned that there could be a group of users who use the system in such a way, and in such a context, that they do not experience these problems. The results indicate that novices have trouble, for example, navigating and completing tasks; some long-time users also reported navigation issues. The system seems to require its users to remember many things in order to use it well. The interviews and tasks indicate that the system is complex and hard to use and that both novices and experts face problems, which is supported by the perceived usability scores. While experts could in most cases finish all tasks, in the interviews some of them reported problems such as finding the products customers needed, unclear error reporting, tedious configuration and the need for a lot of manual typing. We gave recommendations on what to consider when implementing the new user interface for this ERP system: for example, navigation should be improved and users should be provided with powerful search tools. ERP usability has not been studied much. Our study supports the use of existing heuristics in classifying usability problems. Our recommendations on how to improve the usability of the studied ERP system should give some guidelines on what could be done, although not much of this is backed by laboratory studies. More work is needed in this field to find and test solutions to the usability problems users face.
  • Kemppainen, Esa (Helsingin yliopisto, 2020)
    NP-hard optimization problems can be found in various real-world settings such as scheduling, planning and data analysis. Coming up with algorithms that can efficiently solve these problems can save various resources. Instead of developing problem-domain-specific algorithms, we can encode a problem instance as an instance of maximum satisfiability (MaxSAT), an optimization extension of Boolean satisfiability (SAT), and solve the resulting instances with MaxSAT-specific algorithms. This way we can handle various problem domains by focusing on algorithms for MaxSAT. Computing an optimal solution and proving its optimality can be time-consuming in real-world settings, where finding an optimum is often not feasible; instead, we are only interested in finding a good-quality solution fast. Incomplete solvers trade guaranteed optimality for better scalability. In this thesis, we study an incomplete approach to solving MaxSAT based on linear programming (LP) relaxation and rounding. LP relaxation and rounding has been used to obtain approximation algorithms for various NP-hard optimization problems, so we are interested in its effectiveness on MaxSAT. We describe multiple rounding heuristics that are empirically evaluated on random, crafted and industrial MaxSAT instances from the yearly MaxSAT Evaluations. We compare the rounding approaches against each other and against the state-of-the-art incomplete solvers SATLike and Loandra. The LP-relaxation-based rounding approaches are in general not competitive against either SATLike or Loandra. However, for some problem domains our approach manages to be competitive with both.
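    The simplest rounding idea, deterministic threshold rounding of a fractional LP solution, can be sketched as follows. The fractional values and clauses below are made up and stand in for an actual LP solution; the thesis evaluates several rounding heuristics, not just this one.

```python
def round_and_score(frac, clauses, threshold=0.5):
    """Threshold-round fractional LP values to a Boolean assignment and
    count the satisfied (soft) clauses. Clauses use DIMACS-style literals:
    a positive int v means variable v, a negative int means its negation."""
    assign = {v: x >= threshold for v, x in frac.items()}

    def sat(clause):
        return any(assign[abs(l)] if l > 0 else not assign[abs(l)]
                   for l in clause)

    return assign, sum(sat(c) for c in clauses)

# Hypothetical fractional solution over 3 variables and 4 soft clauses.
frac = {1: 0.9, 2: 0.4, 3: 0.5}
clauses = [[1, 2], [-1, 3], [2, -3], [-2]]
assign, score = round_and_score(frac, clauses)
print(assign, score)  # 3 of the 4 clauses satisfied
```

    An actual solver would first obtain `frac` by solving the LP relaxation of the MaxSAT instance; randomized rounding (setting each variable true with probability equal to its fractional value) is the other classical variant.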
  • Lumme, Iina (Helsingin yliopisto, 2021)
    Indoor localization in smart factories encounters difficult conditions due to metallic environments. Nevertheless, it is one of the enablers of the ongoing industrial revolution, Industry 4.0. This study investigates the usability of indoor localization at a real factory site by tracking a hoist assembly process. To test the hypothesis that indoor localization works in a factory environment, an Ultra-Wideband indoor positioning system was installed to cover the hoist assembly space. The system followed hoist assembly trolleys for three weeks, after which the data was analysed by calculating assembly times. The results show that indoor localization with Ultra-Wideband technology is a working solution for industrial environments similar to the tested one. The calculated times are more accurate than the known standard times and reveal that hoist assemblies are not standard and that time is wasted. The results suggest that indoor localization is adaptable to industrial environments and manufacturing processes; analysing the processes through position data provides new knowledge that can be used to improve productivity.
  • Rinta-Homi, Mikko (Helsingin yliopisto, 2020)
    Heating, ventilation, and air conditioning (HVAC) systems consume massive amounts of energy. Fortunately, by carefully controlling these systems, significant energy savings can be achieved. This requires detecting the presence or number of people inside the building. Countless sensors can be used for this purpose, the most common being air quality sensors, passive infrared sensors, wireless devices and cameras; this thesis provides a comprehensive review and comparison of them. Low-resolution infrared cameras for counting people are then investigated further, specifically how different camera features influence counting accuracy. These features are resolution, frame rate and viewing angle. Two systems were designed: a versatile counting algorithm, and a testing system that modifies these camera features and tests the performance of the counting algorithm. The results show that infrared cameras with a resolution as low as 4x2 are as accurate as higher-resolution cameras, and that a frame rate above 5 frames per second does not bring any significant advantage in accuracy. A resolution of 2x2 is also sufficient for counting but requires higher frame rates. Viewing angles need to be carefully adjusted for the best accuracy. In conclusion, this study shows that even the most primitive infrared cameras can be used for accurate counting. This puts infrared cameras in a new light, since primitive cameras can be cheaper to manufacture; infrared cameras for occupancy counting therefore become significantly more feasible and have potential for widespread adoption.
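    As a rough illustration of what counting from such low-resolution frames involves, the sketch below counts connected warm regions in a tiny thermal frame. It is a generic blob counter with invented temperatures and threshold, not the thesis's counting algorithm.

```python
def count_blobs(frame, threshold):
    """Count connected warm regions (4-connectivity) in a low-resolution
    thermal frame given as a list of rows of temperatures."""
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(r, c):
        # iterative flood fill marking one warm region as seen
        stack = [(r, c)]
        while stack:
            y, x = stack.pop()
            if 0 <= y < rows and 0 <= x < cols and not seen[y][x] \
                    and frame[y][x] >= threshold:
                seen[y][x] = True
                stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]

    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if frame[r][c] >= threshold and not seen[r][c]:
                blobs += 1
                flood(r, c)
    return blobs

# A 2x4 frame (temperatures in degrees C) with two warm regions.
frame = [[31, 22, 21, 30],
         [30, 21, 20, 31]]
print(count_blobs(frame, threshold=28))  # 2
```

    At the 4x2 and 2x2 resolutions studied in the thesis, a single person can occupy only a pixel or two, which is why frame rate and viewing angle end up mattering as much as resolution.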
  • Baumgartner, Axel (Helsingin yliopisto, 2021)
    In spring 2020 the coronavirus epidemic forced a large part of the population to work remotely. In the ICT sector remote work is not unusual and has been studied extensively, especially from the perspective of globally distributed software development teams. What is characteristic of the forced remote work caused by the coronavirus epidemic is the rapid and unexpected transition from on-site to remote work, which this thesis examines in more detail. The thesis focuses on development teams that use agile software development methods. Research on agile software development is used as background and compared with the results of a case study. The case study investigates the phenomena that arose in the target team during the transition to remote work. The goal is to identify phenomena that deviate from the background literature and to derive topics for further research from them. The case study shows that the tools in use are suitable for both on-site and remote work. The transition went without major problems and work has continued without interruption. The problems centre on communication and its decrease. The importance and definition of the routines of agile methods are also emphasized in remote work. The use of a virtual whiteboard and a continuous voice channel stand out as possible topics for further research.