Browsing by Subject "big data"

Now showing items 1-20 of 26
  • Errib, Abu (Helsingin yliopisto, 2021)
    Big Data is considered an essential asset for online business models and crucial for their services. These business models depend on processing and monetizing data; big data is therefore said to be the driving force of their market behavior. The emergence of big data for online platform businesses may give rise to a specific type of abuse under Article 102 TFEU: for instance, incumbents may prevent competitors from accessing valuable data. In this regard, this thesis analyzes the concept of refusal to supply, which in certain circumstances constitutes an abuse of dominance under Article 102 TFEU. The purpose is to analyze the applicability of refusal to supply in big data situations. The research question of this thesis is: can an online platform's refusal to provide access to data constitute an abuse of a dominant position under Article 102 TFEU? The analysis leads to the conclusion that, under certain conditions, a dominant company can be forced to provide access to its datasets if the relevant requirements are met. The thesis focuses mainly on the indispensability requirement in relation to big data.
  • Müürisepp, Kerli; Järv, Olle; Tammaru, Tiit; Toivonen, Tuuli (2022)
    The activity space approach is increasingly mobilized in spatial segregation research to broaden its scope from residential neighborhoods to other socio-spatial contexts of people. Activity space segregation research is an emerging field, characterized by quick adaptation of novel data sources and interdisciplinary methodologies. In this article, we present a methodological review of activity space segregation research by identifying the approaches, methods and data sources applied. First, our review highlights that the activity space approach enables segregation to be studied from the perspectives of people, places and mobility flows. Second, the results reveal that both traditional data sources and novel big data sources are valuable for studying activity space segregation. While traditional sources provide rich background information on people for examining the social dimension of segregation, big data sources bring opportunities to address temporality and to increase the spatial extent and resolution of analysis. Hence, big data sources have an important role in mediating the conceptual change from a residential neighborhood-based to an activity space-based approach to segregation. Still, scholars should carefully address the challenges and uncertainties that big data entail for segregation studies. Finally, we propose a framework for a three-step methodological workflow for activity space segregation analysis, and outline future research avenues to move toward greater conceptual clarity, an integrated analysis framework and methodological rigor.
  • Kibble, Milla; Khan, Suleiman A.; Ammad-ud-din, Muhammad; Bollepalli, Sailalitha; Palviainen, Teemu; Kaprio, Jaakko; Pietiläinen, Kirsi H.; Ollikainen, Miina (2020)
    We combined clinical, cytokine, genomic, methylation and dietary data from 43 young adult monozygotic twin pairs (aged 22-36 years, 53% female), where 25 of the twin pairs were substantially weight discordant (delta body mass index > 3 kg/m²). These measurements were originally taken as part of the TwinFat study, a substudy of The Finnish Twin Cohort study. These five large multivariate datasets (comprising 42, 71, 1587, 1605 and 63 variables, respectively) were jointly analysed using an integrative machine learning method called group factor analysis (GFA) to generate new hypotheses about the multi-molecular-level interactions associated with the development of obesity. New potential links between cytokines and weight gain are identified, as well as associations between dietary, inflammatory and epigenetic factors. This encouraging case study aims to enthuse the research community to boldly attempt new machine learning approaches, which have the potential to yield novel and unintuitive hypotheses. The source code of the GFA method is publicly available as the R package GFA.
  • Shin, Bokyong; Rask, Mikko (2021)
    Online deliberation research has recently developed automated indicators to assess the deliberative quality of large volumes of user-generated online data. While most previous studies have developed indicators based on content analysis and network analysis, time-series data and associated methods have been studied less thoroughly. This article contributes to the literature by proposing indicators based on a combination of network analysis and time-series analysis, arguing that they will help monitor how online deliberation evolves. Based on Habermasian deliberative criteria, we develop six throughput indicators and demonstrate their application in the OmaStadi participatory budgeting project in Helsinki, Finland. The results show that these indicators yield intuitive figures and visualizations that can facilitate collective intelligence about ongoing processes and help solve problems promptly.
  • Amadae, S. M. (University of Helsinki, Faculty of Social Sciences, 2020)
    Computational Transformation of the Public Sphere is the organic product of what turned out to be an effective collaboration between MA students and their professor in the Global Politics and Communication program in the Faculty of Social Sciences at the University of Helsinki, in the Fall of 2019. The course, Philosophy of Politics and Communication, is a gateway course into this MA program. As I had been eager to conduct research on the impact of new digital technologies and artificial intelligence (AI) on democratic governance, I saw this course as an opportunity to not only share, but also further develop my knowledge of this topic.
  • Pääkkönen, Juho; Laaksonen, Salla-Maaria; Jauho, Mikko (2020)
    Social media analytics is a burgeoning new field associated with high promises of societal relevance and business value but also methodological and practical problems. In this article, we build on the sociology of expectations literature and research on expertise in the interaction between humans and machines to examine how analysts and clients make their expectations about social media analytics credible in the face of recognized problems. To investigate how this happens in different contexts, we draw on thematic interviews with 10 social media analytics and client companies. In our material, social media analytics appears as a field facing both hopes and skepticism—toward data, analysis methods, or the users of analytics—from both the clients and analysts. In this setting, the idea of automated analysis through algorithmic methods emerges as a central notion that lends credibility to expectations about social media analytics. Automation is thought to, first, extend and make expert interpretation of messy social media data more rigorous; second, eliminate subjective judgments from measurement on social media; and third, allow for coordination of knowledge management inside organizations. Thus, ideas of automation importantly work to uphold the expectations of the value of analytics. Simultaneously, they shape what kinds of expertise, tools, and practices come to be involved in the future of analytics as knowledge production.
  • Tupasela, Aaro (2021)
    The sharing, circulation, distribution, and use of human tissue samples and related data have become a major political and scientific pre-occupation during the past two decades. In the age of big data, the political, scientific, and economic momentum around the need to increasingly collect and collate massive amounts of data has intensified. At the same time, the control and sharing of samples and data have become increasingly strategic in positioning biobanks within the global biomedical research market. Numerous commentators have identified several reasons why and with whom biobanks choose to share. Despite intensified efforts to encourage sharing within networks, there are still actors who have not embraced the values of sharing. The term 'data hugging' is introduced as a form of data work through which value is generated but sharing as a practice is not exercised according to community expectations. Data hugging is a term used within the biobanking community to describe the practice of withholding samples or data from other network members. While some biobankers consider data hugging to be an impediment to efficient and responsible science, it can also be another way of generating value in an otherwise challenging value creation environment. European biobanking policies, as well as the biobanking community, need a better understanding of these value-generating practices in relation to the life cycle of the biobank.
  • Paavolainen, Maija (Helsingin yliopisto, 2014)
    Verkkari ; 2014 (4)
  • Järv, Olle; Tenkanen, Henrikki Toivo Olavi; Salonen, Maria Pauliina; Ahas, Rein; Toivonen, Tuuli Kaarina (2018)
    The concept of accessibility (the potential of opportunities for interaction) binds together the key physical components of urban structure: people, transport, and social activity locations. Most often these components are dynamic in nature, and hence the accessibility landscape changes in space and time based on people's mobilities and the temporality of the transport network and activity locations (e.g. services). Person-based accessibility approaches have been successful in incorporating time and space in analyses and models. The more broadly applied location-based accessibility modelling approaches, on the other hand, have often been static/atemporal in nature. Here, we present a conceptual framework of dynamic location-based accessibility modelling that captures the dynamic temporality of all three accessibility components. Furthermore, we empirically test the proposed framework using novel data sources and tools. We demonstrate the impact of temporal aspects in accessibility modelling with two examples, investigating food accessibility and its spatial equity. Our case study demonstrates how conventional static location-based accessibility models tend to overestimate people's access to potential opportunities. The proposed framework is universally applicable beyond the urban context, from local to global scale, on different temporal scales, and for multimodal transport systems. It also bridges the gap between location-based and person-based accessibility research.
  • Alopaeus, Pilvi (Helsingin yliopisto, 2020)
    Its effects extend throughout society. Digitalization shows up, for example, as more efficient healthcare, and it brings more, and more equal, opportunities for education. The invention of artificial intelligence, and machine learning in particular, has been as great an upheaval for digitalization as digitalization was for society. Both businesses and legislators have recognized this significance. With the development of AI, data has truly become the new oil. If data is the new oil of modern society, data protection is its climate change. Seen from a certain angle, it can be regarded as a threat to oil, but it can also offer opportunities to put data-driven business practices on a more sustainable footing. The protection of personal data can be seen as a factor that restricts business, or it can be made a business's best competitive advantage. This thesis examines the roots of the balance of interests underlying the objectives of European Union data protection regulation, and its effect on the problems of data protection regulation relating to AI, and machine learning in particular, from the perspective of the "right to explanation". The study first examines the history of data protection legislation from the 1970s onward from the perspective of balancing the two strong interests that have characterized and dominated it: growing the digital single market and protecting fundamental rights. Whereas in the 1970s the emphasis was clearly on promoting economic interests, it has since shifted to the other end, with the aim of making strong protection of fundamental rights a competitive advantage with which the EU can compete, in particular with the United States and China. The thesis then turns to the first steps of the European Union's AI strategy and the effects of the same balancing of interests on it. Values familiar from data protection again come to the fore: an emphasis on technological transparency and trust as a guarantee of fundamental rights.
Meanwhile, the Union's strategy becomes to create a global standard for ethical AI. This development has been influenced by changes in the surrounding world and by the force with which technological development has driven society. Technological development is powerful, turbulent, and unpredictable in nature, which puts the legislator in a race in which it is always a few steps behind. Examining this balancing is important because it has resulted in an attempt to create "flexible" legislation using the Union's legislative instruments. This attempt has had consequences, which are clearly visible in data protection regulation concerning machine learning and in its legal certainty. The central problem is the uncertainty over the existence of the right to explanation, a key issue for machine learning innovation, especially as data protection legislation also carries potentially significant sanctions. Legal certainty is also central to achieving the Union's economic interests. It thus appears that the legislation resulting from the Union's balancing of interests has led to a situation that could potentially prevent the Union from achieving its objectives.
  • Di Minin, Enrico; Fink, Christoph; Hausmann, Anna; Kremer, Jens; Kulkarni, Ritwik (2021)
    Social media data are being increasingly used in conservation science to study human–nature interactions. User-generated content, such as images, video, text, and audio, and the associated metadata can be used to assess such interactions. A number of social media platforms provide free access to user-generated social media content. However, as with any research involving people, scientific investigations based on social media data require compliance with the highest standards of data privacy and data protection, even when data are publicly available. Should social media data be misused, the risks to individual users’ privacy and well-being can be substantial. We investigated the legal basis for using social media data while ensuring data subjects’ rights through a case study based on the European Union’s General Data Protection Regulation. The risks associated with using social media data in research include accidental and purposeful misidentification that has the potential to cause psychological or physical harm to an identified person. To collect, store, protect, share, and manage social media data in a way that prevents potential risks to the users involved, one should minimize data, anonymize data, and follow strict data management procedures. Risk-based approaches, such as a data privacy impact assessment, can be used to identify and minimize privacy risks to social media users, to demonstrate accountability and to comply with data protection legislation. We encourage conservation scientists to consider our recommendations carefully when devising their research objectives so as to facilitate responsible use of social media data in conservation science research, for example, in conservation culturomics and investigations of illegal wildlife trade online.
  • Järvenpää, Timo (Helsingfors universitet, 2017)
    Housing makes up a significant share of Finns' wealth. This thesis presents an information search model of the housing market, which is used to build intuition about how the search activity of prospective buyers affects three housing market variables: house prices, the number of housing transactions, and time on market. This intuition is applied by examining whether the Google Trends search index helps forecast these variables in the Finnish housing market. Previous research has found that Google searches help forecast a variety of economic phenomena. According to the theoretical model, an increase in search activity increases the number of transactions and shortens time on market, but its effect on prices is ambiguous. The empirical part of the thesis uses Granger causality to examine whether Google searches contain information useful for forecasting the housing market variables. For each variable, a simple autoregressive benchmark model fitted to the full available history is also constructed, together with a version augmented with the Google index. The benchmark models and the Google-augmented models are compared using the adjusted coefficient of determination and the Akaike and Schwarz information criteria. The forecasting ability of Google searches is evaluated by splitting the data into an estimation period and a forecast period and by simulating real-time forecasting. The thesis analyzes seven different forecast specifications that include Google searches. Google searches are found to Granger-cause prices and time on market. In the autoregressive models fitted to the full history, the coefficients of the Google search terms do not consistently have the signs predicted by the theoretical model: in both the time-on-market and transaction-count models, the Google terms take both negative and positive values.
Google searches are found to improve nowcasts of prices, measured by mean absolute error, in all but one specification, but according to the Diebold-Mariano test the differences in forecast errors are mostly not statistically significantly different from zero. In nowcasts of transaction counts, Google searches produce substantially larger forecast errors than the benchmark models in several specifications. When forecasting one month ahead, however, internet searches appear to reduce forecast errors for both transaction counts and prices. With a panel data specification, both price and transaction-count forecasts are more accurate when internet searches are used. The results suggest that the usefulness of Google searches for forecasting the housing market is sensitive to model specification, and that Google searches cannot consistently improve forecasts for all variables.
  • Karhu, Teemu (Helsingin yliopisto, 2020)
    Finland is regarded primarily as a nature tourism destination. However, the importance of nature as an attraction varies between studies as well as between nationalities and individuals. Attraction-specific tourism demand has been studied through interview studies, among other methods, but traditional research methods have not made it possible to determine how the demand for and supply of attractions meet spatially. New research methods based on big data enable an entirely new kind of research. Log data generated by mobile device use form a data source on the basis of which mobile device users can be tracked in both time and space. Mobile devices are a potential data source for tourism research, especially for revealing tourists' routes and preferences. Tourism experiences create pleasure and a sense of satisfaction for people. In tourism, experience is seen as a producer of value. According to the theory of value co-creation, the value of a good is the value-in-use the customer derives from it. Value creation is affected by the customer's motivation, which in tourism corresponds to a person's individual needs and is reflected in interest in the destination. Choosing a destination based on one's own interests contributes to value creation. How do tourists' actual routes and attraction factors meet? Can route choices show that people travel according to their own interests? The study analyzes the routes used by foreign tourists in Finland in relation to tourism attraction factors. The classification of attraction factors is based on the Finnish tourism regional structure study (Suomen matkailun aluerakennetutkimus). Visit Finland's tourist segmentation reveals tourists' interests. The tourists' routes are based on mobile network data from DNA Oyj. The analysis shows that tourists' routes match the most nature-attractive destinations poorly, which is mainly due to the city-centered nature of tourism.
The match between routes and other attraction categories is better than for nature attractions. The results raise the question of whether tourism marketing succeeds in communicating and targeting its message correctly, and whether the message is understood correctly. A weak match between personal wishes and the tourism that actually took place indicates weak value creation and, consequently, a low likelihood of recommending Finland as a destination or traveling to Finland again.
  • Luoma-aho, Vilma (ProCom - Viestinnän ammattilaiset ry, 2015)
    ProComma Academic ; 2015
    The research articles in ProComma Academic 2015 cover topics such as the transparency of paid media content, Big Data and media relations, innovativeness and openness, investor relations, influencer communication and the Guggenheim case, and the weaponization of information and the struggle over identity.
  • Passos, Ives C.; Ballester, Pedro L.; Barros, Rodrigo C.; Librenza-Garcia, Diego; Mwangi, Benson; Birmaher, Boris; Brietzke, Elisa; Hajek, Tomas; Lopez Jaramillo, Carlos; Mansur, Rodrigo B.; Alda, Martin; Haarman, Bartholomeus C. M.; Isometsa, Erkki; Lam, Raymond W.; McIntyre, Roger S.; Minuzzi, Luciano; Kessing, Lars V.; Yatham, Lakshmi N.; Duffy, Anne; Kapczinski, Flavio (2019)
    Objectives: The International Society for Bipolar Disorders Big Data Task Force assembled leading researchers with extensive experience in bipolar disorder (BD), machine learning, and big data to evaluate the rationale for machine learning and big data analytics strategies in BD. Method: A task force was convened to examine and integrate findings from the scientific literature on machine learning and big data based studies, to clarify terminology, and to describe challenges and potential applications in the field of BD. We also systematically searched PubMed, Embase, and Web of Science for articles published up to January 2019 that used machine learning in BD. Results: The results suggested that big data analytics has the potential to provide risk calculators to aid in treatment decisions and predict clinical prognosis, including suicidality, for individual patients. This approach can advance diagnosis by enabling the discovery of more relevant data-driven phenotypes, as well as by predicting transition to the disorder in high-risk unaffected subjects. We also discuss the most frequent challenges that big data analytics applications can face, such as heterogeneity, lack of external validation and replication in some studies, cost and non-stationary distribution of the data, and lack of appropriate funding. Conclusion: Machine learning-based studies, including atheoretical data-driven big data approaches, provide an opportunity to more accurately detect those who are at risk, parse relevant phenotypes, and inform treatment selection and prognosis. However, several methodological challenges need to be addressed in order to translate research findings to clinical settings.
  • Juholin, Elisa; Luoma-aho, Vilma (ProCom - Viestinnän ammattilaiset ry, 2017)
    ProComma Academic ; 2017
  • Massinen, Samuli (Helsingin yliopisto, 2019)
    The Greater Region of Luxembourg is the largest cross-border labor market in the European Union, with the greatest number of cross-border workers. European integration, the Schengen Area, and socio-economic divergences have been the main factors facilitating human cross-border movements in the area and thus the birth and expansion of the borderland community. Despite the freedom of movement, country borders have not been erased and socio-economic divergences have not been levelled. In addition, the spatial extent of the daily movements is not well known. It is therefore important to study cross-border dynamics and to separate daily movements from infrequent mobility patterns. Thus far, cross-border mobility studies have mainly relied on national registers and census data. These datasets have mostly been too scarce to capture the complexities of cross-border mobility. Many studies have focused only on aggregate-level movement patterns, and the viewpoint of individuals has been missing. Hence, there has been a growing need for individual-level data in cross-border mobility research. In this study, a person-based approach is employed using geotagged Twitter Big Data to study spatio-temporal cross-border mobility patterns in the Greater Region of Luxembourg. The aim is to examine how social media can be used in cross-border research, and how daily cross-border movers can be separated from infrequent border crossers, thereby moving beyond aggregate-level inspections. As this is one of the first studies of its kind, a heuristic programmatic approach is used. To the writer’s knowledge, social media data sources have not previously been applied to distinguish different cross-border mobility types. All scripts developed in this study are openly available on Digital Geography Lab’s GitHub pages ( to promote open science and to introduce new quantitative method tools for cross-border mobility research.
The results show that social media can be used in cross-border mobility research, and that social media Big Data can provide a relatively good proxy for the daily cross-border mobility of people at a regional level. Aggregate-level cross-border mobility patterns and activity location densities correspond closely with previous studies, and the temporal variation inspections indicate a valid identification of cross-border mover types: Twitter users classified as daily cross-border movers seem to be more mobile on weekdays, whereas infrequent border crossers are more mobile on weekends. Daily cross-border mobility patterns also provided new information about the spatial extent of the movements. In addition, the heuristic approach resulted in high accuracy in home detection; the “unique weeks” algorithm introduced in this study achieved an accuracy of 88.6 % with respect to the ground truth. Although the results are promising at a regional level, they should be considered in relation to population densities and Twitter use activity, both of which vary spatio-temporally and can thus cause bias. Further studies and method development are also needed to draw global conclusions about cross-border mobility; other geographical areas and study settings could yield different outcomes. In addition, some of the data and method choices should be viewed critically owing to the scarcity of valid references. Nevertheless, this study has identified that the coverage of geotagged Twitter data depends on data acquisition processes, and that Twitter can provide valuable information for cross-border mobility research. For future studies, multi-level data acquisition processes are recommended, together with a person-based approach combining spatio-temporal and content analysis methodologies.
  • Laaksonen, Iivari (Helsingin yliopisto, 2022)
    Multi-local living is a complex social phenomenon that is tightly connected to human mobility. In previous research, the phenomenon has been mainly researched with official statistics that fail to capture the dynamic nature of people’s mobilities and dwelling. This thesis approaches multi-locality in Finland and in the county South Savo from the perspective of second homes with novel data sources like mobile phone data and electricity consumption data. These spatially and temporally accurate big data sources can be used to ensure sufficient coverage of population and geographic area. I approach multi-local living by analyzing the spatiotemporal changes in people’s presence with mobile phone data, and by examining how the changes relate to second homes in different areas separately for workdays and weekends. This is examined both for the whole country and by comparing different counties. In the thesis, mobile phone data is utilized as the ground truth to assess the performance of household occupancy detection methods for electricity consumption, and to examine how electricity consumption data captures the spatiotemporal dynamics of second home users in South Savo. The results indicate that people are generally more mobile during the summer, and the seasonal growth in people’s presence correlates strongly with second homes. This shows a prominent seasonal effect for multi-local living in Finland. Additionally, it is shown that the results vary spatially as there is variation in the results both between counties and within South Savo. The best performing second home occupancy detection method is revealed by correlation analyses between mobile phone data and electricity consumption data. Moreover, it is shown that electricity data correlates better with mobile phone data during the summer, and that the data captures the monthly dynamics of second home users well. This further highlights the seasonal effect of multi-local living. 
The thesis provides valuable insight into how the seasonal variation of population in different areas is connected to multi-local living in Finland. Furthermore, it is shown that novel data sources can capture the changes in people’s presence at multiple spatial levels with high temporal accuracy, and that they can be utilized to study multi-local living.
  • Concas, Francesco; Xu, Pengfei; Hoque, Mohammad Ashraful; Lu, Jiaheng; Tarkoma, Sasu (2020)
    A Bloom Filter is a space-efficient probabilistic data structure for checking whether an element is a member of a set. Given multiple sets, a standard Bloom Filter is not sufficient when looking for the sets to which an element or a set of input elements belongs. An example is searching for documents containing given keywords in a large text corpus: this is essentially a multiple set matching problem in which the input is one or more keywords and the result is a set of candidate documents. This article solves the multiple set matching problem by proposing two efficient Bloom Multifilters, called Bloom Matrix and Bloom Vector, which generalize the standard Bloom Filter. Both structures are space-efficient and answer queries with a set of identifiers for multiple set matching problems. Their space efficiency can be optimized according to the distribution of labels among the sets (uniform or Zipf); Bloom Vector exploits a Zipf distribution of the data for further space reduction. Both structures are much more space-efficient than the state-of-the-art structure, Bloofi. The results also show that a Lookup operation on Bloom Matrix is significantly faster than on Bloom Vector and Bloofi.
  • Massy, Ziad A.; Caskey, Fergus J.; Finne, Patrik; Harambat, Jerome; Jager, Kitty J.; Nagler, Evi; Stengel, Benedicte; Sever, Mehmet Sukru; Vanholder, Raymond; Blankestijn, Peter J.; Bruchfeld, Annette; Capasso, Giovambattista; Fliser, Danilo; Fouque, Denis; Goumenos, Dimitrios; Soler, Maria Jose; Rychlik, Ivan; Spasovski, Goce; Stevens, Kathryn; Wanner, Christoph; Zoccali, Carmine (2019)
    The strengths and the limitations of research activities currently present in Europe are explored in order to outline how to proceed in the near future. Epidemiological and clinical research and public policy in Europe are generally considered to be comprehensive and successful, and the European Renal Association - European Dialysis and Transplant Association (ERA-EDTA) is playing a key role in the field of nephrology research. The Nephrology and Public Policy Committee (NPPC) aims to improve the current situation and its translation into public policy by planning eight research topics to be supported by ERA-EDTA over the coming 5 years.
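The Concas et al. (2020) abstract above builds on the standard Bloom Filter. As a minimal sketch of that baseline, the Python below implements a textbook Bloom filter (sized with the usual formulas for bit count m and hash count k, using double hashing over SHA-256) and the naive approach to multiple set matching that keeps one filter per set. It is an illustration under those assumptions only; it is not the authors' Bloom Matrix or Bloom Vector, and the names `BloomFilter`, `build_per_set_filters`, and `match` are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """A standard Bloom filter: a bit array of size m probed by k hash functions."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        # Textbook sizing: m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.
        self.m = max(1, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_per_set_filters(sets, fp_rate=0.01):
    """Naive multiple set matching baseline (hypothetical helper): one filter per labelled set."""
    filters = {}
    for label, items in sets.items():
        bf = BloomFilter(max(1, len(items)), fp_rate)
        for item in items:
            bf.add(item)
        filters[label] = bf
    return filters


def match(filters, keywords):
    """Labels of sets that probably contain every keyword (false positives possible)."""
    return {label for label, bf in filters.items()
            if all(bf.might_contain(word) for word in keywords)}
```

A query here scans every per-set filter, so lookup time grows linearly with the number of sets and each set carries its own bit array; that is the kind of overhead shared structures such as Bloofi, Bloom Matrix, and Bloom Vector aim to reduce.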
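The Järvenpää (2017) thesis compares the forecast errors of autoregressive benchmark models and Google-augmented models with the Diebold-Mariano test. Below is a minimal sketch of that test for one-step-ahead forecasts. It assumes absolute-error loss, chosen here to mirror the mean absolute error used in the abstract (the thesis's actual loss function is not stated there), and a horizon of one, so the long-run variance reduces to the sample variance of the loss differential. Small-sample corrections are omitted, and the p-value uses the standard normal approximation.

```python
import math
from statistics import mean, pvariance


def diebold_mariano(errors_a, errors_b):
    """One-step-ahead Diebold-Mariano test under absolute-error loss.

    Returns (statistic, two_sided_p_value). A negative statistic means
    model A has the smaller average loss.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length error series with at least 2 points")
    # Loss differential d_t = |e_a,t| - |e_b,t| (absolute error, as in MAE).
    d = [abs(a) - abs(b) for a, b in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = mean(d)
    # With a one-step horizon, no autocovariance terms are added to the variance.
    var_d = pvariance(d, mu=d_bar)
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    stat = d_bar / math.sqrt(var_d / n)
    # Two-sided p-value: 2 * (1 - Phi(|stat|)), with Phi written via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(stat) / math.sqrt(2))))
    return stat, p_value
```

In use, `errors_a` would hold the out-of-sample errors of the Google-augmented model and `errors_b` those of the benchmark over the same forecast period; a small p-value would indicate that the difference in accuracy is unlikely to be due to chance.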
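Järv et al. (2018) argue that static location-based models overestimate access because they ignore when transport and services are actually available. The sketch below contrasts a static cumulative-opportunities count with a time-dependent one, in which travel times vary by departure hour and a service only counts if it is open on arrival. The `Service` schema, hour-level resolution, and threshold logic are simplifying assumptions for illustration, not the authors' framework.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Service:
    name: str
    opens: int   # opening hour, 0-23 (hypothetical schema)
    closes: int  # closing hour, exclusive, 1-24


def static_access(travel_min: Dict[str, int], services: List[Service], threshold: int) -> int:
    """Static measure: one travel-time table, opening hours ignored."""
    return sum(1 for s in services if travel_min[s.name] <= threshold)


def dynamic_access(travel_min_by_hour: Dict[int, Dict[str, int]],
                   services: List[Service], threshold: int, hour: int) -> int:
    """Time-dependent measure: travel time depends on the departure hour,
    and the service must be open at the (hour-resolution) arrival time."""
    count = 0
    for s in services:
        t = travel_min_by_hour[hour][s.name]
        arrival_hour = (hour * 60 + t) // 60 % 24
        if t <= threshold and s.opens <= arrival_hour < s.closes:
            count += 1
    return count


services = [Service("grocery", 8, 21), Service("kiosk", 0, 24)]
travel_static = {"grocery": 10, "kiosk": 25}
travel_by_hour = {7: {"grocery": 12, "kiosk": 30}, 22: {"grocery": 20, "kiosk": 40}}
```

With these hypothetical inputs and a 30-minute threshold, the static measure counts both services, while the dynamic measure finds only the kiosk reachable at 07:00 and nothing at 22:00, which illustrates the overestimation described in the abstract.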
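Di Minin et al. (2021) recommend minimizing and anonymizing social media data. One common building block is sketched below under stated assumptions: drop fields the study does not need, replace user IDs with a keyed HMAC-SHA256 hash, and coarsen coordinates. The field names (`user_id`, `text`, `lat`, `lon`, `created_at`) are hypothetical. Keyed hashing is pseudonymization rather than anonymization: whoever holds the key can re-link records, so under the GDPR such data generally remain personal data.

```python
import hashlib
import hmac
import secrets

# Fields a hypothetical study actually needs; everything else is dropped.
KEEP_FIELDS = ("created_at", "text", "lat", "lon")


def pseudonymize_user(user_id: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256): IDs cannot be recomputed without the secret key."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize_post(post: dict, key: bytes, coord_decimals: int = 2) -> dict:
    """Keep only required fields, replace the user ID, and coarsen coordinates."""
    out = {f: post[f] for f in KEEP_FIELDS if f in post}
    out["user"] = pseudonymize_user(post["user_id"], key)
    for f in ("lat", "lon"):
        if f in out:
            # Two decimals of latitude is roughly 1 km precision.
            out[f] = round(out[f], coord_decimals)
    return out


# Generate once and store separately from the data; losing it breaks re-linking.
key = secrets.token_bytes(32)
```

Coarsening coordinates reduces, but does not remove, the re-identification risk the abstract describes; a data privacy impact assessment would still be needed to judge whether the remaining fields are adequate for a given study.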
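Massinen (2019) reports an 88.6 % home-detection accuracy for a "unique weeks" algorithm, but the abstract does not define it. The sketch below is one plausible reading, offered purely as an assumption: for each user, choose the grid cell in which they posted during the largest number of distinct ISO weeks, breaking ties by post count, then score the result against ground-truth homes. The tuple layout and cell identifiers are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def detect_home_unique_weeks(posts: Iterable[Tuple[str, datetime, str]]) -> Dict[str, str]:
    """Hypothetical interpretation of "unique weeks" home detection.

    posts: (user, timestamp, cell) tuples. Returns {user: cell}, picking the
    cell seen in the most distinct ISO weeks (ties broken by post count).
    """
    weeks = defaultdict(lambda: defaultdict(set))   # user -> cell -> {(iso_year, iso_week)}
    counts = defaultdict(lambda: defaultdict(int))  # user -> cell -> number of posts
    for user, ts, cell in posts:
        iso = ts.isocalendar()
        weeks[user][cell].add((iso[0], iso[1]))
        counts[user][cell] += 1
    return {
        user: max(cells, key=lambda c: (len(cells[c]), counts[user][c]))
        for user, cells in weeks.items()
    }


def accuracy(predicted: Dict[str, str], ground_truth: Dict[str, str]) -> float:
    """Share of ground-truth users whose detected home matches the reference."""
    users = [u for u in ground_truth if u in predicted]
    if not users:
        return 0.0
    return sum(predicted[u] == ground_truth[u] for u in users) / len(users)
```

Counting distinct weeks rather than raw posts keeps a single burst of activity (for example, many posts during one trip) from outweighing a location visited regularly, which is the intuition such a rule would capture.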
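Laaksonen (2022) selects a second-home occupancy detection method by correlating electricity-based occupancy with mobile phone data. A minimal sketch of the two pieces involved follows: a Pearson correlation between aligned series (for example, monthly values) and a hypothetical threshold rule that flags a day as occupied when consumption exceeds a multiple of an unoccupied baseline. The threshold rule and its `factor` parameter are illustrative assumptions, not one of the methods evaluated in the thesis.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


def occupied_days(daily_kwh: Sequence[float], baseline_kwh: float,
                  factor: float = 1.5) -> List[bool]:
    """Hypothetical threshold detector: a day counts as occupied when
    consumption exceeds `factor` times the unoccupied baseline."""
    return [kwh > factor * baseline_kwh for kwh in daily_kwh]
```

Aggregating `occupied_days` flags per month and correlating the result with monthly mobile phone presence would give one coefficient per candidate detector, so detectors can be ranked by how closely they track the phone data. On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient; the explicit version keeps the sketch runnable on older interpreters.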