Browsing by Subject "deep learning"


Now showing items 1-20 of 42
  • Trizna, Dmitrijs (Helsingin yliopisto, 2022)
    The detection heuristic in contemporary machine learning Windows malware classifiers is typically based on the static properties of the sample. In contrast, the simultaneous use of static and behavioral telemetry remains largely unexplored. We propose a hybrid model that employs dynamic malware analysis techniques, contextual information such as the executable's filesystem path on the system, and the static representations used in modern state-of-the-art detectors. It does not require an operating system virtualization platform; instead, it relies on kernel emulation for dynamic analysis. Our model yields an enhanced detection heuristic and identifies malicious samples even when none of the separate models express high confidence in categorizing the file as malevolent. For instance, at a $0.05\%$ false positive rate, the individual static, dynamic, and contextual models achieve detection rates of $18.04\%$, $37.20\%$, and $15.66\%$, respectively. However, we show that composite processing of all three achieves a detection rate of $96.54\%$, above the cumulative performance of the individual components. Moreover, the simultaneous use of distinct malware analysis techniques addresses the weaknesses of the individual units, minimizing false positives and increasing adversarial robustness. Our experiments show a decrease in contemporary adversarial attack evasion rates from $26.06\%$ to $0.35\%$ when behavioral and contextual representations of the sample are employed in the detection heuristic.
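The score-level fusion idea in the abstract above — combining separate static, dynamic, and contextual detectors into one decision — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy scores, the logistic-regression meta-classifier, and all parameter values are assumptions.

```python
import numpy as np

def train_meta(scores, labels, lr=0.5, epochs=500):
    """Fit a tiny logistic-regression meta-classifier over
    per-module scores (static, dynamic, contextual)."""
    X = np.asarray(scores, dtype=float)          # (n_samples, 3)
    y = np.asarray(labels, dtype=float)          # 0 = benign, 1 = malicious
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient of log-loss
        b -= lr * np.mean(p - y)
    return w, b

def fuse(scores, w, b):
    """Combined maliciousness probability for new samples."""
    X = np.asarray(scores, dtype=float)
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Toy data: malicious samples score moderately high on all three modules,
# benign ones are high on at most one -- fusion separates them.
X = np.array([[.6, .7, .5], [.5, .8, .6], [.9, .1, .1], [.1, .2, .9], [.1, .1, .2]])
y = np.array([1, 1, 0, 0, 0])
w, b = train_meta(X, y)
p = fuse(X, w, b)
```

The point of the sketch is that a sample scoring moderately on all three modules can end up above a sample scoring very high on only one, which mirrors the abstract's claim that fusion catches files no single module is confident about.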
  • Niemi, Hannele (2021)
    This special issue raises two thematic questions: (1) How will AI change learning in the future, and what role will human beings play in the interaction with machine learning? and (2) What can we learn from the articles in this special issue for future research? These questions are reflected in the frame of the recent discussion of human and machine learning. AI for learning provides many applications and multimodal channels for supporting people in cognitive and non-cognitive task domains. The articles in this special issue provide evidence that agency, engagement, self-efficacy, and collaboration are needed in learning and working with intelligent tools and environments. The importance of social elements is also clear in the articles. The articles also point out that the teacher's role in digital pedagogy primarily involves facilitating and coaching. AI in learning has high potential, but it also has many limitations. Many worries are linked with ethical issues, such as biases in algorithms, privacy, transparency, and data ownership. This special issue also highlights the concepts of explainability and explicability in the context of human learning. We need much more research and research-based discussion to make AI more trustworthy for users in learning environments and to prevent misconceptions.
  • Törö, Tuukka (Helsingin yliopisto, 2022)
    In recent years, advances in deep learning have made it possible to develop neural speech synthesizers that not only generate near-natural speech but also enable us to control its acoustic features. This means it is possible to synthesize expressive speech with different speaking styles that fit a given context. One way to achieve this control is by adding a reference encoder to the synthesizer that works as a bottleneck modeling a prosody-related latent space. The aim of this study was to analyze how the latent space of a reference encoder models diverse and realistic speaking styles, and what correlation there is between the phonetic features of encoded utterances and their latent space representations. Another aim was to analyze how the synthesizer output could be controlled in terms of speaking styles. The model used in the study was a Tacotron 2 speech synthesizer with a reference encoder that was trained with read speech uttered in various styles by one female speaker. The latent space was analyzed with principal component analysis on the reference encoder outputs for all of the utterances in order to extract salient features that differentiate the styles. Based on the assumption that there are acoustic correlates to speaking styles, a possible connection between the principal components and measured acoustic features of the encoded utterances was investigated. For the synthesizer output, two evaluations were conducted: an objective evaluation assessing acoustic features and a subjective evaluation assessing the appropriateness of synthesized speech with regard to the uttered sentence. The results showed that the reference encoder modeled stylistic differences well, but the styles were complex, with major internal variation within them. The principal component analysis disentangled the acoustic features somewhat, and a statistical analysis showed a correlation between the latent space and prosodic features. The objective evaluation suggested that the synthesizer did not reproduce all of the acoustic features of the styles, but the subjective evaluation showed that it reproduced enough to affect judgments of appropriateness, i.e., speech synthesized in an informal style was deemed more appropriate than the formal style for informal sentences, and vice versa.
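The latent-space analysis described in the abstract above — principal component analysis over reference-encoder outputs — can be sketched with plain NumPy. The embedding dimensionality and the two synthetic "style" clusters below are invented for illustration; they are not the study's data.

```python
import numpy as np

def pca(embeddings, n_components=2):
    """Project reference-encoder outputs onto their top principal
    components via SVD of the centered embedding matrix."""
    X = np.asarray(embeddings, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each dimension
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # directions of max variance
    explained = (S ** 2) / (len(X) - 1)          # variance along each axis
    return Xc @ components.T, explained[:n_components]

# Toy "style" embeddings: two clusters separated along one latent direction,
# standing in for utterances read in two different speaking styles.
rng = np.random.default_rng(0)
style_a = rng.normal(scale=0.1, size=(50, 8)); style_a[:, 0] += 1.0
style_b = rng.normal(scale=0.1, size=(50, 8)); style_b[:, 0] -= 1.0
proj, var = pca(np.vstack([style_a, style_b]))
```

If the styles really do differ along some latent direction, the first principal component captures most of the variance and separates the two clusters, which is the kind of structure the thesis inspects in the real encoder outputs.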
  • Ng, Wai Tong; But, Barton; Choi, Horace C. W.; de Bree, Remco; Lee, Anne W. M.; Lee, Victor H. F.; Lopez, Fernando; Mäkitie, Antti A.; Rodrigo, Juan P.; Saba, Nabil F.; Tsang, Raymond K. Y.; Ferlito, Alfio (2022)
    Introduction: Nasopharyngeal carcinoma (NPC) is endemic to Eastern and South-Eastern Asia, and, in 2020, 77% of global cases were diagnosed in these regions. Apart from its distinct epidemiology, its natural behavior, treatment, and prognosis differ from those of other head and neck cancers. With the growing role of artificial intelligence (AI), especially deep learning (DL), in head and neck cancer care, we sought to explore the unique clinical applications and implementation directions of AI in the management of NPC. Methods: A search protocol was used to collect publications applying AI, machine learning (ML), and DL to NPC management from PubMed, Scopus, and Embase. The articles were filtered using inclusion and exclusion criteria, and the quality of the papers was assessed. Data were extracted from the finalized articles. Results: A total of 78 articles were reviewed after removing duplicates and papers that did not meet the inclusion and exclusion criteria. After quality assessment, 60 papers were included in the current study. There were four main types of applications: auto-contouring, diagnosis, prognosis, and miscellaneous applications (especially radiotherapy planning). Different forms of convolutional neural networks (CNNs) accounted for the majority of the DL algorithms used, while the artificial neural network (ANN) was the most frequently implemented ML model. Conclusion: An overall positive impact of AI implementation was identified in the management of NPC. With improving AI algorithms, we envisage that AI will become a routine application in clinical settings soon.
  • Xu, Yongjun; Liu, Xin; Cao, Xin; Huang, Changping; Liu, Enke; Qian, Sen; Liu, Xingchen; Wu, Yanjun; Dong, Fengliang; Qiu, Cheng-Wei; Qiu, Junjun; Hua, Keqin; Su, Wentao; Wu, Jian; Xu, Huiyu; Han, Yong; Fu, Chenguang; Yin, Zhigang; Liu, Miao; Roepman, Ronald; Dietmann, Sabine; Virta, Marko; Kengara, Fredrick; Zhang, Ze; Zhang, Lifu; Zhao, Taolan; Dai, Ji; Yang, Jialiang; Lan, Liang; Luo, Ming; Liu, Zhaofeng; An, Tao; Zhang, Bin; He, Xiao; Cong, Shan; Liu, Xiaohong; Zhang, Wei; Lewis, James P.; Tiedje, James M.; Wang, Qi; An, Zhulin; Wang, Fei; Zhang, Libo; Huang, Tao; Lu, Chuan; Cai, Zhipeng; Wang, Fang; Zhang, Jiabao (2021)
    Artificial intelligence (AI) coupled with promising machine learning (ML) techniques well known from computer science is broadly affecting many aspects of various fields including science and technology, industry, and even our day-to-day life. The ML techniques have been developed to analyze high-throughput data with a view to obtaining useful insights, categorizing, predicting, and making evidence-based decisions in novel ways, which will promote the growth of novel applications and fuel the sustainable booming of AI. This paper undertakes a comprehensive survey on the development and application of AI in different aspects of fundamental sciences, including information science, mathematics, medical science, materials science, geoscience, life science, physics, and chemistry. The challenges that each discipline of science meets, and the potentials of AI techniques to handle these challenges, are discussed in detail. Moreover, we shed light on new research trends entailing the integration of AI into each scientific discipline. The aim of this paper is to provide a broad research guideline on fundamental sciences with potential infusion of AI, to help motivate researchers to deeply understand the state-of-the-art applications of AI-based fundamental sciences, and thereby to help promote the continuous development of these fundamental sciences.
  • Mylläri, Juha (Helsingin yliopisto, 2022)
    Anomaly detection in images is the machine learning task of classifying inputs as normal or anomalous. Anomaly localization is the related task of segmenting input images into normal and anomalous regions. The output of an anomaly localization model is a 2D array, called an anomaly map, of pixel-level anomaly scores. For example, an anomaly localization model trained on images of non-defective industrial products should output high anomaly scores in image regions corresponding to visible defects. In unsupervised anomaly localization the model is trained solely on normal data, i.e. without labelled training observations that contain anomalies. This is often necessary as anomalous observations may be hard to obtain in sufficient quantities and labelling them is time-consuming and costly. Student-teacher feature pyramid matching (STFPM) is a recent and powerful method for unsupervised anomaly detection and localization that uses a pair of convolutional neural networks of identical architecture. In this thesis we propose two methods of augmenting STFPM to produce better segmentations. Our first method, discrepancy scaling, significantly improves the segmentation performance of STFPM by leveraging pre-calculated statistics containing information about the model’s behaviour on normal data. Our second method, student-teacher model assisted segmentation, uses a frozen STFPM model as a feature detector for a segmentation model which is then trained on data with artificially generated anomalies. Using this second method we are able to produce sharper anomaly maps for which it is easier to set a threshold value that produces good segmentations. Finally, we propose the concept of expected goodness of segmentation, a way of assessing the performance of unsupervised anomaly localization models that, in contrast to current metrics, explicitly takes into account the fact that a segmentation threshold needs to be set. 
Our primary method, discrepancy scaling, improves segmentation AUROC on the MVTec AD dataset over the base model by 13%, measured in the shrinkage of the residual (1.0 − AUROC). On the image-level anomaly detection task, a variant of the discrepancy scaling method improves performance by 12%.
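The abstract above does not give the exact formulation of discrepancy scaling, but one plausible reading — rescaling a test-time anomaly map by per-pixel statistics of the student-teacher discrepancy pre-calculated on normal training data — can be sketched as follows. All shapes and values are illustrative assumptions, not the thesis's actual method.

```python
import numpy as np

def normal_statistics(anomaly_maps):
    """Per-pixel mean and std of student-teacher discrepancy maps
    computed on normal (defect-free) training images."""
    maps = np.asarray(anomaly_maps, dtype=float)  # (n_images, H, W)
    return maps.mean(axis=0), maps.std(axis=0) + 1e-8

def scale_discrepancy(anomaly_map, mu, sigma):
    """Rescale a test-time anomaly map by how unusual each pixel's
    discrepancy is relative to its behaviour on normal data."""
    return (np.asarray(anomaly_map, dtype=float) - mu) / sigma

# Toy maps: pixel (1, 1) is naturally noisy on normal data, so a raw
# discrepancy there should be down-weighted after scaling.
normal = np.zeros((100, 4, 4))
normal[:, 1, 1] = np.linspace(0, 2, 100)
mu, sigma = normal_statistics(normal)
test_map = np.full((4, 4), 0.5)
scaled = scale_discrepancy(test_map, mu, sigma)
```

The effect is that the same raw discrepancy value is scored low at pixels that are noisy on normal data and high at pixels that are normally quiet, which makes a single segmentation threshold easier to set.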
  • Vesterinen, Tiina; Säilä, Jenni; Blom, Sami; Pennanen, Mirkka; Leijon, Helena; Arola, Johanna (2022)
    The Ki-67 proliferation index (PI) is a prognostic factor in neuroendocrine tumors (NETs) and defines tumor grade. Analysis of the Ki-67 PI requires counting Ki-67-positive and Ki-67-negative tumor cells, which is highly subjective. To overcome this, we developed a deep learning-based Ki-67 PI algorithm (KAI) that objectively calculates the Ki-67 PI. Our study material consisted of NETs divided into training (n = 39), testing (n = 124), and validation (n = 60) series. All slides were digitized and processed in the Aiforia® Create (Aiforia Technologies, Helsinki, Finland) platform. The intraclass correlation coefficient (ICC) between the pathologists and the KAI was 0.89. In 46% of the tumors, the Ki-67 PIs calculated by the pathologists and the KAI were the same. In 12% of the tumors, the Ki-67 PI calculated by the KAI was 1% lower, and in 42% of the tumors it was on average 3% higher. The DL-based Ki-67 PI algorithm yields results similar to those of human observers. While the algorithm cannot replace the pathologist, it can assist in the laborious Ki-67 PI assessment of NETs. In the future, this approach could be useful in, for example, multi-center clinical trials where objective estimation of the Ki-67 PI is crucial.
  • Keskin, Merve; Rönneberg, Mikko; Kettunen, Pyry (Copernicus Publications, 2022)
    Abstracts of the International Cartographic Association
  • Pettersen, Henrik Sahlin; Belevich, Ilya; Royset, Elin Synnove; Smistad, Erik; Simpson, Melanie Rae; Jokitalo, Eija; Reinertsen, Ingerid; Bakke, Ingunn; Pedersen, Andre (2022)
    Application of deep learning to histopathological whole slide images (WSIs) holds promise for improving diagnostic efficiency and reproducibility but is largely dependent on the ability to write computer code or purchase commercial solutions. We present a code-free pipeline utilizing free-to-use, open-source software (QuPath, DeepMIB, and FastPathology) for creating and deploying deep learning-based segmentation models for computational pathology. We demonstrate the pipeline on the use case of separating epithelium from stroma in colonic mucosa. A dataset of 251 annotated WSIs, comprising 140 hematoxylin-eosin (HE)-stained and 111 CD3-immunostained colon biopsy WSIs, was developed through active learning using the pipeline. On a hold-out test set of 36 HE and 21 CD3-stained WSIs, mean intersection-over-union scores of 95.5% and 95.3% were achieved for epithelium segmentation. We demonstrate pathologist-level segmentation accuracy and clinically acceptable runtime performance, and show that pathologists without programming experience can create near state-of-the-art segmentation solutions for histopathological WSIs using only free-to-use software. The study further demonstrates the strength of open-source solutions in their ability to create generalizable, open pipelines, from which trained models and predictions can be seamlessly exported in open formats and thereby used in external solutions. All scripts, trained models, a video tutorial, and the full dataset of 251 WSIs with ~31k epithelium annotations are made openly available to accelerate research in the field.
  • Hokkinen, Lasse; Mäkelä, Teemu; Savolainen, Sauli; Kangasniemi, Marko (2021)
    Background: Computed tomography perfusion (CTP) is the mainstay for determining possible eligibility for endovascular thrombectomy (EVT), but there is still a need for alternative methods in patient triage. Purpose: To study the ability of a computed tomography angiography (CTA)-based convolutional neural network (CNN) method to predict final infarct volume in patients with large vessel occlusion successfully treated with endovascular therapy. Materials and Methods: The accuracy of the CTA source image-based CNN in final infarct volume prediction was evaluated against follow-up CT or MR imaging in 89 patients with anterior circulation ischemic stroke successfully treated with EVT, as defined by Thrombolysis in Cerebral Infarction category 2b or 3, using Pearson correlation coefficients and intraclass correlation coefficients. CNN performance was also compared to a commercially available CTP-based software (RAPID, iSchemaView). Results: A correlation with final infarct volumes was found for both the CNN and CTP-RAPID in patients presenting 6-24 h from symptom onset or last known well, with r = 0.67 (p < 0.001) and r = 0.82 (p < 0.001), respectively. Correlations with final infarct volumes in the early time window (0-6 h) were r = 0.43 (p = 0.002) for the CNN and r = 0.58 (p < 0.001) for CTP-RAPID. Compared to CTP-RAPID predictions, the CNN estimated eligibility for thrombectomy according to ischemic core size in the late time window with a sensitivity of 0.38 and a specificity of 0.89. Conclusion: The CTA-based CNN method had a moderate correlation with final infarct volumes in the late time window in patients successfully treated with EVT.
  • Koppatz, Maximilian (Helsingin yliopisto, 2022)
    Automatic headline generation has the potential to significantly assist editors charged with headlining articles. Approaches to automation in the headlining process range from tools serving as creative aids to complete end-to-end automation. The latter is difficult to achieve, as the journalistic requirements imposed on headlines must be met with little room for error, and the requirements depend on the news brand in question. This thesis investigates automatic headline generation in the context of the Finnish newsroom. The primary question I seek to answer is how well the current state of text generation using deep neural language models can be applied to the headlining process in Finnish news media. To answer this, I have implemented and pre-trained a Finnish generative language model based on the Transformer architecture. I have fine-tuned this language model for headline generation as autoregression of headlines conditioned on the article text. I have designed and implemented a variation of the Diverse Beam Search algorithm, with additional parameters, to generate a diverse set of headlines for a given text. The evaluation of the generative capabilities of this system was done with real-world usage in mind. I asked domain experts in headlining to evaluate a generated set of text-headline pairs. The task was to accept or reject the individual headlines on key criteria. The responses of this survey were then quantitatively and qualitatively analyzed. Based on the analysis and feedback, this model can already be useful as a creative aid in the newsroom, despite being far from ready for automation. I have identified concrete improvement directions based on the most common types of errors, which provides interesting directions for future work.
  • Hoque, Mohammad Ashraful; Finley, Benjamin John; Rao, Ashwin; Kumar, Abhishek; Hui, Pan; Ammar, Mostafa; Tarkoma, Sasu (IEEE, 2022)
    International Conference on Pervasive Computing and Communications
    The Internet has been experiencing immense growth in multimedia traffic from mobile devices. The increase in traffic presents many challenges to user-centric networks, network operators, and service providers. Foremost among these challenges is the inability of networks to determine the types of encrypted traffic and thus the level of network service the traffic needs for maintaining an acceptable quality of experience. Therefore, end devices are a natural fit for performing traffic classification since end devices have more contextual information about the device usage and traffic. This paper proposes a novel approach that classifies multimedia traffic types produced and consumed on mobile devices. The technique relies on a mobile device's detection of its multimedia context characterized by its utilization of different media input/output components, e.g., camera, microphone, and speaker. We develop an algorithm, MediaSense, which senses the states of multiple I/O components and identifies the specific multimedia context of a mobile device in real-time. We demonstrate that MediaSense classifies encrypted multimedia traffic in real-time as accurately as deep learning approaches and with even better generalizability.
  • Airaksinen, Manu; Juvela, Lauri; Alku, Paavo; Rasanen, Okko (IEEE, 2019)
    International Conference on Acoustics Speech and Signal Processing ICASSP
    This study explores various speech data augmentation methods for the task of noise-robust fundamental frequency (F0) estimation with neural networks. The explored augmentation strategies are split into additive noise and channel-based augmentation and into vocoder-based augmentation methods. In vocoder-based augmentation, a glottal vocoder is used to enhance the accuracy of ground truth F0 used for training of the neural network, as well as to expand the training data diversity in terms of F0 patterns and vocal tract lengths of the talkers. Evaluations on the PTDB-TUG corpus indicate that noise and channel augmentation can be used to greatly increase the noise robustness of trained models, and that vocoder-based ground truth enhancement further increases model performance. For smaller datasets, vocoder-based diversity augmentation can also be used to increase performance. The best-performing proposed method greatly outperformed the compared F0 estimation methods in terms of noise robustness.
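The additive-noise augmentation mentioned in the abstract above can be sketched as follows: scale a noise segment so that the mix hits a target signal-to-noise ratio. The sample rate and the sinusoidal stand-in for speech are assumptions for illustration, not the study's data.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix a noise segment into a speech signal at a target
    signal-to-noise ratio (in dB), a common augmentation step."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale noise so that 10*log10(p_speech / p_noise_scaled) == snr_db.
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    noise = noise * np.sqrt(target_p_noise / p_noise)
    return speech + noise

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 16000)
speech = np.sin(2 * np.pi * 220 * t)     # stand-in for a voiced signal
noise = rng.normal(size=16000)
noisy = add_noise_at_snr(speech, noise, snr_db=10.0)
```

Sweeping `snr_db` over a range of values during training is what gives a model like the F0 estimator its robustness to unseen noise levels.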
  • Alcantara, Jose Carlos (Helsingin yliopisto, 2020)
    A recent machine learning technique called federated learning (Konecny, McMahan, et al., 2016) offers a new paradigm for distributed learning. It consists of performing machine learning on multiple edge devices while simultaneously optimizing a global model for all of them, without transmitting user data. The goal of this thesis was to demonstrate the benefits of applying federated learning to forecasting telecom key performance indicator (KPI) values from radio network cells. After performing experiments with different aggregations of data sources and comparing against a centralized learning model, the results revealed that a federated model can shorten the training time for modelling new radio cells. Moreover, the amount of data transferred to a central server is reduced drastically while keeping performance equivalent to that of a traditional centralized model. These experiments were performed with a multi-layer perceptron as the model architecture, chosen after comparing its performance against an LSTM. Both input and output data were sequences of KPI values.
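The core aggregation step of federated learning (FedAvg-style weighted averaging of client parameters) can be sketched as follows. The client parameter vectors and sample counts are invented for illustration; this is not the thesis's actual training setup.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: combine client model parameters into a global
    model, weighting each client by its number of local samples."""
    sizes = np.asarray(client_sizes, dtype=float)
    fractions = sizes / sizes.sum()              # per-client weighting
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    return np.tensordot(fractions, stacked, axes=1)

# Three radio cells with different amounts of local KPI data; each
# contributes its locally trained parameter vector to the global model.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 100, 200]
global_w = federated_average(clients, sizes)
```

Only the parameter vectors cross the network, never the raw KPI data, which is exactly the transfer-volume saving the abstract reports.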
  • Leino, Akseli; Korkalainen, Henri; Kalevo, Laura; Nikkonen, Sami; Kainulainen, Samu; Ryan, Alexander; Duce, Brett; Sipila, Kirsi; Ahlberg, Jari; Sahlman, Johanna; Miettinen, Tomi; Westeren-Punnonen, Susanna; Mervaala, Esa; Toyras, Juha; Myllymaa, Sami; Leppanen, Timo; Myllymaa, Katja (2022)
    We have previously developed an ambulatory electrode set (AES) for the measurement of electroencephalography (EEG), electrooculography (EOG), and electromyography (EMG). The AES has been proven to be suitable for manual sleep staging and self-application in in-home polysomnography (PSG). To further facilitate the diagnostics of various sleep disorders, this study aimed to utilize a deep learning-based automated sleep staging approach for EEG signals acquired with the AES. The present neural network architecture comprises a combination of convolutional and recurrent neural networks previously shown to achieve excellent sleep scoring accuracy with a single standard EEG channel (F4-M1). In this study, the model was re-trained and tested with 135 EEG signals recorded with AES. The recordings were conducted for subjects suspected of sleep apnea or sleep bruxism. The performance of the deep learning model was evaluated with 10-fold cross-validation using manual scoring of the AES signals as a reference. The accuracy of the neural network sleep staging was 79.7% (kappa = 0.729) for five sleep stages (W, N1, N2, N3, and R), 84.1% (kappa = 0.773) for four sleep stages (W, light sleep, deep sleep, R), and 89.1% (kappa = 0.801) for three sleep stages (W, NREM, R). The utilized neural network was able to accurately determine sleep stages based on EEG channels measured with the AES. The accuracy is comparable to the inter-scorer agreement of standard EEG scorings between international sleep centers. The automatic AES-based sleep staging could potentially improve the availability of PSG studies by facilitating the arrangement of self-administrated in-home PSGs.
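The agreement metrics quoted above (accuracy together with Cohen's kappa) can be reproduced from a confusion matrix between the manual reference and the automatic scoring. The 3-stage confusion matrix below is hypothetical, chosen only to illustrate the computation.

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa: agreement between two scorers (here, manual vs.
    automatic sleep staging) corrected for chance agreement."""
    C = np.asarray(confusion, dtype=float)
    n = C.sum()
    p_observed = np.trace(C) / n                         # raw accuracy
    p_chance = np.sum(C.sum(axis=0) * C.sum(axis=1)) / n ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical 3-stage (W / NREM / R) confusion matrix: rows are the
# manual reference stages, columns the network's predictions.
conf = np.array([[90,  5,  5],
                 [10, 80, 10],
                 [ 5,  5, 90]])
kappa = cohens_kappa(conf)   # 0.8 for this matrix
```

Kappa discounts the agreement two scorers would reach by chance alone, which is why it is the standard companion to raw accuracy in inter-scorer comparisons like the one in this study.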
  • Nabavi, Seyed Azad; Hossein Motlagh, Naser; Zaidan, Martha Arbayani; Aslani, Alireza; Zakeri, Behnam (2021)
    Buildings are responsible for 33% of final energy consumption and 40% of direct and indirect CO2 emissions globally. While energy consumption is steadily rising globally, managing building energy utilization through on-site renewable energy generation can help respond to this demand. This paper proposes a deep learning method based on discrete wavelet transformation and long short-term memory (DWT-LSTM) and a scheduling framework for the integrated modelling and management of energy demand and supply for buildings. The method analyzes several factors, including electricity price, uncertainty in climatic factors, availability of renewable energy sources (wind and solar), energy consumption patterns in buildings, and the non-linear relationships between these parameters at hourly, daily, weekly, and monthly intervals. It enables monitoring and controlling renewable energy generation, the share of energy imports from the grid, the employment of a saving strategy based on the user's priority list, and energy storage management to minimize reliance on the grid and electricity cost, especially during peak hours. The results demonstrate that the proposed method can forecast building energy demand and energy supply with a high level of accuracy, showing a 3.63-8.57% error range in hourly data prediction for one month ahead. The combination of deep learning forecasting, energy storage, and the scheduling algorithm reduces annual energy imports from the grid by 84%, which offers electricity cost savings of 87%. Finally, two smart active building configurations are financially analyzed for the next thirty years. Based on the results, the proposed smart building with solar photovoltaic (PV) panels, a wind turbine, an inverter, and 40.5 kWh of energy storage reaches its financial breakeven point after 9 years with the wind turbine and 8 years without it. This implies that implementing wind turbines in the proposed building is not financially beneficial.
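The wavelet preprocessing step of a DWT-LSTM pipeline like the one above can be illustrated with a minimal one-level Haar decomposition, which splits a load series into a smooth trend band and a fluctuation band. The choice of the Haar wavelet and the toy hourly load values are assumptions; the paper's actual wavelet and decomposition depth are not given in this abstract.

```python
import numpy as np

def haar_dwt(signal):
    """One-level Haar wavelet decomposition: split a load series into a
    smooth approximation (trend) and a detail (fluctuation) band."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:
        x = x[:-1]                               # need an even length
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: perfectly reconstructs the original series."""
    even = (approx + detail) / np.sqrt(2)
    odd = (approx - detail) / np.sqrt(2)
    out = np.empty(2 * len(approx))
    out[0::2], out[1::2] = even, odd
    return out

hourly_load = np.array([3.0, 3.2, 3.1, 5.9, 6.2, 6.0, 4.1, 3.9])  # toy kWh values
approx, detail = haar_dwt(hourly_load)
recon = haar_idwt(approx, detail)
```

In a DWT-LSTM setup, each band is typically forecast by its own recurrent model and the predictions are recombined via the inverse transform; the perfect-reconstruction property shown here is what makes that recombination lossless.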
  • Al-Tahmeesschi, Ahmed; Talvitie, Jukka; Lopez-Benitez, Miguel; Ruotsalainen, Laura (IEEE, 2022)
    International Conference on Localization and GNSS
    Outdoor user equipment (UE) localisation has attracted a significant amount of attention due to its importance in many location-based services. Typically, in rural and open areas, global navigation satellite systems (GNSS) can provide accurate and reliable localisation performance. However, in urban areas GNSS localisation accuracy is significantly reduced due to shadowing, scattering and signal blockages. In this work, UE positioning assisted by deep learning in 5G and beyond networks is investigated in an urban environment. We study the impact of utilising the spatial correlation in the received signal strengths (RSSs) on UE positioning accuracy, and how to exploit such correlation with deep learning algorithms to improve localisation accuracy. Numerical results showed the importance of utilising the spatial correlation in the RSS to improve the prediction accuracy for all of the considered models. In addition, the impact of varying the number of access point (AP) transmitters on localisation accuracy is also investigated. Numerical results showed that a lower number of APs may be sufficient when uncertainties in RSS measurements are not considered. Moreover, we study how much the degrading effect of RSS uncertainty can be compensated for by increasing the number of APs.
  • Maljanen, Katri (Helsingin yliopisto, 2021)
    Cancer is a leading cause of death worldwide. Contrary to what its name suggests, cancer is not a single disease but a group of diseases that arise from the expansion of a somatic cell clone. This expansion is thought to be the result of mutations that confer a selective advantage on the cell clone. Mutations that are advantageous to cells, resulting in their proliferation and escape from normal cellular constraints, are called driver mutations, and the genes that contain driver mutations are known as driver genes. Studying these mutations and genes is important for understanding how cancer forms and evolves, and various methods have been developed to discover them. This thesis focuses on a method called Deep Mutation Modelling, a deep learning based approach to predicting the probability of mutations. Deep Mutation Modelling's output probabilities offer the possibility of creating sample- and cancer-type-specific probability scores for mutations that reflect the pathogenicity of the mutations. Most past methods have produced scores that are the same for all cancer types; Deep Mutation Modelling offers the opportunity to make a more personalised score. The main objectives of this thesis were to examine the Deep Mutation Modelling output, whose characteristics were previously unknown; to see how the output compares against other scoring methods; to study how the probabilities behave in mutation hotspots; and, lastly, to test whether the probabilities could be used in a common driver gene discovery method. Overall, the goal was to see whether Deep Mutation Modelling works and whether it is competitive with other known methods. The findings indicate that Deep Mutation Modelling works in predicting driver mutations but does not yet have sufficient power to do this reliably and requires further improvements.
  • Lauha, Patrik Mikael; Somervuo, Panu Juhani; Lehikoinen, Petteri; Geres, Lisa; Richter, Tobias; Seibold, Sebastian; Ovaskainen, Otso (2022)
    An automatic bird sound recognition system is a useful tool for collecting data on different bird species for ecological analysis. Together with autonomous recording units (ARUs), such a system makes it possible to collect bird observations on a scale that no human observer could ever match. During the last decades, progress has been made in the field of automatic bird sound recognition, but recognizing bird species from untargeted soundscape recordings remains a challenge. In this article, we demonstrate the workflow for building a global identification model and adjusting it to perform well on the data of autonomous recorders from a specific region. We show how data augmentation and a combination of global and local data can be used to train a convolutional neural network to classify the vocalizations of 101 bird species. We construct a model and train it with a global data set to obtain a base model. The base model is then fine-tuned with local data from Southern Finland in order to adapt it to the sound environment of a specific location, and tested with two data sets: one originating from the same Southern Finnish region and another from a different region in the German Alps. Our results suggest that fine-tuning with local data significantly improves network performance. Classification accuracy was improved for test recordings from the same area as the local training data (Southern Finland) but not for recordings from a different region (German Alps). Data augmentation enables training with a limited amount of training data, and significant improvement over the base model can be achieved even with few local data samples. Our model outperforms the current state-of-the-art tool for automatic bird sound classification. Using local data to adjust the recognition model for the target domain leads to improvement over general, non-tailored solutions. 
The process introduced in this article can be applied to build a fine-tuned bird sound classification model for a specific environment.
  • Liu, Pengyuan; Koivisto, Sonja Maria; Hiippala, Tuomo; Van der Lijn, Charlotte Jacoba Cornelia; Väisänen, Tuomas Lauri Aleksanteri; Nurmi, Marisofia Kaarina; Toivonen, Tuuli; Vehkakoski, Kirsi; Pyykönen, Janne; Virmasalo, Ilkka; Simula, Mikko; Hasanen, Elina; Salmikangas, Anna-Katriina; Muukkonen, Petteri (2022)
    Sport and exercise contribute to health and well-being in cities. While previous research has mainly focused on activities at specific locations such as sport facilities, "informal sports" that occur at arbitrary locations across the city have been largely neglected. Such activities are more challenging to observe, but this challenge may be addressed using data collected from social media platforms, because social media users regularly generate content related to sports and exercise at given locations. This allows studying all sports, including informal ones at arbitrary locations, to better understand sport- and exercise-related activities in cities. However, user-generated geographical information available on social media platforms is becoming scarcer and coarser. This places increased emphasis on extracting location information from free-form text content on social media, which is complicated by multilingualism and informal language. To support this effort, this article presents an end-to-end deep learning-based bilingual toponym recognition model for extracting location information from social media content related to sports and exercise. We show that our approach outperforms five state-of-the-art deep learning and machine learning models. We further demonstrate how our model can be deployed in a geoparsing framework to support city planners in promoting healthy and active lifestyles.