Browsing by Subject "Deep learning"


Now showing items 1-10 of 10
  • Suviranta, Rosa (Helsingin yliopisto, 2021)
    This study is a preliminary investigation of how well a Conditioned Convolutional Variational Autoencoder (CCVAE) learns the prosodic characteristics of the interaction between the Lombard effect and different focus conditions. Lombard speech is an adaptation to ambient noise manifested by rising vocal intensity, fundamental frequency, and duration. Focus marks new propositional information and is signalled by making the focused word more prominent relative to others. A CCVAE was trained on the f0 contours and speech envelopes of a Lombard speech corpus of Finnish utterances. The model's capability to reconstruct the prosodic characteristics was statistically evaluated based on the bottleneck representations alone. The following questions were addressed: the appropriate size of the bottleneck layer for the task, the ability of the bottleneck representations to capture the prosodic characteristics, and the encoding of the bottleneck representations. The study shows promising results: the method can elicit representations that quantify the prosodic effects of the underlying influences and interactions. Even low-dimensional bottlenecks can conceptualise and consistently typologize the prosodic events of interest, although finding the optimal bottleneck dimension still requires more research. Subsequently, the model's ability to capture the prosodic characteristics was verified by investigating the generated samples. Based on the results, the CCVAE can capture prosodic events, and the quality of the reconstruction is positively correlated with the bottleneck dimension. Finally, the encoding of the bottlenecks was examined. The CCVAE encodes the bottleneck representations similarly regardless of the training instance or the bottleneck dimension. The Lombard effect was captured most efficiently, with the focus conditions second.
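As a rough illustration of the variational-autoencoder machinery this entry relies on (a sketch, not the authors' code): a CCVAE's bottleneck is trained with a KL-regularised reconstruction objective, and for a diagonal-Gaussian bottleneck the KL term has a closed form.

```python
import math

def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the
    latent (bottleneck) dimensions, in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

# A bottleneck that already matches the standard-normal prior
# incurs zero KL penalty:
print(kl_diag_gaussian([0.0, 0.0], [0.0, 0.0]))  # -> 0.0
```

The penalty grows as the bottleneck drifts from the prior, which is what keeps low-dimensional bottlenecks like those studied here well behaved.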
  • Kotola, Mikko Markus (Helsingin yliopisto, 2021)
    Image captioning is the task of generating a natural language description of an image. The task requires techniques from two research areas, computer vision and natural language generation. This thesis investigates the architectures of leading image captioning systems. The research question is: What components and architectures are used in state-of-the-art image captioning systems, and how could image captioning systems be further improved by utilizing improved components and architectures? Five openly reported leading image captioning systems are investigated in detail: Attention on Attention, the Meshed-Memory Transformer, the X-Linear Attention Network, the Show, Edit and Tell method, and Prophet Attention. The investigated leading image captioners all rely on the same object detector, the Faster R-CNN based Bottom-Up object detection network. Four out of five also rely on the same backbone convolutional neural network, ResNet-101. Both the backbone and the object detector could be improved by using newer approaches. The best choice among CNN-based object detectors is EfficientDet with an EfficientNet backbone. A completely transformer-based approach with a Vision Transformer backbone and a Detection Transformer object detector is a fast-developing alternative. The main area of variation between the leading image captioners lies in the types of attention blocks used in the high-level image encoder, the type of natural language decoder, and the connections between these components. The best architectures and attention approaches to implement these components are currently the Meshed-Memory Transformer and the bilinear pooling approach of the X-Linear Attention Network. Implementing the Prophet Attention approach of using the future words available in the supervised training phase to guide the decoder attention further improves performance.
Pretraining the backbone using large image datasets is essential to reach semantically correct object detections and object features. The feature richness and dense annotation of data is equally important in training the object detector.
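The attention variants surveyed above (Attention on Attention, Meshed-Memory, X-Linear) all extend the same scaled dot-product building block; a minimal pure-Python sketch of that base operation, for orientation only:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def scaled_dot_product_attention(Q, K, V):
    """Plain scaled dot-product attention: each query is answered by a
    softmax-weighted average of the values. Q, K, V are lists of
    equal-length vectors (lists of floats)."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# A query aligned with the first key attends almost entirely to it:
out = scaled_dot_product_attention([[10.0, 0.0]],
                                   [[10.0, 0.0], [0.0, 10.0]],
                                   [[1.0, 0.0], [0.0, 1.0]])
```

The surveyed systems differ in what they wrap around this core: gating (Attention on Attention), learned memory slots (Meshed-Memory), or bilinear pooling (X-Linear).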
  • Turkki, Riku; Byckhov, Dmitrii; Lundin, Mikael; Isola, Jorma; Nordling, Stig; Kovanen, Panu E.; Verrill, Clare; von Smitten, Karl; Joensuu, Heikki; Lundin, Johan; Linder, Nina (2019)
    Purpose: Recent advances in machine learning have enabled better understanding of large and complex visual data. Here, we aim to investigate patient outcome prediction with a machine learning method using only an image of a tumour sample as input. Methods: Utilising tissue microarray (TMA) samples obtained from the primary tumour of patients (N=1299) within a nationwide breast cancer series with long-term follow-up, we train and validate a machine learning method for patient outcome prediction. The prediction is performed by classifying samples into low or high digital risk score (DRS) groups. The outcome classifier is trained using sample images of 868 patients and evaluated and compared with human expert classification in a test set of 431 patients. Results: In univariate survival analysis, the DRS classification resulted in a hazard ratio of 2.10 (95% CI 1.33-3.32, p=0.001) for breast cancer-specific survival. The DRS classification remained an independent predictor of breast cancer-specific survival in a multivariate Cox model with a hazard ratio of 2.04 (95% CI 1.20-3.44, p=0.007). The accuracy (C-index) of the DRS grouping was 0.60 (95% CI 0.55-0.65), as compared to 0.58 (95% CI 0.53-0.63) for human expert predictions based on the same TMA samples. Conclusions: Our findings demonstrate the feasibility of learning prognostic signals in tumour tissue images without domain knowledge. Although further validation is needed, our study suggests that machine learning algorithms can extract prognostically relevant information from tumour histology, complementing the currently used prognostic factors in breast cancer.
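The C-index reported above is Harrell's concordance index for right-censored survival data; a simplified sketch of how it is computed (ties in risk score counted as half-concordant, tied times skipped; not the authors' exact implementation):

```python
def concordance_index(times, events, risks):
    """Harrell's C-index. times: follow-up times; events: 1 = event
    observed, 0 = censored; risks: predicted risk scores, higher
    meaning worse prognosis. A pair (i, j) is usable when patient i
    had the event strictly before patient j's follow-up time."""
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk: half credit
    return concordant / usable

# Perfectly ordered risk scores give C = 1.0:
print(concordance_index([2, 5, 9], [1, 1, 0], [3.0, 2.0, 1.0]))  # -> 1.0
```

A C-index of 0.5 corresponds to random ordering, which puts the reported 0.60 vs 0.58 comparison in context.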
  • Minnema, Jordi; Wolff, Jan; Koivisto, Juha; Lucka, Felix; Batenburg, Kees Joost; Forouzanfar, Tymour; van Eijnatten, Maureen (2021)
    Background and objective: Over the past decade, convolutional neural networks (CNNs) have revolutionized the field of medical image segmentation. Prompted by the developments in computational resources and the availability of large datasets, a wide variety of different two-dimensional (2D) and three-dimensional (3D) CNN training strategies have been proposed. However, a systematic comparison of the impact of these strategies on the image segmentation performance is still lacking. Therefore, this study aimed to compare eight different CNN training strategies, namely 2D (axial, sagittal and coronal slices), 2.5D (3 and 5 adjacent slices), majority voting, randomly oriented 2D cross-sections and 3D patches. Methods: These eight strategies were used to train a U-Net and an MS-D network for the segmentation of simulated cone-beam computed tomography (CBCT) images comprising randomly placed non-overlapping cylinders and experimental CBCT images of anthropomorphic phantom heads. The resulting segmentation performances were quantitatively compared by calculating Dice similarity coefficients. In addition, all segmented and gold standard experimental CBCT images were converted into virtual 3D models and compared using orientation-based surface comparisons. Results: The CNN training strategy that generally resulted in the best performances on both simulated and experimental CBCT images was majority voting. When employing 2D training strategies, the segmentation performance can be optimized by training on image slices that are perpendicular to the predominant orientation of the anatomical structure of interest. Such spatial features should be taken into account when choosing or developing novel CNN training strategies for medical image segmentation. Conclusions: The results of this study will help clinicians and engineers to choose the most-suited CNN training strategy for CBCT image segmentation. (c) 2021 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY license.
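Two of the ingredients above are easy to make concrete: the Dice similarity coefficient used for evaluation, and the majority-voting fusion of per-orientation predictions. A minimal sketch on flattened binary masks (illustrative, not the study's pipeline):

```python
def dice(a, b):
    """Dice similarity coefficient between two binary masks given as
    flat lists of 0/1 values."""
    intersection = sum(x * y for x, y in zip(a, b))
    return 2.0 * intersection / (sum(a) + sum(b))

def majority_vote(masks):
    """Fuse several binary predictions voxel-wise: a voxel is
    foreground when more than half of the input masks mark it."""
    k = len(masks)
    return [1 if sum(col) * 2 > k else 0 for col in zip(*masks)]

# Per-orientation 2D predictions fused into one segmentation:
axial    = [1, 1, 0, 0]
sagittal = [1, 0, 1, 0]
coronal  = [1, 1, 0, 0]
fused = majority_vote([axial, sagittal, coronal])
print(fused)                       # -> [1, 1, 0, 0]
print(dice(fused, [1, 1, 0, 0]))   # -> 1.0 against this ground truth
```

Majority voting lets single-orientation errors (here, the sagittal mask) be outvoted, which is consistent with it being the best-performing strategy in the study.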
  • Nabavi, Seyed Azad; Hossein Motlagh, Naser; Zaidan, Martha Arbayani; Aslani, Alireza; Zakeri, Behnam (2021)
    Buildings are responsible for 33% of final energy consumption and 40% of direct and indirect CO2 emissions globally. While energy consumption is steadily rising globally, managing building energy utilization through on-site renewable energy generation can help respond to this demand. This paper proposes a deep learning method based on a discrete wavelet transformation and long short-term memory method (DWT-LSTM) and a scheduling framework for the integrated modelling and management of energy demand and supply for buildings. This method analyzes several factors, including electricity price, uncertainty in climatic factors, availability of renewable energy sources (wind and solar), energy consumption patterns in buildings, and the non-linear relationships between these parameters at hourly, daily, weekly and monthly intervals. The method enables monitoring and controlling renewable energy generation, the share of energy imports from the grid, the employment of saving strategies based on a user priority list, and energy storage management to minimize reliance on the grid and electricity cost, especially during peak hours. The results demonstrate that the proposed method can forecast building energy demand and energy supply with a high level of accuracy, showing a 3.63-8.57% error range in hourly data prediction for one month ahead. The combination of deep learning forecasting, energy storage, and the scheduling algorithm reduces annual energy imports from the grid by 84%, which cuts electricity costs by 87%. Finally, two smart active building configurations are financially analyzed for the next thirty years. Based on the results, the proposed smart building with solar Photo-Voltaic (PV), wind turbine, inverter, and 40.5 kWh energy storage has a financial breakeven point after 9 years with the wind turbine and 8 years without it. This implies that implementing wind turbines in the proposed building is not financially beneficial.
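The DWT front-end of a DWT-LSTM decomposes a load series into coarse and fine components before forecasting. A one-level Haar decomposition sketches the idea (the abstract does not state which wavelet family the authors use; Haar is chosen here purely for illustration):

```python
import math

def haar_dwt_level(signal):
    """One level of a Haar discrete wavelet transform on an even-length
    series: returns (approximation, detail) coefficient lists. The
    approximation carries the smooth trend, the detail the fast
    fluctuations."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    return approx, detail

# A toy hourly load series split into trend and fluctuation parts:
approx, detail = haar_dwt_level([4.0, 4.0, 8.0, 2.0])
```

Each coefficient band can then be forecast separately (e.g. by an LSTM per band) and recombined, which is the usual motivation for this hybrid.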
  • Hokkinen, Lasse M I; Mäkelä, Teemu Olavi; Savolainen, Sauli; Kangasniemi, Marko Matti (2021)
    Background Computed tomography angiography (CTA) imaging is needed in current guideline-based stroke diagnosis, and infarct core size is one factor in guiding treatment decisions. We studied the efficacy of a convolutional neural network (CNN) in final infarct volume prediction from CTA and compared the results to a CT perfusion (CTP)-based commercially available software (RAPID, iSchemaView). Methods We retrospectively selected 83 consecutive stroke cases treated with thrombolytic therapy or receiving supportive care that presented to Helsinki University Hospital between January 2018 and July 2019. We compared CNN-derived ischaemic lesion volumes to final infarct volumes that were manually segmented from follow-up CT and to CTP-RAPID ischaemic core volumes. Results An overall correlation of r = 0.83 was found between CNN outputs and final infarct volumes. The strongest correlation was found in a subgroup of patients that presented more than 9 h after symptom onset (r = 0.90). A good correlation was found between the CNN outputs and CTP-RAPID ischaemic core volumes (r = 0.89), and the CNN was able to classify patients for thrombolytic therapy or supportive care with a 1.00 sensitivity and 0.94 specificity. Conclusions A CTA-based CNN software can provide good infarct core volume estimates as observed in follow-up imaging studies. CNN-derived infarct volumes had a good correlation to CTP-RAPID ischaemic core volumes.
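The r values quoted above are Pearson correlations between predicted and reference infarct volumes; as a small reminder of what is being computed (a generic sketch, not the study's code):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired series,
    e.g. CNN-predicted vs. manually segmented infarct volumes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Perfectly linear agreement gives r = 1.0 (up to float rounding):
print(pearson_r([10.0, 20.0, 30.0], [15.0, 25.0, 35.0]))
```

Values like r = 0.83-0.90 therefore indicate strong but not perfect linear agreement between the two volume estimates.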
  • Vainio, Tuomas J V; Mäkelä, Teemu Olavi; Savolainen, Sauli; Kangasniemi, Marko Matti (2021)
    Background Chronic pulmonary embolism (CPE) is a life-threatening disease easily misdiagnosed on computed tomography. We investigated a three-dimensional convolutional neural network (CNN) algorithm for detecting hypoperfusion in CPE from computed tomography pulmonary angiography (CTPA). Methods Preoperative CTPA of 25 patients with CPE and 25 without pulmonary embolism were selected. We applied a 48%-12%-40% training-validation-testing split (12 positive and 12 negative CTPA volumes for training, 3 positives and 3 negatives for validation, 10 positives and 10 negatives for testing). The median number of axial images per CTPA was 335 (min-max, 111-570). Expert manual segmentations were used as training and testing targets. The CNN output was compared to a method in which a Hounsfield unit (HU) threshold was used to detect hypoperfusion. Receiver operating characteristic area under the curve (AUC) and Matthews correlation coefficient (MCC) were calculated with their 95% confidence interval (CI). Results The predicted segmentations of the CNN showed an AUC of 0.87 (95% CI 0.82-0.91), and those of the HU-threshold method 0.79 (95% CI 0.74-0.84). The optimal global threshold values were CNN output probability >= 0.37 and
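The evaluation metric named above, the Matthews correlation coefficient, comes straight from the confusion-matrix counts; a minimal sketch (generic, not the study's code):

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts:
    +1 is perfect agreement, 0 is chance level, -1 is total
    disagreement. Returns 0.0 when any marginal is empty."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

print(mcc(10, 10, 0, 0))  # -> 1.0 for perfect voxel-wise agreement
```

Unlike accuracy, MCC stays informative when hypoperfused voxels are a small minority of the volume, which is why it is a common choice for segmentation studies like this one.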
  • Toivonen, Mikko E.; Rajani, Chang; Klami, Arto (2020)
    Hyperspectral (HS) cameras record the spectrum at multiple wavelengths for each pixel in an image, and are used, e.g., for quality control and agricultural remote sensing. We introduce a fast, cost-efficient and mobile method of taking HS images using a regular digital camera equipped with a passive diffraction grating filter, using machine learning for constructing the HS image. The grating distorts the image by effectively mapping the spectral information into spatial dislocations, which we convert into a HS image by a convolutional neural network utilizing novel wide dilation convolutions that accurately model optical properties of diffraction. We demonstrate high-quality HS reconstruction using a model trained on only 271 pairs of diffraction grating and ground truth HS images.
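The "wide dilation convolutions" mentioned above let a small kernel reach across the large spatial dislocations the grating introduces. A 1-D sketch of a dilated convolution (illustrative only; the paper's layers are 2-D and learned):

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D convolution with a dilated kernel: tap i reads the
    input at offset i * dilation, so a 3-tap kernel with dilation d
    spans 2*d + 1 samples without extra parameters."""
    k = len(kernel)
    span = (k - 1) * dilation
    return [sum(kernel[i] * signal[t + i * dilation] for i in range(k))
            for t in range(len(signal) - span)]

# With dilation=2 a 3-tap kernel spans 5 samples, picking up both
# spikes in one window:
print(dilated_conv1d([1, 0, 0, 0, 1, 0], [1, 1, 1], 2))  # -> [2, 0]
```

Growing the dilation is how a compact network can model the long-range mapping from spectral content to spatial displacement that the diffraction grating produces.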
  • Mäyrä, Janne; Keski-Saari, Sarita; Kivinen, Sonja; Tanhuanpää, Topi; Hurskainen, Pekka; Kullberg, Peter; Poikolainen, Laura; Viinikka, Arto; Tuominen, Sakari; Kumpula, Timo; Vihervaara, Petteri (2021)
    During the last two decades, forest monitoring and inventory systems have moved from field surveys to remote sensing-based methods. These methods tend to focus on economically significant components of forests, thus leaving out many factors vital for forest biodiversity, such as the occurrence of species with low economical but high ecological values. Airborne hyperspectral imagery has shown significant potential for tree species classification, but the most common analysis methods, such as random forest and support vector machines, require manual feature engineering in order to utilize both spatial and spectral features, whereas deep learning methods are able to extract these features from the raw data. Our research focused on the classification of the major tree species Scots pine, Norway spruce and birch, together with an ecologically valuable keystone species, European aspen, which has a sparse and scattered occurrence in boreal forests. We compared the performance of three-dimensional convolutional neural networks (3D-CNNs) with the support vector machine, random forest, gradient boosting machine and artificial neural network in individual tree species classification from hyperspectral data with high spatial and spectral resolution. We collected hyperspectral and LiDAR data along with extensive ground reference data measurements of tree species from the 83 km² study area located in the southern boreal zone in Finland. A LiDAR-derived canopy height model was used to match ground reference data to aerial imagery. The best performing 3D-CNN, utilizing 4 m image patches, was able to achieve an F1-score of 0.91 for aspen, an overall F1-score of 0.86 and an overall accuracy of 87%, while the lowest performing 3D-CNN utilizing 10 m image patches achieved an F1-score of 0.83 and an accuracy of 85%.
In comparison, the support-vector machine achieved an F1-score of 0.82 and an accuracy of 82.4% and the artificial neural network achieved an F1-score of 0.82 and an accuracy of 81.7%. Compared to the reference models, 3D-CNNs were more efficient in distinguishing coniferous species from each other, with a concurrent high accuracy for aspen classification. Deep neural networks, being black box models, hide the information about how they reach their decision. We used both occlusion and saliency maps to interpret our models. Finally, we used the best performing 3D-CNN to produce a wall-to-wall tree species map for the full study area that can later be used as a reference prediction in, for instance, tree species mapping from multispectral satellite images. The improved tree species classification demonstrated by our study can benefit both sustainable forestry and biodiversity conservation.
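The per-species F1-scores compared throughout this abstract combine precision and recall into one number; as a small reminder of the definition (generic sketch, not the study's code):

```python
def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall, computed from
    per-class counts of true positives, false positives and false
    negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 90 correctly labelled aspens, 10 false alarms, 10 misses (toy
# numbers, not the study's confusion matrix):
print(f1_score(tp=90, fp=10, fn=10))  # -> 0.9 (within float rounding)
```

Because F1 ignores true negatives, it is well suited to a rare class such as aspen, where plain accuracy would be dominated by the abundant conifers.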
  • Piedra, Patricio; Gobert, Christian; Kalume, Aimable; Pan, Yong-Le; Kocifaj, Miroslav; Muinonen, Karri; Penttilä, Antti; Zubko, Evgenij; Videen, Gorden (2020)
    We explore a technique called class-activation mapping (CAM) to investigate how a Machine Learning (ML) architecture learns to classify particles based on their light-scattering signals. We release our code, and also find that different regions of the light-scattering signals play different roles in ML classification. These regions depend on the type of particles being classified and on the nature of the data obtained and used for training. For instance, the Mueller-matrix elements S11*, S12* and S21* had the greatest classification activation in the diffraction region. The linear polarization elements S12* and S21* were most accurate in the backscattering region for clusters of spheres and spores, and were most accurate in the diffraction region for other particle classes. The CAM technique was able to highlight light-scattering angles that maximize the potential for discrimination of similar particle classes. Such information is useful for designing detector systems to classify particles where limited space or resources are available, including flow cytometry and satellite remote sensing. (C) 2020 The Authors. Published by Elsevier Ltd.
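In its original global-average-pooling formulation, a class-activation map is just a weighted sum of the final convolutional feature maps, using the classifier weights of the class of interest. A minimal sketch of that idea (the general CAM recipe, not the authors' exact pipeline):

```python
def class_activation_map(feature_maps, class_weights):
    """Class-activation map: weight each of the C final feature maps
    by the chosen class's classifier weight and sum position-wise.
    feature_maps: C maps, each a flat list over positions;
    class_weights: the C weights for one output class."""
    positions = len(feature_maps[0])
    return [sum(w * fmap[p] for w, fmap in zip(class_weights, feature_maps))
            for p in range(positions)]

# Two toy channels over three scattering-angle positions; the class
# weights [2.0, 1.0] make the middle position light up strongest:
maps = [[0.0, 1.0, 0.0],
        [1.0, 0.0, 0.0]]
print(class_activation_map(maps, [2.0, 1.0]))  # -> [1.0, 2.0, 0.0]
```

High values in the resulting map mark the input regions (here, scattering angles) that drove the classification, which is exactly how the study localizes discriminative angular regions.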