Browsing by Subject "Machine learning"

Now showing items 1-20 of 47
  • Öman, Olli; Mäkelä, Teemu; Salli, Eero; Savolainen, Sauli; Kangasniemi, Marko (Springer International Publishing, 2019)
    Background: The aim of this study was to investigate the feasibility of ischemic stroke detection from computed tomography angiography source images (CTA-SI) using three-dimensional convolutional neural networks. Methods: CTA-SI of 60 patients with a suspected acute ischemic stroke of the middle cerebral artery were randomly selected for this study; 30 patients were used in the neural network training, and the subsequent testing was performed using the remaining 30 patients. The training and testing were based on manually segmented lesions. Cerebral hemispheric comparison CTA and non-contrast computed tomography (NCCT) were studied as additional input features. Results: All ischemic lesions in the testing data were correctly lateralized, and a high correspondence to manual segmentations was achieved. Patients with a diagnosed stroke had clinically relevant regions labeled infarcted with a 0.93 sensitivity and 0.82 specificity. The highest achieved voxel-wise area under the receiver operating characteristic curve was 0.93, and the highest Dice similarity coefficient was 0.61. When cerebral hemispheric comparison was used as an input feature, the algorithm performance improved. Only a slight effect was seen when NCCT was included. Conclusion: The results support the hypothesis that an acute ischemic stroke lesion can be detected with 3D convolutional neural network-based software from CTA-SI. Utilizing information from the contralateral hemisphere appears to be beneficial for reducing false positive findings.
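The sensitivity and specificity quoted in the entry above can be computed voxel-wise directly from a predicted mask and a manual segmentation. The following is an illustrative NumPy sketch with toy volumes, not the authors' pipeline:

```python
import numpy as np

def sensitivity_specificity(pred, truth):
    """Voxel-wise sensitivity and specificity for two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.sum(pred & truth)    # correctly labeled lesion voxels
    tn = np.sum(~pred & ~truth)  # correctly labeled background voxels
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return tp / (tp + fn), tn / (tn + fp)

# Toy 3D volumes standing in for a predicted and a manually segmented lesion.
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True   # 8 lesion voxels
pred = truth.copy()
pred[1, 1, 1] = False         # one missed lesion voxel (false negative)
pred[0, 0, 0] = True          # one false positive voxel

sens, spec = sensitivity_specificity(pred, truth)
```

With 7 of 8 lesion voxels found and 1 of 56 background voxels mislabeled, sensitivity is 0.875 and specificity 55/56.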
  • Maestu, Fernando; Pena, Jose-Maria; Garces, Pilar; Gonzalez, Santiago; Bajo, Ricardo; Bagic, Anto; Cuesta, Pablo; Funke, Michael; Makela, Jyrki P.; Menasalvas, Ernestina; Nakamura, Akinori; Parkkonen, Lauri; Lopez, Maria E.; del Pozo, Francisco; Sudre, Gustavo; Zamrini, Edward; Pekkonen, Eero; Henson, Richard N.; Becker, James T.; Magnetoencephalography Int (2015)
    Synaptic disruption is an early pathological sign of the neurodegeneration of Dementia of the Alzheimer's type (DAT). The changes in network synchronization are evident in patients with Mild Cognitive Impairment (MCI) at the group level, but there are very few Magnetoencephalography (MEG) studies regarding discrimination at the individual level. In an international multicenter study, we used MEG and functional connectivity metrics to discriminate MCI from normal aging at the individual person level. A labeled sample of features (links) that distinguished MCI patients from controls in a training dataset was used to classify MCI subjects in two testing datasets from four other MEG centers. We identified a pattern of neuronal hypersynchronization in MCI, in which the features that best discriminated MCI were fronto-parietal and interhemispheric links. The hypersynchronization pattern found in the MCI patients was stable across the five different centers, and may be considered an early sign of synaptic disruption and a possible preclinical biomarker for MCI/DAT.
  • Laakso, Jarno (Helsingin yliopisto, 2021)
    Halide perovskites are a promising materials class for solar energy production. The photovoltaic efficiency of halide perovskites is remarkable, but their toxicity and instability have prevented commercialization. These problems could be addressed through compositional engineering in the halide perovskite materials space, but the number of different materials that would need to be considered is too large for conventional experimental and computational methods. Machine learning can be used to accelerate computations to the level that is required for this task. In this thesis I present a machine learning approach for compositional exploration and apply it to the composite halide perovskite CsPb(Cl,Br)3. I used data from density functional theory (DFT) calculations to train a machine learning model based on kernel ridge regression with the many-body tensor representation for the atomic structure. The trained model was then applied to predict the decomposition energies of CsPb(Cl,Br)3 materials from their atomic structure. The main part of my work was to derive and implement gradients for the machine learning model to facilitate efficient structure optimization. I tested the machine learning model by comparing its decomposition energy predictions to DFT calculations. The prediction error was under 0.12 meV per atom, and prediction was five orders of magnitude faster than DFT. I also used the model to optimize CsPb(Cl,Br)3 structures. Reasonable structures were obtained, but the accuracy was only qualitative. Analysis of the results of the structural optimizations exposed shortcomings in the approach, providing important insight for future improvements. Overall, this project takes a successful step towards the discovery of novel perovskite materials with designer properties for future solar cell applications.
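Kernel ridge regression, the method named in the thesis above, fits in a few lines of scikit-learn. This is a toy illustration only: random vectors stand in for the many-body tensor representation, and a smooth synthetic function stands in for DFT decomposition energies:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
# Hypothetical stand-in descriptors; in the thesis these would be
# many-body tensor representations of CsPb(Cl,Br)3 structures.
X = rng.normal(size=(400, 3))
# Synthetic target standing in for DFT decomposition energies.
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] ** 2

# RBF-kernel ridge regression: alpha is the ridge penalty,
# gamma the kernel width.
model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.2)
model.fit(X[:300], y[:300])
mae = np.mean(np.abs(model.predict(X[300:]) - y[300:]))
```

On this smooth toy target the held-out mean absolute error is small, illustrating why a trained surrogate can replace repeated DFT evaluations during structure search.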
  • Du, Mian; Yangarber, Roman (The Society of Digital Information and Wireless Communications (SDIWC), 2015)
    Single-document summarization aims to reduce the size of a text document while preserving the most important information. Much work has been done on open-domain summarization. This paper presents an automatic way to mine domain-specific patterns from text documents. With a small amount of effort required for manual selection, these patterns can be used for domain-specific scenario-based document summarization and information extraction. Our evaluation shows that scenario-based document summarization can both filter irrelevant documents and create summaries for relevant documents within the specified domain.
  • Mäkelä, Teemu; Öman, Olli; Hokkinen, Lasse M I; Wilppu, Ulla; Salli, Eero; Savolainen, Sauli; Kangasniemi, Marko (2022)
    In stroke imaging, CT angiography (CTA) is used for detecting arterial occlusions. These images could also provide information on the extent of ischemia. The study aim was to develop and evaluate a convolutional neural network (CNN)-based algorithm for detecting and segmenting acute ischemic lesions from CTA images of patients with suspected middle cerebral artery stroke. These results were compared to volumes reported by widely used CT perfusion-based RAPID software (iSchemaView). A 42-layer-deep CNN was trained on 50 CTA volumes with manually delineated targets. The lower bound for predicted lesion size to reliably discern stroke from false positives was estimated. The severity of false positives and false negatives was reviewed visually to assess the clinical applicability and to further guide the method development. The CNN model corresponded to the manual segmentations with voxel-wise sensitivity 0.54 (95% confidence interval: 0.44-0.63), precision 0.69 (0.60-0.76), and Sørensen-Dice coefficient 0.61 (0.52-0.67). Stroke/nonstroke differentiation accuracy 0.88 (0.81-0.94) was achieved when only considering the predicted lesion size (i.e., regardless of location). By visual estimation, 46% of cases showed some false findings, such as the CNN highlighting chronic periventricular white matter changes or beam hardening artifacts, but only in 9% were the errors severe, translating to 0.91 accuracy. The CNN model had a moderately strong correlation to RAPID-reported Tmax > 10 s volumes (Pearson's r = 0.76 (0.58-0.86)). The results suggest that detecting anterior circulation ischemic strokes from CTA using a CNN-based algorithm can be feasible when accompanied by physiological knowledge to rule out false positives.
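The Sørensen-Dice coefficient reported in this and several neighbouring entries has a compact definition: twice the overlap divided by the combined size of the two masks. A minimal NumPy version (illustrative, not the study code):

```python
import numpy as np

def dice(a, b):
    """Sørensen-Dice coefficient of two binary masks: 2|A and B| / (|A| + |B|)."""
    a = a.astype(bool)
    b = b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.sum(a & b) / total

# Toy 1-D "masks": 8 predicted voxels, 6 reference voxels, 4 shared.
pred = np.zeros(12, dtype=bool)
pred[0:8] = True
ref = np.zeros(12, dtype=bool)
ref[4:10] = True
score = dice(pred, ref)   # 2 * 4 / (8 + 6) = 4/7
```

A score of 1.0 means identical masks; the 0.61 reported above indicates substantial but imperfect overlap with the manual delineations.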
  • Turkki, Riku; Byckhov, Dmitrii; Lundin, Mikael; Isola, Jorma; Nordling, Stig; Kovanen, Panu E.; Verrill, Clare; von Smitten, Karl; Joensuu, Heikki; Lundin, Johan; Linder, Nina (2019)
    Purpose: Recent advances in machine learning have enabled better understanding of large and complex visual data. Here, we aim to investigate patient outcome prediction with a machine learning method using only an image of a tumour sample as an input. Methods: Utilising tissue microarray (TMA) samples obtained from the primary tumour of patients (N = 1299) within a nationwide breast cancer series with long-term follow-up, we train and validate a machine learning method for patient outcome prediction. The prediction is performed by classifying samples into low or high digital risk score (DRS) groups. The outcome classifier is trained using sample images of 868 patients and evaluated and compared with human expert classification in a test set of 431 patients. Results: In univariate survival analysis, the DRS classification resulted in a hazard ratio of 2.10 (95% CI 1.33-3.32, p = 0.001) for breast cancer-specific survival. The DRS classification remained an independent predictor of breast cancer-specific survival in a multivariate Cox model with a hazard ratio of 2.04 (95% CI 1.20-3.44, p = 0.007). The accuracy (C-index) of the DRS grouping was 0.60 (95% CI 0.55-0.65), as compared to 0.58 (95% CI 0.53-0.63) for human expert predictions based on the same TMA samples. Conclusions: Our findings demonstrate the feasibility of learning prognostic signals in tumour tissue images without domain knowledge. Although further validation is needed, our study suggests that machine learning algorithms can extract prognostically relevant information from tumour histology, complementing the currently used prognostic factors in breast cancer.
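The C-index quoted above measures how often a risk score orders patient pairs correctly. A simplified sketch for uncensored data follows; note that real C-index computations (as in the study above) must also handle censored observations, which this toy version ignores:

```python
import numpy as np

def concordance_index(times, scores):
    """C-index for uncensored survival data: the fraction of patient pairs
    whose risk scores order their survival times correctly (higher risk
    should mean shorter survival); tied scores count as 0.5."""
    n_pairs = 0
    n_concordant = 0.0
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            if times[i] == times[j]:
                continue  # tied times are not comparable here
            n_pairs += 1
            # The patient with the shorter time should have the higher risk.
            short, long_ = (i, j) if times[i] < times[j] else (j, i)
            if scores[short] > scores[long_]:
                n_concordant += 1.0
            elif scores[short] == scores[long_]:
                n_concordant += 0.5
    return n_concordant / n_pairs

# Hypothetical survival times (years) and DRS-like risk scores.
times = np.array([5.0, 3.0, 9.0, 7.0])
risk = np.array([0.8, 0.9, 0.2, 0.1])
ci = concordance_index(times, risk)   # 5 of 6 pairs ordered correctly
```

Here one pair (the two longest survivors) is mis-ordered, giving a C-index of 5/6; random scores would converge to 0.5, matching the interpretation of the 0.60 and 0.58 figures above.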
  • Yuan, Kunxiaojia; Zhu, Qing; Li, Fa; Riley, William J.; Torn, Margaret; Chu, Housen; McNicol, Gavin; Chen, Min; Knox, Sara; Delwiche, Kyle; Wu, Huayi; Baldocchi, Dennis; Ma, Hongxu; Desai, Ankur R.; Chen, Jiquan; Sachs, Torsten; Ueyama, Masahito; Sonnentag, Oliver; Helbig, Manuel; Tuittila, Eeva-Stiina; Jurasinski, Gerald; Koebsch, Franziska; Campbell, David; Schmid, Hans Peter; Lohila, Annalea; Goeckede, Mathias; Nilsson, Mats B.; Friborg, Thomas; Jansen, Joachim; Zona, Donatella; Euskirchen, Eugenie; Ward, Eric J.; Bohrer, Gil; Jin, Zhenong; Liu, Licheng; Iwata, Hiroki; Goodrich, Jordan; Jackson, Robert (2022)
    Wetland CH4 emissions are among the most uncertain components of the global CH4 budget. The complex nature of wetland CH4 processes makes it challenging to identify causal relationships for improving our understanding and predictability of CH4 emissions. In this study, we used flux measurements of CH4 from eddy covariance towers (30 sites from 4 wetland types: bog, fen, marsh, and wet tundra) to construct a causality-constrained machine learning (ML) framework to explain the regulative factors and to capture CH4 emissions at the sub-seasonal scale. We found that soil temperature is the dominant factor for CH4 emissions in all studied wetland types. Ecosystem respiration (CO2) and gross primary productivity exert controls at bog, fen, and marsh sites with lagged responses of days to weeks. Integrating these asynchronous environmental and biological causal relationships in predictive models significantly improved model performance. More importantly, modeled CH4 emissions differed by up to a factor of 4 under a +1 °C warming scenario when causality constraints were considered. These results highlight the significant role of causality in modeling wetland CH4 emissions, especially under future warming conditions, while traditional data-driven ML models may reproduce observations for the wrong reasons. Our proposed causality-guided model could benefit predictive modeling, large-scale upscaling, data gap-filling, and surrogate modeling of wetland CH4 emissions within Earth system land models.
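The lagged responses mentioned above (CH4 responding to respiration and productivity days to weeks later) are commonly encoded by feeding a model lagged copies of each driver series. A hypothetical helper, not taken from the paper:

```python
import numpy as np

def lagged_features(x, lags):
    """Stack lagged copies of a driver series as predictor columns.

    Row t holds x[t - lag] for each requested lag, so a downstream model
    can learn delayed (asynchronous) responses such as CH4 lagging
    gross primary productivity by days.
    """
    max_lag = max(lags)
    cols = [x[max_lag - lag : len(x) - lag] for lag in lags]
    return np.column_stack(cols)

# Hypothetical daily driver series (e.g. gross primary productivity).
x = np.arange(10.0)
F = lagged_features(x, lags=[0, 1, 7])   # same-day, 1-day, and 1-week lags
```

For a 10-day series with a maximum lag of 7 days, only 3 rows are fully defined; row 0 corresponds to day 7 and holds the values from days 7, 6, and 0.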
  • Sjöblom, Nelli; Boyd, Sonja; Manninen, Anniina; Knuuttila, Anna; Blom, Sami; Färkkilä, Martti; Arola, Johanna (BioMed Central, 2021)
    Background: The objective was to build a novel method for automated image analysis to locate and quantify the number of cytokeratin 7 (K7)-positive hepatocytes reflecting cholestasis by applying deep learning neural networks (AI model) in a cohort of 210 liver specimens. We aimed to study the correlation between the AI model's results and disease progression. The cohort of liver biopsies, which served as a model of chronic cholestatic liver disease, comprised patients diagnosed with primary sclerosing cholangitis (PSC). Methods: In a cohort of patients with PSC identified from the PSC registry of the University Hospital of Helsinki, K7-stained liver biopsy specimens were scored by a pathologist (human K7 score) and then digitally analyzed for K7-positive hepatocytes (K7%area). The digital analysis was performed by a K7-AI model created on the Aiforia Technologies cloud platform. For validation, the human K7 score, the stage of disease (Metavir and Nakanuma fibrosis scores), and plasma liver enzymes indicating clinical cholestasis were all subjected to correlation analysis. Results: The K7-AI model results (K7%area) correlated with the human K7 score (0.896; p < 2.2e-16). In addition, K7%area correlated with the stage of PSC (Metavir 0.446; p < 1.849e-10 and Nakanuma 0.424; p < 4.23e-10) and with plasma alkaline phosphatase (P-ALP) levels (0.369, p < 5.749e-5). Conclusions: The accuracy of the AI-based analysis was comparable to that of the human K7 score. Automated quantitative image analysis correlated with the stage of PSC and with P-ALP. Based on the results of the K7-AI model, we recommend K7 staining in the assessment of cholestasis by means of automated methods that provide fast (9.75 s/specimen) quantitative analysis.
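Validation in the K7 study above rests on Pearson correlation between the AI output and reference measures. With NumPy this is a single call; the paired values below are synthetic stand-ins, not study data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-ins for paired measurements: a pathologist's K7 score
# and an AI-derived K7%area for the same 50 specimens.
human_score = rng.uniform(0.0, 3.0, size=50)
ai_area = 0.9 * human_score + rng.normal(scale=0.2, size=50)

# Pearson correlation coefficient between the two readings.
r = np.corrcoef(ai_area, human_score)[0, 1]
```

With the noise level chosen here the correlation lands above 0.9, in the same regime as the 0.896 reported above; the associated p-value would come from a significance test (e.g. `scipy.stats.pearsonr`).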
  • Alabi, Rasheed Omobolaji; Mäkitie, Antti A.; Pirinen, Matti; Elmusrati, Mohammed; Leivo, Ilmo; Almangush, Alhadi (2021)
    Background: The prediction of overall survival in tongue cancer is important for planning of personalized care and patient counselling. Objectives: This study compares the performance of a nomogram with a machine learning model to predict overall survival in tongue cancer. The nomogram and machine learning model were built using a large data set from the Surveillance, Epidemiology, and End Results (SEER) program database. The comparison is necessary to provide clinicians with a comprehensive, practical, and maximally accurate assistive system to predict overall survival of this patient population. Methods: The data set used included the records of 7596 tongue cancer patients. The considered machine learning algorithms were logistic regression, support vector machine, Bayes point machine, boosted decision tree, decision forest, and decision jungle. These algorithms were mainly evaluated in terms of the areas under the receiver operating characteristic (ROC) curve (AUC) and accuracy values. The performance of the algorithm that produced the best result was compared with a nomogram to predict overall survival in tongue cancer patients. Results: The boosted decision-tree algorithm outperformed the other algorithms. When compared with a nomogram using external validation data, the boosted decision tree produced an accuracy of 88.7% while the nomogram showed an accuracy of 60.4%. In addition, it was found that patient age, T stage, radiotherapy, and surgical resection were the most prominent features with significant influence on the machine learning model's performance in predicting overall survival. Conclusion: The machine learning model provides more personalized and reliable prognostic information on tongue cancer than the nomogram. However, the transparency the nomogram offers in estimating patients' outcomes may inspire more confidence and strengthens the principle of shared decision making between patient and clinician. Therefore, a combined nomogram-machine learning (NomoML) predictive model may help to improve care, provide information to patients, and facilitate clinicians in making tongue cancer management-related decisions.
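Boosted decision trees like the winning algorithm above are available in standard libraries. The sketch below uses scikit-learn's GradientBoostingClassifier on fabricated SEER-like columns (age, T stage, radiotherapy, surgery); both the features and the survival labels are synthetic stand-ins, not SEER data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
# Hypothetical patient features: age, T stage, radiotherapy flag, surgery flag.
X = np.column_stack([
    rng.uniform(20, 90, n),    # age in years
    rng.integers(1, 5, n),     # T stage 1-4
    rng.integers(0, 2, n),     # radiotherapy given
    rng.integers(0, 2, n),     # surgical resection done
])
# Synthetic survival label with a plausible dependence on the features.
logit = (-0.04 * (X[:, 0] - 55) - 0.5 * (X[:, 1] - 2)
         + 0.8 * X[:, 2] + 1.0 * X[:, 3])
y = (logit + rng.normal(scale=1.0, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)   # held-out accuracy
```

Because the labels are deliberately noisy, test accuracy here is well above chance but bounded by the injected noise, a useful reminder that headline accuracies depend on how separable the underlying data are.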
  • Alabi, Rasheed Omobolaji; Elmusrati, Mohammed; Sawazaki-Calone, Iris; Kowalski, Luiz Paulo; Haglund, Caj; Coletta, Ricardo D.; Mäkitie, Antti A.; Salo, Tuula; Almangush, Alhadi; Leivo, Ilmo (2020)
    Background: The proper estimate of the risk of recurrence in early-stage oral tongue squamous cell carcinoma (OTSCC) is mandatory for individual treatment-decision making. However, this remains a challenge even for experienced multidisciplinary centers. Objectives: We compared the performance of four machine learning (ML) algorithms for predicting the risk of locoregional recurrence in patients with OTSCC. These algorithms were Support Vector Machine (SVM), Naive Bayes (NB), Boosted Decision Tree (BDT), and Decision Forest (DF). Materials and methods: The study cohort comprised 311 cases from the five University Hospitals in Finland and the A.C. Camargo Cancer Center, Sao Paulo, Brazil. For comparison of the algorithms, we used the F1 score (the harmonic mean of precision and recall), specificity, and accuracy values. These algorithms and their corresponding permutation feature importance (PFI) with the input parameters were externally tested on 59 new cases. Furthermore, we compared the performance of the algorithm that showed the highest prediction accuracy with the prognostic significance of depth of invasion (DOI). Results: The average specificity of all the algorithms was 71%. The SVM showed an accuracy of 68% and F1 score of 0.63, NB an accuracy of 70% and F1 score of 0.64, BDT an accuracy of 81% and F1 score of 0.78, and DF an accuracy of 78% and F1 score of 0.70. Additionally, these algorithms outperformed the DOI-based approach, which gave an accuracy of 63%. With PFI analysis, there was no significant difference in the overall accuracies of three of the algorithms; PFI-BDT accuracy increased to 83.1%, PFI-DF increased to 80%, and PFI-SVM decreased to 64.4%, while PFI-NB accuracy increased significantly to 81.4%. Conclusions: Our findings show that the best classification accuracy was achieved with the boosted decision tree algorithm, and that these algorithms outperformed the DOI-based approach. Furthermore, even with only the few parameters identified in the PFI analysis, the ML techniques still showed the ability to predict locoregional recurrence. The application of the boosted decision tree machine learning algorithm can stratify OTSCC patients and thus aid in their individual treatment planning.
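Permutation feature importance (PFI), used above for feature selection, scores a feature by how much model performance degrades when that feature's values are shuffled. An illustrative scikit-learn sketch on synthetic data (not the study's cohort or models):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Two informative features and one pure-noise feature (synthetic data).
X = rng.normal(size=(400, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=400) > 0).astype(int)

clf = GradientBoostingClassifier(random_state=0).fit(X, y)
# Shuffle each column 10 times and record the mean drop in accuracy.
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
importances = result.importances_mean
```

Shuffling the strongly informative first feature costs the model far more accuracy than shuffling the noise column, which is exactly the signal PFI-based selection exploits.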
  • Tupasela, Aaro; Di Nucci, Ezio (2020)
    Machine learning platforms have emerged as a new promissory technology that some argue will revolutionize work practices across a broad range of professions, including medical care. During the past few years, IBM has been testing its Watson for Oncology platform at several oncology departments around the world. Published reports, news stories, as well as our own empirical research show that in some cases, the levels of concordance over recommended treatment protocols between the platform and human oncologists have been quite low. Other studies supported by IBM claim concordance rates as high as 96%. We use the Watson for Oncology case to examine the practice of using concordance levels between tumor boards and a machine learning decision-support system as a form of evidence. We address a challenge related to the epistemic authority between oncologists on tumor boards and the Watson Oncology platform by arguing that the use of concordance levels as a form of evidence of quality or trustworthiness is problematic. Although the platform provides links to the literature from which it draws its conclusion, it obfuscates the scoring criteria that it uses to value some studies over others. In other words, the platform "black boxes" the values that are coded into its scoring system.
  • Haatanen, Henri (Helsingin yliopisto, 2022)
    In the modern era, using personalization when reaching out to potential or current customers is essential for businesses to compete in their area of business. With large customer bases, this personalization becomes more difficult, so segmenting entire customer bases into smaller groups helps businesses focus better on personalization and targeted business decisions. These groups can be straightforward, like segmenting solely based on age, or more complex, like taking into account geographic, demographic, behavioral, and psychographic differences among the customers. In the latter case, customer segmentation should be performed with machine learning, which can help find more hidden patterns within the data. Often, the number of features in the customer data set is so large that some form of dimensionality reduction is needed. That is also the case with this thesis, which includes 12802 unique article tags that are desired to be included in the segmentation. A form of dimensionality reduction called feature hashing is selected for hashing the tags because it can accommodate new tags introduced in the future. Using hashed features in customer segmentation is a balancing act. With more hashed features, the evaluation metrics might give better results and the hashed features more closely resemble the unhashed article tag data, but with fewer hashed features the clustering process is faster and more memory-efficient, and the resulting clusters are more interpretable to the business. Three clustering algorithms, K-means, DBSCAN, and BIRCH, are tested with eight feature hashing bin sizes each, with promising results for K-means and BIRCH.
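The hashing trick described above maps an arbitrary, growing tag vocabulary into a fixed number of bins, so unseen future tags need no refitting. A minimal sketch with scikit-learn's FeatureHasher feeding K-means; the customers, tags, and bin size are toy choices, not the thesis data:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction import FeatureHasher

# Hypothetical customers described by the article tags they have read.
customers = [
    ["politics", "economy"],
    ["economy", "markets"],
    ["sports", "football"],
    ["football", "hockey"],
]

# Hash variable-length tag lists into 64 fixed bins (signed hashing).
hasher = FeatureHasher(n_features=64, input_type="string")
X = hasher.transform(customers)   # sparse (4, 64) matrix

# Cluster the hashed vectors; K-means accepts the sparse input directly.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
```

Shrinking `n_features` makes the matrix smaller and clustering faster but raises the chance of hash collisions between unrelated tags, which is precisely the balancing act the thesis describes.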
  • Kimari, Jyri; Jansson, Ville; Vigonski, Simon; Baibuz, Ekaterina; Domingos, Roberto; Zadin, Vahur; Djurabekova, Flyura (2020)
    Kinetic Monte Carlo (KMC) is an efficient method for studying diffusion. A limiting factor to the accuracy of KMC is the number of different migration events allowed in the simulation. Each event requires its own migration energy barrier, and the calculation of these barriers may be unfeasibly expensive. In this article we present a data set of migration barriers for nearest-neighbour jumps on Cu surfaces, calculated with the nudged elastic band (NEB) method and the tethering force approach. We used the data to train artificial neural networks (ANN) in order to predict the migration barriers for arbitrary nearest-neighbour Cu jumps. The trained ANNs are also included in the article. The data is hosted by the CSC IDA storage service.
  • Peddinti, Gopal; Cobb, Jeff; Yengo, Loic; Froguel, Philippe; Kravic, Jasmina; Balkau, Beverley; Tuomi, Tiinamaija; Aittokallio, Tero; Groop, Leif (2017)
    Aims/hypothesis: The aims of this study were to evaluate systematically the predictive power of comprehensive metabolomics profiles in predicting the future risk of type 2 diabetes, and to identify a panel of the most predictive metabolic markers. Methods: We applied an unbiased systems medicine approach to mine metabolite combinations that provide added value in predicting the future incidence of type 2 diabetes beyond known risk factors. We performed mass spectrometry-based targeted, as well as global untargeted, metabolomics, measuring a total of 568 metabolites, in a Finnish cohort of 543 nondiabetic individuals from the Botnia Prospective Study, which included 146 individuals who progressed to type 2 diabetes by the end of a 10 year follow-up period. Multivariate logistic regression was used to assess statistical associations, and regularised least-squares modelling was used to perform machine learning-based risk classification and marker selection. The predictive performance of the machine learning models and marker panels was evaluated using repeated nested cross-validation, and replicated in an independent French cohort of 1044 individuals, including 231 participants who progressed to type 2 diabetes during a 9 year follow-up period in the DESIR (Data from an Epidemiological Study on the Insulin Resistance Syndrome) study. Results: Nine metabolites were negatively associated (potentially protective) and 25 were positively associated with progression to type 2 diabetes. Machine learning models based on the entire metabolome predicted progression to type 2 diabetes (area under the receiver operating characteristic curve, AUC = 0.77) significantly better than the reference model based on clinical risk factors alone (AUC = 0.68; DeLong's p = 0.0009). The panel of metabolic markers selected by the machine learning-based feature selection also significantly improved the predictive performance over the reference model (AUC = 0.78; p = 0.00019; integrated discrimination improvement, IDI = 66.7%). This approach identified novel predictive biomarkers, such as alpha-tocopherol, bradykinin hydroxyproline, X-12063 and X-13435, which showed added value in predicting progression to type 2 diabetes when combined with known biomarkers such as glucose, mannose and alpha-hydroxybutyrate and routinely used clinical risk factors. Conclusions/interpretation: This study provides a panel of novel metabolic markers for future efforts aimed at the prevention of type 2 diabetes.
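Nested cross-validation, used above to avoid optimistic bias when a model is both tuned and evaluated, nests a hyperparameter search inside an outer evaluation loop. A compact scikit-learn sketch on synthetic data (logistic regression stands in here; the study used regularised least-squares models):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for a metabolomics matrix (samples x metabolites).
X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                           random_state=0)

# Inner loop: tune the regularisation strength C by 3-fold grid search.
inner = GridSearchCV(LogisticRegression(max_iter=1000),
                     {"C": [0.01, 0.1, 1.0]}, cv=3, scoring="roc_auc")
# Outer loop: 5-fold CV around the whole tuned pipeline gives an
# unbiased AUC estimate, since test folds never influence tuning.
outer_auc = cross_val_score(inner, X, y, cv=5, scoring="roc_auc")
mean_auc = outer_auc.mean()
```

Repeating the outer loop with different fold seeds ("repeated" nested CV, as in the study) further stabilises the estimate.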
  • Honkela, Antti; Das, Mrinal; Nieminen, Arttu; Dikmen, Onur; Kaski, Samuel (2018)
    Background: Users of a personalised recommendation system face a dilemma: recommendations can be improved by learning from data, but only if other users are willing to share their private information. Good personalised predictions are vitally important in precision medicine, but genomic information on which the predictions are based is also particularly sensitive, as it directly identifies the patients and hence cannot easily be anonymised. Differential privacy has emerged as a potentially promising solution: privacy is considered sufficient if the presence of individual patients cannot be distinguished. However, differentially private learning with current methods does not improve predictions with feasible data sizes and dimensionalities. Results: We show that useful predictors can be learned under powerful differential privacy guarantees, and even from moderately sized data sets, by demonstrating significant improvements in the accuracy of private drug sensitivity prediction with a new robust private regression method. Our method matches the predictive accuracy of the state-of-the-art non-private lasso regression using only 4x more samples under relatively strong differential privacy guarantees. Good performance with limited data is achieved by limiting the sharing of private information by decreasing the dimensionality and by projecting outliers to fit tighter bounds, therefore needing to add less noise for equal privacy. Conclusions: The proposed differentially private regression method combines theoretical appeal and asymptotic efficiency with good prediction accuracy even with moderate-sized data. As even the simple-to-implement method already shows promise on the challenging genomic data, we anticipate rapid progress towards practical applications in many fields.
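The privacy-utility trade-off described in the entry above (tighter bounds on the data mean less noise for the same privacy guarantee) shows up already in the simplest differentially private primitive, a Laplace-noised mean. The sketch below is the generic Laplace mechanism, not the authors' robust regression method:

```python
import numpy as np

def private_mean(values, lower, upper, epsilon, rng):
    """Epsilon-differentially private mean via the Laplace mechanism.

    Values are clipped to [lower, upper], so one individual can shift
    the mean by at most (upper - lower) / n; that sensitivity sets the
    Laplace noise scale. Tighter bounds mean less noise, mirroring the
    paper's point about projecting outliers to fit tighter bounds.
    """
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    noise = rng.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

rng = np.random.default_rng(0)
# Synthetic measurements with true mean 5.0.
data = rng.normal(loc=5.0, scale=1.0, size=1000)
estimate = private_mean(data, lower=0.0, upper=10.0, epsilon=1.0, rng=rng)
```

With 1000 records and bounds [0, 10], the sensitivity is only 0.01, so the private estimate stays close to the true mean even at a strong epsilon = 1 guarantee.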
  • Pitkänen, Johanna; Koikkalainen, Juha; Nieminen, Tuomas; Marinkovic, Ivan; Curtze, Sami; Sibolt, Gerli; Jokinen, Hanna; Rueckert, Daniel; Barkhof, Frederik; Schmidt, Reinhold; Pantoni, Leonardo; Scheltens, Philip; Wahlund, Lars-Olof; Korvenoja, Antti; Lötjönen, Jyrki; Erkinjuntti, Timo J; Melkas, Susanna (2020)
    Purpose: The severity of white matter lesions (WML) is typically evaluated on magnetic resonance images (MRI), yet the more accessible, faster, and less expensive method is computed tomography (CT). Our objective was to study whether WML can be automatically segmented from CT images using a convolutional neural network (CNN). The second aim was to compare CT segmentation with MRI segmentation. Methods: The brain images from the Helsinki University Hospital clinical image archive were systematically screened to make CT-MRI image pairs. The selection criterion for the study was that both CT and MRI images were acquired within 6 weeks. In total, 147 image pairs were included. We used a CNN to segment WML from CT images. Training and testing of the CNN for CT was performed using 10-fold cross-validation, and the segmentation results were compared with the corresponding segmentations from MRI. Results: A Pearson correlation of 0.94 was obtained between the automatic WML volumes of the MRI and CT segmentations. The average Dice similarity index validating the overlap between CT and FLAIR segmentations was 0.68 for the Fazekas 3 group. Conclusion: CNN-based segmentation of CT images may provide a means to evaluate the severity of WML and establish a link between CT WML patterns and the current standard MRI-based visual rating scale.
  • Hokkinen, Lasse M I; Mäkelä, Teemu Olavi; Savolainen, Sauli; Kangasniemi, Marko Matti (2021)
    Background: Computed tomography angiography (CTA) imaging is needed in current guideline-based stroke diagnosis, and infarct core size is one factor in guiding treatment decisions. We studied the efficacy of a convolutional neural network (CNN) in final infarct volume prediction from CTA and compared the results to those of CT perfusion (CTP)-based commercially available software (RAPID, iSchemaView). Methods: We retrospectively selected 83 consecutive stroke cases treated with thrombolytic therapy or receiving supportive care that presented to Helsinki University Hospital between January 2018 and July 2019. We compared CNN-derived ischaemic lesion volumes to final infarct volumes that were manually segmented from follow-up CT and to CTP-RAPID ischaemic core volumes. Results: An overall correlation of r = 0.83 was found between CNN outputs and final infarct volumes. The strongest correlation was found in a subgroup of patients that presented more than 9 h after symptom onset (r = 0.90). A good correlation was found between the CNN outputs and CTP-RAPID ischaemic core volumes (r = 0.89), and the CNN was able to classify patients for thrombolytic therapy or supportive care with a 1.00 sensitivity and 0.94 specificity. Conclusions: A CTA-based CNN software can provide good infarct core volume estimates, as observed in follow-up imaging studies. CNN-derived infarct volumes had a good correlation to CTP-RAPID ischaemic core volumes.