Detection of COVID-19 infected patients and patient deterioration from regular laboratory test results with Machine Learning

Permalink

http://urn.fi/URN:NBN:fi:hulib-202012084741
Title: Detection of COVID-19 infected patients and patient deterioration from regular laboratory test results with Machine Learning
Author: Roy, Suravi Saha
Other contributor: Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta
University of Helsinki, Faculty of Science
Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten
Publisher: Helsingin yliopisto
Date: 2020
Language: eng
URI: http://urn.fi/URN:NBN:fi:hulib-202012084741
http://hdl.handle.net/10138/322489
Thesis level: master's thesis
Discipline: Computer Science
Abstract: COVID-19 began in December 2019 in Wuhan, China. It has since spread around the globe and was declared a global pandemic by the World Health Organization (WHO) in early March 2020. Since the pandemic started, the number of infections has grown exponentially. Currently, COVID-19 cases are rising globally, with 3.6 million new cases, and new deaths are growing by 21% weekly. Since the beginning of the pandemic, the outbreak has caused more than 55.6 million infections and more than 1.34 million deaths worldwide. The reverse transcription polymerase chain reaction (RT-PCR) test is the best protocol currently in use to detect COVID-19 positive patients. In low-resource settings, especially in developing countries with large populations, the RT-PCR test is not always a viable option because it is expensive, time-consuming, and requires trained professionals. With the overwhelming number of infected cases, there is a significant need for a substitute that is cheaper, faster, and more accessible. To that end, this study develops machine learning classification models to detect COVID-19 positive patients and to predict patient deterioration in the presence of missing data, using a dataset published by Hospital Israelita Albert Einstein in São Paulo, Brazil. The dataset consists of 5644 anonymized samples from patients who visited the hospital and were tested by RT-PCR, along with additional laboratory test results, providing 111 clinical features. More than 90% of the values in this dataset are missing. To explore missing data analysis on COVID-19 clinical data, the study reports a comparison between a complete case analysis and an imputed case analysis. The results show that a logistic regression model combined with multivariate imputation by chained equations (MICE) achieves 91% and 85% sensitivity for detecting COVID-19 positive patients and predicting patient deterioration, respectively. The area under the receiver operating characteristic curve (AUC) is 93% and 89% for the two tasks, respectively. Sensitivity and AUC are chosen to evaluate model performance because false negatives are harmful in patient screening and triaging. The proposed pipeline is an alternative approach to COVID-19 diagnosis and prognosis. Using low-cost, readily available laboratory test results, clinicians can employ this pipeline for early screening of suspected COVID-19 patients, for triaging medical procedures, and as a secondary diagnostic tool for deciding a patient's treatment priority.
Subject: COVID-19 detection
RT-PCR test
laboratory results
machine learning
missing data imputation
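
The abstract evaluates its models by sensitivity and AUC because false negatives are costly in screening. The record does not include the thesis code, so the following is only a minimal sketch of how these two metrics are commonly computed, using the standard textbook definitions. The labels, scores, thresholds, and function names are invented for illustration and are not taken from the thesis or its dataset.

```python
def sensitivity(y_true, y_pred):
    """Fraction of actual positives predicted positive: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the rank-based view of ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict(scores, threshold):
    """Turn predicted probabilities into 0/1 labels at a given threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy data, invented for illustration: 1 = RT-PCR positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.45, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]

sens_default = sensitivity(y_true, predict(scores, 0.5))  # two positives missed
sens_low = sensitivity(y_true, predict(scores, 0.3))      # no positives missed
auc_value = auc(y_true, scores)                           # threshold-independent

print(sens_default, sens_low, round(auc_value, 3))
```

On this toy data, lowering the decision threshold raises sensitivity at the price of more false positives, while AUC summarizes ranking quality across all thresholds. That difference is one reason to report both metrics for a screening tool.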