Browsing by Subject "Random forest"

Now showing items 1-3 of 3
  • Fung, Pak L.; Zaidan, Martha A.; Timonen, Hilkka; Niemi, Jarkko V.; Kousa, Anu; Kuula, Joel; Luoma, Krista; Tarkoma, Sasu; Petäjä, Tuukka; Kulmala, Markku; Hussein, Tareq (2021)
    Air quality prediction with black-box (BB) modelling is gaining widespread interest in research and industry. Data-driven models of this type generally achieve higher accuracy, but they are limited in their ability to capture physical, chemical and meteorological processes and are therefore harder to interpret. In this paper, we evaluated different white-box (WB) and BB methods that estimate atmospheric black carbon (BC) concentration from a suite of observations at the same measurement site. This study uses data from 1 January 2017 to 31 December 2018 from two measurement sites in Helsinki, Finland: a street canyon site in Mäkelänkatu and an urban background site in Kumpula. At the street canyon site, the WB models (R² = 0.81–0.87) performed similarly to the BB models (R² = 0.86–0.87). The overall performance of the BC concentration estimation methods at the urban background site was much worse, probably because of a combination of smaller dynamic variability in the BC values and longer data gaps. However, the difference between the WB models (R² = 0.44–0.60) and the BB models (R² = 0.41–0.64) was not significant. Furthermore, the WB models are closer to physics-based models, and it is easier to see the relative importance of the predictor variables and to determine whether the model output makes sense. This feature outweighs the slightly higher performance of some individual BB models, and the WB models are inherently a better choice because of the transparency of their model architecture. Among the WB models, IAP and LASSO are recommended for their flexibility and efficiency, respectively. Our findings also confirm the importance of temporal properties in statistical modelling. In the future, the developed BC estimation model could serve as a virtual sensor and complement current air quality monitoring.
  • Pirneskoski, Jussi; Tamminen, Joonas; Kallonen, Antti; Nurmi, Jouni; Kuisma, Markku; Olkkola, Klaus T.; Hoppu, Sanna (2020)
    Aim of the study: The National Early Warning Score (NEWS) is a validated method for predicting clinical deterioration in hospital wards, but its performance in prehospital settings remains controversial. Modern machine learning models may outperform traditional statistical analyses for predicting short-term mortality. Thus, we aimed to compare the mortality prediction accuracy of NEWS and random forest machine learning using prehospital vital signs. Methods: In this retrospective study, all electronic ambulance mission reports between 2008 and 2015 in a single EMS system were collected. Adult patients (≥ 18 years) were included in the analysis. Random forest models with and without blood glucose were compared to the traditional NEWS for predicting one-day mortality. Ten-fold cross-validation was used to train and validate the random forest models. Results: A total of 26,458 patients were included in the study, of whom 278 (1.0%) died within one day of the ambulance mission. The area under the receiver operating characteristic curve for one-day mortality was 0.836 (95% CI, 0.810-0.860) for NEWS, 0.858 (95% CI, 0.832-0.883) for a random forest trained with NEWS variables only and 0.868 (0.843-0.892) for a random forest trained with NEWS variables and blood glucose. Conclusion: A random forest algorithm trained with NEWS variables was superior to traditional NEWS for predicting one-day mortality in adult prehospital patients, although the risk of selection bias must be acknowledged. The inclusion of blood glucose in the model further improved its predictive performance.
  • Imangholiloo, Mohammad (Helsingin yliopisto, 2017)
    Land use and land cover maps are vital sources of information for many uses. Recently, high-resolution, open-access satellite images have become preferred for mapping large areas, and the Sentinel satellites provide such images. This study was designed to analyze the potential of Sentinel-1A SAR images for land use mapping in Pakistan. Machine learning methods were employed for image analysis. The random forest classifier performed significantly better than the other methods in the training step, so we selected it for parameter tuning. After several image processing steps, we classified the final image into 23 classes and achieved an overall accuracy of 42%. The present study showed the potential advantages of using Sentinel-1 images in land use mapping and highlighted some characteristics of Sentinel-1A images. This study also compares the results with an earlier study that used Landsat-8 optical multispectral images over the same area. As in the prior study, overestimation in dominant classes and underestimation in rare classes were observed. The method and findings of this study could be beneficial for future studies using Sentinel-1A images for land use/cover mapping over large areas.