Evaluation of white-box versus black-box machine learning models in estimating ambient black carbon concentration

Permalink

http://hdl.handle.net/10138/326666

Citation

Fung, P L, Zaidan, M A, Timonen, H, Niemi, J V, Kousa, A, Kuula, J, Luoma, K, Tarkoma, S, Petäjä, T, Kulmala, M & Hussein, T 2021, 'Evaluation of white-box versus black-box machine learning models in estimating ambient black carbon concentration', Journal of Aerosol Science, vol. 152, 105694. https://doi.org/10.1016/j.jaerosci.2020.105694

Title: Evaluation of white-box versus black-box machine learning models in estimating ambient black carbon concentration
Author: Fung, Pak L.; Zaidan, Martha A.; Timonen, Hilkka; Niemi, Jarkko V.; Kousa, Anu; Kuula, Joel; Luoma, Krista; Tarkoma, Sasu; Petäjä, Tuukka; Kulmala, Markku; Hussein, Tareq
Other contributor: University of Helsinki, Air quality research group
University of Helsinki, Institute for Atmospheric and Earth System Research (INAR)
University of Helsinki, INAR Physics
University of Helsinki, Content-Centric Structures and Networking research group / Sasu Tarkoma
Date: 2021-02
Language: eng
Number of pages: 21
Belongs to series: Journal of Aerosol Science
ISSN: 0021-8502
DOI: https://doi.org/10.1016/j.jaerosci.2020.105694
URI: http://hdl.handle.net/10138/326666
Abstract: Air quality prediction with black-box (BB) modelling is gaining widespread interest in research and industry. Data-driven models of this type generally achieve higher accuracy, but they are limited in their ability to capture physical, chemical and meteorological processes and are therefore harder to interpret. In this paper, we evaluated different white-box (WB) and BB methods that estimate atmospheric black carbon (BC) concentration from a suite of observations made at the same measurement site. The study uses data from 1 January 2017 to 31 December 2018 from two measurement sites in Helsinki, Finland: a street canyon site in Mäkelänkatu and an urban background site in Kumpula. At the street canyon site, the WB models (R² = 0.81–0.87) performed similarly to the BB models (R² = 0.86–0.87). The overall performance of the BC concentration estimation methods at the urban background site was much worse, probably because of a combination of smaller dynamic variability in the BC values and longer data gaps. However, the difference between the WB models (R² = 0.44–0.60) and the BB models (R² = 0.41–0.64) was not significant. Furthermore, the WB models are closer to physics-based models, and it is easier to spot the relative importance of the predictor variables and to determine whether the model output makes sense. This feature outweighs the slightly higher performance of some individual BB models, and the WB models are inherently a better choice because of the transparency of their model architecture. Among the WB models, the input-adaptive proxy (IAP) and LASSO are recommended for their flexibility and efficiency, respectively. Our findings also confirm the importance of temporal properties in statistical modelling. In the future, the developed BC estimation model could serve as a virtual sensor and complement current air quality monitoring.
Subject: ABSORPTION
AEROSOL-PARTICLES
AIR-POLLUTION
Air quality
HELSINKI
IMPUTATION
Input-adaptive proxy
NETWORK
Neural network
POLLUTANTS
PREDICTION
REGRESSION
Random forest
Statistical modelling
Support vector
URBAN
114 Physical sciences
Rights:


Files in this item

Files Size Format View
1_s2.0_S0021850220301798_main.pdf 15.71Mb PDF View/Open
