Mutual Information Input Selector and Probabilistic Machine Learning Utilisation for Air Pollution Proxies


Permalink

http://hdl.handle.net/10138/310258

Citation

Zaidan, M A, Dada, L, Alghamdi, M A, Al-Jeelani, H, Lihavainen, H, Hyvärinen, A & Hussein, T 2019, 'Mutual Information Input Selector and Probabilistic Machine Learning Utilisation for Air Pollution Proxies', Applied sciences (Basel), vol. 9, no. 20, 4475. https://doi.org/10.3390/app9204475

Title: Mutual Information Input Selector and Probabilistic Machine Learning Utilisation for Air Pollution Proxies
Author: Zaidan, Martha A.; Dada, Lubna; Alghamdi, Mansour A.; Al-Jeelani, Hisham; Lihavainen, Heikki; Hyvärinen, Antti; Hussein, Tareq
Contributor organization: Global Atmosphere-Earth surface feedbacks
INAR Physics
Air quality research group
Department of Physics
Date: 2019-10
Language: eng
Number of pages: 20
Belongs to series: Applied sciences (Basel)
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app9204475
URI: http://hdl.handle.net/10138/310258
Abstract: An air pollutant proxy is a mathematical model that estimates an unobserved air pollutant using other measured variables. The proxy is advantageous to fill missing data in a research campaign or to substitute a real measurement for minimising the cost as well as the operators involved (i.e., virtual sensor). In this paper, we present a generic concept of pollutant proxy development based on an optimised data-driven approach. We propose a mutual information concept to determine the interdependence of different variables and thus select the most correlated inputs. The most relevant variables are selected to be the best proxy inputs, where several metrics and data loss are also involved for guidance. The input selection method determines the used data for training pollutant proxies based on a probabilistic machine learning method. In particular, we use a Bayesian neural network that naturally prevents overfitting and provides confidence intervals around its output prediction. In this way, the prediction uncertainty could be assessed and evaluated. In order to demonstrate the effectiveness of our approach, we test it on an extensive air pollution database to estimate ozone concentration.
Subject: 114 Physical sciences
213 Electronic, automation and communications engineering, electronics
1172 Environmental sciences
Peer reviewed: Yes
Rights: cc_by
Usage restriction: openAccess
Self-archived version: publishedVersion


Files in this item

Files Size Format View
applsci_09_04475_v2.pdf 1.611Mb PDF View/Open
