TY - JOUR
T1 - Deep Ensemble Machine Learning Framework for the Estimation of PM2.5 Concentrations
AU - Yu, Wenhua
AU - Li, Shanshan
AU - Ye, Tingting
AU - Xu, Rongbin
AU - Song, Jiangning
AU - Guo, Yuming
N1 - Funding Information:
This study was supported by the Australian Research Council (DP210102076) and the Australian National Health and Medical Research Council (NHMRC, APP2000581). Y.G. is supported by an NHMRC Career Development Fellowship (APP1163693) and a Leader Fellowship (APP2008813). S.L. is supported by an NHMRC Emerging Leader Fellowship (APP2009866). W.Y. was supported by a Monash Graduate Scholarship, a Monash International Tuition Scholarship, and the CAR PhD Top-up Scholarship. R.X. and T.Y. were supported by the China Scholarship Council (201806010405 and 201906320051).
The authors thank F. Li from the Peter Doherty Institute for Infection and Immunity, University of Melbourne, and M. Liu from the Department of Civil Engineering, Monash University, for assistance in proofreading the manuscript and drawing figures.
Publisher Copyright:
© 2022, Public Health Services, US Dept of Health and Human Services. All rights reserved.
PY - 2022/3
Y1 - 2022/3
N2 - BACKGROUND: Accurate estimation of historical PM2.5 (particulate matter with an aerodynamic diameter of less than 2.5 μm) is essential for environmental health risk assessment. OBJECTIVES: The aim of this study was to develop a multiple-level stacked ensemble machine learning framework to improve the estimation of daily ground-level PM2.5 concentrations. METHODS: We developed an innovative deep ensemble machine learning framework (DEML) to estimate daily PM2.5 concentrations. The framework has a three-stage structure. In the first stage, four base models [gradient boosting machine (GBM), support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost)] were used to generate a new data set of PM2.5 predictions for training the next-stage learners. In the second stage, three meta-models [RF, XGBoost, and generalized linear model (GLM)] estimated PM2.5 concentrations from a combination of the original data set and the first-stage predictions. In the third stage, a nonnegative least squares (NNLS) algorithm was used to obtain the optimal weights for combining the second-stage estimates into the final PM2.5 estimate. Using data from 133 monitoring stations in Italy as a case study, we implemented DEML to predict daily PM2.5 for each 1 km × 1 km grid cell across Italy from 2015 to 2019. We evaluated model performance with 10-fold cross-validation (CV) and compared DEML with five benchmark algorithms [GBM, SVM, RF, XGBoost, and Super Learner (SL)]. RESULTS: DEML predicted PM2.5 more accurately [coefficient of determination (R²) = 0.87; root mean square error (RMSE) = 5.38 μg/m³] than any of the benchmark models, whose R² values were 0.51, 0.76, 0.83, 0.70, and 0.83 for GBM, SVM, RF, XGBoost, and SL, respectively. DEML reliably captured the spatiotemporal variations of PM2.5 in Italy.
DISCUSSION: The proposed DEML framework achieved outstanding performance in PM2.5 estimation and could serve as a tool for more accurate environmental exposure assessment. https://doi.org/10.1289/EHP9752.
UR - http://www.scopus.com/inward/record.url?scp=85125969841&partnerID=8YFLogxK
U2 - 10.1289/EHP9752
DO - 10.1289/EHP9752
M3 - Article
C2 - 35254864
AN - SCOPUS:85125969841
SN - 0091-6765
VL - 130
JO - Environmental Health Perspectives
JF - Environmental Health Perspectives
IS - 3
M1 - 037004
ER -