TY - JOUR
T1 - A machine learning method to estimate PM2.5 concentrations across China with remote sensing, meteorological and land use information
AU - Chen, Gongbo
AU - Li, Shanshan
AU - Knibbs, Luke D.
AU - Hamm, N. A.S.
AU - Cao, Wei
AU - Li, Tiantian
AU - Guo, Jianping
AU - Ren, Hongyan
AU - Abramson, Michael J.
AU - Guo, Yuming
PY - 2018/9/15
Y1 - 2018/9/15
N2 - Background: Machine learning algorithms have very high predictive ability. However, no study has used machine learning to estimate historical concentrations of PM2.5 (particulate matter with aerodynamic diameter ≤ 2.5 μm) at a daily time scale in China at a national level. Objectives: To estimate daily concentrations of PM2.5 across China during 2005–2016. Methods: Daily ground-level PM2.5 data were obtained from 1479 stations across China during 2014–2016. Data on aerosol optical depth (AOD), meteorological conditions and other predictors were downloaded. A random forests model (a non-parametric machine learning algorithm) and two traditional regression models were developed to estimate ground-level PM2.5 concentrations. The best-fit model was then used to estimate the daily concentrations of PM2.5 across China at a resolution of 0.1° (≈10 km) during 2005–2016. Results: The daily random forests model showed much higher predictive accuracy than the two traditional regression models, explaining the majority of spatial variability in daily PM2.5 [10-fold cross-validation (CV) R² = 83%, root mean squared prediction error (RMSE) = 28.1 μg/m³]. At the monthly and annual time scales, the explained variability of average PM2.5 increased up to 86% (RMSE = 10.7 μg/m³ and 6.9 μg/m³, respectively). Conclusions: Taking advantage of a novel application of a modeling framework and the most recent ground-level PM2.5 observations, the machine learning method showed higher predictive ability than previous studies. Capsule: The random forests approach can be used to estimate historical exposure to PM2.5 in China with high accuracy.
KW - Aerosol optical depth
KW - China
KW - Machine learning
KW - PM2.5
KW - Random forests
UR - http://www.scopus.com/inward/record.url?scp=85046130050&partnerID=8YFLogxK
U2 - 10.1016/j.scitotenv.2018.04.251
DO - 10.1016/j.scitotenv.2018.04.251
M3 - Article
AN - SCOPUS:85046130050
VL - 636
SP - 52
EP - 60
JO - Science of the Total Environment
JF - Science of the Total Environment
SN - 0048-9697
ER -