A machine learning method to estimate PM2.5 concentrations across China with remote sensing, meteorological and land use information

Gongbo Chen, Shanshan Li, Luke D. Knibbs, N. A.S. Hamm, Wei Cao, Tiantian Li, Jianping Guo, Hongyan Ren, Michael J. Abramson, Yuming Guo

Research output: Contribution to journal › Article › Research › peer-review

39 Citations (Scopus)

Abstract

Background: Machine learning algorithms have very high predictive ability. However, no study has used machine learning to estimate historical concentrations of PM2.5 (particulate matter with aerodynamic diameter ≤ 2.5 μm) at daily time scale in China at a national level. Objectives: To estimate daily concentrations of PM2.5 across China during 2005–2016. Methods: Daily ground-level PM2.5 data were obtained from 1479 stations across China during 2014–2016. Data on aerosol optical depth (AOD), meteorological conditions and other predictors were downloaded. A random forests model (non-parametric machine learning algorithms) and two traditional regression models were developed to estimate ground-level PM2.5 concentrations. The best-fit model was then utilized to estimate the daily concentrations of PM2.5 across China with a resolution of 0.1° (≈10 km) during 2005–2016. Results: The daily random forests model showed much higher predictive accuracy than the other two traditional regression models, explaining the majority of spatial variability in daily PM2.5 [10-fold cross-validation (CV) R2 = 83%, root mean squared prediction error (RMSE) = 28.1 μg/m3]. At the monthly and annual time-scale, the explained variability of average PM2.5 increased up to 86% (RMSE = 10.7 μg/m3 and 6.9 μg/m3, respectively). Conclusions: Taking advantage of a novel application of modeling framework and the most recent ground-level PM2.5 observations, the machine learning method showed higher predictive ability than previous studies. Capsule: Random forests approach can be used to estimate historical exposure to PM2.5 in China with high accuracy.
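The validation metrics quoted in the abstract (10-fold cross-validation R2 and RMSE) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the predictor `aod` and response `pm25` are hypothetical stand-ins, and a one-variable least-squares line replaces the random forests model so that the example needs only the Python standard library. R2 is computed here as the coefficient of determination on pooled out-of-fold predictions, which is an assumption; the paper may define it differently.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in model, not a random forest)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cross_validate(xs, ys, k=10, seed=0):
    """Return (R2, RMSE) from pooled out-of-fold predictions."""
    n = len(xs)
    preds = [0.0] * n
    for fold in k_fold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        # Fit on the other k-1 folds, then predict only the held-out fold.
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Synthetic data: an AOD-like predictor and a noisy PM2.5-like response.
rng = random.Random(42)
aod = [rng.uniform(0.1, 1.5) for _ in range(500)]
pm25 = [20.0 + 60.0 * x + rng.gauss(0.0, 10.0) for x in aod]

r2, rmse = cross_validate(aod, pm25, k=10)
print(f"10-fold CV R2 = {r2:.2f}, RMSE = {rmse:.1f}")
```

Because every observation is predicted by a model that never saw it during fitting, the pooled R2 and RMSE estimate out-of-sample accuracy rather than goodness of fit. Swapping `fit_line` for a random forest regressor would leave `cross_validate` unchanged.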

Original language: English
Pages (from-to): 52-60
Number of pages: 9
Journal: Science of the Total Environment
Volume: 636
DOIs: 10.1016/j.scitotenv.2018.04.251
Publication status: Published - 15 Sep 2018

Keywords

  • Aerosol optical depth
  • China
  • Machine learning
  • PM2.5
  • Random forests

Cite this

@article{73e0cdbe9905449ea4191361f4e11a75,
title = "A machine learning method to estimate PM2.5 concentrations across China with remote sensing, meteorological and land use information",
abstract = "Background: Machine learning algorithms have very high predictive ability. However, no study has used machine learning to estimate historical concentrations of PM2.5 (particulate matter with aerodynamic diameter ≤ 2.5 μm) at daily time scale in China at a national level. Objectives: To estimate daily concentrations of PM2.5 across China during 2005–2016. Methods: Daily ground-level PM2.5 data were obtained from 1479 stations across China during 2014–2016. Data on aerosol optical depth (AOD), meteorological conditions and other predictors were downloaded. A random forests model (non-parametric machine learning algorithms) and two traditional regression models were developed to estimate ground-level PM2.5 concentrations. The best-fit model was then utilized to estimate the daily concentrations of PM2.5 across China with a resolution of 0.1° (≈10 km) during 2005–2016. Results: The daily random forests model showed much higher predictive accuracy than the other two traditional regression models, explaining the majority of spatial variability in daily PM2.5 [10-fold cross-validation (CV) R2 = 83{\%}, root mean squared prediction error (RMSE) = 28.1 μg/m3]. At the monthly and annual time-scale, the explained variability of average PM2.5 increased up to 86{\%} (RMSE = 10.7 μg/m3 and 6.9 μg/m3, respectively). Conclusions: Taking advantage of a novel application of modeling framework and the most recent ground-level PM2.5 observations, the machine learning method showed higher predictive ability than previous studies. Capsule: Random forests approach can be used to estimate historical exposure to PM2.5 in China with high accuracy.",
keywords = "Aerosol optical depth, China, Machine learning, PM2.5, Random forests",
author = "Gongbo Chen and Shanshan Li and Knibbs, {Luke D.} and Hamm, {N. A.S.} and Wei Cao and Tiantian Li and Jianping Guo and Hongyan Ren and Abramson, {Michael J.} and Yuming Guo",
year = "2018",
month = "9",
day = "15",
doi = "10.1016/j.scitotenv.2018.04.251",
language = "English",
volume = "636",
pages = "52--60",
journal = "Science of the Total Environment",
issn = "0048-9697",
publisher = "Elsevier",

}

A machine learning method to estimate PM2.5 concentrations across China with remote sensing, meteorological and land use information. / Chen, Gongbo; Li, Shanshan; Knibbs, Luke D.; Hamm, N. A.S.; Cao, Wei; Li, Tiantian; Guo, Jianping; Ren, Hongyan; Abramson, Michael J.; Guo, Yuming.

In: Science of the Total Environment, Vol. 636, 15.09.2018, p. 52-60.

TY - JOUR

T1 - A machine learning method to estimate PM2.5 concentrations across China with remote sensing, meteorological and land use information

AU - Chen, Gongbo

AU - Li, Shanshan

AU - Knibbs, Luke D.

AU - Hamm, N. A.S.

AU - Cao, Wei

AU - Li, Tiantian

AU - Guo, Jianping

AU - Ren, Hongyan

AU - Abramson, Michael J.

AU - Guo, Yuming

PY - 2018/9/15

Y1 - 2018/9/15

N2 - Background: Machine learning algorithms have very high predictive ability. However, no study has used machine learning to estimate historical concentrations of PM2.5 (particulate matter with aerodynamic diameter ≤ 2.5 μm) at daily time scale in China at a national level. Objectives: To estimate daily concentrations of PM2.5 across China during 2005–2016. Methods: Daily ground-level PM2.5 data were obtained from 1479 stations across China during 2014–2016. Data on aerosol optical depth (AOD), meteorological conditions and other predictors were downloaded. A random forests model (non-parametric machine learning algorithms) and two traditional regression models were developed to estimate ground-level PM2.5 concentrations. The best-fit model was then utilized to estimate the daily concentrations of PM2.5 across China with a resolution of 0.1° (≈10 km) during 2005–2016. Results: The daily random forests model showed much higher predictive accuracy than the other two traditional regression models, explaining the majority of spatial variability in daily PM2.5 [10-fold cross-validation (CV) R2 = 83%, root mean squared prediction error (RMSE) = 28.1 μg/m3]. At the monthly and annual time-scale, the explained variability of average PM2.5 increased up to 86% (RMSE = 10.7 μg/m3 and 6.9 μg/m3, respectively). Conclusions: Taking advantage of a novel application of modeling framework and the most recent ground-level PM2.5 observations, the machine learning method showed higher predictive ability than previous studies. Capsule: Random forests approach can be used to estimate historical exposure to PM2.5 in China with high accuracy.

AB - Background: Machine learning algorithms have very high predictive ability. However, no study has used machine learning to estimate historical concentrations of PM2.5 (particulate matter with aerodynamic diameter ≤ 2.5 μm) at daily time scale in China at a national level. Objectives: To estimate daily concentrations of PM2.5 across China during 2005–2016. Methods: Daily ground-level PM2.5 data were obtained from 1479 stations across China during 2014–2016. Data on aerosol optical depth (AOD), meteorological conditions and other predictors were downloaded. A random forests model (non-parametric machine learning algorithms) and two traditional regression models were developed to estimate ground-level PM2.5 concentrations. The best-fit model was then utilized to estimate the daily concentrations of PM2.5 across China with a resolution of 0.1° (≈10 km) during 2005–2016. Results: The daily random forests model showed much higher predictive accuracy than the other two traditional regression models, explaining the majority of spatial variability in daily PM2.5 [10-fold cross-validation (CV) R2 = 83%, root mean squared prediction error (RMSE) = 28.1 μg/m3]. At the monthly and annual time-scale, the explained variability of average PM2.5 increased up to 86% (RMSE = 10.7 μg/m3 and 6.9 μg/m3, respectively). Conclusions: Taking advantage of a novel application of modeling framework and the most recent ground-level PM2.5 observations, the machine learning method showed higher predictive ability than previous studies. Capsule: Random forests approach can be used to estimate historical exposure to PM2.5 in China with high accuracy.

KW - Aerosol optical depth

KW - China

KW - Machine learning

KW - PM2.5

KW - Random forests

UR - http://www.scopus.com/inward/record.url?scp=85046130050&partnerID=8YFLogxK

U2 - 10.1016/j.scitotenv.2018.04.251

DO - 10.1016/j.scitotenv.2018.04.251

M3 - Article

VL - 636

SP - 52

EP - 60

JO - Science of the Total Environment

JF - Science of the Total Environment

SN - 0048-9697

ER -