Conference: Joint annual meeting of the International Society of Exposure Science and the International Society for Environmental Epidemiology

An Ensemble Machine-Learning Model to Predict Historical PM2.5 Concentrations in China from Satellite Data

Abstract

Background: The long record of satellite aerosol data enables assessment of historical PM2.5 levels in developing countries such as China, where routine PM2.5 monitoring began only recently. However, most previous studies reported reduced prediction accuracy when their models predicted PM2.5 levels outside the model-training period. This limitation greatly hinders the use of satellite-driven exposure assessments in research on the health effects of long-term PM2.5 exposure.

Objectives: We proposed an ensemble machine-learning approach that provides reliable PM2.5 hindcasts in China.

Methods: Cloud cover causes satellite data to be missing non-randomly. These gaps were first filled by multiple imputation to ensure unbiased long-term exposure estimates. The modeling domain, China, was then divided into seven regions using a spatial clustering method to control for unobserved spatial heterogeneity. A set of machine-learning models was trained separately in each region: random forest, a generalized additive model, and extreme gradient boosting. Finally, a generalized additive ensemble model was developed to combine the predictions from the different algorithms.

Results: The ensemble prediction characterized the spatiotemporal distribution of daily PM2.5 well, with a cross-validation (CV) R² of 0.79 and an RMSE of 21 μg/m³. The cluster-based sub-region models outperformed national models, improving the CV R² by about 0.05. Compared with previous studies, our model provided more accurate hindcasts at both the daily level (R² = 0.53, RMSE = 28 μg/m³) and the monthly level (R² = 0.81, RMSE = 13 μg/m³).

Conclusions: Our hindcast modeling system allows the construction of long-term, unbiased historical PM2.5 levels. These levels can support epidemiologic studies of the chronic health effects of PM2.5 in China.
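
The gap-filling step can be illustrated with a simplified multiple-imputation sketch. The abstract does not describe the imputation model, so everything below is an assumption. AOD (aerosol optical depth) is regressed on a few fully observed covariates. Hypothetically, these are quantities such as cloud fraction or relative humidity that help explain why retrievals are missing. Conditioning on them is what makes a missing-at-random assumption plausible. Each of the `m` completed datasets fills the gaps with the model prediction plus a random residual draw. This is stochastic regression imputation repeated `m` times. A fully proper multiple imputation would also draw the regression coefficients from their sampling distribution. `impute_aod` is a hypothetical helper name.

```python
import numpy as np


def impute_aod(X, aod, m=5, seed=0):
    """Return m completed copies of `aod` with the NaN (cloud) gaps filled.

    X   : (n, p) fully observed covariates (hypothetical predictors)
    aod : (n,) AOD values, np.nan where clouds blocked the retrieval
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(aod)
    missing = ~observed
    A = np.column_stack([np.ones(len(aod)), X])  # intercept + covariates
    # Fit the imputation model on the observed rows only.
    beta, *_ = np.linalg.lstsq(A[observed], aod[observed], rcond=None)
    resid = aod[observed] - A[observed] @ beta
    sigma = resid.std(ddof=A.shape[1])  # residual SD, corrected for fitted parameters
    completed = []
    for _ in range(m):
        filled = aod.copy()
        # Prediction plus a random residual keeps imputed values as variable as real ones.
        filled[missing] = A[missing] @ beta + rng.normal(0.0, sigma, missing.sum())
        completed.append(filled)
    return completed


# Usage on synthetic data: roughly 30% of days lost to cloud cover.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
aod = 0.5 + 0.2 * X[:, 0] - 0.1 * X[:, 1] + rng.normal(0.0, 0.05, 200)
aod[rng.random(200) < 0.3] = np.nan
imputations = impute_aod(X, aod, m=5)
# Pool a downstream estimate (here the mean AOD) by averaging across imputations.
pooled_mean = np.mean([a.mean() for a in imputations])
```

Point estimates are pooled by averaging across the completed datasets. Under Rubin's rules, the total variance of a pooled estimate is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.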
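
The abstract names neither the clustering algorithm nor its input features. As an assumption, the sketch below uses plain k-means (Lloyd's algorithm) on standardized per-station features (longitude, latitude, mean PM2.5), with k = 7 to mirror the seven regions. It then trains one model per region. `kmeans`, `fit_per_region`, and the least-squares learner are hypothetical stand-ins. In the study, the per-region models are random forest, GAM, and XGBoost.

```python
import numpy as np


def _nearest(Z, centroids):
    # Index of the closest centroid for every row (squared Euclidean distance).
    return ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans(features, k=7, n_iter=100, seed=0):
    """Lloyd's k-means; returns (labels, centroids in standardized units)."""
    rng = np.random.default_rng(seed)
    # Standardize so longitude, latitude and mean PM2.5 contribute comparably.
    Z = (features - features.mean(axis=0)) / features.std(axis=0)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _nearest(Z, centroids)
        # Move each centroid to the mean of its stations; keep it if the cluster emptied.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return _nearest(Z, centroids), centroids


def fit_per_region(labels, X, y, fit):
    """Train one model per region; `fit` is any (X, y) -> model training function."""
    return {r: fit(X[labels == r], y[labels == r]) for r in np.unique(labels)}


# Usage with synthetic stations: columns are longitude, latitude, mean PM2.5.
rng = np.random.default_rng(2)
stations = np.column_stack([rng.uniform(75, 135, 300), rng.uniform(18, 50, 300),
                            rng.uniform(20, 120, 300)])
labels, centroids = kmeans(stations, k=7)

X = rng.normal(size=(300, 3))  # hypothetical per-station predictors
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 0.1, 300)
ols = lambda Xr, yr: np.linalg.lstsq(Xr, yr, rcond=None)[0]  # stand-in learner
models = fit_per_region(labels, X, y, ols)
```

Fitting separately per cluster lets each region have its own predictor-PM2.5 relationship. The abstract credits this approach with a CV R² gain of about 0.05 over national models.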

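The final step combines the base models with a generalized additive ensemble. The abstract does not give the ensemble's exact specification; for example, it may include spatially or temporally varying terms. This sketch therefore assumes the simplest additive form: y = b0 + f_1(p_1) + ... + f_J(p_J), where p_j is base model j's prediction. Each f_j is an unpenalized piecewise-linear spline with knots at interior quantiles, fit by ordinary least squares. This stands in for a penalized-spline GAM fit. In practice the inputs should be out-of-fold CV predictions, so the ensemble does not learn from base models that have already seen the targets. The helper names are hypothetical.

```python
import numpy as np


def hinge_basis(x, knots):
    """Piecewise-linear spline basis: x itself plus max(0, x - knot) per knot."""
    return np.column_stack([x] + [np.maximum(0.0, x - k) for k in knots])


def fit_additive_ensemble(preds, y, n_knots=4):
    """Fit y ~ b0 + f_1(p_1) + ... + f_J(p_J) by least squares.

    preds : (n, J) out-of-fold predictions from J base models
    Returns a function mapping new (m, J) predictions to ensemble predictions.
    """
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]  # interior quantiles only
    knots = [np.quantile(preds[:, j], probs) for j in range(preds.shape[1])]

    def design(P):
        cols = [np.ones(len(P))] + [hinge_basis(P[:, j], knots[j]) for j in range(P.shape[1])]
        return np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(design(preds), y, rcond=None)
    return lambda P: design(P) @ coef


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Usage: three imperfect synthetic base models of the same PM2.5 series.
rng = np.random.default_rng(3)
truth = rng.gamma(2.0, 30.0, 500)
preds = np.column_stack([
    truth + rng.normal(0.0, 15.0, 500),                     # base model 1: unbiased, noisy
    0.8 * truth + 10 + rng.normal(0.0, 10.0, 500),          # base model 2: shrunk toward mean
    truth + 0.002 * truth**2 + rng.normal(0.0, 12.0, 500),  # base model 3: curved bias
])
ensemble = fit_additive_ensemble(preds, truth)
combined = ensemble(preds)
```

Each spline basis contains the raw prediction p_j as a column. The least-squares ensemble can therefore always reproduce any single base model, so its in-sample error is never worse than the best one. Whether it also generalizes better has to be checked with held-out data.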
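
The abstract reports CV R² and RMSE. It also reports hindcast accuracy separately, meaning accuracy on periods excluded from training. The sketch below assumes R² is the coefficient of determination, 1 - SSE/SST. Some studies instead report the squared correlation between observed and predicted values. Hindcast validation is illustrated with a leave-one-year-out split, which is an assumption: the abstract does not state the authors' exact hold-out design. For monthly-level figures, one would first average daily values within each station-month and then apply the same metrics.

```python
import math
from collections import defaultdict


def r2_rmse(obs, pred):
    """Coefficient of determination (1 - SSE/SST) and root-mean-square error."""
    n = len(obs)
    mean_obs = sum(obs) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst, math.sqrt(sse / n)


def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx), holding out one whole year at a time.

    Unlike random k-fold CV, the test year is never seen during training, which
    mimics predicting PM2.5 for years before routine monitoring began.
    """
    by_year = defaultdict(list)
    for i, year in enumerate(years):
        by_year[year].append(i)
    for held_out in sorted(by_year):
        train = [i for i, year in enumerate(years) if year != held_out]
        yield held_out, train, by_year[held_out]


# Usage
r2, err = r2_rmse([10, 20, 30, 40], [12, 18, 33, 37])  # r2 ≈ 0.948, err ≈ 2.55
folds = list(leave_one_year_out([2013, 2013, 2014, 2015, 2015]))
```

Random CV can look optimistic because days from the same year appear in both training and test sets. Holding out whole years targets the out-of-period accuracy that the abstract identifies as the weakness of earlier models.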
