均方误差
后发
随机森林
梯度升压
聚类分析
集合预报
集成学习
数据集
统计
计算机科学
人工智能
机器学习
数学
作者
Qingyang Xiao,Howard H. Chang,Guannan Geng,Yang Liu
标识
DOI:10.1021/acs.est.8b02917
摘要
The long satellite aerosol data record enables assessments of historical PM2.5 level in regions where routine PM2.5 monitoring began only recently. However, most previous models reported decreased prediction accuracy when predicting PM2.5 levels outside the model-training period. In this study, we proposed an ensemble machine learning approach that provided reliable PM2.5 hindcast capabilities. The missing satellite data were first filled by multiple imputation. Then the modeling domain, China, was divided into seven regions using a spatial clustering method to control for unobserved spatial heterogeneity. A set of machine learning models including random forest, generalized additive model, and extreme gradient boosting were trained in each region separately. Finally, a generalized additive ensemble model was developed to combine predictions from different algorithms. The ensemble prediction characterized the spatiotemporal distribution of daily PM2.5 well with the cross-validation (CV) R2 (RMSE) of 0.79 (21 μg/m3). The cluster-based subregion models outperformed national models and improved the CV R2 by ∼0.05. Compared with previous studies, our model provided more accurate out-of-range predictions at the daily level ( R2 = 0.58, RMSE = 29 μg/m3) and monthly level ( R2 = 0.76, RMSE = 16 μg/m3). Our hindcast modeling system allows for the construction of unbiased historical PM2.5 levels.
科研通智能强力驱动
Strongly Powered by AbleSci AI