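Random forest
Environmental science
Interpretability
Regression
Statistics
Mean squared error
Regression analysis
Proportion (ratio)
Mathematics
Nonparametric statistics
Computer science
Machine learning
Geography
Cartography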
Authors
Xuefei Hu, Jessica H. Belle, Xia Meng, Avani Wildani, Lance A. Waller, Matthew Strickland, Yang Liu
Identifier
DOI: 10.1021/acs.est.7b01210
Abstract
To estimate PM2.5 concentrations, many parametric regression models have been developed, while nonparametric machine learning algorithms are used less often and national-scale models are rare. In this paper, we develop a random forest model incorporating aerosol optical depth (AOD) data, meteorological fields, and land use variables to estimate daily 24 h averaged ground-level PM2.5 concentrations over the conterminous United States in 2011. Random forests are an ensemble learning method that provides predictions with high accuracy and interpretability. Our results achieve an overall cross-validation (CV) R² value of 0.80. Mean prediction error (MPE) and root mean squared prediction error (RMSPE) for daily predictions are 1.78 and 2.83 μg/m³, respectively, indicating good agreement between CV predictions and observations. The prediction accuracy of our model is similar to that reported in previous studies using neural networks or regression models on both national and regional scales. In addition, the incorporation of convolutional layers for land use terms and nearby PM2.5 measurements increases CV R² by ∼0.02 and ∼0.06, respectively, indicating their significant contributions to prediction accuracy. A pair of different variable importance measures both indicate that the convolutional layer for nearby PM2.5 measurements and AOD values are among the most important predictor variables for the training process.
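The three accuracy figures quoted in the abstract (CV R², MPE, RMSPE) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions here are assumptions: R² is taken as the coefficient of determination, and MPE as the mean absolute difference between observed and held-out predicted values. The function name `cv_metrics` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def cv_metrics(observed, predicted):
    """Return (r2, mpe, rmspe) for paired observations and held-out predictions.

    Assumed definitions (not stated in the abstract):
      r2    = 1 - SS_res / SS_tot   (coefficient of determination)
      mpe   = mean(|obs - pred|)    (mean prediction error, taken as absolute)
      rmspe = sqrt(mean((obs - pred)^2))
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal length")

    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")

    r2 = 1.0 - ss_res / ss_tot
    mpe = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmspe = math.sqrt(ss_res / n)
    return r2, mpe, rmspe


# Toy daily PM2.5 values in ug/m3 (illustrative only, not data from the study).
obs = [10.0, 12.0, 8.0, 14.0]
pred = [11.0, 11.0, 9.0, 13.0]
r2, mpe, rmspe = cv_metrics(obs, pred)
print(f"R2={r2:.2f}  MPE={mpe:.2f}  RMSPE={rmspe:.2f}")
```

In a cross-validation setting, `predicted` would hold the predictions made for each observation while it was in the held-out fold, so that the metrics reflect out-of-sample accuracy.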