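Landslide
Support vector machine
Machine learning
Artificial intelligence
Decision tree
Computer science
Bayesian probability
Random forest
Sample (material)
Algorithm
Sample size determination
Geology
Statistics
Mathematics
Geomorphology
Physics
Thermodynamics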
Authors
Can Yang, Leilei Liu, Faming Huang, Lei Huang, Xiaomi Wang
Identifier
DOI: 10.1016/j.gr.2022.05.012
Abstract
Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The accuracy of machine learning-based LSA often hinges on the ratio of landslide to non-landslide (positive/negative, P/N) samples. A proper P/N sample ratio can significantly improve the performance of machine learning-based LSA, whereas an improper ratio can cause inadequate training or data pollution. Conventionally, the P/N sample ratio is chosen by experience or by trial and error, which introduces substantial uncertainty. This paper proposes a Bayesian optimization method to optimize the P/N sample ratio for machine learning models. First, Anhua County in Hunan Province, China, is selected as the study area because of the numerous landslides that have occurred there in recent years. Second, three representative machine learning models, the support vector machine (SVM), random forest (RF) and gradient boosting decision tree (GBDT), are adopted to assess landslide susceptibility. Third, a Bayesian optimization algorithm is used to obtain the optimal P/N sample ratio, taking into account the effects of different training/test set ratios. Finally, the improved models and the corresponding landslide susceptibility maps are established using the optimal P/N sample ratio obtained. The results show that SVM, RF and GBDT all perform better with the optimized P/N sample ratio. The highest AUC value is achieved by the RF model (0.840, an improvement of 1.3%), followed by GBDT (0.831, an improvement of 1.3%) and SVM (0.775, an improvement of 0.7%). The results also indicate that RF and GBDT are better suited than SVM to handling sample imbalance in LSA. Bayesian optimization is therefore recommended for tuning the P/N sample ratio in machine learning-based LSA models.
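The abstract does not state which surrogate model, acquisition function or search range the Bayesian optimization uses. The following is a minimal, dependency-free sketch under stated assumptions: a Gaussian-process surrogate with an RBF kernel and an expected-improvement acquisition (common choices, not confirmed by the source), searching over a hypothetical multiplier `ratio` in [1, 5], read as "1 landslide sample per `ratio` non-landslide samples". All function names are hypothetical. `toy_validation_auc` is a synthetic stand-in, not data from the study; in a real pipeline it would resample the training set at the given P/N ratio, train SVM, RF or GBDT, and return the test AUC. The outer loop over training/test set ratios mentioned in the abstract is omitted.

```python
import math
import random
from typing import Callable, List, Tuple


def rbf(a: float, b: float, length: float = 0.5) -> float:
    """Squared-exponential (RBF) kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def fit_gp(xs: List[float], ys: List[float],
           noise: float = 1e-6) -> Callable[[float], Tuple[float, float]]:
    """Fit a zero-mean GP to (already standardised) ys; return predict(x) -> (mean, std)."""
    n = len(xs)
    k = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    low = [[0.0] * n for _ in range(n)]  # Cholesky factor: K = L L^T
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][i] = math.sqrt(max(k[i][i] - s, 1e-12))
            else:
                low[i][j] = (k[i][j] - s) / low[j][j]

    def forward(b: List[float]) -> List[float]:
        # Solve L y = b by forward substitution.
        y: List[float] = []
        for i in range(n):
            y.append((b[i] - sum(low[i][m] * y[m] for m in range(i))) / low[i][i])
        return y

    def backward(y: List[float]) -> List[float]:
        # Solve L^T x = y by back substitution.
        x = [0.0] * n
        for i in reversed(range(n)):
            x[i] = (y[i] - sum(low[m][i] * x[m] for m in range(i + 1, n))) / low[i][i]
        return x

    alpha = backward(forward(ys))  # alpha = K^{-1} y

    def predict(x: float) -> Tuple[float, float]:
        k_star = [rbf(x, x_i) for x_i in xs]
        mean = sum(a * b for a, b in zip(k_star, alpha))
        v = forward(k_star)
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
        return mean, math.sqrt(var)

    return predict


def expected_improvement(mean: float, std: float, best: float, xi: float = 0.01) -> float:
    """Expected improvement (maximisation) under a Gaussian predictive distribution."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def optimize_ratio(objective: Callable[[float], float], lo: float = 1.0, hi: float = 5.0,
                   n_init: int = 3, n_iter: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Maximise objective(ratio) over [lo, hi] with GP + EI; return (best_ratio, best_score)."""
    rng = random.Random(seed)
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]  # candidate ratios
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]       # random initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        mu = sum(ys) / len(ys)
        sd = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys)) or 1.0
        z = [(y - mu) / sd for y in ys]                    # standardise scores for the GP
        predict = fit_gp(xs, z)
        best = max(z)
        candidates = [g for g in grid if g not in xs]       # skip ratios already evaluated
        nxt = max(candidates, key=lambda g: expected_improvement(*predict(g), best))
        xs.append(nxt)
        ys.append(objective(nxt))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


def toy_validation_auc(ratio: float) -> float:
    # HYPOTHETICAL synthetic stand-in for "resample at 1:ratio P/N, train SVM/RF/GBDT,
    # return test AUC"; it is a smooth toy curve, not data from the study.
    return 0.80 - 0.02 * (ratio - 2.0) ** 2


if __name__ == "__main__":
    best_ratio, best_auc = optimize_ratio(toy_validation_auc)
    print(f"best N/P multiplier ~ {best_ratio:.2f}, toy score {best_auc:.3f}")
```

The Gaussian process is written out by hand only to keep the sketch self-contained. In practice a dedicated Bayesian-optimization library and a cross-validated AUC objective would normally replace both the hand-rolled surrogate and the toy curve.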