Over-sampling for data augmentation in data-driven models for the shear strength prediction of RC membranes

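Keywords: Overfitting; Sampling (signal processing); Random forest; AdaBoost; Computer science; Data mining; Artificial intelligence; Machine learning; Data set; Set (abstract data type); Support vector machine; Artificial neural network; Filter (signal processing); Computer vision; Programming language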

Authors

Luis Alberto Bedriñana, Jostin Gabriel Landeo, Julio Sucasaca, Christian Málaga-Chuquitaype

Source

Journal: Structures [Elsevier BV]
Date: 2024-01-16; Volume: 60; Article number: 105870

Identifiers

DOI: 10.1016/j.istruc.2024.105870

Abstract

Complex reinforced concrete (RC) structures are generally assessed as assemblies of individual membrane elements subjected to combined in-plane stresses. However, accurately predicting the shear strength of such elements remains difficult. In addition, experimental data on RC panels are scarce, and their statistical distribution is skewed towards lower strength values. This limits the development of data-driven models. It is therefore important to explore data augmentation techniques that can support more accurate and generalizable predictive models in structural engineering. This paper evaluates over-sampling techniques for data augmentation and their use in building an explainable, data-driven model for the shear strength prediction of RC panels. A dataset of 195 experimental tests of RC panels under different loading conditions is first compiled. Five over-sampling techniques are applied to extend the original dataset and reduce its imbalance. Three ensemble models (Random Forest, AdaBoost, and XGBoost) are then trained on each of the resulting datasets. All the over-sampling techniques produced predictive models that performed better than those trained on the original dataset. In particular, applying Random Over-Sampling (ROS) increased some performance metrics significantly (by around 39%) relative to the model trained on the original dataset, without any overfitting issues. This strategy enabled the development of an accurate XGBoost model (R² = 0.97 on the testing set). The explainability of the final predictive model (the XGBoost model trained on the ROS dataset) is evaluated using SHAP (SHapley Additive exPlanations) analysis.
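The abstract does not say how ROS, a technique originally defined for class labels, is adapted to a continuous target such as shear strength. The following is a minimal sketch that assumes one common adaptation. The target range is split into equal-width bins, and samples from under-populated bins are randomly duplicated until every occupied bin matches the most populated one. The helper name `random_oversample_regression`, the equal-width binning, the bin count, and the seed are hypothetical illustrations and are not taken from the paper.

```python
import random
from collections import defaultdict


def random_oversample_regression(X, y, n_bins=4, seed=0):
    """Hypothetical helper (not the paper's code): balance a regression dataset
    by binning the continuous target into equal-width bins (an assumption) and
    randomly duplicating rows from under-populated bins."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    rng = random.Random(seed)
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant target
    bins = defaultdict(list)
    for i, target in enumerate(y):
        # Equal-width bin index; the maximum value is placed in the last bin.
        b = min(int((target - lo) / width), n_bins - 1)
        bins[b].append(i)
    majority = max(len(idx) for idx in bins.values())
    new_idx = []
    for idx in bins.values():
        new_idx.extend(idx)  # keep every original sample once
        # Draw with replacement until this bin matches the most populated bin.
        new_idx.extend(rng.choice(idx) for _ in range(majority - len(idx)))
    X_res = [list(X[i]) for i in new_idx]
    y_res = [y[i] for i in new_idx]
    return X_res, y_res


# Toy usage: five low-strength samples and one high-strength sample.
X = [[0.5], [0.7], [0.9], [1.1], [1.3], [4.0]]
y = [1.0, 1.2, 1.4, 1.6, 1.8, 9.0]
X_res, y_res = random_oversample_regression(X, y, n_bins=2)
```

With two bins, the single high-strength sample is duplicated until its bin holds five rows, matching the low-strength bin. As a general practice, over-sampling is applied to the training split only, so that duplicated rows cannot appear in both the training and the testing sets.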
The proposed predictive model outperformed traditional mechanics-based models (an improvement of approximately 27% over SMCS and 33% over MCFT for some performance metrics). It also showed a more controlled error distribution over the range of variables. For six specimens of the original dataset, the proposed model was also more accurate (mean prediction ratio of 0.98) than a sophisticated finite element analysis (mean prediction ratio of 0.84).
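The comparisons above rely on two metrics that the abstract quotes: R² and a mean prediction ratio. The following is a minimal standard-library sketch of both. R² uses its standard definition, 1 − SS_res/SS_tot. The ratio is assumed here to be predicted over experimental strength, so a value of 1.0 means a perfect prediction on average. The abstract does not state the direction of the ratio, so the paper may use the inverse. The function names are illustrative only.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def mean_prediction_ratio(y_true, y_pred):
    """Mean of predicted/experimental strength (ratio direction is an assumption)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return mean(p / t for t, p in zip(y_true, y_pred))


# Toy values, not data from the paper.
experimental = [2.0, 4.0, 6.0]
predicted = [1.0, 4.0, 7.0]
print(r2_score(experimental, predicted))  # → 0.75
print(mean_prediction_ratio(experimental, predicted))
```

A mean ratio close to 1.0 can still hide large scatter, because over- and under-predictions cancel out. For that reason, a ratio is usually read together with a dispersion measure or an error metric such as R².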
