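Undersampling
Computer science
Artificial intelligence
Machine learning
Artificial neural network
Boundary determination
Pruning
Constraint (computer-aided design)
Interpolation (computer graphics)
Representation (politics)
Data mining
Mathematics
Support vector machine
Motion (physics)
Geometry
Politics
Law
Political science
Agronomy
Biology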
Authors
Zhan ao Huang, Yongsheng Sang, Yanan Sun, Jiancheng Lv
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-11
Volume (issue): pages: 35 (7): 9252-9266
Cited by: 2
Identifier
DOI: 10.1109/tnnls.2022.3231917
Abstract
Most data in real life are characterized by imbalance problems. One of the classic models for dealing with imbalanced data is neural networks. However, the data imbalance problem often causes the neural network to display negative class preference behavior. Using an undersampling strategy to reconstruct a balanced dataset is one of the methods to alleviate the data imbalance problem. However, most existing undersampling methods focus more on the data or aim to preserve the overall structural characteristics of the negative class through potential energy estimation, while the problems of gradient inundation and insufficient empirical representation of positive samples have not been well considered. Therefore, a new paradigm for solving the data imbalance problem is proposed. Specifically, to solve the problem of gradient inundation, an informative undersampling strategy is derived from the performance degradation and used to restore the ability of neural networks to work under imbalanced data. In addition, to alleviate the problem of insufficient empirical representation of positive samples, a boundary expansion strategy with linear interpolation and the prediction consistency constraint is considered. We tested the proposed paradigm on 34 imbalanced datasets with imbalance ratios ranging from 16.90 to 100.14. The test results show that our paradigm obtained the best area under the receiver operating characteristic curve (AUC) on 26 datasets.
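The two generic ideas the abstract builds on, rebalancing by undersampling the negative class and enlarging the positive class by linear interpolation, can be illustrated with a minimal sketch. This is not the authors' method: the sketch uses plain random undersampling rather than the informative strategy described above, it omits the prediction consistency constraint, and every function name and the toy data are assumptions made for illustration only.

```python
import random


def imbalance_ratio(labels):
    """Number of negative (majority) samples per positive (minority) sample."""
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos


def random_undersample(X, y, rng):
    """Keep every positive and an equally sized random subset of negatives.

    Baseline only: the paper selects negatives informatively, not at random.
    """
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    kept = pos + rng.sample(neg, len(pos))
    return [X[i] for i in kept], [y[i] for i in kept]


def interpolate_positives(X_pos, n_new, rng):
    """Create synthetic positives on segments between random positive pairs."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(X_pos, 2)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([lam * ai + (1.0 - lam) * bi for ai, bi in zip(a, b)])
    return synthetic


# Toy data (hypothetical): 200 negatives around (0, 0), 10 positives around (3, 3).
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
X += [[rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)] for _ in range(10)]
y = [0] * 200 + [1] * 10

X_pos = [x for x, label in zip(X, y) if label == 1]
X_bal, y_bal = random_undersample(X, y, rng)
X_syn = interpolate_positives(X_pos, 5, rng)

print("imbalance ratio before:", imbalance_ratio(y))
print("imbalance ratio after: ", imbalance_ratio(y_bal))
print("synthetic positives:   ", len(X_syn))
```

Because each synthetic point is a convex combination of two existing positives, it always lies inside the region already spanned by the positive class; how far a method pushes beyond that region, and how it keeps predictions consistent there, is where the paper's boundary expansion strategy would differ from this baseline.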