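Overfitting
Computer science
Oversampling
Artificial intelligence
Class (philosophy)
Machine learning
Line (geometry)
Genetic algorithm
Process (computing)
Constraint (computer-aided design)
Pattern recognition (psychology)
Algorithm
Mathematics
Artificial neural network
Operating system
Computer network
Bandwidth (computing)
Geometry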
Authors
Qi Dai,Jian‐wei Liu,Jiapeng Yang
Identifier
DOI:10.1016/j.knosys.2022.109902
Abstract
The class-imbalance problem is an active research topic in machine learning and data mining. Traditional oversampling algorithms address it by using only the information in the positive (minority) instances to generate synthetic instances with similar characteristics, so the information in the majority instances goes unused. When the minority instances are too few and too concentrated, such methods suffer from the problem of small disjuncts, which leads to overfitting of the training data. To solve this problem, we draw on the genetic process of three-line hybrid rice and propose a new positive-instance augmentation algorithm, Three-line Hybrid Positive Instance Augmentation (THPIA). THPIA first uses the genetic process of three-line hybrid rice to mix the features of majority-class and minority-class instances and construct unlabeled instances. Then, positive instances randomly selected from the pool of positive instances are hybridized with randomly selected unlabeled instances to obtain enhanced seed instances of the positive class. Finally, a distance constraint prevents the augmentation from generating noisy instances in the negative region. Experimental results on 20 open datasets show that THPIA can effectively use the information of the majority instances to enhance the minority instances. Compared with 7 state-of-the-art methods using the Friedman test and Holm's post-hoc test, THPIA is comparable to CDSMOTE and SMOTE-LOF and outperforms the remaining 5 algorithms.
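The three stages named in the abstract (mixing, hybridization, distance constraint) can be illustrated with a rough sketch. The abstract does not specify the hybridization operator or the exact form of the distance constraint, so the code below makes two assumptions: mixing is a random convex combination of two feature vectors, and a candidate is kept only if it lies closer to its nearest positive instance than to its nearest negative instance. The names `mix` and `augment_positives` and the parameters `n_new`, `seed`, and `max_tries` are hypothetical and are not taken from the paper.

```python
import math
import random


def mix(a, b, lam):
    """Convex combination lam*a + (1-lam)*b of two feature vectors (assumed operator)."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def augment_positives(positives, negatives, n_new, seed=0, max_tries=1000):
    """Generate up to n_new synthetic positive instances.

    This is an illustrative approximation of the three stages in the abstract,
    not the published THPIA procedure.
    """
    rng = random.Random(seed)
    synthetic = []
    tries = 0
    while len(synthetic) < n_new and tries < max_tries:
        tries += 1
        # Stage 1: mix a minority and a majority instance into an unlabeled instance.
        unlabeled = mix(rng.choice(positives), rng.choice(negatives), rng.random())
        # Stage 2: hybridize a randomly chosen positive with the unlabeled instance.
        candidate = mix(rng.choice(positives), unlabeled, rng.random())
        # Stage 3: distance constraint (assumed form): reject candidates that
        # are at least as close to a negative instance as to a positive one.
        d_pos = min(math.dist(candidate, p) for p in positives)
        d_neg = min(math.dist(candidate, n) for n in negatives)
        if d_pos < d_neg:
            synthetic.append(candidate)
    return synthetic


if __name__ == "__main__":
    pos = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]
    neg = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]]
    for point in augment_positives(pos, neg, n_new=5):
        print([round(v, 3) for v in point])
```

Because the candidates are built partly from majority-class features, they can fall outside the convex hull of the minority instances, which is how this kind of scheme can ease the small-disjuncts issue; the rejection step in stage 3 is what keeps them out of the negative region under the assumed constraint.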